Predicting new observations from a Bayesian network
Predicting the values of one or more variables is the prototypical task of a machine learning model. In the context of Bayesian networks, we take it to involve just a single variable: producing the best possible instantiation of multiple variables is a most probable explanation (MPE) problem that deserves a separate implementation. Prediction is simpler and it can be optimized more effectively.
bnlearn provides a predict()
function (documented
here) for the fitted Bayesian networks returned by
bn.fit()
(illustrated here). predict()
provides different methods to
compute predictions, with different trade-offs: "parents"
, "bayes-lw"
and
"exact"
. For all methods, predict()
takes
- a
bn.fit
object encoding the Bayesian network; - a
node
that we would like to predict; - a
data
set we will predict it from; - a
prob
logical argument to return the prediction probabilities for discrete Bayesian networks;
and returns a vector with the predictions. Note that one or more predicted values may be NA
s if the
fitted Bayesian network contains NA
parameters, which can be avoided by fitting it using
posteriors estimators or using maximum likelihood estimators with replace.unidentifiable = TRUE
.
Predicting from the parents
The most computationally-efficient way of computing predictions for a variable is to use its local distribution and to find its most probable value conditional on the values of its parents. This does not require any inference: it boils down to a look-up in the conditional probability table (for a discrete node in a discrete or conditional Gaussian network) or to a vector product (for a continuous node in a Gaussian or conditional Gaussian network). However, it does not use the Bayesian network and the information in the observation being predicted to the fullest because it disregards the rest of the Markov blanket of the node that is the target of the prediction.
Consider a Bayesian network learned and fitted from a training.set
, with an associated
validation.set
for which we want to predict node E
.
> library(bnlearn) > dtraining.set = learning.test[1:4000, ] > dvalidation.set = learning.test[4001:5000, ] > dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]") > dfitted = bn.fit(dag, dtraining.set)
The method = "parents"
is the default method for prediction, and has no optional arguments.
> pred = predict(dfitted, node = "E", data = dvalidation.set, method = "parents") > head(pred)
[1] c b b a c b Levels: a b c
> head(dvalidation.set$E)
[1] c a c b c b Levels: a b c
> table(dvalidation.set$E, pred)
pred a b c a 176 156 30 b 26 258 36 c 23 99 196
Like other prediction methods, if the prob
argument is set to TRUE
and the network is a
discrete Bayesian network the prediction probabilities for all values of the target variables are attached as an
attribute to the predictions. The columns correspond to the observations in validation.set
and the rows
to the values of E
.
> pred = predict(dfitted, node = "E", data = dvalidation.set, prob = TRUE) > attr(pred, "prob")[, 1:5]
[,1] [,2] [,3] [,4] [,5] a 0.2301587 0.4125133 0.2345079 0.81066946 0.1191646 b 0.1785714 0.4772004 0.5018226 0.09309623 0.1117936 c 0.5912698 0.1102863 0.2636695 0.09623431 0.7690418
Similarly, for continuous data (this time predicting node F
).
> gtraining.set = gaussian.test[1:4000, ] > gvalidation.set = gaussian.test[4001:5000, ] > dag = model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]") > gfitted = bn.fit(dag, gtraining.set) > pred = predict(gfitted, node = "F", data = gvalidation.set, method = "parents") > head(pred)
[1] 18.57683 17.26517 21.57541 28.03751 29.75274 24.87890
> head(gvalidation.set$F)
[1] 19.61011 16.12044 19.88823 26.99181 29.55846 25.65805
> cor(gvalidation.set$F, pred)
[1] 0.9869819
The main shortcoming of predicting from parents is that it fails to produce meaningful predictions for root nodes, since they have no parents. In the case of a discrete node, all predictions will be equal to the mode of its distribution.
> table(predict(dfitted, node = "A", data = dvalidation.set, method = "parents"))
a b c 0 1000 0
In the case of a continuous node, all predictions will be equal to the expected values of its distribution.
> unique(predict(gfitted, node = "A", data = gvalidation.set, method = "parents"))
[1] 1.002117
Predicting with Monte Carlo posterior inference
A better approach to prediction is to use some form of inference and predict from more nodes than just the parents of
the target node. Likelihood weighting is better suited for this task than logic sampling because it guarantees that none
of the particles it generates are discarded, no matter how unlikely the values taken by the predictors are. (However,
they must be observable.) This is what method = "bayes-lw"
does for each observation being predicted:
- generate a sufficiently large number of particles and the associated likelihood weights using the predictors as the evidence;
- compute the value with the largest weight mass (the posterior mode, for a discrete target node) or the weighted average of the particles (the posterior expectation, for a continuous target node).
Therefore, predict()
takes two optional arguments when method = "bayes-lw"
: n
,
the number of particles to generate for each observation, and from
, the nodes to use as predictors.
> pred = predict(dfitted, node = "E", data = dvalidation.set, method = "bayes-lw") > head(pred)
[1] c b b a c b Levels: a b c
> head(dvalidation.set$E)
[1] c a c b c b Levels: a b c
> table(dvalidation.set$E, pred)
pred a b c a 178 154 30 b 36 248 36 c 23 99 196
The predicted values are computed from the empirical posterior distribution of the particles, so in general they may
be different each time we invoke predict()
. Generating larger number of particles makes predictions more
stable at the cost of longer running times.
> system.time(predict(dfitted, node = "E", data = dvalidation.set, method = "bayes-lw", n = 500))
user system elapsed 0.061 0.000 0.061
> system.time(predict(dfitted, node = "E", data = dvalidation.set, method = "bayes-lw", n = 5000))
user system elapsed 0.445 0.008 0.453
As for the predictors, at a minimum from
should contain the nodes that are part of the Markov blanket of
the target node to achieve the best possible accuracy. By default, from
contains all nodes in the network
apart the predictor itself to minimize the computational effort of generating the particles. However, we can choose to
use any valid set of predictors depending on the application.
> pred = predict(dfitted, node = "E", data = dvalidation.set, method = "bayes-lw", from = c("A", "F")) > head(pred)
[1] c b b a c b Levels: a b c
Predicting with exact inference
Similarly, we can use exact inference instead of approximate inference to compute predictions. On the one hand, the predicted values will not have any simulation variability because we no longer rely on Monte Carlo simulations. On the other hand, exact inference is often more computationally expensive than approximate inference, especially for large Bayesian networks.
The method = "exact"
leverages either junction trees (interfacing with the gRain
package) or the closed-form properties of the multivariate normal distribution to produce predictions for discrete and
Gaussian Bayesian networks. Conditional Gaussian networks are not supported. It has an optional argument
from
like the "bayes-lw"
method, which by default contains the nodes in the Markov blanket of
the target node.
> pred = predict(dfitted, node = "E", data = dvalidation.set, method = "exact")
> head(pred)
[1] c b b a c b Levels: a b c
> pred = predict(gfitted, node = "F", data = gvalidation.set, method = "parents") > head(pred)
[1] 18.57683 17.26517 21.57541 28.03751 29.75274 24.87890
The number of nodes in from
and their location in the network relative to the target node has a strong
influence the prediction speed. In fact, it can make method = "exact"
faster than
method "bayes-lw"
.
> system.time(predict(dfitted, node = "E", data = dvalidation.set, method = "exact"))
user system elapsed 0.023 0.005 0.028
> system.time(predict(dfitted, node = "E", data = dvalidation.set, method = "bayes-lw"))
user system elapsed 0.065 0.000 0.064
> system.time(predict(dfitted, node = "E", data = dvalidation.set, method = "exact", from = c("A", "F", "B")))
user system elapsed 0.026 0.000 0.026
> system.time(predict(dfitted, node = "E", data = dvalidation.set, method = "bayes-lw", from = c("A", "F", "B")))
user system elapsed 0.069 0.000 0.070
Predictions in Bayesian network classifiers
Bayesian network classifiers like naive.bayes()
and tree.bayes()
(which implements TAN),
have separate predict()
methods that use the closed-form expressions available for those models. Note
this method has no method
argument to choose how to compute the predicted values, but it still has the
prob
argument as illustrated here.
Mon Aug 5 02:50:29 2024
with bnlearn
5.0
and R version 4.4.1 (2024-06-14)
.