predict and impute {bnlearn} | R Documentation |
Predict or impute missing data from a Bayesian network
Description
Impute missing values in a data set or predict a variable from a Bayesian network.
Usage
## S3 method for class 'bn.fit'
predict(object, node, data, cluster, method = "parents", ...,
  prob = FALSE, debug = FALSE)
impute(object, data, cluster, method, ..., strict = TRUE, debug = FALSE)
Arguments
object |
an object of class bn.fit. |
data |
a data frame containing the data to be imputed. Complete observations will be ignored. |
node |
a character string, the label of a node. |
cluster |
an optional cluster object from package parallel. |
method |
a character string, the method used to impute the missing values or predict new ones. The default value for predict() is parents. |
... |
additional arguments for the imputation method. See below. |
prob |
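a boolean value. If TRUE and the network is discrete, the probabilities used for prediction are attached to the predicted values as an attribute called prob. |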
strict |
a boolean value. If |
debug |
a boolean value. If |
Details
predict() returns the predicted values for node given the data specified by data and the fitted network. Depending on the value of method, the predicted values are computed as follows.
- parents: the predicted values are computed by plugging in the new values for the parents of node in the local probability distribution of node extracted from the fitted network (object).
- bayes-lw: the predicted values are computed by averaging likelihood weighting simulations performed using all the available nodes as evidence (with the exception of the node whose values we are predicting). The number of random samples which are averaged for each new observation is controlled by the n optional argument; the default is 500. If the variable being predicted is discrete, the predicted level is that with the highest conditional probability. If the variable is continuous, the predicted value is the expected value of the conditional distribution. The variables that are used to compute the predicted values can be specified with the from optional argument; the default is to use all the relevant variables from the data. Note that the predicted values will differ in each call to predict() since this method is based on a stochastic simulation.
- exact: the predicted values are computed using exact inference. They are maximum a posteriori estimates obtained using junction trees and belief propagation in the case of discrete networks, or posterior expectations computed using closed-form results for the multivariate normal distribution for Gaussian networks. Conditional Gaussian networks are not supported. The variables that are used to compute the predicted values can be specified with the from optional argument; the default is to use those in the Markov blanket of node.
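As an illustration of the methods described above, the following is a minimal sketch that predicts the same node with parents and with bayes-lw. It assumes the gaussian.test data set shipped with bnlearn; the object names (dag, training, test, pred.parents, pred.lw, pred.lw.from) and the choice of n = 1000 and from = c("A", "D") are arbitrary and only serve the example.

```r
library(bnlearn)
data(gaussian.test)

# network structure and parameters learned from the first 2000 observations.
dag = model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]")
training = bn.fit(dag, gaussian.test[1:2000, ])
test = gaussian.test[2001:2100, ]

# "parents": plug the observed parents of F into its local distribution.
pred.parents = predict(training, node = "F", data = test, method = "parents")

# "bayes-lw": average n weighted samples, using all other nodes as evidence.
set.seed(42)
pred.lw = predict(training, node = "F", data = test,
            method = "bayes-lw", n = 1000)

# "bayes-lw" again, restricting the evidence to two nodes with from.
pred.lw.from = predict(training, node = "F", data = test,
                 method = "bayes-lw", from = c("A", "D"))

# compare the predictions with the observed values of F.
head(cbind(observed = test$F, parents = pred.parents,
           lw = pred.lw, lw.from = pred.lw.from))
```

Since bayes-lw is simulation-based, its columns will change from run to run unless the random seed is fixed before each call.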
impute() is based on predict(), and can impute missing values with the same methods (parents, bayes-lw and exact). The method bayes-lw can take an additional argument n with the number of random samples which are averaged for each observation. As in predict(), imputed values will differ in each call to impute() when method is set to bayes-lw.
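A short sketch of imputation with bayes-lw and an explicit n, again assuming the gaussian.test data set from bnlearn. The names incomplete and imputed, the number of missing values and n = 200 are illustrative choices, not recommendations.

```r
library(bnlearn)
data(gaussian.test)

# introduce missing values in F for 100 randomly chosen observations.
set.seed(1)
incomplete = gaussian.test[1:1000, ]
incomplete[sample(nrow(incomplete), 100), "F"] = NA

fitted = bn.fit(model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]"),
           gaussian.test)

# impute by averaging 200 likelihood-weighted samples per observation.
imputed = impute(fitted, incomplete, method = "bayes-lw", n = 200)

# check whether any missing value is left and that the shape is unchanged.
anyNA(imputed)
dim(imputed)
```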
If object contains NA parameter estimates (because of unobserved discrete parent configurations in the data the parameters were learned from), predict() will return NAs when those parent configurations appear in data. See bn.fit for details on how to make sure bn.fit objects contain no NA parameter estimates.
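The sketch below illustrates one way such estimates can arise and one way to avoid them. It assumes the learning.test data set from bnlearn and relies on the method = "bayes" and iss arguments of bn.fit(); removing the rows with A = "a" and C = "a" is a contrived step used only to create an unobserved parent configuration for D, and the names reduced, fitted.mle and fitted.bayes are hypothetical.

```r
library(bnlearn)
data(learning.test)

dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")

# contrived: remove one configuration of the parents of D (A = "a", C = "a").
reduced = learning.test[!(learning.test$A == "a" & learning.test$C == "a"), ]

# maximum likelihood estimates, the default in bn.fit().
fitted.mle = bn.fit(dag, reduced)
# check the conditional probability table of D for missing estimates.
anyNA(coef(fitted.mle$D))

# posterior estimates with a small imaginary sample size.
fitted.bayes = bn.fit(dag, reduced, method = "bayes", iss = 1)
anyNA(coef(fitted.bayes$D))
```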
Value
predict() returns a numeric vector (for Gaussian and conditional Gaussian nodes), a factor (for categorical nodes) or an ordered factor (for ordinal nodes). If prob = TRUE and the network is discrete, the probabilities used for prediction are attached to the predicted values as an attribute called prob.
impute() returns a data frame with the same structure as data.
Note
Ties in prediction are broken using Bayesian tie breaking, i.e. sampling at random from the tied values. Therefore, setting the random seed is required to get reproducible results.
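A minimal sketch of how the random seed can be fixed, assuming the learning.test data set from bnlearn and no cluster argument. The seed value and the names new.data, first and second are arbitrary. The same applies to the stochastic bayes-lw method, which is used here.

```r
library(bnlearn)
data(learning.test)

fitted = bn.fit(model2network("[A][C][F][B|A][D|A:C][E|B:F]"), learning.test)
new.data = learning.test[1:50, ]

# reset the seed before each call so both calls use the same random draws.
set.seed(123)
first = predict(fitted, node = "C", data = new.data, method = "bayes-lw")
set.seed(123)
second = predict(fitted, node = "C", data = new.data, method = "bayes-lw")

# compare the two sets of predictions.
identical(first, second)
```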
Classifiers have a separate predict() method; see naive.bayes.
Author(s)
Marco Scutari
Examples
# missing data imputation.
with.missing.data = gaussian.test
with.missing.data[sample(nrow(with.missing.data), 500), "F"] = NA
fitted = bn.fit(model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]"),
           gaussian.test)
imputed = impute(fitted, with.missing.data)

# predicting a variable in the test set.
training = bn.fit(model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]"),
             gaussian.test[1:2000, ])
test = gaussian.test[2001:nrow(gaussian.test), ]
predicted = predict(training, node = "F", data = test)

# obtain the conditional probabilities for the values of a single variable
# given a subset of the rest; these probabilities are the ones used to
# determine the predicted values.
fitted = bn.fit(model2network("[A][C][F][B|A][D|A:C][E|B:F]"), learning.test)
evidence = data.frame(A = factor("a", levels = levels(learning.test$A)),
                      F = factor("b", levels = levels(learning.test$F)))
predicted = predict(fitted, "C", evidence,
              method = "bayes-lw", prob = TRUE)
attr(predicted, "prob")