Predicting from incomplete data
The behaviour of predict() when data are complete is described here. When data are incomplete, predict() will make the best use it can of the predictor values that are observed if method = "bayes-lw" or method = "exact".
> library(bnlearn)
> dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")
> bn = bn.fit(dag, learning.test)
> incomplete.data = rbn(bn, 6)
> incomplete.data[1, "A"] = NA
> incomplete.data[2, "B"] = NA
> incomplete.data[3, "C"] = NA
> incomplete.data[4, "D"] = NA
> incomplete.data[5, "E"] = NA
> incomplete.data[6, "F"] = NA
> predict(bn, data = incomplete.data, node = "E", method = "exact", debug = TRUE)
* observation 1 , imputing E from:
  B C D F
  c a c b
  > considering event node E with markov blanket B F
  @ query partitioned into:
    event: E
    evidence: B F
  > imputed value:
  E
  b
* observation 2 , imputing E from:
  A C D F
  c a c a
  > considering event node E with markov blanket B F
  @ query partitioned into:
    event: E
    evidence: A F
  > imputed value:
  E
  c
* observation 3 , imputing E from:
  A B D F
  b b b a
  > considering event node E with markov blanket B F
  @ query partitioned into:
    event: E
    evidence: B F
  > imputed value:
  E
  c
* observation 4 , imputing E from:
  A B C F
  c c a b
  > considering event node E with markov blanket B F
  @ query partitioned into:
    event: E
    evidence: B F
  > imputed value:
  E
  b
* observation 5 , imputing E from:
  A B C D F
  a b a a b
  > considering event node E with markov blanket B F
  @ query partitioned into:
    event: E
    evidence: B F
  > imputed value:
  E
  b
* observation 6 , imputing E from:
  A B C D
  a c a c
  > considering event node E with markov blanket B F
  @ query partitioned into:
    event: E
    evidence: B
  > imputed value:
  E
  c
[1] b c c b b c
Levels: a b c
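As a rough sketch of the same idea with approximate inference, the call below uses method = "bayes-lw" on a smaller simulated data set and asks for the probabilities behind each prediction. It assumes bnlearn 5.0 or later (as used on this page) so that predict() accepts missing values; the seed, the sample size and the choice of which cells to blank out are illustrative assumptions, not part of the original example.

```r
library(bnlearn)

# Rebuild the fitted network used in this section.
dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")
bn = bn.fit(dag, learning.test)

# Simulate a few observations and blank out one parent of E in two of them
# (the seed and the positions of the NAs are arbitrary choices).
set.seed(42)
incomplete.data = rbn(bn, 3)
incomplete.data[1, "B"] = NA
incomplete.data[2, "F"] = NA

# Likelihood weighting conditions on whichever predictors are observed.
# prob = TRUE attaches the estimated probabilities as the "prob" attribute.
pred = predict(bn, data = incomplete.data, node = "E",
               method = "bayes-lw", prob = TRUE)
pred
attr(pred, "prob")
```

Because likelihood weighting is a simulation-based method, the attached probabilities (and, for close calls, the predicted levels) may change from one seed to the next.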
If method = "parents", the prediction for an observation will be NA if any of the parents of the target node is missing in that observation.
> predict(bn, data = incomplete.data, node = "E", debug = TRUE)
* predicting values for node E.
  > prediction for observation 1 is 'b' with probabilities:
  0.237595 0.506679 0.255725
  > prediction for observation 2 is NA because at least one parent is NA.
  > prediction for observation 3 is 'c' with probabilities:
  0.205882 0.179739 0.614379
  > prediction for observation 4 is 'b' with probabilities:
  0.237595 0.506679 0.255725
  > prediction for observation 5 is 'b' with probabilities:
  0.316794 0.366412 0.316794
  > prediction for observation 6 is NA because at least one parent is NA.
[1] b    <NA> c    b    b    <NA>
Levels: a b c
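Which observations will come back as NA under method = "parents" can be worked out in advance by checking the parents of the target node for missing values. The sketch below does that and then, as one possible workaround rather than anything predict() does by itself, re-predicts only the affected rows with method = "bayes-lw". It assumes bnlearn 5.0 or later; the seed, the simulated data and the variable names pa and parents.observed are illustrative.

```r
library(bnlearn)

# Rebuild the fitted network used in this section.
dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")
bn = bn.fit(dag, learning.test)

# Simulate four observations and remove one parent of E in rows 2 and 4.
set.seed(42)
incomplete.data = rbn(bn, 4)
incomplete.data[2, "B"] = NA
incomplete.data[4, "F"] = NA

# Rows in which every parent of E is observed.
pa = parents(bn, "E")
parents.observed = complete.cases(incomplete.data[, pa, drop = FALSE])

# Default method = "parents": NA wherever a parent is missing.
pred = predict(bn, data = incomplete.data, node = "E")

# Illustrative fallback: fill in those rows using likelihood weighting.
if (any(!parents.observed)) {
  pred[!parents.observed] =
    predict(bn, data = incomplete.data[!parents.observed, , drop = FALSE],
            node = "E", method = "bayes-lw")
}
pred
```

Mixing the two methods keeps the cheap, deterministic predictions for complete rows and only pays for simulation where it is needed.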
If method = "bayes-lw" or method = "exact", on the other hand, predictions will never be NA unless the model is singular or contains parameters that are themselves NA. In the worst case, the prediction will be made from the marginal distribution of the target variable.
> incomplete.data[6, c("A", "B", "C", "D", "F")] = NA
> predict(bn, data = incomplete.data[6, ], node = "E", method = "exact", debug = TRUE)
* observation 1 , imputing E from:
  data frame with 0 columns and 1 row
  > considering event node E with markov blanket B F
  @ query partitioned into:
    event: E
    evidence: (empty)
  > imputed value:
  E
  a
[1] a
Levels: a b c
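To get a feel for what predicting from the marginal distribution means, the marginal distribution of E can be approximated by simulating from the fitted network and tabulating the target variable. This is only a Monte Carlo sketch under assumed settings (the seed and the 10000 draws are arbitrary); it is not how method = "exact" computes the marginal.

```r
library(bnlearn)

# Rebuild the fitted network used in this section.
dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")
bn = bn.fit(dag, learning.test)

# Approximate the marginal distribution of E by forward sampling.
set.seed(42)
sims = rbn(bn, 10000)
marginal = prop.table(table(sims$E))
marginal

# The most frequent level is the natural prediction when no evidence is
# available, up to simulation error when two levels are nearly tied.
names(which.max(marginal))
```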
Mon Aug 5 02:47:33 2024 with bnlearn 5.0 and R version 4.4.1 (2024-06-14).