## Predicting from incomplete data

The behaviour of `predict()` when data are complete is described elsewhere. When data are incomplete, `predict()` makes the most of the values that are observed for the predictors if `method = "bayes-lw"` or `method = "exact"`.

    > library(bnlearn)
    > dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")
    > bn = bn.fit(dag, learning.test)
    > incomplete.data = rbn(bn, 6)
    > incomplete.data[1, "A"] = NA
    > incomplete.data[2, "B"] = NA
    > incomplete.data[3, "C"] = NA
    > incomplete.data[4, "D"] = NA
    > incomplete.data[5, "E"] = NA
    > incomplete.data[6, "F"] = NA
    > predict(bn, data = incomplete.data, node = "E", method = "exact", debug = TRUE)
    * observation 1 , imputing E from:
     B C D F
     c a c b
      > considering event node E with markov blanket B F
        @ query partitioned into:
          event: E
          evidence: B F
      > imputed value:
     E
     b
    * observation 2 , imputing E from:
     A C D F
     c a c a
      > considering event node E with markov blanket B F
        @ query partitioned into:
          event: E
          evidence: A F
      > imputed value:
     E
     c
    * observation 3 , imputing E from:
     A B D F
     b b b a
      > considering event node E with markov blanket B F
        @ query partitioned into:
          event: E
          evidence: B F
      > imputed value:
     E
     c
    * observation 4 , imputing E from:
     A B C F
     c c a b
      > considering event node E with markov blanket B F
        @ query partitioned into:
          event: E
          evidence: B F
      > imputed value:
     E
     b
    * observation 5 , imputing E from:
     A B C D F
     a b a a b
      > considering event node E with markov blanket B F
        @ query partitioned into:
          event: E
          evidence: B F
      > imputed value:
     E
     b
    * observation 6 , imputing E from:
     A B C D
     a c a c
      > considering event node E with markov blanket B F
        @ query partitioned into:
          event: E
          evidence: B
      > imputed value:
     E
     c
    [1] b c c b b c
    Levels: a b c
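As a rough illustration of how the two methods that tolerate missing predictors can be compared, the sketch below runs `method = "exact"` and `method = "bayes-lw"` on the same incomplete observations. It assumes a version of **bnlearn** that supports both methods on incomplete data; the data are taken from `learning.test` rather than simulated, and the names `exact` and `sampled` as well as the choice of `n = 5000` weighted samples are illustrative assumptions. Since `"bayes-lw"` is a simulation-based approximation, its predictions may occasionally differ from the exact ones.

```r
library(bnlearn)

dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")
bn = bn.fit(dag, learning.test)

# six observations, each with a different variable set to NA
# (the columns of learning.test are A, B, C, D, E, F, in this order).
incomplete.data = learning.test[1:6, ]
for (i in seq_along(names(incomplete.data)))
  incomplete.data[i, i] = NA

# exact inference uses the observed predictors as evidence.
exact = predict(bn, data = incomplete.data, node = "E", method = "exact")

# likelihood weighting approximates the same query by simulation,
# so the seed is fixed to make the run reproducible.
set.seed(42)
sampled = predict(bn, data = incomplete.data, node = "E",
                  method = "bayes-lw", n = 5000)

# side-by-side comparison of the two sets of predictions.
data.frame(exact, sampled)
```

Neither vector of predictions should contain `NA` here, because every observation still has at least some observed predictors and the fitted parameters are all well-defined.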

If `method = "parents"` (the default), the prediction for an observation will be `NA` if any of the parents of the target node is missing in that observation.

    > predict(bn, data = incomplete.data, node = "E", debug = TRUE)
    * predicting values for node E.
      > prediction for observation 1 is 'b' with probabilities:
      0.237595  0.506679  0.255725
      > prediction for observation 2 is NA because at least one parent is NA.
      > prediction for observation 3 is 'c' with probabilities:
      0.205882  0.179739  0.614379
      > prediction for observation 4 is 'b' with probabilities:
      0.237595  0.506679  0.255725
      > prediction for observation 5 is 'b' with probabilities:
      0.316794  0.366412  0.316794
      > prediction for observation 6 is NA because at least one parent is NA.
    [1] b    <NA> c    b    b    <NA>
    Levels: a b c
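The rule for `method = "parents"` can be checked before calling `predict()`. The following is a minimal sketch, assuming a version of **bnlearn** in which `predict()` accepts incomplete data as in the output above; the small data set and the names `usable` and `pred` are made up for illustration. It flags the observations in which all the parents of `E` are observed and compares that with the pattern of `NA`s in the predictions.

```r
library(bnlearn)

dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")
bn = bn.fit(dag, learning.test)

# a small data set with one missing parent of E (B in row 2)
# and one missing variable that is not a parent of E (A in row 3).
incomplete.data = learning.test[1:4, ]
incomplete.data[2, "B"] = NA
incomplete.data[3, "A"] = NA

# TRUE for the observations in which every parent of E is observed.
usable = complete.cases(incomplete.data[, parents(bn, "E"), drop = FALSE])

pred = predict(bn, data = incomplete.data, node = "E", method = "parents")

# the NA predictions should be exactly those with a missing parent.
all(is.na(pred) == !usable)
```

Only the missing value in `B` should produce an `NA` prediction: `A` is not a parent of `E`, so it plays no role when `method = "parents"`.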

If `method = "bayes-lw"` or `method = "exact"`, on the other hand, predictions will never be `NA` unless the model is singular or contains parameters that are themselves `NA`. In the worst case, when none of the predictors is observed, the prediction is made from the marginal distribution of the target variable.

    > incomplete.data[6, c("A", "B", "C", "D", "F")] = NA
    > predict(bn, data = incomplete.data[6, ], node = "E", method = "exact", debug = TRUE)
    * observation 1 , imputing E from:
    data frame with 0 columns and 1 row
      > considering event node E with markov blanket B F
        @ query partitioned into:
          event: E
          evidence: (empty)
      > imputed value:
     E
     a
    [1] a
    Levels: a b c
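To see what predicting from the marginal distribution means for this network, the marginal distribution of `E` can be worked out by hand from the fitted conditional probability tables. The sketch below is an illustration under stated assumptions, not a description of how **bnlearn** computes it internally: it relies on `B` and `F` being marginally independent in this particular DAG, and the names `pA`, `pB`, `pF`, `joint`, `pE`, `unobserved` and `pred` are made up for the example.

```r
library(bnlearn)

dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")
bn = bn.fit(dag, learning.test)

# marginal distributions of the root nodes A and F.
pA = as.vector(coef(bn$A))
pF = as.vector(coef(bn$F))

# P(B) = sum over A of P(B | A) P(A); the table for B has B on the rows.
pB = as.vector(unclass(coef(bn$B)) %*% pA)

# B and F are marginally independent in this DAG, so P(B, F) = P(B) P(F).
joint = outer(pB, pF)

# P(E) = sum over B and F of P(E | B, F) P(B, F); the dimensions of the
# table are put in the order E, B, F to match the layout of 'joint'.
cptE = aperm(coef(bn$E), c("E", "B", "F"))
pE = apply(cptE, 1, function(cpt) sum(cpt * joint))
pE

# an observation in which none of the predictors is observed.
unobserved = learning.test[1, ]
unobserved[1, c("A", "B", "C", "D", "F")] = NA
pred = predict(bn, data = unobserved, node = "E", method = "exact")

# the level with the largest marginal probability, to compare with 'pred'.
names(which.max(pE))
pred
```

If the prediction is indeed made from the marginal distribution, the level in `pred` should be the one with the largest probability in `pE`.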

Last updated on `Mon Aug 5 02:47:33 2024` with **bnlearn** `5.0` and `R version 4.4.1 (2024-06-14)`.