## Predicting from incomplete data

The behaviour of `predict()` when data are complete is described elsewhere. When data are incomplete, `predict()` will make the best use of the observed values of the predictors if `method = "bayes-lw"` or `method = "exact"`.

    > library(bnlearn)
    > dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")
    > bn = bn.fit(dag, learning.test)
    > # simulate six observations, then remove a different variable from each.
    > incomplete.data = rbn(bn, 6)
    > incomplete.data[1, "A"] = NA
    > incomplete.data[2, "B"] = NA
    > incomplete.data[3, "C"] = NA
    > incomplete.data[4, "D"] = NA
    > incomplete.data[5, "E"] = NA
    > incomplete.data[6, "F"] = NA
    > predict(bn, data = incomplete.data, node = "E", method = "exact", debug = TRUE)
    * observation 1 , imputing E from:
      B F
      c b
      > considering event node E with markov blanket B F
      @ query partitioned into:
        event: E
        evidence: B F
      > imputed value:
      E
      b
    * observation 2 , imputing E from:
      F
      a
      > considering event node E with markov blanket B F
      @ query partitioned into:
        event: E
        evidence: F
      > imputed value:
      E
      a
    * observation 3 , imputing E from:
      B F
      c a
      > considering event node E with markov blanket B F
      @ query partitioned into:
        event: E
        evidence: B F
      > imputed value:
      E
      c
    * observation 4 , imputing E from:
      B F
      a b
      > considering event node E with markov blanket B F
      @ query partitioned into:
        event: E
        evidence: B F
      > imputed value:
      E
      b
    * observation 5 , imputing E from:
      B F
      b a
      > considering event node E with markov blanket B F
      @ query partitioned into:
        event: E
        evidence: B F
      > imputed value:
      E
      c
    * observation 6 , imputing E from:
      B
      a
      > considering event node E with markov blanket B F
      @ query partitioned into:
        event: E
        evidence: B
      > imputed value:
      E
      a
    [1] b a c b c a
    Levels: a b c
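
The same prediction can be requested with `method = "bayes-lw"`, which approximates it by likelihood weighting instead of computing it exactly. The sketch below is a minimal illustration, not output from the session above: the seed and the value of `n` (the number of simulated samples averaged for each prediction) are arbitrary assumptions, and the predicted labels may change with either.

```r
library(bnlearn)

# same network and missingness pattern as in the example above.
dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")
bn = bn.fit(dag, learning.test)

set.seed(42)  # arbitrary seed, only to make the sketch reproducible.
incomplete.data = rbn(bn, 6)
for (i in 1:6)
  incomplete.data[i, LETTERS[i]] = NA  # row 1 loses A, row 2 loses B, ...

# approximate predictions by likelihood weighting; n = 1000 is an arbitrary choice.
pred.lw = predict(bn, data = incomplete.data, node = "E",
                  method = "bayes-lw", n = 1000)

# check whether any observation was left without a prediction.
anyNA(pred.lw)
```

Since the predictions are simulated, a larger `n` trades running time for stability across repeated calls.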

If `method = "parents"`, which is the default, the prediction for an incomplete observation will be `NA` if any of the parents of the target node is missing in that observation.

    > predict(bn, data = incomplete.data, node = "E", debug = TRUE)
    * predicting values for node E.
      > prediction for observation 1 is 'b' with probabilities:
      0.237595 0.506679 0.255725
      > prediction for observation 2 is NA because at least one parent is NA.
      > prediction for observation 3 is 'c' with probabilities:
      0.119374 0.114481 0.766145
      > prediction for observation 4 is 'b' with probabilities:
      0.400508 0.490262 0.109229
      > prediction for observation 5 is 'c' with probabilities:
      0.205882 0.179739 0.614379
      > prediction for observation 6 is NA because at least one parent is NA.
    [1] b    <NA> c    b    c    <NA>
    Levels: a b c
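
Which observations will be predicted as `NA` can be worked out in advance by checking whether the parents of the target node are observed. The sketch below assumes the same network and missingness pattern as above; `will.be.na` is a hypothetical helper variable introduced only for illustration, and the seed is arbitrary.

```r
library(bnlearn)

dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")
bn = bn.fit(dag, learning.test)

set.seed(42)  # arbitrary seed; the missingness pattern does not depend on it.
incomplete.data = rbn(bn, 6)
for (i in 1:6)
  incomplete.data[i, LETTERS[i]] = NA  # row 1 loses A, row 2 loses B, ...

# the parents of the target node: B and F in this network.
pa = parents(bn, "E")

# an observation cannot be predicted from the parents if any of them is missing.
will.be.na = !complete.cases(incomplete.data[, pa, drop = FALSE])
will.be.na

# compare with what predict() returns with the default method.
pred = predict(bn, data = incomplete.data, node = "E")
all(is.na(pred) == will.be.na)
```

Only the second and sixth observations are flagged, because they are the ones missing `B` and `F`; missing values in `A`, `C`, `D` or in `E` itself do not matter for this method.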

If `method = "bayes-lw"` or `method = "exact"`, on the other hand, predictions will never be `NA` unless the model is singular or contains parameters that are themselves `NA`. In the worst case, when none of the predictors is observed, the prediction will be made from the marginal distribution of the target variable.

    > incomplete.data[6, c("A", "B", "C", "D", "F")] = NA
    > predict(bn, data = incomplete.data[6, ], node = "E", method = "exact", debug = TRUE)
    * observation 1 , imputing E from:
    data frame with 0 columns and 1 row
      > considering event node E with markov blanket B F
      @ query partitioned into:
        event: E
        evidence: (empty)
      > imputed value:
      E
      a
    [1] a
    Levels: a b c
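
To make the fallback concrete, the marginal distribution of `E` can be computed by hand from the conditional probability tables stored in the fitted network. This is only a sketch of the arithmetic, not of how `predict()` works internally: it relies on the assumption, read off the DAG above, that `B` and `F` share no ancestors and are therefore marginally independent. With no evidence at all, the prediction should correspond to the most probable value of this distribution.

```r
library(bnlearn)

dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")
bn = bn.fit(dag, learning.test)

# marginal distributions of the root nodes A and F.
p.A = as.vector(coef(bn$A))
p.F = as.vector(coef(bn$F))

# P(B) = sum over A of P(B | A) P(A); the table of B has dimensions [B, A].
p.B = as.vector(unclass(coef(bn$B)) %*% p.A)

# B and F are independent in this DAG, so P(B, F) = P(B) P(F); the marginals
# are arranged in the order in which E stores its parents.
marginals = list(B = p.B, F = p.F)[bn$E$parents]
p.parents = outer(marginals[[1]], marginals[[2]])

# P(E) = sum over (B, F) of P(E | B, F) P(B, F).
p.E = apply(unclass(coef(bn$E)), 1, function(slice) sum(slice * p.parents))

p.E                    # marginal distribution of E.
names(which.max(p.E))  # its most probable value.
```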

Last updated on `Tue Sep 12 11:31:14 2023` with **bnlearn** `4.9` and `R version 4.2.2 (2022-10-31)`.