Predicting from incomplete data

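The behaviour of predict() when data are complete is described here. When data are incomplete and method = "bayes-lw" or method = "exact", predict() makes the best use it can of whichever predictors are observed in each observation. In the debugging output below, for instance, the evidence for observation 2 is A and F rather than B and F: B is missing, and its parent A is the only other observed node that still carries information about E.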

> library(bnlearn)
> dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")
> bn = bn.fit(dag, learning.test)
> incomplete.data = rbn(bn, 6)
> incomplete.data[1, "A"] = NA
> incomplete.data[2, "B"] = NA
> incomplete.data[3, "C"] = NA
> incomplete.data[4, "D"] = NA
> incomplete.data[5, "E"] = NA
> incomplete.data[6, "F"] = NA
> predict(bn, data = incomplete.data, node = "E", method = "exact", debug = TRUE)
* observation 1 , imputing E from: 
 B C D F
 c a c b
  > considering event node E with markov blanket B F 
  @ query partitioned into:
    event: E 
    evidence: B F 
  > imputed value: 
 E
 b
* observation 2 , imputing E from: 
 A C D F
 a a a b
  > considering event node E with markov blanket B F 
  @ query partitioned into:
    event: E 
    evidence: A F 
  > imputed value: 
 E
 b
* observation 3 , imputing E from: 
 A B D F
 b a b b
  > considering event node E with markov blanket B F 
  @ query partitioned into:
    event: E 
    evidence: B F 
  > imputed value: 
 E
 b
* observation 4 , imputing E from: 
 A B C F
 b a a b
  > considering event node E with markov blanket B F 
  @ query partitioned into:
    event: E 
    evidence: B F 
  > imputed value: 
 E
 b
* observation 5 , imputing E from: 
 A B C D F
 a a c c b
  > considering event node E with markov blanket B F 
  @ query partitioned into:
    event: E 
    evidence: B F 
  > imputed value: 
 E
 b
* observation 6 , imputing E from: 
 A B C D
 b c a b
  > considering event node E with markov blanket B F 
  @ query partitioned into:
    event: E 
    evidence: B 
  > imputed value: 
 E
 c
[1] b b b b b c
Levels: a b c
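
The example above only exercises method = "exact". As a minimal sketch, assuming bnlearn 5.0 or later, the same call can be repeated with method = "bayes-lw" to compare the two side by side. The seed and the names pred.exact and pred.lw are illustrative choices that do not appear in the original example; likelihood weighting is a simulation-based approximation, so its predictions may occasionally differ from the exact ones.

```r
library(bnlearn)

dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")
bn = bn.fit(dag, learning.test)

# fix the seed (an arbitrary choice) so that both the simulated data and the
# likelihood weighting samples are reproducible.
set.seed(1)
incomplete.data = rbn(bn, 6)
# remove a different variable from each observation, as in the example above.
for (i in 1:6)
  incomplete.data[i, LETTERS[i]] = NA

pred.exact = predict(bn, data = incomplete.data, node = "E", method = "exact")
pred.lw = predict(bn, data = incomplete.data, node = "E", method = "bayes-lw")

# neither method should return missing values for this network.
print(data.frame(exact = pred.exact, bayes.lw = pred.lw))
```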

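If method = "parents" (the default), the prediction for an incomplete observation will be NA if any of the parents of the target node is missing in that observation. Missing values in nodes other than the parents, including the target node itself, have no effect on the prediction.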

> predict(bn, data = incomplete.data, node = "E", debug = TRUE)
* predicting values for node E.
  > prediction for observation 1 is 'b' with probabilities:
    0.237595  0.506679  0.255725
  > prediction for observation 2 is NA because at least one parent is NA.
  > prediction for observation 3 is 'b' with probabilities:
    0.400508  0.490262  0.109229
  > prediction for observation 4 is 'b' with probabilities:
    0.400508  0.490262  0.109229
  > prediction for observation 5 is 'b' with probabilities:
    0.400508  0.490262  0.109229
  > prediction for observation 6 is NA because at least one parent is NA.
[1] b    <NA> b    b    b    <NA>
Levels: a b c
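
Which predictions will be NA can be worked out in advance from the data alone. The sketch below, which assumes bnlearn 5.0 or later, flags the observations with at least one missing parent of E and checks that they are exactly those for which predict() returns NA. The seed and the names pa and missing.parents are illustrative and are not part of the original example.

```r
library(bnlearn)

dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")
bn = bn.fit(dag, learning.test)

set.seed(42)
incomplete.data = rbn(bn, 6)
incomplete.data[2, "B"] = NA   # a parent of E.
incomplete.data[4, "D"] = NA   # not a parent of E.
incomplete.data[6, "F"] = NA   # a parent of E.

# flag the observations in which at least one parent of E is missing.
pa = parents(bn, "E")
missing.parents = !complete.cases(incomplete.data[, pa, drop = FALSE])

pred = predict(bn, data = incomplete.data, node = "E", method = "parents")
print(data.frame(missing.parents, prediction = pred))
```

Observations flagged in this way can then be passed to predict() again with method = "exact" or method = "bayes-lw" if a prediction is needed for every row.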

If method = "bayes-lw" or method = "exact", on the other hand, predictions will never be NA unless the model is singular or contains parameters that are themselves NA. In the worst case, the prediction will be made from the marginal distribution of the target variable.

> incomplete.data[6, c("A", "B", "C", "D", "F")] = NA
> predict(bn, data = incomplete.data[6, ], node = "E", method = "exact", debug = TRUE)
* observation 1 , imputing E from: 
data frame with 0 columns and 1 row
  > considering event node E with markov blanket B F 
  @ query partitioned into:
    event: E 
    evidence: (empty) 
  > imputed value: 
 E
 a
[1] a
Levels: a b c
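
To make the worst case concrete, the marginal distribution of E can be computed by hand from the fitted parameters and compared with the prediction. This is a sketch under stated assumptions: it relies on B and F being marginally independent in this particular DAG, it assumes that the prediction is the most probable value of that marginal distribution, and the names p.A, p.B, p.F, p.E and newdata are illustrative.

```r
library(bnlearn)

dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")
bn = bn.fit(dag, learning.test)

# marginal distributions of the parents of E: F is a root node, while B has
# to be marginalised over its own parent A.
p.A = as.numeric(coef(bn$A))
p.F = as.numeric(coef(bn$F))
cpt.B = aperm(coef(bn$B), c("B", "A"))
p.B = as.numeric(unclass(cpt.B) %*% p.A)

# P(E) = sum over B and F of P(E | B, F) P(B) P(F).
cpt.E = aperm(coef(bn$E), c("E", "B", "F"))
p.E = apply(cpt.E, 1, function(slice) sum(slice * outer(p.B, p.F)))

# a single observation in which every predictor is missing.
newdata = rbn(bn, 1)
newdata[1, c("A", "B", "C", "D", "F")] = NA
pred = predict(bn, data = newdata, node = "E", method = "exact")

print(round(p.E, 4))
print(pred)
```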

Last updated on Sat Feb 17 23:38:41 2024 with bnlearn 5.0-20240208 and R version 4.3.2 (2023-10-31).