## Bootstrap and cross-validation from data with missing values

Resampling does not require any modifications to handle incomplete data: missing values are carried over together
with observed values when the data points are resampled. This is how `bn.boot()`, `boot.strength()` and `bn.cv()`
handle incomplete data.
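As a minimal sketch of what "carried over" means, the base R snippet below resamples the rows of a small data frame with replacement. The data frame `toy` and the variable names are hypothetical, and this only illustrates the idea; it is not the code that **bnlearn** runs internally.

```r
set.seed(42)

# Hypothetical toy data set with missing values in both columns.
toy <- data.frame(
  A = factor(c("a", "b", NA, "a", "b", "a")),
  B = c(1.2, NA, 0.7, 3.1, 2.4, NA)
)

# Draw row indices with replacement, as a nonparametric bootstrap does.
rows <- sample(nrow(toy), replace = TRUE)
boot.sample <- toy[rows, ]

# Each resampled row is a copy of an original row, missing values included.
print(boot.sample)
sum(is.na(boot.sample))
```

Whole rows are copied, so a missing value stays attached to the observed values it appeared with, and no imputation or filtering happens at the resampling step.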

`bn.boot()`, `boot.strength()` and `bn.cv()` no longer check that the data are complete. The structure learning,
parameter learning and inference methods that they call on the resampled data must either be able to handle missing
values or generate errors.
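The sketch below shows one way this might look in practice. It makes several assumptions: the installed **bnlearn** is recent enough to skip the completeness check and to provide the penalized node-average likelihood score (`"pnal"`), the missing values are injected artificially into `learning.test`, and the number of replicates is kept small only to make the example quick.

```r
library(bnlearn)

set.seed(1)

# learning.test ships with bnlearn and contains six discrete variables.
incomplete <- learning.test

# Assumption for illustration: blank out 5% of the values in each column,
# completely at random.
for (column in names(incomplete)) {
  missing <- sample(nrow(incomplete), size = 0.05 * nrow(incomplete))
  incomplete[missing, column] <- NA
}

# hc() with the "pnal" score can score networks from incomplete data, so each
# bootstrap replicate can be learned despite the missing values.
strength <- boot.strength(incomplete, R = 10, algorithm = "hc",
                          algorithm.args = list(score = "pnal"))

head(strength)
```

Swapping in a score that requires complete data should make the structure learning call fail on the replicates, which is the "generate errors" case described above.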

The loss functions called by `bn.cv()` have been modified to propagate missing values correctly:

- The log-likelihood loss function (`"logl"`) is based on the node-average log-likelihood.
- The predictive accuracy (`"pred"`), F1 score (`"f1"`), predictive correlation (`"cor"`) and predictive mean square error (`"mse"`) use whatever complete (observed, predicted) pairs are available. The ability to produce non-`NA` predictions depends on the prediction method specified in the `loss.args` argument.
- The area under the ROC curve (`"auroc"`) will use whatever complete (observed, probability) pairs are available. Note that if the predictors are not complete, exact and posterior predictions will call `impute()` to produce values for both the target and the missing predictors. As a result, it will be impossible to compute the `"auroc"` loss because `impute()` does not produce prediction probabilities just for the target node. However, predicting from parents will produce `NA` prediction probabilities only for observations without complete parents.
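To make "whatever complete (observed, predicted) pairs are available" concrete, here is a small base R sketch that computes an accuracy and a mean square error after dropping every pair in which either value is `NA`. All vectors are made-up illustrations, and the code mirrors the idea only; it is not taken from the **bnlearn** sources.

```r
# Hypothetical observed and predicted labels for a discrete target.
observed  <- factor(c("a", "b", NA, "a", "b", "a"), levels = c("a", "b"))
predicted <- factor(c("a", "a", "b", NA, "b", "a"), levels = c("a", "b"))

# A pair is complete only if both the observed and the predicted value exist.
complete <- !is.na(observed) & !is.na(predicted)
accuracy <- mean(observed[complete] == predicted[complete])

# Hypothetical observed and predicted values for a continuous target.
observed.num  <- c(1.0, 2.0, NA, 4.0)
predicted.num <- c(1.5, NA, 3.0, 3.0)

complete.num <- !is.na(observed.num) & !is.na(predicted.num)
mse <- mean((observed.num[complete.num] - predicted.num[complete.num])^2)

accuracy  # 3 of the 4 complete pairs agree: 0.75
mse       # mean of 0.25 and 1: 0.625
```

The losses are averaged over the complete pairs only, so a prediction method that returns many `NA`s shrinks the effective sample behind each loss estimate.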

Last updated on `Mon Aug 5 03:09:14 2024` with **bnlearn** `5.0` and `R version 4.4.1 (2024-06-14)`.