Bootstrap and cross-validation from data with missing values
Resampling does not require any modifications to handle incomplete data: missing values are carried over together
with observed values when the data points are resampled. This is how bn.boot(), boot.strength() and bn.cv() handle
incomplete data.
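To make the resampling step concrete, here is a minimal sketch in base R. It is not bnlearn's internal code: the toy data frame `incomplete` and the variable names are made up for illustration. It only shows that drawing rows with replacement keeps each row's missingness pattern intact.

```r
# Toy data frame with missing values (hypothetical data, for illustration only).
set.seed(42)
incomplete <- data.frame(
  A = factor(c("a", "b", NA, "a", "b", "a")),
  B = c(1.2, NA, 0.7, 3.1, 2.4, NA)
)

# One bootstrap sample: draw row indices with replacement and subset the rows.
# Nothing filters on completeness, so each NA travels with its row.
idx <- sample(nrow(incomplete), replace = TRUE)
resampled <- incomplete[idx, , drop = FALSE]

# Each resampled row has the same missingness pattern as its source row.
same.pattern <- all(is.na(resampled) == is.na(incomplete)[idx, , drop = FALSE])
print(same.pattern)
```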
bn.boot(), boot.strength() and bn.cv() no longer check that the data are complete. The structure learning,
parameter learning and inference methods that they call on the resampled data must either be able to handle
missing values or generate errors.
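As a hedged illustration, the sketch below runs boot.strength() on incomplete data with a structure learning setup that should accept missing values. It assumes that hc() with the penalized node-average likelihood score ("pnal") is available in the installed bnlearn version; the way the missing values are injected into learning.test and the choice of R = 20 are arbitrary and only for illustration.

```r
library(bnlearn)

# Inject missing values at random into a complete data set shipped with bnlearn.
set.seed(1)
incomplete <- learning.test
for (col in names(incomplete))
  incomplete[sample(nrow(incomplete), 250), col] <- NA

# Assumption: hc() with score = "pnal" can learn from incomplete data, so the
# bootstrap runs on the resampled (still incomplete) data sets.
arcs <- boot.strength(incomplete, R = 20, algorithm = "hc",
          algorithm.args = list(score = "pnal"))
head(arcs)
```

Passing an algorithm or score that cannot work with missing values would instead surface that method's own error, since the resampling functions themselves do not check for completeness.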
The loss functions called by bn.cv() have been modified to propagate missing values correctly:
- The log-likelihood loss function ("logl") is based on the node-average log-likelihood.
- The predictive accuracy ("pred"), F1 score ("f1"), predictive correlation ("cor") and predictive mean square error ("mse") use whatever complete (observed, predicted) pairs are available. Whether non-NA predictions can be produced depends on the prediction method specified in the loss.args argument.
- The area under the ROC curve ("auroc") uses whatever complete (observed, probability) pairs are available. Note that if the predictors are not complete, exact and posterior predictions will call impute() to produce values for both the target and the missing predictors. As a result, it will be impossible to compute the "auroc" loss because impute() does not produce prediction probabilities just for the target node. However, predicting from parents will produce NA prediction probabilities only for observations without complete parents.
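The "complete pairs" rule used by the predictive losses can be sketched in base R as follows. This is an illustration of the idea under made-up inputs, not the code bnlearn runs: the vectors `observed`, `predicted`, `obs.class` and `pred.class` are hypothetical, and the classification error is shown only as one example of a loss computed this way.

```r
# Hypothetical observed values and predictions for one test fold; NA marks a
# missing observation or a prediction that could not be produced.
observed  <- c(1.0, 2.5, NA, 4.0, 3.0)
predicted <- c(1.5, NA, 2.0, 3.0, 3.0)

# Keep only the pairs in which both the observed and the predicted value exist.
ok <- complete.cases(observed, predicted)

# Mean square error over the complete pairs only.
mse <- mean((observed[ok] - predicted[ok])^2)

# The same filtering applied to class labels, here for a classification error.
obs.class  <- factor(c("a", "b", NA, "a", "b"))
pred.class <- factor(c("a", "a", "b", NA, "b"), levels = levels(obs.class))
ok.class <- complete.cases(obs.class, pred.class)
err <- mean(obs.class[ok.class] != pred.class[ok.class])

print(c(pairs.used = sum(ok), mse = mse, class.error = err))
```

If no complete pair is available in a fold, the mean is taken over an empty vector and the loss is not a number, which is one reason the choice of prediction method matters.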
Last updated on Mon Aug 5 03:09:14 2024 with bnlearn 5.0 and R version 4.4.1 (2024-06-14).