Bootstrap and cross-validation from data with missing values
Resampling does not require any modifications to handle incomplete data: missing values are carried over together
with the observed values when the data points are resampled. This is how bn.boot(), boot.strength() and bn.cv()
handle incomplete data.
bn.boot(), boot.strength() and bn.cv() no longer check that the data are complete. The structure learning,
parameter learning and inference methods that they call on the resampled data must either be able to handle
missing values or generate errors.
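To make the resampling behaviour concrete, here is a minimal sketch in base R. It is not the code used inside bn.boot(), and the toy data frame and all variable names are made up for illustration. It only shows that drawing whole rows with replacement keeps each missing value attached to its observation.

```r
# Toy data set with missing values (hypothetical, for illustration only).
set.seed(42)
toy <- data.frame(
  A = c(1.2, NA, 0.7, 2.1, 1.5, NA),
  B = c("x", "y", NA, "x", "y", "x")
)

# One nonparametric bootstrap sample: draw row indices with replacement.
# Rows are copied whole, so every NA travels with its observation.
idx <- sample(nrow(toy), size = nrow(toy), replace = TRUE)
resampled <- toy[idx, , drop = FALSE]

# Number of missing values in each original row and in each resampled row.
na_original <- rowSums(is.na(toy))
na_resampled <- rowSums(is.na(resampled))

# Each resampled row has the same number of NAs as the row it was drawn from.
same_pattern <- all(unname(na_resampled) == unname(na_original[idx]))
```

Whatever is then run on the resampled data has to cope with the NA values itself, which is the point made above.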
The loss functions called by bn.cv() have been modified to propagate missing values correctly:
- For log-likelihood loss functions ("logl", "logl-g", "logl-cg"): the node and all its parents must be observed for the node's log-likelihood to be well-defined. If any of them is NA, the log-likelihood is NA and is not used to compute the loss function. The average log-likelihood for each node, which is an empirical estimate of the node's entropy, is computed from locally-complete observations whose log-likelihood is not NA.
- For all other losses: we use the same code as predict() to produce the value of the target node. If the prediction is NA, it is not used to compute the loss function.
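
The two rules above can be sketched in base R as follows. This is an illustration under stated assumptions, not the code used by bn.cv(): the validation data, the parameters of the Gaussian node and the predicted labels are all made up, and the averaged log-likelihood is negated here on the assumption that a lower loss should be better.

```r
# Rule 1: log-likelihood loss for a Gaussian node Y with a single parent X.
# Hypothetical validation data; rows 2 and 3 are locally incomplete.
test <- data.frame(
  X = c(0.0, 1.0, NA, 2.0, 3.0),
  Y = c(0.1, NA, 1.9, 4.2, 5.8)
)

# Assumed parameters of Y | X (illustrative values, not estimated here).
intercept <- 0
slope <- 2
sigma <- 1

# Per-observation log-likelihood: dnorm() returns NA when Y or X is NA.
ll <- dnorm(test$Y, mean = intercept + slope * test$X, sd = sigma, log = TRUE)

# Average over the locally-complete observations only, then negate.
used <- !is.na(ll)
logl_loss <- -mean(ll[used])

# Rule 2: classification error that skips missing predictions.
# Hypothetical predicted and observed labels for a discrete target node.
predicted <- c("a", NA, "b", "a")
observed <- c("a", "b", "a", "a")

keep <- !is.na(predicted)
class_err <- mean(predicted[keep] != observed[keep])
```

In this toy example three of the five observations contribute to the log-likelihood loss, and three of the four predictions contribute to the classification error.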