Bayesian networks and cross-validation
Choosing a Bayesian network learning strategy
Cross-validation is a standard way to obtain unbiased estimates of a model's goodness of fit. By comparing such estimates for different learning strategies (different combinations of learning algorithms, fitting techniques and the respective parameters) we can choose the optimal one for the data at hand in a principled way.
bnlearn implements three cross-validation methods in the bn.cv()
function (documented here):
- k-fold cross-validation (the default): the data are randomly partitioned into k subsets. Each subset is used in turn to validate the model fitted on the remaining k - 1 subsets.
- Custom folds cross-validation: the data are manually partitioned into subsets by the user into subsets, which are then used as in k-fold cross-validation. Subsets are not constrained to have the same size.
- Hold-out cross-validation: the data are repeatedly (randomly) partitioned in a training and test subset of given size m and n - m, with observations assigned randomly to each subset at each repetition. Each test subset is used to validate the model fitted on the corresponding training subset.
bn.cv()
requires the following parameters for all methods:
method
: the label of the cross-validation method, either"k-fold"
(the default),"custom-fold"
or"hold-out"
;data
: the data set the Bayesian network will be estimated from;bn
: the structure learning algorithm to be used, or a fixed network structure (to cross-validate just parameter learning);loss:
: the loss function used to evaluate the goodness of fit of the learned network;fit
: the fitting method (such asmle
orbayes
) to be used to fit the parameters.
Additional parameters for all these components can be specified as well, providing a fine-grained
control of the learning strategy, using the following arguments of bn.cv()
:
algorithm.args
for additional arguments to be passed to the learning algorithm;loss.args
for additional arguments to be passed to the loss function;fit.args
for additional arguments to be passed tobn.fit()
for fitting the parameters of the Bayesian network.
k-fold cross-validation
For example, let's compare the performance of three different scores in a hill climbing search
on the ALARM data set included in
bnlearn. The loss funtion we use is the so-called log-likelihood loss,
which is the negated expected log-likelihood; so lower values are better. Hill cimbing is implemented
in the hc()
function.
> library(bnlearn) > data(alarm) > bn.cv(alarm, bn = "hc", algorithm.args = list(score = "bde", iss = 1))
k-fold cross-validation for Bayesian networks target learning algorithm: Hill-Climbing number of folds: 10 loss function: Log-Likelihood Loss (disc.) expected loss: 10.82857
In this case the bn
argument specifies the label of a structure learning algorithm,
and algorithm.args
is an optional list of arguments for the algorithm specified by
bn
.
> bn.cv(alarm, bn = "hc", algorithm.args = list(score = "bde", iss = 10))
k-fold cross-validation for Bayesian networks target learning algorithm: Hill-Climbing number of folds: 10 loss function: Log-Likelihood Loss (disc.) expected loss: 10.82172
> bn.cv(alarm, bn = "hc", algorithm.args = list(score = "bic"))
k-fold cross-validation for Bayesian networks target learning algorithm: Hill-Climbing number of folds: 10 loss function: Log-Likelihood Loss (disc.) expected loss: 10.86586
Partially directed graphs obtained from learning are automatically extended to directed acyclic
graphs with cextend
, so that the parameters can be fit and the loss function computed.
> bn.cv(alarm, bn = "iamb")
k-fold cross-validation for Bayesian networks target learning algorithm: IAMB number of folds: 10 loss function: Log-Likelihood Loss (disc.) expected loss: 13.70798
Tipically, 10-fold cross-validation is performed 10 times for each learning strategy (golden
standard originally introduced by Hastie, Tibshirani and Friedman in «The Elements of Statistical
Learning», link),
and the resulting sets of loss values compared by plotting them as boxplots. Below we do so that
for the BIC and BDe scores using hc()
.
> cv.bic = bn.cv(alarm, bn = "hc", runs = 10, + algorithm.args = list(score = "bic")) > cv.bde = bn.cv(alarm, bn = "hc", runs = 10, + algorithm.args = list(score = "bde", iss = 1)) > plot(cv.bic, cv.bde, xlab = c("BIC", "BDe"))
Note that, while it is customary to use k = 10
, a different number of folds can be
specified with the k
argument.
> bn.cv(alarm, bn = "hc", k = 5)
k-fold cross-validation for Bayesian networks target learning algorithm: Hill-Climbing number of folds: 5 loss function: Log-Likelihood Loss (disc.) expected loss: 10.87307
Custom folds in cross-validation
The custom-folds
method works in the same way as k-fold cross-validation,
but folds are specified manually with a folds
argument.
> folds = list(1:5000, 5001:15000, 15001:20000) > cv.custom = bn.cv(alarm, bn = "hc", algorithm.args = list(score = "bic"), + method = "custom-folds", folds = folds) > cv.custom
custom-folds cross-validation for Bayesian networks target learning algorithm: Hill-Climbing loss function: Log-Likelihood Loss (disc.) expected loss: 10.85651
The two methods are otherwise identical, as shown below: the R
objects returned by
bn.cv()
are identical (with the exception of the method's label) if the same folds are
used.
> cv.random = bn.cv(alarm, bn = "hc", method = "k-fold") > folds = lapply(cv.random, `[[`, "test") > cv.custom = bn.cv(alarm, bn = "hc", method = "custom-folds", + folds = folds) > all.equal(cv.random, cv.custom, check.attributes = FALSE)
[1] TRUE
Note that:
- Each observation can only be included in a single fold.
> folds = list(1:5000, 4000:10000, 10001:20000) > bn.cv(alarm, bn = "hc", method = "custom-folds", folds = folds)
## Error: some observations are included in more than one fold.
- Each fold must contain at least one observation.
> folds = list(numeric(0), 1:2000, 2001:20000) > bn.cv(alarm, bn = "hc", method = "custom-folds", folds = folds)
## Error: some folds contain no observations.
- All observations must be assigned to the folds.
> folds = list(1:1000, 2000:20000) > bn.cv(alarm, bn = "hc", method = "custom-folds", folds = folds)
## Error: not all observations are assigned to a fold.
runs
is not a valid argument for this cross-validation method, because the ability of performing multiple runs relies on splitting the data randomly into folds.
To perform multiple runs, each set of manually specified folds (one for each run
)
should be saved in a list.
> folds = list( + list(1:5000, 5001:10000, 10001:20000), + list(1:10000, 10001:15000, 15001:20000), + list(1:5000, 5001:15000, 15001:20000) + ) > bn.cv(alarm, bn = "hc", method = "custom-folds", folds = folds)
custom-folds cross-validation for Bayesian networks target learning algorithm: Hill-Climbing loss function: Log-Likelihood Loss (disc.) number of runs: 3 average loss over the runs: 10.86871 standard deviation of the loss: 0.01063191
The following function can be used to generate the folds randomly,
> random.folds = function(data, k) split(sample(nrow(data)), seq(k))
for a single run or cross-validation:
> folds = random.folds(data = alarm, k = 10) > bn.cv(alarm, bn = "hc", method = "custom-folds", folds = folds)
custom-folds cross-validation for Bayesian networks target learning algorithm: Hill-Climbing loss function: Log-Likelihood Loss (disc.) expected loss: 10.86904
or for multiple runs.
> folds = replicate(10, random.folds(data = alarm, k = 10), simplify = FALSE) > bn.cv(alarm, bn = "hc", method = "custom-folds", folds = folds)
custom-folds cross-validation for Bayesian networks target learning algorithm: Hill-Climbing loss function: Log-Likelihood Loss (disc.) number of runs: 10 average loss over the runs: 10.86851 standard deviation of the loss: 0.002829784
Hold-out cross-validation
The syntax for hold-out cross-validation is very similar to that for k-fold cross-validation, but:
method
must be set to "hold-out
";k
denotes the number of times the sample will be split into training and test subsamples (the default is 10);- and optionally
m
denotes the number of observations to be sampled for the test subsample (the default is 1/10th of the sample size).
So, for instance:
> bn.cv(alarm, bn = "hc", method = "hold-out", k = 25, m = 200, + algorithm.args = list(score = "bde", iss = 10))
hold-out cross-validation for Bayesian networks target learning algorithm: Hill-Climbing number of splits: 25 size of the test subset: 200 loss function: Log-Likelihood Loss (disc.) expected loss: 10.7858
Comparing different network structures
bn.cv()
can also be used to compare the goodness of fit of different network
structures suggested by expert knowledge; using all the data to fit the parameters of those networks
and to evaluate their goodness of fit produces optimistically biased results.
Let's start with the most obvious comparison: the true network structure against the empty one.
> res = empty.graph(names(alarm)) > modelstring(res) = paste("[HIST|LVF][CVP|LVV][PCWP|LVV][HYP][LVV|HYP:LVF]", + "[LVF][STKV|HYP:LVF][ERLO][HRBP|ERLO:HR][HREK|ERCA:HR][ERCA]", + "[HRSA|ERCA:HR][ANES][APL][TPR|APL][ECO2|ACO2:VLNG][KINK]", + "[MINV|INT:VLNG][FIO2][PVS|FIO2:VALV][SAO2|PVS:SHNT][PAP|PMB][PMB]", + "[SHNT|INT:PMB][INT][PRSS|INT:KINK:VTUB][DISC][MVS][VMCH|MVS]", + "[VTUB|DISC:VMCH][VLNG|INT:KINK:VTUB][VALV|INT:VLNG][ACO2|VALV]", + "[CCHL|ACO2:ANES:SAO2:TPR][HR|CCHL][CO|HR:STKV][BP|CO:TPR]", sep = "") > bn.cv(alarm, res, algorithm.args = list(score = "aic"))
k-fold cross-validation for Bayesian networks target network structure: [FIO2][MVS][HYP][LVF][APL][ANES][PMB][INT][KINK][DISC][ERLO][ERCA][HIST|LVF] [TPR|APL][PAP|PMB][LVV|HYP:LVF][STKV|HYP:LVF][SHNT|PMB:INT][VMCH|MVS] [CVP|LVV][PCWP|LVV][VTUB|DISC:VMCH][PRSS|INT:KINK:VTUB][VLNG|INT:KINK:VTUB] [MINV|INT:VLNG][VALV|INT:VLNG][PVS|FIO2:VALV][ACO2|VALV][SAO2|SHNT:PVS] [ECO2|ACO2:VLNG][CCHL|TPR:SAO2:ANES:ACO2][HR|CCHL][CO|STKV:HR][HRBP|ERLO:HR] [HREK|HR:ERCA][HRSA|HR:ERCA][BP|TPR:CO] number of folds: 10 loss function: Log-Likelihood Loss (disc.) expected loss: 10.81904
> bn.cv(alarm, empty.graph(names(alarm)))
k-fold cross-validation for Bayesian networks target network structure: [CVP][PCWP][HIST][TPR][BP][CO][HRBP][HREK][HRSA][PAP][SAO2][FIO2][PRSS][ECO2] [MINV][MVS][HYP][LVF][APL][ANES][PMB][INT][KINK][DISC][LVV][STKV][CCHL][ERLO] [HR][ERCA][SHNT][PVS][ACO2][VALV][VLNG][VTUB][VMCH] number of folds: 10 loss function: Log-Likelihood Loss (disc.) expected loss: 21.46833
As expected, the first (true) network is a better fit than the second (empty network). The same is true for the following structure, which generated at random and has no relationship with the data.
> rand = random.graph(names(alarm)) > rand
Random/Generated Bayesian network model: [CVP][PCWP][TPR][BP][CO][HRBP][HREK][PAP][PRSS][ECO2][MINV][LVF][APL][ANES] [KINK][HR][ERCA][HIST|PCWP][SAO2|HREK][PMB|PCWP][INT|PAP][LVV|BP:ECO2:KINK] [PVS|CO:HREK:LVF][VALV|ANES][VLNG|HRBP][HRSA|PCWP:HIST:TPR][FIO2|HIST:PAP] [DISC|HIST:APL][ERLO|SAO2:PMB][VTUB|CVP:KINK:LVV][MVS|TPR:FIO2] [STKV|PRSS:KINK:DISC][CCHL|DISC][SHNT|HRSA][VMCH|PCWP:HRSA][HYP|FIO2:MVS] [ACO2|HYP] nodes: 37 arcs: 36 undirected arcs: 0 directed arcs: 36 average markov blanket size: 2.86 average neighbourhood size: 1.95 average branching factor: 0.97 generation algorithm: Full Ordering arc sampling probability: 0.05555556
> bn.cv(alarm, bn = rand)
k-fold cross-validation for Bayesian networks target network structure: [CVP][PCWP][TPR][BP][CO][HRBP][HREK][PAP][PRSS][ECO2][MINV][LVF][APL][ANES] [KINK][HR][ERCA][HIST|PCWP][SAO2|HREK][PMB|PCWP][INT|PAP][LVV|BP:ECO2:KINK] [PVS|CO:HREK:LVF][VALV|ANES][VLNG|HRBP][HRSA|PCWP:HIST:TPR][FIO2|HIST:PAP] [DISC|HIST:APL][ERLO|SAO2:PMB][VTUB|CVP:KINK:LVV][MVS|TPR:FIO2] [STKV|PRSS:KINK:DISC][CCHL|DISC][SHNT|HRSA][VMCH|PCWP:HRSA][HYP|FIO2:MVS] [ACO2|HYP] number of folds: 10 loss function: Log-Likelihood Loss (disc.) expected loss: 21.24115
Cross-validation and predictive error
Cross-validation is also commonly used in classification problems to compute the
prediction error of a particular model. We can do that with bn.cv()
by
specifying
loss = "pred"
to make frequentist predictions withpredict(..., method = "parents")
; orloss = "pred-lw"
to make Bayesian predictions withpredict(..., method = "bayes-lw")
;
and passing the label of the target node to the loss function via the loss.args
argument. The first predicts the target node from its parents; the second from all the available
nodes or from a subset of nodes specified in loss.args
list as element called
from
.
> xval = bn.cv(alarm, bn = "hc", loss = "pred", loss.args = list(target = "STKV")) > xval
k-fold cross-validation for Bayesian networks target learning algorithm: Hill-Climbing number of folds: 10 loss function: Classification Error training node: STKV expected loss: 0.1745
We may also want to compute other statistics from that run of cross-validation, like sensitivity
and specificity, or just to tabulate observed and predicted values. To that end, we can extract the
observed and predicted values for each fold from xval
as follows.
> OBS = unlist(lapply(xval, `[[`, "observed")) > PRED = unlist(lapply(xval, `[[`, "predicted")) > str(OBS)
Factor w/ 3 levels "HIGH","LOW","NORMAL": 3 2 3 3 3 3 2 3 3 3 ...
> str(PRED)
Factor w/ 3 levels "HIGH","LOW","NORMAL": 3 3 3 3 3 3 3 3 3 3 ...
> table(OBS, PRED)
PRED OBS HIGH LOW NORMAL HIGH 0 7 826 LOW 0 950 2624 NORMAL 0 33 15560
Cross-validation and predictive correlation
Predictive correlation and mean square error are used to evaluate Gaussian Bayesian networks in
the same way as prediction error is used to evaluate discrete Bayesian networks. bn.cv()
implements:
loss = "cor"
to compute the correlation between observed values and frequentist predictions frompredict(..., method = "parents")
;loss = "cor-lw"
to compute the correlation between observed values and frequentist predictions frompredict(..., method = "bayes-lw")
;loss = "mse"
to compute the correlation between observed values and frequentist predictions frompredict(..., method = "parents")
;loss = "mse-lw"
to compute the correlation between observed values and frequentist predictions frompredict(..., method = "bayes-lw")
.
> bn.cv(gaussian.test, bn = "hc", loss = "cor", loss.args = list(target = "C"))
k-fold cross-validation for Bayesian networks target learning algorithm: Hill-Climbing number of folds: 10 loss function: Predictive Correlation training node: C expected loss: 0.9967885
> bn.cv(gaussian.test, bn = "hc", loss = "mse-lw", loss.args = list(target = "C"))
k-fold cross-validation for Bayesian networks target learning algorithm: Hill-Climbing number of folds: 10 loss function: Mean Squared Error (Posterior, Gauss.) training node: C expected loss: 0.2592348
Thu Nov 24 19:16:36 2022
with bnlearn
4.9-20221107
and R version 4.2.2 (2022-10-31)
.