## Bayesian networks and cross-validation

### Choosing a Bayesian network learning strategy

Cross-validation is a standard way to obtain unbiased estimates of a model's goodness of fit. By comparing such estimates for different learning strategies (different combinations of learning algorithms, fitting techniques and the respective parameters) we can choose the optimal one for the data at hand in a principled way.

**bnlearn** implements three cross-validation methods in the `bn.cv()`

function (documented
here):

*k-fold cross-validation*(the default): the data are randomly partitioned into*k*subsets. Each subset is used in turn to validate the model fitted on the remaining*k - 1*subsets.*Custom folds cross-validation*: the data are manually partitioned into subsets by the user into subsets, which are then used as in*k*-fold cross-validation. Subsets are not constrained to have the same size.*Hold-out cross-validation*: the data are repeatedly (randomly) partitioned in a training and test subset of given size*m*and*n - m*, with observations assigned randomly to each subset at each repetition. Each test subset is used to validate the model fitted on the corresponding training subset.

`bn.cv()`

requires the following parameters for all methods:

`method`

: the label of the cross-validation method, either`"k-fold"`

(the default),`"custom-fold"`

or`"hold-out"`

;`data`

: the data set the Bayesian network will be estimated from;`bn`

: the structure learning algorithm to be used, or a fixed network structure (to cross-validate just parameter learning);`loss:`

: the loss function used to evaluate the goodness of fit of the learned network;`fit`

: the fitting method (such as`mle`

or`bayes`

) to be used to fit the parameters.

Additional parameters for all these components can be specified as well, providing a fine-grained control of the
learning strategy, using the following arguments of `bn.cv()`

:

`algorithm.args`

for additional arguments to be passed to the learning algorithm;`loss.args`

for additional arguments to be passed to the loss function;`fit.args`

for additional arguments to be passed to`bn.fit()`

for fitting the parameters of the Bayesian network.

*k*-fold cross-validation

For example, let's compare the performance of three different scores in a hill climbing search on the
ALARM data set included in **bnlearn**. The loss
funtion we use is the so-called *log-likelihood loss*, which is the negated expected log-likelihood; so lower
values are better. Hill cimbing is implemented in the `hc()`

function.

> library(bnlearn) > data(alarm) > bn.cv(alarm, bn = "hc", algorithm.args = list(score = "bde", iss = 1), fit = "bayes")

k-fold cross-validation for Bayesian networks target learning algorithm: Hill-Climbing number of folds: 10 loss function: Log-Likelihood Loss expected loss: 10.86484

In this case the `bn`

argument specifies the label of a structure learning algorithm, and
`algorithm.args`

is an optional list of arguments for the algorithm specified by `bn`

.

> bn.cv(alarm, bn = "hc", algorithm.args = list(score = "bde", iss = 10), fit = "bayes")

k-fold cross-validation for Bayesian networks target learning algorithm: Hill-Climbing number of folds: 10 loss function: Log-Likelihood Loss expected loss: 10.84377

> bn.cv(alarm, bn = "hc", algorithm.args = list(score = "bic"), fit = "bayes")

k-fold cross-validation for Bayesian networks target learning algorithm: Hill-Climbing number of folds: 10 loss function: Log-Likelihood Loss expected loss: 10.90714

Partially directed graphs obtained from learning are automatically extended to directed acyclic graphs with
`cextend`

, so that the parameters can be fit and the loss function computed.

> bn.cv(alarm, bn = "iamb", fit = "bayes")

k-fold cross-validation for Bayesian networks target learning algorithm: IAMB number of folds: 10 loss function: Log-Likelihood Loss expected loss: 13.70839

Tipically, 10-fold cross-validation is performed 10 times for each learning strategy (golden standard originally
introduced by Hastie, Tibshirani and Friedman in «The Elements of Statistical Learning»,
link), and the
resulting sets of loss values compared by plotting them as boxplots. Below we do so that for the BIC and BDe scores
using `hc()`

.

> cv.bic = bn.cv(alarm, bn = "hc", runs = 10, + algorithm.args = list(score = "bic"), fit = "bayes") > cv.bde = bn.cv(alarm, bn = "hc", runs = 10, + algorithm.args = list(score = "bde", iss = 1), fit = "bayes") > plot(cv.bic, cv.bde, xlab = c("BIC", "BDe"))

Note that, while it is customary to use `k = 10`

, a different number of folds can be specified with the
`k`

argument.

> bn.cv(alarm, bn = "hc", k = 5, fit = "bayes")

k-fold cross-validation for Bayesian networks target learning algorithm: Hill-Climbing number of folds: 5 loss function: Log-Likelihood Loss expected loss: 10.90896

#### Custom folds in cross-validation

The `custom-folds`

method works in the same way as *k*-fold cross-validation, but folds are
specified manually with a `folds`

argument.

> folds = list(1:5000, 5001:15000, 15001:20000) > cv.custom = bn.cv(alarm, bn = "hc", algorithm.args = list(score = "bic"), + method = "custom-folds", folds = folds, fit = "bayes") > cv.custom

custom-folds cross-validation for Bayesian networks target learning algorithm: Hill-Climbing loss function: Log-Likelihood Loss expected loss: 10.90074

The two methods are otherwise identical, as shown below: the `R`

objects returned by `bn.cv()`

are identical (with the exception of the method's label) if the same folds are used.

> cv.random = bn.cv(alarm, bn = "hc", method = "k-fold", fit = "bayes") > folds = lapply(cv.random, `[[`, "test") > cv.custom = bn.cv(alarm, bn = "hc", method = "custom-folds", + folds = folds, fit = "bayes") > all.equal(cv.random, cv.custom, check.attributes = FALSE)

[1] TRUE

Note that:

- Each observation can only be included in a single fold.
> folds = list(1:5000, 4000:10000, 10001:20000) > bn.cv(alarm, bn = "hc", method = "custom-folds", folds = folds)

## Error: some observations are included in more than one fold.

- Each fold must contain at least one observation.
> folds = list(numeric(0), 1:2000, 2001:20000) > bn.cv(alarm, bn = "hc", method = "custom-folds", folds = folds)

## Error: some folds contain no observations.

- All observations must be assigned to the folds.
> folds = list(1:1000, 2000:20000) > bn.cv(alarm, bn = "hc", method = "custom-folds", folds = folds)

## Error: not all observations are assigned to a fold.

`runs`

is not a valid argument for this cross-validation method, because the ability of performing multiple runs relies on splitting the data randomly into folds.

To perform multiple runs, each set of manually specified folds (one for each `run`

) should be saved in a
list.

> folds = list( + list(1:5000, 5001:10000, 10001:20000), + list(1:10000, 10001:15000, 15001:20000), + list(1:5000, 5001:15000, 15001:20000) + ) > bn.cv(alarm, bn = "hc", method = "custom-folds", folds = folds, fit = "bayes")

custom-folds cross-validation for Bayesian networks target learning algorithm: Hill-Climbing loss function: Log-Likelihood Loss number of runs: 3 average loss over the runs: 10.91115 standard deviation of the loss: 0.009248155

The following function can be used to generate the folds randomly,

> random.folds = function(data, k) split(sample(nrow(data)), seq(k))

for a single run or cross-validation:

> folds = random.folds(data = alarm, k = 10) > bn.cv(alarm, bn = "hc", method = "custom-folds", folds = folds, fit = "bayes")

custom-folds cross-validation for Bayesian networks target learning algorithm: Hill-Climbing loss function: Log-Likelihood Loss expected loss: 10.90059

or for multiple runs.

> folds = replicate(10, random.folds(data = alarm, k = 10), simplify = FALSE) > bn.cv(alarm, bn = "hc", method = "custom-folds", folds = folds, fit = "bayes")

custom-folds cross-validation for Bayesian networks target learning algorithm: Hill-Climbing loss function: Log-Likelihood Loss number of runs: 10 average loss over the runs: 10.90095 standard deviation of the loss: 0.002738641

#### Hold-out cross-validation

The syntax for hold-out cross-validation is very similar to that for *k*-fold cross-validation, but:

`method`

must be set to "`hold-out`

";`k`

denotes the number of times the sample will be split into training and test subsamples (the default is 10);- and optionally
`m`

denotes the number of observations to be sampled for the test subsample (the default is 1/10th of the sample size).

So, for instance:

> bn.cv(alarm, bn = "hc", method = "hold-out", k = 25, m = 200, + algorithm.args = list(score = "bde", iss = 10), fit = "bayes")

hold-out cross-validation for Bayesian networks target learning algorithm: Hill-Climbing number of splits: 25 size of the test subset: 200 loss function: Log-Likelihood Loss expected loss: 10.80679

### Comparing different network structures

`bn.cv()`

can also be used to compare the goodness of fit of different network structures suggested by
expert knowledge; using all the data to fit the parameters of those networks and to evaluate their goodness of fit
produces optimistically biased results.

Let's start with the most obvious comparison: the true network structure against the empty one.

> res = empty.graph(names(alarm)) > modelstring(res) = paste("[HIST|LVF][CVP|LVV][PCWP|LVV][HYP][LVV|HYP:LVF]", + "[LVF][STKV|HYP:LVF][ERLO][HRBP|ERLO:HR][HREK|ERCA:HR][ERCA]", + "[HRSA|ERCA:HR][ANES][APL][TPR|APL][ECO2|ACO2:VLNG][KINK]", + "[MINV|INT:VLNG][FIO2][PVS|FIO2:VALV][SAO2|PVS:SHNT][PAP|PMB][PMB]", + "[SHNT|INT:PMB][INT][PRSS|INT:KINK:VTUB][DISC][MVS][VMCH|MVS]", + "[VTUB|DISC:VMCH][VLNG|INT:KINK:VTUB][VALV|INT:VLNG][ACO2|VALV]", + "[CCHL|ACO2:ANES:SAO2:TPR][HR|CCHL][CO|HR:STKV][BP|CO:TPR]", sep = "") > bn.cv(alarm, res, algorithm.args = list(score = "aic"), fit = "bayes")

k-fold cross-validation for Bayesian networks target network structure: [FIO2][MVS][HYP][LVF][APL][ANES][PMB][INT][KINK][DISC][ERLO][ERCA][HIST|LVF] [TPR|APL][PAP|PMB][LVV|HYP:LVF][STKV|HYP:LVF][SHNT|PMB:INT][VMCH|MVS] [CVP|LVV][PCWP|LVV][VTUB|DISC:VMCH][PRSS|INT:KINK:VTUB][VLNG|INT:KINK:VTUB] [MINV|INT:VLNG][VALV|INT:VLNG][PVS|FIO2:VALV][ACO2|VALV][SAO2|SHNT:PVS] [ECO2|ACO2:VLNG][CCHL|TPR:SAO2:ANES:ACO2][HR|CCHL][CO|STKV:HR][HRBP|ERLO:HR] [HREK|HR:ERCA][HRSA|HR:ERCA][BP|TPR:CO] number of folds: 10 loss function: Log-Likelihood Loss expected loss: 10.84303

> bn.cv(alarm, empty.graph(names(alarm)), fit = "bayes")

k-fold cross-validation for Bayesian networks target network structure: [CVP][PCWP][HIST][TPR][BP][CO][HRBP][HREK][HRSA][PAP][SAO2][FIO2][PRSS][ECO2] [MINV][MVS][HYP][LVF][APL][ANES][PMB][INT][KINK][DISC][LVV][STKV][CCHL][ERLO] [HR][ERCA][SHNT][PVS][ACO2][VALV][VLNG][VTUB][VMCH] number of folds: 10 loss function: Log-Likelihood Loss expected loss: 21.46832

As expected, the first (true) network is a better fit than the second (empty network). The same is true for the following structure, which generated at random and has no relationship with the data.

> rand = random.graph(names(alarm)) > rand

Random/Generated Bayesian network model: [CVP][PCWP][TPR][BP][CO][HRBP][HREK][PAP][PRSS][ECO2][MINV][LVF][APL][ANES] [KINK][HR][ERCA][HIST|PCWP][SAO2|HREK][PMB|PCWP][INT|PAP][LVV|BP:ECO2:KINK] [PVS|CO:HREK:LVF][VALV|ANES][VLNG|HRBP][HRSA|PCWP:HIST:TPR][FIO2|HIST:PAP] [DISC|HIST:APL][ERLO|SAO2:PMB][VTUB|CVP:KINK:LVV][MVS|TPR:FIO2] [STKV|PRSS:KINK:DISC][CCHL|DISC][SHNT|HRSA][VMCH|PCWP:HRSA][HYP|FIO2:MVS] [ACO2|HYP] nodes: 37 arcs: 36 undirected arcs: 0 directed arcs: 36 average markov blanket size: 2.86 average neighbourhood size: 1.95 average branching factor: 0.97 generation algorithm: Full Ordering arc sampling probability: 0.05555556

> bn.cv(alarm, bn = rand, fit = "bayes")

k-fold cross-validation for Bayesian networks target network structure: [CVP][PCWP][TPR][BP][CO][HRBP][HREK][PAP][PRSS][ECO2][MINV][LVF][APL][ANES] [KINK][HR][ERCA][HIST|PCWP][SAO2|HREK][PMB|PCWP][INT|PAP][LVV|BP:ECO2:KINK] [PVS|CO:HREK:LVF][VALV|ANES][VLNG|HRBP][HRSA|PCWP:HIST:TPR][FIO2|HIST:PAP] [DISC|HIST:APL][ERLO|SAO2:PMB][VTUB|CVP:KINK:LVV][MVS|TPR:FIO2] [STKV|PRSS:KINK:DISC][CCHL|DISC][SHNT|HRSA][VMCH|PCWP:HRSA][HYP|FIO2:MVS] [ACO2|HYP] number of folds: 10 loss function: Log-Likelihood Loss expected loss: 21.2474

### Cross-validation and predictive accuracy

Cross-validation is also commonly used in classification problems to compute the *predictive accuracy* of
a particular model or learning approach. We can do that with `bn.cv()`

by specifying

`loss = "pred"`

to compute the classification error;`loss = "f1"`

to compute the F1 score;`loss = "auroc"`

to compute the area under the ROC curve.

The optional arguments for these loss functions are `target`

to specify the target node,
`predict`

to specify how predictions will be used and `predict.args`

for the optional arguments
to the prediction method. The prediction method can be one of `"parents"`

, `"bayes-lw"`

and
`"exact"`

as in `predict()`

(see here), and its optional arguments are
likewise the same.

> xval = bn.cv(alarm, bn = "hc", loss = "pred", loss.args = list(target = "STKV", predict = "bayes-lw")) > xval

k-fold cross-validation for Bayesian networks target learning algorithm: Hill-Climbing number of folds: 10 loss function: Classification Error training node: STKV expected loss: 0.06295813

We may also want to compute other statistics from that run of cross-validation, like sensitivity and specificity, or
just to tabulate observed and predicted values. To that end, we can extract the observed and predicted values for each
fold from `xval`

as follows.

> OBS = unlist(lapply(xval, `[[`, "observed")) > PRED = unlist(lapply(xval, `[[`, "predicted")) > str(OBS)

Factor w/ 3 levels "HIGH","LOW","NORMAL": 3 2 3 3 3 3 2 3 3 3 ...

> str(PRED)

Factor w/ 3 levels "HIGH","LOW","NORMAL": 3 2 3 3 3 3 2 3 1 3 ...

> table(OBS, PRED)

PRED OBS HIGH LOW NORMAL HIGH 165 11 655 LOW 5 3357 193 NORMAL 53 337 15142

### Cross-validation and predictive correlation

Predictive correlation and mean square error are used to evaluate Gaussian Bayesian networks in the same way as the
classification error, F1 and AUROC are used to evaluate discrete Bayesian networks. `bn.cv()`

implements:

`loss = "cor"`

to compute the correlation between observed and predicted values;`loss = "mse"`

to compute the correlation between observed and predicted values.

Both loss functions take arguments `target`

, `predict`

and `predict.args`

via the
`loss.args`

argument of `bn.cv()`

.

> bn.cv(gaussian.test, bn = "hc", loss = "cor", loss.args = list(target = "C", predict = "exact"))

k-fold cross-validation for Bayesian networks target learning algorithm: Hill-Climbing number of folds: 10 loss function: Predictive Correlation training node: C expected loss: 0.9967889

> bn.cv(gaussian.test, bn = "hc", loss = "mse", loss.args = list(target = "C"))

k-fold cross-validation for Bayesian networks target learning algorithm: Hill-Climbing number of folds: 10 loss function: Mean Squared Error training node: C expected loss: 0.2592687

`Mon Aug 5 02:51:53 2024`

with **bnlearn**

`5.0`

and `R version 4.4.1 (2024-06-14)`

.