Bayesian networks and cross-validation

Choosing a Bayesian network learning strategy

Cross-validation is a standard way to obtain unbiased estimates of a model's goodness of fit. By comparing such estimates for different learning strategies (different combinations of learning algorithms, fitting techniques and the respective parameters) we can choose the optimal one for the data at hand in a principled way.

bnlearn implements three cross-validation methods in the bn.cv() function (documented here):

  1. k-fold cross-validation (the default): the data are randomly partitioned into k subsets. Each subset is used in turn to validate the model fitted on the remaining k - 1 subsets.
  2. Custom folds cross-validation: the data are manually partitioned into subsets by the user into subsets, which are then used as in k-fold cross-validation. Subsets are not constrained to have the same size.
  3. Hold-out cross-validation: the data are repeatedly (randomly) partitioned in a training and test subset of given size m and n - m, with observations assigned randomly to each subset at each repetition. Each test subset is used to validate the model fitted on the corresponding training subset.

bn.cv() requires the following parameters for all methods:

  1. method: the label of the cross-validation method, either "k-fold" (the default), "custom-fold" or "hold-out";
  2. data: the data set the Bayesian network will be estimated from;
  3. bn: the structure learning algorithm to be used, or a fixed network structure (to cross-validate just parameter learning);
  4. loss:: the loss function used to evaluate the goodness of fit of the learned network;
  5. fit: the fitting method (such as mle or bayes) to be used to fit the parameters.

Additional parameters for all these components can be specified as well, providing a fine-grained control of the learning strategy, using the following arguments of bn.cv():

  • algorithm.args for additional arguments to be passed to the learning algorithm;
  • loss.args for additional arguments to be passed to the loss function;
  • fit.args for additional arguments to be passed to bn.fit() for fitting the parameters of the Bayesian network.

k-fold cross-validation

For example, let's compare the performance of three different scores in a hill climbing search on the ALARM data set included in bnlearn. The loss funtion we use is the so-called log-likelihood loss, which is the negated expected log-likelihood; so lower values are better. Hill cimbing is implemented in the hc() function.

> library(bnlearn)
> data(alarm)
> bn.cv(alarm, bn = "hc", algorithm.args = list(score = "bde", iss = 1), fit = "bayes")

  k-fold cross-validation for Bayesian networks

  target learning algorithm:             Hill-Climbing 
  number of folds:                       10 
  loss function:                         Log-Likelihood Loss 
  expected loss:                         10.86484

In this case the bn argument specifies the label of a structure learning algorithm, and algorithm.args is an optional list of arguments for the algorithm specified by bn.

> bn.cv(alarm, bn = "hc", algorithm.args = list(score = "bde", iss = 10), fit = "bayes")

  k-fold cross-validation for Bayesian networks

  target learning algorithm:             Hill-Climbing 
  number of folds:                       10 
  loss function:                         Log-Likelihood Loss 
  expected loss:                         10.84377
> bn.cv(alarm, bn = "hc", algorithm.args = list(score = "bic"), fit = "bayes")

  k-fold cross-validation for Bayesian networks

  target learning algorithm:             Hill-Climbing 
  number of folds:                       10 
  loss function:                         Log-Likelihood Loss 
  expected loss:                         10.90714

Partially directed graphs obtained from learning are automatically extended to directed acyclic graphs with cextend, so that the parameters can be fit and the loss function computed.

> bn.cv(alarm, bn = "iamb", fit = "bayes")

  k-fold cross-validation for Bayesian networks

  target learning algorithm:             IAMB 
  number of folds:                       10 
  loss function:                         Log-Likelihood Loss 
  expected loss:                         13.70839

Tipically, 10-fold cross-validation is performed 10 times for each learning strategy (golden standard originally introduced by Hastie, Tibshirani and Friedman in «The Elements of Statistical Learning», link), and the resulting sets of loss values compared by plotting them as boxplots. Below we do so that for the BIC and BDe scores using hc().

> cv.bic = bn.cv(alarm, bn = "hc", runs = 10,
+            algorithm.args = list(score = "bic"), fit = "bayes")
> cv.bde = bn.cv(alarm, bn = "hc", runs = 10,
+            algorithm.args = list(score = "bde", iss = 1), fit = "bayes")
> plot(cv.bic, cv.bde, xlab = c("BIC", "BDe"))
Loading required namespace: lattice
plot of chunk unnamed-chunk-6

Note that, while it is customary to use k = 10, a different number of folds can be specified with the k argument.

> bn.cv(alarm, bn = "hc", k = 5, fit = "bayes")

  k-fold cross-validation for Bayesian networks

  target learning algorithm:             Hill-Climbing 
  number of folds:                       5 
  loss function:                         Log-Likelihood Loss 
  expected loss:                         10.90896

Custom folds in cross-validation

The custom-folds method works in the same way as k-fold cross-validation, but folds are specified manually with a folds argument.

> folds = list(1:5000, 5001:15000, 15001:20000)
> cv.custom = bn.cv(alarm, bn = "hc", algorithm.args = list(score = "bic"),
+               method = "custom-folds", folds = folds, fit = "bayes")
> cv.custom

  custom-folds cross-validation for Bayesian networks

  target learning algorithm:             Hill-Climbing 
  loss function:                         Log-Likelihood Loss 
  expected loss:                         10.90074

The two methods are otherwise identical, as shown below: the R objects returned by bn.cv() are identical (with the exception of the method's label) if the same folds are used.

> cv.random = bn.cv(alarm, bn = "hc", method = "k-fold", fit = "bayes")
> folds = lapply(cv.random, `[[`, "test")
> cv.custom = bn.cv(alarm, bn = "hc", method = "custom-folds",
+                 folds = folds, fit = "bayes")
> all.equal(cv.random, cv.custom, check.attributes = FALSE)
[1] TRUE

Note that:

  1. Each observation can only be included in a single fold.
    > folds = list(1:5000, 4000:10000, 10001:20000)
    > bn.cv(alarm, bn = "hc", method = "custom-folds", folds = folds)
    
    ## Error: some observations are included in more than one fold.
    
  2. Each fold must contain at least one observation.
    > folds = list(numeric(0), 1:2000, 2001:20000)
    > bn.cv(alarm, bn = "hc", method = "custom-folds", folds = folds)
    
    ## Error: some folds contain no observations.
    
  3. All observations must be assigned to the folds.
    > folds = list(1:1000, 2000:20000)
    > bn.cv(alarm, bn = "hc", method = "custom-folds", folds = folds)
    
    ## Error: not all observations are assigned to a fold.
    
  4. runs is not a valid argument for this cross-validation method, because the ability of performing multiple runs relies on splitting the data randomly into folds.

To perform multiple runs, each set of manually specified folds (one for each run) should be saved in a list.

> folds = list(
+   list(1:5000, 5001:10000, 10001:20000),
+   list(1:10000, 10001:15000, 15001:20000),
+   list(1:5000, 5001:15000, 15001:20000)
+ )
> bn.cv(alarm, bn = "hc", method = "custom-folds", folds = folds, fit = "bayes")

  custom-folds cross-validation for Bayesian networks

  target learning algorithm:             Hill-Climbing 
  loss function:                         Log-Likelihood Loss 
  number of runs:                        3 
  average loss over the runs:            10.91115 
  standard deviation of the loss:        0.009248155

The following function can be used to generate the folds randomly,

> random.folds = function(data, k) split(sample(nrow(data)), seq(k))

for a single run or cross-validation:

> folds = random.folds(data = alarm, k = 10)
> bn.cv(alarm, bn = "hc", method = "custom-folds", folds = folds, fit = "bayes")

  custom-folds cross-validation for Bayesian networks

  target learning algorithm:             Hill-Climbing 
  loss function:                         Log-Likelihood Loss 
  expected loss:                         10.90059

or for multiple runs.

> folds = replicate(10, random.folds(data = alarm, k = 10), simplify = FALSE)
> bn.cv(alarm, bn = "hc", method = "custom-folds", folds = folds, fit = "bayes")

  custom-folds cross-validation for Bayesian networks

  target learning algorithm:             Hill-Climbing 
  loss function:                         Log-Likelihood Loss 
  number of runs:                        10 
  average loss over the runs:            10.90095 
  standard deviation of the loss:        0.002738641

Hold-out cross-validation

The syntax for hold-out cross-validation is very similar to that for k-fold cross-validation, but:

  • method must be set to "hold-out";
  • k denotes the number of times the sample will be split into training and test subsamples (the default is 10);
  • and optionally m denotes the number of observations to be sampled for the test subsample (the default is 1/10th of the sample size).

So, for instance:

> bn.cv(alarm, bn = "hc", method = "hold-out", k = 25, m = 200,
+   algorithm.args = list(score = "bde", iss = 10), fit = "bayes")

  hold-out cross-validation for Bayesian networks

  target learning algorithm:             Hill-Climbing 
  number of splits:                      25 
  size of the test subset:               200 
  loss function:                         Log-Likelihood Loss 
  expected loss:                         10.80679

Comparing different network structures

bn.cv() can also be used to compare the goodness of fit of different network structures suggested by expert knowledge; using all the data to fit the parameters of those networks and to evaluate their goodness of fit produces optimistically biased results.

Let's start with the most obvious comparison: the true network structure against the empty one.

> res = empty.graph(names(alarm))
> modelstring(res) = paste("[HIST|LVF][CVP|LVV][PCWP|LVV][HYP][LVV|HYP:LVF]",
+   "[LVF][STKV|HYP:LVF][ERLO][HRBP|ERLO:HR][HREK|ERCA:HR][ERCA]",
+   "[HRSA|ERCA:HR][ANES][APL][TPR|APL][ECO2|ACO2:VLNG][KINK]",
+   "[MINV|INT:VLNG][FIO2][PVS|FIO2:VALV][SAO2|PVS:SHNT][PAP|PMB][PMB]",
+   "[SHNT|INT:PMB][INT][PRSS|INT:KINK:VTUB][DISC][MVS][VMCH|MVS]",
+   "[VTUB|DISC:VMCH][VLNG|INT:KINK:VTUB][VALV|INT:VLNG][ACO2|VALV]",
+   "[CCHL|ACO2:ANES:SAO2:TPR][HR|CCHL][CO|HR:STKV][BP|CO:TPR]", sep = "")
> bn.cv(alarm, res, algorithm.args = list(score = "aic"), fit = "bayes")

  k-fold cross-validation for Bayesian networks

  target network structure:
   [FIO2][MVS][HYP][LVF][APL][ANES][PMB][INT][KINK][DISC][ERLO][ERCA][HIST|LVF]
   [TPR|APL][PAP|PMB][LVV|HYP:LVF][STKV|HYP:LVF][SHNT|PMB:INT][VMCH|MVS]
   [CVP|LVV][PCWP|LVV][VTUB|DISC:VMCH][PRSS|INT:KINK:VTUB][VLNG|INT:KINK:VTUB]
   [MINV|INT:VLNG][VALV|INT:VLNG][PVS|FIO2:VALV][ACO2|VALV][SAO2|SHNT:PVS]
   [ECO2|ACO2:VLNG][CCHL|TPR:SAO2:ANES:ACO2][HR|CCHL][CO|STKV:HR][HRBP|ERLO:HR]
   [HREK|HR:ERCA][HRSA|HR:ERCA][BP|TPR:CO]
  number of folds:                       10 
  loss function:                         Log-Likelihood Loss 
  expected loss:                         10.84303
> bn.cv(alarm, empty.graph(names(alarm)), fit = "bayes")

  k-fold cross-validation for Bayesian networks

  target network structure:
   [CVP][PCWP][HIST][TPR][BP][CO][HRBP][HREK][HRSA][PAP][SAO2][FIO2][PRSS][ECO2]
   [MINV][MVS][HYP][LVF][APL][ANES][PMB][INT][KINK][DISC][LVV][STKV][CCHL][ERLO]
   [HR][ERCA][SHNT][PVS][ACO2][VALV][VLNG][VTUB][VMCH]
  number of folds:                       10 
  loss function:                         Log-Likelihood Loss 
  expected loss:                         21.46832

As expected, the first (true) network is a better fit than the second (empty network). The same is true for the following structure, which generated at random and has no relationship with the data.

> rand = random.graph(names(alarm))
> rand

  Random/Generated Bayesian network

  model:
   [CVP][PCWP][TPR][BP][CO][HRBP][HREK][PAP][PRSS][ECO2][MINV][LVF][APL][ANES]
   [KINK][HR][ERCA][HIST|PCWP][SAO2|HREK][PMB|PCWP][INT|PAP][LVV|BP:ECO2:KINK]
   [PVS|CO:HREK:LVF][VALV|ANES][VLNG|HRBP][HRSA|PCWP:HIST:TPR][FIO2|HIST:PAP]
   [DISC|HIST:APL][ERLO|SAO2:PMB][VTUB|CVP:KINK:LVV][MVS|TPR:FIO2]
   [STKV|PRSS:KINK:DISC][CCHL|DISC][SHNT|HRSA][VMCH|PCWP:HRSA][HYP|FIO2:MVS]
   [ACO2|HYP]
  nodes:                                 37 
  arcs:                                  36 
    undirected arcs:                     0 
    directed arcs:                       36 
  average markov blanket size:           2.86 
  average neighbourhood size:            1.95 
  average branching factor:              0.97 

  generation algorithm:                  Full Ordering 
  arc sampling probability:              0.05555556
> bn.cv(alarm, bn = rand, fit = "bayes")

  k-fold cross-validation for Bayesian networks

  target network structure:
   [CVP][PCWP][TPR][BP][CO][HRBP][HREK][PAP][PRSS][ECO2][MINV][LVF][APL][ANES]
   [KINK][HR][ERCA][HIST|PCWP][SAO2|HREK][PMB|PCWP][INT|PAP][LVV|BP:ECO2:KINK]
   [PVS|CO:HREK:LVF][VALV|ANES][VLNG|HRBP][HRSA|PCWP:HIST:TPR][FIO2|HIST:PAP]
   [DISC|HIST:APL][ERLO|SAO2:PMB][VTUB|CVP:KINK:LVV][MVS|TPR:FIO2]
   [STKV|PRSS:KINK:DISC][CCHL|DISC][SHNT|HRSA][VMCH|PCWP:HRSA][HYP|FIO2:MVS]
   [ACO2|HYP]
  number of folds:                       10 
  loss function:                         Log-Likelihood Loss 
  expected loss:                         21.2474

Cross-validation and predictive accuracy

Cross-validation is also commonly used in classification problems to compute the predictive accuracy of a particular model or learning approach. We can do that with bn.cv() by specifying

  • loss = "pred" to compute the classification error;
  • loss = "f1" to compute the F1 score;
  • loss = "auroc" to compute the area under the ROC curve.

The optional arguments for these loss functions are target to specify the target node, predict to specify how predictions will be used and predict.args for the optional arguments to the prediction method. The prediction method can be one of "parents", "bayes-lw" and "exact" as in predict() (see here), and its optional arguments are likewise the same.

> xval = bn.cv(alarm, bn = "hc", loss = "pred", loss.args = list(target = "STKV", predict = "bayes-lw"))
> xval

  k-fold cross-validation for Bayesian networks

  target learning algorithm:             Hill-Climbing 
  number of folds:                       10 
  loss function:                         Classification Error 
  training node:                         STKV 
  expected loss:                         0.06295813

We may also want to compute other statistics from that run of cross-validation, like sensitivity and specificity, or just to tabulate observed and predicted values. To that end, we can extract the observed and predicted values for each fold from xval as follows.

> OBS = unlist(lapply(xval, `[[`, "observed"))
> PRED = unlist(lapply(xval, `[[`, "predicted"))
> str(OBS)
 Factor w/ 3 levels "HIGH","LOW","NORMAL": 3 2 3 3 3 3 2 3 3 3 ...
> str(PRED)
 Factor w/ 3 levels "HIGH","LOW","NORMAL": 3 2 3 3 3 3 2 3 1 3 ...
> table(OBS, PRED)
        PRED
OBS       HIGH   LOW NORMAL
  HIGH     165    11    655
  LOW        5  3357    193
  NORMAL    53   337  15142

Cross-validation and predictive correlation

Predictive correlation and mean square error are used to evaluate Gaussian Bayesian networks in the same way as the classification error, F1 and AUROC are used to evaluate discrete Bayesian networks. bn.cv() implements:

  1. loss = "cor" to compute the correlation between observed and predicted values;
  2. loss = "mse" to compute the correlation between observed and predicted values.

Both loss functions take arguments target, predict and predict.args via the loss.args argument of bn.cv().

> bn.cv(gaussian.test, bn = "hc", loss = "cor", loss.args = list(target = "C", predict = "exact"))

  k-fold cross-validation for Bayesian networks

  target learning algorithm:             Hill-Climbing 
  number of folds:                       10 
  loss function:                         Predictive Correlation 
  training node:                         C 
  expected loss:                         0.9967889
> bn.cv(gaussian.test, bn = "hc", loss = "mse", loss.args = list(target = "C"))

  k-fold cross-validation for Bayesian networks

  target learning algorithm:             Hill-Climbing 
  number of folds:                       10 
  loss function:                         Mean Squared Error 
  training node:                         C 
  expected loss:                         0.2592687
Last updated on Mon Aug 5 02:51:53 2024 with bnlearn 5.0 and R version 4.4.1 (2024-06-14).