Bayesian networks and cross-validation

Choosing a Bayesian network learning strategy

Cross-validation provides a standard way to obtain unbiased estimates of a model's goodness of fit. By comparing such measures for different learning strategies (different combinations of learning algorithms, fitting techniques and the respective parameters) the optimal one for the data at hand can be chosen in a principled way.

bnlearn implements cross-validation in the bn.cv function (documented here, which requires the following parameters:

  1. the data set the Bayesian network will be estimated from.
  2. the structure learning algorithm to be used.
  3. the loss function used to evaluate the goodness of fit of the learned network.
  4. the fitting method (Bayesian or Maximum Likelihood) to be used to fit the parameters.

Additional arguments for all these components can be specified as well, providing absolute control of the learning strategy.

Let's for example compare the performance of three different scores in a hill climbing search on the ALARM data. The loss funtion is the so-called log-likelihood loss, which is the negated expected loglikelihood; so lower values are better.

> bn.cv(alarm, "hc", algorithm.args = list(score = "aic"))

  k-fold cross-validation for Bayesian networks

  target learning algorithm:             Hill-Climbing 
  number of subsets:                     10 
  loss function:                         Log-Likelihood Loss (discrete) 
  expected loss:                         10.81962 

> bn.cv(alarm, "hc", algorithm.args = list(score = "bde", iss = 10))

  k-fold cross-validation for Bayesian networks

  target learning algorithm:             Hill-Climbing 
  number of subsets:                     10 
  loss function:                         Log-Likelihood Loss (discrete) 
  expected loss:                         10.82288 

> bn.cv(alarm, "hc", algorithm.args = list(score = "bic"))

  k-fold cross-validation for Bayesian networks

  target learning algorithm:             Hill-Climbing 
  number of subsets:                     10 
  loss function:                         Log-Likelihood Loss (discrete) 
  expected loss:                         10.87353 

The degraded performance of the bde score can be attributed to the value chosen for iss; the default one seems to work better.

> bn.cv(alarm, "hc", algorithm.args = list(score = "bde"))

  k-fold cross-validation for Bayesian networks

  target learning algorithm:             Hill-Climbing 
  number of subsets:                     10 
  loss function:                         Log-Likelihood Loss (discrete) 
  expected loss:                         10.81996 

Tipically 10-fold cross-validation is performed 10 times for each learning strategy, and the resulting sets of loss values compared by plotting them as boxplots.

Comparing different network structures

Another use of the bn.cv function is to compare the goodness of fit of different network structures; using the same data the netork were estimated from produces optimistically biased results.

Let's start with the most obvious comparison: the true network structure against the empty one.

> res = empty.graph(names(alarm))
> modelstring(res) = paste("[HIST|LVF][CVP|LVV][PCWP|LVV][HYP][LVV|HYP:LVF]",
+   "[LVF][STKV|HYP:LVF][ERLO][HRBP|ERLO:HR][HREK|ERCA:HR][ERCA]",
+   "[HRSA|ERCA:HR][ANES][APL][TPR|APL][ECO2|ACO2:VLNG][KINK]",
+   "[MINV|INT:VLNG][FIO2][PVS|FIO2:VALV][SAO2|PVS:SHNT][PAP|PMB][PMB]",
+   "[SHNT|INT:PMB][INT][PRSS|INT:KINK:VTUB][DISC][MVS][VMCH|MVS]",
+   "[VTUB|DISC:VMCH][VLNG|INT:KINK:VTUB][VALV|INT:VLNG][ACO2|VALV]",
+   "[CCHL|ACO2:ANES:SAO2:TPR][HR|CCHL][CO|HR:STKV][BP|CO:TPR]", sep = "")
> bn.cv(alarm, res, algorithm.args = list(score = "aic"))

  k-fold cross-validation for Bayesian networks

  target network structure:
   [FIO2][MVS][HYP][LVF][APL][ANES][PMB][INT][KINK][DISC][ERLO][ERCA][HIST|LVF]
   [TPR|APL][PAP|PMB][LVV|HYP:LVF][STKV|HYP:LVF][SHNT|PMB:INT][VMCH|MVS][CVP|LVV]
   [PCWP|LVV][VTUB|DISC:VMCH][PRSS|INT:KINK:VTUB][VLNG|INT:KINK:VTUB][MINV|INT:VLNG]
   [VALV|INT:VLNG][PVS|FIO2:VALV][ACO2|VALV][SAO2|SHNT:PVS][ECO2|ACO2:VLNG]
   [CCHL|TPR:SAO2:ANES:ACO2][HR|CCHL][CO|STKV:HR][HRBP|ERLO:HR][HREK|HR:ERCA]
   [HRSA|HR:ERCA][BP|TPR:CO] 
  number of subsets:                     10 
  loss function:                         Log-Likelihood Loss (discrete) 
  expected loss:                         10.81905 

> bn.cv(alarm, empty.graph(names(alarm)))

  k-fold cross-validation for Bayesian networks

  target network structure:
   [CVP][PCWP][HIST][TPR][BP][CO][HRBP][HREK][HRSA][PAP][SAO2][FIO2][PRSS][ECO2]
   [MINV][MVS][HYP][LVF][APL][ANES][PMB][INT][KINK][DISC][LVV][STKV][CCHL][ERLO]
   [HR][ERCA][SHNT][PVS][ACO2][VALV][VLNG][VTUB][VMCH] 
  number of subsets:                     10 
  loss function:                         Log-Likelihood Loss (discrete) 
  expected loss:                         21.46784 

As expected the first one behaves a lot better the second one. Let's now try a third network, generated at random.

> rand = random.graph(names(alarm))
> rand

  Randomly generated Bayesian network

  model:
    [CVP][PCWP][HIST][TPR][CO][HRBP][HREK][HRSA][SAO2][FIO2][PRSS][ECO2][HYP][ANES]
    [PMB][INT][DISC][BP|TPR][PAP|HREK:HRSA][MVS|TPR:ECO2][LVF|HYP][APL|SAO2][KINK|ANES]
    [STKV|HREK][CCHL|HREK:HYP][ERCA|DISC][SHNT|HREK:SAO2][VTUB|FIO2:ANES][MINV|CO:PAP]
    [ERLO|APL:KINK][VLNG|PAP:ECO2:LVF][LVV|ECO2:MINV:KINK][HR|CVP:MINV:APL][VMCH|TPR:MINV]
    [PVS|LVV][ACO2|MINV:HR][VALV|HIST:KINK:PVS] 
  nodes:                                 37 
  arcs:                                  37 
    undirected arcs:                     0 
    directed arcs:                       37 
  average markov blanket size:           3.08 
  average neighbourhood size:            2.00 
  average branching factor:              1.00 

  generation algorithm:                  Full Ordering 
  arc sampling probability:              0.05555556 

> bn.cv(alarm, rand)

  k-fold cross-validation for Bayesian networks

  target network structure:
    [CVP][PCWP][HIST][TPR][CO][HRBP][HREK][HRSA][SAO2][FIO2][PRSS][ECO2][HYP][ANES]
    [PMB][INT][DISC][BP|TPR][PAP|HREK:HRSA][MVS|TPR:ECO2][LVF|HYP][APL|SAO2][KINK|ANES]
    [STKV|HREK][CCHL|HREK:HYP][ERCA|DISC][SHNT|HREK:SAO2][VTUB|FIO2:ANES][MINV|CO:PAP]
    [ERLO|APL:KINK][VLNG|PAP:ECO2:LVF][LVV|ECO2:MINV:KINK][HR|CVP:MINV:APL][VMCH|TPR:MINV]
    [PVS|LVV][ACO2|MINV:HR][VALV|HIST:KINK:PVS] 
  number of subsets:                     10 
  loss function:                         Log-Likelihood Loss (discrete) 
  expected loss:                         19.88102