Bayesian networks and cross-validation
Choosing a Bayesian network learning strategy
Cross-validation provides a standard way to obtain unbiased estimates of a model's goodness of fit. By comparing such measures for different learning strategies (different combinations of learning algorithms, fitting techniques and the respective parameters) the optimal one for the data at hand can be chosen in a principled way.
bnlearn implements cross-validation in the bn.cv
function (documented here, which
requires the following parameters:
- the data set the Bayesian network will be estimated from.
- the structure learning algorithm to be used.
- the loss function used to evaluate the goodness of fit of the learned network.
- the fitting method (Bayesian or Maximum Likelihood) to be used to fit the parameters.
Additional arguments for all these components can be specified as well, providing absolute control of the learning strategy.
Let's for example compare the performance of three different scores in a hill climbing search on the ALARM data. The loss funtion is the so-called log-likelihood loss, which is the negated expected loglikelihood; so lower values are better.
> bn.cv(alarm, "hc", algorithm.args = list(score = "aic")) k-fold cross-validation for Bayesian networks target learning algorithm: Hill-Climbing number of subsets: 10 loss function: Log-Likelihood Loss (discrete) expected loss: 10.81962 > bn.cv(alarm, "hc", algorithm.args = list(score = "bde", iss = 10)) k-fold cross-validation for Bayesian networks target learning algorithm: Hill-Climbing number of subsets: 10 loss function: Log-Likelihood Loss (discrete) expected loss: 10.82288 > bn.cv(alarm, "hc", algorithm.args = list(score = "bic")) k-fold cross-validation for Bayesian networks target learning algorithm: Hill-Climbing number of subsets: 10 loss function: Log-Likelihood Loss (discrete) expected loss: 10.87353
The degraded performance of the bde score can be attributed to
the value chosen for iss; the default one seems to work better.
> bn.cv(alarm, "hc", algorithm.args = list(score = "bde")) k-fold cross-validation for Bayesian networks target learning algorithm: Hill-Climbing number of subsets: 10 loss function: Log-Likelihood Loss (discrete) expected loss: 10.81996
Tipically 10-fold cross-validation is performed 10 times for each learning strategy, and the resulting sets of loss values compared by plotting them as boxplots.
Comparing different network structures
Another use of the bn.cv function is to compare the goodness of
fit of different network structures; using the same data the netork were estimated
from produces optimistically biased results.
Let's start with the most obvious comparison: the true network structure against the empty one.
> res = empty.graph(names(alarm))
> modelstring(res) = paste("[HIST|LVF][CVP|LVV][PCWP|LVV][HYP][LVV|HYP:LVF]",
+ "[LVF][STKV|HYP:LVF][ERLO][HRBP|ERLO:HR][HREK|ERCA:HR][ERCA]",
+ "[HRSA|ERCA:HR][ANES][APL][TPR|APL][ECO2|ACO2:VLNG][KINK]",
+ "[MINV|INT:VLNG][FIO2][PVS|FIO2:VALV][SAO2|PVS:SHNT][PAP|PMB][PMB]",
+ "[SHNT|INT:PMB][INT][PRSS|INT:KINK:VTUB][DISC][MVS][VMCH|MVS]",
+ "[VTUB|DISC:VMCH][VLNG|INT:KINK:VTUB][VALV|INT:VLNG][ACO2|VALV]",
+ "[CCHL|ACO2:ANES:SAO2:TPR][HR|CCHL][CO|HR:STKV][BP|CO:TPR]", sep = "")
> bn.cv(alarm, res, algorithm.args = list(score = "aic"))
k-fold cross-validation for Bayesian networks
target network structure:
[FIO2][MVS][HYP][LVF][APL][ANES][PMB][INT][KINK][DISC][ERLO][ERCA][HIST|LVF]
[TPR|APL][PAP|PMB][LVV|HYP:LVF][STKV|HYP:LVF][SHNT|PMB:INT][VMCH|MVS][CVP|LVV]
[PCWP|LVV][VTUB|DISC:VMCH][PRSS|INT:KINK:VTUB][VLNG|INT:KINK:VTUB][MINV|INT:VLNG]
[VALV|INT:VLNG][PVS|FIO2:VALV][ACO2|VALV][SAO2|SHNT:PVS][ECO2|ACO2:VLNG]
[CCHL|TPR:SAO2:ANES:ACO2][HR|CCHL][CO|STKV:HR][HRBP|ERLO:HR][HREK|HR:ERCA]
[HRSA|HR:ERCA][BP|TPR:CO]
number of subsets: 10
loss function: Log-Likelihood Loss (discrete)
expected loss: 10.81905
> bn.cv(alarm, empty.graph(names(alarm)))
k-fold cross-validation for Bayesian networks
target network structure:
[CVP][PCWP][HIST][TPR][BP][CO][HRBP][HREK][HRSA][PAP][SAO2][FIO2][PRSS][ECO2]
[MINV][MVS][HYP][LVF][APL][ANES][PMB][INT][KINK][DISC][LVV][STKV][CCHL][ERLO]
[HR][ERCA][SHNT][PVS][ACO2][VALV][VLNG][VTUB][VMCH]
number of subsets: 10
loss function: Log-Likelihood Loss (discrete)
expected loss: 21.46784
As expected the first one behaves a lot better the second one. Let's now try a third network, generated at random.
> rand = random.graph(names(alarm))
> rand
Randomly generated Bayesian network
model:
[CVP][PCWP][HIST][TPR][CO][HRBP][HREK][HRSA][SAO2][FIO2][PRSS][ECO2][HYP][ANES]
[PMB][INT][DISC][BP|TPR][PAP|HREK:HRSA][MVS|TPR:ECO2][LVF|HYP][APL|SAO2][KINK|ANES]
[STKV|HREK][CCHL|HREK:HYP][ERCA|DISC][SHNT|HREK:SAO2][VTUB|FIO2:ANES][MINV|CO:PAP]
[ERLO|APL:KINK][VLNG|PAP:ECO2:LVF][LVV|ECO2:MINV:KINK][HR|CVP:MINV:APL][VMCH|TPR:MINV]
[PVS|LVV][ACO2|MINV:HR][VALV|HIST:KINK:PVS]
nodes: 37
arcs: 37
undirected arcs: 0
directed arcs: 37
average markov blanket size: 3.08
average neighbourhood size: 2.00
average branching factor: 1.00
generation algorithm: Full Ordering
arc sampling probability: 0.05555556
> bn.cv(alarm, rand)
k-fold cross-validation for Bayesian networks
target network structure:
[CVP][PCWP][HIST][TPR][CO][HRBP][HREK][HRSA][SAO2][FIO2][PRSS][ECO2][HYP][ANES]
[PMB][INT][DISC][BP|TPR][PAP|HREK:HRSA][MVS|TPR:ECO2][LVF|HYP][APL|SAO2][KINK|ANES]
[STKV|HREK][CCHL|HREK:HYP][ERCA|DISC][SHNT|HREK:SAO2][VTUB|FIO2:ANES][MINV|CO:PAP]
[ERLO|APL:KINK][VLNG|PAP:ECO2:LVF][LVV|ECO2:MINV:KINK][HR|CVP:MINV:APL][VMCH|TPR:MINV]
[PVS|LVV][ACO2|MINV:HR][VALV|HIST:KINK:PVS]
number of subsets: 10
loss function: Log-Likelihood Loss (discrete)
expected loss: 19.88102