Score functions: computing & comparing
Loading the reference data sets
First, we load the learning.test
and gaussian.test
data sets shipped
with bnlearn.
> library(bnlearn) > data(learning.test) > data(gaussian.test)
The true network structure for the learning.test
data, described in the
manual page, is the following:
> learn.net = empty.graph(names(learning.test)) > modelstring(learn.net) = "[A][C][F][B|A][D|A:C][E|B:F]" > learn.net
Random/Generated Bayesian network model: [A][C][F][B|A][D|A:C][E|B:F] nodes: 6 arcs: 5 undirected arcs: 0 directed arcs: 5 average markov blanket size: 2.33 average neighbourhood size: 1.67 average branching factor: 0.83 generation algorithm: Empty
See the relevant examples for an overview on how bn
objects
are created using modelstring()
and other functions. Similarly, we create a
bn
object for the true structure of gaussian.test
(described
here).
> gauss.net = empty.graph(names(gaussian.test)) > modelstring(gauss.net) = "[A][B][E][G][C|A:B][D|B][F|A:D:E:G]" > gauss.net
Random/Generated Bayesian network model: [A][B][E][G][C|A:B][D|B][F|A:D:E:G] nodes: 7 arcs: 7 undirected arcs: 0 directed arcs: 7 average markov blanket size: 4.00 average neighbourhood size: 2.00 average branching factor: 1.00 generation algorithm: Empty
Both networks can be correctly learned by all the learning algorithms implemented in bnlearn, and provide one discrete and one continuous test case.
Computing a network score
We can compute the network score of a particular graph for a particular data set with the
score()
function (manual); if the
score function is not specified, the BIC score is returned for both continuous and discrete data.
> score(learn.net, learning.test)
[1] -24006.73
> score(gauss.net, gaussian.test)
[1] -53221.35
Other score functions can be used by changing the type
, as documented in the manual
page of the function.
> score(learn.net, learning.test, type = "bic")
[1] -24006.73
> score(learn.net, learning.test, type = "aic")
[1] -23873.13
> score(learn.net, learning.test, type = "bde")
[1] -24028.09
Many of these score have additional, optional arguments whose defaults are chosen based on
literature golden standards and/or the data at hand. The most commonly used is the
imaginary sample size (iss
) of BDe and BGe, which is typically set to a very
small value to reduce the relative weight of the prior distribution.
> score(learn.net, learning.test, type = "bde", iss = 1)
[1] -24028.09
> score(gauss.net, gaussian.test, type = "bge", iss = 3)
## Warning in check.unused.args(extra.args, score.extra.args[[score]]): unused ## argument(s): 'iss'.
[1] -53258.94
Several check are run to ensure the network, the data and the test make sense with respect to each other:
- the number of variables in the network and in the data must be the same, although the order
is not important.
> score(learn.net, learning.test[, -1])
## Error: the network and the data have different numbers of variables.
> score(learn.net, learning.test[, sample(1:6)])
[1] -24006.73
- the names of the variables must match as well.
> renamed.test = learning.test > names(renamed.test) = LETTERS[11:16] > score(learn.net, renamed.test)
## Error: the variables in the data and in the network do not match.
- the data must satisfy the assumptions of the network score (e.g. discrete data and
discrete score, continuous data and Gaussian score, etc.).
> score(learn.net, learning.test, type = "loglik-g")
## Error: score 'loglik-g' may only be used with continuous data.
- the label of the score function must be valid; if it is not, valid scores are listed in the
error message.
> score(learn.net, learning.test, type = "???")
## Error: valid score(s) are "loglik" (Log-Likelihood (disc.)), "aic" (AIC (disc.)), "bic" (BIC (disc.)), "ebic" (eBIC (disc.)), "pred-loglik" (Predictive Log-Likelihood (disc.)), "fnml" (Factorized Normalized Maximum Likelihood), "qnml" (Quotient Normalized Maximum Likelihood), "nal" (Node-Average Likelihood (disc.)), "pnal" (Penalized Node-Average Likelihood (disc.)), "bde" (Bayesian Dirichlet (BDe)), "bds" (Bayesian Dirichlet Sparse (BDs)), "bdj" (Bayesian Dirichlet, Jeffrey's prior), "k2" (Cooper & Herskovits' K2), "mbde" (Bayesian Dirichlet (interventional data)), "bdla" (Bayesian Dirichlet, Locally Averaged), "loglik-g" (Log-Likelihood (Gauss.)), "aic-g" (AIC (Gauss.)), "bic-g" (BIC (Gauss.)), "ebic-g" (eBIC (Gauss.)), "pred-loglik-g" (Predictive Log-Likelihood (Gauss.)), "nal-g" (Node-Average Likelihood (Gauss.)), "pnal-g" (Penalized Node-Average Likelihood (Gauss.)), "bge" (Bayesian Gaussian (BGe)), "loglik-cg" (Log-Likelihood (cond. Gauss.)), "aic-cg" (AIC (cond. Gauss.)), "bic-cg" (BIC (cond. Gauss.)), "ebic-cg" (eBIC (cond. Gauss.)), "pred-loglik-cg" (Predictive Log-Likelihood (cond. Gauss.)), "nal-cg" (Node-Average Likelihood (cond. Gauss.)), "pnal-cg" (Penalized Node-Average Likelihood (cond. Gauss.)), "custom-score" (User-Provided Score Function). See ?bnlearn-package for details.
Testing score equivalence
Arcs whose direction does not influence the v-structures present in the network structure are said to be score equivalent, because their reversal does not alter the score of the network (with the notable exceptions of K2 and BDe/BGe with prior other than the uniform). Usually these arcs are not oriented in the networks learned with constraint-based algorithms.
In gaussian.test
the only score equivalent arc is D
→
B
, as shown below.
> eq.net = set.arc(gauss.net, "D", "B") > score(gauss.net, gaussian.test, type = "bic-g")
[1] -53221.35
> score(eq.net, gaussian.test, type = "bic-g")
[1] -53221.35
Equivalently, we can check that gauss.net
and eq.net
belong to the
same equivalence class (represented by a CPDAG, see the
manual).
> all.equal(cpdag(gauss.net), cpdag(eq.net))
[1] TRUE
Changing the direction of any other arc alters the score of the network; for example reversing
B
→ C
lowers the BIC by 6788.81
.
> noneq1.net = set.arc(gauss.net, from = "B", to = "C") > score(noneq1.net, gaussian.test, type = "bic-g")
[1] -53221.35
> noneq2.net = set.arc(gauss.net, from = "C", to = "B") > score(noneq2.net, gaussian.test, type = "bic-g")
[1] -60010.16
Mon Aug 5 02:50:41 2024
with bnlearn
5.0
and R version 4.4.1 (2024-06-14)
.