Using custom scores in structure learning
bnlearn implements a wide selection of network scores for use in structure learning, ranging from likelihood-based to Bayesian to predictive scores (documented here and here). However, there are always new scores appearing in the literature. For one thing, developments in theoretical statistics/information theory may lead to new scores with desirable properties (or at least, somewhat convincing empirical performance). For another, learning Bayesian networks from data often requires application-specific tweaks to produce quality models: tweaks that may require modifications to standard scores to incorporate expert knowledge beyond what can be done with whitelists, blacklists and graphical priors.
We can do this in bnlearn using the "custom-score" score. From the documentation (here), this score has two arguments:
- fun: a function with four arguments, node, parents, data and args. node will contain the label of the node to be scored (a character string); parents will contain the labels of its parents (a character vector); data will contain the complete data set, with all the variables (a data frame); and args will be a list containing any additional arguments to the score.
- args: a list containing the optional arguments to fun, for tuning custom score functions.
In practice, this leads to code like the following:
> network.score = function(node, parents, data, args) {
+   # compute a meaningful measure of goodness-of-fit and return it.
+   return(42)
+ }#NETWORK.SCORE
>
> hc(learning.test, score = "custom-score", fun = network.score, args = list())
Considering that we only have access to a node and its parents, the "custom-score" is mostly limited to decomposable scores. However, it has access to the complete data (with all the variables) and it can take any optional arguments, which makes it quite flexible.
Replicating an existing score
The "custom-score"
score can be used to reimplement any other score in bnlearn, which
comes in handy when studying their behaviour or troubleshooting issues in structure learning. Consider, for instance,
the BIC score assigned to a Gaussian Bayesian network. The local distributions of the nodes of a Gaussian Bayesian
network are linear models, which we can fit with lm()
. We can then compute the BIC of the models estimated
by lm()
with the standard BIC()
method from package stats to get the
components of the network score associated with the individual nodes.
> dag = model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]")
>
> my.bic = function(node, parents, data, args) {
+   if (length(parents) == 0)
+     model = paste(node, "~ 1")
+   else
+     model = paste(node, "~", paste(parents, collapse = "+"))
+   - BIC(lm(model, data = data)) / 2
+ }#MY.BIC
> score(dag, gaussian.test, type = "custom-score", fun = my.bic, by.node = TRUE)
         A          B          C          D          E          F          G
 -7123.829 -12652.302  -3733.466  -1542.920 -10543.031  -7098.443 -10527.354
> score(dag, gaussian.test, type = "bic-g", by.node = TRUE)
         A          B          C          D          E          F          G
 -7123.829 -12652.302  -3733.467  -1542.920 -10543.031  -7098.444 -10527.354
As we can see, the score values match for all the nodes, modulo some small floating point rounding errors.
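If we drop by.node = TRUE, score() returns the overall network score instead, and the two totals should again agree up to a small floating-point difference; a quick check (output omitted):

> score(dag, gaussian.test, type = "custom-score", fun = my.bic)
> score(dag, gaussian.test, type = "bic-g")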
Similarly, we can compute the BIC network score for a discrete Bayesian network. This involves:
- constructing the contingency table of data[, node] against the configurations of data[, parents] (assuming that node has any parents);
- computing the (conditional) probability table associated with the node from the contingency table;
- counting the number of parameters;
- computing the log-likelihood from the (conditional) probability table and subtracting the BIC penalty term.
Again, we can match the output of the built-in BIC score.
> dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")
>
> my.bic = function(node, parents, data, args) {
+   n = nrow(data)
+   if (length(parents) == 0) {
+     counts = table(data[, node])
+     nparams = dim(counts) - 1
+     sum(counts * log(counts / n)) - nparams * log(n) / 2
+   }#THEN
+   else {
+     counts = table(data[, node], configs(data[, parents, drop = FALSE]))
+     nparams = ncol(counts) * (nrow(counts) - 1)
+     sum(counts * log(prop.table(counts, 2))) - nparams * log(n) / 2
+   }#ELSE
+ }#MY.BIC
> score(dag, learning.test, type = "custom-score", fun = my.bic, by.node = TRUE)
        A         B         C         D         E         F
-5501.568 -3686.922 -3501.199 -3544.561 -4302.522 -3469.962
> score(dag, learning.test, type = "bic", by.node = TRUE)
        A         B         C         D         E         F
-5501.568 -3686.922 -3501.199 -3544.561 -4302.522 -3469.962
Instrumenting network scores to debug them
There are no limitations to what we can do in the function we pass to the "custom-score" score. Hence we can call bnlearn's score() function from inside the function we pass via the fun argument, using by.node = TRUE to make score() return just the score component of the target node.
> instrumented = function(node, parents, data, args) {
+   dummy.dag = empty.graph(c(node, parents))
+   for (par in parents)
+     dummy.dag = set.arc(dummy.dag, from = par, to = node)
+   score(dummy.dag, data[, c(node, parents), drop = FALSE], by.node = TRUE)[node]
+ }#INSTRUMENTED
>
> score(dag, learning.test, type = "custom-score", fun = instrumented, by.node = TRUE)
        A         B         C         D         E         F
-5501.568 -3686.922 -3501.199 -3544.561 -4302.522 -3469.962
Obviously, this approach is much slower than using the built-in scores directly. However, it makes it possible to instrument network scoring so that it prints (or logs to a file) any quantities of interest, or drops us into an interactive shell using browser(). This gives us much finer-grained control over what to log and when to stop normal execution than just using debug().
For instance, we may want to investigate the behaviour of the BDeu score when there are unobserved parent configurations (spoiler: not good). However, we do not want to stop structure learning every time a node is scored, just when 1) the node has parents and 2) at least one configuration of the parents is not observed in the data. We can do it as follows.
> instrumented = function(node, parents, data, args) {
+   # stop if there are unobserved parent configurations.
+   local.data = data[, c(node, parents), drop = FALSE]
+   if (length(parents) > 0) {
+     counts = table(data[, node], configs(data[, parents, drop = FALSE]))
+     if (any(colSums(counts) == 0))
+       browser()
+   }#THEN
+   dummy.dag = empty.graph(c(node, parents))
+   for (par in parents)
+     dummy.dag = set.arc(dummy.dag, from = par, to = node)
+   score(dummy.dag, local.data, type = "bde", by.node = TRUE)[node]
+ }#INSTRUMENTED
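This instrumented score can then be plugged into structure learning exactly like any other custom score (a sketch, output omitted; whether browser() ever fires depends on the data at hand):

> hc(learning.test, score = "custom-score", fun = instrumented)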
Extending existing network scores
As an example, let's try to extend the BIC score for discrete Bayesian networks to handle weighted observations. The key difference from the classic definition of BIC is that we tally the weights of all the observations that display a particular combination of values instead of just counting them. Intuitively, the latter is equivalent to the former when all weights are equal to 1.
In order to do the tallying, we define the wtable() function below to use instead of table().
> wtable = function(x, y, w) {
+   # normalize the weights to sum up to the sample size.
+   w = w / sum(w) * length(w)
+   # tally the weights.
+   if (missing(y))
+     tab = tapply(w, list(x), sum)
+   else
+     tab = tapply(w, list(x, y), sum)
+   # unobserved combinations of values are tabulated as zeros.
+   tab[is.na(tab)] = 0
+   return(as.table(tab))
+ }#WTABLE
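As a quick sanity check of the claim above (a sketch, output omitted), wtable() with unit weights should reproduce the counts returned by table():

> w = rep(1, nrow(learning.test))
> all.equal(as.vector(wtable(learning.test$A, w = w)),
+           as.vector(table(learning.test$A)))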
We then replace the code that computed the contingency tables in the examples above with calls to wtable(), and we pass it a vector of weights through the args argument of the "custom-score" score.
> dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")
>
> weighted.bic = function(node, parents, data, args) {
+   n = nrow(data)
+   w = args$weights
+   if (length(parents) == 0) {
+     counts = wtable(data[, node], w = w)
+     nparams = dim(counts) - 1
+     sum(counts * log(counts / n)) - nparams * log(n) / 2
+   }#THEN
+   else {
+     counts = wtable(data[, node], configs(data[, parents, drop = FALSE]), w = w)
+     nparams = ncol(counts) * (nrow(counts) - 1)
+     sum(counts * log(prop.table(counts, 2))) - nparams * log(n) / 2
+   }#ELSE
+ }#WEIGHTED.BIC
> score(dag, learning.test, type = "custom-score", fun = weighted.bic,
+       args = list(weights = runif(nrow(learning.test))), by.node = TRUE)
        A         B         C         D         E         F
-5500.107 -3742.807 -3543.516 -3536.960 -4278.355 -3469.994
Incidentally, using random weights in this fashion is what the Bayesian bootstrap does instead of resampling.
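For instance (a sketch, output omitted), the Dirichlet-distributed weights that characterize the Bayesian bootstrap can be generated as normalized exponential random variables and passed in exactly the same way; wtable() rescales the weights internally, so no explicit normalization is needed here:

> w = rexp(nrow(learning.test))
> score(dag, learning.test, type = "custom-score", fun = weighted.bic,
+       args = list(weights = w), by.node = TRUE)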
As a further example, handling weights in Gaussian Bayesian networks is much easier because lm() handles them natively, leaving nothing for us to code from scratch.
> dag = model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]")
>
> weighted.bic = function(node, parents, data, args) {
+   if (length(parents) == 0)
+     model = as.formula(paste(node, "~ 1"))
+   else
+     model = as.formula(paste(node, "~", paste(parents, collapse = "+")))
+   - BIC(lm(model, data = data, weights = args$weights)) / 2
+ }#WEIGHTED.BIC
> score(dag, gaussian.test, type = "custom-score", fun = weighted.bic,
+       args = list(weights = runif(nrow(gaussian.test))), by.node = TRUE)
         A          B          C          D          E          F          G
 -7950.444 -13478.338  -4579.217  -2327.128 -11363.648  -7923.659 -11309.268
Implementing new network scores
Mixed-Effects models as local distributions
The default assumption in conditional Gaussian Bayesian networks is that of no pooling: we fit a separate linear regression model for each configuration of the discrete parents of each continuous node. The continuous parents of the node appear as regressors. In practice, this means that each linear regression is estimated independently of the others, using the subset of observations for which we observe a particular configuration of the discrete parents.
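To make this concrete, here is a minimal sketch of what the no-pooling default amounts to, using a made-up data frame (the variables Y, X and M are purely illustrative and are not part of any bnlearn data set):

> # hypothetical continuous node Y, continuous parent X, discrete parent M.
> d = data.frame(X = rnorm(100), M = factor(sample(c("a", "b"), 100, replace = TRUE)))
> d$Y = 2 * d$X + rnorm(100)
> # no pooling: one regression of Y on X, fitted separately for each level of M.
> lapply(split(d, d$M), function(subset) lm(Y ~ X, data = subset))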
It is well known that pooling information across models can improve their goodness of fit. We will now pursue this idea by defining a new score that computes the BIC of a mixed-effects model in which each discrete parent contributes a random intercept effect. (We could add random slope effects as well, but we do not, to keep the example readable.) This is equivalent to sharing information between the linear regressions corresponding to different configurations of the discrete parents.
Below we use the mixed-effects model implementation in the lme4 package.
> library(lme4)
> dag = model2network("[A][B][C][H][D|A:H][F|B:C][E|B:D][G|A:D:E:F]")
>
> mixed.effects.cg = function(node, parents, data, args) {
+   # discrete nodes only have discrete parents, same as in a discrete BN.
+   if (is.factor(data[, node]))
+     return(my.bic(node, parents, data, args))
+   if (length(parents) == 0) {
+     # no parents, no random intercepts.
+     model = paste(node, "~ 1")
+     - BIC(lm(model, data = data)) / 2
+   }#THEN
+   else {
+     # separate discrete and continuous parents...
+     discrete.parents = names(which(sapply(data[, parents, drop = FALSE], is.factor)))
+     continuous.parents = setdiff(parents, discrete.parents)
+     model = paste(node, "~ 1")
+     if (length(discrete.parents) > 0) {
+       # ... add discrete parents as random intercept effects...
+       random.intercepts = paste0("(1|", discrete.parents, ")", collapse = " + ")
+       model = paste(model, "+", random.intercepts)
+     }#THEN
+     if (length(continuous.parents) > 0) {
+       # ... and add continuous parents as fixed slope effects.
+       fixed.slopes = paste(continuous.parents, collapse = " + ")
+       model = paste(model, "+", fixed.slopes)
+     }#THEN
+     - BIC(lmer(model, data = data)) / 2
+   }#ELSE
+ }#MIXED.EFFECTS.CG
> score(dag, clgaussian.test, type = "custom-score", fun = mixed.effects.cg, by.node = TRUE)
         A          B          C          D          E          F          G
 -1571.783  -5239.826  -6474.286  -1606.884  -9504.812  -2957.185 -11875.395
         H
  3439.182
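For comparison (output omitted), the corresponding per-node components under the default no-pooling assumption come from the built-in score:

> score(dag, clgaussian.test, type = "bic-cg", by.node = TRUE)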
Additive Bayesian networks
Package abn (link) implements Bayesian networks in which the local distribution of each node is a generalized linear model. This makes it possible to model data containing a wide variety of variables: integers (Poisson), discrete (binomial or multinomial) and continuous (Gaussian). We can use the "custom-score" score to do something similar: model mixed binary and continuous data using Gaussian and binomial generalized linear models without putting any restriction on the arcs. (Conditional linear Gaussian Bayesian networks do not allow continuous nodes to be parents of discrete nodes.)
Defining the score to be the BIC of these models, we can implement the above as follows.
> dag = model2network("[A][B][C][H][D|A:H][F|B:C][E|B:D][G|A:D:E:F]")
>
> glm.bic = function(node, parents, data, args) {
+   if (length(parents) == 0)
+     model = as.formula(paste(node, "~ 1"))
+   else
+     model = as.formula(paste(node, "~", paste(parents, collapse = "+")))
+   if (is.numeric(data[, node]))
+     - BIC(glm(model, data = data, family = "gaussian")) / 2
+   else if (is.factor(data[, node]))
+     - BIC(glm(model, data = data, family = "binomial")) / 2
+ }#GLM.BIC
> levels(clgaussian.test$C) = c("a", "b", "a", "b")
> score(dag, clgaussian.test, type = "custom-score", fun = glm.bic, by.node = TRUE)
         A          B          C          D          E          F          G
 -1571.783  -3388.187  -3248.857  -1593.865  -9491.675  -3357.253 -11855.905
         H
  3439.182
Performing structure learning with this custom score, we get a different network than that learned using the built-in "bic-cg", which is expected since there is no restriction on the arcs that can be included. However, note that the only arc that differs between the two networks, C → F, in fact connects two discrete nodes.
> dag.glm = hc(clgaussian.test, score = "custom-score", fun = glm.bic)
> dag.cglbn = hc(clgaussian.test, score = "bic-cg")
> modelstring(dag.glm)
[1] "[A][B][C][H][D|A:H][F|B][E|B:D][G|A:D:E:F]"
> modelstring(dag.cglbn)
[1] "[A][B][C][H][D|A:H][F|B:C][E|B:D][G|A:D:E:F]"
The oracle score
The oracle score is a theoretical device to evaluate structure learning algorithms. It represents a state of perfect knowledge: arcs are rewarded only if they appear in the true network structure. Therefore, it is a score that never commits any errors, as it avoids both false positive and false negative arcs entirely.
We can implement it as follows.
> oracle.score = function(node, parents, data, args) {
+   if (length(parents) == 0)
+     return(0)
+   else if (all(parents %in% parents(args$dag, node)))
+     return(as.numeric(length(parents)))
+   else
+     return(-1)
+ }#ORACLE.SCORE
>
> true.dag = model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]")
> hc(gaussian.test, score = "custom-score", fun = oracle.score,
+    args = list(dag = true.dag))
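Since the oracle score rewards only the arcs of true.dag and penalizes every other parent set, we would expect hill-climbing to recover the true structure exactly; a quick check (a sketch, output omitted):

> learned = hc(gaussian.test, score = "custom-score", fun = oracle.score,
+              args = list(dag = true.dag))
> all.equal(learned, true.dag)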
Mon Aug 5 02:40:27 2024 with bnlearn 5.0 and R version 4.4.1 (2024-06-14).