## Using custom scores in structure learning

**bnlearn** implements a wide selection of network scores for use in structure learning, ranging from
likelihood-based to Bayesian to predictive scores (documented here and
here). However, there are always new scores appearing in the
literature. For one thing, developments in theoretical statistics/information theory may lead to new scores with
desirable properties (or at least, somewhat convincing empirical performance). For another, learning Bayesian networks
from data often requires application-specific tweaks to produce quality models: tweaks that may require modifications to
standard scores to incorporate expert knowledge beyond what can be done with whitelists, blacklists and graphical
priors.

We can do this in **bnlearn** using the `"custom"`

score. From the documentation
(here), this score has two arguments:

`fun`

: a function with four arguments,`node`

,`parents`

,`data`

and`args`

.`node`

will contain the label of the node to be scored (a character string);`parents`

will contain the labels of its parents (a character vector);`data`

will contain the complete data set, with all the variables (a data frame); and`args`

will be a list containing any additional arguments to the score.`args`

: a list containing the optional arguments to`fun`

, for tuning custom score functions.

In practice, this leads to code like the following:

> network.score = function(node, parents, data, args) { + # compute a meaningful measure of goodness-of-fit and return it. + return(42) + }#NETWORK.SCORE > > hc(learning.test, score = "custom", fun = network.score, args = list())

Considering that we only have access to a `node`

and its `parents`

, the `"custom"`

is mostly limited to decomposable scores. However, it has access to the complete data (with all the variables) and it
can take any optional arguments, which makes it quite flexible.

### Replicating an existing score

The `"custom"`

score can be used to reimplement any other score in **bnlearn**, which comes
in handy when studying their behaviour or troubleshooting issues in structure learning. Consider, for instance, the BIC
score assigned to a Gaussian Bayesian network. The local distributions of the nodes of a Gaussian Bayesian network are
linear models, which we can fit with `lm()`

. We can then compute the BIC of the models estimated by
`lm()`

with the standard `BIC()`

method from package **stats** to get the
components of the network score associated with the individual nodes.

> dag = model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]") > > my.bic = function(node, parents, data, args) { + if (length(parents) == 0) + model = paste(node, "~ 1") + else + model = paste(node, "~", paste(parents, collapse = "+")) + - BIC(lm(model, data = data)) / 2 + }#MY.BIC > score(dag, gaussian.test, type = "custom", fun = my.bic, by.node = TRUE)

A B C D E F G -7123.829 -12652.302 -3733.466 -1542.920 -10543.031 -7098.443 -10527.354

> score(dag, gaussian.test, type = "bic-g", by.node = TRUE)

A B C D E F G -7123.829 -12652.302 -3733.467 -1542.920 -10543.031 -7098.444 -10527.354

As we can see the score values match for all the nodes, modulo some small floating point rounding errors.

Similarly, we can compute the BIC network score for a discrete Bayesian network. This involves:

- constructing the contingency table of
`data[, node]`

against the configurations of`data[, parents]`

(assuming that`node`

has any parents); - computing the (conditional) probability table associated with the node from the contingency table;
- counting the numbers of parameters;
- computing the log-likelihood from the (conditional) probability table and subtracting the BIC penalty term.

Again, we can match the output of the built-in BIC score.

> dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]") > > my.bic = function(node, parents, data, args) { + n = nrow(data) + if (length(parents) == 0) { + counts = table(data[, node]) + nparams = dim(counts) - 1 + sum(counts * log(counts / n)) - nparams * log(n) / 2 + }#THEN + else { + counts = table(data[, node], configs(data[, parents, drop = FALSE])) + nparams = ncol(counts) * (nrow(counts) - 1) + sum(counts * log(prop.table(counts, 2))) - nparams * log(n) / 2 + }#ELSE + }#MY.BIC > score(dag, learning.test, type = "custom", fun = my.bic, by.node = TRUE)

A B C D E F -5501.568 -3686.922 -3501.199 -3544.561 -4302.522 -3469.962

> score(dag, learning.test, type = "bic", by.node = TRUE)

A B C D E F -5501.568 -3686.922 -3501.199 -3544.561 -4302.522 -3469.962

### Instrumenting network scores to debug them

There are no limitations to what we can do in the function we pass to the `"custom"`

score. Hence we can
call the `score()`

function from **bnlearn** from inside the function we pass to the
`"custom"`

score via the `fun`

argument, using `by.node = TRUE`

to make
`score()`

return just the score component from target `node`

.

> instrumented = function(node, parents, data, args) { + dummy.dag = empty.graph(c(node, parents)) + for (par in parents) + dummy.dag = set.arc(dummy.dag, from = par, to = node) + score(dummy.dag, data[, c(node, parents), drop = FALSE], by.node = TRUE)[node] + }#INSTRUMENTED > > score(dag, learning.test, type = "custom", fun = instrumented, by.node = TRUE)

A B C D E F -5501.568 -3686.922 -3501.199 -3544.561 -4302.522 -3469.962

Obviously, this approach is much slower than using the built-in scores directly. However, it makes it possible to
instrument network scoring and make it either print (or log into a file) any quantities of interest or give us an
interactive shell using `browser()`

. We can achieve a much finer-grained control on what to log and when to
stop normal execution than just using `debug()`

.

For instance, we may want to investigate the behaviour of the BDeu score when there are unobserved parent configurations (spoiler: not good). However, we do not want to stop structure learning every time a node is scored, just when 1) the node has parents and 2) at least one configuration of the parents is not observed in the data. We can do it as follows.

> instrumented = function(node, parents, data, args) { + # stop if there are unobserved parents configurations. + local.data = data[, c(node, parents), drop = FALSE] + if (length(parents) > 0) { + counts = table(data[, node], configs(data[, parents, drop = FALSE])) + if (any(colSums(counts) == 0)) + browser() + }#THEN + dummy.dag = empty.graph(c(node, parents)) + for (par in parents) + dummy.dag = set.arc(dummy.dag, from = par, to = node) + score(dummy.dag, local.data, type = "bde", by.node = TRUE)[node] + }#INSTRUMENTED

### Extending existing network scores

As an example, let's try to extend the BIC score for discrete Bayesian networks to handle weighted observations. The
key difference from the classic definition of BIC is that we tally the weights of all the observations that display a
particular combination of values instead of just counting them. Intuitively, the latter is equivalent to the former when
all weights are equal to `1`

.

In order to do the tallying, we define the `wtable()`

function below to use instead of
`table()`

.

> wtable = function(x, y, w) { + # normalize the weights to sum up to the sample size. + w = w / sum(w) * length(w) + # tally the weights. + if (missing(y)) + tab = tapply(w, list(x), sum) + else + tab = tapply(w, list(x, y), sum) + # unobserved combinations of values are tabulated as zeros. + tab[is.na(tab)] = 0 + return(as.table(tab)) + }#WTABLE

We then replace the code that computed the contingency tables in the examples above with calls to
`wtable()`

, and we pass it a vector of weights through the `args`

argument of the
`"custom"`

score.

> dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]") > > weighted.bic = function(node, parents, data, args) { + n = nrow(data) + w = args$weights + if (length(parents) == 0) { + counts = wtable(data[, node], w = w) + nparams = dim(counts) - 1 + sum(counts * log(counts / n)) - nparams * log(n) / 2 + }#THEN + else { + counts = wtable(data[, node], configs(data[, parents, drop = FALSE]), w = w) + nparams = ncol(counts) * (nrow(counts) - 1) + sum(counts * log(prop.table(counts, 2))) - nparams * log(n) / 2 + }#ELSE + }#WEIGHTED.BIC > score(dag, learning.test, type = "custom", fun = weighted.bic, + args = list(weights = runif(nrow(learning.test))), by.node = TRUE)

A B C D E F -5501.570 -3643.557 -3504.093 -3518.797 -4296.884 -3469.656

Incidentally, using random weights in this fashion is what Bayesian bootstrap does instead of resampling.

As a further example, handling weights in Gaussian Bayesian networks is much easier because `lm()`

handles
them natively, leaving nothing for us to code from scratch.

> dag = model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]") > > weighted.bic = function(node, parents, data, args) { + if (length(parents) == 0) + model = as.formula(paste(node, "~ 1")) + else + model = as.formula(paste(node, "~", paste(parents, collapse = "+"))) + - BIC(lm(model, data = data, weights = args$weights)) / 2 + }#WEIGHTED.BIC > score(dag, gaussian.test, type = "custom", fun = weighted.bic, + args = list(weights = runif(nrow(gaussian.test))), by.node = TRUE)

A B C D E F G -7921.868 -13498.380 -4594.342 -2350.159 -11319.088 -7887.322 -11332.969

### Implementing new network scores

#### Mixed-Effects models as local distributions

The default assumption in conditional Gaussian Bayesian networks is that of *no pooling*: we fit a separate
linear regression model for each configuration of the discrete parents of each continuous node. The continuous parents
of the node appear as regressors. In practice, this means each linear regression is estimated idependently from the
others, using the subset of observations for which we observe a particular configuration of the discrete parents.

It is well known that *pooling* information across models can improve their goodness of fit. We will now
pursue this idea by defining a new score that computes the BIC of a mixed-effects model in which discrete parents
contribute a random intercept effect. (We could add random slope effects as well, but we do not to keep the example
readable.) This is equivalent to sharing information between linear regressions corresponding to different
configurations of the discrete parents.

Below we use the mixed-effect models implementation in the **lme4** package.

> library(lme4)

> dag = model2network("[A][B][C][H][D|A:H][F|B:C][E|B:D][G|A:D:E:F]") > > mixed.effects.cg = function(node, parents, data, args) { + # discrete nodes only have discrete parents, same as in a discrete BN. + if (is.factor(data[, node])) + return(my.bic(node, parents, data, args)) + if (length(parents) == 0) { + # no parents, no random intercepts. + model = paste(node, "~ 1") + - BIC(lm(model, data = data)) / 2 + }#THEN + else { + # separate discrete and continuous parents... + discrete.parents = names(which(sapply(data[, parents, drop = FALSE], is.factor))) + continuous.parents = setdiff(parents, discrete.parents) + model = paste(node, "~ 1") + if (length(discrete.parents) > 0) { + # ... add discrete parents as random intercept effects... + random.intercepts = paste0("(1|", discrete.parents, ")", collapse = " + ") + model = paste(model, "+", random.intercepts) + }#THEN + if (length(continuous.parents) > 0) { + # ... and add continuous parents as fixed slope effects. + fixed.slopes = paste(continuous.parents, collapse = " + ") + model = paste(model, "+", fixed.slopes) + }#THEN + - BIC(lmer(model, data = data)) / 2 + }#ELSE + }#MIXED.EFFECTS.CG > score(dag, clgaussian.test, type = "custom", fun = mixed.effects.cg, by.node = TRUE)

A B C D E F G -1571.783 -5239.826 -6474.286 -1606.884 -9504.812 -2957.185 -11875.395 H 3439.182

#### Additive Bayesian networks

Package **abn** (link) implements Bayesian
networks in which the local distribution of each node is a generalized linear model. This makes it possible to model
a data containing wide variety of variables: integers (Poisson), discrete (binomial or multinomial) and continuous
(Gaussian). We can use the `"custom"`

score to do something similar: model mixed binary and continuous data
using Gaussian and binomial generalized linear models without putting any restriction on the arcs. (Conditional linear
Gaussian Bayesian networks do not allow continuous nodes to be parents of discrete nodes.)

Defining the score to be the BIC of these models, we can implement the above as follows.

> dag = model2network("[A][B][C][H][D|A:H][F|B:C][E|B:D][G|A:D:E:F]") > > glm.bic = function(node, parents, data, args) { + if (length(parents) == 0) + model = as.formula(paste(node, "~ 1")) + else + model = as.formula(paste(node, "~", paste(parents, collapse = "+"))) + if (is.numeric(data[, node])) + - BIC(glm(model, data = data, family = "gaussian")) / 2 + else if (is.factor(data[, node])) + - BIC(glm(model, data = data, family = "binomial")) / 2 + }#GLM.BIC > levels(clgaussian.test$C) = c("a", "b", "a", "b") > score(dag, clgaussian.test, type = "custom", fun = glm.bic, by.node = TRUE)

A B C D E F G -1571.783 -3388.187 -3248.857 -1593.865 -9491.675 -3357.253 -11855.905 H 3439.182

Performing structure learning with this custom score, we get a different network that that learned using the built-in `"bic-cg"`

: which is expected since there is no restriction on the arcs that can be included.
However, note that the only additional arc that is included, `C`

→ `F`

, in fact connects two
discrete nodes.

> dag.glm = hc(clgaussian.test, score = "custom", fun = glm.bic) > dag.cglbn = hc(clgaussian.test, score = "bic-cg") > modelstring(dag.glm)

[1] "[A][B][C][H][D|A:H][F|B][E|B:D][G|A:D:E:F]"

> modelstring(dag.cglbn)

[1] "[A][B][C][H][D|A:H][F|B:C][E|B:D][G|A:D:E:F]"

`Tue Nov 8 16:03:03 2022`

with **bnlearn**

`4.9-20221107`

and `R version 4.2.2 (2022-10-31)`

.