bnlearn - Using custom scores in structure learning

Using custom scores in structure learning

bnlearn implements a wide selection of network scores for use in structure learning, ranging from likelihood-based to Bayesian to predictive scores (documented here and here). However, there are always new scores appearing in the literature. For one thing, developments in theoretical statistics/information theory may lead to new scores with desirable properties (or at least, somewhat convincing empirical performance). For another, learning Bayesian networks from data often requires application-specific tweaks to produce quality models: tweaks that may require modifications to standard scores to incorporate expert knowledge beyond what can be done with whitelists, blacklists and graphical priors.

We can do this in bnlearn using the "custom-score" score. From the documentation (here), this score has two arguments:

fun: a function with four arguments, node, parents, data and args. node will contain the label of the node to be scored (a character string); parents will contain the labels of its parents (a character vector); data will contain the complete data set, with all the variables (a data frame); and args will be a list containing any additional arguments to the score.
args: a list containing the optional arguments to fun, for tuning custom score functions.

In practice, this leads to code like the following:

> network.score = function(node, parents, data, args) {

+   # compute a meaningful measure of goodness-of-fit and return it.
+   return(42)

+ }#NETWORK.SCORE
> 
> hc(learning.test, score = "custom-score", fun = network.score, args = list())

Considering that we only have access to a node and its parents, the "custom-score" is mostly limited to decomposable scores. However, it has access to the complete data (with all the variables) and it can take any optional arguments, which makes it quite flexible.

Replicating an existing score

The "custom-score" score can be used to reimplement any other score in bnlearn, which comes in handy when studying their behaviour or troubleshooting issues in structure learning. Consider, for instance, the BIC score assigned to a Gaussian Bayesian network. The local distributions of the nodes of a Gaussian Bayesian network are linear models, which we can fit with lm(). We can then compute the BIC of the models estimated by lm() with the standard BIC() method from package stats to get the components of the network score associated with the individual nodes.

> dag = model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]")
> 
> my.bic = function(node, parents, data, args) {

+   if (length(parents) == 0)
+     model = paste(node, "~ 1")
+   else
+     model = paste(node, "~", paste(parents, collapse = "+"))

+   - BIC(lm(model, data = data)) / 2

+ }#MY.BIC
> score(dag, gaussian.test, type = "custom-score", fun = my.bic, by.node = TRUE)

         A          B          C          D          E          F          G 
 -7123.829 -12652.302  -3733.466  -1542.920 -10543.031  -7098.443 -10527.354

> score(dag, gaussian.test, type = "bic-g", by.node = TRUE)

         A          B          C          D          E          F          G 
 -7123.829 -12652.302  -3733.467  -1542.920 -10543.031  -7098.444 -10527.354

As we can see, the score values match for all the nodes, modulo some small floating point rounding errors.

Similarly, we can compute the BIC network score for a discrete Bayesian network. This involves:

constructing the contingency table of data[, node] against the configurations of data[, parents] (assuming that node has any parents);
computing the (conditional) probability table associated with the node from the contingency table;
counting the numbers of parameters;
computing the log-likelihood from the (conditional) probability table and subtracting the BIC penalty term.

Again, we can match the output of the built-in BIC score.

> dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")
> 
> my.bic = function(node, parents, data, args) {

+   n = nrow(data)

+   if (length(parents) == 0) {

+     counts = table(data[, node])
+     nparams = dim(counts) - 1
+     sum(counts * log(counts / n)) - nparams * log(n) / 2

+   }#THEN
+   else {

+     counts = table(data[, node], configs(data[, parents, drop = FALSE]))
+     nparams = ncol(counts) * (nrow(counts) - 1)
+     sum(counts * log(prop.table(counts, 2))) - nparams * log(n) / 2

+   }#ELSE

+ }#MY.BIC
> score(dag, learning.test, type = "custom-score", fun = my.bic, by.node = TRUE)

        A         B         C         D         E         F 
-5501.568 -3686.922 -3501.199 -3544.561 -4302.522 -3469.962

> score(dag, learning.test, type = "bic", by.node = TRUE)

        A         B         C         D         E         F 
-5501.568 -3686.922 -3501.199 -3544.561 -4302.522 -3469.962

Instrumenting network scores to debug them

There are no limitations to what we can do in the function we pass to the "custom-score" score. Hence we can call the score() function from bnlearn from inside the function we pass to the "custom-score" score via the fun argument, using by.node = TRUE to make score() return just the score component from target node.

> instrumented = function(node, parents, data, args) {

+   dummy.dag = empty.graph(c(node, parents))
+   for (par in parents)
+     dummy.dag = set.arc(dummy.dag, from = par, to = node)
+   score(dummy.dag, data[, c(node, parents), drop = FALSE], by.node = TRUE)[node]

+ }#INSTRUMENTED
> 
> score(dag, learning.test, type = "custom-score", fun = instrumented, by.node = TRUE)

        A         B         C         D         E         F 
-5501.568 -3686.922 -3501.199 -3544.561 -4302.522 -3469.962

Obviously, this approach is much slower than using the built-in scores directly. However, it makes it possible to instrument network scoring and make it either print (or log into a file) any quantities of interest or give us an interactive shell using browser(). We can achieve a much finer-grained control on what to log and when to stop normal execution than just using debug().

For instance, we may want to investigate the behaviour of the BDeu score when there are unobserved parent configurations (spoiler: not good). However, we do not want to stop structure learning every time a node is scored, just when 1) the node has parents and 2) at least one configuration of the parents is not observed in the data. We can do it as follows.

> instrumented = function(node, parents, data, args) {

+   # stop if there are unobserved parents configurations.
+   local.data = data[, c(node, parents), drop = FALSE]

+   if (length(parents) > 0) {

+     counts = table(data[, node], configs(data[, parents, drop = FALSE]))

+     if (any(colSums(counts) == 0))
+       browser()

+   }#THEN

+   dummy.dag = empty.graph(c(node, parents))
+   for (par in parents)
+     dummy.dag = set.arc(dummy.dag, from = par, to = node)
+   score(dummy.dag, local.data, type = "bde", by.node = TRUE)[node]

+ }#INSTRUMENTED

Extending existing network scores

As an example, let's try to extend the BIC score for discrete Bayesian networks to handle weighted observations. The key difference from the classic definition of BIC is that we tally the weights of all the observations that display a particular combination of values instead of just counting them. Intuitively, the latter is equivalent to the former when all weights are equal to 1.

In order to do the tallying, we define the wtable() function below to use instead of table().

> wtable = function(x, y, w) {

+   # normalize the weights to sum up to the sample size.
+   w = w / sum(w) * length(w)
+   # tally the weights.
+   if (missing(y))
+     tab = tapply(w, list(x), sum)
+   else
+     tab = tapply(w, list(x, y), sum)
+   # unobserved combinations of values are tabulated as zeros.
+   tab[is.na(tab)] = 0

+   return(as.table(tab))

+ }#WTABLE

We then replace the code that computed the contingency tables in the examples above with calls to wtable(), and we pass it a vector of weights through the args argument of the "custom-score" score.

> dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")
> 
> weighted.bic = function(node, parents, data, args) {

+   n = nrow(data)
+   w = args$weights

+   if (length(parents) == 0) {

+     counts = wtable(data[, node], w = w)
+     nparams = dim(counts) - 1
+     sum(counts * log(counts / n)) - nparams * log(n) / 2

+   }#THEN
+   else {

+     counts = wtable(data[, node], configs(data[, parents, drop = FALSE]), w = w)
+     nparams = ncol(counts) * (nrow(counts) - 1)
+     sum(counts * log(prop.table(counts, 2))) - nparams * log(n) / 2

+   }#ELSE

+ }#WEIGHTED.BIC
> score(dag, learning.test, type = "custom-score", fun = weighted.bic,
+   args = list(weights = runif(nrow(learning.test))), by.node = TRUE)

        A         B         C         D         E         F 
-5500.107 -3742.807 -3543.516 -3536.960 -4278.355 -3469.994

Incidentally, using random weights in this fashion is what Bayesian bootstrap does instead of resampling.

As a further example, handling weights in Gaussian Bayesian networks is much easier because lm() handles them natively, leaving nothing for us to code from scratch.

> dag = model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]")
> 
> weighted.bic = function(node, parents, data, args) {

+   if (length(parents) == 0)
+     model = as.formula(paste(node, "~ 1"))
+   else
+     model = as.formula(paste(node, "~", paste(parents, collapse = "+")))

+   - BIC(lm(model, data = data, weights = args$weights)) / 2

+ }#WEIGHTED.BIC
> score(dag, gaussian.test, type = "custom-score", fun = weighted.bic,
+   args = list(weights = runif(nrow(gaussian.test))), by.node = TRUE)

         A          B          C          D          E          F          G 
 -7950.444 -13478.338  -4579.217  -2327.128 -11363.648  -7923.659 -11309.268

Implementing new network scores

Mixed-Effects models as local distributions

The default assumption in conditional Gaussian Bayesian networks is that of no pooling: we fit a separate linear regression model for each configuration of the discrete parents of each continuous node. The continuous parents of the node appear as regressors. In practice, this means each linear regression is estimated idependently from the others, using the subset of observations for which we observe a particular configuration of the discrete parents.

It is well known that pooling information across models can improve their goodness of fit. We will now pursue this idea by defining a new score that computes the BIC of a mixed-effects model in which discrete parents contribute a random intercept effect. (We could add random slope effects as well, but we do not to keep the example readable.) This is equivalent to sharing information between linear regressions corresponding to different configurations of the discrete parents.

Below we use the mixed-effect models implementation in the lme4 package.

> library(lme4)

Loading required package: Matrix

> dag = model2network("[A][B][C][H][D|A:H][F|B:C][E|B:D][G|A:D:E:F]")
> 
> mixed.effects.cg = function(node, parents, data, args) {

+   # discrete nodes only have discrete parents, same as in a discrete BN.
+   if (is.factor(data[, node]))
+     return(my.bic(node, parents, data, args))

+   if (length(parents) == 0) {

+     # no parents, no random intercepts.
+     model = paste(node, "~ 1")
+     - BIC(lm(model, data = data)) / 2

+   }#THEN
+   else {

+     # separate discrete and continuous parents...
+     discrete.parents = names(which(sapply(data[, parents, drop = FALSE], is.factor)))
+     continuous.parents = setdiff(parents, discrete.parents)

+     model =  paste(node, "~ 1")
+     if (length(discrete.parents) > 0) {

+       # ... add discrete parents as random intercept effects...
+       random.intercepts = paste0("(1|", discrete.parents, ")", collapse = " + ")
+       model = paste(model, "+", random.intercepts)

+     }#THEN
+     if (length(continuous.parents) > 0) {

+       # ... and add continuous parents as fixed slope effects.
+       fixed.slopes = paste(continuous.parents, collapse = " + ")
+       model = paste(model, "+", fixed.slopes)

+     }#THEN

+     - BIC(lmer(model, data = data)) / 2

+   }#ELSE

+ }#MIXED.EFFECTS.CG
> score(dag, clgaussian.test, type = "custom-score", fun = mixed.effects.cg, by.node = TRUE)

         A          B          C          D          E          F          G 
 -1571.783  -5239.826  -6474.286  -1606.884  -9504.812  -2957.185 -11875.395 
         H 
  3439.182

Additive Bayesian networks

Package abn (link) implements Bayesian networks in which the local distribution of each node is a generalized linear model. This makes it possible to model a data containing wide variety of variables: integers (Poisson), discrete (binomial or multinomial) and continuous (Gaussian). We can use the "custom-score" score to do something similar: model mixed binary and continuous data using Gaussian and binomial generalized linear models without putting any restriction on the arcs. (Conditional linear Gaussian Bayesian networks do not allow continuous nodes to be parents of discrete nodes.)

Defining the score to be the BIC of these models, we can implement the above as follows.

> dag = model2network("[A][B][C][H][D|A:H][F|B:C][E|B:D][G|A:D:E:F]")
> 
> glm.bic = function(node, parents, data, args) {

+   if (length(parents) == 0)
+     model = as.formula(paste(node, "~ 1"))
+   else
+     model = as.formula(paste(node, "~", paste(parents, collapse = "+")))

+   if (is.numeric(data[, node]))
+     - BIC(glm(model, data = data, family = "gaussian")) / 2
+   else if (is.factor(data[, node]))
+     - BIC(glm(model, data = data, family = "binomial")) / 2

+ }#GLM.BIC
> levels(clgaussian.test$C) = c("a", "b", "a", "b")
> score(dag, clgaussian.test, type = "custom-score", fun = glm.bic, by.node = TRUE)

         A          B          C          D          E          F          G 
 -1571.783  -3388.187  -3248.857  -1593.865  -9491.675  -3357.253 -11855.905 
         H 
  3439.182

Performing structure learning with this custom score, we get a different network that that learned using the built-in "bic-cg": which is expected since there is no restriction on the arcs that can be included. However, note that the only additional arc that is included, C → F, in fact connects two discrete nodes.

> dag.glm = hc(clgaussian.test, score = "custom-score", fun = glm.bic)
> dag.cglbn = hc(clgaussian.test, score = "bic-cg")
> modelstring(dag.glm)

[1] "[A][B][C][H][D|A:H][F|B][E|B:D][G|A:D:E:F]"

> modelstring(dag.cglbn)

[1] "[A][B][C][H][D|A:H][F|B:C][E|B:D][G|A:D:E:F]"

The oracle score

The oracle score is a theoretical device to evaluate structure learning algorithms. It represents a state of perfect knowledge: it learns new arcs only if they appear in the true network structure. Therefore, it is a score that never commits any errors by completely avoiding false positive and false negative inclusions.

We can implement it as follows.

> oracle.score = function(node, parents, data, args) {

+   if (length(parents) == 0)
+     return(0)
+   else if (all(parents %in% parents(args$dag, node)))
+     return(as.numeric(length(parents)))
+   else
+     return(-1)

+ }#ORACLE.SCORE
> 
> true.dag = model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]")
> hc(gaussian.test, score = "custom-score", fun = oracle.score,
+     args = list(dag = true.dag))

Last updated on Mon Aug 5 02:40:27 2024 with bnlearn 5.0 and R version 4.4.1 (2024-06-14).