Bayesian network classifiers

Bayesian networks are general-purpose generative models that can be learned independently of the task they will be used for. They are typically structured so that the arcs encode the mechanics of the phenomenon they model in a realistic way, which is why they can double as causal models. Bayesian network classifiers are the opposite: their structure is chosen for a single task (classification) and to optimize a single criterion (predictive accuracy). As a result, the arcs will not correspond to the causal structure of the phenomenon in any meaningful way.

Many such models have been proposed in the literature. bnlearn only implements two classic ones: the naive Bayes and the tree-augmented naive Bayes (TAN) classifiers. Furthermore, both models are limited to discrete variables, as in the respective seminal papers. Their key arguments are:

  1. the data, which must contain both the class and the explanatory variables;
  2. training: the name of the variable that encodes the class;
  3. explanatory: the names of the explanatory variables that act as predictors.

Structure learning

The naive Bayes classifier has a completely fixed, star-shaped structure: the training variable is connected to each predictor by an arc, and there are no arcs between predictors. Therefore, there is nothing to learn from the data other than the variable names. If all the other variables act as predictors, we only need to specify the training variable.

> library(bnlearn)
> training.set = learning.test[1:4000, ]
> validation.set = learning.test[4001:5000, ]
> nb = naive.bayes(training.set, training = "F")
> nb

  Bayesian network Classifier

  model:
   [F][A|F][B|F][C|F][D|F][E|F] 
  nodes:                                 6 
  arcs:                                  5 
    undirected arcs:                     0 
    directed arcs:                       5 
  average markov blanket size:           1.67 
  average neighbourhood size:            1.67 
  average branching factor:              0.83 

  learning algorithm:                    Naive Bayes Classifier 
  training node:                         F 
  tests used in the learning procedure:  0
> graphviz.plot(nb)
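
If only some of the other variables should act as predictors, we can list them in the explanatory argument (the subset below is chosen arbitrarily for illustration):

> nb3 = naive.bayes(training.set, training = "F",
+                   explanatory = c("A", "B", "C"))
> modelstring(nb3)

The model string should then read [F][A|F][B|F][C|F] (up to the ordering of the nodes): the same star-shaped structure, just over fewer predictors.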

TAN builds on the naive Bayes classifier by learning a tree structure to connect the predictors (the Chow-Liu algorithm). The tree is learned from the data as the maximum-weight spanning tree over the predictors: the weights are estimated with the pairwise mutual information between each arc's endpoints.
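
As a back-of-the-envelope illustration of those weights (only a sketch: bnlearn computes them internally, and this naive formula produces NaN if the contingency table contains empty cells), the empirical mutual information between two predictors, say A and B, is:

> # joint and independence-assuming distributions of (A, B) in the data.
> joint = prop.table(table(training.set$A, training.set$B))
> indep = outer(rowSums(joint), colSums(joint))
> sum(joint * log(joint / indep))

The natural logarithm is used here; the base only rescales all the weights, so it does not change which spanning tree has maximum weight.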

> tan = tree.bayes(training.set, training = "F")
> tan

  Bayesian network Classifier

  model:
   [F][A|F][B|F:A][D|F:A][C|F:D][E|F:B] 
  nodes:                                 6 
  arcs:                                  9 
    undirected arcs:                     0 
    directed arcs:                       9 
  average markov blanket size:           3.00 
  average neighbourhood size:            3.00 
  average branching factor:              1.50 

  learning algorithm:                    TAN Bayes Classifier 
  mutual information estimator:          Maximum Likelihood (disc.) 
  training node:                         F 
  tests used in the learning procedure:  10
> graphviz.plot(tan)

We can control which arcs are included in the tree using a whitelist or a blacklist. If either makes it impossible to build a tree spanning the predictors, tree.bayes() will fail with an error. Note that arc directions in the maximum-weight spanning tree are completely arbitrary, so it is sufficient to whitelist arcs in a single direction, but we need to blacklist arcs in both directions.

> wl = data.frame(from = c("B", "B"), to = c("C", "D"))
> bl = data.frame(from = c("A", "B"), to = c("B", "A"))
> tan = tree.bayes(training.set, training = "F", whitelist = wl, blacklist = bl)
> graphviz.plot(tan)
> wl = data.frame(from = c("B", "C", "D"), to = c("C", "D", "B"))
> tree.bayes(training.set, training = "F", whitelist = wl)
## Error: this whitelist does not allow an acyclic graph.
> bl = data.frame(from = c(rep("A", 4), "B", "C", "D", "E"),
+                 to = c("B", "C", "D", "E", rep("A", 4)))
> tree.bayes(training.set, training = "F", blacklist = bl)
## Error in chow.liu.backend(x = features.data, nodes = explanatory, estimator = mi, : learned 3 arcs instead of 4, this is not a tree spanning all the nodes.

The direction of the arcs that connect the predictors depends on which of them is chosen as the root of the tree: all arcs are directed away from the root, which we can set with the root argument.

> par(mfrow = c(1, 2))
> tanE = tree.bayes(training.set, training = "F", root = "E")
> tanC = tree.bayes(training.set, training = "F", root = "C")
> graphviz.compare(tanE, tanC, main = c("Root Node: E", "Root Node: C"))
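
The choice of the root only affects the arc directions, not which arcs are present: by construction, the same spanning tree is learned and then oriented away from the root. As a quick check (output omitted), the two models should have identical skeletons:

> all.equal(skeleton(tanE), skeleton(tanC))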

Parameter learning

There are no parameter learning methods that are specific to classifiers in bnlearn: the general-purpose methods for discrete Bayesian networks are suitable for both naive Bayes and TAN models.

The Bayesian networks returned by naive.bayes() and tree.bayes() are objects of class bn, but they also have additional classes bn.naive and bn.tan that identify them as Bayesian network classifiers. Both additional classes are preserved by bn.fit().

> class(nb)
[1] "bn.naive" "bn"
> class(tan)
[1] "bn.tan" "bn"
> fitted.nb = bn.fit(nb, training.set, method = "bayes")
> fitted.tan = bn.fit(tan, training.set, method = "bayes")
> class(fitted.nb)
[1] "bn.naive"    "bn.fit"      "bn.fit.dnet"
> class(fitted.tan)
[1] "bn.tan"      "bn.fit"      "bn.fit.dnet"

Prediction

bnlearn uses the additional classes bn.naive and bn.tan to provide predict() methods that compute the exact posterior probabilities of the class variable in closed form. Like the general predict() method for bn.fit objects, they have a prob argument to attach the prediction probabilities to the predicted values.

> predict(fitted.nb, data = validation.set[1:10, ], prob = TRUE)
 [1] a a a b a b a a b a
attr(,"prob")
       [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
a 0.7339231 0.5976894 0.6983055 0.1856294 0.7038234 0.1896427 0.5790076
b 0.2660769 0.4023106 0.3016945 0.8143706 0.2961766 0.8103573 0.4209924
       [,8]      [,9]     [,10]
a 0.5994686 0.1835357 0.5790076
b 0.4005314 0.8164643 0.4209924
Levels: a b
> predict(fitted.tan, data = validation.set[1:10, ], prob = TRUE)
 [1] a a a b a b a b b a
attr(,"prob")
       [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
a 0.7324098 0.6244423 0.7610035 0.1544624 0.7486086 0.1860744 0.6477813
b 0.2675902 0.3755577 0.2389965 0.8455376 0.2513914 0.8139256 0.3522187
       [,8]      [,9]     [,10]
a 0.3416894 0.1732989 0.6477813
b 0.6583106 0.8267011 0.3522187
Levels: a b
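
For the naive Bayes model, the closed form is just Bayes' theorem under the assumption that the predictors are independent given the class: P(F = f | A, ..., E) is proportional to the prior probability of F = f times the product of P(Xi = xi | F = f) over the predictors. As a sketch, using only standard bn.fit accessors, we can reproduce the first column of the probabilities above by hand:

> obs = validation.set[1, ]
> post = c(a = 0.5, b = 0.5)  # the default uniform prior over the classes
> for (v in c("A", "B", "C", "D", "E"))
+   post = post * coef(fitted.nb[[v]])[as.character(obs[, v]), ]
> post / sum(post)

The result should match the first column of attr(,"prob") in the output of predict(fitted.nb, ...) above.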

Furthermore, both methods have a prior argument that can be used to specify the prior distribution over the classes. The default is a uniform prior.

> predict(fitted.nb, data = validation.set[1:10, ], prior = c(a = 0.4, b = 0.6))
 [1] a b a b a b b b b b
Levels: a b
> predict(fitted.tan, data = validation.set[1:10, ], prior = c(a = 0.75, b = 0.25))
 [1] a a a b a b a a b a
Levels: a b

Neither predict() method has any other optional arguments.
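
Finally, since validation.set was held out when learning the models, we can sketch a simple estimate of predictive accuracy by comparing the predicted classes with the observed ones (output omitted, as it depends on the data):

> pred = predict(fitted.tan, data = validation.set)
> table(predicted = pred, observed = validation.set$F)
> mean(pred == validation.set$F)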

Last updated on Sat Feb 17 23:51:54 2024 with bnlearn 5.0-20240208 and R version 4.3.2 (2023-10-31).