## Bayesian network classifiers

Bayesian networks are general-purpose generative models that can be learned independently of the task they will be used for. They are typically structured to encode as arcs the mechanics of the phenomenon they model in a realistic way, which is why they can double as causal models. Bayesian network classifiers are the opposite: their structure is chosen for a single task (classification) and to optimize a single criterion (predictive accuracy). As a result, the arcs will not correspond to the causal structure of the phenomenon in a meaningful way.

Many such models have been proposed in the literature. bnlearn only implements two classic ones: the naive Bayes and the tree-augmented naive Bayes (TAN) classifiers. Furthermore, both models are limited to discrete variables as in the respective seminal papers. Their key arguments (documented here) are:

1. the data, which must contain both the class and the explanatory variables;
2. `training`: the name of the variable that encodes the class;
3. `explanatory`: the names of the explanatory variables that act as predictors.
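
To illustrate these arguments, here is a minimal sketch (using bnlearn's bundled `learning.test` data set) that restricts a naive Bayes classifier to a subset of predictors; omitting `explanatory` uses every variable other than the class:

```r
library(bnlearn)
data(learning.test)

# Restrict the classifier to three predictors via the explanatory
# argument; by default all other variables act as predictors.
nb3 = naive.bayes(learning.test, training = "F",
                  explanatory = c("A", "B", "C"))
modelstring(nb3)
```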

### Structure learning

The naive Bayes classifier has a completely fixed, star-shaped structure: the training variable is connected to each predictor by an arc, and there are no arcs between predictors. Therefore, there is nothing to learn from the data other than the variable names. We only really need to specify the `training` variable if we are using all other variables as predictors.

```
> library(bnlearn)
> training.set = learning.test[1:4000, ]
> validation.set = learning.test[4001:5000, ]
> nb = naive.bayes(training.set, training = "F")
> nb
```
```
  Bayesian network Classifier

  model:
   [F][A|F][B|F][C|F][D|F][E|F]
  nodes:                                 6
  arcs:                                  5
    undirected arcs:                     0
    directed arcs:                       5
  average markov blanket size:           1.67
  average neighbourhood size:            1.67
  average branching factor:              0.83

  learning algorithm:                    Naive Bayes Classifier
  training node:                         F
  tests used in the learning procedure:  0
```
```
> graphviz.plot(nb)
```

TAN builds on the naive Bayes classifier by learning a tree structure to connect the predictors. The tree is learned from the data as the maximum-weight spanning tree over the predictors: the weights are estimated using the pairwise mutual information between each arc's endpoints.
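
As an aside, these pairwise mutual information weights can be computed by hand from empirical frequencies; a small sketch for the predictors `A` and `B` in `learning.test`:

```r
library(bnlearn)
data(learning.test)

# Empirical mutual information between two discrete variables: the sum
# over the joint frequencies of log(joint / product of the marginals).
joint = table(learning.test$A, learning.test$B) / nrow(learning.test)
indep = outer(rowSums(joint), colSums(joint))
mi = sum(joint * log(joint / indep))
mi
```

`tree.bayes()` estimates one such weight for every pair of predictors and keeps the arcs that maximise the total weight of the spanning tree.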

```
> tan = tree.bayes(training.set, training = "F")
> tan
```
```
  Bayesian network Classifier

  model:
   [F][A|F][B|F:A][D|F:A][C|F:D][E|F:B]
  nodes:                                 6
  arcs:                                  9
    undirected arcs:                     0
    directed arcs:                       9
  average markov blanket size:           3.00
  average neighbourhood size:            3.00
  average branching factor:              1.50

  learning algorithm:                    TAN Bayes Classifier
  mutual information estimator:          Maximum Likelihood (disc.)
  training node:                         F
  tests used in the learning procedure:  10
```
```
> graphviz.plot(tan)
```

We can control which arcs are included in the tree using a `whitelist` or a `blacklist`. If either makes it impossible to build a tree from the predictors, `tree.bayes()` will fail with an error. Note that arc directions are completely arbitrary in the maximum-weight spanning tree, so it is sufficient to whitelist arcs in a single direction but we need to blacklist arcs in both directions.

```
> wl = data.frame(from = c("B", "B"), to = c("C", "D"))
> bl = data.frame(from = c("A", "B"), to = c("B", "A"))
> tan = tree.bayes(training.set, training = "F", whitelist = wl, blacklist = bl)
> graphviz.plot(tan)
```

```
> wl = data.frame(from = c("B", "C", "D"), to = c("C", "D", "B"))
> tree.bayes(training.set, training = "F", whitelist = wl)
```
```
## Error: this whitelist does not allow an acyclic graph.
```
```
> bl = data.frame(from = c(rep("A", 4), "B", "C", "D", "E"),
+                 to = c("B", "C", "D", "E", rep("A", 4)))
> tree.bayes(training.set, training = "F", blacklist = bl)
```
```
## Error in chow.liu.backend(x = features.data, nodes = explanatory, estimator = mi, : learned 3 arcs instead of 4, this is not a tree spanning all the nodes.
```

The direction of the arcs in the spanning tree depends on which predictor acts as its root node, which we can choose with the `root` argument.

```
> par(mfrow = c(1, 2))
> tanE = tree.bayes(training.set, training = "F", root = "E")
> tanC = tree.bayes(training.set, training = "F", root = "C")
> graphviz.compare(tanE, tanC, main = c("Root Node: E", "Root Node: C"))
```

### Parameter learning

There are no parameter learning methods that are specific to classifiers in bnlearn: those illustrated here are suitable for both naive Bayes and TAN models.

The Bayesian networks returned by `naive.bayes()` and `tree.bayes()` are objects of class `bn`, but they also have additional classes `bn.naive` and `bn.tan` that identify them as Bayesian network classifiers. Both additional classes are preserved by `bn.fit()`.

```
> class(nb)
```
``` "bn.naive" "bn"
```
```
> class(tan)
```
``` "bn.tan" "bn"
```
```
> fitted.nb = bn.fit(nb, training.set, method = "bayes")
> fitted.tan = bn.fit(tan, training.set, method = "bayes")
> class(fitted.nb)
```
``` "bn.naive"    "bn.fit"      "bn.fit.dnet"
```
```
> class(fitted.tan)
```
``` "bn.tan"      "bn.fit"      "bn.fit.dnet"
```
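
Since the fitted objects are ordinary `bn.fit` models, their conditional probability tables can be examined with the usual accessors; for example (repeating the fitting steps above):

```r
library(bnlearn)
data(learning.test)
training.set = learning.test[1:4000, ]

nb = naive.bayes(training.set, training = "F")
fitted.nb = bn.fit(nb, training.set, method = "bayes")

# The CPT of predictor A given the class F; each column sums to one.
coef(fitted.nb$A)
```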

### Prediction

The additional classes `bn.naive` and `bn.tan` are used by bnlearn to implement `predict()` methods that use their closed-form exact posterior probability formulas for the class variable. Similarly to the general `predict()` method for `bn.fit` objects illustrated here, they have a `prob` argument to attach the prediction probabilities to the predicted values.

```
> predict(fitted.nb, data = validation.set[1:10, ], prob = TRUE)
```
```
  a a a b a b a a b a
attr(,"prob")
       [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
a 0.7339231 0.5976894 0.6983055 0.1856294 0.7038234 0.1896427 0.5790076
b 0.2660769 0.4023106 0.3016945 0.8143706 0.2961766 0.8103573 0.4209924
       [,8]      [,9]     [,10]
a 0.5994686 0.1835357 0.5790076
b 0.4005314 0.8164643 0.4209924
Levels: a b
```
```
> predict(fitted.tan, data = validation.set[1:10, ], prob = TRUE)
```
```
  a a a b a b a b b a
attr(,"prob")
       [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
a 0.7324098 0.6244423 0.7610035 0.1544624 0.7486086 0.1860744 0.6477813
b 0.2675902 0.3755577 0.2389965 0.8455376 0.2513914 0.8139256 0.3522187
       [,8]      [,9]     [,10]
a 0.3416894 0.1732989 0.6477813
b 0.6583106 0.8267011 0.3522187
Levels: a b
```

Furthermore, both methods have a `prior` argument which can be used to specify the prior probability for the classes. The default is a uniform prior.

```
> predict(fitted.nb, data = validation.set[1:10, ], prior = c(a = 0.4, b = 0.6))
```
```
  a b a b a b b b b b
Levels: a b
```
```
> predict(fitted.tan, data = validation.set[1:10, ], prior = c(a = 0.75, b = 0.25))
```
```
  a a a b a b a a b a
Levels: a b
```

Apart from `prob` and `prior`, neither `predict()` method has further optional arguments.
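
As a quick check of the two classifiers, we can compare their predictive accuracy on the held-out validation set (rebuilding the objects from the earlier steps; the exact figures depend on the train/validation split):

```r
library(bnlearn)
data(learning.test)
training.set = learning.test[1:4000, ]
validation.set = learning.test[4001:5000, ]

fitted.nb = bn.fit(naive.bayes(training.set, training = "F"),
                   training.set, method = "bayes")
fitted.tan = bn.fit(tree.bayes(training.set, training = "F"),
                    training.set, method = "bayes")

# Proportion of correctly classified observations for each model.
accuracy = function(fit)
  mean(predict(fit, data = validation.set) == validation.set$F)
c(naive.bayes = accuracy(fitted.nb), TAN = accuracy(fitted.tan))
```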

Last updated on `Thu Nov 24 18:28:37 2022` with bnlearn `4.9-20221107` and `R version 4.2.2 (2022-10-31)`.