## Bayesian network classifiers

Bayesian networks are general-purpose generative models that can be learned independently of the task they will be used for. They are typically structured to encode as arcs the mechanics of the phenomenon they model in a realistic way, which is why they can double as causal models. Bayesian network classifiers are the opposite: their structure is chosen for a single task (classification) and to optimize a single criterion (predictive accuracy). As a result, the arcs will not correspond to the causal structure of the phenomenon in a meaningful way.

Many such models have been proposed in the literature. **bnlearn** only implements two classic ones: the
*naive Bayes* and the *tree-augmented naive Bayes* (TAN) classifiers. Furthermore, both models are limited
to discrete variables as in the respective seminal papers. Their key arguments (documented
here) are:

- the data, which must contain both the class and the explanatory variables;
- `training`: the name of the variable that encodes the class;
- `explanatory`: the names of the explanatory variables that act as predictors.

### Structure learning

The naive Bayes classifier has a completely fixed, star-shaped structure: the training variable is connected to each
predictor by an arc, and there are no arcs between predictors. Therefore, there is nothing to learn from the data other
than the variable names. We only really need to specify the `training` variable if we are using all
other variables as predictors.

```r
> library(bnlearn)
> training.set = learning.test[1:4000, ]
> validation.set = learning.test[4001:5000, ]
> nb = naive.bayes(training.set, training = "F")
> nb

  Bayesian network Classifier

  model:
    [F][A|F][B|F][C|F][D|F][E|F]
  nodes:                                 6
  arcs:                                  5
    undirected arcs:                     0
    directed arcs:                       5
  average markov blanket size:           1.67
  average neighbourhood size:            1.67
  average branching factor:              0.83

  learning algorithm:                    Naive Bayes Classifier
  training node:                         F
  tests used in the learning procedure:  0

> graphviz.plot(nb)
```

TAN builds on the naive Bayes classifier by learning a tree structure to connect the predictors. The tree is learned from the data as the maximum-weight spanning tree over the predictors: the weights are estimated using the pair-wise mutual information between each arc's endpoints.
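As an illustrative aside (not part of the original example), we can get a feel for these weights with `ci.test()`, whose `test = "mi"` option reports a mutual information test statistic for a pair of predictors; here we condition on the class variable `F`, as in the classic TAN formulation:

```r
# Illustrative aside: TAN's arc weights are mutual information estimates.
# ci.test() with test = "mi" lets us examine the strength of the dependence
# between a pair of predictors, here B and E conditional on the class F.
ci.test("B", "E", "F", data = training.set, test = "mi")
```

The larger the statistic, the stronger the dependence between the two predictors and thus the more likely the corresponding arc is to appear in the spanning tree.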

```r
> tan = tree.bayes(training.set, training = "F")
> tan

  Bayesian network Classifier

  model:
    [F][A|F][B|F:A][D|F:A][C|F:D][E|F:B]
  nodes:                                 6
  arcs:                                  9
    undirected arcs:                     0
    directed arcs:                       9
  average markov blanket size:           3.00
  average neighbourhood size:            3.00
  average branching factor:              1.50

  learning algorithm:                    TAN Bayes Classifier
  mutual information estimator:          Maximum Likelihood (disc.)
  training node:                         F
  tests used in the learning procedure:  10

> graphviz.plot(tan)
```

We can control which arcs are included in the tree using a `whitelist` or a `blacklist`. If either
makes it impossible to build a tree from the predictors, `tree.bayes()` will fail with an error. Note
that arc directions are completely arbitrary in the maximum-weight spanning tree, so it is sufficient
to whitelist arcs in a single direction, but we need to blacklist arcs in both directions.

```r
> wl = data.frame(from = c("B", "B"), to = c("C", "D"))
> bl = data.frame(from = c("A", "B"), to = c("B", "A"))
> tan = tree.bayes(training.set, training = "F", whitelist = wl, blacklist = bl)
> graphviz.plot(tan)
```

```r
> wl = data.frame(from = c("B", "C", "D"), to = c("C", "D", "B"))
> tree.bayes(training.set, training = "F", whitelist = wl)
## Error: this whitelist does not allow an acyclic graph.
```

```r
> bl = data.frame(from = c(rep("A", 4), "B", "C", "D", "E"),
+                 to = c("B", "C", "D", "E", rep("A", 4)))
> tree.bayes(training.set, training = "F", blacklist = bl)
## Error in chow.liu.backend(x = features.data, nodes = explanatory, estimator = mi, :
##   learned 3 arcs instead of 4, this is not a tree spanning all the nodes.
```

The direction of the arcs in the tree is determined by the choice of its root node, which we can
specify with the `root` argument.

```r
> par(mfrow = c(1, 2))
> tanE = tree.bayes(training.set, training = "F", root = "E")
> tanC = tree.bayes(training.set, training = "F", root = "C")
> graphviz.compare(tanE, tanC, main = c("Root Node: E", "Root Node: C"))
```

### Parameter learning

There are no parameter learning methods that are specific to classifiers in **bnlearn**: those
illustrated here are suitable for both naive Bayes and TAN models.

The Bayesian networks returned by `naive.bayes()` and `tree.bayes()` are objects of class `bn`, but
they also have the additional classes `bn.naive` and `bn.tan` that identify them as Bayesian network
classifiers. Both additional classes are preserved by `bn.fit()`.

```r
> class(nb)
[1] "bn.naive" "bn"
> class(tan)
[1] "bn.tan" "bn"
> fitted.nb = bn.fit(nb, training.set, method = "bayes")
> fitted.tan = bn.fit(tan, training.set, method = "bayes")
> class(fitted.nb)
[1] "bn.naive"    "bn.fit"      "bn.fit.dnet"
> class(fitted.tan)
[1] "bn.tan"      "bn.fit"      "bn.fit.dnet"
```
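Since the fitted classifiers are ordinary `bn.fit` objects apart from their extra class, the usual `bn.fit` accessors apply to them as well; as a quick sketch (not shown on the original page), we can inspect the conditional probability table of a predictor:

```r
# Sketch: the fitted classifiers behave like any other discrete bn.fit object,
# so we can examine the CPT of a predictor (here A, conditional on the class F).
fitted.nb$A         # prints the conditional probability table of A given F
coef(fitted.nb$A)   # ... or extracts it as a numeric table
```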

### Prediction

The additional classes `bn.naive` and `bn.tan` are used by **bnlearn** to implement `predict()`
methods that use the closed-form exact posterior probability formulas for the class variable.
Similarly to the general `predict()` method for `bn.fit` objects illustrated here, they have a
`prob` argument to attach the prediction probabilities to the predicted values.

```r
> predict(fitted.nb, data = validation.set[1:10, ], prob = TRUE)
 [1] a a a b a b a a b a
attr(,"prob")
       [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
a 0.7339231 0.5976894 0.6983055 0.1856294 0.7038234 0.1896427 0.5790076
b 0.2660769 0.4023106 0.3016945 0.8143706 0.2961766 0.8103573 0.4209924
       [,8]      [,9]     [,10]
a 0.5994686 0.1835357 0.5790076
b 0.4005314 0.8164643 0.4209924
Levels: a b

> predict(fitted.tan, data = validation.set[1:10, ], prob = TRUE)
 [1] a a a b a b a b b a
attr(,"prob")
       [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
a 0.7324098 0.6244423 0.7610035 0.1544624 0.7486086 0.1860744 0.6477813
b 0.2675902 0.3755577 0.2389965 0.8455376 0.2513914 0.8139256 0.3522187
       [,8]      [,9]     [,10]
a 0.3416894 0.1732989 0.6477813
b 0.6583106 0.8267011 0.3522187
Levels: a b
```

Furthermore, both methods have a `prior` argument which can be used to specify the prior
probabilities of the classes. The default is a uniform prior.

```r
> predict(fitted.nb, data = validation.set[1:10, ], prior = c(a = 0.4, b = 0.6))
 [1] a b a b a b b b b b
Levels: a b

> predict(fitted.tan, data = validation.set[1:10, ], prior = c(a = 0.75, b = 0.25))
 [1] a a a b a b a a b a
Levels: a b
```

Neither `predict()` method has any other optional arguments.
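As a quick sanity check (not shown on the original page), we could compare the predicted classes against the observed ones in the validation set to estimate predictive accuracy; a minimal sketch, assuming the `fitted.nb`, `fitted.tan` and `validation.set` objects created above:

```r
# Sketch: estimating predictive accuracy on the held-out validation set.
# Assumes fitted.nb, fitted.tan and validation.set from the examples above.
pred.nb = predict(fitted.nb, data = validation.set)
pred.tan = predict(fitted.tan, data = validation.set)
mean(pred.nb == validation.set$F)    # proportion of correctly classified samples
mean(pred.tan == validation.set$F)
table(predicted = pred.tan, observed = validation.set$F)   # confusion matrix
```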

`Thu Nov 24 18:28:37 2022` with **bnlearn** `4.9-20221107` and `R version 4.2.2 (2022-10-31)`.