Bayesian network classifiers
Bayesian networks are general-purpose generative models that can be learned independently of the task they will be used for. Their structure is typically chosen so that the arcs encode the mechanics of the phenomenon being modelled in a realistic way, which is why they can double as causal models. Bayesian network classifiers are the opposite: their structure is chosen for a single task (classification) and to optimize a single criterion (predictive accuracy). As a result, the arcs will not correspond to the causal structure of the phenomenon in a meaningful way.
Many such models have been proposed in the literature. bnlearn only implements two classic ones: the naive Bayes and the tree-augmented naive Bayes (TAN) classifiers. Furthermore, both models are limited to discrete variables as in the respective seminal papers. Their key arguments (documented here) are:
- the data, which must contain both the class and the explanatory variables;
- training: the name of the variable that encodes the class;
- explanatory: the names of the explanatory variables that act as predictors.
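For instance, a call that names the class variable and restricts the predictors to a subset of the available variables might look like the following sketch; the choice of A, B and C as explanatory variables is purely illustrative.

> # learning.test ships with bnlearn; "F" is the class variable and only
> # three of the remaining variables are used as predictors here.
> library(bnlearn)
> nb.sub = naive.bayes(learning.test, training = "F",
+                      explanatory = c("A", "B", "C"))
> nb.sub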
Structure learning
The naive Bayes classifier has a completely fixed, star-shaped structure: the training variable is connected to each predictor by an arc, and there are no arcs between predictors. Therefore, there is nothing to learn from the data other than the variable names. We only really need to specify the training variable if we are using all other variables as predictors.
> library(bnlearn)
> training.set = learning.test[1:4000, ]
> validation.set = learning.test[4001:5000, ]
> nb = naive.bayes(training.set, training = "F")
> nb
  Bayesian network Classifier

  model:
   [F][A|F][B|F][C|F][D|F][E|F]
  nodes:                                 6
  arcs:                                  5
    undirected arcs:                     0
    directed arcs:                       5
  average markov blanket size:           1.67
  average neighbourhood size:            1.67
  average branching factor:              0.83

  learning algorithm:                    Naive Bayes Classifier
  training node:                         F
  tests used in the learning procedure:  0
> graphviz.plot(nb)
TAN builds on the naive Bayes classifier by learning a tree structure to connect the predictors. The tree is learned from the data as the maximum-weight spanning tree over the predictors: the weights are estimated using the pair-wise mutual information between each arc's endpoints.
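To see where the arc weights come from, we can estimate the pairwise mutual information between the predictors by hand. The mi.emp() helper below is our own, not a bnlearn function, and assumes the plain maximum-likelihood estimate described above; it is only meant to illustrate the quantity that the spanning tree maximises.

> # empirical mutual information between two discrete variables,
> # estimated from their joint frequency table.
> mi.emp = function(x, y) {
+   joint = prop.table(table(x, y))
+   indep = outer(rowSums(joint), colSums(joint))
+   sum(joint * log(joint / indep), na.rm = TRUE)
+ }
> predictors = c("A", "B", "C", "D", "E")
> pairs = combn(predictors, 2)
> arc.weights = apply(pairs, 2, function(p)
+   mi.emp(training.set[, p[1]], training.set[, p[2]]))
> names(arc.weights) = apply(pairs, 2, paste, collapse = "-")
> round(arc.weights, 3)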
> tan = tree.bayes(training.set, training = "F")
> tan
  Bayesian network Classifier

  model:
   [F][A|F][B|F:A][D|F:A][C|F:D][E|F:B]
  nodes:                                 6
  arcs:                                  9
    undirected arcs:                     0
    directed arcs:                       9
  average markov blanket size:           3.00
  average neighbourhood size:            3.00
  average branching factor:              1.50

  learning algorithm:                    TAN Bayes Classifier
  mutual information estimator:          Maximum Likelihood (disc.)
  training node:                         F
  tests used in the learning procedure:  10
> graphviz.plot(tan)
We can control which arcs are included in the tree using a whitelist or a blacklist. If either makes it impossible to build a tree from the predictors, tree.bayes() will fail with an error. Note that arc directions are completely arbitrary in the maximum-weight spanning tree, so it is sufficient to whitelist arcs in a single direction, but we need to blacklist arcs in both directions.
> wl = data.frame(from = c("B", "B"), to = c("C", "D"))
> bl = data.frame(from = c("A", "B"), to = c("B", "A"))
> tan = tree.bayes(training.set, training = "F", whitelist = wl, blacklist = bl)
> graphviz.plot(tan)
> wl = data.frame(from = c("B", "C", "D"), to = c("C", "D", "B"))
> tree.bayes(training.set, training = "F", whitelist = wl)
## Error: this whitelist does not allow an acyclic graph.
> bl = data.frame(from = c(rep("A", 4), "B", "C", "D", "E"),
+                 to = c("B", "C", "D", "E", rep("A", 4)))
> tree.bayes(training.set, training = "F", blacklist = bl)
## Error in chow.liu.backend(x = features.data, nodes = explanatory, estimator = mi, : learned 3 arcs instead of 4, this is not a tree spanning all the nodes.
The direction of the arcs in the spanning tree is determined by which predictor acts as its root: we can choose the root node explicitly with the root argument and compare the resulting models.
> par(mfrow = c(1, 2))
> tanE = tree.bayes(training.set, training = "F", root = "E")
> tanC = tree.bayes(training.set, training = "F", root = "C")
> graphviz.compare(tanE, tanC, main = c("Root Node: E", "Root Node: C"))
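If we prefer a textual check to the plots, we can verify that the two models share the same skeleton and differ only in how some arcs are oriented; compare(), skeleton() and arcs() are general bnlearn functions, not classifier-specific ones.

> # identical undirected structures, regardless of the chosen root...
> compare(skeleton(tanE), skeleton(tanC))
> # ...but some arcs point in different directions.
> arcs(tanE)
> arcs(tanC)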
Parameter learning
There are no parameter learning methods that are specific to classifiers in bnlearn: those illustrated here are suitable for both naive Bayes and TAN models.
The Bayesian networks returned by naive.bayes() and tree.bayes() are objects of class bn, but they also have the additional classes bn.naive and bn.tan that identify them as Bayesian network classifiers. Both additional classes are preserved by bn.fit().
> class(nb)
[1] "bn.naive" "bn"
> class(tan)
[1] "bn.tan" "bn"
> fitted.nb = bn.fit(nb, training.set, method = "bayes")
> fitted.tan = bn.fit(tan, training.set, method = "bayes")
> class(fitted.nb)
[1] "bn.naive" "bn.fit" "bn.fit.dnet"
> class(fitted.tan)
[1] "bn.tan" "bn.fit" "bn.fit.dnet"
Prediction
The additional classes bn.naive and bn.tan are used by bnlearn to implement predict() methods that use the closed-form exact posterior probability formulas for the class variable. Similarly to the general predict() method for bn.fit objects illustrated here, they have a prob argument to attach the prediction probabilities to the predicted values.
> predict(fitted.nb, data = validation.set[1:10, ], prob = TRUE)
 [1] a a a b a b a a b a
attr(,"prob")
       [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
a 0.7339231 0.5976894 0.6983055 0.1856294 0.7038234 0.1896427 0.5790076
b 0.2660769 0.4023106 0.3016945 0.8143706 0.2961766 0.8103573 0.4209924
       [,8]      [,9]     [,10]
a 0.5994686 0.1835357 0.5790076
b 0.4005314 0.8164643 0.4209924
Levels: a b
> predict(fitted.tan, data = validation.set[1:10, ], prob = TRUE)
 [1] a a a b a b a b b a
attr(,"prob")
       [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
a 0.7324098 0.6244423 0.7610035 0.1544624 0.7486086 0.1860744 0.6477813
b 0.2675902 0.3755577 0.2389965 0.8455376 0.2513914 0.8139256 0.3522187
       [,8]      [,9]     [,10]
a 0.3416894 0.1732989 0.6477813
b 0.6583106 0.8267011 0.3522187
Levels: a b
Furthermore, both methods have a prior argument which can be used to specify the prior probability for the classes. The default is a uniform prior.
> predict(fitted.nb, data = validation.set[1:10, ], prior = c(a = 0.4, b = 0.6))
 [1] a b a b a b b b b b
Levels: a b
> predict(fitted.tan, data = validation.set[1:10, ], prior = c(a = 0.75, b = 0.25))
 [1] a a a b a b a a b a
Levels: a b
Neither predict() method has any other optional arguments.
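Since these models are built to maximise predictive accuracy, a natural final check is the proportion of validation-set observations whose class they recover; the snippet below is a minimal sketch of that check rather than part of bnlearn's classifier interface.

> # classification accuracy of both classifiers on the held-out data.
> mean(predict(fitted.nb, data = validation.set) == validation.set$F)
> mean(predict(fitted.tan, data = validation.set) == validation.set$F)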
Mon Aug 5 02:39:20 2024 with bnlearn 5.0 and R version 4.4.1 (2024-06-14).