Fitting Bayesian network's parameters
Learning the network structure
Let's first load the learning.test data set.
> library(bnlearn) > data(learning.test)
The network structure associated with this data (described here) can be learned with any of the algorithms implemented in bnlearn; we will use the Incremental Association (IAMB) by Tsamardinos et al. for this example.
> res = iamb(learning.test)
> res
Bayesian network learned via Constraint-based methods
model:
[partially directed graph]
nodes: 6
arcs: 5
undirected arcs: 1
directed arcs: 4
average markov blanket size: 2.33
average neighbourhood size: 1.67
average branching factor: 0.67
learning algorithm: Incremental Association
conditional independence test: Mutual Information (discrete)
alpha threshold: 0.05
tests used in the learning procedure: 50
optimized: TRUE
There is a single undirected arc in the resulting network; the learning algorithm was not able to set its orientation because its two possible direction are score equivalent.
> score(set.arc(res, "A", "B"), learning.test) [1] -24006.73 > score(set.arc(res, "B", "A"), learning.test) [1] -24006.73
Setting the direction of undirected arcs
Due to the way Bayesian networks are defined the network structure must be a directed acyclic graph (DAG); otherwise their parameters cannot be estimated because the factorization of the global probability distribution of the data into the loca ones (one for each variable in the model) is not completely known.
If a network structure contains one or more undirected arcs their direction can be set in the following ways:
- with the
set.arcfunction, if the direction of the arc is known or can be guessed from the experimental setting of the data:> res2 = set.arc(res, "B", "A")
- with the
pdag2dagfunction, if the topological ordering of the nodes is known or (again) if it can be guessed in terms of causal relationships from the experimental setting:> res2 = pdag2dag(res, ordering=c("A", "B", "C", "D", "E", "F")) - with the
choose.directionfunction and the K2 score, which is not score equivalent:> choose.direction(res, c("A", "B"), data = learning.test, criterion = "k2")
Fitting the parameters (Maximum Likelihood estimates)
Discrete data
Now that the Bayesian network structure is completely directed we can fit the parameters of the distribution, which in this case are the conditional probability tables of the variables:
> fit = bn.fit(res2, learning.test)
> fit
Bayesian network parameters
Parameters of node A (multinomial distribution)
Conditional probability table:
a b c
0.3336 0.3340 0.3324
Parameters of node B (multinomial distribution)
Conditional probability table:
A
B a b c
a 0.85611511 0.44491018 0.11492178
b 0.02517986 0.22095808 0.09446450
c 0.11870504 0.33413174 0.79061372
Parameters of node C (multinomial distribution)
Conditional probability table:
a b c
0.7434 0.2048 0.0518
Parameters of node D (multinomial distribution)
Conditional probability table:
, , C = a
A
D a b c
a 0.80081301 0.09251810 0.10530547
b 0.09024390 0.80209171 0.11173633
c 0.10894309 0.10539019 0.78295820
, , C = b
A
D a b c
a 0.18079096 0.88304094 0.24695122
b 0.13276836 0.07017544 0.49390244
c 0.68644068 0.04678363 0.25914634
, , C = c
A
D a b c
a 0.42857143 0.34117647 0.13333333
b 0.20238095 0.38823529 0.44444444
c 0.36904762 0.27058824 0.42222222
Parameters of node E (multinomial distribution)
Conditional probability table:
, , F = a
B
E a b c
a 0.8052498 0.2058824 0.1193738
b 0.0973751 0.1797386 0.1144814
c 0.0973751 0.6143791 0.7661448
, , F = b
B
E a b c
a 0.4005080 0.3167939 0.2375954
b 0.4902625 0.3664122 0.5066794
c 0.1092295 0.3167939 0.2557252
Parameters of node F (multinomial distribution)
Conditional probability table:
a b
0.5018 0.4982
The conditional probability tables of the single variables can be accessed
with the usual $ operator:
> fit$A
Parameters of node A (multinomial distribution)
Conditional probability table:
a b c
0.3336 0.3340 0.3324
and can be plotted with the bn.fit.barchart and bn.fit.dotplot
(see their manual page):
> bn.fit.barchart(fit$D)
> bn.fit.dotplot(fit$D)
Continuous data
Fitting the parameters of a Gaussian Bayesian networks (the regression coefficients for each variable against its parents) is done in the same way.
> data(gaussian.test)
> res = iamb(gaussian.test)
> undirected.arcs(res)
from to
[1,] "B" "D"
[2,] "D" "B"
> res = set.arc(res, "D", "B")
> fit = bn.fit(res, gaussian.test)
> fit
Bayesian network parameters
Parameters of node A (gaussian distribution)
Conditional density: A
Coefficients:
(Intercept)
1.007493
Standard deviation of the residuals: 1.004233
Parameters of node B (gaussian distribution)
Conditional density: B | D
Coefficients:
(Intercept) D
-3.9695713 0.6639115
Standard deviation of the residuals: 0.2187534
Parameters of node C (conditional gaussian distribution)
Conditional density: C | A + B
Coefficients:
(Intercept) A B
2.001083 1.995901 1.999108
Standard deviation of the residuals: 0.5088754
Parameters of node D (gaussian distribution)
Conditional density: D
Coefficients:
(Intercept)
9.05101
Standard deviation of the residuals: 4.55816
Parameters of node E (gaussian distribution)
Conditional density: E
Coefficients:
(Intercept)
3.493906
Standard deviation of the residuals: 1.98986
Parameters of node F (conditional gaussian distribution)
Conditional density: F | A + D + E + G
Coefficients:
(Intercept) A D E G
-0.006047321 1.994853041 1.005636909 1.002577002 1.494373265
Standard deviation of the residuals: 0.9957489
Parameters of node G (gaussian distribution)
Conditional density: G
Coefficients:
(Intercept)
5.028076
Standard deviation of the residuals: 1.983631
The functions coefficients, fitted and
residuals
(manual page)
allow an easy manipulation of the quantities of interest for both
a single variable and the whole network.
> coefficients(fit$F) (Intercept) A D E G -0.006047321 1.994853041 1.005636909 1.002577002 1.494373265 > str(residuals(fit$F)) num [1:5000] -0.861 1.271 -0.262 -0.479 -0.782 ... > str(fitted(fit$F)) num [1:5000] 25.6 35.5 22.3 23.8 25.3 ...
As before there are some functions to plot these quantities:
bn.fit.qqplot, bn.fit.xyplot and
bn.fit.histogram (see their
manual page).
For Gaussian Bayesian networks however plotting all the nodes in
a single plot is supported, as shown below.
> bn.fit.qqplot(fit)
> bn.fit.xyplot(fit)
> bn.fit.histogram(fit)
Fitting the parameters (Bayesian Posterior estimates)
Discrete data
As an alternative to the traditional maximum likelihood approach it is also
possible to estimate the parameters of the network according to the Bayesian
paradigm, that is using the expected value of the posterior distribution.
The only difference from the procedure illustrated above is that
method = "bayes" must be specified in bn.fit.
> fit = bn.fit(res2, learning.test, method = "bayes")
> fit
Bayesian network parameters
Parameters of node A (multinomial distribution)
Conditional probability table:
a b c
0.3335967 0.3339918 0.3324114
Parameters of node B (multinomial distribution)
Conditional probability table:
A
B a b c
a 0.84971707 0.44354627 0.11760433
b 0.02895118 0.22233176 0.09739831
c 0.12133175 0.33412198 0.78499736
Parameters of node C (multinomial distribution)
Conditional probability table:
a b c
0.73837745 0.20637429 0.05524825
Parameters of node D (multinomial distribution)
Conditional probability table:
, , C = a
A
D a b c
a 0.79820937 0.09384538 0.10656126
b 0.09159780 0.79950810 0.11295671
c 0.11019284 0.10664652 0.78048203
, , C = b
A
D a b c
a 0.18370279 0.87218684 0.24872816
b 0.13659688 0.07537155 0.49059942
c 0.67970033 0.05244161 0.26067242
, , C = c
A
D a b c
a 0.42135289 0.34058847 0.14755352
b 0.21230644 0.38411931 0.43654434
c 0.36634067 0.27529222 0.41590214
Parameters of node E (multinomial distribution)
Conditional probability table:
, , F = a
B
E a b c
a 0.80115650 0.21004566 0.12151545
b 0.09942175 0.18475588 0.11667205
c 0.09942175 0.60519845 0.76181251
, , F = b
B
E a b c
a 0.39992539 0.31742146 0.23853018
b 0.48890132 0.36515708 0.50498688
c 0.11117329 0.31742146 0.25648294
Parameters of node F (multinomial distribution)
Conditional probability table:
a b
0.501778 0.498222
The imaginary/equivalent sample size of the prior distribution can be
specified using the iss parameter (documented
here); it tunes the amount of
regularization applied to the posterior distributions.