Fitting Bayesian network's parameters

Learning the network structure

Let's first load the learning.test data set.

> library(bnlearn)
> data(learning.test)

The network structure associated with this data (described here) can be learned with any of the algorithms implemented in bnlearn; we will use the Incremental Association (IAMB) by Tsamardinos et al. for this example.

> res = iamb(learning.test)
> res

  Bayesian network learned via Constraint-based methods

  model:
    [partially directed graph] 
  nodes:                                 6 
  arcs:                                  5 
    undirected arcs:                     1 
    directed arcs:                       4 
  average markov blanket size:           2.33 
  average neighbourhood size:            1.67 
  average branching factor:              0.67 

  learning algorithm:                    Incremental Association 
  conditional independence test:         Mutual Information (discrete) 
  alpha threshold:                       0.05 
  tests used in the learning procedure:  50 
  optimized:                             TRUE 

There is a single undirected arc in the resulting network; the learning algorithm was not able to set its orientation because its two possible direction are score equivalent.

> score(set.arc(res, "A", "B"), learning.test)
[1] -24006.73
> score(set.arc(res, "B", "A"), learning.test)
[1] -24006.73

Setting the direction of undirected arcs

Due to the way Bayesian networks are defined the network structure must be a directed acyclic graph (DAG); otherwise their parameters cannot be estimated because the factorization of the global probability distribution of the data into the loca ones (one for each variable in the model) is not completely known.

If a network structure contains one or more undirected arcs their direction can be set in the following ways:

  • with the set.arc function, if the direction of the arc is known or can be guessed from the experimental setting of the data:
    > res2 = set.arc(res, "B", "A")
    
  • with the pdag2dag function, if the topological ordering of the nodes is known or (again) if it can be guessed in terms of causal relationships from the experimental setting:
    > res2 = pdag2dag(res, ordering=c("A", "B", "C", "D", "E", "F"))
    
  • with the choose.direction function and the K2 score, which is not score equivalent:
    > choose.direction(res, c("A", "B"), data = learning.test, criterion = "k2")
    

Fitting the parameters (Maximum Likelihood estimates)

Discrete data

Now that the Bayesian network structure is completely directed we can fit the parameters of the distribution, which in this case are the conditional probability tables of the variables:

> fit = bn.fit(res2, learning.test)
> fit

  Bayesian network parameters

  Parameters of node A (multinomial distribution)

Conditional probability table:
 
     a      b      c 
0.3336 0.3340 0.3324 

  Parameters of node B (multinomial distribution)

Conditional probability table:
 
   A
B            a          b          c
  a 0.85611511 0.44491018 0.11492178
  b 0.02517986 0.22095808 0.09446450
  c 0.11870504 0.33413174 0.79061372

  Parameters of node C (multinomial distribution)

Conditional probability table:
 
     a      b      c 
0.7434 0.2048 0.0518 

  Parameters of node D (multinomial distribution)

Conditional probability table:
 
, , C = a

   A
D            a          b          c
  a 0.80081301 0.09251810 0.10530547
  b 0.09024390 0.80209171 0.11173633
  c 0.10894309 0.10539019 0.78295820

, , C = b

   A
D            a          b          c
  a 0.18079096 0.88304094 0.24695122
  b 0.13276836 0.07017544 0.49390244
  c 0.68644068 0.04678363 0.25914634

, , C = c

   A
D            a          b          c
  a 0.42857143 0.34117647 0.13333333
  b 0.20238095 0.38823529 0.44444444
  c 0.36904762 0.27058824 0.42222222

  Parameters of node E (multinomial distribution)

Conditional probability table:
 
, , F = a

   B
E           a         b         c
  a 0.8052498 0.2058824 0.1193738
  b 0.0973751 0.1797386 0.1144814
  c 0.0973751 0.6143791 0.7661448

, , F = b

   B
E           a         b         c
  a 0.4005080 0.3167939 0.2375954
  b 0.4902625 0.3664122 0.5066794
  c 0.1092295 0.3167939 0.2557252

  Parameters of node F (multinomial distribution)

Conditional probability table:
 
     a      b 
0.5018 0.4982 

The conditional probability tables of the single variables can be accessed with the usual $ operator:

> fit$A

  Parameters of node A (multinomial distribution)

Conditional probability table:
 
     a      b      c 
0.3336 0.3340 0.3324 

and can be plotted with the bn.fit.barchart and bn.fit.dotplot (see their manual page):

> bn.fit.barchart(fit$D)
barchart
> bn.fit.dotplot(fit$D)
dotplot

Continuous data

Fitting the parameters of a Gaussian Bayesian networks (the regression coefficients for each variable against its parents) is done in the same way.

> data(gaussian.test)
> res = iamb(gaussian.test)
> undirected.arcs(res)
     from to 
[1,] "B"  "D"
[2,] "D"  "B"
> res = set.arc(res, "D", "B")
> fit = bn.fit(res, gaussian.test)
> fit

  Bayesian network parameters

  Parameters of node A (gaussian distribution) 

Conditional density: A
Coefficients:
(Intercept)  
   1.007493  
Standard deviation of the residuals: 1.004233 

  Parameters of node B (gaussian distribution) 

Conditional density: B | D
Coefficients:
(Intercept)            D  
 -3.9695713    0.6639115  
Standard deviation of the residuals: 0.2187534 

  Parameters of node C (conditional gaussian distribution) 

Conditional density: C | A + B
Coefficients:
(Intercept)            A            B  
   2.001083     1.995901     1.999108  
Standard deviation of the residuals: 0.5088754 

  Parameters of node D (gaussian distribution) 

Conditional density: D
Coefficients:
(Intercept)  
    9.05101  
Standard deviation of the residuals: 4.55816 

  Parameters of node E (gaussian distribution) 

Conditional density: E
Coefficients:
(Intercept)  
   3.493906  
Standard deviation of the residuals: 1.98986 

  Parameters of node F (conditional gaussian distribution) 

Conditional density: F | A + D + E + G
Coefficients:
 (Intercept)             A             D             E             G  
-0.006047321   1.994853041   1.005636909   1.002577002   1.494373265  
Standard deviation of the residuals: 0.9957489 

  Parameters of node G (gaussian distribution) 

Conditional density: G
Coefficients:
(Intercept)  
   5.028076  
Standard deviation of the residuals: 1.983631 

The functions coefficients, fitted and residuals (manual page) allow an easy manipulation of the quantities of interest for both a single variable and the whole network.

> coefficients(fit$F)
 (Intercept)            A            D            E            G 
-0.006047321  1.994853041  1.005636909  1.002577002  1.494373265 
> str(residuals(fit$F))
 num [1:5000] -0.861 1.271 -0.262 -0.479 -0.782 ...
> str(fitted(fit$F))
 num [1:5000] 25.6 35.5 22.3 23.8 25.3 ...

As before there are some functions to plot these quantities: bn.fit.qqplot, bn.fit.xyplot and bn.fit.histogram (see their manual page). For Gaussian Bayesian networks however plotting all the nodes in a single plot is supported, as shown below.

> bn.fit.qqplot(fit)
qqplot
> bn.fit.xyplot(fit)
xyplot
> bn.fit.histogram(fit)
histogram

Fitting the parameters (Bayesian Posterior estimates)

Discrete data

As an alternative to the traditional maximum likelihood approach it is also possible to estimate the parameters of the network according to the Bayesian paradigm, that is using the expected value of the posterior distribution. The only difference from the procedure illustrated above is that method = "bayes" must be specified in bn.fit.

> fit = bn.fit(res2, learning.test, method = "bayes")
> fit

  Bayesian network parameters

  Parameters of node A (multinomial distribution)

Conditional probability table:
 
        a         b         c 
0.3335967 0.3339918 0.3324114 

  Parameters of node B (multinomial distribution)

Conditional probability table:
 
   A
B            a          b          c
  a 0.84971707 0.44354627 0.11760433
  b 0.02895118 0.22233176 0.09739831
  c 0.12133175 0.33412198 0.78499736

  Parameters of node C (multinomial distribution)

Conditional probability table:
 
         a          b          c 
0.73837745 0.20637429 0.05524825 

  Parameters of node D (multinomial distribution)

Conditional probability table:
 
, , C = a

   A
D            a          b          c
  a 0.79820937 0.09384538 0.10656126
  b 0.09159780 0.79950810 0.11295671
  c 0.11019284 0.10664652 0.78048203

, , C = b

   A
D            a          b          c
  a 0.18370279 0.87218684 0.24872816
  b 0.13659688 0.07537155 0.49059942
  c 0.67970033 0.05244161 0.26067242

, , C = c

   A
D            a          b          c
  a 0.42135289 0.34058847 0.14755352
  b 0.21230644 0.38411931 0.43654434
  c 0.36634067 0.27529222 0.41590214

  Parameters of node E (multinomial distribution)

Conditional probability table:
 
, , F = a

   B
E            a          b          c
  a 0.80115650 0.21004566 0.12151545
  b 0.09942175 0.18475588 0.11667205
  c 0.09942175 0.60519845 0.76181251

, , F = b

   B
E            a          b          c
  a 0.39992539 0.31742146 0.23853018
  b 0.48890132 0.36515708 0.50498688
  c 0.11117329 0.31742146 0.25648294

  Parameters of node F (multinomial distribution)

Conditional probability table:
 
       a        b 
0.501778 0.498222 

The imaginary/equivalent sample size of the prior distribution can be specified using the iss parameter (documented here); it tunes the amount of regularization applied to the posterior distributions.