## Fitting the parameters of a Bayesian network

### Learning the network structure

For this example we will initially use the `learning.test` data set shipped with bnlearn. Its network structure (described here and here) can be learned with any of the algorithms implemented in bnlearn; we will use IAMB in the following.

```> library(bnlearn)
> data(learning.test)
> pdag = iamb(learning.test)
> pdag
```
```
Bayesian network learned via Constraint-based methods

model:
[partially directed graph]
nodes:                                 6
arcs:                                  5
undirected arcs:                     1
directed arcs:                       4
average markov blanket size:           2.33
average neighbourhood size:            1.67
average branching factor:              0.67

learning algorithm:                    IAMB
conditional independence test:         Mutual Information (disc.)
alpha threshold:                       0.05
tests used in the learning procedure:  134
```

As we can see from the output above, there is a single undirected arc in `pdag`; IAMB was not able to set its orientation because its two possible direction are score equivalent.

```> score(set.arc(pdag, from = "A", to = "B"), learning.test)
```
``` -24006.73
```
```> score(set.arc(pdag, from = "B", to = "A"), learning.test)
```
``` -24006.73
```

### Setting the direction of undirected arcs

Due to the way Bayesian networks are defined the network structure must be a directed acyclic graph (DAG); otherwise their parameters cannot be estimated because the factorization of the global probability distribution of the data into the local ones (one for each variable in the model) is not completely known.

If a network structure contains one or more undirected arcs, their direction can be set in two ways:

• with the `set.arc()` function, if the direction of the arc is known or can be guessed from the experimental setting of the data:
```> dag = set.arc(pdag, from = "B", to = "A")
```
• with the `pdag2dag()` function, if the topological ordering of the nodes is known or (again) if it can be guessed in terms of causal relationships from the experimental setting:
```> dag = pdag2dag(pdag, ordering = c("A", "B", "C", "D", "E", "F"))
```
• with the `cextend()` function, which picks a DAG from the equivalence class represented by the network structure:
```> dag = cextend(pdag)
```

### Fitting the parameters (Maximum Likelihood estimates)

#### Discrete data

Now that the Bayesian network structure is completely directed, we can fit the parameters of the local distributions, which take the form of conditional probability tables. By default, `bn.fit()` uses maximum likelihood estimators: we can select to use them explicitly with `method = "mle"`.

```> fitted = bn.fit(dag, learning.test, method = "mle")
> fitted
```
```
Bayesian network parameters

Parameters of node A (multinomial distribution)

Conditional probability table:

B
A        a      b      c
a 0.6046 0.0739 0.0957
b 0.3146 0.6496 0.2696
c 0.0809 0.2764 0.6348

Parameters of node B (multinomial distribution)

Conditional probability table:
a     b     c
0.472 0.114 0.414

Parameters of node C (multinomial distribution)

Conditional probability table:
a      b      c
0.7434 0.2048 0.0518

Parameters of node D (multinomial distribution)

Conditional probability table:

, , C = a

A
D        a      b      c
a 0.8008 0.0925 0.1053
b 0.0902 0.8021 0.1117
c 0.1089 0.1054 0.7830

, , C = b

A
D        a      b      c
a 0.1808 0.8830 0.2470
b 0.1328 0.0702 0.4939
c 0.6864 0.0468 0.2591

, , C = c

A
D        a      b      c
a 0.4286 0.3412 0.1333
b 0.2024 0.3882 0.4444
c 0.3690 0.2706 0.4222

Parameters of node E (multinomial distribution)

Conditional probability table:

, , F = a

B
E        a      b      c
a 0.8052 0.2059 0.1194
b 0.0974 0.1797 0.1145
c 0.0974 0.6144 0.7661

, , F = b

B
E        a      b      c
a 0.4005 0.3168 0.2376
b 0.4903 0.3664 0.5067
c 0.1092 0.3168 0.2557

Parameters of node F (multinomial distribution)

Conditional probability table:
a     b
0.502 0.498
```

The conditional probability table of each variable can be accessed with the usual `\$` operator:

```> fitted\$D
```
```
Parameters of node D (multinomial distribution)

Conditional probability table:

, , C = a

A
D        a      b      c
a 0.8008 0.0925 0.1053
b 0.0902 0.8021 0.1117
c 0.1089 0.1054 0.7830

, , C = b

A
D        a      b      c
a 0.1808 0.8830 0.2470
b 0.1328 0.0702 0.4939
c 0.6864 0.0468 0.2591

, , C = c

A
D        a      b      c
a 0.4286 0.3412 0.1333
b 0.2024 0.3882 0.4444
c 0.3690 0.2706 0.4222
```

and it can be `print()`ed arranging the dimensions in various ways with the `perm` argument.

```> print(fitted\$D, perm = c("D", "C", "A"))
```
```
Parameters of node D (multinomial distribution)

Conditional probability table:

, , A = a

C
D        a      b      c
a 0.8008 0.1808 0.4286
b 0.0902 0.1328 0.2024
c 0.1089 0.6864 0.3690

, , A = b

C
D        a      b      c
a 0.0925 0.8830 0.3412
b 0.8021 0.0702 0.3882
c 0.1054 0.0468 0.2706

, , A = c

C
D        a      b      c
a 0.1053 0.2470 0.1333
b 0.1117 0.4939 0.4444
c 0.7830 0.2591 0.4222
```

It can also be plotted with the `bn.fit.barchart()` and `bn.fit.dotplot()` functions (manual page).

```> bn.fit.barchart(fitted\$D)
```
```Loading required namespace: lattice
``` ```> bn.fit.dotplot(fitted\$D)
``` Note that conditional probabilities may be estimated as `NA` if there are parent configurations that are not observed in the data.

```> fitted = bn.fit(dag, learning.test[learning.test\$A != "a", ], method = "mle")
```
```## Warning in check.data(data, allow.missing = TRUE): variable A in the data has
## levels that are not observed in the data.
```
```> fitted\$B
```
```
Parameters of node B (multinomial distribution)

Conditional probability table:
a     b     c
0.280 0.158 0.562
```

Such probabilities will be replaced with a uniform distribution if we specify `replace.unidentifiable = TRUE`.

```> fitted = bn.fit(dag, learning.test[learning.test\$A != "a", ], method = "mle", replace.unidentifiable = TRUE)
```
```## Warning in check.data(data, allow.missing = TRUE): variable A in the data has
## levels that are not observed in the data.
```
```> fitted\$B
```
```
Parameters of node B (multinomial distribution)

Conditional probability table:
a     b     c
0.280 0.158 0.562
```

#### Continuous data

Fitting the parameters of a Gaussian Bayesian network (that is, the regression coefficients for each variable against its parents) is done in the same way. Again, by default the parameters are estimated via maximum likelihood which corresponds to `method = "mle-g"`.

```> data(gaussian.test)
> pdag = iamb(gaussian.test)
> undirected.arcs(pdag)
```
```     from to
[1,] "B"  "D"
[2,] "D"  "B"
```
```> dag = set.arc(pdag, "D", "B")
> fitted = bn.fit(dag, gaussian.test, method = "mle-g")
> fitted
```
```
Bayesian network parameters

Parameters of node A (Gaussian distribution)

Conditional density: A
Coefficients:
(Intercept)
1.01
Standard deviation of the residuals: 1

Parameters of node B (Gaussian distribution)

Conditional density: B | D
Coefficients:
(Intercept)            D
-3.970        0.664
Standard deviation of the residuals: 0.219

Parameters of node C (Gaussian distribution)

Conditional density: C | A + B
Coefficients:
(Intercept)            A            B
2            2            2
Standard deviation of the residuals: 0.509

Parameters of node D (Gaussian distribution)

Conditional density: D
Coefficients:
(Intercept)
9.05
Standard deviation of the residuals: 4.56

Parameters of node E (Gaussian distribution)

Conditional density: E
Coefficients:
(Intercept)
3.49
Standard deviation of the residuals: 1.99

Parameters of node F (Gaussian distribution)

Conditional density: F | A + D + E + G
Coefficients:
(Intercept)            A            D            E            G
-0.00605      1.99485      1.00564      1.00258      1.49437
Standard deviation of the residuals: 0.996

Parameters of node G (Gaussian distribution)

Conditional density: G
Coefficients:
(Intercept)
5.03
Standard deviation of the residuals: 1.98
```

The functions `coefficients()`, `fitted()` and `residuals()` (manual page) allow an easy extraction of the quantities of interest from both a single variable and the whole network.

```> coefficients(fitted\$F)
```
```(Intercept)           A           D           E           G
-0.00605     1.99485     1.00564     1.00258     1.49437
```
```> str(residuals(fitted\$F))
```
``` num [1:5000] -0.861 1.271 -0.262 -0.479 -0.782 ...
```
```> str(fitted(fitted\$F))
```
``` num [1:5000] 25.6 35.5 22.3 23.8 25.3 ...
```
```> str(fitted(fitted))
```
```List of 7
\$ A: num [1:5000] 1.01 1.01 1.01 1.01 1.01 ...
\$ B: num [1:5000] 1.78 11.54 3.37 3.96 4.35 ...
\$ C: num [1:5000] 8.09 24.16 11.76 11.38 12 ...
\$ D: num [1:5000] 9.05 9.05 9.05 9.05 9.05 ...
\$ E: num [1:5000] 3.49 3.49 3.49 3.49 3.49 ...
\$ F: num [1:5000] 25.6 35.5 22.3 23.8 25.3 ...
\$ G: num [1:5000] 5.03 5.03 5.03 5.03 5.03 ...
```

As before, there are some functions to plot these quantities: `bn.fit.qqplot()`, `bn.fit.xyplot()` and `bn.fit.histogram()` (see their manual page). In addition to single-node plots such as those shown above, in the case of Gaussian Bayesian networks we can plotting all the nodes in a single plot.

```> bn.fit.qqplot(fitted)
``` ```> bn.fit.xyplot(fitted)
``` ```> bn.fit.histogram(fitted)
``` Note that if two parents are collinear, one will have `NA` as a regression coefficient for compatibility with `lm()`. If we specify `replace.unidentifiable = TRUE`, those coefficients will be replaced with zeroes.

```> gaussian.test\$D = gaussian.test\$E
> fitted = bn.fit(dag, gaussian.test, method = "mle-g")
> fitted\$F
```
```
Parameters of node F (Gaussian distribution)

Conditional density: F | A + D + E + G
Coefficients:
(Intercept)            A            D            E            G
9.20         1.88         1.00           NA         1.50
Standard deviation of the residuals: 4.69
```
```> fitted = bn.fit(dag, gaussian.test, method = "mle-g", replace.unidentifiable = TRUE)
> fitted\$F
```
```
Parameters of node F (Gaussian distribution)

Conditional density: F | A + D + E + G
Coefficients:
(Intercept)            A            D            E            G
9.20         1.88         1.00         0.00         1.50
Standard deviation of the residuals: 4.69
```

#### Hybrid data (mixed discrete and continuous)

The parameters of a conditional linear Gaussian network take the form of:

• conditional probability tables for discrete nodes, which can only have other discrete nodes as parents;
• collections of linear regressions for Gaussian nodes.

In the latter case, there is one regression for each configuration of the discrete parents of the Gaussian nodes; and each regression has all the Gaussian parents of the node as explanatory variables. For consistency, the default is again to use maximum likelihood estimates which corresponds to `method = "mle-cg"`.

```> data(clgaussian.test)
> dag = hc(clgaussian.test)
> fitted = bn.fit(dag, clgaussian.test, method = "mle-cg")
> fitted
```
```
Bayesian network parameters

Parameters of node A (multinomial distribution)

Conditional probability table:
a      b
0.0948 0.9052

Parameters of node B (multinomial distribution)

Conditional probability table:
a     b     c
0.410 0.188 0.402

Parameters of node C (multinomial distribution)

Conditional probability table:
a     b     c     d
0.249 0.251 0.398 0.102

Parameters of node D (conditional Gaussian distribution)

Conditional density: D | A + H
Coefficients:
0       1
(Intercept)   5.292  10.041
H             0.882   0.983
Standard deviation of the residuals:
0      1
0.510  0.307
Discrete parents' configurations:
A
0  a
1  b

Parameters of node E (conditional Gaussian distribution)

Conditional density: E | B + D
Coefficients:
0      1      2
(Intercept)  0.995  4.344  7.919
D            2.352  1.151  0.674
Standard deviation of the residuals:
0      1      2
0.508  0.992  1.519
Discrete parents' configurations:
B
0  a
1  b
2  c

Parameters of node F (multinomial distribution)

Conditional probability table:

, , C = a

B
F        a      b      c
a 0.0788 0.1498 0.5116
b 0.9212 0.8502 0.4884

, , C = b

B
F        a      b      c
a 0.4553 0.2629 0.4773
b 0.5447 0.7371 0.5227

, , C = c

B
F        a      b      c
a 0.6791 0.4230 0.7494
b 0.3209 0.5770 0.2506

, , C = d

B
F        a      b      c
a 0.8571 0.4545 0.7368
b 0.1429 0.5455 0.2632

Parameters of node G (conditional Gaussian distribution)

Conditional density: G | A + D + E + F
Coefficients:
0     1     2     3
(Intercept)  4.95  2.38  3.49  2.14
D            2.25  4.07  2.99  5.81
E            1.00  1.00  1.00  1.00
Standard deviation of the residuals:
0       1       2       3
0.0546  0.1495  0.2627  0.3518
Discrete parents' configurations:
A  F
0  a  a
1  b  a
2  a  b
3  b  b

Parameters of node H (Gaussian distribution)

Conditional density: H
Coefficients:
(Intercept)
2.34
Standard deviation of the residuals: 0.121
```

The functions `coefficients()`, `fitted()` and `residuals()` (manual page) work as before. Plotting functions can be applied to single nodes to produce plots analogous to those above. In the case of Gaussian nodes, one panel is produced for every configuration of the discrete parents; the labels match those in the output of `print()`.

```> bn.fit.qqplot(fitted\$G)
``` Note that both conditional probabilities (in discrete nodes) and regression coefficients (in Gaussian and conditional Gaussian nodes) may be estimates as `NA` for the same reasons discussed above. Using `replace.unidentifiable = TRUE` replaces all `NA`s with zeroes.

### Fitting the parameters (Bayesian Posterior estimates)

#### Discrete data

As an alternative to classic maximum likelihood approaches, we can also fit the parameters of the network in a Bayesian way using the expected value of their posterior distribution. The only difference from the workflow illustrated above is that `method = "bayes"` must be specified in `bn.fit()`.

```> pdag = iamb(learning.test)
> dag = set.arc(pdag, from = "A", to = "B")
> fitted = bn.fit(dag, learning.test, method = "bayes")
> fitted
```
```
Bayesian network parameters

Parameters of node A (multinomial distribution)

Conditional probability table:
a     b     c
0.334 0.334 0.332

Parameters of node B (multinomial distribution)

Conditional probability table:

A
B        a      b      c
a 0.8560 0.4449 0.1150
b 0.0252 0.2210 0.0945
c 0.1187 0.3341 0.7905

Parameters of node C (multinomial distribution)

Conditional probability table:
a      b      c
0.7433 0.2048 0.0519

Parameters of node D (multinomial distribution)

Conditional probability table:

, , C = a

A
D        a      b      c
a 0.8008 0.0925 0.1053
b 0.0903 0.8020 0.1118
c 0.1090 0.1054 0.7829

, , C = b

A
D        a      b      c
a 0.1808 0.8829 0.2470
b 0.1328 0.0703 0.4938
c 0.6863 0.0469 0.2592

, , C = c

A
D        a      b      c
a 0.4284 0.3412 0.1336
b 0.2026 0.3882 0.4443
c 0.3690 0.2707 0.4221

Parameters of node E (multinomial distribution)

Conditional probability table:

, , F = a

B
E        a      b      c
a 0.8052 0.2060 0.1194
b 0.0974 0.1798 0.1145
c 0.0974 0.6142 0.7661

, , F = b

B
E        a      b      c
a 0.4005 0.3168 0.2376
b 0.4902 0.3664 0.5067
c 0.1093 0.3168 0.2557

Parameters of node F (multinomial distribution)

Conditional probability table:
a     b
0.502 0.498
```

The imaginary or equivalent sample size of the prior distribution can be specified using the `iss` parameter (documented here); it specifies the weight of the prior compared to the sample and thus controls the smoothness of the posterior distribution. The weight is divided equally among the cells of each conditional probability table to obtain the same prior used by the Bayesian Dirichlet equivalent (BDe) score.

An alternative Bayesian estimator for related data sets may also be used if the data contain a grouping variable by setting `method = "hdir"`. Compared to the posterior arising from a flat prior in `method = "bayes"`, `method = "hdir"` pools information across the groups of observations identified by the levels of the grouping variable specified by the argument `group`. We can also specify the `iss` argument, which has the same meaning as before, as needed.

```> dag = tree.bayes(learning.test, training = "A")
> fitted = bn.fit(dag, learning.test, method = "hdir", group = "A")
```
```   A
B        a      b      c
a 0.8560 0.4449 0.1150
b 0.0252 0.2210 0.0945
c 0.1187 0.3341 0.7905
, , A = a

E
C        a      b      c
a 0.7417 0.7151 0.7639
b 0.2130 0.2384 0.1611
c 0.0453 0.0465 0.0750

, , A = b

E
C        a      b      c
a 0.7531 0.7281 0.7494
b 0.1997 0.2059 0.2100
c 0.0472 0.0661 0.0405

, , A = c

E
C        a      b      c
a 0.7618 0.7639 0.7320
b 0.1980 0.1739 0.2116
c 0.0402 0.0622 0.0564

, , A = a

C
D        a      b      c
a 0.8008 0.1808 0.4284
b 0.0903 0.1328 0.2025
c 0.1090 0.6863 0.3690

, , A = b

C
D        a      b      c
a 0.0925 0.8829 0.3412
b 0.8020 0.0703 0.3882
c 0.1054 0.0469 0.2707

, , A = c

C
D        a      b      c
a 0.1053 0.2470 0.1336
b 0.1118 0.4938 0.4443
c 0.7829 0.2592 0.4221

, , A = a

B
E       a     b     c
a 0.603 0.215 0.182
b 0.296 0.286 0.303
c 0.101 0.500 0.515

, , A = b

B
E       a     b     c
a 0.602 0.255 0.170
b 0.295 0.287 0.341
c 0.104 0.458 0.489

, , A = c

B
E       a     b     c
a 0.607 0.274 0.183
b 0.272 0.210 0.303
c 0.121 0.516 0.514

, , A = a

E
F       a     b     c
a 0.637 0.160 0.607
b 0.363 0.840 0.393

, , A = b

E
F       a     b     c
a 0.604 0.206 0.688
b 0.396 0.794 0.312

, , A = c

E
F       a     b     c
a 0.439 0.211 0.727
b 0.561 0.789 0.273
```
```> fitted
```
```
Parameters of node A (multinomial distribution)

Conditional probability table:
a     b     c
0.334 0.334 0.332

Parameters of node B (multinomial distribution)

Conditional probability table:

A
B        a      b      c
a 0.8560 0.4449 0.1150
b 0.0252 0.2210 0.0945
c 0.1187 0.3341 0.7905

Parameters of node C (multinomial distribution)

Conditional probability table:

, , A = a

E
C        a      b      c
a 0.7417 0.7151 0.7639
b 0.2130 0.2384 0.1611
c 0.0453 0.0465 0.0750

, , A = b

E
C        a      b      c
a 0.7531 0.7281 0.7494
b 0.1997 0.2059 0.2100
c 0.0472 0.0661 0.0405

, , A = c

E
C        a      b      c
a 0.7618 0.7639 0.7320
b 0.1980 0.1739 0.2116
c 0.0402 0.0622 0.0564

Parameters of node D (multinomial distribution)

Conditional probability table:

, , A = a

C
D        a      b      c
a 0.8008 0.1808 0.4284
b 0.0903 0.1328 0.2025
c 0.1090 0.6863 0.3690

, , A = b

C
D        a      b      c
a 0.0925 0.8829 0.3412
b 0.8020 0.0703 0.3882
c 0.1054 0.0469 0.2707

, , A = c

C
D        a      b      c
a 0.1053 0.2470 0.1336
b 0.1118 0.4938 0.4443
c 0.7829 0.2592 0.4221

Parameters of node E (multinomial distribution)

Conditional probability table:

, , A = a

B
E       a     b     c
a 0.603 0.215 0.182
b 0.296 0.286 0.303
c 0.101 0.500 0.515

, , A = b

B
E       a     b     c
a 0.602 0.255 0.170
b 0.295 0.287 0.341
c 0.104 0.458 0.489

, , A = c

B
E       a     b     c
a 0.607 0.274 0.183
b 0.272 0.210 0.303
c 0.121 0.516 0.514

Parameters of node F (multinomial distribution)

Conditional probability table:

, , A = a

E
F       a     b     c
a 0.637 0.160 0.607
b 0.363 0.840 0.393

, , A = b

E
F       a     b     c
a 0.604 0.206 0.688
b 0.396 0.794 0.312

, , A = c

E
F       a     b     c
a 0.439 0.211 0.727
b 0.561 0.789 0.273
```

### Fitting the parameters (Expectation-Maximization estimates)

When the `data` contain missing values, `bn.fit()` defaults to using the Expectation-Maximization algorithm to estimate parameters, as described here.

Last updated on `Tue Sep 12 10:10:49 2023` with bnlearn `4.9` and `R version 4.2.2 (2022-10-31)`.