## Creating custom fitted Bayesian networks

In general, there are three ways of creating a `bn.fit` object representing a Bayesian network:

1. a data-driven approach, learning it from a data set using `bn.fit()` and a network structure (in a `bn` object) as illustrated here;
2. an expert-driven approach, in which both the network structure and the parameters are specified by the user;
3. a hybrid approach combining the above.

How the parameters are specified depends on the type of data the Bayesian network encodes (discrete, continuous, a mixture of the two). The relevant functions are:

• `custom.fit()` (manual page), which takes a `bn` object encoding the network structure and a list with the parameters of the local distributions of the nodes;
• the assignment operator for `bn.fit` objects, which replaces the parameters of a single local distribution.

### Creating custom fitted Bayesian networks using expert knowledge

#### Discrete networks

For discrete data, the parameters of the local distribution are specified as conditional probability tables. Each column of such a table represents the probability distribution of the node conditional on a particular configuration of levels of the parents.

```> library(bnlearn)
> cptA = matrix(c(0.4, 0.6), ncol = 2, dimnames = list(NULL, c("LOW", "HIGH")))
> cptA
```
```     LOW HIGH
[1,] 0.4  0.6
```
```> cptB = matrix(c(0.8, 0.2), ncol = 2, dimnames = list(NULL, c("GOOD", "BAD")))
> cptB
```
```     GOOD BAD
[1,]  0.8 0.2
```
```> cptC = c(0.5, 0.5, 0.4, 0.6, 0.3, 0.7, 0.2, 0.8)
> dim(cptC) = c(2, 2, 2)
> dimnames(cptC) = list("C" = c("TRUE", "FALSE"), "A" =  c("LOW", "HIGH"),
> cptC
```
```, , B = GOOD

A
C       LOW HIGH
TRUE  0.5  0.4
FALSE 0.5  0.6

A
C       LOW HIGH
TRUE  0.3  0.2
FALSE 0.7  0.8
```

The tables are passed to `custom.fit()` as a list whose names are the labels of the nodes the conditional probability tables will be assigned to.

```> net = model2network("[A][B][C|A:B]")
> dfit = custom.fit(net, dist = list(A = cptA, B = cptB, C = cptC))
> dfit
```
```
Bayesian network parameters

Parameters of node A (multinomial distribution)

Conditional probability table:

LOW HIGH
0.4  0.6

Parameters of node B (multinomial distribution)

Conditional probability table:

0.8  0.2

Parameters of node C (multinomial distribution)

Conditional probability table:

, , B = GOOD

A
C       LOW HIGH
TRUE  0.5  0.4
FALSE 0.5  0.6

A
C       LOW HIGH
TRUE  0.3  0.2
FALSE 0.7  0.8
```

If one or more nodes in the network are associated with ordinal variables, their labels should be passed to `custom.fit()` with the `ordinal` argument; all other arguments are the same as above.

```> # for ordinal nodes it is nearly the same.
> dfit = custom.fit(net, dist = list(A = cptA, B = cptB, C = cptC),
+          ordinal = c("A", "B"))
> dfit
```
```
Bayesian network parameters

Parameters of node A (ordinal distribution)

Conditional probability table:

LOW HIGH
0.4  0.6

Parameters of node B (ordinal distribution)

Conditional probability table:

0.8  0.2

Parameters of node C (multinomial distribution)

Conditional probability table:

, , B = GOOD

A
C       LOW HIGH
TRUE  0.5  0.4
FALSE 0.5  0.6

A
C       LOW HIGH
TRUE  0.3  0.2
FALSE 0.7  0.8
```

Note that `custom.fit()` performs several checks on the local distributions:

1. The probabilities should sum up to one in each (conditional) distribution.
```> cpt.bad = matrix(c(0.8, 0.6), ncol = 2, dimnames = list(NULL, c("LOW", "HIGH")))
> custom.fit(net, dist = list(A = cpt.bad, B = cptB, C = cptC))
```
```## Error: the probability distribution of node A does not sum to one.
```
2. The dimension names must be consistent between different local distributions (the names of the levels of the parents are considered authoritative, so the error messages mention the child node).
```> cpt.bad = matrix(c(0.4, 0.6), ncol = 2, dimnames = list(NULL, c("BOGUS", "WRONG")))
> custom.fit(net, dist = list(A = cpt.bad, B = cptB, C = cptC))
```
```## Error: the levels of the parent A of node C do not match.
```
3. The dimensions must also be consistent across different local distributions.
```> cpt.bad = matrix(c(0.4, 0.5, 0.1), ncol = 3, dimnames = list(NULL, c("LOW", "AVERAGE", "HIGH")))
> custom.fit(net, dist = list(A = cpt.bad, B = cptB, C = cptC))
```
```## Error: the levels of the parent A of node C do not match.
```
4. The probabilities are checked for missing values.
```> cpt.bad = matrix(c(0.4, NaN), ncol = 2, dimnames = list(NULL, c("LOW", "HIGH")))
> custom.fit(net, dist = list(A = cpt.bad, B = cptB, C = cptC))
```
```## Error: the probabilities provided for node A do not form a valid conditional probability distribution.
```

#### Continuous networks

Continuous nodes are modelled as linear regressions in Gaussian Bayesian networks; as such, the relevant parameters of each local distribution are the regression coefficients (one for each of the parents) and the standard deviation of the residuals. Both should be stored as the elements of a list, respectively names `coef` and `sd`.

```> distA = list(coef = c("(Intercept)" = 2), sd = 1)
> distB = list(coef = c("(Intercept)" = 1), sd = 1.5)
> distC = list(coef = c("(Intercept)" = 0.5, "A" = 0.75, "B" = 1.32), sd = 0.4)
> cfit = custom.fit(net, dist = list(A = distA, B = distB, C = distC))
> cfit
```
```
Bayesian network parameters

Parameters of node A (Gaussian distribution)

Conditional density: A
Coefficients:
(Intercept)
2
Standard deviation of the residuals: 1

Parameters of node B (Gaussian distribution)

Conditional density: B
Coefficients:
(Intercept)
1
Standard deviation of the residuals: 1.5

Parameters of node C (Gaussian distribution)

Conditional density: C | A + B
Coefficients:
(Intercept)            A            B
0.50         0.75         1.32
Standard deviation of the residuals: 0.4
```

In addition, we can include fitted values and residuals for each node.

```> distA = list(coef = c("(Intercept)" = 2), fitted = 1:10, resid = rnorm(10))
> distB = list(coef = c("(Intercept)" = 1), fitted = 3:12, resid = rnorm(10))
> distC = list(coef = c("(Intercept)" = 0.5, "A" = 0.75, "B" = 1.32),
+           fitted = 2:11, resid = rnorm(10))
> cfit = custom.fit(net, dist = list(A = distA, B = distB, C = distC))
```

However, either all the nodes have both fitted values and residuals or none of them. If residuals are provided, `sd` is computed automatically by `custom.fit()`.

Another option is to use `lm()` to produce the parameter for each node, which is convenient when we need to use advanced regression options such as weights or we have missing values in the data.

```> distA = lm(A ~ 1, data = gaussian.test)
> distB = lm(B ~ 1, data = gaussian.test)
> distC = lm(C ~ A + B, data = gaussian.test)
> cfit = custom.fit(net, dist = list(A = distA, B = distB, C = distC))
```

We can also use `penalized()` to fit ridge and elastic net regressions as local distributions instead of using ordinary least squares estimates. (Note that using LASSO does not make sense in this context, all parents of a node are assumed to have non-zero coefficients or they would not be parents at all.)

```> library(penalized)
```
```Loading required package: survival
```
```Welcome to penalized. For extended examples, see vignette("penalized").
```
```> distC = penalized(C ~ A + B, data = gaussian.test,
+           lambda1 = 0, lambda2 = 0.5, trace = FALSE)
> cfit = custom.fit(net, dist = list(A = distA, B = distB, C = distC))
```

Again note that `custom.fit()` performs several checks on the local distributions:

1. We must have exactly one regression coeffient for each parent of a node, with labels to match.
```> distA = list(coef = c("(Intercept)" = 2), sd = 1)
> distB = list(coef = c("(Intercept)" = 1, "A" = 2), sd = 1.5)
> distC = list(coef = c("(Intercept)" = 0.5, "A" = 0.75, "B" = 1.32), sd = 0.4)
> cfit = custom.fit(net, dist = list(A = distA, B = distB, C = distC))
```
```## Error: wrong number of coefficients for node B.
```
```> distA = list(coef = c("(Intercept)" = 2), sd = 1)
> distB = list(coef = c("(Intercept)" = 1), sd = 1.5)
> distC = list(coef = c("(Intercept)" = 0.5, "A" = 0.75, "X" = 1.32), sd = 0.4)
> cfit = custom.fit(net, dist = list(A = distA, B = distB, C = distC))
```
```## Error: wrong regression coefficients for node C (B).
```
2. If both the residuals and their standard deviation are specified, they must match up to (a very lax) numerical precision.
```> distA = list(coef = c("(Intercept)" = 2), fitted = 1:10,
+           resid = rnorm(10), sd = 20)
> cfit = custom.fit(net, dist = list(A = distA, B = distB, C = distC))
```
```## Error: the reported standard deviation of the residuals of node A does not match the observed one.
```
3. Missing values are not allowed anywhere, even in the fitted values or in the residuals; regression coefficients are checked to be finite and standard errors are also checked to be non-negative.
```> distA = list(coef = c("(Intercept)" = 2), fitted = c(NA, 2:10),
+           resid = rnorm(10))
> cfit = custom.fit(net, dist = list(A = distA, B = distB, C = distC))
```
```## Error: fitted must be a vector of numeric values, the fitted values for node A given its parents.
```
```> distA = list(coef = c("(Intercept)" = 2), sd = -1)
> cfit = custom.fit(net, dist = list(A = distA, B = distB, C = distC))
```
```## Error: sd must be a non-negative scalar, the standard deviation of the residuals of node A.
```

#### Hybrid networks (mixed discrete and continuous)

In the case of hybrid data and conditional Gaussian Bayesian networks, the nature of each node and of its parents determine the form of the parameters of its local distribution:

1. if the node is discrete, it can only have discrete parents and therefore the parameters of its local distribution come in the form of a conditional probability table;
2. if the node is continuous and only has continuous parents, the parameters of its local distribution are the regression coefficients and the standard deviation of the residuals;
3. if the node is continuous and only has at least one discrete parent, the regression coefficients for the continuous parents are stored in a matrix. Each row corresponds to a regression coefficient, and each column to one configuration of the levels of the discrete parents. The standard deviations of the residuals for the configurations of the levels of the discrete parents are stored in a vector.

In other words:

1. discrete nodes are specified as in discrete Bayesian networks;
2. continuous nodes with continuous parents are specifies as in Gaussian Bayesian networks;
3. continuous nodes with discrete and (possibly) continuous parents get one set of regression coefficients and one standard deviation for each configuration of the discrete parents.

As a result specifying the parameters of each local distribution works as before in the first two cases.

```> cptA = matrix(c(0.4, 0.6), ncol = 2, dimnames = list(NULL, c("LOW", "HIGH")))
> distB = list(coef = c("(Intercept)" = 1), sd = 1.5)
```

In the third case, the parameters are structured as follows.

```> distC = list(coef = matrix(c(1.2, 2.3, 3.4, 4.5), ncol = 2,
+                dimnames = list(c("(Intercept)", "B"), NULL)),
+           sd = c(0.3, 0.6))
> distC
```
```\$coef
[,1] [,2]
(Intercept)  1.2  3.4
B            2.3  4.5

\$sd
 0.3 0.6
```
```> cgfit = custom.fit(net, dist = list(A = cptA, B = distB, C = distC))
> cgfit
```
```
Bayesian network parameters

Parameters of node A (multinomial distribution)

Conditional probability table:

LOW HIGH
0.4  0.6

Parameters of node B (Gaussian distribution)

Conditional density: B
Coefficients:
(Intercept)
1
Standard deviation of the residuals: 1.5

Parameters of node C (conditional Gaussian distribution)

Conditional density: C | A + B
Coefficients:
0    1
(Intercept)  1.2  3.4
B            2.3  4.5
Standard deviation of the residuals:
0    1
0.3  0.6
Discrete parents' configurations:
A
0   LOW
1  HIGH
```

Fitted values and residuals can be specified for both Gaussian and conditional Gaussian nodes. In the former case, all parents of the node are continuous and therefore the parameters have the same form as for nodes in a Gaussian Bayesian network. In the latter, we must also specify which configuration of the discrete parents is associated with each fitted value and residual.

```> fake = scale(rnorm(10))
> net = model2network("[A][B][C|A:B]")
> cptA = matrix(c(0.4, 0.6), ncol = 2, dimnames = list(NULL, c("LOW", "HIGH")))
> distB = list(coef = c("(Intercept)" = 1), resid = fake, fitted = fake)
> distC = list(coef = matrix(c(1.2, 2.3, 3.4, 4.5), ncol = 2,
+                dimnames = list(c("(Intercept)", "B"), NULL)),
+           resid = fake, fitted = fake,
+           configs = factor(c(rep(0, 5), rep(1, 5))))
> cgfit2 = custom.fit(net, dist = list(A = cptA, B = distB, C = distC))
```

### Creating custom fitted Bayesian networks using both data and expert knowledge

Specifying all the local distributions can be problematic in a large network. In most cases it can be more convenient to create a `bn.fit` object from (possibly dummy) data using `bn.fit()`, and then to modify just the local distributions of the nodes of interest.

• For discrete Bayesian networks (or discrete nodes in conditional Gaussian networks) we can extract the conditional probability table stored in the `bn.fit` object with `coef()`, modify it, and re-save it.
```> dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")
> fitted = bn.fit(dag, learning.test)
> fitted\$C
```
```
Parameters of node C (multinomial distribution)

Conditional probability table:
a      b      c
0.7434 0.2048 0.0518
```
```> cpt = coef(fitted\$C)
> cpt[1:3] = c(0.50, 0.25, 0.25)
> fitted\$C = cpt
> fitted\$C
```
```
Parameters of node C (multinomial distribution)

Conditional probability table:
a    b    c
0.50 0.25 0.25
```
• For Gaussian networks (and Gaussian nodes is conditional Gaussian networks) we can do the same or just assign a new set of regression coefficients.
```> dag = model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]")
> fitted = bn.fit(dag, gaussian.test)
> fitted\$F
```
```
Parameters of node F (Gaussian distribution)

Conditional density: F | A + D + E + G
Coefficients:
(Intercept)             A             D             E             G
-0.006047321   1.994853041   1.005636909   1.002577002   1.494373265
Standard deviation of the residuals: 0.9961475
```
```> fitted\$F = list(coef = c(0, 2, 1, 0.5, 0.5), sd = 0.42)
> fitted\$F
```
```
Parameters of node F (Gaussian distribution)

Conditional density: F | A + D + E + G
Coefficients:
(Intercept)            A            D            E            G
0.0          2.0          1.0          0.5          0.5
Standard deviation of the residuals: 0.42
```
We can also use `lm()` or `penalized()` to re-estimate the regression coefficients from data in a different way.
```> fitted\$F = penalized(F ~ A + D + E + G, data = gaussian.test, lambda2 = 5,
+              trace = FALSE)
> fitted\$F
```
```
Parameters of node F (Gaussian distribution)

Conditional density: F | A + D + E + G
Coefficients:
(Intercept)              A              D              E              G
-0.0007259323   1.9928727472   1.0055773818   1.0023303829   1.4939902546
Standard deviation of the residuals: 0.9961499
```
• For conditional Gaussian nodes,
```> dag = model2network("[A][B][C][H][D|A:H][F|B:C][E|B:D][G|A:D:E:F]")
> fitted = bn.fit(dag, clgaussian.test)
> fitted\$G
```
```
Parameters of node G (conditional Gaussian distribution)

Conditional density: G | A + D + E + F
Coefficients:
0         1         2         3
(Intercept)  4.949135  2.384710  3.485634  2.140170
D            2.252665  4.066430  2.989924  5.806728
E            1.002022  1.000199  1.003836  1.001002
Standard deviation of the residuals:
0           1           2           3
0.05456809  0.14946301  0.26272612  0.35182957
Discrete parents' configurations:
A  F
0  a  a
1  b  a
2  a  b
3  b  b
```
```> beta = coef(fitted\$G)
> beta[, "3"] = c(2, 1, 4)
> fitted\$G = list(coef = beta, sd = c(0.5, 0.5, 0.5, 0.5))
> fitted\$G
```
```
Parameters of node G (conditional Gaussian distribution)

Conditional density: G | A + D + E + F
Coefficients:
0         1         2         3
(Intercept)  4.949135  2.384710  3.485634  2.000000
D            2.252665  4.066430  2.989924  1.000000
E            1.002022  1.000199  1.003836  4.000000
Standard deviation of the residuals:
0    1    2    3
0.5  0.5  0.5  0.5
Discrete parents' configurations:
A  F
0  a  a
1  b  a
2  a  b
3  b  b
```
Last updated on `Tue Nov 8 16:01:59 2022` with bnlearn `4.9-20221107` and `R version 4.2.2 (2022-10-31)`.