Creating custom fitted Bayesian networks
In general, there are three ways of creating a bn.fit object representing a Bayesian network:
- a data-driven approach, learning it from a data set using bn.fit() and a network structure (in a bn object) as illustrated here;
- an expert-driven approach, in which both the network structure and the parameters are specified by the user;
- a hybrid approach combining the above.
How the parameters are specified depends on the type of data the Bayesian network encodes (discrete, continuous, a mixture of the two). The relevant functions are:
- custom.fit() (manual page), which takes a bn object encoding the network structure and a list with the parameters of the local distributions of the nodes;
- the assignment operator for bn.fit objects, which replaces the parameters of a single local distribution.
Creating custom fitted Bayesian networks using expert knowledge
Discrete networks
For discrete data, the parameters of the local distribution are specified as conditional probability tables. Each column of such a table represents the probability distribution of the node conditional on a particular configuration of levels of the parents.
> library(bnlearn)
> cptA = matrix(c(0.4, 0.6), ncol = 2, dimnames = list(NULL, c("LOW", "HIGH")))
> cptA
     LOW HIGH
[1,] 0.4  0.6
> cptB = matrix(c(0.8, 0.2), ncol = 2, dimnames = list(NULL, c("GOOD", "BAD")))
> cptB
     GOOD BAD
[1,]  0.8 0.2
> cptC = c(0.5, 0.5, 0.4, 0.6, 0.3, 0.7, 0.2, 0.8)
> dim(cptC) = c(2, 2, 2)
> dimnames(cptC) = list("C" = c("TRUE", "FALSE"), "A" = c("LOW", "HIGH"),
+        "B" = c("GOOD", "BAD"))
> cptC
, , B = GOOD

       A
C       LOW HIGH
  TRUE  0.5  0.4
  FALSE 0.5  0.6

, , B = BAD

       A
C       LOW HIGH
  TRUE  0.3  0.2
  FALSE 0.7  0.8
The tables are passed to custom.fit() as a list whose names are the labels of the nodes the conditional probability tables will be assigned to.
> net = model2network("[A][B][C|A:B]")
> dfit = custom.fit(net, dist = list(A = cptA, B = cptB, C = cptC))
> dfit
  Bayesian network parameters

  Parameters of node A (multinomial distribution)

Conditional probability table:
  LOW HIGH 
  0.4  0.6 

  Parameters of node B (multinomial distribution)

Conditional probability table:
 GOOD  BAD 
  0.8  0.2 

  Parameters of node C (multinomial distribution)

Conditional probability table:

, , B = GOOD

       A
C       LOW HIGH
  TRUE  0.5  0.4
  FALSE 0.5  0.6

, , B = BAD

       A
C       LOW HIGH
  TRUE  0.3  0.2
  FALSE 0.7  0.8
If one or more nodes in the network are associated with ordinal variables, their labels should be passed to custom.fit() with the ordinal argument; all other arguments are the same as above.
> # for ordinal nodes it is nearly the same.
> dfit = custom.fit(net, dist = list(A = cptA, B = cptB, C = cptC),
+        ordinal = c("A", "B"))
> dfit
  Bayesian network parameters

  Parameters of node A (ordinal distribution)

Conditional probability table:
  LOW HIGH 
  0.4  0.6 

  Parameters of node B (ordinal distribution)

Conditional probability table:
 GOOD  BAD 
  0.8  0.2 

  Parameters of node C (multinomial distribution)

Conditional probability table:

, , B = GOOD

       A
C       LOW HIGH
  TRUE  0.5  0.4
  FALSE 0.5  0.6

, , B = BAD

       A
C       LOW HIGH
  TRUE  0.3  0.2
  FALSE 0.7  0.8
Note that custom.fit() performs several checks on the local distributions:
- The probabilities should sum up to one in each (conditional) distribution.
> cpt.bad = matrix(c(0.8, 0.6), ncol = 2, dimnames = list(NULL, c("LOW", "HIGH")))
> custom.fit(net, dist = list(A = cpt.bad, B = cptB, C = cptC))
## Error: the probability distribution of node A does not sum to one.
- The dimension names must be consistent between different local distributions (the names of
the levels of the parents are considered authoritative, so the error messages mention the child
node).
> cpt.bad = matrix(c(0.4, 0.6), ncol = 2, dimnames = list(NULL, c("BOGUS", "WRONG")))
> custom.fit(net, dist = list(A = cpt.bad, B = cptB, C = cptC))
## Error: the levels of the parent A of node C do not match.
- The dimensions must also be consistent across different local distributions.
> cpt.bad = matrix(c(0.4, 0.5, 0.1), ncol = 3, dimnames = list(NULL, c("LOW", "AVERAGE", "HIGH")))
> custom.fit(net, dist = list(A = cpt.bad, B = cptB, C = cptC))
## Error: the levels of the parent A of node C do not match.
- The probabilities are checked for missing values.
> cpt.bad = matrix(c(0.4, NaN), ncol = 2, dimnames = list(NULL, c("LOW", "HIGH")))
> custom.fit(net, dist = list(A = cpt.bad, B = cptB, C = cptC))
## Error: the probabilities provided for node A do not form a valid conditional probability distribution.
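As an aside, the first of these checks can be anticipated when building a table by hand. A minimal sketch in plain base R (not a bnlearn function), summing cptC over the configurations of its parents:

> # each conditional distribution of C, i.e. each combination of the levels of
> # the parents A and B, should sum to exactly one.
> apply(cptC, MARGIN = c("A", "B"), FUN = sum)
      B
A      GOOD BAD
  LOW     1   1
  HIGH    1   1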
Continuous networks
Continuous nodes are modelled as linear regressions in Gaussian Bayesian networks; as such, the relevant parameters of each local distribution are the regression coefficients (one for each of the parents, plus the intercept) and the standard deviation of the residuals. Both should be stored as the elements of a list, named coef and sd respectively.
> distA = list(coef = c("(Intercept)" = 2), sd = 1)
> distB = list(coef = c("(Intercept)" = 1), sd = 1.5)
> distC = list(coef = c("(Intercept)" = 0.5, "A" = 0.75, "B" = 1.32), sd = 0.4)
> cfit = custom.fit(net, dist = list(A = distA, B = distB, C = distC))
> cfit
  Bayesian network parameters

  Parameters of node A (Gaussian distribution)

Conditional density: A
Coefficients:
(Intercept)  
          2  
Standard deviation of the residuals: 1 

  Parameters of node B (Gaussian distribution)

Conditional density: B
Coefficients:
(Intercept)  
          1  
Standard deviation of the residuals: 1.5 

  Parameters of node C (Gaussian distribution)

Conditional density: C | A + B
Coefficients:
(Intercept)            A            B  
       0.50         0.75         1.32  
Standard deviation of the residuals: 0.4 
In addition, we can include fitted values and residuals for each node.
> distA = list(coef = c("(Intercept)" = 2), fitted = 1:10, resid = rnorm(10))
> distB = list(coef = c("(Intercept)" = 1), fitted = 3:12, resid = rnorm(10))
> distC = list(coef = c("(Intercept)" = 0.5, "A" = 0.75, "B" = 1.32),
+        fitted = 2:11, resid = rnorm(10))
> cfit = custom.fit(net, dist = list(A = distA, B = distB, C = distC))
However, either all the nodes must have both fitted values and residuals, or none of them should. If the residuals are provided but sd is not, sd is computed automatically by custom.fit().
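As a quick check of the last point, the automatically computed standard deviation can be read back from the fitted object created above (its value depends on the rnorm() draws, so no output is shown here):

> # sd was not supplied for node A, so custom.fit() derived it from resid.
> cfit$A$sd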
Another option is to use lm() to produce the parameters for each node, which is convenient when we need advanced regression options such as weights, or when there are missing values in the data.
> distA = lm(A ~ 1, data = gaussian.test)
> distB = lm(B ~ 1, data = gaussian.test)
> distC = lm(C ~ A + B, data = gaussian.test)
> cfit = custom.fit(net, dist = list(A = distA, B = distB, C = distC))
We can also use penalized() to fit ridge and elastic net regressions as local distributions instead of using ordinary least squares estimates. (Note that using the LASSO does not make sense in this context: all parents of a node are assumed to have non-zero coefficients, or they would not be parents at all.)
> library(penalized)
> distC = penalized(C ~ A + B, data = gaussian.test,
+          lambda1 = 0, lambda2 = 0.5, trace = FALSE)
> cfit = custom.fit(net, dist = list(A = distA, B = distB, C = distC))
Again, note that custom.fit() performs several checks on the local distributions:
- We must have exactly one regression coefficient for each parent of a node, with matching labels.
> distA = list(coef = c("(Intercept)" = 2), sd = 1)
> distB = list(coef = c("(Intercept)" = 1, "A" = 2), sd = 1.5)
> distC = list(coef = c("(Intercept)" = 0.5, "A" = 0.75, "B" = 1.32), sd = 0.4)
> cfit = custom.fit(net, dist = list(A = distA, B = distB, C = distC))
## Error: wrong number of coefficients for node B.
> distA = list(coef = c("(Intercept)" = 2), sd = 1)
> distB = list(coef = c("(Intercept)" = 1), sd = 1.5)
> distC = list(coef = c("(Intercept)" = 0.5, "A" = 0.75, "X" = 1.32), sd = 0.4)
> cfit = custom.fit(net, dist = list(A = distA, B = distB, C = distC))
## Error: wrong regression coefficients for node C (B).
- If both the residuals and their standard deviation are specified, they must match up to (a
very lax) numerical precision.
> distA = list(coef = c("(Intercept)" = 2), fitted = 1:10,
+        resid = rnorm(10), sd = 20)
> cfit = custom.fit(net, dist = list(A = distA, B = distB, C = distC))
## Error: the reported standard deviation of the residuals of node A does not match the observed one.
- Missing values are not allowed anywhere, not even in the fitted values or in the residuals; regression coefficients are checked to be finite, and the standard deviations of the residuals are checked to be non-negative.
> distA = list(coef = c("(Intercept)" = 2), fitted = c(NA, 2:10),
+        resid = rnorm(10))
> cfit = custom.fit(net, dist = list(A = distA, B = distB, C = distC))
## Error: fitted must be a vector of numeric values, the fitted values for node A given its parents.
> distA = list(coef = c("(Intercept)" = 2), sd = -1)
> cfit = custom.fit(net, dist = list(A = distA, B = distB, C = distC))
## Error: sd must be a non-negative scalar, the standard deviation of the residuals of node A.
Hybrid networks (mixed discrete and continuous)
In the case of hybrid data and conditional Gaussian Bayesian networks, the nature of each node and of its parents determines the form of the parameters of its local distribution:
- if the node is discrete, it can only have discrete parents and therefore the parameters of its local distribution come in the form of a conditional probability table;
- if the node is continuous and only has continuous parents, the parameters of its local distribution are the regression coefficients and the standard deviation of the residuals;
- if the node is continuous and has at least one discrete parent, the regression coefficients for the continuous parents are stored in a matrix. Each row corresponds to a regression coefficient, and each column to one configuration of the levels of the discrete parents. The standard deviations of the residuals for the configurations of the levels of the discrete parents are stored in a vector.
In other words:
- discrete nodes are specified as in discrete Bayesian networks;
- continuous nodes with continuous parents are specified as in Gaussian Bayesian networks;
- continuous nodes with discrete and (possibly) continuous parents get one set of regression coefficients and one standard deviation for each configuration of the discrete parents.
As a result, specifying the parameters of each local distribution works as before in the first two cases.
> cptA = matrix(c(0.4, 0.6), ncol = 2, dimnames = list(NULL, c("LOW", "HIGH")))
> distB = list(coef = c("(Intercept)" = 1), sd = 1.5)
In the third case, the parameters are structured as follows.
> distC = list(coef = matrix(c(1.2, 2.3, 3.4, 4.5), ncol = 2,
+          dimnames = list(c("(Intercept)", "B"), NULL)),
+        sd = c(0.3, 0.6))
> distC
$coef
            [,1] [,2]
(Intercept)  1.2  3.4
B            2.3  4.5

$sd
[1] 0.3 0.6
> cgfit = custom.fit(net, dist = list(A = cptA, B = distB, C = distC))
> cgfit
  Bayesian network parameters

  Parameters of node A (multinomial distribution)

Conditional probability table:
  LOW HIGH 
  0.4  0.6 

  Parameters of node B (Gaussian distribution)

Conditional density: B
Coefficients:
(Intercept)  
          1  
Standard deviation of the residuals: 1.5 

  Parameters of node C (conditional Gaussian distribution)

Conditional density: C | A + B
Coefficients:
               0    1
(Intercept)  1.2  3.4
B            2.3  4.5
Standard deviation of the residuals:
    0    1  
  0.3  0.6  
Discrete parents' configurations:
      A
0   LOW
1  HIGH
Fitted values and residuals can be specified for both Gaussian and conditional Gaussian nodes. In the former case, all parents of the node are continuous and therefore the parameters have the same form as for nodes in a Gaussian Bayesian network. In the latter, we must also specify which configuration of the discrete parents is associated with each fitted value and residual.
> fake = scale(rnorm(10))
> net = model2network("[A][B][C|A:B]")
> cptA = matrix(c(0.4, 0.6), ncol = 2, dimnames = list(NULL, c("LOW", "HIGH")))
> distB = list(coef = c("(Intercept)" = 1), resid = fake, fitted = fake)
> distC = list(coef = matrix(c(1.2, 2.3, 3.4, 4.5), ncol = 2,
+          dimnames = list(c("(Intercept)", "B"), NULL)),
+        resid = fake, fitted = fake,
+        configs = factor(c(rep(0, 5), rep(1, 5))))
> cgfit2 = custom.fit(net, dist = list(A = cptA, B = distB, C = distC))
Creating custom fitted Bayesian networks using both data and expert knowledge
Specifying all the local distributions can be problematic in a large network. In most cases it can be more convenient to create a bn.fit object from (possibly dummy) data using bn.fit(), and then to modify just the local distributions of the nodes of interest.
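For example, when no suitable data are available, a bn.fit object with the right structure can be estimated from throw-away simulated data and then overwritten node by node. A minimal sketch (the dummy data frame below is made up for illustration):

> # sketch: learn parameters from placeholder Gaussian data just to obtain a
> # bn.fit object, then replace the local distributions of interest.
> dummy = data.frame(A = rnorm(100), B = rnorm(100), C = rnorm(100))
> skeleton = bn.fit(model2network("[A][B][C|A:B]"), data = dummy)
> skeleton$A = list(coef = c("(Intercept)" = 2), sd = 1)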
- For discrete Bayesian networks (or discrete nodes in conditional Gaussian networks) we can extract the conditional probability table stored in the bn.fit object with coef(), modify it, and re-save it.
> dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")
> fitted = bn.fit(dag, learning.test)
> fitted$C
  Parameters of node C (multinomial distribution)

Conditional probability table:
      a      b      c 
 0.7434 0.2048 0.0518 
> cpt = coef(fitted$C)
> cpt[1:3] = c(0.50, 0.25, 0.25)
> fitted$C = cpt
> fitted$C
  Parameters of node C (multinomial distribution)

Conditional probability table:
    a    b    c 
 0.50 0.25 0.25 
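The same workflow applies to nodes that have parents, in which case each column of the extracted table is one conditional distribution. A small sketch along the same lines (the replacement probabilities are made up for illustration):

> # node B has parent A in this network; replace the distribution of B given A = "a".
> cpt = coef(fitted$B)
> cpt[, "a"] = c(0.2, 0.3, 0.5)
> fitted$B = cpt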
- For Gaussian networks (and Gaussian nodes in conditional Gaussian networks) we can do the same, or just assign a new set of regression coefficients.
> dag = model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]")
> fitted = bn.fit(dag, gaussian.test)
> fitted$F
  Parameters of node F (Gaussian distribution)

Conditional density: F | A + D + E + G
Coefficients:
 (Intercept)            A            D            E            G  
-0.006047321  1.994853041  1.005636909  1.002577002  1.494373265  
Standard deviation of the residuals: 0.9961475 
> fitted$F = list(coef = c(0, 2, 1, 0.5, 0.5), sd = 0.42)
> fitted$F
  Parameters of node F (Gaussian distribution)

Conditional density: F | A + D + E + G
Coefficients:
(Intercept)           A           D           E           G  
        0.0         2.0         1.0         0.5         0.5  
Standard deviation of the residuals: 0.42 
We can also use lm() or penalized() to re-estimate the regression coefficients from the data in a different way.
> fitted$F = penalized(F ~ A + D + E + G, data = gaussian.test, lambda2 = 5,
+              trace = FALSE)
> fitted$F
  Parameters of node F (Gaussian distribution)

Conditional density: F | A + D + E + G
Coefficients:
  (Intercept)              A              D              E              G  
-0.0007259323   1.9928727472   1.0055773818   1.0023303829   1.4939902546  
Standard deviation of the residuals: 0.9961499 
- For conditional Gaussian nodes, we can modify the matrix of regression coefficients and the vector of standard deviations of the residuals in the same way.
> dag = model2network("[A][B][C][H][D|A:H][F|B:C][E|B:D][G|A:D:E:F]")
> fitted = bn.fit(dag, clgaussian.test)
> fitted$G
  Parameters of node G (conditional Gaussian distribution)

Conditional density: G | A + D + E + F
Coefficients:
                    0         1         2         3
(Intercept)  4.949135  2.384710  3.485634  2.140170
D            2.252665  4.066430  2.989924  5.806728
E            1.002022  1.000199  1.003836  1.001002
Standard deviation of the residuals:
         0          1          2          3  
0.05456809 0.14946301 0.26272612 0.35182957  
Discrete parents' configurations:
   A  F
0  a  a
1  b  a
2  a  b
3  b  b
> beta = coef(fitted$G)
> beta[, "3"] = c(2, 1, 4)
> fitted$G = list(coef = beta, sd = c(0.5, 0.5, 0.5, 0.5))
> fitted$G
  Parameters of node G (conditional Gaussian distribution)

Conditional density: G | A + D + E + F
Coefficients:
                    0         1         2         3
(Intercept)  4.949135  2.384710  3.485634  2.000000
D            2.252665  4.066430  2.989924  1.000000
E            1.002022  1.000199  1.003836  4.000000
Standard deviation of the residuals:
  0    1    2    3  
0.5  0.5  0.5  0.5  
Discrete parents' configurations:
   A  F
0  a  a
1  b  a
2  a  b
3  b  b
Mon Aug 5 02:39:53 2024 with bnlearn 5.0 and R version 4.4.1 (2024-06-14).