## Entropy and Kullback-Leibler divergence

Entropy and the Kullback-Leibler divergence are fundamental quantities in information theory and in machine learning,
and they play a central role in learning and evaluating probabilistic models. **bnlearn** implements both
for discrete, Gaussian and conditional Gaussian networks. The full mathematical description of how they are
computed is in Scutari (2024, Algorithms).

Consider three different Bayesian networks for `learning.test`: the empty network, the generating model
for the data and the complete network.

> dag1 = empty.graph(names(learning.test))
> bn1 = bn.fit(dag1, learning.test, method = "mle")
> dag2 = model2network("[A][C][F][B|A][D|A:C][E|B:F]")
> bn2 = bn.fit(dag2, learning.test, method = "mle")
> dag3 = complete.graph(names(learning.test))
> bn3 = bn.fit(dag3, learning.test, method = "mle", replace.unidentifiable = TRUE)

The entropy, which can be computed with `H()`, gradually decreases as the Bayesian networks become more
informative for the data. Even the complete network is not a singular model, so all three entropies are strictly positive.

> H(bn1)

[1] 5.64615

> H(bn2)

[1] 4.76624

> H(bn3)

[1] 4.721232
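One way to see why the entropy tracks how informative the model is for the data: for a model fitted by maximum likelihood, the entropy of the fitted distribution coincides with the negative average log-likelihood of the training sample. A minimal base-R sketch for a single discrete variable (toy data, hypothetical, no **bnlearn** required):

```r
# Toy sample (hypothetical). After an MLE fit, the entropy of the fitted
# distribution equals the negative average log-likelihood of the data.
x <- c("a", "a", "b", "a", "c", "b", "a", "b", "c", "a")
p <- prop.table(table(x))      # maximum likelihood estimates of P(X)
H <- -sum(p * log(p))          # entropy, natural logarithm as in H()
avg_loglik <- mean(log(p[x]))  # average log-likelihood of the sample
all.equal(H, -avg_loglik)      # TRUE
```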

The Kullback-Leibler divergences, computed with `KL()`, are all positive since the Bayesian networks
represent different probability distributions.

> KL(bn2, bn1)

[1] 0.8797299

> KL(bn3, bn2)

[1] 0.04519399
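As a reminder of what `KL()` computes: over a finite space, KL(P ‖ Q) = Σ_x P(x) log(P(x)/Q(x)), which is zero if and only if P = Q and positive otherwise. A minimal base-R sketch with two hypothetical distributions:

```r
# KL(P || Q) over a finite space, with 0 * log(0 / q) taken as 0.
kl <- function(p, q) sum(ifelse(p > 0, p * log(p / q), 0))
p <- c(0.5, 0.3, 0.2)
q <- c(0.4, 0.4, 0.2)
kl(p, p)  # [1] 0: identical distributions
kl(p, q)  # [1] 0.02526715: positive for different distributions
```

Note that KL is not symmetric: `kl(q, p)` generally differs from `kl(p, q)`.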

Naturally, the Kullback-Leibler divergence between two Bayesian networks in the same equivalence class will be zero
because they encode the same probability distribution. We can verify that by changing the direction of the
arc `A` → `B` in `bn2` to produce `bn4`, which is score-equivalent.

> dag4 = model2network("[A|B][C][F][B][D|A:C][E|B:F]")
> bn4 = bn.fit(dag4, learning.test, method = "mle")
> KL(bn2, bn4)

[1] 0

In the case of Gaussian and conditional Gaussian networks, `H()` and `KL()` compute the
differential entropy and the Kullback-Leibler divergence. The former can be negative!

> dag1 = model2network("[A][B|A]")
> bn1 = custom.fit(dag1, list(A = list(coef = 0, sd = 1e-3),
+   B = list(coef = c(0, 1), sd = 1)))
> H(bn1)

[1] -4.069878
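The value above can be checked against the closed form: by the chain rule, the differential entropy of a Gaussian network decomposes over the nodes as Σ_i ½ log(2πe σ_i²), where σ_i is the conditional standard deviation of node i. A base-R check:

```r
# Differential entropy of a single Gaussian node with standard deviation sd.
# It is negative whenever sd is small enough that 2 * pi * e * sd^2 < 1,
# and diverges to -Inf as sd -> 0 (a singular node).
h_gauss <- function(sd) 0.5 * log(2 * pi * exp(1) * sd^2)
h_gauss(1e-3) + h_gauss(1)  # [1] -4.069878, matching H(bn1) above
```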

Note the following corner cases:

- `H()` returns `-Inf` if at least one continuous variable in the network is singular.

> dag1 = model2network("[A][B|A]")
> bn1 = custom.fit(dag1, list(A = list(coef = 0, sd = 0),
+   B = list(coef = c(0, 1), sd = 1)))
> H(bn1)

[1] -Inf

- `H()` does not return `-Inf` even if at least one discrete variable has a zero-one distribution.

> dag1 = model2network("[A][B|A]")
> cptA = matrix(c(0.4, 0.6), ncol = 2, dimnames = list(NULL, c("a", "b")))
> cptB = matrix(c(0.8, 0.2, 0, 1), ncol = 2,
+   dimnames = list(c("a", "b"), c("a", "b")))
> bn1 = custom.fit(dag1, list(A = cptA, B = cptB))
> H(bn1)

[1] 0.8731726

- `KL()` returns `Inf` or `-Inf` if at least one Bayesian network is singular. This is true for Gaussian and conditional Gaussian networks. For discrete networks, the second network must contain a zero-one distribution for this to happen.

> dag1 = model2network("[A][B|A]")
> bn1 = custom.fit(dag1, list(A = list(coef = 0, sd = 0),
+   B = list(coef = c(0, 1), sd = 1)))
> bn2 = custom.fit(dag1, list(A = list(coef = 0, sd = 1),
+   B = list(coef = c(0, 1), sd = 1)))
> KL(bn1, bn2)

[1] Inf

> KL(bn2, bn1)

[1] -Inf

> dag1 = model2network("[A][B|A]")
> cptA = matrix(c(0.4, 0.6), ncol = 2, dimnames = list(NULL, c("a", "b")))
> cptB = matrix(c(0.8, 0.2, 0, 1), ncol = 2,
+   dimnames = list(c("a", "b"), c("a", "b")))
> bn1 = custom.fit(dag1, list(A = cptA, B = cptB))
> cptB = matrix(c(0.8, 0.2, 0.4, 0.6), ncol = 2,
+   dimnames = list(c("a", "b"), c("a", "b")))
> bn2 = custom.fit(dag1, list(A = cptA, B = cptB))
> KL(bn2, bn1)

[1] Inf

> KL(bn1, bn2)

[1] 0.3064954

- If any parameter is `NA`, both `H()` and `KL()` will return `NA`.

> class(bn1) = NULL
> bn1$B$prob[, "b"] = c(NA, NA)
> class(bn1) = c("bn.fit", "bn.fit.dnet")
> H(bn1)

[1] NA

> KL(bn1, bn2)

[1] NA

> KL(bn2, bn1)

[1] NA
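The discrete corner cases above can be checked by hand: the chain rule reproduces the entropy `0.8731726`, and enumerating the joint distributions of `(A, B)` reproduces the KL divergences `0.3064954` and `Inf`. A minimal base-R sketch using the same CPT values (no **bnlearn** required):

```r
# Entropy and KL with the convention 0 * log(0) = 0.
h  <- function(p) -sum(ifelse(p > 0, p * log(p), 0))
kl <- function(p, q) sum(ifelse(p > 0, p * log(p / q), 0))

pA <- c(0.4, 0.6)
# Chain rule: H(A, B) = H(A) + sum_a P(A = a) * H(B | A = a).
h(pA) + pA[1] * h(c(0.8, 0.2)) + pA[2] * h(c(0, 1))  # [1] 0.8731726

# Joint distributions of (A, B); rows index A, columns index B.
joint1 <- pA * rbind(c(0.8, 0.2), c(0, 1))      # bn1, with a zero-one row
joint2 <- pA * rbind(c(0.8, 0.2), c(0.4, 0.6))  # bn2
kl(joint1, joint2)  # [1] 0.3064954, matching KL(bn1, bn2)
kl(joint2, joint1)  # [1] Inf: bn1 puts zero mass where bn2 does not
```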

Last updated on `Mon Aug 5 02:44:12 2024` with **bnlearn** `5.0` and `R version 4.4.1 (2024-06-14)`.