## Approximate Monte Carlo inference

Exact inference is an extremely powerful, if computationally expensive, tool for using Bayesian networks in applications. However, it is limited by the belief propagation algorithm that underlies it: it can only answer queries that use hard evidence and some limited forms of soft evidence.

On the other hand, approximate inference based on (Monte Carlo) particle filters scales better to larger networks and has virtually no limits on the complexity of the queries it can answer. Basically, it works like this:

- Generate a large number of random samples from the Bayesian network.
- Check how many of these samples match the conditions set by the evidence.
- Check how many of these samples also match the desired values for the query variables.
- Use these two numbers to estimate the probability of the event of interest.

In practice, this is not much more complex than running `subset()` on a data frame full of observations.
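For intuition, the four steps above can be sketched by hand with `rbn()` (bnlearn's forward sampler) and `subset()`. This is only an illustrative sketch, using a network fitted from the `learning.test` data shipped with bnlearn:

```r
library(bnlearn)

# fit a discrete Bayesian network from the learning.test data.
dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")
fitted = bn.fit(dag, learning.test)

particles = rbn(fitted, n = 10^4)          # 1. generate random samples
matching = subset(particles, D == "c")     # 2. keep those matching the evidence
event = subset(matching, A == "a")         # 3. keep those also matching the event
nrow(event) / nrow(matching)               # 4. estimate P(A = "a" | D = "c")
```

This is essentially what logic sampling does internally, modulo bookkeeping.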

### Evaluating conditional probability queries

`cpquery()` (documented here) implements this approach to estimate the answer to *conditional probability queries*. That is, we encode a Bayesian network as a `bn.fit` object,

```
> dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")
> dbn = bn.fit(dag, learning.test)
```

and then we ask it the probability of a certain *event* of interest assuming that we have observed some *evidence*. Say that the event is `A` taking value `"a"`, and the evidence is `D` taking value `"c"`.

```
> cpquery(dbn, event = (A == "a"), evidence = (D == "c"))
[1] 0.2299546
```

If we run this command again, we will get a slightly different result because of the simulation noise inherent to Monte Carlo methods.

```
> cpquery(dbn, event = (A == "a"), evidence = (D == "c"))
[1] 0.2457627
```

We can reduce the simulation noise by increasing the number of random samples we generate, at the cost of the query taking longer.

```
> cpquery(dbn, event = (A == "a"), evidence = (D == "c"), n = 10^7)
[1] 0.2412176
```
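One rough way to gauge the simulation noise (a sketch, not part of the bnlearn API) is to repeat the same query several times and look at the spread of the estimates:

```r
library(bnlearn)

# rebuild the discrete network used in the queries above.
dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")
dbn = bn.fit(dag, learning.test)

# repeat the query 20 times and measure the spread of the estimates;
# the standard deviation shrinks roughly as 1 / sqrt(n) as n increases.
estimates = replicate(20, cpquery(dbn, event = (A == "a"), evidence = (D == "c")))
sd(estimates)
```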

Since `cpquery()` works similarly to `subset()`, we can use arbitrary expressions for both the event and the evidence. For instance, either can reference multiple values of the same variable,

```
> cpquery(dbn, event = (A == "a"), evidence = (D %in% c("a", "c")))
[1] 0.4417008
> cpquery(dbn, event = (A %in% c("a", "b")), evidence = (D == "c"))
[1] 0.3319477
```

involve multiple variables,

```
> cpquery(dbn, event = (A == "a") & (B != "A"), evidence = (D %in% c("a", "c")))
[1] 0.4375907
> cpquery(dbn, event = (A %in% c("a", "b")),
+         evidence = (D == "c") & ((C == "a") | (F == "b")))
[1] 0.2882883
```

and string together different expressions with logical operators. The key limitation in writing these expressions is that they must evaluate to a vector of logical values, one for each of the random samples used internally by `cpquery()`. As a side effect, this makes it possible to write (unconditional) queries without evidence as follows:

```
> cpquery(dbn, event = (A == "a"), evidence = TRUE)
[1] 0.3361
```
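Since `A` is a root node, its marginal distribution is stored directly in the fitted network, which gives a quick sanity check for the unconditional query above (a sketch; `coef()` extracts the fitted parameters):

```r
library(bnlearn)

# rebuild the discrete network used in the queries above.
dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")
dbn = bn.fit(dag, learning.test)

# A has no parents, so its conditional probability table is just the
# marginal distribution that the unconditional query estimates.
coef(dbn$A)
```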

This flexibility is even more useful when working with Bayesian networks containing continuous nodes. For instance, consider a Gaussian network.

```
> dag = model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]")
> gbn = bn.fit(dag, gaussian.test)
```

`cpquery()` can check individual tails of one or more nodes,

```
> cpquery(gbn, event = (F >= quantile(gaussian.test$F, 0.90)), evidence = (A <= 0))
[1] 0.03043478
> cpquery(gbn, event = (abs(F) >= quantile(gaussian.test$F, 0.95)),
+         evidence = (abs(A) >= quantile(gaussian.test$A, 0.90)))
[1] 0.1384758
```

use simultaneous variable transforms,

```
> cpquery(gbn, event = (F >= A^2), evidence = (A <= 0))
[1] 0.998806
> cpquery(gbn, event = (log(2 + A + F) >= D), evidence = (A <= 0))
[1] 0.06580494
```

and use complex variable transforms.

```
> cpquery(gbn, event = rank(F) > rank(A), evidence = (A >= 0))
[1] 0.4316872
```

### Sampling from conditional probability distributions

`cpdist()` works like a partial implementation of `cpquery()`: it generates the random samples and filters them using the provided evidence, but it does not compute the probability of an event. Instead, it returns the random samples for further analyses.

```
> cpdist(gbn, nodes = c("A", "C", "F"), evidence = (A >= 0), n = 10)
          A         C        F
1 2.9188487 10.757771 25.41570
2 2.4546295  6.325242 21.23949
3 0.5581860 13.255901 23.38719
4 0.6666492 -2.032756 12.80954
5 2.1796092 13.023408 28.25364
6 0.7372265  8.038061 24.04888
7 0.3968961 10.764516 25.18689
8 1.7306326  7.852705 17.97200
9 1.8424067 10.724754 25.60063
```

For this purpose, `cpdist()` has a `nodes` argument (to specify which variables should be returned for the random samples) but no `event` argument.
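As an example of such further analyses (a sketch, not the only option), the returned samples can be used to estimate conditional moments or densities:

```r
library(bnlearn)

# rebuild the Gaussian network used in the queries above.
dag = model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]")
gbn = bn.fit(dag, gaussian.test)

# draw samples conditional on A >= 0 ...
particles = cpdist(gbn, nodes = c("A", "F"), evidence = (A >= 0), n = 10^4)
# ... and estimate the conditional mean and density of F.
mean(particles$F)            # Monte Carlo estimate of E(F | A >= 0)
plot(density(particles$F))   # kernel density estimate of F given A >= 0
```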

### Likelihood weighting

All the examples above use logic sampling, the default approach in both `cpquery()` and `cpdist()`. Logic sampling has a big limitation: if the evidence has a small probability, most of the random samples generated will be discarded because they do not match it, wasting running time and limiting the precision of subsequent analyses. To address this issue, `cpquery()` and `cpdist()` also implement *likelihood weighting*, an alternative method that generates only random samples that match the evidence and gives them weights to correct their distribution.

Likelihood weighting is used when `method = "lw"`, as opposed to the default `method = "ls"`. Unlike logic sampling, it requires that the `evidence` be passed as a named list specifying the values of each node. The `event` is still an arbitrary expression.

```
> cpquery(dbn, event = (A == "a"), evidence = list(D = "c"), method = "lw")
[1] 0.2429329
```
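Under the hood, the likelihood weighting estimate is a ratio of weight sums. Assuming the sample weights are attached to the return value of `cpdist()` as a `"weights"` attribute, the query above can be reproduced by hand:

```r
library(bnlearn)

# rebuild the discrete network used in the queries above.
dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")
dbn = bn.fit(dag, learning.test)

# draw weighted samples that all match the evidence D == "c" ...
particles = cpdist(dbn, nodes = "A", evidence = list(D = "c"), method = "lw")
w = attr(particles, "weights")
# ... and estimate P(A = "a" | D = "c") as a ratio of weight sums.
sum(w[particles$A == "a"]) / sum(w)
```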

The practical difference between logic sampling and likelihood weighting becomes more apparent when looking at the respective debugging outputs.

```
> cpquery(dbn, event = (A == "a"), evidence = (D == "c"), method = "ls", debug = TRUE)
* checking which nodes are needed.
  > event involves the following nodes: A
  > evidence involves the following nodes: D
  > upper closure is ' D A C '
  > generating observations from 3 / 6 nodes.
* generated 10000 samples from the bayesian network.
  > evidence matches 3360 samples out of 10000 (p = 0.336).
  > event matches 830 samples out of 3360 (p = 0.2470238).
[1] 0.2470238
```

```
> cpquery(dbn, event = (A == "a"), evidence = list(D = "c"), method = "lw", debug = TRUE)
* checking which nodes are needed.
  > event involves the following nodes: A
  > evidence involves the following nodes: D
  > upper closure is ' D A C '
  > generating observations from 3 / 6 nodes.
* generated 10000 samples from the bayesian network.
* processing node D.
  > event has a probability mass of 1005.538 out of 4224.681.
[1] 0.2380152
```

In the case of discrete nodes, the `evidence` argument allows for multiple values per node, which will be treated as a set. For instance, the `D` variable can take values `"b"` or `"c"` with equal probability.

```
> cpquery(dbn, event = (A == "a"), evidence = list(D = c("b", "c")), method = "lw")
[1] 0.1831293
```

As for continuous nodes, the `evidence` argument allows for two values per node, which will be interpreted as the boundaries of a finite interval. The values within that interval will be weighted equally, that is, they will have a uniform distribution.

```
> cpquery(gbn, event = rank(F) > rank(A), evidence = list(A = c(0, 3)), method = "lw")
[1] 0.5664669
```
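The uniform distribution over the interval is easy to verify empirically with `cpdist()` (a sketch for illustration):

```r
library(bnlearn)

# rebuild the Gaussian network used in the queries above.
dag = model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]")
gbn = bn.fit(dag, gaussian.test)

# interval evidence: A is sampled uniformly over [0, 3].
particles = cpdist(gbn, nodes = "A", evidence = list(A = c(0, 3)), method = "lw")
hist(particles$A)        # roughly flat over [0, 3]
summary(particles$A)     # mean close to 1.5, the midpoint of the interval
```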

Another key difference is that likelihood weighting can deal with point evidence on continuous variables, while logic sampling cannot. Consider the following:

```
> cpquery(gbn, event = rank(F) > rank(A), evidence = (A == 2), method = "ls", debug = TRUE)
* checking which nodes are needed.
  > event involves the following nodes: A F
  > evidence involves the following nodes: A
  > upper closure is ' F A D E G B '
  > generating observations from 6 / 7 nodes.
* generated 10000 samples from the bayesian network.
  > evidence matches 0 samples out of 10000 (p = 0).
  > event matches 0 samples out of 0 (p = 0).
* generated 500 samples from the bayesian network.
  > evidence matches 0 samples out of 500 (p = 0).
  > event matches 0 samples out of 0 (p = 0).
[1] 0
```

The probability that `A` is exactly equal to `2` in any of the random samples generated by logic sampling is zero, so the only possible answer `cpquery()` can give is zero. However, likelihood weighting only generates random samples that contain the evidence and, as a result, it can answer the conditional probability query without issue.

```
> cpquery(gbn, event = rank(F) > rank(A), evidence = list(A = 2), method = "lw", debug = TRUE)
* checking which nodes are needed.
  > event involves the following nodes: A F
  > evidence involves the following nodes: A
  > upper closure is ' F A D E G B '
  > generating observations from 6 / 7 nodes.
* generated 10000 samples from the bayesian network.
* processing node A.
  > event has a probability mass of 5000 out of 10000.
* generated 500 samples from the bayesian network.
* processing node A.
  > event has a probability mass of 5250 out of 10500.
[1] 0.5
```

The same behaviour is apparent in `cpdist()`.

```
> particles = cpdist(gbn, nodes = nodes(gbn), evidence = (A == 2), method = "ls", debug = TRUE)
* checking which nodes are needed.
  > event involves the following nodes: A B C D E F G
  > evidence involves the following nodes: A
  > upper closure is ' C F A D E G B '
  > generating observations from 7 / 7 nodes.
* generated 10000 samples from the bayesian network.
  > evidence matches 0 samples out of 10000 (p = 0).
* generated 500 samples from the bayesian network.
  > evidence matches 0 samples out of 500 (p = 0).
> nrow(particles)
[1] 0
```

```
> particles = cpdist(gbn, nodes = nodes(gbn), evidence = list(A = 2), method = "lw", debug = TRUE)
* checking which nodes are needed.
  > event involves the following nodes: A B C D E F G
  > evidence involves the following nodes: A
  > upper closure is ' C F A D E G B '
  > generating observations from 7 / 7 nodes.
* generated 10500 samples from the bayesian network.
> nrow(particles)
[1] 10500
> summary(particles$A)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      2       2       2       2       2       2
```

`Fri Aug 16 12:36:44 2024` with **bnlearn** `5.0-20240725` and `R version 4.4.1 (2024-06-14)`.