Approximate Monte Carlo inference

Exact inference is an extremely powerful, if computationally expensive, tool to use Bayesian networks in applications. However, it is limited by the belief propagation algorithm that underlies it: it can only answer queries that use hard evidence and some limited forms of soft evidence.

On the other hand, approximate inference based on (Monte Carlo) particle filters scales better to larger networks and places virtually no limits on the complexity of the queries it can answer. Basically, it works like this:

  1. Generate a large number of random samples from the Bayesian network.
  2. Check how many of these samples match the conditions set by the evidence.
  3. Check how many of these samples also match the desired values for the query variables.
  4. Use these two numbers to estimate the probability of the event of interest.

In practice, this is not much more complex than running subset() on a data frame full of observations.
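For illustration, here is a minimal hand-rolled sketch of these four steps, using rbn() to generate the random samples and plain logical filtering to estimate a conditional probability; the fitted object is a placeholder for any bn.fit object, such as the dbn network created in the next section.

# a minimal sketch of the four steps above; fitted is a placeholder for any
# bn.fit object, such as the dbn network created below.
sims = rbn(fitted, n = 10^4)          # 1. generate random samples.
matching = subset(sims, D == "c")     # 2. keep the samples matching the evidence.
mean(matching$A == "a")               # 3-4. the proportion of those also matching
                                      #      the event estimates the probability.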

Evaluating conditional probability queries

cpquery() implements this approach to estimate the answer to conditional probability queries. That is, we encode a Bayesian network as a bn.fit object,

> dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")
> dbn = bn.fit(dag, learning.test)

and then we ask it the probability of a certain event of interest assuming that we have observed some evidence. Say that the event is A taking value "a", and the evidence is D taking value "c".

> cpquery(dbn, event = (A == "a"), evidence = (D == "c"))
[1] 0.2299546

If we run this command again, we will get a slightly different result because of the simulation noise inherent to Monte Carlo methods.

> cpquery(dbn, event = (A == "a"), evidence = (D == "c"))
[1] 0.2457627

We can reduce the simulation noise by increasing the number of random samples we generate, at the cost of the query taking longer.

> cpquery(dbn, event = (A == "a"), evidence = (D == "c"), n = 10^7)
[1] 0.2412176
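A simple way to gauge the remaining simulation noise is to repeat the query a few times and look at the spread of the estimates; their average is also a smoother estimate than any single run. A quick sketch (the number of replicates is arbitrary):

# repeat the query to measure the simulation noise; averaging the replicates
# trades computing time for precision.
estimates = replicate(20, cpquery(dbn, event = (A == "a"), evidence = (D == "c")))
summary(estimates)
mean(estimates)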

Since cpquery() works similarly to subset(), we can use arbitrary expressions for both the event and the evidence. For instance, either can reference multiple values from the same variable,

> cpquery(dbn, event = (A == "a"), evidence = (D %in% c("a", "c")))
[1] 0.4417008
> cpquery(dbn, event = (A %in% c("a", "b")), evidence = (D == "c"))
[1] 0.3319477

involve multiple variables,

> cpquery(dbn, event = (A == "a") & (B != "A"), evidence = (D %in% c("a", "c")))
[1] 0.4375907
> cpquery(dbn, event = (A %in% c("a", "b")),
+             evidence = (D == "c") & ((C == "a") | (F == "b")))
[1] 0.2882883

and string together different expressions with logical operators. The key limitation in writing these expressions is that they must evaluate to a vector of logical values, one for each of the random samples used internally by cpquery(). As a side effect, this makes it possible to write (unconditional) queries without evidence as follows:

> cpquery(dbn, event = (A == "a"), evidence = TRUE)
[1] 0.3361
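Since there is no evidence to condition on, this estimate should be close to the marginal frequency of A = "a" in the data used to fit the network, which provides a quick sanity check:

# sanity check: the empirical marginal distribution of A in the data.
prop.table(table(learning.test$A))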

This flexibility is even more useful when working with Bayesian networks containing continuous nodes. For instance, consider a Gaussian network.

> dag = model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]")
> gbn = bn.fit(dag, gaussian.test)

cpquery() can check individual tails of one or more nodes,

> cpquery(gbn, event = (F >= quantile(gaussian.test$F, 0.90)), evidence = (A <= 0))
[1] 0.03043478
> cpquery(gbn, event = (abs(F) >= quantile(gaussian.test$F, 0.95)),
+             evidence = (abs(A) >= quantile(gaussian.test$A, 0.90)))
[1] 0.1384758

use simultaneous variable transforms,

> cpquery(gbn, event = (F >= A^2), evidence = (A <= 0))
[1] 0.998806
> cpquery(gbn, event = (log(2 + A + F) >= D), evidence = (A <= 0))
[1] 0.06580494

and use complex variable transforms.

> cpquery(gbn, event = rank(F) > rank(A), evidence = (A >= 0))
[1] 0.4316872

Sampling from conditional probability distributions

cpdist() works like a partial implementation of cpquery(): it generates the random samples and filters them using the provided evidence, but it does not compute the probability of an event. Instead, it returns the random samples for further analyses.

> cpdist(gbn, nodes = c("A", "C", "F"), evidence = (A >= 0), n = 10)
          A         C        F
1 2.9188487 10.757771 25.41570
2 2.4546295  6.325242 21.23949
3 0.5581860 13.255901 23.38719
4 0.6666492 -2.032756 12.80954
5 2.1796092 13.023408 28.25364
6 0.7372265  8.038061 24.04888
7 0.3968961 10.764516 25.18689
8 1.7306326  7.852705 17.97200
9 1.8424067 10.724754 25.60063

For this purpose, cpdist() has a nodes argument (to specify which variables should be included in the returned random samples) but no event argument.
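As an example of such further analyses, the samples returned by cpdist() can be passed to any standard R function; here is a sketch computing conditional summary statistics and the correlation between C and F given the evidence, from a larger sample:

# further analysis on the conditional samples: summaries and the correlation
# between C and F given A >= 0.
sims = cpdist(gbn, nodes = c("C", "F"), evidence = (A >= 0), n = 10^4)
summary(sims)
cor(sims$C, sims$F)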

Likelihood weighting

All the examples above use logic sampling, the default approach in both cpquery() and cpdist(). Logic sampling has a big limitation: if the evidence has a small probability, nearly all the random samples generated will be discarded because they do not match it, wasting running time and limiting the precision of subsequent analyses. To address this issue, cpquery() and cpdist() also implement likelihood weighting, an alternative method that generates only random samples that match the evidence and gives them weights to correct their distribution.

Likelihood weighting is enabled by setting method = "lw", as opposed to the default method = "ls" (logic sampling). Unlike logic sampling, it requires the evidence to be passed as a named list specifying the values of each node. The event is still an arbitrary expression.

> cpquery(dbn, event = (A == "a"), evidence = list(D = "c"), method = "lw")
[1] 0.2429329

The practical difference between logic sampling and likelihood weighting becomes more apparent when looking at the respective debugging outputs: logic sampling estimates the probability as the proportion of matching samples, while likelihood weighting estimates it as the ratio between the probability mass of the samples matching the event and the total probability mass of all the samples.

> cpquery(dbn, event = (A == "a"), evidence = (D == "c"), method = "ls", debug = TRUE)
* checking which nodes are needed.
  > event involves the following nodes: A 
  > evidence involves the following nodes: D 
  > upper closure is ' D A C '
  > generating observations from 3 / 6 nodes.
* generated 10000 samples from the bayesian network.
  > evidence matches 3360 samples out of 10000 (p = 0.336).
  > event matches 830 samples out of 3360 (p = 0.2470238).
[1] 0.2470238
> cpquery(dbn, event = (A == "a"), evidence = list(D = "c"), method = "lw", debug = TRUE)
* checking which nodes are needed.
  > event involves the following nodes: A 
  > evidence involves the following nodes: D 
  > upper closure is ' D A C '
  > generating observations from 3 / 6 nodes.
* generated 10000 samples from the bayesian network.
* processing node D.
  > event has a probability mass of 1005.538 out of 4224.681.
[1] 0.2380152

In the case of discrete nodes, the evidence argument allows for multiple values per node, which will be treated as a set. For instance, the D variable can take values "b" or "c" with equal probability.

> cpquery(dbn, event = (A == "a"), evidence = list(D = c("b", "c")), method = "lw")
[1] 0.1831293

As for continuous nodes, the evidence argument allows for two values per node, which will be interpreted as the boundaries of a finite interval. The values within that interval will be weighted equally, that is, they will follow a uniform distribution.

> cpquery(gbn, event = rank(F) > rank(A), evidence = list(A = c(0, 3)), method = "lw")
[1] 0.5664669

Another key difference is that likelihood weighting can deal with point evidence on continuous variables, while logic sampling cannot. Consider the following:

> cpquery(gbn, event = rank(F) > rank(A), evidence = (A == 2), method = "ls", debug = TRUE)
* checking which nodes are needed.
  > event involves the following nodes: A F 
  > evidence involves the following nodes: A 
  > upper closure is ' F A D E G B '
  > generating observations from 6 / 7 nodes.
* generated 10000 samples from the bayesian network.
  > evidence matches 0 samples out of 10000 (p = 0).
  > event matches 0 samples out of 0 (p = 0).
* generated 500 samples from the bayesian network.
  > evidence matches 0 samples out of 500 (p = 0).
  > event matches 0 samples out of 0 (p = 0).
[1] 0

The probability that A is exactly equal to 2 in any of the random samples generated by logic sampling is zero, so the only possible answer cpquery() can give is zero. However, likelihood weighting only generates random samples that match the evidence and, as a result, it can answer the conditional probability query without issue.

> cpquery(gbn, event = rank(F) > rank(A), evidence = list(A = 2), method = "lw", debug = TRUE)
* checking which nodes are needed.
  > event involves the following nodes: A F 
  > evidence involves the following nodes: A 
  > upper closure is ' F A D E G B '
  > generating observations from 6 / 7 nodes.
* generated 10000 samples from the bayesian network.
* processing node A.
  > event has a probability mass of 5000 out of 10000.
* generated 500 samples from the bayesian network.
* processing node A.
  > event has a probability mass of 5250 out of 10500.
[1] 0.5

The same behaviour is apparent in cpdist().

> particles = cpdist(gbn, nodes = nodes(gbn), evidence = (A == 2), method = "ls", debug = TRUE)
* checking which nodes are needed.
  > event involves the following nodes: A B C D E F G 
  > evidence involves the following nodes: A 
  > upper closure is ' C F A D E G B '
  > generating observations from 7 / 7 nodes.
* generated 10000 samples from the bayesian network.
  > evidence matches 0 samples out of 10000 (p = 0).
* generated 500 samples from the bayesian network.
  > evidence matches 0 samples out of 500 (p = 0).
> nrow(particles)
[1] 0
> particles = cpdist(gbn, nodes = nodes(gbn), evidence = list(A = 2), method = "lw", debug = TRUE)
* checking which nodes are needed.
  > event involves the following nodes: A B C D E F G 
  > evidence involves the following nodes: A 
  > upper closure is ' C F A D E G B '
  > generating observations from 7 / 7 nodes.
* generated 10500 samples from the bayesian network.
> nrow(particles)
[1] 10500
> summary(particles$A)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      2       2       2       2       2       2
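The weights that likelihood weighting attaches to the random samples are stored by cpdist() in an attribute named weights; using them to weight the event indicator reproduces, up to simulation noise, the cpquery() estimate above. A short sketch:

# the sampling weights returned along with the particles; a weighted average
# of the event indicator matches the corresponding cpquery() estimate.
w = attr(particles, "weights")
sum(w * (rank(particles$F) > rank(particles$A))) / sum(w)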
Last updated on Fri Aug 16 12:36:44 2024 with bnlearn 5.0-20240725 and R version 4.4.1 (2024-06-14).