Interventions and counterfactuals
The fundamental techniques of Judea Pearl's framework for causality are interventions and counterfactuals, which operate on Bayesian networks through their structural causal model representation. Both have implementations in bnlearn; they are basic but under active development (documented here).
Consider the following Bayesian network.
> bn = model2network("[X1][X3][X5][X6|X8][X2|X1][X7|X5][X4|X1:X2][X8|X3:X7][X9|X2:X7][X10|X1:X9]")
> graphviz.plot(bn)
Interventions
Interventions alter the graphical structure of the network by removing the arcs incoming to the nodes we choose to
intervene upon. The resulting network is also known as the mutilated network. The intervention()
function takes as arguments a bn or bn.fit object and a named list containing the nodes we intervene on
and the values we fix them to.
> mutilated = intervention(bn, evidence = list(X9 = 6.9, X6 = 4.2))
> par(mfrow = c(1, 2))
> graphviz.compare(bn, mutilated, main = c("ORIGINAL", "MUTILATED"))
This corresponds to a hard intervention. Soft interventions are not supported by intervention(), but they can
be implemented by manually changing the network structure (see some examples here).
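As a sketch of how a soft intervention on a Gaussian node might be implemented by hand (the network and coefficient values below are arbitrary, not from the running example): rather than severing the arcs into the target node, we keep its parents and overwrite its local distribution by assigning a new parameter list to the node in the bn.fit object.

```r
library(bnlearn)

# A toy Gaussian network in which X3 depends on X1 and X2.
bn = model2network("[X1][X2|X1][X3|X1:X2]")
data = as.data.frame(matrix(rnorm(300), nrow = 100, ncol = 3,
                            dimnames = list(NULL, nodes(bn))))
fitted = bn.fit(bn, data)

# Soft intervention on X3: the arcs from X1 and X2 stay in place, but the
# local distribution is replaced with one of our choosing.
fitted$X3 = list(coef = c("(Intercept)" = 0, X1 = 0.5, X2 = 0.5), sd = 1)
coef(fitted$X3)
```

Unlike intervention(), this leaves X3 dependent on its parents; only the strength (and noise level) of that dependence changes.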
The values of the variables listed in the evidence argument are ignored when the network is a
bn object. On the other hand, if the network is a bn.fit object, the intervention nodes are
also set to the specified values. In Gaussian and conditional Gaussian nodes, the local distribution becomes a
simple regression model with a standard error of zero.
> data = as.data.frame(matrix(rnorm(100), nrow = 10, ncol = 10,
+          dimnames = list(NULL, nodes(bn))))
> fitted = bn.fit(bn, data)
> mutilated = intervention(fitted, evidence = list(X9 = 6.9, X6 = 4.2))
> mutilated$X6
Parameters of node X6 (Gaussian distribution)
Conditional density: X6
Coefficients:
(Intercept)
4.2
Standard deviation of the residuals: 0
In discrete nodes, the local distribution will be a marginal distribution assigning a probability of 1 to the specified value.
> data = discretize(data, breaks = 3)
> for (var in colnames(data))
+   levels(data[, var]) = letters[1:3]
> fitted = bn.fit(bn, data)
> mutilated = intervention(fitted, evidence = list(X9 = "c", X6 = "b"))
> mutilated$X9
Parameters of node X9 (multinomial distribution)

Conditional probability table:
  a b c
  0 0 1
The function mutilated() is an alias of intervention(), retained for backward compatibility.
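A quick check (using the same network structure as the running example) confirms that the two functions accept the same arguments and produce the same mutilated network:

```r
library(bnlearn)

bn = model2network(paste0("[X1][X3][X5][X6|X8][X2|X1][X7|X5][X4|X1:X2]",
                          "[X8|X3:X7][X9|X2:X7][X10|X1:X9]"))
evidence = list(X9 = 6.9, X6 = 4.2)

# mutilated() is a backward-compatible alias: same arguments, same result.
all.equal(intervention(bn, evidence = evidence),
          mutilated(bn, evidence = evidence))
```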
Counterfactuals
Unlike intervention(), counterfactuals operate on the twin network constructed from the Bayesian
network and thus explicitly require converting it into a structural causal model. The main difference between a Bayesian
network and a structural causal model is that all nodes have a local distribution in the former but not in the latter.
Instead, a structural causal model splits each node into two: one with the exogenous noise and one with a deterministic
functional relationship between the node and its parents.
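For a Gaussian node, this split is easy to write down explicitly: a local distribution X2 | X1 ~ N(b0 + b1*X1, s^2) becomes the deterministic assignment X2 = b0 + b1*X1 + uX2 with exogenous noise uX2 ~ N(0, s^2). A minimal base-R illustration (the coefficient values are arbitrary):

```r
set.seed(42)

b0 = 1; b1 = 0.1; s = 1

# SCM view: exogenous noise plus a deterministic function of the parent.
X1 = rnorm(1000)
uX2 = rnorm(1000, mean = 0, sd = s)
X2 = b0 + b1 * X1 + uX2

# The residual X2 - b1 * X1 recovers b0 + uX2, i.e. a N(b0, s^2) draw,
# matching the Bayesian-network local distribution of X2 given X1.
c(mean(X2 - b1 * X1), sd(X2 - b1 * X1))
```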
The twin() function creates the twin network, in which each node from the Bayesian network is duplicated into
its factual and counterfactual copies. The exogenous noise is separated and connected to both as a
parent.
> twn = twin(bn)
> graphviz.plot(twn)
The returned twin network has classes bn and bn.twin. For node X in the original Bayesian
network, it contains the node X itself, its counterfactual copy X. and the exogenous noise
uX linked to both.
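For instance, the three copies associated with X8 can be picked out of the node set by name (a small check, assuming the same network structure as above):

```r
library(bnlearn)

bn = model2network(paste0("[X1][X3][X5][X6|X8][X2|X1][X7|X5][X4|X1:X2]",
                          "[X8|X3:X7][X9|X2:X7][X10|X1:X9]"))
twn = twin(bn)

# Each node X8 in the original network yields three nodes in the twin
# network: the factual X8, the counterfactual X8. and the noise uX8.
grep("X8", nodes(twn), value = TRUE)
```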
A counterfactual() call is essentially an intervention on the counterfactual copies of the
evidence nodes in the twin network. For instance, intervening on node X8. removes all the arcs
incoming on X8., including that from the noise node uX8.
> ctf = counterfactual(bn, evidence = list(X8. = 4.2))
> graphviz.plot(ctf)
Comparing ctf with the twin network twn, we can see the missing arc uX8
→ X8.. We can also see that ctf has fewer nodes. By default,
counterfactual() merges factual and counterfactual nodes with the same distribution, and removes the
associated noise nodes. This facilitates interpretation and significantly reduces the computational burden of exact
and approximate inference.
> par(mfrow = c(1, 2))
> graphviz.compare(twn, ctf)
Passing merging = FALSE to counterfactual() will make it skip the node merging.
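As a quick sanity check (again assuming the network structure from the running example), the unmerged counterfactual network retains all the twin-network nodes, while the merged one is strictly smaller:

```r
library(bnlearn)

bn = model2network(paste0("[X1][X3][X5][X6|X8][X2|X1][X7|X5][X4|X1:X2]",
                          "[X8|X3:X7][X9|X2:X7][X10|X1:X9]"))

ctf.merged = counterfactual(bn, evidence = list(X8. = 4.2))
ctf.full = counterfactual(bn, evidence = list(X8. = 4.2), merging = FALSE)

# Merging factual/counterfactual copies with identical distributions, and
# dropping the associated noise nodes, leaves fewer nodes overall.
length(nodes(ctf.full)) > length(nodes(ctf.merged))
```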
The evidence in counterfactual(), by definition, should only refer to counterfactual nodes
(like X8. above). If evidence contains factual nodes, they are replaced by the corresponding
counterfactual nodes with a warning.
> counterfactual(bn, evidence = list(X8 = 4.2))
## Warning in check.evidence(evidence, graph = twin, ideal.only = FALSE,
## counterfactual = TRUE): evidence contains factual nodes X8.
Causal network
model:
[uX1][uX10][uX2][uX3][uX4][uX5][uX6][uX7][uX8][uX9][X8.][X1|uX1][X3|uX3]
[X5|uX5][X6.|uX6:X8.][X2|X1:uX2][X7|X5:uX7][X4|X1:X2:uX4][X8|X3:X7:uX8]
[X9|X2:X7:uX9][X10|X1:X9:uX10][X6|X8:uX6]
nodes: 22
arcs: 23
undirected arcs: 0
directed arcs: 23
average markov blanket size: 3.45
average neighbourhood size: 2.09
average branching factor: 1.05
inference type: Twin Network
Referring to exogenous nodes in the evidence always produces an error.
> counterfactual(bn, evidence = list(uX8 = 4.2))
## Error: evidence cannot include exogenous nodes.
Finally, bnlearn only supports introducing one counterfactual at a time. Trying to call
counterfactual() more than once produces an error.
> ctf1 = counterfactual(bn, evidence = list(X8 = 4.2))
## Warning in check.evidence(evidence, graph = twin, ideal.only = FALSE,
## counterfactual = TRUE): evidence contains factual nodes X8.
> ctf2 = counterfactual(ctf1, evidence = list(X4 = 6.9))
## Error: 'x' is already a counterfactual network.
Introducing multiple counterfactuals would require adding multiple sets of counterfactual nodes to the network, each representing a different alternate reality. This is not supported at the moment (and might never be). On the other hand, introducing a single counterfactual that touches multiple nodes simultaneously is supported.
> ctf = counterfactual(bn, evidence = list(X8. = 4.2, X4. = 6.9))
> graphviz.plot(ctf)
Fitted Bayesian networks
Having modified a network with intervention() and/or counterfactual(), we can use it for inference with
cpquery() (see here), predict() (see here)
or the gRain package (see here).
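For instance, a bn.fit object modified with intervention() can be passed to predict() like any other fitted network (a sketch with a toy Gaussian network, not the one from the running example):

```r
library(bnlearn)

# A toy Gaussian chain X1 -> X2 -> X3.
bn = model2network("[X1][X2|X1][X3|X2]")
data = as.data.frame(matrix(rnorm(300), nrow = 100, ncol = 3,
                            dimnames = list(NULL, nodes(bn))))
fitted = bn.fit(bn, data)

# Fix X2 = 1 by intervention, then predict X3 for new observations of X1
# using likelihood weighting.
mut = intervention(fitted, evidence = list(X2 = 1))
newdata = data.frame(X1 = rnorm(5))
predict(mut, node = "X3", data = newdata, method = "bayes-lw")
```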
Currently, counterfactual() only supports Gaussian Bayesian networks, whereas
intervention() supports all types of Bayesian networks.
As an example, consider the probability of X9. < 3 conditional on X8 = 6.9 (factual) and
X8. = 4.2 (counterfactual). In other words: X8 = 6.9 has been observed, but what would the
probability of X9. < 3 be in the alternative reality in which X8 had been 4.2 instead?
> fitted = custom.fit(bn, dist = list(
+   X1 = list(coef = c("(Intercept)" = 1), sd = 1.01),
+   X2 = list(coef = c("(Intercept)" = 1, X1 = 0.1), sd = 1.02),
+   X3 = list(coef = c("(Intercept)" = 1), sd = 1.03),
+   X4 = list(coef = c("(Intercept)" = 1, X1 = 0.2, X2 = 0.3), sd = 1.04),
+   X5 = list(coef = c("(Intercept)" = 1), sd = 1.05),
+   X6 = list(coef = c("(Intercept)" = 1, X8 = 0.4), sd = 1.06),
+   X7 = list(coef = c("(Intercept)" = 1, X5 = 0.5), sd = 1.07),
+   X8 = list(coef = c("(Intercept)" = 1, X3 = 0.6, X7 = 0.7), sd = 1.08),
+   X9 = list(coef = c("(Intercept)" = 1, X2 = 0.8, X7 = 0.9), sd = 1.09),
+   X10 = list(coef = c("(Intercept)" = 1, X1 = 1.0, X9 = 1.1), sd = 1.10)
+ ))
> ctf = counterfactual(fitted, evidence = list(X8. = 4.2), merging = FALSE)
> ctf$X8.
Parameters of node X8. (Gaussian distribution)
Conditional density: X8.
Coefficients:
(Intercept)
4.2
Standard deviation of the residuals: 0
> cpquery(ctf, event = (X9. < 3), evidence = list(X8 = 6.9), method = "lw")
[1] 0.2
The counterfactual probability is different from both the factual probability and the interventional probability we get
after calling intervention().
> cpquery(fitted, event = (X9 < 3), evidence = list(X8 = 6.9), method = "lw")
[1] 0.1278857
> mut = intervention(fitted, evidence = list(X8 = 6.9))
> cpquery(mut, event = (X9 < 3), evidence = TRUE, method = "lw")
[1] 0.4552414
Thu Nov 27 15:21:24 2025 with bnlearn 5.2-20250910 and R version 4.5.2 (2025-10-31).