Interventions and counterfactuals

The fundamental techniques of Judea Pearl's framework for causality are interventions and counterfactuals, which operate on Bayesian networks through their structural causal model representation. Both have implementations in bnlearn; they are basic but under active development (documented here).

Consider the following Bayesian network.

> bn = model2network("[X1][X3][X5][X6|X8][X2|X1][X7|X5][X4|X1:X2][X8|X3:X7][X9|X2:X7][X10|X1:X9]")
> graphviz.plot(bn)
[plot: the network structure of bn]

Interventions

Interventions alter the graphical structure of the network by removing the arcs incoming to the nodes we choose to intervene upon. The resulting network is also known as the mutilated network. The intervention() function takes two arguments: a bn or bn.fit object, and a named list containing the nodes we intervene on and the values we fix them to.

> mutilated = intervention(bn, evidence = list(X9 = 6.9, X6 = 4.2))
> par(mfrow = c(1, 2))
> graphviz.compare(bn, mutilated, main = c("ORIGINAL", "MUTILATED"))
[plot: the original and mutilated networks side by side]

This corresponds to a hard intervention. Soft interventions are not supported by intervention(), but can be implemented by manually changing the network structure (see some examples here).
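As a rough sketch of what that manual approach might look like: one way to implement a soft intervention on X9 is to remove its incoming arcs by hand, fit the network, and then replace its local distribution with a stochastic one rather than the point mass a hard intervention produces. The specific distribution below is purely illustrative.

```r
library(bnlearn)

bn = model2network("[X1][X3][X5][X6|X8][X2|X1][X7|X5][X4|X1:X2][X8|X3:X7][X9|X2:X7][X10|X1:X9]")

# remove the arcs incoming to X9 by hand, mimicking the mutilated network ...
soft = drop.arc(bn, from = "X2", to = "X9")
soft = drop.arc(soft, from = "X7", to = "X9")

# ... fit the modified network from (simulated) data ...
data = as.data.frame(matrix(rnorm(100), nrow = 10, ncol = 10,
                            dimnames = list(NULL, nodes(bn))))
fitted = bn.fit(soft, data)

# ... and replace the local distribution of X9 with a Gaussian centred on the
# intervention value but with a non-zero standard deviation (illustrative values).
fitted$X9 = list(coef = c("(Intercept)" = 6.9), sd = 0.5)
```

Unlike the hard intervention shown below, the intervened node keeps a stochastic local distribution.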

The values of the variables listed in the evidence argument are ignored when the network is a bn object. On the other hand, if the network is a bn.fit object, the intervened nodes are also set to the specified values. For Gaussian and conditional Gaussian nodes, the local distribution becomes an intercept-only regression model with zero residual standard deviation.

> data = as.data.frame(matrix(rnorm(100), nrow = 10, ncol = 10,
+          dimnames = list(NULL, nodes(bn))))
> fitted = bn.fit(bn, data)
> mutilated = intervention(fitted, evidence = list(X9 = 6.9, X6 = 4.2))
> mutilated$X6

  Parameters of node X6 (Gaussian distribution)

Conditional density: X6
Coefficients:
(Intercept)  
        4.2  
Standard deviation of the residuals: 0

In discrete nodes, the local distribution will be a marginal distribution assigning a probability of 1 to the specified value.

> data = discretize(data, breaks = 3)
> for (var in colnames(data))
+   levels(data[, var]) = letters[1:3]
> fitted = bn.fit(bn, data)
> mutilated = intervention(fitted, evidence = list(X9 = "c", X6 = "b"))
> mutilated$X9

  Parameters of node X9 (multinomial distribution)

Conditional probability table:
 a b c 
0 0 1

The mutilated() function is an alias of intervention(), retained for backward compatibility.
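For example, assuming both functions are available in the installed version of bnlearn, the two calls below produce the same mutilated network:

```r
library(bnlearn)

bn = model2network("[X1][X3][X5][X6|X8][X2|X1][X7|X5][X4|X1:X2][X8|X3:X7][X9|X2:X7][X10|X1:X9]")

# intervention() and its backward-compatible alias mutilated() are equivalent.
m1 = intervention(bn, evidence = list(X9 = 6.9, X6 = 4.2))
m2 = mutilated(bn, evidence = list(X9 = 6.9, X6 = 4.2))
all.equal(m1, m2)
```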

Counterfactuals

Unlike intervention(), counterfactuals operate on the twin network constructed from the Bayesian network, and thus explicitly require converting it into a structural causal model. The main difference between a Bayesian network and a structural causal model is that every node has a local distribution in the former but not in the latter. Instead, a structural causal model splits each node into two: one carrying the exogenous noise and one encoding a deterministic functional relationship between the node and its parents.

The twin() function creates the twin network, in which each node from the Bayesian network is duplicated into its factual and counterfactual copies. The exogenous noise is separated out and connected to both copies as a parent.

> twn = twin(bn)
> graphviz.plot(twn)
[plot: the twin network twn]

The returned twin network has class bn, bn.twin. For a node X in the original Bayesian network, it contains the node X itself, its counterfactual copy X. and the exogenous noise node uX linked to both.
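Assuming the naming scheme above, the counterfactual copies and the noise nodes can be picked out of the node set by name; this is a sketch, and the exact labels depend on the bnlearn version in use.

```r
library(bnlearn)

bn = model2network("[X1][X3][X5][X6|X8][X2|X1][X7|X5][X4|X1:X2][X8|X3:X7][X9|X2:X7][X10|X1:X9]")
twn = twin(bn)

# counterfactual copies end in "." and exogenous noise nodes start with "u";
# the remaining nodes are the factual copies with their original names.
grep("\\.$", nodes(twn), value = TRUE)
grep("^u", nodes(twn), value = TRUE)
```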

A counterfactual() is essentially an intervention on the counterfactual copies of the evidence nodes in the twin network. For instance, intervening on node X8 removes all the arcs incoming to X8., including that from the noise node uX8.

> ctf = counterfactual(bn, evidence = list(X8 = 4.2))
> graphviz.plot(ctf)
[plot: the counterfactual network ctf]

Comparing ctf with the twin network twn, we can see that the arc from uX8 to X8. is missing. We can also see that ctf has fewer nodes: by default, counterfactual() merges factual and counterfactual nodes that have the same distribution, and removes the associated noise nodes. This facilitates interpretation and significantly reduces the computational burden of both exact and approximate inference.

> par(mfrow = c(1, 2))
> graphviz.compare(twn, ctf)
[plot: the twin network twn and the counterfactual network ctf side by side]

Passing merging = FALSE to counterfactual() will make it skip the node merging.
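For instance, comparing the node counts with and without merging (a sketch; the exact counts depend on the network and on which nodes share a distribution):

```r
library(bnlearn)

bn = model2network("[X1][X3][X5][X6|X8][X2|X1][X7|X5][X4|X1:X2][X8|X3:X7][X9|X2:X7][X10|X1:X9]")

# with the default merging, shared factual/counterfactual nodes are collapsed;
# with merging = FALSE, the full twin network is kept.
ctf = counterfactual(bn, evidence = list(X8 = 4.2))
ctf.full = counterfactual(bn, evidence = list(X8 = 4.2), merging = FALSE)
length(nodes(ctf)) < length(nodes(ctf.full))
```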

Last updated on Tue Aug 5 22:03:12 2025 with bnlearn 5.1 and R version 4.5.0 (2025-04-11).