Interventions and counterfactuals
The fundamental techniques of Judea Pearl's framework for causality are interventions and counterfactuals, which operate on Bayesian networks through their structural causal model representation. Both have implementations in bnlearn; they are basic but under active development (documented here).
Consider the following Bayesian network.
> bn = model2network("[X1][X3][X5][X6|X8][X2|X1][X7|X5][X4|X1:X2][X8|X3:X7][X9|X2:X7][X10|X1:X9]")
> graphviz.plot(bn)
Interventions
Interventions alter the graphical structure of the network by removing the arcs incoming to the nodes we choose to
intervene upon. The resulting network is also known as the mutilated network. The intervention()
function takes as arguments a bn or bn.fit object and a named list containing the nodes we intervene on
and the values we fix them to.
> mutilated = intervention(bn, evidence = list(X9 = 6.9, X6 = 4.2))
> par(mfrow = c(1, 2))
> graphviz.compare(bn, mutilated, main = c("ORIGINAL", "MUTILATED"))
This corresponds to a hard intervention. Soft interventions are not supported by intervention(), but they can
be implemented by manually changing the network structure (see some examples here).
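As a sketch of how a soft intervention on a Gaussian node might be implemented by hand (the network and coefficient values below are arbitrary, not from the running example): rather than severing the arcs into the target node, we keep its parents and overwrite its local distribution by assigning a new parameter list to the node in the bn.fit object.

```r
library(bnlearn)

# A toy Gaussian network in which X3 depends on X1 and X2.
bn = model2network("[X1][X2|X1][X3|X1:X2]")
data = as.data.frame(matrix(rnorm(300), nrow = 100, ncol = 3,
                            dimnames = list(NULL, nodes(bn))))
fitted = bn.fit(bn, data)

# Soft intervention on X3: the arcs from X1 and X2 stay in place, but the
# local distribution is replaced with one of our choosing.
fitted$X3 = list(coef = c("(Intercept)" = 0, X1 = 0.5, X2 = 0.5), sd = 1)
coef(fitted$X3)
```

Unlike intervention(), this leaves X3 dependent on its parents; only the strength (and noise level) of that dependence changes.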
The values of the variables listed in the evidence argument are ignored when the network is a
bn object. On the other hand, if the network is a bn.fit object, the intervention nodes are
also set to the specified values. In Gaussian and conditional Gaussian nodes, the local distribution becomes a
simple regression model with a standard error of zero.
> data = as.data.frame(matrix(rnorm(100), nrow = 10, ncol = 10,
+          dimnames = list(NULL, nodes(bn))))
> fitted = bn.fit(bn, data)
> mutilated = intervention(fitted, evidence = list(X9 = 6.9, X6 = 4.2))
> mutilated$X6
Parameters of node X6 (Gaussian distribution)
Conditional density: X6
Coefficients:
(Intercept)
4.2
Standard deviation of the residuals: 0
In discrete nodes, the local distribution will be a marginal distribution assigning a probability of 1 to the specified value.
> data = discretize(data, breaks = 3)
> for (var in colnames(data))
+   levels(data[, var]) = letters[1:3]
> fitted = bn.fit(bn, data)
> mutilated = intervention(fitted, evidence = list(X9 = "c", X6 = "b"))
> mutilated$X9
Parameters of node X9 (multinomial distribution)

Conditional probability table:
  a b c
  0 0 1
The function mutilated() is an alias of intervention(), retained for backward compatibility.
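A quick check (using the same network structure as the running example) confirms that the two functions accept the same arguments and produce the same mutilated network:

```r
library(bnlearn)

bn = model2network(paste0("[X1][X3][X5][X6|X8][X2|X1][X7|X5][X4|X1:X2]",
                          "[X8|X3:X7][X9|X2:X7][X10|X1:X9]"))
evidence = list(X9 = 6.9, X6 = 4.2)

# mutilated() is a backward-compatible alias: same arguments, same result.
all.equal(intervention(bn, evidence = evidence),
          mutilated(bn, evidence = evidence))
```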
Counterfactuals
Unlike intervention(), counterfactuals operate on the twin network constructed from the Bayesian
network and thus explicitly require converting it into a structural causal model. The main difference between a Bayesian
network and a structural causal model is that all nodes have a local distribution in the former but not in the latter.
Instead, a structural causal model splits each node into two: one with the exogenous noise and one with a deterministic
functional relationship between the node and its parents.
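For a Gaussian node, this split is easy to write down explicitly: a local distribution X2 | X1 ~ N(b0 + b1*X1, s^2) becomes the deterministic assignment X2 = b0 + b1*X1 + uX2 with exogenous noise uX2 ~ N(0, s^2). A minimal base-R illustration (the coefficient values are arbitrary):

```r
set.seed(42)

b0 = 1; b1 = 0.1; s = 1

# SCM view: exogenous noise plus a deterministic function of the parent.
X1 = rnorm(1000)
uX2 = rnorm(1000, mean = 0, sd = s)
X2 = b0 + b1 * X1 + uX2

# The residual X2 - b1 * X1 recovers b0 + uX2, i.e. a N(b0, s^2) draw,
# matching the Bayesian-network local distribution of X2 given X1.
c(mean(X2 - b1 * X1), sd(X2 - b1 * X1))
```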
The twin() function creates the twin network, in which each node from the Bayesian network is duplicated into
its factual and counterfactual copies. The exogenous noise is separated and connected to both as a
parent.
> twn = twin(bn)
> graphviz.plot(twn)
The returned twin network has classes bn and bn.twin. For node X in the original Bayesian
network, it contains the node X itself, its counterfactual copy X. and the exogenous noise
uX linked to both.
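For instance, the three copies associated with X8 can be picked out of the node set by name (a small check, assuming the same network structure as above):

```r
library(bnlearn)

bn = model2network(paste0("[X1][X3][X5][X6|X8][X2|X1][X7|X5][X4|X1:X2]",
                          "[X8|X3:X7][X9|X2:X7][X10|X1:X9]"))
twn = twin(bn)

# Each node X8 in the original network yields three nodes in the twin
# network: the factual X8, the counterfactual X8. and the noise uX8.
grep("X8", nodes(twn), value = TRUE)
```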
A counterfactual() call is essentially an intervention on the counterfactual copies of the
evidence nodes in the twin network. For instance, intervening on node X8. removes all the arcs
incoming on X8., including that from the noise node uX8.
> ctf = counterfactual(bn, evidence = list(X8. = 4.2))
> graphviz.plot(ctf)
Comparing ctf with the twin network twn, we can see the missing arc uX8
→ X8.. We can also see that ctf has fewer nodes. By default,
counterfactual() merges factual and counterfactual nodes with the same distribution, and removes the
associated noise nodes. This facilitates interpretation and significantly reduces the computational burden of exact
and approximate inference.
> par(mfrow = c(1, 2))
> graphviz.compare(twn, ctf)
Passing merging = FALSE to counterfactual() will make it skip the node merging.
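As a quick sanity check (again assuming the network structure from the running example), the unmerged counterfactual network retains all the twin-network nodes, while the merged one is strictly smaller:

```r
library(bnlearn)

bn = model2network(paste0("[X1][X3][X5][X6|X8][X2|X1][X7|X5][X4|X1:X2]",
                          "[X8|X3:X7][X9|X2:X7][X10|X1:X9]"))

ctf.merged = counterfactual(bn, evidence = list(X8. = 4.2))
ctf.full = counterfactual(bn, evidence = list(X8. = 4.2), merging = FALSE)

# Merging factual/counterfactual copies with identical distributions, and
# dropping the associated noise nodes, leaves fewer nodes overall.
length(nodes(ctf.full)) > length(nodes(ctf.merged))
```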
The evidence in counterfactual(), by definition, should only refer to counterfactual nodes
(like X8. above). If evidence contains factual nodes, they are replaced by the corresponding
counterfactual nodes with a warning.
> counterfactual(bn, evidence = list(X8 = 4.2))
## Warning in check.evidence(evidence, graph = twin, ideal.only = FALSE,
## counterfactual = TRUE): evidence contains factual nodes X8.
Causal network
model:
[uX1][uX10][uX2][uX3][uX4][uX5][uX6][uX7][uX8][uX9][X8.][X1|uX1][X3|uX3]
[X5|uX5][X6.|uX6:X8.][X2|X1:uX2][X7|X5:uX7][X4|X1:X2:uX4][X8|X3:X7:uX8]
[X9|X2:X7:uX9][X10|X1:X9:uX10][X6|X8:uX6]
nodes: 22
arcs: 23
undirected arcs: 0
directed arcs: 23
average markov blanket size: 3.45
average neighbourhood size: 2.09
average branching factor: 1.05
inference type: Twin Network
Referring to exogenous nodes in the evidence always produces an error.
> counterfactual(bn, evidence = list(uX8 = 4.2))
## Error: evidence cannot include exogenous nodes.
Finally, bnlearn only supports introducing one counterfactual at a time. Trying to call
counterfactual() more than once produces an error.
> ctf1 = counterfactual(bn, evidence = list(X8 = 4.2))
## Warning in check.evidence(evidence, graph = twin, ideal.only = FALSE,
## counterfactual = TRUE): evidence contains factual nodes X8.
> ctf2 = counterfactual(ctf1, evidence = list(X4 = 6.9))
## Error: 'x' is already a counterfactual network.
Introducing multiple counterfactuals would require adding multiple sets of counterfactual nodes to the network, each representing a different alternate reality. This is not supported at the moment (and might never be). On the other hand, introducing a single counterfactual that touches multiple nodes simultaneously is supported.
> ctf = counterfactual(bn, evidence = list(X8. = 4.2, X4. = 6.9))
> graphviz.plot(ctf)
Fitted Bayesian networks
Having modified a network with intervention() and/or counterfactual(), we can use it for inference with
cpquery() (see here), predict() (see here)
or the gRain package (see here).
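For instance, a bn.fit object modified with intervention() can be passed to predict() like any other fitted network (a sketch with a toy Gaussian network, not the one from the running example):

```r
library(bnlearn)

# A toy Gaussian chain X1 -> X2 -> X3.
bn = model2network("[X1][X2|X1][X3|X2]")
data = as.data.frame(matrix(rnorm(300), nrow = 100, ncol = 3,
                            dimnames = list(NULL, nodes(bn))))
fitted = bn.fit(bn, data)

# Fix X2 = 1 by intervention, then predict X3 for new observations of X1
# using likelihood weighting.
mut = intervention(fitted, evidence = list(X2 = 1))
newdata = data.frame(X1 = rnorm(5))
predict(mut, node = "X3", data = newdata, method = "bayes-lw")
```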
Currently, counterfactual() only supports Gaussian Bayesian networks, whereas
intervention() supports all types of Bayesian networks.
As an example, consider the probability of X9. < 3 conditional on X8 = 6.9 (factual) and
X8. = 4.2 (counterfactual). In other words: X8 = 6.9 has been observed, but what would the
probability of X9. < 3 be in the alternative reality in which X8 had been 4.2 instead?
> fitted = custom.fit(bn, dist = list(
+   X1 = list(coef = c("(Intercept)" = 1), sd = 1.01),
+   X2 = list(coef = c("(Intercept)" = 1, X1 = 0.1), sd = 1.02),
+   X3 = list(coef = c("(Intercept)" = 1), sd = 1.03),
+   X4 = list(coef = c("(Intercept)" = 1, X1 = 0.2, X2 = 0.3), sd = 1.04),
+   X5 = list(coef = c("(Intercept)" = 1), sd = 1.05),
+   X6 = list(coef = c("(Intercept)" = 1, X8 = 0.4), sd = 1.06),
+   X7 = list(coef = c("(Intercept)" = 1, X5 = 0.5), sd = 1.07),
+   X8 = list(coef = c("(Intercept)" = 1, X3 = 0.6, X7 = 0.7), sd = 1.08),
+   X9 = list(coef = c("(Intercept)" = 1, X2 = 0.8, X7 = 0.9), sd = 1.09),
+   X10 = list(coef = c("(Intercept)" = 1, X1 = 1.0, X9 = 1.1), sd = 1.10)
+ ))
> ctf = counterfactual(fitted, evidence = list(X8. = 4.2), merging = FALSE)
> ctf$X8.
Parameters of node X8. (Gaussian distribution)
Conditional density: X8.
Coefficients:
(Intercept)
4.2
Standard deviation of the residuals: 0
> cpquery(ctf, event = (X9. < 3), evidence = list(X8 = 6.9), method = "lw")
[1] 0.2
The counterfactual probability is different from both the factual probability and the interventional probability we get
after calling intervention().
> cpquery(fitted, event = (X9 < 3), evidence = list(X8 = 6.9), method = "lw")
[1] 0.1278857
> mut = intervention(fitted, evidence = list(X8 = 6.9))
> cpquery(mut, event = (X9 < 3), evidence = TRUE, method = "lw")
[1] 0.4552414
Thu Nov 27 15:21:24 2025 with bnlearn 5.2-20250910 and R version 4.5.2 (2025-10-31).