Comparing Bayesian network structures
The network structures of Bayesian networks are stored in objects of class bn
(documented here); they can be learned from
real-world data; they can be learned from synthetic data and golden-standard
networks in simulations (examples here); or they can be created
manually (see here). Having multiple bn
objects, we
are then interested in comparing those network structures to check whether they are different;
and if they are different, which arcs are present (or not) only in some of them and how different
they are.
bnlearn implements several functions for this task, all documented here and summarized below:
- Summaries:
all.equal()
: checks whether two networks have the same structure.compare()
: takes one network as a reference network, and computes the number of true positive, false positive and false negative arcs in the other network.
- Structural distances:
hamming()
: computes the Hamming distance between two networks.shd()
: computes the Structural Hamming distance (SHD) between two networks.
- Visual (graphical) side-by-side comparison:
graphviz.compare()
: takes one network as a reference network, and plots all other networks highlighting which are a different from the reference using Rgraphviz.
Summaries
The simplest, most elementary comparison is to check whether two networks are the same; meaning
whether they have the same nodes and same arcs (including their direction). bnlearn
provides an all.equal()
method for bn
objects that returns TRUE
if the two networks are indeed the same; or a character string describing the difference.
For instance, the networks may have different arcs;
> library(bnlearn) > dag1 = model2network("[A][B|A][C|A]") > dag2 = model2network("[A|B:C][B][C]") > all.equal(dag1, dag2)
[1] "Different arc sets"
> arcs(dag1)
from to [1,] "A" "B" [2,] "A" "C"
> arcs(dag2)
from to [1,] "B" "A" [2,] "C" "A"
or it may have different nodes.
> dag3 = model2network("[A][B|A][C|A][D]") > dag4 = model2network("[A][B|A][F|A]") > all.equal(dag1, dag3)
[1] "Different number of nodes"
> all.equal(dag1, dag4)
[1] "Different node sets"
This approach using all.equal()
is more robust than using other functions to compare
objects available from R core, because the all.equal()
method for bn
will
not say two networks are different just because the internals of the bn
objects are
different.
compare()
takes two networks with the same nodes; the first is the target
network, and the second is the current
network. The latter will be compared to the
former, which is taken to be the “true” or “golden standard”
network. Thew default output of compare looks like this:
> compare(dag1, dag2)
$tp [1] 0 $fp [1] 2 $fn [1] 2
The three elements of the list are the counts of:
- the true positive (
tp
) arcs, which appear both intarget
and incurrent
; - the false positive (
fp
) arcs, which appear incurrent
but not intarget
; - the false negative (
fn
) arcs, which appear intarget
but not incurrent
.
Setting arcs = TRUE
makes compare()
return the actual arcs that fall
into each of these categories instead of just the counts.
> compare(dag1, dag2, arcs = TRUE)
$tp from to $fp from to [1,] "B" "A" [2,] "C" "A" $fn from to [1,] "A" "B" [2,] "A" "C"
Structural distances
Structural distance functions take two networks and return a number that says how much these networks are different from each other in terms of how many arcs differ between them. Being distances, the order in which the networks are passed as arguments does not matter since distances are symmetric by default.
hamming()
implements Hamming distance, which simply counts the number of arcs that
are different between two network. For undirected graphs, this means arcs that are present in one
network and not in the other; for directed graphs, compare()
just ignores arc directions
and compares the skeletons of the two networks.
So, for example, dag1
and dag2
above have the same skeleton and therefore
their Hamming distance is zero even though they have different arcs.
> all.equal(skeleton(dag1), skeleton(dag2))
[1] TRUE
> hamming(dag1, dag2)
[1] 0
However, if we add an additional arc to dag1
the Hamming distance between this new
network and dag1
will reflect this since the two have different skeletons.
> dag5 = set.arc(dag1, from = "B", to = "C") > all.equal(skeleton(dag1), skeleton(dag5))
[1] "Different number of directed/undirected arcs"
> hamming(dag1, dag5)
[1] 1
shd()
implements the Structural Hamming distance, which counts how many arcs differ
between the CPDAGs of the two networks. So, if we look at dag1
and dag2
again, these two networks will have a positive distance because dag2
contains a
v-structure that dag1
.
> all.equal(cpdag(dag1), cpdag(dag2))
[1] "Different number of directed/undirected arcs"
> shd(dag1, dag2)
[1] 2
Visual comparisons
A third approach to compare network structures is to plot them side by side and highlight
differences with different colours. graphviz.compare()
does that using
Rgraphviz while taking care that the nodes have the same position for all
networks, to make it easier to spot which arcs are different. As we can see below, and unlike
any of the functions we covered above, graphviz.compare()
can take more than two
networks as arguments.
> dag6 = drop.arc(dag5, from = "A", to = "C") > par(mfrow = c(3, 2)) > graphviz.compare(dag1, dag2, dag5, dag6)
The default scheme to highlight differences (diff = "diff-from-first"
) in the
arcs is:
- the first network is taken as the “true” network;
- in the other networks, true positive arcs are in black;
- false positive arcs (which are missing or have different directions in the true network) are in red;
- false negative arcs are in blue, and drawn using a dashed line.
The style of the graphical comparison can be further modified using the arguments that
graphviz.compare()
shares with graphviz.plot()
, such as
shape
, layout
, main
and sub
.
> par(mfrow = c(2, 2)) > graphviz.compare(dag1, dag2, dag5, dag6, shape = "rectangle", layout = "fdp", + main = c("ONE", "TWO", "THREE", "FOUR"), + sub = paste("SHD =", c("0", shd(dag1, dag2), shd(dag1, dag5), shd(dag1, dag6))))
Other optional arguments specific to diff = "diff-from-first"
can be passed
as a list via the argument diff.args
. They are:
-
tp.col
,tp.lty
,tp.lwd
are the colour, line type and line width of true positive arcs; -
fp.col
,fp.lty
,fp.lwd
are the colour, line type and line width of false positive arcs; -
fn.col
,fn.lty
,fn.lwd
are the colour, line type and line width of negative arcs.
An example:
> par(mfrow = c(2, 2)) > graphviz.compare(dag1, dag2, dag5, dag6, shape = "rectangle", layout = "fdp", + main = c("ONE", "TWO", "THREE", "FOUR"), + sub = paste("SHD =", c("0", shd(dag1, dag2), shd(dag1, dag5), shd(dag1, dag6))), + diff.args = list(tp.lwd = 2, tp.col = "green", fn.col = "orange"))
Mon Aug 5 02:39:38 2024
with bnlearn
5.0
and R version 4.4.1 (2024-06-14)
.