## Comparing Bayesian network structures

The network structures of Bayesian networks are stored in objects of class `bn` (documented here); they can be learned from real-world data; they can be learned from synthetic data and golden-standard networks in simulations (examples here); or they can be created manually (see here). Having multiple `bn` objects, we are then interested in comparing those network structures to check whether they are different; and if they are different, which arcs are present (or not) only in some of them and how different they are.

bnlearn implements several functions for this task, all documented here and summarized below:

• Summaries:
• `all.equal()`: checks whether two networks have the same structure.
• `compare()`: takes one network as a reference network, and computes the number of true positive, false positive and false negative arcs in the other network.
• Structural distances:
• `hamming()`: computes the Hamming distance between two networks.
• `shd()`: computes the Structural Hamming distance (SHD) between two networks.
• Visual (graphical) side-by-side comparison:
• `graphviz.compare()`: takes one network as a reference network, and plots all other networks highlighting which are a different from the reference using Rgraphviz.

### Summaries

The simplest, most elementary comparison is to check whether two networks are the same; meaning whether they have the same nodes and same arcs (including their direction). bnlearn provides an `all.equal()` method for `bn` objects that returns `TRUE` if the two networks are indeed the same; or a character string describing the difference. For instance, the networks may have different arcs;

```> library(bnlearn)
> dag1 = model2network("[A][B|A][C|A]")
> dag2 = model2network("[A|B:C][B][C]")
> all.equal(dag1, dag2)
```
```[1] "Different arc sets"
```
```> arcs(dag1)
```
```     from to
[1,] "A"  "B"
[2,] "A"  "C"
```
```> arcs(dag2)
```
```     from to
[1,] "B"  "A"
[2,] "C"  "A"
```

or it may have different nodes.

```> dag3 = model2network("[A][B|A][C|A][D]")
> dag4 = model2network("[A][B|A][F|A]")
> all.equal(dag1, dag3)
```
```[1] "Different number of nodes"
```
```> all.equal(dag1, dag4)
```
```[1] "Different node sets"
```

This approach using `all.equal()` is more robust than using other functions to compare objects available from R core, because the `all.equal()` method for `bn` will not say two networks are different just because the internals of the `bn` objects are different.

`compare()` takes two networks with the same nodes; the first is the `target` network, and the second is the `current` network. The latter will be compared to the former, which is taken to be the “true” or “golden standard” network. Thew default output of compare looks like this:

```> compare(dag1, dag2)
```
```\$tp
[1] 0

\$fp
[1] 2

\$fn
[1] 2
```

The three elements of the list are the counts of:

• the true positive (`tp`) arcs, which appear both in `target` and in `current`;
• the false positive (`fp`) arcs, which appear in `current` but not in `target`;
• the false negative (`fn`) arcs, which appear in `target` but not in `current`.

Setting `arcs = TRUE` makes `compare()` return the actual arcs that fall into each of these categories instead of just the counts.

```> compare(dag1, dag2, arcs = TRUE)
```
```\$tp
from to

\$fp
from to
[1,] "B"  "A"
[2,] "C"  "A"

\$fn
from to
[1,] "A"  "B"
[2,] "A"  "C"
```

### Structural distances

Structural distance functions take two networks and return a number that says how much these networks are different from each other in terms of how many arcs differ between them. Being distances, the order in which the networks are passed as arguments does not matter since distances are symmetric by default.

`hamming()` implements Hamming distance, which simply counts the number of arcs that are different between two network. For undirected graphs, this means arcs that are present in one network and not in the other; for directed graphs, `compare()` just ignores arc directions and compares the skeletons of the two networks.

So, for example, `dag1` and `dag2` above have the same skeleton and therefore their Hamming distance is zero even though they have different arcs.

```> all.equal(skeleton(dag1), skeleton(dag2))
```
```[1] TRUE
```
```> hamming(dag1, dag2)
```
```[1] 0
```

However, if we add an additional arc to `dag1` the Hamming distance between this new network and `dag1` will reflect this since the two have different skeletons.

```> dag5 = set.arc(dag1, from = "B", to = "C")
> all.equal(skeleton(dag1), skeleton(dag5))
```
```[1] "Different number of directed/undirected arcs"
```
```> hamming(dag1, dag5)
```
```[1] 1
```

`shd()` implements the Structural Hamming distance, which counts how many arcs differ between the CPDAGs of the two networks. So, if we look at `dag1` and `dag2` again, these two networks will have a positive distance because `dag2` contains a v-structure that `dag1`.

```> all.equal(cpdag(dag1), cpdag(dag2))
```
```[1] "Different number of directed/undirected arcs"
```
```> shd(dag1, dag2)
```
```[1] 2
```

### Visual comparisons

A third approach to compare network structures is to plot them side by side and highlight differences with different colours. `graphviz.compare()` does that using Rgraphviz while taking care that the nodes have the same position for all networks, to make it easier to spot which arcs are different. As we can see below, and unlike any of the functions we covered above, `graphviz.compare()` can take more than two networks as arguments.

```> dag6 = drop.arc(dag5, from = "A", to = "C")
> par(mfrow = c(3, 2))
> graphviz.compare(dag1, dag2, dag5, dag6)
```
```Loading required namespace: Rgraphviz
```

The default scheme to highlight differences (`diff = "diff-from-first"`) in the arcs is:

• the first network is taken as the “true” network;
• in the other networks, true positive arcs are in black;
• false positive arcs (which are missing or have different directions in the true network) are in red;
• false negative arcs are in blue, and drawn using a dashed line.

The style of the graphical comparison can be further modified using the arguments that `graphviz.compare()` shares with `graphviz.plot()`, such as `shape`, `layout`, `main` and `sub`.

```> par(mfrow = c(2, 2))
> graphviz.compare(dag1, dag2, dag5, dag6, shape = "rectangle", layout = "fdp",
+   main = c("ONE", "TWO", "THREE", "FOUR"),
+   sub = paste("SHD =", c("0", shd(dag1, dag2), shd(dag1, dag5), shd(dag1, dag6))))
```

Other optional arguments specific to `diff = "diff-from-first"` can be passed as a list via the argument `diff.args`. They are:

• `tp.col`, `tp.lty`, `tp.lwd` are the colour, line type and line width of true positive arcs;
• `fp.col`, `fp.lty`, `fp.lwd` are the colour, line type and line width of false positive arcs;
• `fn.col`, `fn.lty`, `fn.lwd` are the colour, line type and line width of negative arcs.

An example:

```> par(mfrow = c(2, 2))
> graphviz.compare(dag1, dag2, dag5, dag6, shape = "rectangle", layout = "fdp",
+   main = c("ONE", "TWO", "THREE", "FOUR"),
+   sub = paste("SHD =", c("0", shd(dag1, dag2), shd(dag1, dag5), shd(dag1, dag6))),
+   diff.args = list(tp.lwd = 2, tp.col = "green", fn.col = "orange"))
```
Last updated on `Tue Nov 8 16:00:56 2022` with bnlearn `4.9-20221107` and `R version 4.2.2 (2022-10-31)`.