## Plotting arc strengths with Rgraphviz and lattice

Measuring the strength of the relationship represented by each arc in a Bayesian network is a fundamental tool to investigate and validate its structure. Arc strengths are mostly used programmatically for both purposes; nevertheless it is sometimes useful to explore them visually.

We will consider all the different measures of arc strengths implemented in **bnlearn**:

- p-values from the conditional independence tests that would remove individual arcs present in a network;
> library(bnlearn) > dag = model2network("[A][C][F|C:A][B|A][D|A:C][E|B:F]") > pvalues = arc.strength(dag, data = learning.test, criterion = "x2") > head(pvalues)

from to strength 1 C F 6.564063e-01 2 A F 7.470134e-01 3 A B 0.000000e+00 4 A D 0.000000e+00 5 C D 0.000000e+00 6 B E 6.226919e-289

- differences in network score from removing individual arcs present in a network;
> score.deltas = arc.strength(dag, data = learning.test, criterion = "bic") > head(score.deltas)

from to strength 1 C F 23.47069 2 A F 23.81276 3 A B -1153.87956 4 A D -1938.94993 5 C D -823.76049 6 B E -720.82662

- probabilities of inclusion of all possible arcs, and their directions;
> bagging = boot.strength(learning.test, algorithm = "tabu", R = 200) > head(bagging)

from to strength direction 1 A B 1.000 0.5 2 A C 0.000 0.0 3 A D 1.000 1.0 4 A E 0.000 0.0 5 A F 0.005 0.5 6 B A 1.000 0.5

- probabilities of inclusion of individual arcs present in a network, and their directions.
> bayes.factors = bf.strength(dag, data = learning.test)

> head(bayes.factors)

from to strength direction 1 A B 1.000000e+00 5.000000e-01 2 A C 3.251115e-08 5.000000e-01 3 A D 1.000000e+00 1.000000e+00 4 A E 2.667283e-54 1.000000e+00 5 A F 1.016754e-04 2.065548e-09 6 B A 1.000000e+00 5.000000e-01

As far as plotting is concerned, the `bn.strength`

objects produced above differ in two key ways:

`pvalues`

and`score.deltas`

contain an entry for each of the arcs in`dag`

, and provide the confidence level in each arc's presence (in the`strength`

column);`bagging`

and`bayes.factors`

contain an entry for every possible arc (that is, for every pair of nodes), and provide the confidence level in each arc's presence (in the`strength`

column) and direction (in the`direction`

column).

The `strength`

values in `pvalues`

, `bagging`

and `bayes.factors`

are
bound in the [0, 1] interval; and the same is true for the `direction`

values in `bagging`

and
`bayes.factors`

. The `strength`

values in `score.deltas`

are real numbers.

### Plotting the distribution of arc strengths

The `bagging`

and `bayes.factors`

encode the complete distribution of the arc strengths, since
they cover all possible arcs. This makes it possible to plot the empirical cumulative distribution function (ECDF) of
the confidence in the arcs' presence (modulo their directions) using the values stored in the `strength`

column. This is what the default `plot()`

method does for the `bn.strength`

objects produced
by `boot.strength()`

and `bf.strength()`

.

> plot(bagging)

The plot is defined in [0, 1]^{2} because both the arc strengths and their cumulative probabilities are
defined in [0, 1]. At the left-hand side, the vertical gap between `0.0`

and the first point gives the
proportion of the possible arcs that have strength equal to zero: they are completely unsupported by the data. At the
right hand side, the gap between the horizontal line and `1.0`

gives the proportion of arcs that have
strength equal to one, the maximum possible. The vertical dashed line is the default threshold for significance computed
by `inclusion.threshold()`

(we can choose not to draw it by setting `draw.threshold = FALSE`

).
Points to the right of the threshold represent arcs that would be included in the Bayesian network produced by
`averaged.network()`

.

If structure learning is stable, that is, it reliably learns the same set of arcs, then the ECDF looks like the above: arcs cluster around strength zero and around strength one, with a large gap between the two groups. If, on the other hand, structure learning is noisy then the points in the ECDF are more uniformly distributed along the x-axis to represent arcs with varying strength values. Look, for instance, at how the ECDF changes if we reduce the sample size from 5000 to 50.

> weak = boot.strength(learning.test[1:50, ], algorithm = "tabu", R = 200) > plot(weak)

The `pvalues`

and `score.deltas`

objects typically contain information on just a few arcs, so
it is not meaningful to plot their ECDF.

### Plotting the network structure together with the arc strengths

Another useful type of plot is overlaying arc strengths on a network to get a visual impression of the layout of the
arcs and their presence at the same time. `strength.plot()`

implements this by increasing each arc's
thickness in proportion to its strength, and by drawing weak arcs with dashed lines. Modes and arcs are laid out using
the **Rgraphviz** package as in `graphviz.plot()`

.

When arc strengths are estimated using p-values, by default they are grouped in the following intervals to determine the thickness of the arcs:

> c(0, 0.05 / c(10, 5, 2, 1.5, 1), 1)

[1] 0.00000000 0.00500000 0.01000000 0.02500000 0.03333333 0.05000000 1.00000000

The value of the significance threshold is the `0.05`

above by default, but it can be changed using the
`threshold`

argument of `strength.plot()`

. Arcs with strengths in the right-most interval are
associated with p-values larger than the threshold and are plotted with dashed lines. Other arcs are drawn with
increasingly thick lines as we move intervals from right to left, that is, the closer to zero the interval the pvalues
fall in is.

Arc strengths estimated using scores are discretized in a similar way by default. Since the scale of the score
differences depends on the specific network and on the data they were learned from, intervals are defined using the
`c(0.50, 0.75, 0.90, 0.95, 1)`

empirical quantiles of positive score differences. (They correspond to arcs
that are well supported by the data: if we remove one of those arcs, the network score increases.) The significance
threshold is equal to zero by default, but it can be changed with the `threshold`

argument.

When arc strength is estimated using `boot.strength()`

or `bf.strength()`

, it takes values
in [0, 1] again but this time higher values are better. (With p-values, lower values are better.) Hence intervals by
default are defined as:

> c(0, (1 - threshold) / c(10, 5, 2, 1.5, 1), 1)

The resulting plots for `pvalues`

, `score.deltas`

, `bagging`

and
`bayes.factors`

are below.

> par(mfrow = c(2, 2)) > strength.plot(dag, strength = pvalues, main = "pvalues")

> strength.plot(dag, strength = score.deltas, main = "score.deltas") > strength.plot(dag, strength = bagging, main = "bagging") > strength.plot(dag, strength = bayes.factors, main = "bayes.factors")

The plots produced by `strength.plot()`

can be customized much in the same way as those produced by
`graphviz.plot()`

, which we illustrated here. An example:

> strength.plot(dag, strength = bagging, main = "bagging", shape = "rectangle", + highlight = list(nodes = "F", arcs = incident.arcs(dag, "F"), + col = "darkblue", fill = "lightblue"))

Obviously, setting either `lwd`

or `lty`

in the `highlight`

argument will reset the
formatting introduced by `strength.plot()`

and should be avoided. In addition, we can manually specify both
the `threshold`

of significance and the `cutpoints`

used to bin the arc strengths into
intervals.

> strength.plot(dag, strength = pvalues, + threshold = 0.01, cutpoints = c(0, 1e-300, 1e-250, 1e-200, 0.01, 1))

`Thu Nov 24 19:19:44 2022`

with **bnlearn**

`4.9-20221107`

and `R version 4.2.2 (2022-10-31)`

.