Plotting networks with the Rgraphviz package

Plotting network structures, and graphs more generally, is a complex process that involves laying out nodes and arcs to avoid overlaps and to make arc patterns easy to identify. Hence bnlearn leverages the de facto standard Graphviz library through the Rgraphviz package for this task; the built-in plot() method (documented here) is extremely limited in what networks it can plot legibly and should be considered a last resort for when Rgraphviz is not available.

The main interface to Rgraphviz in bnlearn is the graphviz.plot() function (documented here), which is designed to hide much of the complexity of the underlying Rgraphviz functions. It has a few arguments that allow the most common customisations, and therefore it is not as flexible as Rgraphviz. However, it is simpler to use, and it returns a graph object that can be further customised with Rgraphviz for greater flexibility.

> library(bnlearn)

Options directly exposed from Rgraphviz

graphviz.plot() shares the following arguments with the functions in Rgraphviz:

  • layout: how nodes and arcs are laid out in the plot, can take values "dot" (the default), "neato", "twopi", "circo" and "fdp".
  • shape: the shape of (the frame surrounding the) nodes, can take values "circle", "ellipse" or "rectangle" (the default).
  • fontsize: the font size of the node labels, defaulting to 12. This size is somewhat smaller than Rgraphviz's, which sets the font size to 14 by default.
  • main: the title of the plot (defaults to an empty string), which is displayed at the top of the plot.
  • sub: the subtitle of the plot (defaults to an empty string), which is displayed at the bottom of the plot.
  • groups: nodes that should be displayed close together because they belong to the same subgraph.
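
As a minimal sketch of how several of these arguments combine in a single call, the code below plots a small tree-shaped network with a title and a subtitle. The network, the title strings and the particular argument values are illustrative choices, not recommendations.

```r
library(bnlearn)

# a small tree-shaped network used purely for illustration.
tree = model2network("[A][B|A][C|A][D|A][E|B][F|B][G|C][H|C][I|C]")

# layout, node shape, label size, title and subtitle set in one call.
graphviz.plot(tree, layout = "dot", shape = "ellipse", fontsize = 14,
  main = "A tree-shaped network", sub = "nine nodes, eight arcs")
```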

The ideas behind the individual layouts are:

  • dot: nodes are displayed top to bottom following their topological order, so parents are above children and arcs point downwards. Results may be less than intuitive if undirected arcs are present.
    > tree = model2network("[A][B|A][C|A][D|A][E|B][F|B][G|C][H|C][I|C]")
    > graphviz.plot(tree, layout = "dot")
    
  • neato: a spring layout that positions the nodes so that their geometric distance in the plot approximates their path distance in the graph. That is, each arc acts as a spring and the layout is the result of all the springs pushing the nodes.
    > graphviz.plot(tree, layout = "neato")
    
  • twopi: a radial layout that positions nodes in concentric circles as the distance from some root node increases.
    > graphviz.plot(tree, layout = "twopi")
    
  • circo: a circular layout that finds clusters of connected nodes and represents them as circles.
    > graphviz.plot(tree, layout = "circo")
    
  • fdp: another spring layout. It often produces nearly the same plots as neato, but it tries harder to keep nodes apart from each other.
    > graphviz.plot(tree, layout = "fdp")
    

The other argument that affects the layout of the graph is groups, which takes a list containing the nodes in each subgraph. The nodes in each element of that list will be displayed as close together as the arcs in the graph allow. As an example, below we define three different subgraphs: one with nodes A, B, D and E; one with nodes F, H and I; and one with nodes C and G.

> graphviz.plot(tree, groups = list(c("A", "B", "E", "D"), c("F", "H", "I"), c("C", "G")))

The groups argument works with all the layouts above.
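
A quick way to compare how each layout treats the same groups is to draw them side by side. The sketch below reuses the tree and the groups from the example above; the panel arrangement and the use of main to label each panel are illustrative choices.

```r
library(bnlearn)

tree = model2network("[A][B|A][C|A][D|A][E|B][F|B][G|C][H|C][I|C]")
node.groups = list(c("A", "B", "E", "D"), c("F", "H", "I"), c("C", "G"))
layouts = c("dot", "neato", "twopi", "circo", "fdp")

# one panel per layout, each labelled with the name of the layout.
old.par = par(mfrow = c(2, 3))
for (l in layouts)
  graphviz.plot(tree, layout = l, groups = node.groups, main = l)
par(old.par)
```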

The different node shapes are self-explanatory: "circle" is best when node labels are one- or two-letter strings, while the default "rectangle" is the most space-efficient choice when node labels are longer (it leaves the least space between the label and the surrounding frame). "ellipse" is a middle-ground choice that gives a classical look to the plot and can accommodate short node labels.

> dag = model2network("[ok label][long node label|ok label][even longer node label|ok label]")
> par(mfrow = c(1, 3))
> for (s in c("circle", "ellipse", "rectangle"))
+   graphviz.plot(dag, shape = s)

The fontsize argument works with any combination of the other arguments. Its main use is to tune the dimension of the node labels so that they are large enough to be readable and small enough to fit comfortably within the node frames. The default in Rgraphviz is often too large for single-letter or short labels and round/elliptic node frames: bnlearn defaults to a smaller size for this reason.

> dag = random.graph(LETTERS[1:6])
> par(mfrow = c(1, 3))
> for (fs in c(6, 12, 16))
+   graphviz.plot(dag, fontsize = fs)

Highlighting nodes and arcs

Furthermore, graphviz.plot() has a highlight argument to highlight particular nodes and/or arcs in the plot, albeit in a very basic way. highlight takes a list with at least one of the following elements:

  • nodes: a character vector, the labels of the nodes that will be highlighted.
  • arcs: the arcs that will be highlighted (a two-column matrix like that returned from the arcs() function).

Then the following optional graphical parameters describe how the nodes and arcs will be highlighted:

  • col: the highlight colour for the arcs and the node frame. The default value is "red".
  • textCol: the highlight colour for the node labels. The default value is "black", which effectively means labels are not highlighted.
  • fill: the background colour for the nodes (inside the node frames). The default value is "transparent".
  • lwd: the line width of highlighted arcs, a positive number.
  • lty: the line type of highlighted arcs. Possible values are 0, 1, 2, 3, 4, 5, 6, "blank", "solid", "dashed", "dotted", "dotdash", "longdash" and "twodash".

col, textCol and fill take a colour in any form that R understands, such as an integer or a character string (see ?colors to get a list of valid colour names). Each argument takes a single colour, that is then used for all nodes and arcs. lwd and lty also take a single value each, and default to the corresponding Rgraphviz setting.
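
The sketch below illustrates the different forms these parameters can take: a hexadecimal colour string, a colour built with rgb(), a numeric line type and a manually specified arc matrix. The specific colours and the choice of nodes and arcs are arbitrary, and the list is stored in a variable (hl, a name chosen here) only to make it easier to read.

```r
library(bnlearn)

tree = model2network("[A][B|A][C|A][D|A][E|B][F|B][G|C][H|C][I|C]")

hl = list(
  nodes = c("B", "E"),
  # a two-column matrix with one row per arc, in the format used by arcs().
  arcs = matrix(c("B", "E"), ncol = 2, dimnames = list(NULL, c("from", "to"))),
  col = "#0072B2",           # hexadecimal colour string.
  textCol = "#0072B2",
  fill = rgb(0.9, 0.95, 1),  # colour built from red, green and blue levels.
  lwd = 2,
  lty = 2                    # numeric code for a dashed line.
)

graphviz.plot(tree, highlight = hl)
```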

As an example, we can highlight node B with a bright colour,

> graphviz.plot(tree, highlight = list(nodes = "B",
+                                      col = "tomato", fill = "orange"))

or its descendants,

> graphviz.plot(tree, highlight = list(nodes = descendants(tree, "B"),
+                                      col = "tomato", fill = "orange"))

or its Markov blanket.

> graphviz.plot(tree, highlight = list(nodes = mb(tree, "B"),
+                                      col = "tomato", fill = "orange"))

Basically any function that returns node labels (mb, nbr, parents, children, etc.) can be used in conjunction with highlight to programmatically highlight sets of nodes. The same goes for arcs: we can specify which arcs to highlight either manually or programmatically (with functions such as incoming.arcs, outgoing.arcs, vstructs, etc.).

> graphviz.plot(tree, highlight = list(arcs = outgoing.arcs(tree, "B"),
+                                      col = "limegreen", lwd = 3, lty = "dashed"))

We can also highlight both nodes and arcs at the same time but, as noted above, the same colours are used to highlight both.

> graphviz.plot(tree,
+   highlight = list(nodes = c("B", descendants(tree, "B")),
+                    arcs = rbind(outgoing.arcs(tree, "B"), incoming.arcs(tree, "B")),
+                    col = "darkblue", fill = "lightblue", lwd = 3, lty = "dashed"))

Refining a plot using Rgraphviz functions

The interface provided by graphviz.plot() is intentionally simple to make it easy to use for producing common plots. When we need more flexibility, we can still use graphviz.plot() to produce a plot with basic highlighting and then customise it with the functions in Rgraphviz. graphviz.plot() returns an object of class graph, which we can save for later use; and it has a logical argument called render that controls whether the plot is actually drawn. Hence we can set render = FALSE to get graphviz.plot() to generate a graph object without displaying anything.

> gR = graphviz.plot(tree, render = FALSE,
+        highlight = list(nodes = c("B", descendants(tree, "B")),
+                         arcs = rbind(outgoing.arcs(tree, "B"), incoming.arcs(tree, "B")),
+                         col = "darkblue", fill = "lightblue", lwd = 3, lty = "dashed"))

We can then modify the gR object with one of:

  • nodeRenderInfo(), which modifies how nodes are formatted;
  • edgeRenderInfo(), which modifies how arcs are formatted.

nodeRenderInfo() can extract all the options that control how nodes are formatted, which are returned as a list with one element per option. Each element of that list contains a vector with one (named) element for each node that stores the value for that particular node.

> str(nodeRenderInfo(gR))
List of 16
 $ fixedsize : Named logi [1:9] FALSE FALSE FALSE FALSE FALSE FALSE ...
  ..- attr(*, "names")= chr [1:9] "A" "B" "C" "D" ...
 $ rWidth    : Named num [1:9] 27 27 27 27 27 27 27 27 27
  ..- attr(*, "names")= chr [1:9] "A" "B" "C" "D" ...
 $ lWidth    : Named num [1:9] 27 27 27 27 27 27 27 27 27
  ..- attr(*, "names")= chr [1:9] "A" "B" "C" "D" ...
 $ height    : Named num [1:9] 36 36 36 36 36 36 36 36 36
  ..- attr(*, "names")= chr [1:9] "A" "B" "C" "D" ...
 $ nodeX     : Named num [1:9] 304 145 304 410 39 145 251 357 463
  ..- attr(*, "names")= chr [1:9] "A" "B" "C" "D" ...
 $ nodeY     : Named num [1:9] 454 252 252 252 50 50 50 50 50
  ..- attr(*, "names")= chr [1:9] "A" "B" "C" "D" ...
 $ labelX    : Named num [1:9] 304 145 304 410 39 145 251 357 463
  ..- attr(*, "names")= chr [1:9] "A" "B" "C" "D" ...
 $ labelY    : Named num [1:9] 454 252 252 252 50 50 50 50 50
  ..- attr(*, "names")= chr [1:9] "A" "B" "C" "D" ...
 $ labelJust : Named chr [1:9] "n" "n" "n" "n" ...
  ..- attr(*, "names")= chr [1:9] "A" "B" "C" "D" ...
 $ labelWidth: Named num [1:9] 10 9 9 10 8 7 10 10 4
  ..- attr(*, "names")= chr [1:9] "A" "B" "C" "D" ...
 $ shape     : Named chr [1:9] "rectangle" "rectangle" "rectangle" "rectangle" ...
  ..- attr(*, "names")= chr [1:9] "A" "B" "C" "D" ...
 $ style     : Named chr [1:9] "" "" "" "" ...
  ..- attr(*, "names")= chr [1:9] "A" "B" "C" "D" ...
 $ col       : Named chr [1:9] "black" "darkblue" "black" "black" ...
  ..- attr(*, "names")= chr [1:9] "A" "B" "C" "D" ...
 $ fill      : Named chr [1:9] "transparent" "lightblue" "transparent" "transparent" ...
  ..- attr(*, "names")= chr [1:9] "A" "B" "C" "D" ...
 $ fontsize  : Named num [1:9] 12 12 12 12 12 12 12 12 12
  ..- attr(*, "names")= chr [1:9] "A" "B" "C" "D" ...
 $ textCol   : Named chr [1:9] NA "black" NA NA ...
  ..- attr(*, "names")= chr [1:9] "A" "B" "C" "D" ...
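
Because each option is a named vector, individual values can be looked up by node label, and the nodes that share a value can be found with ordinary vector operations. The sketch below rebuilds gR as above and assumes that Rgraphviz is attached so that nodeRenderInfo() is available; the variable named highlighted is introduced here for illustration.

```r
library(bnlearn)
library(Rgraphviz)

tree = model2network("[A][B|A][C|A][D|A][E|B][F|B][G|C][H|C][I|C]")
gR = graphviz.plot(tree, render = FALSE,
       highlight = list(nodes = c("B", descendants(tree, "B")),
                        arcs = rbind(outgoing.arcs(tree, "B"), incoming.arcs(tree, "B")),
                        col = "darkblue", fill = "lightblue", lwd = 3, lty = "dashed"))

node.attrs = nodeRenderInfo(gR)
# the names of the available formatting options.
names(node.attrs)
# the background colour of two specific nodes.
node.attrs$fill[c("A", "B")]
# the labels of the nodes whose frame is drawn in the highlight colour.
highlighted = names(which(node.attrs$col == "darkblue"))
highlighted
```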

nodeRenderInfo() can also be used to change the formatting of the nodes in the object before re-drawing the graph. This can be done by saving, modifying and reassigning the list above. As an example, below we change the children of C to have "tomato"-coloured labels and (rectangular) frames.

> node.attrs = nodeRenderInfo(gR)
> node.attrs$textCol[children(tree, "C")] = "tomato"
> node.attrs$col[children(tree, "C")] = "tomato"
> node.attrs$shape[children(tree, "C")] = "rectangle"
> nodeRenderInfo(gR) = node.attrs

We can then call the renderGraph() function from Rgraphviz to display (or to save to a file) the modified graph in gR.

> renderGraph(gR)
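
To save the plot to a file instead, one option is to wrap renderGraph() between a call that opens one of R's standard graphics devices and dev.off(). The sketch below writes a PDF to a temporary directory; the file name and the page dimensions are arbitrary, and Rgraphviz is assumed to be attached.

```r
library(bnlearn)
library(Rgraphviz)

tree = model2network("[A][B|A][C|A][D|A][E|B][F|B][G|C][H|C][I|C]")
gR = graphviz.plot(tree, render = FALSE)

# open a PDF device, draw the graph into it and close the device.
out.file = file.path(tempdir(), "tree.pdf")
pdf(out.file, width = 6, height = 4)
renderGraph(gR)
dev.off()
```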

The code below does the same thing in a more compact way, but beware that it is noticeably slower with large graphs.

> nodeRenderInfo(gR)$textCol[children(tree, "C")] = "tomato"
> nodeRenderInfo(gR)$col[children(tree, "C")] = "tomato"
> nodeRenderInfo(gR)$shape[children(tree, "C")] = "rectangle"

edgeRenderInfo() works in the same way, but for the attributes that control how arcs are formatted. The list it returns is structured like that returned by nodeRenderInfo(); we do not print it below because it is quite long.

> str(edgeRenderInfo(gR))

An important difference is that the elements of the vectors that correspond to the various attributes are named by concatenating the endpoints of each arc, separated by a "~"; the order in which the nodes appear in the concatenated string is an implementation detail of the graph package and it is difficult to predict.

> names(edgeRenderInfo(gR)$col)
[1] "A~B" "A~C" "A~D" "B~E" "B~F" "C~G" "C~H" "C~I"
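
Since the order of the endpoints is hard to predict, one defensive approach is to build both candidate names for each arc and keep whichever one the graph object actually uses. The function edge.names() below is a hypothetical helper written for this sketch, not part of bnlearn or Rgraphviz; it assumes that Rgraphviz is attached and that the arcs are given as a two-column matrix with "from" and "to" columns, like those returned by arcs().

```r
library(bnlearn)
library(Rgraphviz)

tree = model2network("[A][B|A][C|A][D|A][E|B][F|B][G|C][H|C][I|C]")
gR = graphviz.plot(tree, render = FALSE,
       highlight = list(nodes = c("B", descendants(tree, "B")),
                        arcs = rbind(outgoing.arcs(tree, "B"), incoming.arcs(tree, "B")),
                        col = "darkblue", fill = "lightblue", lwd = 3, lty = "dashed"))

# hypothetical helper: map a matrix of arcs to the names used by edgeRenderInfo().
edge.names = function(g, arcs) {
  known = names(edgeRenderInfo(g)$col)
  forward = paste(arcs[, "from"], arcs[, "to"], sep = "~")
  backward = paste(arcs[, "to"], arcs[, "from"], sep = "~")
  # prefer the "from~to" form, fall back to "to~from" if that is what is stored.
  ifelse(forward %in% known, forward, backward)
}

edge.names(gR, outgoing.arcs(tree, "C"))
```

The result can then be used to index the vectors in the list returned by edgeRenderInfo() without typing the names by hand.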

As an example, below we change the arcs pointing towards G, H and I to use two-dash, "tomato"-coloured lines and "diamond"-shaped arrowheads.

> arc.attrs = edgeRenderInfo(gR)
> arc.attrs$col[c("C~G", "C~H", "C~I")] = "tomato"
> arc.attrs$lty[c("C~G", "C~H", "C~I")] = "twodash"
> arc.attrs$arrowhead[c("C~G", "C~H", "C~I")] = "diamond"
> edgeRenderInfo(gR) = arc.attrs
> renderGraph(gR)

edgeRenderInfo() can also be used in the same compact way as nodeRenderInfo(), with the same issues about speed when formatting large graphs.

Plotting networks left-to-right instead of top-to-bottom

The layoutGraph() function from the Rgraphviz package is called internally by graphviz.plot() to position the nodes and lay out the arcs. It is worth mentioning, however, because we can call it explicitly on a graph object to change the default top-to-bottom layout given by layout = "dot" into the left-to-right layout commonly used to draw dynamic Bayesian networks and decision trees.

> gR = layoutGraph(gR, attrs = list(graph = list(rankdir = "LR")))
> renderGraph(gR)

As should be clear from the plot above, layoutGraph() unfortunately removes many of the attributes we have previously set on the nodes and the arcs when re-drawing them in the new layout. Hence it should be called before using nodeRenderInfo() and edgeRenderInfo() to avoid having to format arcs and nodes twice, which is what we are doing below.

> nodeRenderInfo(gR)$col[c("B", descendants(tree, "B"))] = "darkblue"
> nodeRenderInfo(gR)$fill[c("B", descendants(tree, "B"))] = "lightblue"
> edgeRenderInfo(gR)$col[c("A~B", "B~E", "B~F")] = "darkblue"
> nodeRenderInfo(gR)$textCol[children(tree, "C")] = "tomato"
> nodeRenderInfo(gR)$col[children(tree, "C")] = "tomato"
> nodeRenderInfo(gR)$shape[children(tree, "C")] = "rectangle"
> edgeRenderInfo(gR)$col[c("C~G", "C~H", "C~I")] = "tomato"
> edgeRenderInfo(gR)$lty[c("C~G", "C~H", "C~I")] = "twodash"
> edgeRenderInfo(gR)$arrowhead[c("C~G", "C~H", "C~I")] = "diamond"
> renderGraph(gR)

Note that we cannot just reuse the node.attrs and arc.attrs objects we created earlier because (among other things) they store the positions of the nodes and the arcs from the graph object originally created by graphviz.plot(). Assigning node.attrs and arc.attrs via nodeRenderInfo(gR) and edgeRenderInfo(gR) would thus undo the changes made by layoutGraph().
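
One way to avoid repeating the formatting by hand is to collect it in a small function and call that function after layoutGraph(). The sketch below assumes that Rgraphviz is attached; apply.formatting() is a hypothetical helper, not part of bnlearn or Rgraphviz, and it sets only a subset of the attributes used above to keep the example short.

```r
library(bnlearn)
library(Rgraphviz)

tree = model2network("[A][B|A][C|A][D|A][E|B][F|B][G|C][H|C][I|C]")

# hypothetical helper: set the formatting attributes on a graph object and return it.
apply.formatting = function(g, dag) {
  blue = c("B", descendants(dag, "B"))
  nodeRenderInfo(g)$col[blue] = "darkblue"
  nodeRenderInfo(g)$fill[blue] = "lightblue"
  edgeRenderInfo(g)$col[c("A~B", "B~E", "B~F")] = "darkblue"
  nodeRenderInfo(g)$textCol[children(dag, "C")] = "tomato"
  edgeRenderInfo(g)$lty[c("C~G", "C~H", "C~I")] = "twodash"
  g
}

# lay out the graph first, then format it, then draw it.
gR = graphviz.plot(tree, render = FALSE)
gR = layoutGraph(gR, attrs = list(graph = list(rankdir = "LR")))
gR = apply.formatting(gR, tree)
renderGraph(gR)
```

Because the function modifies only formatting attributes and leaves node and arc positions alone, it can be applied again whenever the layout is recomputed.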

Last updated on Mon Aug 5 02:43:22 2024 with bnlearn 5.0 and R version 4.4.1 (2024-06-14).