Plotting networks with the Rgraphviz package
Plotting network structures, and graphs more generally, is a complex process that involves laying out nodes and arcs to avoid overlaps and to make arc patterns easy to identify. Hence bnlearn leverages the de facto standard Graphviz library through the Rgraphviz package for this task; the built-in plot() method (documented here) is extremely limited in what networks it can plot legibly, and it should be considered a last resort for when Rgraphviz is not available.
The main interface to Rgraphviz in bnlearn is the graphviz.plot() function (documented here), which is designed to hide much of the complexity of the underlying Rgraphviz functions. It has only a few arguments, covering the most common customizations, and therefore it is not as flexible as Rgraphviz. However, it is simpler to use, and it returns a graph object that can be further customized with Rgraphviz for greater flexibility.
> library(bnlearn)
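The fallback to the built-in plot() method can be sketched as follows. This is only an illustration, not code from bnlearn: the three-node dag object is made up for the example, and the availability checks rely on requireNamespace() from base R.

```r
# Check which packages are installed without attaching them.
has.bnlearn = requireNamespace("bnlearn", quietly = TRUE)
has.rgraphviz = requireNamespace("Rgraphviz", quietly = TRUE)

if (has.bnlearn) {
  # Hypothetical three-node network, used only for illustration.
  dag = bnlearn::model2network("[A][B|A][C|A]")
  if (has.rgraphviz) {
    # Preferred: let Graphviz lay out nodes and arcs.
    bnlearn::graphviz.plot(dag)
  } else {
    # Last resort: the built-in plot() method for bn objects.
    plot(dag)
  }
}
```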
Options directly exposed from Rgraphviz
graphviz.plot() shares the following arguments with the functions in Rgraphviz:
layout: how nodes and arcs are laid out in the plot, can take values "dot" (the default), "neato", "twopi", "circo" and "fdp".
shape: the shape of (the frame surrounding the) nodes, can take values "circle" (the default), "ellipse" or "rectangle".
fontsize: the font size of the node labels, defaulting to 12. This size is somewhat smaller than Rgraphviz's, which sets the font size to 14 by default.
main: the title of the plot (defaults to an empty string), which is displayed at the top of the plot.
sub: the subtitle of the plot (defaults to an empty string), which is displayed at the bottom of the plot.
groups: nodes that should be displayed close together because they belong to the same subgraph.
The ideas behind the individual layouts are:
dot: nodes are displayed top to bottom following their topological order, so parents are above children and arcs point downwards. Results may be less than intuitive if undirected arcs are present.
> tree = model2network("[A][B|A][C|A][D|A][E|B][F|B][G|C][H|C][I|C]")
> graphviz.plot(tree, layout = "dot")
neato: a spring layout that positions the nodes so that their geometric distance in the plot approximates their path distance in the graph. That is, each arc acts as a spring and the layout is the result of all the springs pushing the nodes.
> graphviz.plot(tree, layout = "neato")
twopi: a radial layout that positions nodes in concentric circles as the distance from some root node increases.
> graphviz.plot(tree, layout = "twopi")
circo: a circular layout that finds clusters of connected nodes and represents them as circles.
> graphviz.plot(tree, layout = "circo")
fdp: another spring layout. It often produces nearly the same plots as neato, but it tries harder to keep nodes apart from each other.
> graphviz.plot(tree, layout = "fdp")
The other argument that affects the layout of the graph is groups, which takes a list containing the nodes in each subgraph. The nodes in each element of that list will be displayed as close together as the arcs in the graph allow. As an example, below we define three different subgraphs: one with nodes A, B, D and E; one with nodes F, H and I; and one with nodes C and G.
> graphviz.plot(tree, groups = list(c("A", "B", "E", "D"), c("F", "H", "I"), c("C", "G")))
The groups argument works with all the layouts above.
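When subgraph membership is stored as one label per node, the list expected by groups can be assembled with split() from base R. The sketch below assumes a hypothetical membership vector whose labels ("g1", "g2", "g3") are made up for this example; it reproduces the three subgraphs described above.

```r
# Hypothetical mapping from each node to the label of its subgraph.
membership = c(A = "g1", B = "g1", D = "g1", E = "g1",
               F = "g2", H = "g2", I = "g2",
               C = "g3", G = "g3")

# split() returns one character vector of node labels per subgraph;
# unname() drops the subgraph labels, leaving a plain list.
groups = unname(split(names(membership), membership))

# Plot only if both packages are installed.
if (requireNamespace("bnlearn", quietly = TRUE) &&
    requireNamespace("Rgraphviz", quietly = TRUE)) {
  tree = bnlearn::model2network("[A][B|A][C|A][D|A][E|B][F|B][G|C][H|C][I|C]")
  bnlearn::graphviz.plot(tree, groups = groups)
}
```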
The different node shapes are self-explanatory: the default "circle" is best when node labels are one- or two-letter strings, while "rectangle" is the most space-efficient choice when node labels are longer (it leaves the least space between the label and the surrounding frame). "ellipse" is a middle-ground choice that gives a classical look to the plot and can accommodate short node labels.
> dag = model2network("[ok label][long node label|ok label][even longer node label|ok label]")
> par(mfrow = c(1, 3))
> for (s in c("circle", "ellipse", "rectangle"))
+   graphviz.plot(dag, shape = s)
The fontsize argument works with any combination of the other arguments. Its main use is to tune the size of the node labels so that they are large enough to be readable and small enough to fit comfortably within the node frames. The default in Rgraphviz is often too large for single-letter or short labels and round/elliptic node frames: bnlearn defaults to a smaller size for this reason.
> dag = random.graph(LETTERS[1:6])
> par(mfrow = c(1, 3))
> for (fs in c(6, 12, 16))
+   graphviz.plot(dag, fontsize = fs)
Highlighting nodes and arcs
Furthermore, graphviz.plot() has a highlight argument to highlight particular nodes and/or arcs in the plot, albeit in a very basic way. highlight takes a list with at least one of the following elements:
nodes: a character vector, the labels of the nodes that will be highlighted.
arcs: the arcs that will be highlighted (a two-column matrix like that returned from the arcs() function).
Then the following optional graphical parameters describe how the nodes and arcs will be highlighted:
col: the highlight colour for the arcs and the node frame. The default value is "red".
textCol: the highlight colour for the node labels. The default value is "black", which effectively means labels are not highlighted.
fill: the background colour for the nodes (inside the node frames). The default value is "transparent".
lwd: the line width of highlighted arcs, a positive number.
lty: the line type of highlighted arcs. Possible values are 0, 1, 2, 3, 4, 5, 6, "blank", "solid", "dashed", "dotted", "dotdash", "longdash" and "twodash".
col, textCol and fill take a colour in any form that R understands, such as an integer or a character string (see ?colors to get a list of valid colour names). Each argument takes a single colour, which is then used for all nodes and arcs. lwd and lty also take a single value each, and default to the corresponding Rgraphviz setting.
As an example, we can highlight node B with a bright colour,
> graphviz.plot(tree, highlight = list(nodes = "B",
+   col = "tomato", fill = "orange"))
or its descendants,
> graphviz.plot(tree, highlight = list(nodes = descendants(tree, "B"),
+   col = "tomato", fill = "orange"))
or its Markov blanket.
> graphviz.plot(tree, highlight = list(nodes = mb(tree, "B"),
+   col = "tomato", fill = "orange"))
Basically any function that returns node labels (mb, nbr, parents, children, etc.) can be used in conjunction with highlight to programmatically highlight sets of nodes. The same goes for arcs: we can specify which arcs to highlight either manually or programmatically (with functions such as incoming.arcs, outgoing.arcs, vstructs, etc.).
> graphviz.plot(tree, highlight = list(arcs = outgoing.arcs(tree, "B"),
+   col = "limegreen", lwd = 3, lty = "dashed"))
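Specifying arcs manually amounts to building the two-column character matrix by hand, with one arc per row. The sketch below is an illustration only: the manual.arcs name is made up, and the "from" and "to" column names are there to mirror the layout of the matrix returned by arcs().

```r
# One arc per row: the first column is the tail, the second the head.
manual.arcs = matrix(c("A", "B",
                       "B", "E"),
                     ncol = 2, byrow = TRUE,
                     dimnames = list(NULL, c("from", "to")))

# Plot only if both packages are installed.
if (requireNamespace("bnlearn", quietly = TRUE) &&
    requireNamespace("Rgraphviz", quietly = TRUE)) {
  tree = bnlearn::model2network("[A][B|A][C|A][D|A][E|B][F|B][G|C][H|C][I|C]")
  bnlearn::graphviz.plot(tree, highlight = list(arcs = manual.arcs,
    col = "limegreen", lwd = 3, lty = "dashed"))
}
```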
We can also highlight both nodes and arcs at the same time but, as noted above, the same colours are used to highlight both.
> graphviz.plot(tree,
+   highlight = list(nodes = c("B", descendants(tree, "B")),
+     arcs = rbind(outgoing.arcs(tree, "B"), incoming.arcs(tree, "B")),
+     col = "darkblue", fill = "lightblue", lwd = 3, lty = "dashed"))
Refining a plot using Rgraphviz functions
The interface provided by graphviz.plot() is intentionally simple to make it easy to use for producing common plots. When we need more flexibility, we can still use graphviz.plot() to produce a plot with basic highlighting and then customize it with the functions in Rgraphviz.
graphviz.plot() returns an object of class graph, which we can save for later use; and it has a logical argument called render that controls whether the plot is actually drawn. Hence we can set render = FALSE to get graphviz.plot() to generate a graph object without displaying anything.
> gR = graphviz.plot(tree, render = FALSE,
+   highlight = list(nodes = c("B", descendants(tree, "B")),
+     arcs = rbind(outgoing.arcs(tree, "B"), incoming.arcs(tree, "B")),
+     col = "darkblue", fill = "lightblue", lwd = 3, lty = "dashed"))
We can then modify the gR object with one of:
nodeRenderInfo(), which modifies how nodes are formatted;
edgeRenderInfo(), which modifies how arcs are formatted.
nodeRenderInfo() can extract all the options that control how nodes are formatted, which are returned as a list with one element per option. Each element of that list contains a vector with one (named) element for each node that stores the value for that particular node.
> str(nodeRenderInfo(gR))
List of 16
 $ fixedsize : Named logi [1:9] FALSE FALSE FALSE FALSE FALSE FALSE ...
  ..- attr(*, "names")= chr [1:9] "A" "B" "C" "D" ...
 $ rWidth    : Named num [1:9] 27 27 27 27 27 27 27 27 27
  ..- attr(*, "names")= chr [1:9] "A" "B" "C" "D" ...
 $ lWidth    : Named num [1:9] 27 27 27 27 27 27 27 27 27
  ..- attr(*, "names")= chr [1:9] "A" "B" "C" "D" ...
 $ height    : Named num [1:9] 36 36 36 36 36 36 36 36 36
  ..- attr(*, "names")= chr [1:9] "A" "B" "C" "D" ...
 $ nodeX     : Named num [1:9] 304 145 304 410 39 145 251 357 463
  ..- attr(*, "names")= chr [1:9] "A" "B" "C" "D" ...
 $ nodeY     : Named num [1:9] 454 252 252 252 50 50 50 50 50
  ..- attr(*, "names")= chr [1:9] "A" "B" "C" "D" ...
 $ labelX    : Named num [1:9] 304 145 304 410 39 145 251 357 463
  ..- attr(*, "names")= chr [1:9] "A" "B" "C" "D" ...
 $ labelY    : Named num [1:9] 454 252 252 252 50 50 50 50 50
  ..- attr(*, "names")= chr [1:9] "A" "B" "C" "D" ...
 $ labelJust : Named chr [1:9] "n" "n" "n" "n" ...
  ..- attr(*, "names")= chr [1:9] "A" "B" "C" "D" ...
 $ labelWidth: Named num [1:9] 10 9 9 10 8 7 10 10 4
  ..- attr(*, "names")= chr [1:9] "A" "B" "C" "D" ...
 $ shape     : Named chr [1:9] "rectangle" "rectangle" "rectangle" "rectangle" ...
  ..- attr(*, "names")= chr [1:9] "A" "B" "C" "D" ...
 $ style     : Named chr [1:9] "" "" "" "" ...
  ..- attr(*, "names")= chr [1:9] "A" "B" "C" "D" ...
 $ col       : Named chr [1:9] "black" "darkblue" "black" "black" ...
  ..- attr(*, "names")= chr [1:9] "A" "B" "C" "D" ...
 $ fill      : Named chr [1:9] "transparent" "lightblue" "transparent" "transparent" ...
  ..- attr(*, "names")= chr [1:9] "A" "B" "C" "D" ...
 $ fontsize  : Named num [1:9] 12 12 12 12 12 12 12 12 12
  ..- attr(*, "names")= chr [1:9] "A" "B" "C" "D" ...
 $ textCol   : Named chr [1:9] NA "black" NA NA ...
  ..- attr(*, "names")= chr [1:9] "A" "B" "C" "D" ...
nodeRenderInfo() can also be used to change the formatting of the nodes in the object before re-drawing the graph. This can be done by saving, modifying and reassigning the list above. As an example, below we change the children of C to have "tomato"-coloured labels and (rectangular) frames.
> node.attrs = nodeRenderInfo(gR)
> node.attrs$textCol[children(tree, "C")] = "tomato"
> node.attrs$col[children(tree, "C")] = "tomato"
> node.attrs$shape[children(tree, "C")] = "rectangle"
> nodeRenderInfo(gR) = node.attrs
We can then call the renderGraph() function from Rgraphviz to display (or to save to a file) the modified graph in gR.
> renderGraph(gR)
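Saving to a file works like any other base graphics plot: open a graphics device, call renderGraph(), and close the device. The sketch below is one way to do it, assuming a PDF is wanted; the out.file path and the page size are arbitrary choices made for this example.

```r
# Arbitrary output location chosen for this example.
out.file = file.path(tempdir(), "tree.pdf")

# Render to file only if both packages are installed.
if (requireNamespace("bnlearn", quietly = TRUE) &&
    requireNamespace("Rgraphviz", quietly = TRUE)) {
  tree = bnlearn::model2network("[A][B|A][C|A][D|A][E|B][F|B][G|C][H|C][I|C]")
  # Build the graph object without drawing it on the current device.
  gR = bnlearn::graphviz.plot(tree, render = FALSE)
  # Open a PDF device, draw the graph into it, then close the device.
  pdf(out.file, width = 7, height = 7)
  Rgraphviz::renderGraph(gR)
  invisible(dev.off())
}
```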
The code below does the same thing in a more compact way, but beware that it is noticeably slower with large graphs.
> nodeRenderInfo(gR)$textCol[children(tree, "C")] = "tomato"
> nodeRenderInfo(gR)$col[children(tree, "C")] = "tomato"
> nodeRenderInfo(gR)$shape[children(tree, "C")] = "rectangle"
edgeRenderInfo() works in the same way, but for the attributes that control how arcs are formatted. The list it returns is structured like that returned by nodeRenderInfo(); we do not print it below because it is quite long.
> str(edgeRenderInfo(gR))
An important difference is that the named elements in the vectors that correspond to the various attributes are named using a concatenation of the endpoints of the arc separated by a "~"; the order in which the nodes appear in the concatenated string is an implementation detail of the graph package and it is difficult to predict.
> names(edgeRenderInfo(gR)$col)
[1] "A~B" "A~C" "A~D" "B~E" "B~F" "C~G" "C~H" "C~I"
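Because the order of the endpoints in these names is hard to predict, it can be convenient to look names up instead of typing them. The helper below is a hypothetical sketch (edge.keys() is not part of bnlearn, graph or Rgraphviz): it takes a two-column matrix of arcs and the vector of available names, tries both endpoint orders, and returns whichever one is present.

```r
# Hypothetical helper: map each arc (one per row) to its edge name.
edge.keys = function(arcs, available) {
  forward = paste(arcs[, 1], arcs[, 2], sep = "~")
  backward = paste(arcs[, 2], arcs[, 1], sep = "~")
  # Prefer the tail~head order, fall back to head~tail.
  keys = ifelse(forward %in% available, forward, backward)
  if (!all(keys %in% available))
    stop("some arcs do not match any edge name.")
  keys
}

# In practice: available = names(edgeRenderInfo(gR)$col); here the names
# printed above are copied by hand so that the example runs on its own.
available = c("A~B", "A~C", "A~D", "B~E", "B~F", "C~G", "C~H", "C~I")
arcs = matrix(c("C", "G",
                "C", "H",
                "C", "I"), ncol = 2, byrow = TRUE)
keys = edge.keys(arcs, available)
```

With gR and tree available, the same helper could be applied to the output of outgoing.arcs(tree, "C") to obtain the names used for subsetting.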
As an example, below we change the arcs pointing towards G, H and I to use two-dash, "tomato"-coloured lines and "diamond"-shaped arrowheads.
> arc.attrs = edgeRenderInfo(gR)
> arc.attrs$col[c("C~G", "C~H", "C~I")] = "tomato"
> arc.attrs$lty[c("C~G", "C~H", "C~I")] = "twodash"
> arc.attrs$arrowhead[c("C~G", "C~H", "C~I")] = "diamond"
> edgeRenderInfo(gR) = arc.attrs
> renderGraph(gR)
edgeRenderInfo() can also be used in the same compact way as nodeRenderInfo(), with the same issues about speed when formatting large graphs.
Plotting networks left-to-right instead of top-to-bottom
The layoutGraph() function from the Rgraphviz package is called internally from graphviz.plot() to position the nodes and lay out the arcs. It is worth mentioning, however, because we can call it explicitly on a graph object to change the default top-to-bottom layout given by layout = "dot" into the left-to-right layout commonly used to draw dynamic Bayesian networks and decision trees.
> gR = layoutGraph(gR, attrs = list(graph = list(rankdir = "LR")))
> renderGraph(gR)
As should be clear from the plot above, layoutGraph() unfortunately removes many of the attributes we have previously set on the nodes and the arcs when re-drawing them in the new layout. Hence it should be called before using nodeRenderInfo() and edgeRenderInfo() to avoid having to format arcs and nodes twice, which is what we are doing below.
> nodeRenderInfo(gR)$col[c("B", descendants(tree, "B"))] = "darkblue"
> nodeRenderInfo(gR)$fill[c("B", descendants(tree, "B"))] = "lightblue"
> edgeRenderInfo(gR)$col[c("A~B", "B~E", "B~F")] = "darkblue"
> nodeRenderInfo(gR)$textCol[children(tree, "C")] = "tomato"
> nodeRenderInfo(gR)$col[children(tree, "C")] = "tomato"
> nodeRenderInfo(gR)$shape[children(tree, "C")] = "rectangle"
> edgeRenderInfo(gR)$col[c("C~G", "C~H", "C~I")] = "tomato"
> edgeRenderInfo(gR)$lty[c("C~G", "C~H", "C~I")] = "twodash"
> edgeRenderInfo(gR)$arrowhead[c("C~G", "C~H", "C~I")] = "diamond"
> renderGraph(gR)
Note that we cannot just reuse the node.attrs and arc.attrs objects we created earlier because (among other things) they store the positions of the nodes and the arcs from the graph object originally created by graphviz.plot(). Assigning node.attrs and arc.attrs via nodeRenderInfo(gR) and edgeRenderInfo(gR) would thus undo the changes made by layoutGraph().
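One possible workaround, sketched below under the assumption that only colour-related attributes need to be carried over, is to reassign a subset of the saved list that contains no positional information. The cosmetic vector is a choice made for this example, and the approach has not been checked against every Rgraphviz version.

```r
# Attributes assumed to be purely cosmetic (no node positions or sizes).
cosmetic = c("col", "fill", "textCol")

# Run only if both packages are installed.
if (requireNamespace("bnlearn", quietly = TRUE) &&
    requireNamespace("Rgraphviz", quietly = TRUE)) {
  library(bnlearn)
  library(Rgraphviz)
  tree = model2network("[A][B|A][C|A][D|A][E|B][F|B][G|C][H|C][I|C]")
  gR = graphviz.plot(tree, render = FALSE,
         highlight = list(nodes = "B", col = "darkblue", fill = "lightblue"))
  # Save the node formatting computed for the top-to-bottom layout.
  node.attrs = nodeRenderInfo(gR)
  # Switch to the left-to-right layout.
  gR = layoutGraph(gR, attrs = list(graph = list(rankdir = "LR")))
  # Restore only the cosmetic attributes, leaving the new positions alone.
  nodeRenderInfo(gR) = node.attrs[cosmetic]
  renderGraph(gR)
}
```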
Mon Aug 5 02:43:22 2024 with bnlearn 5.0 and R version 4.4.1 (2024-06-14).