Creating Bayesian network structures
The graph structure of a Bayesian network is stored in an object of class bn
(documented
here). We can create such an object in various ways through three
possible representations: the arc set of the graph, its adjacency matrix or a model formula.
In addition, we can also generate empty and random network structures with the
empty.graph()
and random.graph()
functions (both documented
here).
Creating an empty network
> library(bnlearn) > e = empty.graph(LETTERS[1:6]) > class(e)
[1] "bn"
> e
Random/Generated Bayesian network model: [A][B][C][D][E][F] nodes: 6 arcs: 0 undirected arcs: 0 directed arcs: 0 average markov blanket size: 0.00 average neighbourhood size: 0.00 average branching factor: 0.00 generation algorithm: Empty
Note that by default empty.graph()
returns a single object of class bn
. We can generate
multiple graphs in a single batch by specifying the num
argument of empty.graph()
; in that
case the graphs will be stored in a list of bn
objects.
> empty.graph(LETTERS[1:6], num = 2)
[[1]] Random/Generated Bayesian network model: [A][B][C][D][E][F] nodes: 6 arcs: 0 undirected arcs: 0 directed arcs: 0 average markov blanket size: 0.00 average neighbourhood size: 0.00 average branching factor: 0.00 generation algorithm: Empty [[2]] Random/Generated Bayesian network model: [A][B][C][D][E][F] nodes: 6 arcs: 0 undirected arcs: 0 directed arcs: 0 average markov blanket size: 0.00 average neighbourhood size: 0.00 average branching factor: 0.00 generation algorithm: Empty
Creating a complete network
> complete.graph(LETTERS[1:6])
Random/Generated Bayesian network model: [A][B|A][C|A:B][D|A:B:C][E|A:B:C:D][F|A:B:C:D:E] nodes: 6 arcs: 15 undirected arcs: 0 directed arcs: 15 average markov blanket size: 5.00 average neighbourhood size: 5.00 average branching factor: 2.50 generation algorithm: Complete DAGs
A complete directed acyclic graph has an arc between each pair of nodes. The direction of the arcs is determined by
the order of node labels in the call to complete.graph
.
> par(mfrow = c(1, 2)) > graphviz.compare(complete.graph(LETTERS[1:6]), complete.graph(LETTERS[6:1]), + layout = "circo", main = c("From A to E", "From E to A"))
Creating a network structure
With a specific arc set
First, we create an arc set in the form of a matrix with two columns, optionally labelled from
and
to
; each row describes one arc using the labels of the nodes at the tail (from
) and at the
head (to
) of the arc.
> arc.set = matrix(c("A", "C", "B", "F", "C", "F"), + ncol = 2, byrow = TRUE, + dimnames = list(NULL, c("from", "to"))) > arc.set
from to [1,] "A" "C" [2,] "B" "F" [3,] "C" "F"
Then we can assign the newly created arc.set
to a bn
object using the assignment version of
the arcs()
function (documented here).
> arcs(e) = arc.set > e
Random/Generated Bayesian network model: [A][B][D][E][C|A][F|B:C] nodes: 6 arcs: 3 undirected arcs: 0 directed arcs: 3 average markov blanket size: 1.33 average neighbourhood size: 1.00 average branching factor: 0.50 generation algorithm: Empty
Note that by default arcs()
performs several checks on the arc set, so:
- We cannot introduce arcs with node labels that are not present in the graph.
> bogus = matrix(c("X", "Y", "W", "Z"), + ncol = 2, byrow = TRUE, + dimnames = list(NULL, c("from", "to"))) > bogus
from to [1,] "X" "Y" [2,] "W" "Z"
> arcs(e) = bogus
## Error: node(s) 'X' 'W' 'Y' 'Z' not present in the graph.
- We cannot introduce cycles by mistake (e.g.
A
→B
→C
→A
) unless we set theignore.cycles
toTRUE
.> cycle = matrix(c("A", "C", "C", "B", "B", "A"), + ncol = 2, byrow = TRUE, + dimnames = list(NULL, c("from", "to"))) > cycle
from to [1,] "A" "C" [2,] "C" "B" [3,] "B" "A"
> arcs(e) = cycle
## Error: the specified network contains cycles.
> arcs(e, ignore.cycles = TRUE) = cycle
## Error in `arcs<-`(`*tmp*`, ignore.cycles = TRUE, value = structure(c("A", : unused argument (ignore.cycles = TRUE)
> acyclic(e)
[1] TRUE
- We cannot introduce loops by mistake (e.g.
A
→A
).> loops = matrix(c("A", "A", "B", "B", "C", "D"), + ncol = 2, byrow = TRUE, + dimnames = list(NULL, c("from", "to"))) > loops
from to [1,] "A" "A" [2,] "B" "B" [3,] "C" "D"
> arcs(e) = loops
## Error: invalid arcs that are actually loops: ## A -> A ## B -> B
- We can, however, introduce undirected arcs (i.e. edges) by including both directions of an arc in the
arc set (e.g.
A
→B
andB
→A
) as long as it is clear no cycle would be introduced by either direction. Again, specifyingignore.cycles
overrides this check.> edges = matrix(c("A", "B", "B", "A", "C", "D"), + ncol = 2, byrow = TRUE, + dimnames = list(NULL, c("from", "to"))) > edges
from to [1,] "A" "B" [2,] "B" "A" [3,] "C" "D"
> arcs(e) = edges
With a specific adjacency matrix
We can create a graph structure using an adjaceny matrix with amat()
(documented
here) in the same way as we did above using an arc set with
arcs()
. First, we create an adjacency matrix with 0/1 integer elements (note the use of 0L
and
1L
) with the 1s corresponding to the arcs. Using a matrix with numeric, non-integer entries (that is,
0
and 1
instead of 0L
and 1L
) works as well but uses more memory,
which may be critical in large graphs.
> adj = matrix(0L, ncol = 6, nrow = 6, + dimnames = list(LETTERS[1:6], LETTERS[1:6])) > adj["A", "C"] = 1L > adj["B", "F"] = 1L > adj["C", "F"] = 1L > adj["D", "E"] = 1L > adj["A", "E"] = 1L > adj
A B C D E F A 0 0 1 0 1 0 B 0 0 0 0 0 1 C 0 0 0 0 0 1 D 0 0 0 0 1 0 E 0 0 0 0 0 0 F 0 0 0 0 0 0
Then we introduce the arcs in the graph with the assignment version of amat()
.
> amat(e) = adj > e
Random/Generated Bayesian network model: [A][B][D][C|A][E|A:D][F|B:C] nodes: 6 arcs: 5 undirected arcs: 0 directed arcs: 5 average markov blanket size: 2.33 average neighbourhood size: 1.67 average branching factor: 0.83 generation algorithm: Empty
Note that amat()
performs the same checks as arcs()
to prevent the creation of misspecified
graph structures.
With a specific model formula
The format of the model formula is originally from the deal package (CRAN), and is defined as follows:
- each node is enclosed in square brackets; and
- its parents are listed after a pipe and separated by colons, also within the same set of square brackets.
Such a formula can be used in two ways to define a graph structure:
- With the
model2network()
function (documented here), without creating an empty graph first.> model2network("[A][C][B|A][D|C][F|A:B:C][E|F]")
Random/Generated Bayesian network model: [A][C][B|A][D|C][F|A:B:C][E|F] nodes: 6 arcs: 6 undirected arcs: 0 directed arcs: 6 average markov blanket size: 2.67 average neighbourhood size: 2.00 average branching factor: 1.00 generation algorithm: Empty
- With
modelstring()
(documented here) and an existingbn
object, as forarcs()
andamat()
above.> modelstring(e) = "[A][C][B|A][D|C][F|A:B:C][E|F]" > e
Random/Generated Bayesian network model: [A][C][B|A][D|C][F|A:B:C][E|F] nodes: 6 arcs: 6 undirected arcs: 0 directed arcs: 6 average markov blanket size: 2.67 average neighbourhood size: 2.00 average branching factor: 1.00 generation algorithm: Empty
Again, note that both model2network()
and modelstring()
perform checks on the model formula
to prevent the creation of misspecified graphs.
Creating one or more random network structures
With a specified node ordering
By default, random.graph()
generates graphs consistent with the ordering of the nodes provided by the
user; in each arc, the node at the tail precedes the node at the head in the node set. Arcs are samples independently
with a probability of inclusion specified by the prob
argument, whose default is set so that there are on
average as many arcs as nodes in the graph.
> random.graph(LETTERS[1:6], prob = 0.1)
Random/Generated Bayesian network model: [A][B][C][D][E][F] nodes: 6 arcs: 0 undirected arcs: 0 directed arcs: 0 average markov blanket size: 0.00 average neighbourhood size: 0.00 average branching factor: 0.00 generation algorithm: Full Ordering arc sampling probability: 0.1
As was the case for empty.graph()
, we can generate multiple graphs in one batch using the
num
argument. In that case, graphs are returned in a list which is very convenient to pass to
lapply()
for further computations.
Sampling from the space of connected directed acyclic graphs with uniform probability
In addition to the default method = "ordered"
, we can also set method = "ic-dag"
to sample
with uniform probability from the space of connected directed acyclic graphs using Ide & Cozman MCMC sampler. Note
that the suggested length of the burn-in scales quadratically with the number of nodes, so generating multiple graphs in
a single batch with num
is much more efficient than generating them one at a time.
> random.graph(LETTERS[1:6], num = 2, method = "ic-dag")
[[1]] Random/Generated Bayesian network model: [D][B|D][F|B][C|B:D:F][A|C:D:F][E|C:D:F] nodes: 6 arcs: 11 undirected arcs: 0 directed arcs: 11 average markov blanket size: 4.00 average neighbourhood size: 3.67 average branching factor: 1.83 generation algorithm: Ide & Cozman's Multiconnected DAGs burn in length: 216 maximum in-degree: Inf maximum out-degree: Inf maximum degree: Inf [[2]] Random/Generated Bayesian network model: [D][B|D][F|B][C|B:D:F][A|C:D:F][E|C:D:F] nodes: 6 arcs: 11 undirected arcs: 0 directed arcs: 11 average markov blanket size: 4.00 average neighbourhood size: 3.67 average branching factor: 1.83 generation algorithm: Ide & Cozman's Multiconnected DAGs burn in length: 216 maximum in-degree: Inf maximum out-degree: Inf maximum degree: Inf
This methods accepts several optional constraints on the structure of the generated graphs, as specified by the following arguments:
burn.in
: the number of iterations for the algorithm to converge to a stationary (and uniform) probability distribution.every
: return only one graph every number of steps instead of all the generated graphs. High values of every result in a more diverse set.max.degree
: the maximum degree of a node.max.in.degree
: the maximum in-degree.max.out.degree
: the maximum out-degree.
So, for example:
> random.graph(LETTERS[1:6], num = 2, method = "ic-dag", burn.in = 10^4, + every = 50, max.degree = 3)
[[1]] Random/Generated Bayesian network model: [F][B|F][A|B][C|B][D|A:C][E|A:D:F] nodes: 6 arcs: 8 undirected arcs: 0 directed arcs: 8 average markov blanket size: 3.67 average neighbourhood size: 2.67 average branching factor: 1.33 generation algorithm: Ide & Cozman's Multiconnected DAGs burn in length: 10000 maximum in-degree: Inf maximum out-degree: Inf maximum degree: 3 [[2]] Random/Generated Bayesian network model: [B][A|B][F|B][D|A][E|D:F][C|A:B:E] nodes: 6 arcs: 8 undirected arcs: 0 directed arcs: 8 average markov blanket size: 3.67 average neighbourhood size: 2.67 average branching factor: 1.33 generation algorithm: Ide & Cozman's Multiconnected DAGs burn in length: 10000 maximum in-degree: Inf maximum out-degree: Inf maximum degree: 3
Sampling from the space of the directed acyclic graphs with uniform probability
Melançon's MCMC algorithm samples with uniform probability from the space of directed acyclic graphs (not
necessarily connected); we can use it by specifying method = "melancon"
.
> random.graph(LETTERS[1:6], method = "melancon")
Random/Generated Bayesian network model: [D][C|D][F|C][A|D:F][B|C:D:F][E|C:F] nodes: 6 arcs: 9 undirected arcs: 0 directed arcs: 9 average markov blanket size: 3.33 average neighbourhood size: 3.00 average branching factor: 1.50 generation algorithm: Melancon's Uniform Probability DAGs burn in length: 216 maximum in-degree: Inf maximum out-degree: Inf maximum degree: Inf
As far as optional arguments are concerned, it is identical to Ide & Cozman's algorithm.
Mon Aug 5 02:41:34 2024
with bnlearn
5.0
and R version 4.4.1 (2024-06-14)
.