Creating Bayesian network structures
The graph structure of a Bayesian network is stored in an object of class bn (documented
here). We can create such an object in various ways through three
possible representations: the arc set of the graph, its adjacency matrix or a model formula.
In addition, we can also generate empty and random network structures with the
empty.graph() and random.graph() functions (both documented
here).
Creating an empty network
> library(bnlearn) > e = empty.graph(LETTERS[1:6]) > class(e)
[1] "bn"
> e
Random/Generated Bayesian network
model:
[A][B][C][D][E][F]
nodes: 6
arcs: 0
undirected arcs: 0
directed arcs: 0
average markov blanket size: 0.00
average neighbourhood size: 0.00
average branching factor: 0.00
generation algorithm: Empty
Note that by default empty.graph() returns a single object of class bn. We can generate
multiple graphs in a single batch by specifying the num argument of empty.graph(); in that
case the graphs will be stored in a list of bn objects.
> empty.graph(LETTERS[1:6], num = 2)
[[1]]
Random/Generated Bayesian network
model:
[A][B][C][D][E][F]
nodes: 6
arcs: 0
undirected arcs: 0
directed arcs: 0
average markov blanket size: 0.00
average neighbourhood size: 0.00
average branching factor: 0.00
generation algorithm: Empty
[[2]]
Random/Generated Bayesian network
model:
[A][B][C][D][E][F]
nodes: 6
arcs: 0
undirected arcs: 0
directed arcs: 0
average markov blanket size: 0.00
average neighbourhood size: 0.00
average branching factor: 0.00
generation algorithm: Empty
Creating a complete network
> complete.graph(LETTERS[1:6])
Random/Generated Bayesian network
model:
[A][B|A][C|A:B][D|A:B:C][E|A:B:C:D][F|A:B:C:D:E]
nodes: 6
arcs: 15
undirected arcs: 0
directed arcs: 15
average markov blanket size: 5.00
average neighbourhood size: 5.00
average branching factor: 2.50
generation algorithm: Complete DAGs
A complete directed acyclic graph has an arc between each pair of nodes. The direction of the arcs is determined by
the order of node labels in the call to complete.graph.
> par(mfrow = c(1, 2)) > graphviz.compare(complete.graph(LETTERS[1:6]), complete.graph(LETTERS[6:1]), + layout = "circo", main = c("From A to E", "From E to A"))
Creating a network structure
With a specific arc set
First, we create an arc set in the form of a matrix with two columns, optionally labelled from and
to; each row describes one arc using the labels of the nodes at the tail (from) and at the
head (to) of the arc.
> arc.set = matrix(c("A", "C", "B", "F", "C", "F"), + ncol = 2, byrow = TRUE, + dimnames = list(NULL, c("from", "to"))) > arc.set
from to
[1,] "A" "C"
[2,] "B" "F"
[3,] "C" "F"
Then we can assign the newly created arc.set to a bn object using the assignment version of
the arcs() function (documented here).
> arcs(e) = arc.set > e
Random/Generated Bayesian network
model:
[A][B][D][E][C|A][F|B:C]
nodes: 6
arcs: 3
undirected arcs: 0
directed arcs: 3
average markov blanket size: 1.33
average neighbourhood size: 1.00
average branching factor: 0.50
generation algorithm: Empty
Note that by default arcs() performs several checks on the arc set, so:
- We cannot introduce arcs with node labels that are not present in the graph.
> bogus = matrix(c("X", "Y", "W", "Z"), + ncol = 2, byrow = TRUE, + dimnames = list(NULL, c("from", "to"))) > bogus
from to [1,] "X" "Y" [2,] "W" "Z"> arcs(e) = bogus
## Error: node(s) 'X' 'W' 'Y' 'Z' not present in the graph.
- We cannot introduce cycles by mistake (e.g.
A→B→C→A) unless we set theignore.cyclestoTRUE.> cycle = matrix(c("A", "C", "C", "B", "B", "A"), + ncol = 2, byrow = TRUE, + dimnames = list(NULL, c("from", "to"))) > cycle
from to [1,] "A" "C" [2,] "C" "B" [3,] "B" "A"> arcs(e) = cycle
## Error: the specified network contains cycles.
> arcs(e, ignore.cycles = TRUE) = cycle
## Error in `arcs<-`(`*tmp*`, ignore.cycles = TRUE, value = structure(c("A", : unused argument (ignore.cycles = TRUE)> acyclic(e)
[1] TRUE
- We cannot introduce loops by mistake (e.g.
A→A).> loops = matrix(c("A", "A", "B", "B", "C", "D"), + ncol = 2, byrow = TRUE, + dimnames = list(NULL, c("from", "to"))) > loops
from to [1,] "A" "A" [2,] "B" "B" [3,] "C" "D"> arcs(e) = loops
## Error: invalid arcs that are actually loops: ## A -> A ## B -> B
- We can, however, introduce undirected arcs (i.e. edges) by including both directions of an arc in the
arc set (e.g.
A→BandB→A) as long as it is clear no cycle would be introduced by either direction. Again, specifyingignore.cyclesoverrides this check.> edges = matrix(c("A", "B", "B", "A", "C", "D"), + ncol = 2, byrow = TRUE, + dimnames = list(NULL, c("from", "to"))) > edges
from to [1,] "A" "B" [2,] "B" "A" [3,] "C" "D"> arcs(e) = edges
With a specific adjacency matrix
We can create a graph structure using an adjaceny matrix with amat() (documented
here) in the same way as we did above using an arc set with
arcs(). First, we create an adjacency matrix with 0/1 integer elements (note the use of 0L and
1L) with the 1s corresponding to the arcs. Using a matrix with numeric, non-integer entries (that is,
0 and 1 instead of 0L and 1L) works as well but uses more memory,
which may be critical in large graphs.
> adj = matrix(0L, ncol = 6, nrow = 6, + dimnames = list(LETTERS[1:6], LETTERS[1:6])) > adj["A", "C"] = 1L > adj["B", "F"] = 1L > adj["C", "F"] = 1L > adj["D", "E"] = 1L > adj["A", "E"] = 1L > adj
A B C D E F A 0 0 1 0 1 0 B 0 0 0 0 0 1 C 0 0 0 0 0 1 D 0 0 0 0 1 0 E 0 0 0 0 0 0 F 0 0 0 0 0 0
Then we introduce the arcs in the graph with the assignment version of amat().
> amat(e) = adj > e
Random/Generated Bayesian network
model:
[A][B][D][C|A][E|A:D][F|B:C]
nodes: 6
arcs: 5
undirected arcs: 0
directed arcs: 5
average markov blanket size: 2.33
average neighbourhood size: 1.67
average branching factor: 0.83
generation algorithm: Empty
Note that amat() performs the same checks as arcs() to prevent the creation of misspecified
graph structures.
With a specific model formula
The format of the model formula is originally from the deal package (CRAN), and is defined as follows:
- each node is enclosed in square brackets; and
- its parents are listed after a pipe and separated by colons, also within the same set of square brackets.
Such a formula can be used in two ways to define a graph structure:
- With the
model2network()function (documented here), without creating an empty graph first.> model2network("[A][C][B|A][D|C][F|A:B:C][E|F]")
Random/Generated Bayesian network model: [A][C][B|A][D|C][F|A:B:C][E|F] nodes: 6 arcs: 6 undirected arcs: 0 directed arcs: 6 average markov blanket size: 2.67 average neighbourhood size: 2.00 average branching factor: 1.00 generation algorithm: Empty - With
modelstring()(documented here) and an existingbnobject, as forarcs()andamat()above.> modelstring(e) = "[A][C][B|A][D|C][F|A:B:C][E|F]" > e
Random/Generated Bayesian network model: [A][C][B|A][D|C][F|A:B:C][E|F] nodes: 6 arcs: 6 undirected arcs: 0 directed arcs: 6 average markov blanket size: 2.67 average neighbourhood size: 2.00 average branching factor: 1.00 generation algorithm: Empty
Again, note that both model2network() and modelstring() perform checks on the model formula
to prevent the creation of misspecified graphs.
Creating one or more random network structures
With a specified node ordering
By default, random.graph() generates graphs consistent with the ordering of the nodes provided by the
user; in each arc, the node at the tail precedes the node at the head in the node set. Arcs are samples independently
with a probability of inclusion specified by the prob argument, whose default is set so that there are on
average as many arcs as nodes in the graph.
> random.graph(LETTERS[1:6], prob = 0.1)
Random/Generated Bayesian network
model:
[A][B][C][D][E][F]
nodes: 6
arcs: 0
undirected arcs: 0
directed arcs: 0
average markov blanket size: 0.00
average neighbourhood size: 0.00
average branching factor: 0.00
generation algorithm: Full Ordering
arc sampling probability: 0.1
As was the case for empty.graph(), we can generate multiple graphs in one batch using the
num argument. In that case, graphs are returned in a list which is very convenient to pass to
lapply() for further computations.
Sampling from the space of connected directed acyclic graphs with uniform probability
In addition to the default method = "ordered", we can also set method = "ic-dag" to sample
with uniform probability from the space of connected directed acyclic graphs using Ide & Cozman MCMC sampler. Note
that the suggested length of the burn-in scales quadratically with the number of nodes, so generating multiple graphs in
a single batch with num is much more efficient than generating them one at a time.
> random.graph(LETTERS[1:6], num = 2, method = "ic-dag")
[[1]]
Random/Generated Bayesian network
model:
[D][B|D][F|B][C|B:D:F][A|C:D:F][E|C:D:F]
nodes: 6
arcs: 11
undirected arcs: 0
directed arcs: 11
average markov blanket size: 4.00
average neighbourhood size: 3.67
average branching factor: 1.83
generation algorithm: Ide & Cozman's Multiconnected DAGs
burn in length: 216
maximum in-degree: Inf
maximum out-degree: Inf
maximum degree: Inf
[[2]]
Random/Generated Bayesian network
model:
[D][B|D][F|B][C|B:D:F][A|C:D:F][E|C:D:F]
nodes: 6
arcs: 11
undirected arcs: 0
directed arcs: 11
average markov blanket size: 4.00
average neighbourhood size: 3.67
average branching factor: 1.83
generation algorithm: Ide & Cozman's Multiconnected DAGs
burn in length: 216
maximum in-degree: Inf
maximum out-degree: Inf
maximum degree: Inf
This methods accepts several optional constraints on the structure of the generated graphs, as specified by the following arguments:
burn.in: the number of iterations for the algorithm to converge to a stationary (and uniform) probability distribution.every: return only one graph every number of steps instead of all the generated graphs. High values of every result in a more diverse set.max.degree: the maximum degree of a node.max.in.degree: the maximum in-degree.max.out.degree: the maximum out-degree.
So, for example:
> random.graph(LETTERS[1:6], num = 2, method = "ic-dag", burn.in = 10^4, + every = 50, max.degree = 3)
[[1]]
Random/Generated Bayesian network
model:
[F][B|F][A|B][C|B][D|A:C][E|A:D:F]
nodes: 6
arcs: 8
undirected arcs: 0
directed arcs: 8
average markov blanket size: 3.67
average neighbourhood size: 2.67
average branching factor: 1.33
generation algorithm: Ide & Cozman's Multiconnected DAGs
burn in length: 10000
maximum in-degree: Inf
maximum out-degree: Inf
maximum degree: 3
[[2]]
Random/Generated Bayesian network
model:
[B][A|B][F|B][D|A][E|D:F][C|A:B:E]
nodes: 6
arcs: 8
undirected arcs: 0
directed arcs: 8
average markov blanket size: 3.67
average neighbourhood size: 2.67
average branching factor: 1.33
generation algorithm: Ide & Cozman's Multiconnected DAGs
burn in length: 10000
maximum in-degree: Inf
maximum out-degree: Inf
maximum degree: 3
Sampling from the space of the directed acyclic graphs with uniform probability
Melançon's MCMC algorithm samples with uniform probability from the space of directed acyclic graphs (not
necessarily connected); we can use it by specifying method = "melancon".
> random.graph(LETTERS[1:6], method = "melancon")
Random/Generated Bayesian network
model:
[D][C|D][F|C][A|D:F][B|C:D:F][E|C:F]
nodes: 6
arcs: 9
undirected arcs: 0
directed arcs: 9
average markov blanket size: 3.33
average neighbourhood size: 3.00
average branching factor: 1.50
generation algorithm: Melancon's Uniform Probability DAGs
burn in length: 216
maximum in-degree: Inf
maximum out-degree: Inf
maximum degree: Inf
As far as optional arguments are concerned, it is identical to Ide & Cozman's algorithm.
Fri Aug 1 22:40:06 2025 with bnlearn
5.1
and R version 4.5.0 (2025-04-11).