## Creating Bayesian network structures

The graph structure of a Bayesian network is stored in an object of class `bn`

(documented
here). We can create such an object in various ways through three
possible representations: the *arc set* of the graph, its *adjacency matrix* or a *model formula*.
In addition, we can also generate *empty* and *random* network structures with the
`empty.graph()`

and `random.graph()`

functions (both documented
here).

### Creating an empty network

> library(bnlearn) > e = empty.graph(LETTERS[1:6]) > class(e)

[1] "bn"

> e

Random/Generated Bayesian network model: [A][B][C][D][E][F] nodes: 6 arcs: 0 undirected arcs: 0 directed arcs: 0 average markov blanket size: 0.00 average neighbourhood size: 0.00 average branching factor: 0.00 generation algorithm: Empty

Note that by default `empty.graph()`

returns a single object of class `bn`

. We can generate
multiple graphs in a single batch by specifying the `num`

argument of `empty.graph()`

; in that
case the graphs will be stored in a list of `bn`

objects.

> empty.graph(LETTERS[1:6], num = 2)

[[1]] Random/Generated Bayesian network model: [A][B][C][D][E][F] nodes: 6 arcs: 0 undirected arcs: 0 directed arcs: 0 average markov blanket size: 0.00 average neighbourhood size: 0.00 average branching factor: 0.00 generation algorithm: Empty [[2]] Random/Generated Bayesian network model: [A][B][C][D][E][F] nodes: 6 arcs: 0 undirected arcs: 0 directed arcs: 0 average markov blanket size: 0.00 average neighbourhood size: 0.00 average branching factor: 0.00 generation algorithm: Empty

### Creating a complete network

> complete.graph(LETTERS[1:6])

Random/Generated Bayesian network model: [A][B|A][C|A:B][D|A:B:C][E|A:B:C:D][F|A:B:C:D:E] nodes: 6 arcs: 15 undirected arcs: 0 directed arcs: 15 average markov blanket size: 5.00 average neighbourhood size: 5.00 average branching factor: 2.50 generation algorithm: Complete DAGs

A complete directed acyclic graph has an arc between each pair of nodes. The direction of the arcs is determined by
the order of node labels in the call to `complete.graph`

.

> par(mfrow = c(1, 2)) > graphviz.compare(complete.graph(LETTERS[1:6]), complete.graph(LETTERS[6:1]), + layout = "circo", main = c("From A to E", "From E to A"))

### Creating a network structure

#### With a specific arc set

First, we create an arc set in the form of a matrix with two columns, optionally labelled `from`

and
`to`

; each row describes one arc using the labels of the nodes at the tail (`from`

) and at the
head (`to`

) of the arc.

> arc.set = matrix(c("A", "C", "B", "F", "C", "F"), + ncol = 2, byrow = TRUE, + dimnames = list(NULL, c("from", "to"))) > arc.set

from to [1,] "A" "C" [2,] "B" "F" [3,] "C" "F"

Then we can assign the newly created `arc.set`

to a `bn`

object using the assignment version of
the `arcs()`

function (documented here).

> arcs(e) = arc.set > e

Random/Generated Bayesian network model: [A][B][D][E][C|A][F|B:C] nodes: 6 arcs: 3 undirected arcs: 0 directed arcs: 3 average markov blanket size: 1.33 average neighbourhood size: 1.00 average branching factor: 0.50 generation algorithm: Empty

Note that by default `arcs()`

performs several checks on the arc set, so:

- We cannot introduce arcs with node labels that are not present in the graph.
> bogus = matrix(c("X", "Y", "W", "Z"), + ncol = 2, byrow = TRUE, + dimnames = list(NULL, c("from", "to"))) > bogus

from to [1,] "X" "Y" [2,] "W" "Z"

> arcs(e) = bogus

## Error: node(s) 'X' 'W' 'Y' 'Z' not present in the graph.

- We cannot introduce cycles by mistake (
*e.g.*`A`

→`B`

→`C`

→`A`

) unless we set the`ignore.cycles`

to`TRUE`

.> cycle = matrix(c("A", "C", "C", "B", "B", "A"), + ncol = 2, byrow = TRUE, + dimnames = list(NULL, c("from", "to"))) > cycle

from to [1,] "A" "C" [2,] "C" "B" [3,] "B" "A"

> arcs(e) = cycle

## Error: the specified network contains cycles.

> arcs(e, ignore.cycles = TRUE) = cycle

## Error in `arcs<-`(`*tmp*`, ignore.cycles = TRUE, value = structure(c("A", : unused argument (ignore.cycles = TRUE)

> acyclic(e)

[1] TRUE

- We cannot introduce loops by mistake (
*e.g.*`A`

→`A`

).> loops = matrix(c("A", "A", "B", "B", "C", "D"), + ncol = 2, byrow = TRUE, + dimnames = list(NULL, c("from", "to"))) > loops

from to [1,] "A" "A" [2,] "B" "B" [3,] "C" "D"

> arcs(e) = loops

## Error: invalid arcs that are actually loops: ## A -> A ## B -> B

- We can, however, introduce undirected arcs (
*i.e.*edges) by including both directions of an arc in the arc set (*e.g.*`A`

→`B`

and`B`

→`A`

) as long as it is clear no cycle would be introduced by either direction. Again, specifying`ignore.cycles`

overrides this check.> edges = matrix(c("A", "B", "B", "A", "C", "D"), + ncol = 2, byrow = TRUE, + dimnames = list(NULL, c("from", "to"))) > edges

from to [1,] "A" "B" [2,] "B" "A" [3,] "C" "D"

> arcs(e) = edges

#### With a specific adjacency matrix

We can create a graph structure using an adjaceny matrix with `amat()`

(documented
here) in the same way as we did above using an arc set with
`arcs()`

. First, we create an adjacency matrix with 0/1 integer elements (note the use of `0L`

and
`1L`

) with the 1s corresponding to the arcs. Using a matrix with numeric, non-integer entries (that is,
`0`

and `1`

instead of `0L`

and `1L`

) works as well but uses more memory,
which may be critical in large graphs.

> adj = matrix(0L, ncol = 6, nrow = 6, + dimnames = list(LETTERS[1:6], LETTERS[1:6])) > adj["A", "C"] = 1L > adj["B", "F"] = 1L > adj["C", "F"] = 1L > adj["D", "E"] = 1L > adj["A", "E"] = 1L > adj

A B C D E F A 0 0 1 0 1 0 B 0 0 0 0 0 1 C 0 0 0 0 0 1 D 0 0 0 0 1 0 E 0 0 0 0 0 0 F 0 0 0 0 0 0

Then we introduce the arcs in the graph with the assignment version of `amat()`

.

> amat(e) = adj > e

Random/Generated Bayesian network model: [A][B][D][C|A][E|A:D][F|B:C] nodes: 6 arcs: 5 undirected arcs: 0 directed arcs: 5 average markov blanket size: 2.33 average neighbourhood size: 1.67 average branching factor: 0.83 generation algorithm: Empty

Note that `amat()`

performs the same checks as `arcs()`

to prevent the creation of misspecified
graph structures.

#### With a specific model formula

The format of the model formula is originally from the **deal** package
(CRAN), and is defined as follows:

- each node is enclosed in square brackets; and
- its parents are listed after a pipe and separated by colons, also within the same set of square brackets.

Such a formula can be used in two ways to define a graph structure:

- With the
`model2network()`

function (documented here), without creating an empty graph first.> model2network("[A][C][B|A][D|C][F|A:B:C][E|F]")

Random/Generated Bayesian network model: [A][C][B|A][D|C][F|A:B:C][E|F] nodes: 6 arcs: 6 undirected arcs: 0 directed arcs: 6 average markov blanket size: 2.67 average neighbourhood size: 2.00 average branching factor: 1.00 generation algorithm: Empty

- With
`modelstring()`

(documented here) and an existing`bn`

object, as for`arcs()`

and`amat()`

above.> modelstring(e) = "[A][C][B|A][D|C][F|A:B:C][E|F]" > e

Random/Generated Bayesian network model: [A][C][B|A][D|C][F|A:B:C][E|F] nodes: 6 arcs: 6 undirected arcs: 0 directed arcs: 6 average markov blanket size: 2.67 average neighbourhood size: 2.00 average branching factor: 1.00 generation algorithm: Empty

Again, note that both `model2network()`

and `modelstring()`

perform checks on the model formula
to prevent the creation of misspecified graphs.

### Creating one or more random network structures

#### With a specified node ordering

By default, `random.graph()`

generates graphs consistent with the ordering of the nodes provided by the
user; in each arc, the node at the tail precedes the node at the head in the node set. Arcs are samples independently
with a probability of inclusion specified by the `prob`

argument, whose default is set so that there are on
average as many arcs as nodes in the graph.

> random.graph(LETTERS[1:6], prob = 0.1)

Random/Generated Bayesian network model: [A][B][C][D][E][F] nodes: 6 arcs: 0 undirected arcs: 0 directed arcs: 0 average markov blanket size: 0.00 average neighbourhood size: 0.00 average branching factor: 0.00 generation algorithm: Full Ordering arc sampling probability: 0.1

As was the case for `empty.graph()`

, we can generate multiple graphs in one batch using the
`num`

argument. In that case, graphs are returned in a list which is very convenient to pass to
`lapply()`

for further computations.

#### Sampling from the space of connected directed acyclic graphs with uniform probability

In addition to the default `method = "ordered"`

, we can also set `method = "ic-dag"`

to sample
with uniform probability from the space of connected directed acyclic graphs using Ide & Cozman MCMC sampler. Note
that the suggested length of the burn-in scales quadratically with the number of nodes, so generating multiple graphs in
a single batch with `num`

is much more efficient than generating them one at a time.

> random.graph(LETTERS[1:6], num = 2, method = "ic-dag")

[[1]] Random/Generated Bayesian network model: [D][B|D][F|B][C|B:D:F][A|C:D:F][E|C:D:F] nodes: 6 arcs: 11 undirected arcs: 0 directed arcs: 11 average markov blanket size: 4.00 average neighbourhood size: 3.67 average branching factor: 1.83 generation algorithm: Ide & Cozman's Multiconnected DAGs burn in length: 216 maximum in-degree: Inf maximum out-degree: Inf maximum degree: Inf [[2]] Random/Generated Bayesian network model: [D][B|D][F|B][C|B:D:F][A|C:D:F][E|C:D:F] nodes: 6 arcs: 11 undirected arcs: 0 directed arcs: 11 average markov blanket size: 4.00 average neighbourhood size: 3.67 average branching factor: 1.83 generation algorithm: Ide & Cozman's Multiconnected DAGs burn in length: 216 maximum in-degree: Inf maximum out-degree: Inf maximum degree: Inf

This methods accepts several optional constraints on the structure of the generated graphs, as specified by the following arguments:

`burn.in`

: the number of iterations for the algorithm to converge to a stationary (and uniform) probability distribution.`every`

: return only one graph every number of steps instead of all the generated graphs. High values of every result in a more diverse set.`max.degree`

: the maximum degree of a node.`max.in.degree`

: the maximum in-degree.`max.out.degree`

: the maximum out-degree.

So, for example:

> random.graph(LETTERS[1:6], num = 2, method = "ic-dag", burn.in = 10^4, + every = 50, max.degree = 3)

[[1]] Random/Generated Bayesian network model: [F][B|F][A|B][C|B][D|A:C][E|A:D:F] nodes: 6 arcs: 8 undirected arcs: 0 directed arcs: 8 average markov blanket size: 3.67 average neighbourhood size: 2.67 average branching factor: 1.33 generation algorithm: Ide & Cozman's Multiconnected DAGs burn in length: 10000 maximum in-degree: Inf maximum out-degree: Inf maximum degree: 3 [[2]] Random/Generated Bayesian network model: [B][A|B][F|B][D|A][E|D:F][C|A:B:E] nodes: 6 arcs: 8 undirected arcs: 0 directed arcs: 8 average markov blanket size: 3.67 average neighbourhood size: 2.67 average branching factor: 1.33 generation algorithm: Ide & Cozman's Multiconnected DAGs burn in length: 10000 maximum in-degree: Inf maximum out-degree: Inf maximum degree: 3

#### Sampling from the space of the directed acyclic graphs with uniform probability

Melançon's MCMC algorithm samples with uniform probability from the space of directed acyclic graphs (not
necessarily connected); we can use it by specifying `method = "melancon"`

.

> random.graph(LETTERS[1:6], method = "melancon")

Random/Generated Bayesian network model: [D][C|D][F|C][A|D:F][B|C:D:F][E|C:F] nodes: 6 arcs: 9 undirected arcs: 0 directed arcs: 9 average markov blanket size: 3.33 average neighbourhood size: 3.00 average branching factor: 1.50 generation algorithm: Melancon's Uniform Probability DAGs burn in length: 216 maximum in-degree: Inf maximum out-degree: Inf maximum degree: Inf

As far as optional arguments are concerned, it is identical to Ide & Cozman's algorithm.

`Mon Aug 5 02:41:34 2024`

with **bnlearn**

`5.0`

and `R version 4.4.1 (2024-06-14)`

.