## Creating Bayesian network structures

The graph structure of a Bayesian network is stored in an object of class `bn` (documented here). We can create such an object in various ways through three possible representations: the arc set of the graph, its adjacency matrix or a model formula. In addition, we can also generate empty and random network structures with the `empty.graph()` and `random.graph()` functions (both documented here).

### Creating an empty network

```> library(bnlearn)
> e = empty.graph(LETTERS[1:6])
> class(e)
```
``` "bn"
```
```> e
```
```
Random/Generated Bayesian network

model:
[A][B][C][D][E][F]
nodes:                                 6
arcs:                                  0
undirected arcs:                     0
directed arcs:                       0
average markov blanket size:           0.00
average neighbourhood size:            0.00
average branching factor:              0.00

generation algorithm:                  Empty
```

Note that by default `empty.graph()` returns a single object of class `bn`. We can generate multiple graphs in a single batch by specifying the `num` argument of `empty.graph()`; in that case the graphs will be stored in a list of `bn` objects.

```> empty.graph(LETTERS[1:6], num = 2)
```
```[]

Random/Generated Bayesian network

model:
[A][B][C][D][E][F]
nodes:                                 6
arcs:                                  0
undirected arcs:                     0
directed arcs:                       0
average markov blanket size:           0.00
average neighbourhood size:            0.00
average branching factor:              0.00

generation algorithm:                  Empty

[]

Random/Generated Bayesian network

model:
[A][B][C][D][E][F]
nodes:                                 6
arcs:                                  0
undirected arcs:                     0
directed arcs:                       0
average markov blanket size:           0.00
average neighbourhood size:            0.00
average branching factor:              0.00

generation algorithm:                  Empty
```

### Creating a complete network

```> complete.graph(LETTERS[1:6])
```
```
Random/Generated Bayesian network

model:
[A][B|A][C|A:B][D|A:B:C][E|A:B:C:D][F|A:B:C:D:E]
nodes:                                 6
arcs:                                  15
undirected arcs:                     0
directed arcs:                       15
average markov blanket size:           5.00
average neighbourhood size:            5.00
average branching factor:              2.50

generation algorithm:                  Complete DAGs
```

A complete directed acyclic graph has an arc between each pair of nodes. The direction of the arcs is determined by the order of node labels in the call to `complete.graph`.

```> par(mfrow = c(1, 2))
> graphviz.compare(complete.graph(LETTERS[1:6]), complete.graph(LETTERS[6:1]),
+   layout = "circo", main = c("From A to E", "From E to A"))
```
```Loading required namespace: Rgraphviz
``` ### Creating a network structure

#### With a specific arc set

First, we create an arc set in the form of a matrix with two columns, optionally labelled `from` and `to`; each row describes one arc using the labels of the nodes at the tail (`from`) and at the head (`to`) of the arc.

```> arc.set = matrix(c("A", "C", "B", "F", "C", "F"),
+             ncol = 2, byrow = TRUE,
+             dimnames = list(NULL, c("from", "to")))
> arc.set
```
```     from to
[1,] "A"  "C"
[2,] "B"  "F"
[3,] "C"  "F"
```

Then we can assign the newly created `arc.set` to a `bn` object using the assignment version of the `arcs()` function (documented here).

```> arcs(e) = arc.set
> e
```
```
Random/Generated Bayesian network

model:
[A][B][D][E][C|A][F|B:C]
nodes:                                 6
arcs:                                  3
undirected arcs:                     0
directed arcs:                       3
average markov blanket size:           1.33
average neighbourhood size:            1.00
average branching factor:              0.50

generation algorithm:                  Empty
```

Note that by default `arcs()` performs several checks on the arc set, so:

1. We cannot introduce arcs with node labels that are not present in the graph.
```> bogus = matrix(c("X", "Y", "W", "Z"),
+           ncol = 2, byrow = TRUE,
+           dimnames = list(NULL, c("from", "to")))
> bogus
```
```     from to
[1,] "X"  "Y"
[2,] "W"  "Z"
```
```> arcs(e) = bogus
```
```## Error: node(s) 'X' 'W' 'Y' 'Z' not present in the graph.
```
2. We cannot introduce cycles by mistake (e.g. `A``B``C``A`) unless we set the `ignore.cycles` to `TRUE`.
```> cycle = matrix(c("A", "C", "C", "B", "B", "A"),
+           ncol = 2, byrow = TRUE,
+           dimnames = list(NULL, c("from", "to")))
> cycle
```
```     from to
[1,] "A"  "C"
[2,] "C"  "B"
[3,] "B"  "A"
```
```> arcs(e) = cycle
```
```## Error: the specified network contains cycles.
```
```> arcs(e, ignore.cycles = TRUE) = cycle
```
```## Error in `arcs<-`(`*tmp*`, ignore.cycles = TRUE, value = structure(c("A", : unused argument (ignore.cycles = TRUE)
```
```> acyclic(e)
```
``` TRUE
```
3. We cannot introduce loops by mistake (e.g. `A``A`).
```> loops = matrix(c("A", "A", "B", "B", "C", "D"),
+           ncol = 2, byrow = TRUE,
+           dimnames = list(NULL, c("from", "to")))
> loops
```
```     from to
[1,] "A"  "A"
[2,] "B"  "B"
[3,] "C"  "D"
```
```> arcs(e) = loops
```
```## Error: invalid arcs that are actually loops:
##    A -> A
##    B -> B
```
4. We can, however, introduce undirected arcs (i.e. edges) by including both directions of an arc in the arc set (e.g. `A``B` and `B``A`) as long as it is clear no cycle would be introduced by either direction. Again, specifying `ignore.cycles` overrides this check.
```> edges = matrix(c("A", "B", "B", "A", "C", "D"),
+           ncol = 2, byrow = TRUE,
+           dimnames = list(NULL, c("from", "to")))
> edges
```
```     from to
[1,] "A"  "B"
[2,] "B"  "A"
[3,] "C"  "D"
```
```> arcs(e) = edges
```

#### With a specific adjacency matrix

We can create a graph structure using an adjaceny matrix with `amat()` (documented here) in the same way as we did above using an arc set with `arcs()`. First, we create an adjacency matrix with 0/1 integer elements (note the use of `0L` and `1L`) with the 1s corresponding to the arcs. Using a matrix with numeric, non-integer entries (that is, `0` and `1` instead of `0L` and `1L`) works as well but uses more memory, which may be critical in large graphs.

```> adj = matrix(0L, ncol = 6, nrow = 6,
+         dimnames = list(LETTERS[1:6], LETTERS[1:6]))
```
```  A B C D E F
A 0 0 1 0 1 0
B 0 0 0 0 0 1
C 0 0 0 0 0 1
D 0 0 0 0 1 0
E 0 0 0 0 0 0
F 0 0 0 0 0 0
```

Then we introduce the arcs in the graph with the assignment version of `amat()`.

```> amat(e) = adj
> e
```
```
Random/Generated Bayesian network

model:
[A][B][D][C|A][E|A:D][F|B:C]
nodes:                                 6
arcs:                                  5
undirected arcs:                     0
directed arcs:                       5
average markov blanket size:           2.33
average neighbourhood size:            1.67
average branching factor:              0.83

generation algorithm:                  Empty
```

Note that `amat()` performs the same checks as `arcs()` to prevent the creation of misspecified graph structures.

#### With a specific model formula

The format of the model formula is originally from the deal package (CRAN), and is defined as follows:

• each node is enclosed in square brackets; and
• its parents are listed after a pipe and separated by colons, also within the same set of square brackets.

Such a formula can be used in two ways to define a graph structure:

1. With the `model2network()` function (documented here), without creating an empty graph first.
```> model2network("[A][C][B|A][D|C][F|A:B:C][E|F]")
```
```
Random/Generated Bayesian network

model:
[A][C][B|A][D|C][F|A:B:C][E|F]
nodes:                                 6
arcs:                                  6
undirected arcs:                     0
directed arcs:                       6
average markov blanket size:           2.67
average neighbourhood size:            2.00
average branching factor:              1.00

generation algorithm:                  Empty
```
2. With `modelstring()` (documented here) and an existing `bn` object, as for `arcs()` and `amat()` above.
```> modelstring(e) = "[A][C][B|A][D|C][F|A:B:C][E|F]"
> e
```
```
Random/Generated Bayesian network

model:
[A][C][B|A][D|C][F|A:B:C][E|F]
nodes:                                 6
arcs:                                  6
undirected arcs:                     0
directed arcs:                       6
average markov blanket size:           2.67
average neighbourhood size:            2.00
average branching factor:              1.00

generation algorithm:                  Empty
```

Again, note that both `model2network()` and `modelstring()` perform checks on the model formula to prevent the creation of misspecified graphs.

### Creating one or more random network structures

#### With a specified node ordering

By default, `random.graph()` generates graphs consistent with the ordering of the nodes provided by the user; in each arc, the node at the tail precedes the node at the head in the node set. Arcs are samples independently with a probability of inclusion specified by the `prob` argument, whose default is set so that there are on average as many arcs as nodes in the graph.

```> random.graph(LETTERS[1:6], prob = 0.1)
```
```
Random/Generated Bayesian network

model:
[A][B][C][D][E][F|A:B]
nodes:                                 6
arcs:                                  2
undirected arcs:                     0
directed arcs:                       2
average markov blanket size:           1.00
average neighbourhood size:            0.67
average branching factor:              0.33

generation algorithm:                  Full Ordering
arc sampling probability:              0.1
```

As was the case for `empty.graph()`, we can generate multiple graphs in one batch using the `num` argument. In that case, graphs are returned in a list which is very convenient to pass to `lapply()` for further computations.

#### Sampling from the space of connected directed acyclic graphs with uniform probability

In addition to the default `method = "ordered"`, we can also set `method = "ic-dag"` to sample with uniform probability from the space of connected directed acyclic graphs using Ide & Cozman MCMC sampler. Note that the suggested length of the burn-in scales quadratically with the number of nodes, so generating multiple graphs in a single batch with `num` is much more efficient than generating them one at a time.

```> random.graph(LETTERS[1:6], num = 2, method = "ic-dag")
```
```[]

Random/Generated Bayesian network

model:
[D][F][B|F][E|F][C|B:D:E:F][A|B:C:F]
nodes:                                 6
arcs:                                  9
undirected arcs:                     0
directed arcs:                       9
average markov blanket size:           4.33
average neighbourhood size:            3.00
average branching factor:              1.50

generation algorithm:                  Ide & Cozman's Multiconnected DAGs
burn in length:                        216
maximum in-degree:                     Inf
maximum out-degree:                    Inf
maximum degree:                        Inf

[]

Random/Generated Bayesian network

model:
[D][F][E|F][B|E:F][C|B:D:E:F][A|B:C:F]
nodes:                                 6
arcs:                                  10
undirected arcs:                     0
directed arcs:                       10
average markov blanket size:           4.33
average neighbourhood size:            3.33
average branching factor:              1.67

generation algorithm:                  Ide & Cozman's Multiconnected DAGs
burn in length:                        216
maximum in-degree:                     Inf
maximum out-degree:                    Inf
maximum degree:                        Inf
```

This methods accepts several optional constraints on the structure of the generated graphs, as specified by the following arguments:

• `burn.in`: the number of iterations for the algorithm to converge to a stationary (and uniform) probability distribution.
• `every`: return only one graph every number of steps instead of all the generated graphs. High values of every result in a more diverse set.
• `max.degree`: the maximum degree of a node.
• `max.in.degree`: the maximum in-degree.
• `max.out.degree`: the maximum out-degree.

So, for example:

```> random.graph(LETTERS[1:6], num = 2, method = "ic-dag", burn.in = 10^4,
+   every = 50, max.degree = 3)
```
```[]

Random/Generated Bayesian network

model:
[C][D][B|D][F|D][A|B:C:F][E|B:F]
nodes:                                 6
arcs:                                  7
undirected arcs:                     0
directed arcs:                       7
average markov blanket size:           3.33
average neighbourhood size:            2.33
average branching factor:              1.17

generation algorithm:                  Ide & Cozman's Multiconnected DAGs
burn in length:                        10000
maximum in-degree:                     Inf
maximum out-degree:                    Inf
maximum degree:                        3

[]

Random/Generated Bayesian network

model:
[C][E][B|C:E][A|B:C][D|A:E][F|D:E]
nodes:                                 6
arcs:                                  8
undirected arcs:                     0
directed arcs:                       8
average markov blanket size:           3.33
average neighbourhood size:            2.67
average branching factor:              1.33

generation algorithm:                  Ide & Cozman's Multiconnected DAGs
burn in length:                        10000
maximum in-degree:                     Inf
maximum out-degree:                    Inf
maximum degree:                        3
```

#### Sampling from the space of the directed acyclic graphs with uniform probability

Melançon's MCMC algorithm samples with uniform probability from the space of directed acyclic graphs (not necessarily connected); we can use it by specifying `method = "melancon"`.

```> random.graph(LETTERS[1:6], method = "melancon")
```
```
Random/Generated Bayesian network

model:
[D][E][B|D:E][C|E][A|B:D][F|A:B:E]
nodes:                                 6
arcs:                                  8
undirected arcs:                     0
directed arcs:                       8
average markov blanket size:           3.33
average neighbourhood size:            2.67
average branching factor:              1.33

generation algorithm:                  Melancon's Uniform Probability DAGs
burn in length:                        216
maximum in-degree:                     Inf
maximum out-degree:                    Inf
maximum degree:                        Inf
```

As far as optional arguments are concerned, it is identical to Ide & Cozman's algorithm.

Last updated on `Thu Nov 24 18:16:53 2022` with bnlearn `4.9-20221107` and `R version 4.2.2 (2022-10-31)`.