Creating Bayesian network structures

The graph structure of a Bayesian network is stored in an object of class bn (documented here). We can create such an object in various ways through three possible representations: the arc set of the graph, its adjacency matrix or a model formula. In addition, we can also generate empty and random network structures with the empty.graph() and random.graph() functions (both documented here).

Creating an empty network

> library(bnlearn)
> e = empty.graph(LETTERS[1:6])
> class(e)
[1] "bn"
> e

  Random/Generated Bayesian network

  model:
   [A][B][C][D][E][F] 
  nodes:                                 6 
  arcs:                                  0 
    undirected arcs:                     0 
    directed arcs:                       0 
  average markov blanket size:           0.00 
  average neighbourhood size:            0.00 
  average branching factor:              0.00 

  generation algorithm:                  Empty

Note that by default empty.graph() returns a single object of class bn. We can generate multiple graphs in a single batch by specifying the num argument of empty.graph(); in that case the graphs will be stored in a list of bn objects.

> empty.graph(LETTERS[1:6], num = 2)
[[1]]

  Random/Generated Bayesian network

  model:
   [A][B][C][D][E][F] 
  nodes:                                 6 
  arcs:                                  0 
    undirected arcs:                     0 
    directed arcs:                       0 
  average markov blanket size:           0.00 
  average neighbourhood size:            0.00 
  average branching factor:              0.00 

  generation algorithm:                  Empty 


[[2]]

  Random/Generated Bayesian network

  model:
   [A][B][C][D][E][F] 
  nodes:                                 6 
  arcs:                                  0 
    undirected arcs:                     0 
    directed arcs:                       0 
  average markov blanket size:           0.00 
  average neighbourhood size:            0.00 
  average branching factor:              0.00 

  generation algorithm:                  Empty

Creating a complete network

> complete.graph(LETTERS[1:6])

  Random/Generated Bayesian network

  model:
   [A][B|A][C|A:B][D|A:B:C][E|A:B:C:D][F|A:B:C:D:E] 
  nodes:                                 6 
  arcs:                                  15 
    undirected arcs:                     0 
    directed arcs:                       15 
  average markov blanket size:           5.00 
  average neighbourhood size:            5.00 
  average branching factor:              2.50 

  generation algorithm:                  Complete DAGs

A complete directed acyclic graph has an arc between each pair of nodes. The direction of the arcs is determined by the order of node labels in the call to complete.graph.

> par(mfrow = c(1, 2))
> graphviz.compare(complete.graph(LETTERS[1:6]), complete.graph(LETTERS[6:1]),
+   layout = "circo", main = c("From A to E", "From E to A"))
Loading required namespace: Rgraphviz
plot of chunk unnamed-chunk-5

Creating a network structure

With a specific arc set

First, we create an arc set in the form of a matrix with two columns, optionally labelled from and to; each row describes one arc using the labels of the nodes at the tail (from) and at the head (to) of the arc.

> arc.set = matrix(c("A", "C", "B", "F", "C", "F"),
+             ncol = 2, byrow = TRUE,
+             dimnames = list(NULL, c("from", "to")))
> arc.set
     from to 
[1,] "A"  "C"
[2,] "B"  "F"
[3,] "C"  "F"

Then we can assign the newly created arc.set to a bn object using the assignment version of the arcs() function (documented here).

> arcs(e) = arc.set
> e

  Random/Generated Bayesian network

  model:
   [A][B][D][E][C|A][F|B:C] 
  nodes:                                 6 
  arcs:                                  3 
    undirected arcs:                     0 
    directed arcs:                       3 
  average markov blanket size:           1.33 
  average neighbourhood size:            1.00 
  average branching factor:              0.50 

  generation algorithm:                  Empty

Note that by default arcs() performs several checks on the arc set, so:

  1. We cannot introduce arcs with node labels that are not present in the graph.
    > bogus = matrix(c("X", "Y", "W", "Z"),
    +           ncol = 2, byrow = TRUE,
    +           dimnames = list(NULL, c("from", "to")))
    > bogus
    
         from to 
    [1,] "X"  "Y"
    [2,] "W"  "Z"
    
    > arcs(e) = bogus
    
    ## Error: node(s) 'X' 'W' 'Y' 'Z' not present in the graph.
    
  2. We cannot introduce cycles by mistake (e.g. ABCA) unless we set the ignore.cycles to TRUE.
    > cycle = matrix(c("A", "C", "C", "B", "B", "A"),
    +           ncol = 2, byrow = TRUE,
    +           dimnames = list(NULL, c("from", "to")))
    > cycle
    
         from to 
    [1,] "A"  "C"
    [2,] "C"  "B"
    [3,] "B"  "A"
    
    > arcs(e) = cycle
    
    ## Error: the specified network contains cycles.
    
    > arcs(e, ignore.cycles = TRUE) = cycle
    
    ## Error in `arcs<-`(`*tmp*`, ignore.cycles = TRUE, value = structure(c("A", : unused argument (ignore.cycles = TRUE)
    
    > acyclic(e)
    
    [1] TRUE
    
  3. We cannot introduce loops by mistake (e.g. AA).
    > loops = matrix(c("A", "A", "B", "B", "C", "D"),
    +           ncol = 2, byrow = TRUE,
    +           dimnames = list(NULL, c("from", "to")))
    > loops
    
         from to 
    [1,] "A"  "A"
    [2,] "B"  "B"
    [3,] "C"  "D"
    
    > arcs(e) = loops
    
    ## Error: invalid arcs that are actually loops:
    ##    A -> A 
    ##    B -> B
    
  4. We can, however, introduce undirected arcs (i.e. edges) by including both directions of an arc in the arc set (e.g. AB and BA) as long as it is clear no cycle would be introduced by either direction. Again, specifying ignore.cycles overrides this check.
    > edges = matrix(c("A", "B", "B", "A", "C", "D"),
    +           ncol = 2, byrow = TRUE,
    +           dimnames = list(NULL, c("from", "to")))
    > edges
    
         from to 
    [1,] "A"  "B"
    [2,] "B"  "A"
    [3,] "C"  "D"
    
    > arcs(e) = edges
    

With a specific adjacency matrix

We can create a graph structure using an adjaceny matrix with amat() (documented here) in the same way as we did above using an arc set with arcs(). First, we create an adjacency matrix with 0/1 integer elements (note the use of 0L and 1L) with the 1s corresponding to the arcs. Using a matrix with numeric, non-integer entries (that is, 0 and 1 instead of 0L and 1L) works as well but uses more memory, which may be critical in large graphs.

> adj = matrix(0L, ncol = 6, nrow = 6,
+         dimnames = list(LETTERS[1:6], LETTERS[1:6]))
> adj["A", "C"] = 1L
> adj["B", "F"] = 1L
> adj["C", "F"] = 1L
> adj["D", "E"] = 1L
> adj["A", "E"] = 1L
> adj
  A B C D E F
A 0 0 1 0 1 0
B 0 0 0 0 0 1
C 0 0 0 0 0 1
D 0 0 0 0 1 0
E 0 0 0 0 0 0
F 0 0 0 0 0 0

Then we introduce the arcs in the graph with the assignment version of amat().

> amat(e) = adj
> e

  Random/Generated Bayesian network

  model:
   [A][B][D][C|A][E|A:D][F|B:C] 
  nodes:                                 6 
  arcs:                                  5 
    undirected arcs:                     0 
    directed arcs:                       5 
  average markov blanket size:           2.33 
  average neighbourhood size:            1.67 
  average branching factor:              0.83 

  generation algorithm:                  Empty

Note that amat() performs the same checks as arcs() to prevent the creation of misspecified graph structures.

With a specific model formula

The format of the model formula is originally from the deal package (CRAN), and is defined as follows:

  • each node is enclosed in square brackets; and
  • its parents are listed after a pipe and separated by colons, also within the same set of square brackets.

Such a formula can be used in two ways to define a graph structure:

  1. With the model2network() function (documented here), without creating an empty graph first.
    > model2network("[A][C][B|A][D|C][F|A:B:C][E|F]")
    
    
      Random/Generated Bayesian network
    
      model:
       [A][C][B|A][D|C][F|A:B:C][E|F] 
      nodes:                                 6 
      arcs:                                  6 
        undirected arcs:                     0 
        directed arcs:                       6 
      average markov blanket size:           2.67 
      average neighbourhood size:            2.00 
      average branching factor:              1.00 
    
      generation algorithm:                  Empty
    
  2. With modelstring() (documented here) and an existing bn object, as for arcs() and amat() above.
    > modelstring(e) = "[A][C][B|A][D|C][F|A:B:C][E|F]"
    > e
    
    
      Random/Generated Bayesian network
    
      model:
       [A][C][B|A][D|C][F|A:B:C][E|F] 
      nodes:                                 6 
      arcs:                                  6 
        undirected arcs:                     0 
        directed arcs:                       6 
      average markov blanket size:           2.67 
      average neighbourhood size:            2.00 
      average branching factor:              1.00 
    
      generation algorithm:                  Empty
    

Again, note that both model2network() and modelstring() perform checks on the model formula to prevent the creation of misspecified graphs.

Creating one or more random network structures

With a specified node ordering

By default, random.graph() generates graphs consistent with the ordering of the nodes provided by the user; in each arc, the node at the tail precedes the node at the head in the node set. Arcs are samples independently with a probability of inclusion specified by the prob argument, whose default is set so that there are on average as many arcs as nodes in the graph.

> random.graph(LETTERS[1:6], prob = 0.1)

  Random/Generated Bayesian network

  model:
   [A][B][C][D][E][F] 
  nodes:                                 6 
  arcs:                                  0 
    undirected arcs:                     0 
    directed arcs:                       0 
  average markov blanket size:           0.00 
  average neighbourhood size:            0.00 
  average branching factor:              0.00 

  generation algorithm:                  Full Ordering 
  arc sampling probability:              0.1

As was the case for empty.graph(), we can generate multiple graphs in one batch using the num argument. In that case, graphs are returned in a list which is very convenient to pass to lapply() for further computations.

Sampling from the space of connected directed acyclic graphs with uniform probability

In addition to the default method = "ordered", we can also set method = "ic-dag" to sample with uniform probability from the space of connected directed acyclic graphs using Ide & Cozman MCMC sampler. Note that the suggested length of the burn-in scales quadratically with the number of nodes, so generating multiple graphs in a single batch with num is much more efficient than generating them one at a time.

> random.graph(LETTERS[1:6], num = 2, method = "ic-dag")
[[1]]

  Random/Generated Bayesian network

  model:
   [D][B|D][F|B][C|B:D:F][A|C:D:F][E|C:D:F] 
  nodes:                                 6 
  arcs:                                  11 
    undirected arcs:                     0 
    directed arcs:                       11 
  average markov blanket size:           4.00 
  average neighbourhood size:            3.67 
  average branching factor:              1.83 

  generation algorithm:                  Ide & Cozman's Multiconnected DAGs 
  burn in length:                        216 
  maximum in-degree:                     Inf 
  maximum out-degree:                    Inf 
  maximum degree:                        Inf 


[[2]]

  Random/Generated Bayesian network

  model:
   [D][B|D][F|B][C|B:D:F][A|C:D:F][E|C:D:F] 
  nodes:                                 6 
  arcs:                                  11 
    undirected arcs:                     0 
    directed arcs:                       11 
  average markov blanket size:           4.00 
  average neighbourhood size:            3.67 
  average branching factor:              1.83 

  generation algorithm:                  Ide & Cozman's Multiconnected DAGs 
  burn in length:                        216 
  maximum in-degree:                     Inf 
  maximum out-degree:                    Inf 
  maximum degree:                        Inf

This methods accepts several optional constraints on the structure of the generated graphs, as specified by the following arguments:

  • burn.in: the number of iterations for the algorithm to converge to a stationary (and uniform) probability distribution.
  • every: return only one graph every number of steps instead of all the generated graphs. High values of every result in a more diverse set.
  • max.degree: the maximum degree of a node.
  • max.in.degree: the maximum in-degree.
  • max.out.degree: the maximum out-degree.

So, for example:

> random.graph(LETTERS[1:6], num = 2, method = "ic-dag", burn.in = 10^4,
+   every = 50, max.degree = 3)
[[1]]

  Random/Generated Bayesian network

  model:
   [F][B|F][A|B][C|B][D|A:C][E|A:D:F] 
  nodes:                                 6 
  arcs:                                  8 
    undirected arcs:                     0 
    directed arcs:                       8 
  average markov blanket size:           3.67 
  average neighbourhood size:            2.67 
  average branching factor:              1.33 

  generation algorithm:                  Ide & Cozman's Multiconnected DAGs 
  burn in length:                        10000 
  maximum in-degree:                     Inf 
  maximum out-degree:                    Inf 
  maximum degree:                        3 


[[2]]

  Random/Generated Bayesian network

  model:
   [B][A|B][F|B][D|A][E|D:F][C|A:B:E] 
  nodes:                                 6 
  arcs:                                  8 
    undirected arcs:                     0 
    directed arcs:                       8 
  average markov blanket size:           3.67 
  average neighbourhood size:            2.67 
  average branching factor:              1.33 

  generation algorithm:                  Ide & Cozman's Multiconnected DAGs 
  burn in length:                        10000 
  maximum in-degree:                     Inf 
  maximum out-degree:                    Inf 
  maximum degree:                        3

Sampling from the space of the directed acyclic graphs with uniform probability

Melançon's MCMC algorithm samples with uniform probability from the space of directed acyclic graphs (not necessarily connected); we can use it by specifying method = "melancon".

> random.graph(LETTERS[1:6], method = "melancon")

  Random/Generated Bayesian network

  model:
   [D][C|D][F|C][A|D:F][B|C:D:F][E|C:F] 
  nodes:                                 6 
  arcs:                                  9 
    undirected arcs:                     0 
    directed arcs:                       9 
  average markov blanket size:           3.33 
  average neighbourhood size:            3.00 
  average branching factor:              1.50 

  generation algorithm:                  Melancon's Uniform Probability DAGs 
  burn in length:                        216 
  maximum in-degree:                     Inf 
  maximum out-degree:                    Inf 
  maximum degree:                        Inf

As far as optional arguments are concerned, it is identical to Ide & Cozman's algorithm.

Last updated on Mon Aug 5 02:41:34 2024 with bnlearn 5.0 and R version 4.4.1 (2024-06-14).