Parallel structure learning benchmarking in Scutari, Journal of Statistical Software (2017)

This is a short HOWTO describing the simulation setup used to benchmark the performance and the accuracy of constraint-based structure learning in “Bayesian Network Constraint-Based Structure Learning Algorithms: Parallel and Optimised Implementations in the bnlearn R Package” by Scutari (Journal of Statistical Software, 2017).

All the code and data sets discussed below is attached to the paper as supplementary material, and only slightly edited here for convenience. Note that even though the paper has been published in 2017, it was originally submitted to the Journal of Statistical Software in 2014 and accepted in 2015, and bnlearn has seen many updates and bug fixes since then. (Including a complete re-implementation of all conditional independence tests and most of the code behind constraint-based algorithms in C.) Hence results (especially running times) may not be reproducible with current versions of R and bnlearn.

The simulations performed to generate the results the paper is based on can be divided in two groups:

  • parallel computing benchmarks to measure the speed-ups that can be obtained by splitting constraint-based structure learning across multiple processors; and
  • benchmarks of the effects symmetry correction applied via backtracking to make sure Markov blankets and neighbourhoods of various nodes learned in constraint-based algorithms are symmetric.

The former are implemented in a series of scripts in the benchmarks directory:

cd benchmarks
Rscript --vanilla link.gs20k.R
Rscript --vanilla link.hiton20k.R
Rscript --vanilla link.iamb20k.R
Rscript --vanilla link.mmpc20k.R
Rscript --vanilla link.opt.R
Rscript --vanilla munin.gs20k.R
Rscript --vanilla munin.hiton20k.R
Rscript --vanilla munin.iamb20k.R
Rscript --vanilla munin.mmpc20k.R
Rscript --vanilla munin.opt.R
Rscript --vanilla carcinoma.hiton.R
Rscript --vanilla mice.hiton.R

The latter are implemented in a single script, symmetry.R, that takes the label of a constraint-based structure learning algorithm (algorithm) and the ratio between sample size and the number of parameters of the golden standard network (p).

cd symmetry
for algorithm in gs inter.iamb mmpc si.hiton.pc
  for p in 0.1 0.2 0.5 1.0 2.0 5.0
    Rscript --vanilla symmetry.R $algorithm $p

The golden standard networks used in the scripts are available from the Bayesian network repository; the mice and carcinoma are included in the supplementary material and are originally from literature papers.

Parallel computing benchmarks

The R scripts in benchmarks all have the same general structure, differing only in the structure learning algorithm being called and the data. Below is the R script for si.hiton.pc() and the carcinoma data set.

First, we load the data and remove highly correlated variables with dedup(), sourced from dedup.R. This preprocessing step is strongly recommended to improve the numeric stability of structure learning: the conditional independence tests used by si.hiton.pc() require the (Moore-Penrose generalized) inversion of the covariance matrices of the variables involved, so we need those covariance matrices to be well behaved (or at least not perfectly singular). It also helps in keeping running times bounded, because the preprocessing reduces highly connected clusters of nodes whose structure takes an exponential time to learn.

> library(bnlearn)
> library(parallel)
> source("dedup.R")
> xzfd = xzfile("prepd-carcinoma.txt.xz")
> data = read.table(xzfd, header = TRUE, colClasses = "numeric")
> data = dedup(data[, -(1:2)], threshold = 0.95)

Then we set the tuning parameters of the simulation: the number of parallel processes (nprocs) and the number of runs for each number of parallel processes (run).

> nprocs = c(1, 2, 3, 4, 6, 8, 10, 12, 14, 16, 18, 20)
> run = 1:5
> elapsed = numeric(0)

Then, we start recording the running times with system.time(). Running with a single processor is a special case, since it does not require “real” parallel computing nor the parallel package.

> for (r in run) {

+   # get the running time.
+   t = system.time(si.hiton.pc(data, optimized = FALSE, alpha = 0.01))
+   # save the elapsed time.
+   elapsed = c(elapsed, t["elapsed"])
+   # debugging output.
+   cat("NPROCS: ", 1, "TIME:", t["elapsed"], "\n")

+ }#FOR

Running with more than one processor is essentially the same; the only difference is that we create a cluster of slave processes with makeCluster() and pass it as an argument to si.hiton.pc().

> # these are for parallel execution.
> for (np in nprocs[-1]) {

+   # start the cluster with the correct number of slave processes.
+   cl = makeCluster(np, type = "SOCK")

+   for (r in run) {

+     # get the running time.
+     t = system.time(si.hiton.pc(data, cluster = cl, alpha = 0.01))
+     # save the elapsed time.
+     elapsed = c(elapsed, t["elapsed"])
+     # debugging output.
+     cat("NPROCS: ", np, "TIME:", t["elapsed"], "\n")

+   }#FOR

+   # stop the cluster.
+   stopCluster(cl)

+ }#FOR

Note that to obtain smooth results it is important to run all the slave processes on the same machine. Starting with one machine and then moving to two, three, etc. as the number of slave processes grows introduces bias in the timings because inter-process communications become slower when slave processes are spread over multiple machines.

Finally, we save the results in a data frame which we write to a file for later use.

> timings = data.frame(
+   PROCS = c(rep("OPT", length(run)),
+             rep(1, length(run)), rep(2, length(run)), rep(3, length(run)),
+             rep(4, length(run)), rep(6, length(run)), rep(8, length(run)),
+             rep(10, length(run)), rep(12, length(run)), rep(14, length(run)),
+             rep(16, length(run)), rep(18, length(run)), rep(20, length(run))),
+   RUN = rep(run, length(nprocs)),
+   SECS = round(elapsed, digits = 3)
+ )
> write.table(timings, file = "carcinoma.hiton.timings.txt", row.names = FALSE)

Symmetry correction benchmarks

This second R scripts is different in that it takes the structure learning algorithm and the number of samples to generate relative to the number of parameters of the golden standard network as commandline arguments from Rscript. The golden standard networks are loaded from the current directory, where they are assumed to be stored as *.rda files.

> library(bnlearn)
> rda = system("ls -1 *.rda", intern = TRUE)
> options("stringsAsFactors" = FALSE)
> algo = get(commandArgs()[7])
> mult = as.numeric(commandArgs()[8])

First, we create a zero-rows data frame to hold the results of the simulations, which will be appended in batches later. This ensures that variable types are consistent. The meaning of the columns is as follows:

  • algorithm: the label of the constraint-based algorithm (gs, inter.iamb, etc.);
  • bn: the file the golden standard network has been read from (alarm.rda, etc.);
  • sample: the size of the sample generated from the golden standard network and passed to the constraint-based algorithm;
  • flip: the Hamming distance between the network learned from the generated data, and the same data with the columns in the opposite order, without backtracking and symmetry correction (optimized = FALSE);
  • flip2: the same, but with backtracking and symmetry correction (optimized = TRUE);
> d = data.frame(
+   algorithm = character(0),
+   bn = character(0),
+   sample = numeric(0),
+   flip = numeric(0),
+   flip2 = numeric(0)
+ )

For each of 20 reps, we then learn the skeletons of 4 networks with the same algorithm but with/without backtracking and with/without reversing the number of the variables. We use the 4 skeletons to compute flip and flip2 and we save them in the data frame.

> reps = 20
> for (file in rda) {

+   load(file)
+   n = ceiling(min(nparams(bn) * mult, 2 * 10^5))
+   print(n)
+   print(file)

+   for (i in seq(reps)) {

+     sim = rbn(bn, n = n)

+     skel.orig = skeleton(algo(sim, optimized = FALSE, alpha = 0.01))
+     skel.back = skeleton(algo(sim, optimized = TRUE, alpha = 0.01))
+     rskel.orig = skeleton(algo(sim[, rev(names(sim))], optimized = FALSE, alpha = 0.01))
+     rskel.back = skeleton(algo(sim[, rev(names(sim))], optimized = TRUE, alpha = 0.01))

+     d = rbind(d, list(algorithm = toupper(algo),
+          bn = file, sample = n,
+          flip = hamming(skel.orig, rskel.orig),
+          flip2 = hamming(skel.back, rskel.back)
+        ))

+     write.table(d, file = paste(toupper(algo), mult, "log", sep = "."),
+       row.names = FALSE)

+   }#FOR

+ }#FOR

Note that we save the data frame with the results at every iteration, so that if the R script crashes for some reason we can still recover all the results produced up to that time.

Last updated on Tue Apr 25 11:38:44 2017 with bnlearn 4.2-20170416 and R version 3.0.2 (2013-09-25).