## Parallel structure learning benchmarking in Scutari, Journal of Statistical Software (2017)

This is a short HOWTO describing the simulation setup used to benchmark the performance and the accuracy of constraint-based structure learning in “Bayesian Network Constraint-Based Structure Learning Algorithms: Parallel and Optimised Implementations in the bnlearn R Package” by Scutari (Journal of Statistical Software, 2017).

All the code and data sets discussed below is attached to the paper as supplementary
material, and only slightly edited here for convenience. Note that even though the paper has
been published in 2017, it was originally submitted to the Journal of Statistical Software
in 2014 and accepted in 2015, and **bnlearn** has seen many updates and bug
fixes since then. (Including a complete re-implementation of all conditional independence
tests and most of the code behind constraint-based algorithms in C.) Hence results
(especially running times) may not be reproducible with current versions of R and
**bnlearn**.

The simulations performed to generate the results the paper is based on can be divided in two groups:

*parallel computing*benchmarks to measure the speed-ups that can be obtained by splitting constraint-based structure learning across multiple processors; and- benchmarks of the effects
*symmetry correction*applied via backtracking to make sure Markov blankets and neighbourhoods of various nodes learned in constraint-based algorithms are symmetric.

The former are implemented in a series of scripts in the `benchmarks`

directory:

cd benchmarks Rscript --vanilla link.gs20k.R Rscript --vanilla link.hiton20k.R Rscript --vanilla link.iamb20k.R Rscript --vanilla link.mmpc20k.R Rscript --vanilla link.opt.R Rscript --vanilla munin.gs20k.R Rscript --vanilla munin.hiton20k.R Rscript --vanilla munin.iamb20k.R Rscript --vanilla munin.mmpc20k.R Rscript --vanilla munin.opt.R Rscript --vanilla carcinoma.hiton.R Rscript --vanilla mice.hiton.R

The latter are implemented in a single script, `symmetry.R`

, that takes the
label of a constraint-based structure learning algorithm (`algorithm`

) and the
ratio between sample size and the number of parameters of the golden standard network
(`p`

).

cd symmetry for algorithm in gs inter.iamb mmpc si.hiton.pc do for p in 0.1 0.2 0.5 1.0 2.0 5.0 do Rscript --vanilla symmetry.R $algorithm $p done done

The golden standard networks used in the scripts are available from the
Bayesian network repository; the `mice`

and
`carcinoma`

are included in the supplementary material and are originally
from literature papers.

### Parallel computing benchmarks

The R scripts in `benchmarks`

all have the same general structure, differing
only in the structure learning algorithm being called and the data. Below is the R script
for `si.hiton.pc()`

and the `carcinoma`

data set.

First, we load the data and remove highly correlated variables with `dedup()`

,
sourced from `dedup.R`

.
This preprocessing step is strongly recommended to improve the numeric stability of
structure learning: the conditional independence tests used by `si.hiton.pc()`

require the (Moore-Penrose generalized) inversion of the covariance matrices of the
variables involved, so we need those covariance matrices to be well behaved (or at least
not perfectly singular). It also helps in keeping running times bounded, because the
preprocessing reduces highly connected clusters of nodes whose structure takes an
exponential time to learn.

> library(bnlearn) > library(parallel) > > source("dedup.R") > > xzfd = xzfile("prepd-carcinoma.txt.xz") > data = read.table(xzfd, header = TRUE, colClasses = "numeric") > data = dedup(data[, -(1:2)], threshold = 0.95)

Then we set the tuning parameters of the simulation: the number of parallel processes
(`nprocs`

) and the number of runs for each number of parallel processes
(`run`

).

> nprocs = c(1, 2, 3, 4, 6, 8, 10, 12, 14, 16, 18, 20) > run = 1:5 > elapsed = numeric(0)

Then, we start recording the running times with `system.time()`

. Running
with a single processor is a special case, since it does not require “real”
parallel computing nor the **parallel** package.

> for (r in run) { + # get the running time. + t = system.time(si.hiton.pc(data, optimized = FALSE, alpha = 0.01)) + # save the elapsed time. + elapsed = c(elapsed, t["elapsed"]) + # debugging output. + cat("NPROCS: ", 1, "TIME:", t["elapsed"], "\n") + }#FOR

Running with more than one processor is essentially the same; the only difference
is that we create a cluster of slave processes with `makeCluster()`

and
pass it as an argument to `si.hiton.pc()`

.

> # these are for parallel execution. > for (np in nprocs[-1]) { + # start the cluster with the correct number of slave processes. + cl = makeCluster(np, type = "SOCK") + for (r in run) { + # get the running time. + t = system.time(si.hiton.pc(data, cluster = cl, alpha = 0.01)) + # save the elapsed time. + elapsed = c(elapsed, t["elapsed"]) + # debugging output. + cat("NPROCS: ", np, "TIME:", t["elapsed"], "\n") + }#FOR + # stop the cluster. + stopCluster(cl) + }#FOR

Note that to obtain smooth results it is important to run all the slave processes on the same machine. Starting with one machine and then moving to two, three, etc. as the number of slave processes grows introduces bias in the timings because inter-process communications become slower when slave processes are spread over multiple machines.

Finally, we save the results in a data frame which we write to a file for later use.

> timings = data.frame( + PROCS = c(rep("OPT", length(run)), + rep(1, length(run)), rep(2, length(run)), rep(3, length(run)), + rep(4, length(run)), rep(6, length(run)), rep(8, length(run)), + rep(10, length(run)), rep(12, length(run)), rep(14, length(run)), + rep(16, length(run)), rep(18, length(run)), rep(20, length(run))), + RUN = rep(run, length(nprocs)), + SECS = round(elapsed, digits = 3) + ) > > write.table(timings, file = "carcinoma.hiton.timings.txt", row.names = FALSE)

### Symmetry correction benchmarks

This second R scripts is different in that it takes the structure learning algorithm
and the number of samples to generate relative to the number of parameters of the
golden standard network as commandline arguments from `Rscript`

.
The golden standard networks are loaded from the current directory, where they are
assumed to be stored as `*.rda`

files.

> library(bnlearn) > > rda = system("ls -1 *.rda", intern = TRUE) > > options("stringsAsFactors" = FALSE) > > algo = get(commandArgs()[7]) > mult = as.numeric(commandArgs()[8])

First, we create a zero-rows data frame to hold the results of the simulations, which will be appended in batches later. This ensures that variable types are consistent. The meaning of the columns is as follows:

*algorithm*: the label of the constraint-based algorithm (`gs`

,`inter.iamb`

, etc.);*bn*: the file the golden standard network has been read from (`alarm.rda`

, etc.);*sample*: the size of the sample generated from the golden standard network and passed to the constraint-based algorithm;*flip*: the Hamming distance between the network learned from the generated data, and the same data with the columns in the opposite order, without backtracking and symmetry correction (`optimized = FALSE`

);*flip2*: the same, but with backtracking and symmetry correction (`optimized = TRUE`

);

> d = data.frame( + algorithm = character(0), + bn = character(0), + sample = numeric(0), + flip = numeric(0), + flip2 = numeric(0) + )

For each of 20 `reps`

, we then learn the skeletons of 4 networks with the
same algorithm but with/without backtracking and with/without reversing the number of the
variables. We use the 4 skeletons to compute `flip`

and `flip2`

and
we save them in the data frame.

> reps = 20 > > for (file in rda) { + load(file) + n = ceiling(min(nparams(bn) * mult, 2 * 10^5)) + print(n) + print(file) + for (i in seq(reps)) { + sim = rbn(bn, n = n) + skel.orig = skeleton(algo(sim, optimized = FALSE, alpha = 0.01)) + skel.back = skeleton(algo(sim, optimized = TRUE, alpha = 0.01)) + rskel.orig = skeleton(algo(sim[, rev(names(sim))], optimized = FALSE, alpha = 0.01)) + rskel.back = skeleton(algo(sim[, rev(names(sim))], optimized = TRUE, alpha = 0.01)) + d = rbind(d, list(algorithm = toupper(algo), + bn = file, sample = n, + flip = hamming(skel.orig, rskel.orig), + flip2 = hamming(skel.back, rskel.back) + )) + write.table(d, file = paste(toupper(algo), mult, "log", sep = "."), + row.names = FALSE) + }#FOR + }#FOR

Note that we save the data frame with the results at every iteration, so that if the R script crashes for some reason we can still recover all the results produced up to that time.

`Tue Apr 25 11:38:44 2017`

with **bnlearn**

`4.2-20170416`

and `R version 3.0.2 (2013-09-25)`

.