## Interfacing with the parallel R package

The **parallel** package provides a multi-platform implementation of the master-slave parallel
programming model, and is the *de facto* standard way of doing parallel computing in R. Since most tasks in the
application of Bayesian networks are computationally intensive, many functions in **bnlearn** have a
`cluster` argument that takes the cluster returned by **parallel**'s `makeCluster()` and splits
their computation among the available slave processes.

### Parallel structure learning

All constraint-based structure learning functions can be parallelised following
Scutari (2017); Markov blankets and neighbourhoods of
different nodes can be learned independently from each other, and later merged and reconciled to produce a coherent
Bayesian network. First, we need to load the **parallel** package and initialise the cluster of slave
processes, called `cl` below.

> library(parallel)
> library(bnlearn)
> cl = makeCluster(2)

Note that `makeCluster()` can create various `type`s of clusters; they use different
inter-process communication mechanisms to pass R code and objects between the master process and the slaves, and some
may be more appropriate than others depending on the hardware. In **parallel**, the random number generators of
the slaves are properly initialised to give independent streams of random numbers.
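As a sketch (which `type`s are available depends on the operating system), the two most common cluster types can be created as follows:

```r
library(parallel)

# PSOCK clusters (the default) communicate over sockets and work on all
# platforms, including Windows; the slaves start with an empty workspace.
cl.psock = makeCluster(2, type = "PSOCK")
stopCluster(cl.psock)

# FORK clusters are only available on POSIX systems (Linux, macOS); the
# slaves are forked from the master process and inherit its workspace,
# which avoids copying R objects explicitly.
cl.fork = makeCluster(2, type = "FORK")
stopCluster(cl.fork)
```

On a single multi-core machine, `FORK` clusters typically have lower start-up and communication overhead; `PSOCK` clusters are the portable choice.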

> rand = clusterEvalQ(cl, runif(10))
> rand

[[1]]
 [1] 0.8918727 0.9091145 0.4665011 0.8975309 0.7330680 0.1217163 0.1478243
 [8] 0.1333822 0.4684630 0.3707698

[[2]]
[1] 0.18055240 0.03889103 0.48855673 0.21555269 0.23556278 0.84877149
[7] 0.07610957 0.17053164 0.69244367 0.36051170

> cor(rand[[1]], rand[[2]])

[1] -0.401023

Alternatively, it is always possible to set a random seed with `clusterSetRNGStream()` to get reproducible
results across the whole cluster.

> clusterSetRNGStream(cl, 42)

After the cluster is set up, performing parallel structure learning just means passing the `cl` object to
any structure learning algorithm that takes a `cluster` argument, such as the `si.hiton.pc()` function.

> data(learning.test)
> pdag = si.hiton.pc(learning.test, cluster = cl)
> clusterEvalQ(cl, test.counter())

[[1]]
[1] 41

[[2]]
[1] 7

> (test.counter())

[1] 0

Using `test.counter()`, we find that the slaves executed all of the conditional independence tests between
them, while the master process executed none.

Finally, we may want to stop the cluster and kill the slave processes when we are done; note that the examples in the
following sections assume that `cl` is still running, so in practice this is done last.

> stopCluster(cl)

### Parallel parameter learning

Parameter learning is embarrassingly parallel by construction, since each local distribution is independent from the
others. Again, we just need to pass `cl` to `bn.fit()` to assign the computations to the slave
processes.

> dag = cextend(pdag)
> fitted = bn.fit(dag, learning.test, cluster = cl)
> fitted$B

  Parameters of node B (multinomial distribution)

Conditional probability table:

   E
B            a          b          c
  a 0.73364245 0.46483590 0.15581098
  b 0.07521896 0.10113865 0.17305236
  c 0.19113859 0.43402545 0.67113665

### Parallel cross-validation

It is also possible to parallelise cross-validation. When learning a Bayesian network under *k*-fold
cross-validation, we are in fact learning *k* networks independently from different subsets of the data
(identified by the fold we are withholding to use for validation). Hence we can meaningfully use up to as many slaves
as there are folds, and learn the networks in parallel by passing `cl` to `bn.cv()`.

> bn.cv(learning.test, "hc", k = 10, cluster = cl)

  k-fold cross-validation for Bayesian networks

  target learning algorithm:             Hill-Climbing
  number of folds:                       10
  loss function:                         Log-Likelihood Loss (disc.)
  expected loss:                         4.774591

This is also true for the other kinds of cross-validation, namely `hold-out` and `custom-folds`.
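As a sketch (argument names as documented for `bn.cv()`; the fold assignment below is an arbitrary illustration), these two methods are parallelised in the same way:

```r
library(parallel)
library(bnlearn)

cl = makeCluster(2)
data(learning.test)

# Hold-out cross-validation: k = 10 data splits, each withholding m = 500
# observations to use for validation.
bn.cv(learning.test, "hc", method = "hold-out", k = 10, m = 500, cluster = cl)

# Custom folds: pass a list of vectors of row indices, one element per fold;
# here, five folds of equal size assigned round-robin.
folds = split(seq_len(nrow(learning.test)),
              rep(1:5, length.out = nrow(learning.test)))
bn.cv(learning.test, "hc", method = "custom-folds", folds = folds,
      cluster = cl)

stopCluster(cl)
```

Since the folds are processed independently, the speed-up is again bounded by the number of folds (or of hold-out splits).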

### Parallel bootstrap

Another classic technique that is embarrassingly parallel is the bootstrap; each bootstrap sample is created
independently from the original sample and then the statistic of interest is estimated independently from each bootstrap
sample. This being the case, both `bn.boot()` and `boot.strength()` have a `cluster`
argument to perform computations in parallel.

For instance, we can use `bn.boot()` to compute the number of arcs (`narcs`) of the networks
learned with a particular structure learning algorithm (`hc()`) and a sample size of `100`.

> A = bn.boot(learning.test, algorithm = "hc", cluster = cl,
+     statistic = narcs, R = 10, m = 100)
> unlist(A)

 [1] 4 4 4 4 5 3 4 5 5 5

Or we can use `boot.strength()` to compute the confidence in the presence and direction of each arc.

> confidence = boot.strength(learning.test, algorithm = "hc", cluster = cl)
> confidence[(confidence$strength > 0.50) & (confidence$direction >= 0.50), ]

   from to strength direction
1     A  B     1.00 0.5050000
3     A  D     1.00 0.8750000
9     B  E     1.00 0.9575000
13    C  D     0.99 0.8838384
30    F  E     1.00 0.9575000

### Parallel approximate inference

Approximate inference techniques for Bayesian networks are typically embarrassingly parallel, since they are built on particle-based simulation methods that use independent particles. This makes it possible to generate particles in parallel in the slaves, to compute an estimate of the desired (conditional) probability in each slave, and then to return the average of those estimates as the overall value.

Below we do this with `cpquery()` for both logic sampling and likelihood weighting.

> cpquery(fitted, event = (A == "a") & (B == "b"),
+     evidence = (F == "a"), method = "ls", cluster = cl)

[1] 0.007555464

> cpquery(fitted, event = (A == "a") & (B == "b"),
+     evidence = list(F = "a"), method = "lw", cluster = cl)

[1] 0.007534347

In the same spirit, `cpdist()` can also generate random samples in parallel.

> cpdist(fitted, nodes = "A", evidence = (F == "a"), n = 10,
+     method = "ls", cluster = cl)$A

[1] b b b a
Levels: a b c

> cpdist(fitted, nodes = "A", evidence = list(F = "a"), n = 10,
+     method = "lw", cluster = cl)$A

 [1] a c a a b b b c b b
Levels: a b c

### Parallel prediction and imputation

Different observations are usually taken to be independent, and can therefore be processed in parallel by both
`predict()` and `impute()`: the predicted or imputed values in one observation are not influenced
by the predictors or the observed values in another. For prediction:

> training.set = learning.test[1:4000, ]
> validation.set = learning.test[4001:4050, ]
> fitted = bn.fit(dag, training.set)
> predict(fitted, node = "F", data = validation.set, cluster = cl)

 [1] a a a b a b a a b a b b a a b b a a a a b a b a b b b a a a a b a a a b b b
[39] a a a a a a a b a a b b
Levels: a b

Similarly, for imputation:

> incomplete = learning.test[1:500, ]
> incomplete[1:100, "A"] = NA
> incomplete[101:200, "B"] = NA
> incomplete[201:250, "C"] = NA
> incomplete[251:300, "D"] = NA
> incomplete[301:350, "E"] = NA
> incomplete[351:400, "F"] = NA
> head(impute(fitted, data = incomplete, cluster = cl))

  A B C D E F
1 c c b a b b
2 a a c a b b
3 a a a a a a
4 a a a a b b
5 a a b c a a
6 c c a c c a

`Wed Dec 28 03:15:20 2022` with **bnlearn** `4.9-20221220` and `R version 4.2.2 (2022-10-31)`.