bnlearn - Documentation

Overview

bnlearn is an R package that provides a comprehensive software implementation of Bayesian networks:

Learning their structure from data, expert knowledge or both.
Learning their parameters from data.
Using them for inference in queries and prediction.
Validating their statistical properties.
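
As a rough illustration of these four tasks, here is a minimal sketch using the learning.test data set that ships with bnlearn. The blacklist entry, the choice of hc() as the learning algorithm, the query and the number of folds are assumptions made for the example, not recommendations.

```r
library(bnlearn)

# learning.test ships with bnlearn: six discrete variables, A to F.
data(learning.test)
set.seed(42)

# 1. Structure learning from data plus expert knowledge: hill climbing with
#    a blacklist (hypothetical here) that forbids the arc B -> A.
bl <- data.frame(from = "B", to = "A")
dag <- hc(learning.test, blacklist = bl)

# 2. Parameter learning: estimate the conditional probability tables
#    of the learned structure from the same data.
fitted <- bn.fit(dag, learning.test)

# 3. Inference: approximate conditional probability query,
#    P(B == "b" | A == "a"), estimated by logic sampling.
p <- cpquery(fitted, event = (B == "b"), evidence = (A == "a"))

# 4. Validation: 10-fold cross-validation of the learning strategy.
cv <- bn.cv(learning.test, bn = "hc", k = 10)

print(dag)
print(p)
print(cv)
```

Since cpquery() and bn.cv() rely on random sampling and random splits, the printed values change with the seed.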

The scope of bnlearn includes:

Simulation studies comparing different machine learning approaches.
Learning Bayesian networks from data including large, structured and incomplete data sets.
Causal discovery and classification.
Probabilistic and causal inference.

Architecture

The functionality provided by bnlearn is organised into four sets of functions.

Typical usage patterns

bnlearn is structured to be flexible and modular, making it possible to mix and match different functions to perform complex, custom tasks. Nevertheless, there are two workflows that are most thoroughly supported because I use them in my work.

The first is what I call the “simulation workflow”.

1. Choose some networks from the Bayesian network repository.
2. Possibly alter their structure and/or parameters to match the scientific question you want to answer.
3. Generate data sets with rbn(), with replicates across n/p ∈ {0.1, 0.2, 0.5, 1, 2, 5}.
4. Run the learning approach of your choice.
5. Benchmark its outputs.
   1. Structural measures: compare() and shd().
   2. Information theoretic measures: H() and KL().
   3. Empirical measures, with a validation data set: predict() or logLik().
   4. Visual analysis: graphviz.compare() and graphviz.chart().
6. Profit!
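
The steps above can be sketched as follows, under several assumptions. The text does not define n/p, so n is taken to be the sample size and p the number of parameters of the reference network. A network fitted on learning.test stands in for one downloaded from the repository. The learning approach (hc()), the three replicates per ratio, the validation sample size and the predicted node are arbitrary choices. H(), KL() and the graphviz functions are left out to keep the sketch short and free of extra dependencies.

```r
library(bnlearn)

data(learning.test)
set.seed(1)

# Stand-in for a repository network: the structure commonly used as the
# reference for learning.test, with parameters estimated from that data.
true.dag <- model2network("[A][C][F][B|A][D|A:C][E|B:F]")
true.fit <- bn.fit(true.dag, learning.test)

p <- nparams(true.fit)               # assumed meaning of p in n/p
ratios <- c(0.1, 0.2, 0.5, 1, 2, 5)  # the n/p grid from the text
reps <- 3                            # replicates per ratio (arbitrary)
validation <- rbn(true.fit, n = 1000)

one.run <- function(r, i) {
  # Generate a training set whose size matches the requested n/p ratio.
  train <- rbn(true.fit, n = ceiling(r * p))
  learned <- hc(train)
  # Bayesian estimates keep all parameters defined in very small samples.
  fitted <- bn.fit(learned, train, method = "bayes", iss = 1)
  # Structural measures: true/false positive arcs and Hamming distance.
  cmp <- compare(target = true.dag, current = learned)
  # Empirical measure: classification error for node E on validation data.
  pred <- predict(fitted, node = "E", data = validation)
  data.frame(ratio = r, rep = i,
             tp = cmp$tp, fp = cmp$fp, fn = cmp$fn,
             hamming = shd(learned, true.dag),
             error = mean(as.character(pred) != as.character(validation$E)))
}

results <- do.call(rbind, lapply(ratios, function(r)
  do.call(rbind, lapply(seq_len(reps), function(i) one.run(r, i)))))

# Average each measure over the replicates of every ratio.
print(aggregate(cbind(hamming, error) ~ ratio, data = results, FUN = mean))
```

At the smallest ratios the training sets contain only a handful of observations, so bnlearn may warn that some factor levels are not observed in the data. That is expected in this setting and is part of what the simulation measures.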

The second is what I call the “exploration workflow”.

1. Preprocess the data.
   1. Remove highly correlated variables: dedup().
   2. Possibly discretise variables: discretize().
   3. Possibly impute missing values: impute().
2. Structure learning with model averaging.
   1. Establish a blacklist of arc directions that make no sense.
   2. Learn multiple structures with boot.strength() or custom.strength().
   3. Average them with averaged.network().
3. Parameter learning with bn.fit().
4. Model validation and hypothesis generation.
   1. Can you find literature supporting the existence of arcs and paths?
   2. Do answers (through cpquery(), cpdist(), mutilated(), predict(), etc.) to key questions agree with domain experts?
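
A minimal sketch of this workflow, assuming the gaussian.test data set that ships with bnlearn as a stand-in for real data. The blacklisted arc, the relabelled levels, the number of bootstrap samples and the queries are all hypothetical choices made for illustration. dedup() and impute() are omitted because the example data are complete and nothing is known here about which variables should be dropped.

```r
library(bnlearn)

data(gaussian.test)
set.seed(7)

# Preprocessing: discretise each continuous variable into three quantile
# bins, then relabel the bins so that queries are easier to write.
dd <- discretize(gaussian.test, method = "quantile", breaks = 3)
for (v in names(dd))
  levels(dd[[v]]) <- c("low", "mid", "high")

# Blacklist: hypothetical domain knowledge forbidding the arc G -> A.
bl <- data.frame(from = "G", to = "A")

# Model averaging: learn one structure per bootstrap sample, measure how
# often each arc appears, and keep the arcs above the estimated threshold.
strength <- boot.strength(dd, R = 50, algorithm = "hc",
                          algorithm.args = list(blacklist = bl))
avg <- averaged.network(strength)

# The averaged network can contain undirected arcs; if it does, extend it
# to a fully directed graph before fitting parameters.
if (!directed(avg))
  avg <- cextend(avg)

# Parameter learning, with Bayesian estimates to avoid undefined entries
# for parent configurations that are rare in the data.
fitted <- bn.fit(avg, dd, method = "bayes")

# Validation queries to discuss with domain experts: an observational
# query, and the same event after intervening on A with mutilated().
p.obs <- cpquery(fitted, event = (C == "high"), evidence = (A == "high"))
p.int <- cpquery(mutilated(fitted, evidence = list(A = "high")),
                 event = (C == "high"), evidence = TRUE)

print(strength[strength$strength > 0.5, ])
print(c(observational = p.obs, interventional = p.int))
```

Whether the two probabilities agree depends on the structure that was learned. Comparing them is one way to turn the averaged network into hypotheses that experts can confirm or reject.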

To find out more

More real-usage examples can be found in the supplementary materials of my papers and in the examples for the most relevant functions.