Overview

bnlearn is an R package that provides a comprehensive implementation of Bayesian networks:

  1. Learning their structure from data, expert knowledge or both.
  2. Learning their parameters from data.
  3. Using them for inference in queries and prediction.
  4. Validating their statistical properties.

The scope of bnlearn includes:

  • Simulation studies comparing different machine learning approaches.
  • Learning Bayesian networks from data including large, structured and incomplete data sets.
  • Causal discovery and classification.
  • Probabilistic and causal inference.

Architecture

The functionality provided by bnlearn is organised into four sets of functions.

In more detail:

  1. Cleaning and organising data and prior knowledge.
    • Preprocessing data to make them suitable for structure and parameter learning. For instance, discretising continuous variables, removing redundant (strongly correlated or identical) variables, or imputing missing values.
    • Encoding prior and expert knowledge by whitelisting or blacklisting arcs or by defining a topological ordering for the network.
  2. Producing Bayesian network structures, encoded as objects of class "bn".
    • Compiling Bayesian networks from prior and expert knowledge, manually creating their structure.
    • Learning the structure of Bayesian networks from data using constraint-based, score-based or hybrid algorithms; using causal discovery algorithms; or using Bayesian network classifiers.
  3. Producing fitted Bayesian networks, encoded as objects of class "bn.fit".
    • Loading gold-standard reference networks from the Bayesian network repository to run simulation studies.
    • Compiling fitted Bayesian networks from user-provided parameter sets, as was common in expert systems.
    • Learning the parameters of Bayesian networks from data using maximum likelihood and Bayesian posterior estimators.
  4. Extracting information from fitted Bayesian networks.
    • Generating random samples for Monte Carlo inference, including approximate probabilistic inference.
    • Predicting target variables for new observations.
    • Measuring statistical confidence and validating Bayesian networks with cross-validation and bootstrap.
    • Producing averaged, consensus Bayesian networks from an ensemble of models.
    • Plotting network structures, parameter distributions and inference outputs.
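The four groups above can be strung together into a minimal end-to-end sketch. This example uses the learning.test data set shipped with bnlearn; the blacklisted arc is purely illustrative.

```r
library(bnlearn)

# 1. Data and prior knowledge: load a data set and blacklist an
#    (illustrative) arc that should not appear in the network.
data(learning.test)
bl = data.frame(from = "F", to = "A")

# 2. Structure learning (score-based hill climbing), class "bn".
dag = hc(learning.test, blacklist = bl)

# 3. Parameter learning, class "bn.fit".
fitted = bn.fit(dag, data = learning.test)

# 4. Inference: predict a target variable for new observations.
pred = predict(fitted, node = "C", data = learning.test[1:10, ])
```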

Typical usage patterns

bnlearn is structured to be flexible and modular, making it possible to mix and match different functions to perform complex, custom tasks. Nevertheless, there are two workflows that are most thoroughly supported because I use them in my work.

The first is what I call the “simulation workflow”.

  1. Choose some networks from the Bayesian network repository.
  2. Possibly alter their structure and/or parameters to match the scientific question you want to answer.
  3. Generate data sets with rbn(), with replicates across n/p ∈ {0.1, 0.2, 0.5, 1, 2, 5}.
  4. Run the learning approach of your choice.
  5. Benchmark its outputs.
    1. Structural measures: compare() and shd().
    2. Information theoretic measures: H() and KL().
    3. Empirical measures, with a validation data set: predict() or logLik().
    4. Visual analysis: graphviz.compare() and graphviz.chart().
  6. Profit!
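In code, the simulation workflow looks roughly as follows. In a real study the gold-standard network would come from the Bayesian network repository; to keep this sketch self-contained, it builds a small stand-in "true" network from the learning.test data instead, and runs a single sample size rather than the full grid of n/p ratios.

```r
library(bnlearn)
data(learning.test)

# Steps 1-2: stand-in for a repository network; this is the true
# structure of the learning.test data, fitted to the data itself.
true.dag = model2network("[A][C][F][B|A][D|A:C][E|B:F]")
true.bn = bn.fit(true.dag, data = learning.test)

# Step 3: generate a data set with rbn(); a full study would replicate
# this across n/p ratios in {0.1, 0.2, 0.5, 1, 2, 5}.
p = length(nodes(true.bn))
dat = rbn(true.bn, n = 2 * p)

# Step 4: run the learning approach of your choice (here, hill climbing).
learned = hc(dat)

# Step 5: benchmark against the true structure.
shd(learned, true.dag)
compare(true.dag, learned)
```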

The second is what I call the “exploration workflow”.

  1. Preprocess the data.
    1. Remove highly correlated variables: dedup().
    2. Possibly discretise variables: discretize().
    3. Possibly impute missing values: impute().
  2. Structure learning with model averaging.
    1. Establish a blacklist of arc directions that make no sense.
    2. Learn multiple structures with boot.strength() or custom.strength().
    3. Average them with averaged.network().
  3. Parameter learning with bn.fit().
  4. Model validation and hypothesis generation.
    1. Can you find literature supporting the existence of arcs and paths?
    2. Do answers (through cpquery(), cpdist(), mutilated(), predict(), etc.) to key questions agree with domain experts?
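The exploration workflow can be sketched as below, using the gaussian.test data set shipped with bnlearn. The blacklist, the number of bootstrap replicates and the final query are all illustrative choices, not recommendations; discretize() and impute() would slot into step 1 when the data call for them.

```r
library(bnlearn)
data(gaussian.test)

# 1. Preprocessing: drop highly correlated variables.
dat = dedup(gaussian.test, threshold = 0.95)

# 2. Structure learning with model averaging: an (illustrative)
#    blacklist, bootstrapped structures, then an averaged network.
bl = data.frame(from = "G", to = "A")
strength = boot.strength(dat, R = 50, algorithm = "hc",
                         algorithm.args = list(blacklist = bl))
avg.dag = averaged.network(strength)

# 3. Parameter learning; cextend() resolves any undirected arcs first.
fitted = bn.fit(cextend(avg.dag), data = dat)

# 4. Hypothesis generation: an (illustrative) conditional probability
#    query to put in front of domain experts.
prob = cpquery(fitted, event = (A > 2), evidence = (B < 3))
```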

To find out more

More real-world usage examples can be found in the supplementary materials of my papers and in the examples for the most relevant functions.