Local discovery algorithms for structure learning from data with missing values

Like other structure learning algorithms, local discovery algorithms were originally formulated in the literature to work on complete data. The Chow-Liu algorithm for learning trees and the ARACNE algorithm (documented here) are no exception.

Both algorithms are based on pairwise measures of association, organised as a mutual information matrix. bnlearn handles incomplete data sets transparently by using pairwise-complete observations to estimate each element of this matrix, in the same way as R estimates covariance matrices with cov(..., use = "pairwise.complete.obs").
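
To make the idea of pairwise-complete estimation concrete, here is a minimal sketch in base R. The helper pairwise.mi() and the toy vectors are hypothetical names and made-up data introduced for illustration; they are not part of bnlearn and are not meant to reproduce its internal implementation.

```r
# Hypothetical helper, not part of bnlearn: empirical mutual information
# between two discrete variables, using only the rows where both are observed.
pairwise.mi = function(x, y) {
  keep = !is.na(x) & !is.na(y)
  # joint distribution estimated from the pairwise-complete rows only.
  joint = table(x[keep], y[keep]) / sum(keep)
  # product of the marginals, i.e. the joint under independence.
  indep = outer(rowSums(joint), colSums(joint))
  # skip empty cells, which contribute zero to the sum.
  nonzero = joint > 0
  sum(joint[nonzero] * log(joint[nonzero] / indep[nonzero]))
}

# toy data with missing values (made up for illustration).
x = factor(c("a", "a", "b", "b", NA, "a", "b", "b"))
y = factor(c("c", "c", "d", "d", "c", NA, "d", "c"))

# only the 6 rows where both x and y are observed contribute.
pairwise.mi(x, y)

# the same convention in base R for covariance matrices.
m = cbind(u = c(1, 2, NA, 4, 5), v = c(2, NA, 6, 8, 11), w = c(5, 3, 4, NA, 1))
cov(m, use = "pairwise.complete.obs")
```

Each entry of the resulting matrix uses every row in which that particular pair of variables is observed, so a row with a single missing value is still used for all the pairs that do not involve the missing variable.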

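> library(bnlearn)
> # flag 100 cells of the data at random as missing.
> missing = matrix(FALSE, nrow(learning.test), ncol(learning.test))
> missing[sample(length(missing), 100)] = TRUE
> # copy the complete data and set the flagged cells to NA.
> incomplete = learning.test
> incomplete[missing] = NA
> # compare the trees learned from the complete and the incomplete data.
> par(mfrow = c(1, 2))
> graphviz.compare(chow.liu(learning.test), chow.liu(incomplete))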
> par(mfrow = c(1, 2))
> graphviz.compare(aracne(learning.test), aracne(incomplete))

Constraint-based algorithms handle missing data in the same way, and all the considerations that apply to them are relevant for local discovery algorithms as well.
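
One consequence of using pairwise-complete observations is that different elements of the mutual information matrix may be estimated from different numbers of rows. The following base R sketch counts the rows available for each pair of variables; the data frame toy and the matrix n.pairwise are made-up names and data for illustration, not bnlearn objects.

```r
# toy data frame with missing values (made up for illustration).
toy = data.frame(
  A = factor(c("a", "b", NA, "a", "b", "a")),
  B = factor(c("x", NA, "y", "x", "y", NA)),
  C = factor(c("u", "v", "u", NA, "v", "u"))
)

# TRUE where a value is observed, FALSE where it is missing.
observed = !is.na(toy)
# entry (i, j) counts the rows in which both variable i and variable j
# are observed; the diagonal counts the observed values of each variable.
n.pairwise = crossprod(observed + 0)
n.pairwise
```

Pairs with few jointly observed rows yield noisier estimates, which is worth checking before reading much into individual arcs learned from incomplete data.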

Last updated on Thu Nov 24 19:18:43 2022 with bnlearn 4.9-20221107 and R version 4.2.2 (2022-10-31).