Conditional independence tests
bnlearn implements several conditional independence tests for the
constraint-based learning algorithms (see the overview of the package for a
complete list). They can also be used on their own with the ci.test() function
(see its manual page), which takes two variables x and y and an optional set of
conditioning variables z as arguments.
Unless otherwise specified, the default test is the asymptotic mutual
information test for categorical data, the exact t test for Pearson's
correlation for continuous data, and the Jonckheere-Terpstra test for ordinal
data.
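As a quick illustration of the defaults described above, the following sketch calls ci.test() without a test argument on a discrete and on a continuous data set. It assumes the gaussian.test data set shipped with bnlearn (with continuous columns named A, B and C) and that the returned object is an "htest" object whose method element names the test that was run.

```r
library(bnlearn)

# Discrete (factor) data: no 'test' argument, so the default for
# categorical data is used.
data(lizards)
res.discrete = ci.test("Height", "Diameter", "Species", data = lizards)

# Continuous data: no 'test' argument, so the default for continuous
# data is used. gaussian.test is assumed to be available in bnlearn.
data(gaussian.test)
res.continuous = ci.test("A", "B", "C", data = gaussian.test)

# The method element records which test was actually performed.
print(res.discrete$method)
print(res.continuous$method)
```

Inspecting the method element is a simple way to confirm which default was chosen before interpreting the p-value.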
Parametric tests
Here are some examples, listed according to the form of the arguments of
ci.test() (see the manual page of the lizards data set for details on the
data).

x, y and z are character strings, the names of columns of the matrix or data frame specified by data.
> library(bnlearn)
> data(lizards)
> ci.test("Height", "Diameter", "Species", data = lizards, test = "mi")
  Mutual Information (disc.)

data:  Height ~ Diameter | Species
mi = 2.0256, df = 2, p-value = 0.3632
alternative hypothesis: true value is greater than 0
If the only argument is a matrix or a data frame, the first column is taken to be x, the second y and the remaining ones z.
> ci.test(lizards, test = "mi")
  Mutual Information (disc.)

data:  Species ~ Diameter | Height
mi = 14.024, df = 2, p-value = 0.0009009
alternative hypothesis: true value is greater than 0

x, y and z are either distinct R objects or columns extracted from a data frame or a matrix.
> ci.test(lizards$Diameter, lizards$Height, lizards$Species, test = "mi")
  Mutual Information (disc.)

data:  lizards$Diameter ~ lizards$Height | lizards$Species
mi = 2.0256, df = 2, p-value = 0.3632
alternative hypothesis: true value is greater than 0
> H = lizards$Height
> D = lizards$Diameter
> S = lizards$Species
> ci.test(H, D, S, test = "mi")
  Mutual Information (disc.)

data:  H ~ D | S
mi = 2.0256, df = 2, p-value = 0.3632
alternative hypothesis: true value is greater than 0
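The three calling conventions above describe the same test, so they should return the same statistic when given the same variables in the same roles. The sketch below checks this; the variable names by.name, by.frame and by.vector are illustrative, and the data frame form reorders the columns of lizards so that they are read as x, y and z.

```r
library(bnlearn)
data(lizards)

# Form 1: column names plus the data argument.
by.name = ci.test("Height", "Diameter", "Species", data = lizards, test = "mi")

# Form 2: a data frame whose columns are ordered as x, y, z.
by.frame = ci.test(lizards[, c("Height", "Diameter", "Species")], test = "mi")

# Form 3: the vectors themselves.
by.vector = ci.test(lizards$Height, lizards$Diameter, lizards$Species,
                    test = "mi")

# Collect the three test statistics for comparison.
stats = unname(c(by.name$statistic, by.frame$statistic, by.vector$statistic))
print(stats)
```

Only the data line of the printed output differs between the three forms, because it echoes the expressions passed as arguments.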
Different tests can be specified with the test argument (see the overview of the package for the list of available tests). ci.test() checks that the test is appropriate for the data, and returns an error otherwise.
> ci.test(lizards, test = "cor")
## Error: test 'cor' may only be used with continuous data.
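Because ci.test() stops with an error when the test does not match the data, scripts that loop over several data sets or tests may want to trap that error. A minimal sketch, assuming a hypothetical wrapper named safe.ci.test() (not part of bnlearn) built on base R's tryCatch():

```r
library(bnlearn)
data(lizards)

# Hypothetical helper, not part of bnlearn: return the test result, or
# NULL (with a message) if ci.test() rejects the test/data combination.
safe.ci.test = function(...) {
  tryCatch(ci.test(...), error = function(e) {
    message("ci.test() failed: ", conditionMessage(e))
    NULL
  })
}

# A test for continuous data applied to discrete data: trapped, NULL.
bad = safe.ci.test(lizards, test = "cor")

# Pearson's X^2 is a test for discrete data, so this call goes through.
good = safe.ci.test(lizards, test = "x2")
print(good$p.value)
```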
Semiparametric Monte Carlo tests
The mutual information and Pearson's X^{2} tests are also implemented as semiparametric tests, in which the degrees of freedom of the χ^{2} distribution are estimated via permutations. Results are almost identical to those of the corresponding asymptotic tests for large samples, but they can be quite different when the sample is subsetted in some way.
> ci.test("Height", "Diameter", "Species", data = lizards, test = "sp-mi")
  Mutual Information (disc., semipar.)

data:  Height ~ Diameter | Species
sp-mi = 2.0256, df = 2.1093, Monte Carlo samples = 100.0000, p-value = 0.3866
alternative hypothesis: true value is greater than 0
> ci.test("Height", "Diameter", "Species", data = lizards[1:200, ], test = "sp-mi")
  Mutual Information (disc., semipar.)

data:  Height ~ Diameter | Species
sp-mi = 0.17798, df = 0.86026, Monte Carlo samples = 100.00000, p-value = 0.6117
alternative hypothesis: true value is greater than 0
For example, keeping only the first 200 observations of the lizards data set
reduces the estimated degrees of freedom of the test from about 2 to less than
1. The number of permutations involved in the test can be controlled with the
B argument (the default is 100).
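The effect of the B argument on the semiparametric test can be explored with a sketch like the one below. The seed and the value B = 2000 are arbitrary choices for illustration, and the sketch assumes that the permutations draw on R's random number generator so that set.seed() makes the run repeatable.

```r
library(bnlearn)
data(lizards)

set.seed(42)  # arbitrary seed, assumed to control the permutations

# Default number of permutations.
few = ci.test("Height", "Diameter", "Species", data = lizards,
              test = "sp-mi", B = 100)

# More permutations for a steadier estimate of the degrees of freedom.
many = ci.test("Height", "Diameter", "Species", data = lizards,
               test = "sp-mi", B = 2000)

# The statistic is the observed mutual information in both cases; only
# the estimated degrees of freedom, and hence the p-value, depend on
# the permutations.
print(unname(c(few$statistic, many$statistic)))
print(few$parameter)
print(many$parameter)
```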
Monte Carlo permutation tests
Nonparametric tests are performed in the same way as their parametric and semiparametric
counterparts, except that permutations are used to generate the empirical null distribution and the
observed significance value. The number of permutations can again be specified via the B
argument (the default is 5000). For example:
> ci.test("Diameter", "Height", "Species", data = lizards, test = "mc-x2")
  Pearson's X^2 (MC)

data:  Diameter ~ Height | Species
mc-x2 = 2.0174, Monte Carlo samples = 5000, p-value = 0.3608
alternative hypothesis: true value is greater than 0
> ci.test("Diameter", "Height", "Species", data = lizards, test = "mc-x2", B = 10^4)
  Pearson's X^2 (MC)

data:  Diameter ~ Height | Species
mc-x2 = 2.0174, Monte Carlo samples = 10000, p-value = 0.3548
alternative hypothesis: true value is greater than 0
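Since the p-value of a permutation test is itself a Monte Carlo estimate, it changes from run to run, as the two outputs above suggest. The sketch below repeats the test a few times for two values of B to show that variability; mc.pvalues() is a hypothetical helper introduced here for illustration, and the seed, the number of runs and the values of B are arbitrary.

```r
library(bnlearn)
data(lizards)

# Hypothetical helper, not part of bnlearn: repeat the permutation test
# 'runs' times and collect the resulting p-values.
mc.pvalues = function(B, runs = 5) {
  sapply(seq_len(runs), function(i)
    ci.test("Diameter", "Height", "Species", data = lizards,
            test = "mc-x2", B = B)$p.value)
}

set.seed(1)  # arbitrary seed
p.small = mc.pvalues(B = 200)
p.large = mc.pvalues(B = 5000)

# The spread of the p-values is typically narrower with the larger B.
print(range(p.small))
print(range(p.large))
```

A larger B costs more computing time, so the choice is a trade-off between speed and the stability of the reported p-value.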
Sat Feb 17 23:53:57 2024 with bnlearn 5.0-20240208 and R version 4.3.2 (2023-10-31).