Conditional independence tests
bnlearn implements several conditional independence tests for the
constraint-based learning algorithms (see the
overview of the package for a complete list).
They can be used independently with the ci.test() function (manual), which takes two variables x and y and an optional set of conditioning variables z as arguments.
Unless otherwise specified, the default test is the asymptotic mutual information test for categorical
data, the exact t test for Pearson's correlation for continuous data and the
Jonckheere-Terpstra test for ordinal data.
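As a minimal sketch of the defaults described above, the calls below omit the test argument and then inspect which test was actually used. It assumes the lizards and gaussian.test data sets shipped with bnlearn, and that the object returned by ci.test() carries the usual method element of an "htest" object; the variable names res.disc and res.cont are only illustrative.

```r
library(bnlearn)
data(lizards)        # categorical data shipped with bnlearn
data(gaussian.test)  # continuous data shipped with bnlearn

# No 'test' argument: categorical data, so the default test is expected
# to be the asymptotic mutual information.
res.disc = ci.test("Height", "Diameter", "Species", data = lizards)

# No 'test' argument: continuous data, so the default test is expected
# to be the exact t test for Pearson's correlation.
res.cont = ci.test("A", "B", "C", data = gaussian.test)

# The 'method' element names the test that was selected.
print(res.disc$method)
print(res.cont$method)
```

Checking the method element is a cheap way to confirm which default was picked before interpreting the p-value.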
Parametric tests
Here are some examples, listed according to the form of ci.test()'s arguments (see the manual page of the lizards data set for details on the data).
- x, y, z are character strings, the names of columns of the matrix or data frame specified by data.
> library(bnlearn)
> data(lizards)
> ci.test("Height", "Diameter", "Species", data = lizards, test = "mi")

  Mutual Information (disc.)

data:  Height ~ Diameter | Species
mi = 2.0256, df = 2, p-value = 0.3632
alternative hypothesis: true value is greater than 0
- If the only argument is a matrix or a data frame, the first column is taken to be x, the second is y and the following ones z.
> ci.test(lizards, test = "mi")

  Mutual Information (disc.)

data:  Species ~ Diameter | Height
mi = 14.024, df = 2, p-value = 0.0009009
alternative hypothesis: true value is greater than 0
- x, y and z are either distinct R objects or columns extracted from a data frame or a matrix.
> ci.test(lizards$Diameter, lizards$Height, lizards$Species, test = "mi")

  Mutual Information (disc.)

data:  lizards$Diameter ~ lizards$Height | lizards$Species
mi = 2.0256, df = 2, p-value = 0.3632
alternative hypothesis: true value is greater than 0

> H = lizards$Height
> D = lizards$Diameter
> S = lizards$Species
> ci.test(H, D, S, test = "mi")

  Mutual Information (disc.)

data:  H ~ D | S
mi = 2.0256, df = 2, p-value = 0.3632
alternative hypothesis: true value is greater than 0
Different tests can be specified with the test argument (see here for an overview). ci.test() makes sure that the test is appropriate for the data, and returns an error otherwise.
> ci.test(lizards, test = "cor")
## Error: test 'cor' may only be used with continuous data.
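Since ci.test() stops with an error when the test does not match the data, scripts that loop over many tests may want to catch that error instead of aborting. The following is a sketch only: safe.ci.test() is a hypothetical wrapper, not part of bnlearn, built on base R's tryCatch().

```r
library(bnlearn)
data(lizards)

# Hypothetical helper (not a bnlearn function): run ci.test() and return
# NULL with a message, instead of stopping, if the test is rejected.
safe.ci.test = function(...) {
  tryCatch(
    ci.test(...),
    error = function(e) {
      message("ci.test() failed: ", conditionMessage(e))
      NULL
    }
  )
}

# 'cor' is a test for continuous data, so this call is expected to fail
# on the categorical lizards data and the wrapper returns NULL.
bad = safe.ci.test(lizards, test = "cor")

# 'mi' is appropriate for categorical data, so this call goes through.
good = safe.ci.test(lizards, test = "mi")
```

Returning NULL keeps the example short; a real script might prefer to record the error message alongside the variables being tested.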
Semiparametric Monte Carlo tests
The mutual information and Pearson's X^2 tests are also implemented as semiparametric tests in which the degrees of freedom of the χ^2 distribution are estimated via permutations. Results are almost identical to those of the corresponding asymptotic tests for large samples, but they can be quite different when only a subset of the sample is used.
> ci.test("Height", "Diameter", "Species", data = lizards, test = "sp-mi")

  Mutual Information (disc., semipar.)

data:  Height ~ Diameter | Species
sp-mi = 2.0256, df = 2.0791, Monte Carlo samples = 100.0000, p-value = 0.3801
alternative hypothesis: true value is greater than 0

> ci.test("Height", "Diameter", "Species", data = lizards[1:200, ], test = "sp-mi")

  Mutual Information (disc., semipar.)

data:  Height ~ Diameter | Species
sp-mi = 0.17798, df = 0.84679, Monte Carlo samples = 100.00000, p-value = 0.6053
alternative hypothesis: true value is greater than 0
For example, dropping the second half of the lizards data set (keeping only the first 200 observations) reduces the estimated degrees of freedom of the test from about 2.08 to about 0.85. The number of permutations used by the test can be controlled with the B argument (the default is 100).
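As a rough illustration of the B argument for the semiparametric test, the sketch below runs the same test with the default number of permutations and with B = 1000, then compares the estimated degrees of freedom. The seed value is arbitrary, and the sketch assumes that the permutations are drawn with R's random number generator, so that set.seed() makes the run reproducible.

```r
library(bnlearn)
data(lizards)

# Permutations are random, so the estimated degrees of freedom vary
# from run to run; the seed below is an arbitrary choice.
set.seed(42)

# Default number of permutations.
res.default = ci.test("Height", "Diameter", "Species", data = lizards,
                      test = "sp-mi")

# More permutations: slower, but the estimate should be more stable.
res.large = ci.test("Height", "Diameter", "Species", data = lizards,
                    test = "sp-mi", B = 1000)

# The 'parameter' element holds the estimated degrees of freedom and
# the number of Monte Carlo samples.
print(res.default$parameter)
print(res.large$parameter)
```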
Monte Carlo permutation tests
Nonparametric permutation tests are performed in the same way as their parametric and semiparametric
counterparts, except that permutations are used to generate the empirical null distribution and the
observed significance value. The number of permutations can again be specified via the B
argument (the default is 5000). For example:
> ci.test("Diameter", "Height", "Species", data = lizards, test = "mc-x2")

  Pearson's X^2 (MC)

data:  Diameter ~ Height | Species
mc-x2 = 2.0174, Monte Carlo samples = 5000, p-value = 0.3606
alternative hypothesis: true value is greater than 0

> ci.test("Diameter", "Height", "Species", data = lizards, test = "mc-x2", B = 10^4)

  Pearson's X^2 (MC)

data:  Diameter ~ Height | Species
mc-x2 = 2.0174, Monte Carlo samples = 10000, p-value = 0.3664
alternative hypothesis: true value is greater than 0
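When the result of a permutation test feeds into further code rather than being read off the console, the statistic and p-value can be extracted from the returned object. The sketch below assumes ci.test() returns a standard "htest" object with statistic and p.value elements; the threshold alpha = 0.05 is an arbitrary choice for illustration.

```r
library(bnlearn)
data(lizards)

# Monte Carlo permutation version of Pearson's X^2 with 10^4 permutations.
res = ci.test("Diameter", "Height", "Species", data = lizards,
              test = "mc-x2", B = 10^4)

# Arbitrary significance threshold, chosen for illustration only.
alpha = 0.05

# Observed test statistic and Monte Carlo p-value.
print(res$statistic)
print(res$p.value)

# TRUE if conditional independence is rejected at level alpha.
reject = res$p.value < alpha
print(reject)
```

Because the p-value is estimated from random permutations, it changes slightly between runs, as the two outputs above show.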
Mon Aug 5 02:38:52 2024 with bnlearn 5.0 and R version 4.4.1 (2024-06-14).