Conditional independence tests

bnlearn implements several conditional independence tests for the constraint-based learning algorithms (see the overview of the package for a complete list). They can also be used on their own through the ci.test() function (documented in its manual page), which takes two variables x and y and an optional set of conditioning variables z as arguments. Unless a test is specified, the default is the asymptotic mutual information test for categorical data, the exact t test for Pearson's correlation for continuous data and the Jonckheere-Terpstra test for ordinal data.
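
As a quick check of the defaults described above, the following sketch calls ci.test() without a test argument and compares the result with the explicitly named test. It assumes the gaussian.test data set that ships with bnlearn as a source of continuous data; the columns A, B and C are chosen only for illustration.

```r
library(bnlearn)
data(lizards)        # categorical (factor) columns
data(gaussian.test)  # continuous (numeric) columns; an assumption of this sketch

# No test argument: for categorical data this should fall back to "mi".
default.disc = ci.test("Height", "Diameter", "Species", data = lizards)
explicit.disc = ci.test("Height", "Diameter", "Species", data = lizards,
                        test = "mi")
default.disc$method
all.equal(default.disc$statistic, explicit.disc$statistic)

# For continuous data the default should be "cor".
default.cont = ci.test("A", "B", "C", data = gaussian.test)
explicit.cont = ci.test("A", "B", "C", data = gaussian.test, test = "cor")
default.cont$method
all.equal(default.cont$statistic, explicit.cont$statistic)
```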

Parametric tests

Here are some examples, listed according to the form of ci.test()'s arguments (see the manual page of the lizards data set for details on the data).

  1. x, y, z are character strings, the names of columns of the matrix or data frame specified by data.
    > library(bnlearn)
    > data(lizards)
    > ci.test("Height", "Diameter", "Species", data = lizards, test = "mi")
    
    
    	Mutual Information (disc.)
    
    data:  Height ~ Diameter | Species
    mi = 2.0256, df = 2, p-value = 0.3632
    alternative hypothesis: true value is greater than 0
    
  2. If the only argument is a matrix or a data frame, the first column is taken to be x, the second is y and the following ones z.
    > ci.test(lizards, test = "mi")
    
    
    	Mutual Information (disc.)
    
    data:  Species ~ Diameter | Height
    mi = 14.024, df = 2, p-value = 0.0009009
    alternative hypothesis: true value is greater than 0
    
  3. x, y and z are either distinct R objects or columns extracted from a data frame or a matrix.
    > ci.test(lizards$Diameter, lizards$Height, lizards$Species, test = "mi")
    
    
    	Mutual Information (disc.)
    
    data:  lizards$Diameter ~ lizards$Height | lizards$Species
    mi = 2.0256, df = 2, p-value = 0.3632
    alternative hypothesis: true value is greater than 0
    
    > H = lizards$Height
    > D = lizards$Diameter
    > S = lizards$Species
    > ci.test(H, D, S, test = "mi")
    
    
    	Mutual Information (disc.)
    
    data:  H ~ D | S
    mi = 2.0256, df = 2, p-value = 0.3632
    alternative hypothesis: true value is greater than 0
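
The three calling conventions above all return the same test, so the result can be cross-checked by hand. The sketch below recomputes it in base R, assuming that the mi statistic is the usual G^2 summed over the levels of the conditioning variable (which is consistent with df = 2 in the output above) and that ci.test() returns a standard "htest" object. It illustrates the computation and is not bnlearn's own implementation.

```r
library(bnlearn)
data(lizards)

res = ci.test("Height", "Diameter", "Species", data = lizards, test = "mi")

# G^2 statistic accumulated over the levels of the conditioning variable.
g2 = 0
for (s in levels(lizards$Species)) {
  stratum = lizards[lizards$Species == s, ]
  observed = table(stratum$Height, stratum$Diameter)
  # expected counts under independence within this stratum.
  expected = outer(rowSums(observed), colSums(observed)) / sum(observed)
  nonzero = observed > 0  # empty cells contribute nothing to the sum.
  g2 = g2 + 2 * sum(observed[nonzero] *
                    log(observed[nonzero] / expected[nonzero]))
}

# (rows - 1) * (columns - 1) degrees of freedom for each level of Species.
df = (nlevels(lizards$Height) - 1) * (nlevels(lizards$Diameter) - 1) *
  nlevels(lizards$Species)
p.value = pchisq(g2, df, lower.tail = FALSE)

# compare with the components of the object returned by ci.test().
c(by.hand = g2, ci.test = unname(res$statistic))
c(by.hand = p.value, ci.test = res$p.value)
```

If the assumptions hold, both pairs of numbers should agree up to rounding.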
    

Different tests can be specified with the test argument (see the overview of the package for the available labels). ci.test() checks that the test is appropriate for the data, and raises an error otherwise.

> ci.test(lizards, test = "cor")
## Error: test 'cor' may only be used with continuous data.
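
When ci.test() is called from a script on data whose type is not known in advance, the error can be caught rather than left to stop execution. A minimal sketch follows; safe_ci_test() is a hypothetical helper introduced here for illustration and is not part of bnlearn.

```r
library(bnlearn)
data(lizards)

# hypothetical helper, not part of bnlearn: returns NULL instead of stopping.
safe_ci_test = function(...) {
  tryCatch(ci.test(...), error = function(e) {
    message("ci.test() failed: ", conditionMessage(e))
    NULL
  })
}

bad = safe_ci_test(lizards, test = "cor")   # continuous test, categorical data
good = safe_ci_test(lizards, test = "mi")   # test that matches the data

is.null(bad)
good$p.value
```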

Semiparametric Monte Carlo tests

The mutual information and Pearson's X^2 tests are also implemented as semiparametric tests, in which the degrees of freedom of the χ^2 distribution are estimated via permutations. For large samples the results are almost identical to those of the corresponding asymptotic tests, but they can be quite different when only a subset of the sample is used.

> ci.test("Height", "Diameter", "Species", data = lizards, test = "sp-mi")

	Mutual Information (disc., semipar.)

data:  Height ~ Diameter | Species
sp-mi = 2.0256, df = 2.0791, Monte Carlo samples = 100.0000, p-value =
0.3801
alternative hypothesis: true value is greater than 0

> ci.test("Height", "Diameter", "Species", data = lizards[1:200, ], test = "sp-mi")

	Mutual Information (disc., semipar.)

data:  Height ~ Diameter | Species
sp-mi = 0.17798, df = 0.84679, Monte Carlo samples = 100.00000, p-value
= 0.6053
alternative hypothesis: true value is greater than 0

For example, keeping only the first 200 observations of the lizards data set (roughly its first half) reduces the estimated degrees of freedom of the test from about 2 to less than 1. The number of permutations used in the test can be controlled with the B argument (the default is 100).
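
The effect of B can be seen by running the same semiparametric test with different numbers of permutations. In this sketch the values of B and the seed are arbitrary, and set.seed() is used on the assumption that ci.test() draws its permutations from R's random number generator.

```r
library(bnlearn)
data(lizards)

set.seed(1)  # the permutations are random; the seed here is arbitrary.
few = ci.test("Height", "Diameter", "Species", data = lizards,
              test = "sp-mi", B = 100)
many = ci.test("Height", "Diameter", "Species", data = lizards,
               test = "sp-mi", B = 2000)

# The observed statistic does not depend on B; only the estimated degrees
# of freedom (and therefore the p-value) change between the two runs.
few$statistic
many$statistic
few$parameter
many$parameter
```

A larger B should give a more stable estimate of the degrees of freedom at the cost of a longer running time.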

Monte Carlo permutation tests

Nonparametric tests are performed in the same way as their parametric and semiparametric counterparts, except that permutations are used to generate the empirical null distribution and the observed significance value. The number of permutations can again be set with the B argument (the default is 5000). For example:

> ci.test("Diameter", "Height", "Species", data = lizards, test = "mc-x2")

	Pearson's X^2 (MC)

data:  Diameter ~ Height | Species
mc-x2 = 2.0174, Monte Carlo samples = 5000, p-value = 0.3606
alternative hypothesis: true value is greater than 0

> ci.test("Diameter", "Height", "Species", data = lizards, test = "mc-x2", B = 10^4)

	Pearson's X^2 (MC)

data:  Diameter ~ Height | Species
mc-x2 = 2.0174, Monte Carlo samples = 10000, p-value = 0.3664
alternative hypothesis: true value is greater than 0
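
The idea behind a conditional permutation test can be sketched in base R. The code below is an illustration only, not bnlearn's implementation: cond_x2() and permute_within() are hypothetical helpers, and the sketch assumes that one variable is shuffled within each level of the conditioning variable, with the p-value taken as the fraction of permuted statistics at least as large as the observed one.

```r
library(bnlearn)
data(lizards)

# hypothetical helper: Pearson's X^2 summed over the levels of z.
cond_x2 = function(x, y, z) {
  total = 0
  for (s in levels(z)) {
    observed = table(x[z == s], y[z == s])
    expected = outer(rowSums(observed), colSums(observed)) / sum(observed)
    keep = expected > 0  # skip cells that cannot occur in this stratum.
    total = total + sum((observed[keep] - expected[keep])^2 / expected[keep])
  }
  total
}

# hypothetical helper: shuffle y separately within each level of z, which
# preserves the relationship between y and z but breaks the one with x.
permute_within = function(y, z) {
  for (s in levels(z)) {
    idx = which(z == s)
    y[idx] = y[idx[sample.int(length(idx))]]
  }
  y
}

set.seed(42)  # arbitrary seed, for reproducibility of the sketch.
B = 1000
D = lizards$Diameter
H = lizards$Height
S = lizards$Species

observed.x2 = cond_x2(D, H, S)
null.x2 = replicate(B, cond_x2(D, permute_within(H, S), S))

# fraction of permuted statistics at least as large as the observed one.
p.value = mean(null.x2 >= observed.x2)
c(x2 = observed.x2, p.value = p.value)
```

The observed statistic should match the one reported by ci.test() above, while the p-value will vary with the seed and with B.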
Last updated on Mon Aug 5 02:38:52 2024 with bnlearn 5.0 and R version 4.4.1 (2024-06-14).