Analysis of class III malocclusion in Scutari et al., Scientific Reports (2017)

This is a short HOWTO describing the data analysis performed to build the structural model of class III malocclusion in “Bayesian Networks Analysis of Malocclusion Data” by Scutari, Auconi, Caldarelli and Franchi (Scientific Reports, 2017).

The data

The raw data contains 143 patients with two measurements at ages T1 and T2 (measured in years) for the following variables:

  1. ID: anonymised ID code unique to each patient.
  2. Treatment: untreated (NT), treated with bad results (TB), treated with good results (TG).
  3. Growth: a binary variable with values Good or Bad, determined on the basis of CoGn-CoA.
  4. ANB: angle between Down's points A and B (degrees).
  5. IMPA: incisor-mandibular plane angle (degrees).
  6. PPPM: palatal plane - mandibular plane angle (degrees).
  7. CoA: total maxillary length from condilion to Down's point A (mm).
  8. GoPg: length of mandibular body from gonion to pogonion (mm).
  9. CoGo: length of mandibular ramus from condilion to pogonion (mm).

The data can be loaded from prepd-ortho.rda (link).

> load("prepd-ortho.rda")
> str(ortho)
'data.frame':	143 obs. of  17 variables:
 $ ID       : Factor w/ 143 levels "P001","P002",..: 1 2 4 5 6 7 9 10 11 13 ...
 $ Treatment: Factor w/ 3 levels "NT","TB","TG": 1 1 1 1 1 1 1 1 3 1 ...
 $ Growth   : Factor w/ 2 levels "Bad","Good": 1 2 1 1 1 2 2 2 1 1 ...
 $ ANB      : num  -5.2 -1.7 -3.1 -1.3 0.4 1.5 -0.1 0.5 0.2 0.2 ...
 $ IMPA     : num  75.9 77.2 89.8 98.7 90.5 96.9 85.9 92 91.7 82.2 ...
 $ PPPM     : num  30.2 27 19.8 21.5 26.5 25.2 21.2 19.5 31.1 22.7 ...
 $ CoA      : num  83.4 91.3 78.6 96.4 83.3 88 85 77.1 88.8 77.5 ...
 $ GoPg     : num  77.9 84.1 67.3 75.6 74.7 72.8 75.2 65.2 76.2 67.8 ...
 $ CoGo     : num  50.1 59.2 50.4 65.7 51.3 58 54.9 44.8 53.3 44.5 ...
 $ ANB2     : num  -8.4 -2.3 -4.7 -2.4 -0.7 0.9 -1.3 0.4 0.8 -2.8 ...
 $ IMPA2    : num  71.7 81 83.8 86.6 83.8 95.8 87.7 93.6 92.3 82.6 ...
 $ PPPM2    : num  29.1 26.5 16.7 19.4 26.5 24.3 19.4 17.2 30.2 20.1 ...
 $ CoA2     : num  84.4 93.9 82.9 110.5 91 ...
 $ GoPg2    : num  81.9 84 71.5 96.3 83.5 71.8 76.9 69.3 81.3 82.5 ...
 $ CoGo2    : num  53.8 60.6 57.5 83.2 62.3 58.9 57.9 44.9 62 61 ...
 $ T1       : num  12 13 9 7 9 14 10 7 11 6 ...
 $ T2       : num  17 16 14 16 14 17 13 9 14 17 ...

A second data set can be constructed by subtracting the population reference values for each variable at a specific age given by Bathia. The resulting adjusted data can be loaded from prepd-bathia.rda (link).

> load("prepd-bathia.rda")

The analysis for both data sets follows the same steps, so we will just cover the raw, unadjusted data in the following.

Preprocessing and exploratory data analysis

Firstly, we create a data frame with the differences for all the variables and with Growth and Treatment.

> diff = data.frame(
+           dANB = ortho$ANB2 - ortho$ANB,
+           dPPPM = ortho$PPPM2 - ortho$PPPM,
+           dIMPA = ortho$IMPA2 - ortho$IMPA,
+           dCoA = ortho$CoA2 - ortho$CoA,
+           dGoPg = ortho$GoPg2 - ortho$GoPg,
+           dCoGo = ortho$CoGo2 - ortho$CoGo,
+           dT = ortho$T2 - ortho$T1,
+           Growth = as.numeric(ortho$Growth) - 1,
+           Treatment = as.numeric(ortho$Treatment != "NT")
+ )

The Growth and Treatment variables carry redundant information on the prognosis of the patient, as evidenced by the difference in the proportions of patients with good Growth between TB and TG.

> table(ortho[, c("Treatment", "Growth")])
         Growth
Treatment Bad Good
       NT  51   26
       TB  10    3
       TG  24   29

To avoid the confounding that would result from including both variables in the model we re-code Treatment as a binary variable for which 0 means NT and 1 means either TB or TG. Similarly, we re-code Growth with 0 meaning Bad and 1 meaning Good.

> table(diff[, c("Treatment", "Growth")])
         Growth
Treatment  0  1
        0 51 26
        1 34 32

We can then explore the (pair-wise) correlation structure of the data on the scale of change rate (that is, difference in value divided by the difference in age, ΔX / ΔT). Treatment is the same for both T1 and T2, and Growth only refers to T2; so they are not divided by ΔT and keep their original values.

> library(gplots)

Attaching package: 'gplots'

The following object is masked from 'package:stats':

    lowess
> diff.delta = sapply(diff[, 1:6], function(x) x / diff$dT)
> rho = cor(data.frame(diff.delta, Growth = diff$Growth, Treatment = diff$Treatment))
> palette.breaks = seq(0, 1, 0.1)
> par(oma=c(2,2,2,1))
> heatmap.2(rho, scale = "none", trace = "none", revC = TRUE, breaks = palette.breaks)
plot of chunk unnamed-chunk-7

Alternatively, we can plot a (pair-wise) correlation graph, connecting the pairs of variables that have a correlation of at least 0.40.

> library(bnlearn)
Loading required package: methods
> ug = empty.graph(colnames(rho))
> amat(ug) = (rho > 0.4) + 0L - diag(1L, nrow(rho))
> graphviz.plot(ug, layout = "fdp", shape = "ellipse")
Loading required namespace: Rgraphviz
plot of chunk unnamed-chunk-9

We can see that there is a single cluster of (pair-wise) correlated variables: (dANB, dCoA, dGoPg and Treatment); all the other variables are isolated. This already suggests which variables are most likely to be the target of the treatment: dANB and dCoA, both related to point A.

Learning the Bayesian network

Firstly, we encode the available prior knowledge in sets of whitelisted and blacklisted arcs to use in learning the structure of the Bayesian network.

  • We blacklist any arc pointing to dT, Treatment and Growth from the orthodontic variables.
  • We blacklist any arc from dT and Treatment. This means that whether a patient is treated does not change over time.
  • We whitelist the dependence structure dANBdIMPAdPPPM.
  • We whitelist the arc from dT to Growth which allows the prognosis to change over time.
> bl = tiers2blacklist(c("dT", "Treatment", "Growth",
+        c("dANB", "dPPPM", "dIMPA", "dCoA", "dGoPg", "dCoGo")))
> bl = rbind(bl, c("dT", "Treatment"), c("Treatment", "dT"))
> wl = matrix(c("dANB", "dIMPA",
+               "dPPPM", "dIMPA",
+               "dT", "Growth"),
+         ncol = 2, byrow = TRUE, dimnames = list(NULL, c("from", "to")))

Secondly, we produce a consensus model by learning 200 Bayesian networks and keeping the arcs that appear at least ≈ 50% of the time (the threshold is estimated from the data). The algorithm used is hill-climbing with the BIC score.

> str.raw = boot.strength(diff, R = 200, algorithm = "hc",
+             algorithm.args = list(whitelist = wl, blacklist = bl))
> attr(str.raw, "threshold")
[1] 0.49
> avg.raw.full = averaged.network(str.raw)

The resulting consensus network is as below. Arcs that are forced to be present in the graph by the whitelist are highlighted in red, and arc thickness is in proportion to the frequency reported in the str.raw object.

> strength.plot(avg.raw.full, str.raw, shape = "ellipse", highlight = list(arcs = wl))
plot of chunk unnamed-chunk-12

All the directions of the arcs seem to be well established; this can probably be attributed to the use of a whitelist and a blacklist, as they force the directions of nearby arcs to cascade into place.

> avg.raw.full$learning$whitelist = wl
> avg.raw.full$learning$blacklist = bl
> nrow(undirected.arcs(cpdag(avg.raw.full, wlbl = TRUE)))
[1] 0

Furthermore, a cursory examination of the arc strengths above the threshold confirms that 14 arcs in the consensus network appear in fact with a frequency of at least 0.85. All arc directions are also clearly established (all frequencies are equal to 1).

> nrow(str.raw[with(str.raw, strength > 0.50 & direction > 0.50), ])
[1] 19
> nrow(str.raw[with(str.raw, strength > 0.85 & direction > 0.50), ])
[1] 14
> min(str.raw[with(str.raw, strength > 0.50 & direction > 0.50), "direction"])
[1] 1

For this reason we may choose to work on the simpler consensus network shown below.

> avg.raw.simpler = averaged.network(str.raw, threshold = 0.85)
> strength.plot(avg.raw.simpler, str.raw, shape = "ellipse", highlight = list(arcs = wl))
plot of chunk unnamed-chunk-15

Model validation

In order to validate the model learning strategy we perform 10 runs of 10-fold cross-validation and measure the predictive accuracy for Growth given all the other variables. The result is a classification error of ≈ 0.28.

> xval = bn.cv(diff, bn = "hc",
+   algorithm.args = list(blacklist = bl, whitelist = wl), loss = "cor-lw",
+   loss.args = list(target = "Growth", n = 200), runs = 10)
> 
> err = numeric(10)
> 
> for (i in 1:10) {

+   tt = table(unlist(sapply(xval[[i]], '[[', "observed")),
+              unlist(sapply(xval[[i]], '[[', "predicted")) > 0.50)

+   err[i] = (sum(tt) - sum(diag(tt))) / sum(tt)

+ }#FOR
> 
> summary(err)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.2587  0.2675  0.2762  0.2790  0.2867  0.3077

Predictive correlations for the other orthodontic variables are reported below, and are all quite strong apart from dIMPA (≈ 0.23) and dPPPM (≈ 0.42).

> predcor =
+   structure(numeric(6),
+     names = c("dCoGo", "dGoPg", "dIMPA", "dCoA", "dPPPM", "dANB"))
> 
> for (var in names(predcor)) {

+   xval = bn.cv(diff, bn = "hc",
+     algorithm.args = list(blacklist = bl, whitelist = wl), loss = "cor-lw",
+     loss.args = list(target = var, n = 200), runs = 10)

+     predcor[var] = mean(sapply(xval, function(x) attr(x, "mean")))

+ }#FOR
> 
> round(predcor, digits = 3)
dCoGo dGoPg dIMPA  dCoA dPPPM  dANB 
0.855 0.906 0.231 0.921 0.423 0.646
> mean(predcor)
[1] 0.663624

Interesting questions

  1. An excessive growth of CoGo should induce a reduction in PPPM.
    We test this hypothesis by generating samples for the Bayesian network stored in fitted.raw.simpler for both dCoGo and dPPPM and assuming no treatment is taking place.
    > fitted.raw.simpler = bn.fit(avg.raw.simpler, diff)
    > sim = cpdist(fitted.raw.simpler, nodes = c("dCoGo", "dPPPM"), n = 10^4,
    +         evidence = (Treatment < 0.5))
    > plot(sim, col = "grey")
    > abline(v = 0, col = 2, lty = 2, lwd = 2)
    > abline(h = 0, col = 2, lty = 2, lwd = 2)
    > abline(coef(lm(dPPPM ~ dCoGo, data = sim)), lwd = 2)
    
    plot of chunk unnamed-chunk-18
    We note that as dCoGo increases (which indicates an increasingly rapid growth) dPPPM becomes increasingly negative (which indicates a reduction in the angle assuming the angle is originally positive; in ortho the variable PPPM ranges between 10.30 and 39.10).
  2. If ANB decreases, IMPA decreases to compensate.
    This hypothesis can be tested by simulation as before, and looking for negative values of dANB (which indicate a decrease assuming the angle is originally positive) associated with negative values of IMPA (same). In ortho the original IMPA is indeed between 70.60 and 109.60, but ANB is often negative, in which case the angle actually increases compared to a 90° reference position.
    > sim = cpdist(fitted.raw.simpler, nodes = c("dIMPA", "dANB"), n = 10^4,
    +         evidence = (Treatment < 0.5))
    > plot(sim, col = "grey")
    > abline(v = 0, col = 2, lty = 2, lwd = 2)
    > abline(h = 0, col = 2, lty = 2, lwd = 2)
    > abline(coef(lm(dIMPA ~ dANB, data = sim)), lwd = 2)
    
    plot of chunk unnamed-chunk-19
    From the figure above dANB is proportional to dIMPA, so a decrease in one suggests a decrease in the other; the mean trend (the black line) is negative for both at the same time.
  3. If GoPg increases strongly, then both ANB and IMPA decrease.
    If we simulate dGoPg, dANB and dIMPA from the Bayesian network assuming dGoPg > 5 (i.e. GoPg is increasing) we estimate the probability that dANB < 0 (i.e. ANB is decreasing) at ≈ 0.71 and that dIMPA < 0 at ≈ 0.59.
    > sim = cpdist(fitted.raw.simpler, nodes = c("dGoPg", "dANB", "dIMPA"), n = 10^4,
    +         evidence = (dGoPg > 5) & (Treatment < 0.5))
    > nrow(sim[(sim$dGoPg > 5) & (sim$dANB < 0), ]) / nrow(sim[(sim$dGoPg > 5), ])
    
    [1] 0.7104416
    
    > nrow(sim[(sim$dGoPg > 5) & (sim$dIMPA < 0), ]) / nrow(sim[(sim$dGoPg > 5), ])
    
    [1] 0.5884762
    
  4. Therapy attempts to stop the decrease of ANB. If we fix ANB is there any difference treated and untreated patients?
    First, we can check the relationship between treatment and growth for patients that have dANB ≈ 0 without any intervention (i.e. using the model we learned from the data).
    > sim = cpdist(fitted.raw.simpler, nodes = c("Treatment", "Growth"), n = 5 * 10^4,
    +         evidence = abs(dANB) < 0.1)
    > tab = table(TREATMENT = sim$Treatment < 0.5, GOOD.GROWTH = sim$Growth > 0.5)
    > round(prop.table(tab, margin = 1), 2)
    
             GOOD.GROWTH
    TREATMENT FALSE TRUE
        FALSE  0.63 0.37
        TRUE   0.49 0.51
    
    The estimated P(GOOD.GROWTH | Treatment) is different for treated and untreated patients (≈ 0.63 versus ≈ 0.51). If we simulate a formal intervention (a la Judea Pearl) and externally set dANB = 0 (thus making it independent from its parents and removing the corresponding arcs), we have that GOOD.GROWTH has practically the same distribution for both treated and untreated patients and thus becomes independent from Treatment. This suggests that a favourable prognosis is indeed determined by preventing changes in ANB and that other components of the treatment (if any) then become irrelevant.
    > avg.mutilated = mutilated(avg.raw.simpler, evidence = list(dANB = 0))
    > fitted.mutilated = bn.fit(avg.mutilated, diff)
    > fitted.mutilated$dANB = list(coef = c("(Intercept)" = 0), sd = 0)
    > sim = cpdist(fitted.mutilated, nodes = c("Treatment", "Growth"), n = 5 * 10^4,
    +         evidence = TRUE)
    > tab = table(TREATMENT = sim$Treatment < 0.5, GOOD.GROWTH = sim$Growth > 0.5)
    > round(prop.table(tab, margin = 1), 2)
    
             GOOD.GROWTH
    TREATMENT FALSE TRUE
        FALSE  0.57 0.43
        TRUE   0.57 0.43
    
  5. Does treatment affect point B after controlling for point A (using GoPg as a proxy for point B due to its closeness)?
    One way of assessing this is to check whether the angle between point A and point B (ANB) changes between treated and untreated patients while keeping GoPg fixed.
    > sim.GoPg = cpdist(fitted.raw.simpler, nodes = c("Treatment", "dANB", "dGoPg"),
    +         evidence = abs(dGoPg) < 0.1)
    
    Assuming GoPg does not change, the angle between point A and point B increases for treated patients (strongly negative values denote horizontal imbalance, so a positive rate of changes indicate a reduction in imbalance) and decreases for untreated patients (imbalance slowly worsens over time).
    > sim.GoPg$Treatment = c("UNTREATED", "TREATED")[as.numeric(sim.GoPg$Treatment > 0.5) + 1L]
    > mean(sim.GoPg[sim.GoPg$Treatment == "UNTREATED", "dANB"])
    
    [1] -1.103676
    
    > mean(sim.GoPg[sim.GoPg$Treatment == "TREATED", "dANB"])
    
    [1] 0.6653042
    
    > boxplot(dANB ~ Treatment, data = sim.GoPg)
    
    plot of chunk unnamed-chunk-25
    Results are similar if GoPg has positive or negative shifts.
Last updated on Wed Jun 14 11:28:38 2017 with bnlearn 4.2-20170530 and R version 3.0.2 (2013-09-25).