Analysis of class III malocclusion in Scutari et al., Scientific Reports (2017)
This is a short HOWTO describing the data analysis performed to build the structural model of class III malocclusion in “Bayesian Networks Analysis of Malocclusion Data” by Scutari, Auconi, Caldarelli and Franchi (Scientific Reports, 2017).
The data
The raw data contains 143 patients with two measurements at ages T1 and
T2 (measured in years) for the following variables:
-
ID: anonymised ID code unique to each patient. -
Treatment: untreated (NT), treated with bad results (TB), treated with good results (TG). -
Growth: a binary variable with valuesGoodorBad, determined on the basis of CoGn-CoA. -
ANB: angle between Down's points A and B (degrees). -
IMPA: incisor-mandibular plane angle (degrees). -
PPPM: palatal plane - mandibular plane angle (degrees). -
CoA: total maxillary length from condilion to Down's point A (mm). -
GoPg: length of mandibular body from gonion to pogonion (mm). -
CoGo: length of mandibular ramus from condilion to pogonion (mm).
The data can be loaded from prepd-ortho.rda (link).
> load("prepd-ortho.rda") > str(ortho)
'data.frame': 143 obs. of 17 variables: $ ID : Factor w/ 143 levels "P001","P002",..: 1 2 4 5 6 7 9 10 11 13 ... $ Treatment: Factor w/ 3 levels "NT","TB","TG": 1 1 1 1 1 1 1 1 3 1 ... $ Growth : Factor w/ 2 levels "Bad","Good": 1 2 1 1 1 2 2 2 1 1 ... $ ANB : num -5.2 -1.7 -3.1 -1.3 0.4 1.5 -0.1 0.5 0.2 0.2 ... $ IMPA : num 75.9 77.2 89.8 98.7 90.5 96.9 85.9 92 91.7 82.2 ... $ PPPM : num 30.2 27 19.8 21.5 26.5 25.2 21.2 19.5 31.1 22.7 ... $ CoA : num 83.4 91.3 78.6 96.4 83.3 88 85 77.1 88.8 77.5 ... $ GoPg : num 77.9 84.1 67.3 75.6 74.7 72.8 75.2 65.2 76.2 67.8 ... $ CoGo : num 50.1 59.2 50.4 65.7 51.3 58 54.9 44.8 53.3 44.5 ... $ ANB2 : num -8.4 -2.3 -4.7 -2.4 -0.7 0.9 -1.3 0.4 0.8 -2.8 ... $ IMPA2 : num 71.7 81 83.8 86.6 83.8 95.8 87.7 93.6 92.3 82.6 ... $ PPPM2 : num 29.1 26.5 16.7 19.4 26.5 24.3 19.4 17.2 30.2 20.1 ... $ CoA2 : num 84.4 93.9 82.9 110.5 91 ... $ GoPg2 : num 81.9 84 71.5 96.3 83.5 71.8 76.9 69.3 81.3 82.5 ... $ CoGo2 : num 53.8 60.6 57.5 83.2 62.3 58.9 57.9 44.9 62 61 ... $ T1 : num 12 13 9 7 9 14 10 7 11 6 ... $ T2 : num 17 16 14 16 14 17 13 9 14 17 ...
A second data set can be constructed by subtracting the population reference values for each
variable at a specific age given by Bathia. The resulting adjusted data can be loaded from
prepd-bathia.rda (link).
> load("prepd-bathia.rda")
The analysis for both data sets follows the same steps, so we will just cover the raw, unadjusted data in the following.
Preprocessing and exploratory data analysis
Firstly, we create a data frame with the differences for all the variables and
with Growth and Treatment.
> diff = data.frame( + dANB = ortho$ANB2 - ortho$ANB, + dPPPM = ortho$PPPM2 - ortho$PPPM, + dIMPA = ortho$IMPA2 - ortho$IMPA, + dCoA = ortho$CoA2 - ortho$CoA, + dGoPg = ortho$GoPg2 - ortho$GoPg, + dCoGo = ortho$CoGo2 - ortho$CoGo, + dT = ortho$T2 - ortho$T1, + Growth = as.numeric(ortho$Growth) - 1, + Treatment = as.numeric(ortho$Treatment != "NT") + )
The Growth and Treatment variables carry redundant information on
the prognosis of the patient, as evidenced by the difference in the proportions of patients with
good Growth between TB and TG.
> table(ortho[, c("Treatment", "Growth")])
Growth
Treatment Bad Good
NT 51 26
TB 10 3
TG 24 29
To avoid the confounding that would result from including both variables in the model we
re-code Treatment as a binary variable for which 0 means NT
and 1 means either TB or TG. Similarly, we re-code
Growth with 0 meaning Bad and 1 meaning
Good.
> table(diff[, c("Treatment", "Growth")])
Growth
Treatment 0 1
0 51 26
1 34 32
We can then explore the (pair-wise) correlation structure of the data on the scale of change
rate (that is, difference in value divided by the difference in age, ΔX / ΔT).
Treatment is the same for both T1 and T2, and
Growth only refers to T2; so they are not divided by ΔT
and keep their original values.
> library(gplots)
> diff.delta = sapply(diff[, 1:6], function(x) x / diff$dT) > rho = cor(data.frame(diff.delta, Growth = diff$Growth, Treatment = diff$Treatment)) > palette.breaks = seq(0, 1, 0.1) > par(oma=c(2,2,2,1)) > heatmap.2(rho, scale = "none", trace = "none", revC = TRUE, breaks = palette.breaks)
Alternatively, we can plot a (pair-wise) correlation graph, connecting the pairs of variables that have a correlation of at least 0.40.
> library(bnlearn) > > ug = empty.graph(colnames(rho)) > amat(ug) = (rho > 0.4) + 0L - diag(1L, nrow(rho)) > graphviz.plot(ug, layout = "fdp", shape = "ellipse")
We can see that there is a single cluster of (pair-wise) correlated variables: (dANB,
dCoA, dGoPg and Treatment); all the other variables are
isolated. This already suggests which variables are most likely to be the target of the treatment:
dANB and dCoA, both related to point A.
Learning the Bayesian network
Firstly, we encode the available prior knowledge in sets of whitelisted and blacklisted arcs to use in learning the structure of the Bayesian network.
- We blacklist any arc pointing to
dT,TreatmentandGrowthfrom the orthodontic variables. - We blacklist any arc from
dTandTreatment. This means that whether a patient is treated does not change over time. - We whitelist the dependence structure
dANB→dIMPA←dPPPM. - We whitelist the arc from
dTtoGrowthwhich allows the prognosis to change over time.
> bl = tiers2blacklist(list("dT", "Treatment", "Growth", + c("dANB", "dPPPM", "dIMPA", "dCoA", "dGoPg", "dCoGo"))) > bl = rbind(bl, c("dT", "Treatment"), c("Treatment", "dT")) > wl = matrix(c("dANB", "dIMPA", + "dPPPM", "dIMPA", + "dT", "Growth"), + ncol = 2, byrow = TRUE, dimnames = list(NULL, c("from", "to")))
Secondly, we produce a consensus model by learning 200 Bayesian networks and keeping the arcs that appear at least ≈ 50% of the time (the threshold is estimated from the data). The algorithm used is hill-climbing with the BIC score.
> str.raw = boot.strength(diff, R = 200, algorithm = "hc", + algorithm.args = list(whitelist = wl, blacklist = bl)) > attr(str.raw, "threshold")
[1] 0.52
> avg.raw.full = averaged.network(str.raw)
The resulting consensus network is as below. Arcs that are forced to be present in the graph
by the whitelist are highlighted in red, and arc thickness is in proportion to the frequency
reported in the str.raw object.
> strength.plot(avg.raw.full, str.raw, shape = "ellipse", highlight = list(arcs = wl))
All the directions of the arcs seem to be well established; this can probably be attributed to the use of a whitelist and a blacklist, as they force the directions of nearby arcs to cascade into place.
> avg.raw.full$learning$whitelist = wl > avg.raw.full$learning$blacklist = bl > nrow(undirected.arcs(cpdag(avg.raw.full, wlbl = TRUE)))
[1] 0
Furthermore, a cursory examination of the arc strengths above the threshold confirms that 14 arcs in the consensus network appear in fact with a frequency of at least 0.85. All arc directions are also clearly established (all frequencies are equal to 1).
> nrow(str.raw[with(str.raw, strength > 0.50 & direction > 0.50), ])
[1] 19
> nrow(str.raw[with(str.raw, strength > 0.85 & direction > 0.50), ])
[1] 12
> min(str.raw[with(str.raw, strength > 0.50 & direction > 0.50), "direction"])
[1] 0.6411043
For this reason we may choose to work on the simpler consensus network shown below.
> avg.raw.simpler = averaged.network(str.raw, threshold = 0.85) > strength.plot(avg.raw.simpler, str.raw, shape = "ellipse", highlight = list(arcs = wl))
Model validation
In order to validate the model learning strategy we perform 10 runs of 10-fold
cross-validation and measure the predictive accuracy for Growth given all the
other variables. The result is a classification error of ≈ 0.29.
> xval = bn.cv(diff, bn = "hc", + algorithm.args = list(blacklist = bl, whitelist = wl), loss = "cor-lw", + loss.args = list(target = "Growth", n = 200), runs = 10) > > err = numeric(10) > > for (i in 1:10) { + tt = table(unlist(sapply(xval[[i]], '[[', "observed")), + unlist(sapply(xval[[i]], '[[', "predicted")) > 0.50) + err[i] = (sum(tt) - sum(diag(tt))) / sum(tt) + }#FOR > > summary(err)
Min. 1st Qu. Median Mean 3rd Qu. Max. 0.2727 0.2885 0.3007 0.2979 0.3077 0.3147
Predictive correlations for the other orthodontic variables are reported below, and
are all quite strong apart from dIMPA (≈ 0.23) and dPPPM
(≈ 0.41).
> predcor = + structure(numeric(6), + names = c("dCoGo", "dGoPg", "dIMPA", "dCoA", "dPPPM", "dANB")) > > for (var in names(predcor)) { + xval = bn.cv(diff, bn = "hc", + algorithm.args = list(blacklist = bl, whitelist = wl), loss = "cor-lw", + loss.args = list(target = var, n = 200), runs = 10) + predcor[var] = mean(sapply(xval, function(x) attr(x, "mean"))) + }#FOR > > round(predcor, digits = 3)
dCoGo dGoPg dIMPA dCoA dPPPM dANB 0.852 0.905 0.245 0.925 0.416 0.648
> mean(predcor)
[1] 0.6651207
Interesting questions
- An excessive growth of
CoGoshould induce a reduction inPPPM.
We test this hypothesis by generating samples for the Bayesian network stored infitted.raw.simplerfor bothdCoGoanddPPPMand assuming no treatment is taking place.We note that as> fitted.raw.simpler = bn.fit(avg.raw.simpler, diff) > sim = cpdist(fitted.raw.simpler, nodes = c("dCoGo", "dPPPM"), n = 10^4, + evidence = (Treatment < 0.5)) > plot(sim, col = "grey") > abline(v = 0, col = 2, lty = 2, lwd = 2) > abline(h = 0, col = 2, lty = 2, lwd = 2) > abline(coef(lm(dPPPM ~ dCoGo, data = sim)), lwd = 2)
dCoGoincreases (which indicates an increasingly rapid growth)dPPPMbecomes increasingly negative (which indicates a reduction in the angle assuming the angle is originally positive; inorthothe variablePPPMranges between 10.30 and 39.10). - If
ANBdecreases,IMPAdecreases to compensate.
This hypothesis can be tested by simulation as before, and looking for negative values ofdANB(which indicate a decrease assuming the angle is originally positive) associated with negative values ofIMPA(same). Inorthothe originalIMPAis indeed between 70.60 and 109.60, butANBis often negative, in which case the angle actually increases compared to a 90° reference position.From the figure above> sim = cpdist(fitted.raw.simpler, nodes = c("dIMPA", "dANB"), n = 10^4, + evidence = (Treatment < 0.5)) > plot(sim, col = "grey") > abline(v = 0, col = 2, lty = 2, lwd = 2) > abline(h = 0, col = 2, lty = 2, lwd = 2) > abline(coef(lm(dIMPA ~ dANB, data = sim)), lwd = 2)
dANBis proportional todIMPA, so a decrease in one suggests a decrease in the other; the mean trend (the black line) is negative for both at the same time. - If
GoPgincreases strongly, then bothANBandIMPAdecrease.
If we simulatedGoPg,dANBanddIMPAfrom the Bayesian network assumingdGoPg> 5 (i.e.GoPgis increasing) we estimate the probability thatdANB< 0 (i.e.ANBis decreasing) at ≈ 0.70 and thatdIMPA< 0 at ≈ 0.58.> sim = cpdist(fitted.raw.simpler, nodes = c("dGoPg", "dANB", "dIMPA"), n = 10^4, + evidence = (dGoPg > 5) & (Treatment < 0.5)) > nrow(sim[(sim$dGoPg > 5) & (sim$dANB < 0), ]) / nrow(sim[(sim$dGoPg > 5), ])
[1] 0.6993769
> nrow(sim[(sim$dGoPg > 5) & (sim$dIMPA < 0), ]) / nrow(sim[(sim$dGoPg > 5), ])
[1] 0.5775701
- Therapy attempts to stop the decrease of
ANB. If we fixANBis there any difference treated and untreated patients?
First, we can check the relationship between treatment and growth for patients that havedANB≈ 0 without any intervention (i.e. using the model we learned from the data).The estimated P(> sim = cpdist(fitted.raw.simpler, nodes = c("Treatment", "Growth"), n = 5 * 10^4, + evidence = abs(dANB) < 0.1) > tab = table(TREATMENT = sim$Treatment < 0.5, GOOD.GROWTH = sim$Growth > 0.5) > round(prop.table(tab, margin = 1), 2)
GOOD.GROWTH TREATMENT FALSE TRUE FALSE 0.63 0.37 TRUE 0.48 0.52GOOD.GROWTH|Treatment) is different for treated and untreated patients (≈ 0.61 versus ≈ 0.51). If we simulate a formal intervention (a la Judea Pearl) and externally setdANB= 0 (thus making it independent from its parents and removing the corresponding arcs), we have thatGOOD.GROWTHhas practically the same distribution for both treated and untreated patients and thus becomes independent fromTreatment. This suggests that a favourable prognosis is indeed determined by preventing changes inANBand that other components of the treatment (if any) then become irrelevant.> avg.mutilated = mutilated(avg.raw.simpler, evidence = list(dANB = 0)) > fitted.mutilated = bn.fit(avg.mutilated, diff) > fitted.mutilated$dANB = list(coef = c("(Intercept)" = 0), sd = 0) > sim = cpdist(fitted.mutilated, nodes = c("Treatment", "Growth"), n = 5 * 10^4, + evidence = TRUE) > tab = table(TREATMENT = sim$Treatment < 0.5, GOOD.GROWTH = sim$Growth > 0.5) > round(prop.table(tab, margin = 1), 2)
GOOD.GROWTH TREATMENT FALSE TRUE FALSE 0.58 0.42 TRUE 0.57 0.43 - Does treatment affect point B after controlling for point A (using
GoPgas a proxy for point B due to its closeness)?
One way of assessing this is to check whether the angle between point A and point B (ANB) changes between treated and untreated patients while keepingGoPgfixed.Assuming> sim.GoPg = cpdist(fitted.raw.simpler, nodes = c("Treatment", "dANB", "dGoPg"), + evidence = abs(dGoPg) < 0.1)
GoPgdoes not change, the angle between point A and point B increases for treated patients (strongly negative values denote horizontal imbalance, so a positive rate of changes indicate a reduction in imbalance) and decreases for untreated patients (imbalance slowly worsens over time).Results are similar if> sim.GoPg$Treatment = c("UNTREATED", "TREATED")[as.numeric(sim.GoPg$Treatment > 0.5) + 1L] > mean(sim.GoPg[sim.GoPg$Treatment == "UNTREATED", "dANB"])
[1] -1.593538
> mean(sim.GoPg[sim.GoPg$Treatment == "TREATED", "dANB"])
[1] 0.276933
> boxplot(dANB ~ Treatment, data = sim.GoPg)
GoPghas positive or negative shifts.
Thu Nov 24 22:58:59 2022 with bnlearn
4.9-20221107
and R version 4.2.2 (2022-10-31).