Analysis of class III malocclusion in Scutari et al., Scientific Reports (2017)
This is a short HOWTO describing the data analysis performed to build the structural model of class III malocclusion in “Bayesian Networks Analysis of Malocclusion Data” by Scutari, Auconi, Caldarelli and Franchi (Scientific Reports, 2017).
The data
The raw data contains 143 patients with two measurements at ages T1
and
T2
(measured in years) for the following variables:

ID
: anonymised ID code unique to each patient. 
Treatment
: untreated (NT
), treated with bad results (TB
), treated with good results (TG
). 
Growth
: a binary variable with valuesGood
orBad
, determined on the basis of CoGnCoA. 
ANB
: angle between Down's points A and B (degrees). 
IMPA
: incisormandibular plane angle (degrees). 
PPPM
: palatal plane  mandibular plane angle (degrees). 
CoA
: total maxillary length from condilion to Down's point A (mm). 
GoPg
: length of mandibular body from gonion to pogonion (mm). 
CoGo
: length of mandibular ramus from condilion to pogonion (mm).
The data can be loaded from prepdortho.rda
(link).
> load("prepdortho.rda") > str(ortho)
'data.frame': 143 obs. of 17 variables: $ ID : Factor w/ 143 levels "P001","P002",..: 1 2 4 5 6 7 9 10 11 13 ... $ Treatment: Factor w/ 3 levels "NT","TB","TG": 1 1 1 1 1 1 1 1 3 1 ... $ Growth : Factor w/ 2 levels "Bad","Good": 1 2 1 1 1 2 2 2 1 1 ... $ ANB : num 5.2 1.7 3.1 1.3 0.4 1.5 0.1 0.5 0.2 0.2 ... $ IMPA : num 75.9 77.2 89.8 98.7 90.5 96.9 85.9 92 91.7 82.2 ... $ PPPM : num 30.2 27 19.8 21.5 26.5 25.2 21.2 19.5 31.1 22.7 ... $ CoA : num 83.4 91.3 78.6 96.4 83.3 88 85 77.1 88.8 77.5 ... $ GoPg : num 77.9 84.1 67.3 75.6 74.7 72.8 75.2 65.2 76.2 67.8 ... $ CoGo : num 50.1 59.2 50.4 65.7 51.3 58 54.9 44.8 53.3 44.5 ... $ ANB2 : num 8.4 2.3 4.7 2.4 0.7 0.9 1.3 0.4 0.8 2.8 ... $ IMPA2 : num 71.7 81 83.8 86.6 83.8 95.8 87.7 93.6 92.3 82.6 ... $ PPPM2 : num 29.1 26.5 16.7 19.4 26.5 24.3 19.4 17.2 30.2 20.1 ... $ CoA2 : num 84.4 93.9 82.9 110.5 91 ... $ GoPg2 : num 81.9 84 71.5 96.3 83.5 71.8 76.9 69.3 81.3 82.5 ... $ CoGo2 : num 53.8 60.6 57.5 83.2 62.3 58.9 57.9 44.9 62 61 ... $ T1 : num 12 13 9 7 9 14 10 7 11 6 ... $ T2 : num 17 16 14 16 14 17 13 9 14 17 ...
A second data set can be constructed by subtracting the population reference values for each
variable at a specific age given by Bathia. The resulting adjusted data can be loaded from
prepdbathia.rda
(link).
> load("prepdbathia.rda")
The analysis for both data sets follows the same steps, so we will just cover the raw, unadjusted data in the following.
Preprocessing and exploratory data analysis
Firstly, we create a data frame with the differences for all the variables and
with Growth
and Treatment
.
> diff = data.frame( + dANB = ortho$ANB2  ortho$ANB, + dPPPM = ortho$PPPM2  ortho$PPPM, + dIMPA = ortho$IMPA2  ortho$IMPA, + dCoA = ortho$CoA2  ortho$CoA, + dGoPg = ortho$GoPg2  ortho$GoPg, + dCoGo = ortho$CoGo2  ortho$CoGo, + dT = ortho$T2  ortho$T1, + Growth = as.numeric(ortho$Growth)  1, + Treatment = as.numeric(ortho$Treatment != "NT") + )
The Growth
and Treatment
variables carry redundant information on
the prognosis of the patient, as evidenced by the difference in the proportions of patients with
good Growth
between TB
and TG
.
> table(ortho[, c("Treatment", "Growth")])
Growth Treatment Bad Good NT 51 26 TB 10 3 TG 24 29
To avoid the confounding that would result from including both variables in the model we
recode Treatment
as a binary variable for which 0
means NT
and 1
means either TB
or TG
. Similarly, we recode
Growth
with 0
meaning Bad
and 1
meaning
Good
.
> table(diff[, c("Treatment", "Growth")])
Growth Treatment 0 1 0 51 26 1 34 32
We can then explore the (pairwise) correlation structure of the data on the scale of change
rate (that is, difference in value divided by the difference in age, ΔX / ΔT).
Treatment
is the same for both T1
and T2
, and
Growth
only refers to T2
; so they are not divided by ΔT
and keep their original values.
> library(gplots)
> diff.delta = sapply(diff[, 1:6], function(x) x / diff$dT) > rho = cor(data.frame(diff.delta, Growth = diff$Growth, Treatment = diff$Treatment)) > palette.breaks = seq(0, 1, 0.1) > par(oma=c(2,2,2,1)) > heatmap.2(rho, scale = "none", trace = "none", revC = TRUE, breaks = palette.breaks)
Alternatively, we can plot a (pairwise) correlation graph, connecting the pairs of variables that have a correlation of at least 0.40.
> library(bnlearn)
> ug = empty.graph(colnames(rho)) > amat(ug) = (rho > 0.4) + 0L  diag(1L, nrow(rho)) > graphviz.plot(ug, layout = "fdp", shape = "ellipse")
We can see that there is a single cluster of (pairwise) correlated variables: (dANB
,
dCoA
, dGoPg
and Treatment
); all the other variables are
isolated. This already suggests which variables are most likely to be the target of the treatment:
dANB
and dCoA
, both related to point A.
Learning the Bayesian network
Firstly, we encode the available prior knowledge in sets of whitelisted and blacklisted arcs to use in learning the structure of the Bayesian network.
 We blacklist any arc pointing to
dT
,Treatment
andGrowth
from the orthodontic variables.  We blacklist any arc from
dT
andTreatment
. This means that whether a patient is treated does not change over time.  We whitelist the dependence structure
dANB
→dIMPA
←dPPPM
.  We whitelist the arc from
dT
toGrowth
which allows the prognosis to change over time.
> bl = tiers2blacklist(c("dT", "Treatment", "Growth", + c("dANB", "dPPPM", "dIMPA", "dCoA", "dGoPg", "dCoGo"))) > bl = rbind(bl, c("dT", "Treatment"), c("Treatment", "dT")) > wl = matrix(c("dANB", "dIMPA", + "dPPPM", "dIMPA", + "dT", "Growth"), + ncol = 2, byrow = TRUE, dimnames = list(NULL, c("from", "to")))
Secondly, we produce a consensus model by learning 200 Bayesian networks and keeping the arcs that appear at least ≈ 50% of the time (the threshold is estimated from the data). The algorithm used is hillclimbing with the BIC score.
> str.raw = boot.strength(diff, R = 200, algorithm = "hc", + algorithm.args = list(whitelist = wl, blacklist = bl)) > attr(str.raw, "threshold")
[1] 0.49
> avg.raw.full = averaged.network(str.raw)
The resulting consensus network is as below. Arcs that are forced to be present in the graph
by the whitelist are highlighted in red, and arc thickness is in proportion to the frequency
reported in the str.raw
object.
> strength.plot(avg.raw.full, str.raw, shape = "ellipse", highlight = list(arcs = wl))
All the directions of the arcs seem to be well established; this can probably be attributed to the use of a whitelist and a blacklist, as they force the directions of nearby arcs to cascade into place.
> avg.raw.full$learning$whitelist = wl > avg.raw.full$learning$blacklist = bl > nrow(undirected.arcs(cpdag(avg.raw.full, wlbl = TRUE)))
[1] 0
Furthermore, a cursory examination of the arc strengths above the threshold confirms that 14 arcs in the consensus network appear in fact with a frequency of at least 0.85. All arc directions are also clearly established (all frequencies are equal to 1).
> nrow(str.raw[with(str.raw, strength > 0.50 & direction > 0.50), ])
[1] 19
> nrow(str.raw[with(str.raw, strength > 0.85 & direction > 0.50), ])
[1] 14
> min(str.raw[with(str.raw, strength > 0.50 & direction > 0.50), "direction"])
[1] 1
For this reason we may choose to work on the simpler consensus network shown below.
> avg.raw.simpler = averaged.network(str.raw, threshold = 0.85) > strength.plot(avg.raw.simpler, str.raw, shape = "ellipse", highlight = list(arcs = wl))
Model validation
In order to validate the model learning strategy we perform 10 runs of 10fold
crossvalidation and measure the predictive accuracy for Growth
given all the
other variables. The result is a classification error of ≈ 0.28.
> xval = bn.cv(diff, bn = "hc", + algorithm.args = list(blacklist = bl, whitelist = wl), loss = "corlw", + loss.args = list(target = "Growth", n = 200), runs = 10) > > err = numeric(10) > > for (i in 1:10) { + tt = table(unlist(sapply(xval[[i]], '[[', "observed")), + unlist(sapply(xval[[i]], '[[', "predicted")) > 0.50) + err[i] = (sum(tt)  sum(diag(tt))) / sum(tt) + }#FOR > > summary(err)
Min. 1st Qu. Median Mean 3rd Qu. Max. 0.2587 0.2675 0.2762 0.2790 0.2867 0.3077
Predictive correlations for the other orthodontic variables are reported below, and
are all quite strong apart from dIMPA
(≈ 0.23) and dPPPM
(≈ 0.42).
> predcor = + structure(numeric(6), + names = c("dCoGo", "dGoPg", "dIMPA", "dCoA", "dPPPM", "dANB")) > > for (var in names(predcor)) { + xval = bn.cv(diff, bn = "hc", + algorithm.args = list(blacklist = bl, whitelist = wl), loss = "corlw", + loss.args = list(target = var, n = 200), runs = 10) + predcor[var] = mean(sapply(xval, function(x) attr(x, "mean"))) + }#FOR > > round(predcor, digits = 3)
dCoGo dGoPg dIMPA dCoA dPPPM dANB 0.855 0.906 0.231 0.921 0.423 0.646
> mean(predcor)
[1] 0.663624
Interesting questions
 An excessive growth of
CoGo
should induce a reduction inPPPM
.
We test this hypothesis by generating samples for the Bayesian network stored infitted.raw.simpler
for bothdCoGo
anddPPPM
and assuming no treatment is taking place.We note that as> fitted.raw.simpler = bn.fit(avg.raw.simpler, diff) > sim = cpdist(fitted.raw.simpler, nodes = c("dCoGo", "dPPPM"), n = 10^4, + evidence = (Treatment < 0.5)) > plot(sim, col = "grey") > abline(v = 0, col = 2, lty = 2, lwd = 2) > abline(h = 0, col = 2, lty = 2, lwd = 2) > abline(coef(lm(dPPPM ~ dCoGo, data = sim)), lwd = 2)
dCoGo
increases (which indicates an increasingly rapid growth)dPPPM
becomes increasingly negative (which indicates a reduction in the angle assuming the angle is originally positive; inortho
the variablePPPM
ranges between 10.30 and 39.10).  If
ANB
decreases,IMPA
decreases to compensate.
This hypothesis can be tested by simulation as before, and looking for negative values ofdANB
(which indicate a decrease assuming the angle is originally positive) associated with negative values ofIMPA
(same). Inortho
the originalIMPA
is indeed between 70.60 and 109.60, butANB
is often negative, in which case the angle actually increases compared to a 90° reference position.From the figure above> sim = cpdist(fitted.raw.simpler, nodes = c("dIMPA", "dANB"), n = 10^4, + evidence = (Treatment < 0.5)) > plot(sim, col = "grey") > abline(v = 0, col = 2, lty = 2, lwd = 2) > abline(h = 0, col = 2, lty = 2, lwd = 2) > abline(coef(lm(dIMPA ~ dANB, data = sim)), lwd = 2)
dANB
is proportional todIMPA
, so a decrease in one suggests a decrease in the other; the mean trend (the black line) is negative for both at the same time.  If
GoPg
increases strongly, then bothANB
andIMPA
decrease.
If we simulatedGoPg
,dANB
anddIMPA
from the Bayesian network assumingdGoPg
> 5 (i.e.GoPg
is increasing) we estimate the probability thatdANB
< 0 (i.e.ANB
is decreasing) at ≈ 0.71 and thatdIMPA
< 0 at ≈ 0.59.> sim = cpdist(fitted.raw.simpler, nodes = c("dGoPg", "dANB", "dIMPA"), n = 10^4, + evidence = (dGoPg > 5) & (Treatment < 0.5)) > nrow(sim[(sim$dGoPg > 5) & (sim$dANB < 0), ]) / nrow(sim[(sim$dGoPg > 5), ])
[1] 0.7104416
> nrow(sim[(sim$dGoPg > 5) & (sim$dIMPA < 0), ]) / nrow(sim[(sim$dGoPg > 5), ])
[1] 0.5884762
 Therapy attempts to stop the decrease of
ANB
. If we fixANB
is there any difference treated and untreated patients?
First, we can check the relationship between treatment and growth for patients that havedANB
≈ 0 without any intervention (i.e. using the model we learned from the data).The estimated P(> sim = cpdist(fitted.raw.simpler, nodes = c("Treatment", "Growth"), n = 5 * 10^4, + evidence = abs(dANB) < 0.1) > tab = table(TREATMENT = sim$Treatment < 0.5, GOOD.GROWTH = sim$Growth > 0.5) > round(prop.table(tab, margin = 1), 2)
GOOD.GROWTH TREATMENT FALSE TRUE FALSE 0.63 0.37 TRUE 0.49 0.51
GOOD.GROWTH
Treatment
) is different for treated and untreated patients (≈ 0.63 versus ≈ 0.51). If we simulate a formal intervention (a la Judea Pearl) and externally setdANB
= 0 (thus making it independent from its parents and removing the corresponding arcs), we have thatGOOD.GROWTH
has practically the same distribution for both treated and untreated patients and thus becomes independent fromTreatment
. This suggests that a favourable prognosis is indeed determined by preventing changes inANB
and that other components of the treatment (if any) then become irrelevant.> avg.mutilated = mutilated(avg.raw.simpler, evidence = list(dANB = 0)) > fitted.mutilated = bn.fit(avg.mutilated, diff) > fitted.mutilated$dANB = list(coef = c("(Intercept)" = 0), sd = 0) > sim = cpdist(fitted.mutilated, nodes = c("Treatment", "Growth"), n = 5 * 10^4, + evidence = TRUE) > tab = table(TREATMENT = sim$Treatment < 0.5, GOOD.GROWTH = sim$Growth > 0.5) > round(prop.table(tab, margin = 1), 2)
GOOD.GROWTH TREATMENT FALSE TRUE FALSE 0.57 0.43 TRUE 0.57 0.43
 Does treatment affect point B after controlling for point A (using
GoPg
as a proxy for point B due to its closeness)?
One way of assessing this is to check whether the angle between point A and point B (ANB
) changes between treated and untreated patients while keepingGoPg
fixed.Assuming> sim.GoPg = cpdist(fitted.raw.simpler, nodes = c("Treatment", "dANB", "dGoPg"), + evidence = abs(dGoPg) < 0.1)
GoPg
does not change, the angle between point A and point B increases for treated patients (strongly negative values denote horizontal imbalance, so a positive rate of changes indicate a reduction in imbalance) and decreases for untreated patients (imbalance slowly worsens over time).Results are similar if> sim.GoPg$Treatment = c("UNTREATED", "TREATED")[as.numeric(sim.GoPg$Treatment > 0.5) + 1L] > mean(sim.GoPg[sim.GoPg$Treatment == "UNTREATED", "dANB"])
[1] 1.103676
> mean(sim.GoPg[sim.GoPg$Treatment == "TREATED", "dANB"])
[1] 0.6653042
> boxplot(dANB ~ Treatment, data = sim.GoPg)
GoPg
has positive or negative shifts.
Wed Jun 14 11:28:38 2017
with bnlearn
4.220170530
and R version 3.0.2 (20130925)
.