The Data: an Overview

data = readRDS("data/prepd.v3.rds")
symptoms = c("Acne", "Asthma", "ADHD", "Burn", "Erectile_dysfunction", "Scar", 
             "Alcoholism", "Anxiety", "Depression", "Dermatitis",
             "Sleep_disorder", "Obesity")

The data contain 100 weeks' worth of data, collected between 2020-03-02 and 2022-01-24, on 15 different medical conditions. The data are organised as a multivariate time series for each of 2879 counties spread over 50 US states.

All time series are observed over the whole time period.

In what follows we focus on the 12 medical conditions above, spanning dermatological and psychiatric conditions.

data = data[data$county %in% names(which(table(data$county) > 35)), ]
data$county = droplevels(data$county)
data = data[, c("state", "county", "date", symptoms)]

The data originally contained missing values, either single data points or whole time series for specific counties and periods of time. Hence we compare the imputation error of three approaches for imputing missing values in time series: moving average (with a window of 10 data points), interpolation and Kalman filtering. We estimate the relative imputation error by removing individual values from the data, re-imputing them and measuring the relative difference.

experiment = function(each.county, data, m, imputeFUN, symptoms) {
  
  complete = incomplete = as.matrix(data[data$county == each.county, symptoms])
  
  how.many = ceiling(m * length(complete))
  incomplete[sample(length(complete), how.many)] = NA
  
  # for each symptom individually.
  for (s in colnames(incomplete)) {
  
    # impute using the user-provided function imputeFUN; short time series cannot
    # be imputed, just return zero average error.
    imputed = try(imputeFUN(incomplete[, s]), silent = TRUE)
    if (is(imputed, "try-error"))
      return(0)
    else 
      incomplete[, s] = imputed
    
  }#FOR
  
  reldiff = abs(complete - incomplete) / complete
  reldiff[is.na(reldiff) | is.infinite(reldiff)] = 0 
  
  return(sum(reldiff) / how.many)
  
}#EXPERIMENT
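To illustrate how this function might be used (a sketch rather than the original code, assuming the imputeTS package provides the three imputation methods; the window argument of na_ma() is our own choice), the average relative error for, say, 5% of missing values can be estimated as follows.

library(imputeTS)

# the three imputation approaches; k = 5 expands to both sides of the missing
# value, giving roughly a 10-point moving-average window.
methods = list(moving.average = function(x) na_ma(x, k = 5),
               interpolation = na_interpolation,
               kalman = na_kalman)

# average the per-county relative imputation errors for each method.
sapply(methods, function(imputeFUN)
  mean(sapply(levels(data$county), experiment, data = data, m = 0.05,
              imputeFUN = imputeFUN, symptoms = symptoms)))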

All the considered methods perform similarly for all of the considered proportions of missing values (2%, 5%, 10% and 20%).

We perform a similar experiment for values missing in blocks of consecutive data points, such as symptoms not being recorded for a whole month in a specific county.

experiment = function(each.county, data, m, imputeFUN, symptoms) {
  
  complete = incomplete = as.matrix(data[data$county == each.county, symptoms])
  
  which.symptoms = sample(ncol(complete), ceiling(m * length(complete)), replace = TRUE)
  which.times = sample(nrow(complete) - 3, ceiling(m * length(complete)), replace = TRUE)
  for (i in seq_along(which.symptoms))
    incomplete[which.times[i] + 0:3, which.symptoms[i]] = NA
  how.many = sum(is.na(incomplete))
    
  # for each symptom individually.
  for (s in colnames(incomplete)) {
  
    # impute using the user-provided function imputeFUN; short time series cannot
    # be imputed, just return zero average error.
    imputed = try(imputeFUN(incomplete[, s]), silent = TRUE)
    if (is(imputed, "try-error"))
      return(0)
    else 
      incomplete[, s] = imputed
      
  }#FOR
  
  reldiff = abs(complete - incomplete) / complete
  reldiff[is.na(reldiff) | is.infinite(reldiff)] = 0 
  
  return(sum(reldiff) / how.many)
  
}#EXPERIMENT

All the considered methods perform similarly for all of the considered proportions of missing values.

Dependence Structure of the Data

The data are characterised by a strong dependence between observations across both space and time. In the case of space, we can measure how much of the variance of the conditions of interest is explained by the US states and the counties:

r2.states = parSapply(cl, symptoms, function(symptom, data) {

  model = lmer(as.formula(paste(symptom, "~ (1 | state)")), data = data)
  return(1 - var(resid(model)) / var(data[, symptom]))
  
}, data = data[, c("state", "date", symptoms)])

r2.counties = parSapply(cl, symptoms, function(symptom, data) {

  model = lmer(as.formula(paste(symptom, "~ (1 | county)")), data = data)
  return(1 - var(resid(model)) / var(data[, symptom]))
  
}, data = data[, c("county", "date", symptoms)])

We compute the proportion of explained variance using an intercept-only random-effect model because the heterogeneity between states and between counties violates the homoscedasticity assumption of fixed-effect models. We can see that US states explain between 8% and 16% of the variance in each of the symptoms (average 12%); counties explain between 26% and 84% (average 51%).

To account for time dependence, we can add an autocorrelation with lag 1 to the individual time series corresponding to the different symptoms in the random-effect model for the counties. The implied assumption is that, within each county, each symptom depends on itself one week prior.

r2.time = parSapply(cl, symptoms, function(symptom, data) {

  model = lmer(as.formula(paste(symptom, "~ (1 | county)")), data = data)
  rr = residuals(model)
  date = as.integer(as.factor(data$date))
  model = try(lme(rr ~ 1, random = ~ 1 | date, correlation = corAR1(),
                  control = lmeControl(returnObject = TRUE)))

  return(1 - var(resid(model)) / var(data[, symptom]))

}, data = data[, c("county", "date", symptoms)])

Time and space dependence together explain between 35% and 86% of the variance in each of the symptoms (average 63%). This indicates that a non-negligible part of the information in the symptoms is a result of the data points being collected as longitudinal measures in each US state.

Bayesian Networks: Static versus Dynamic Structures

The classic (static) formulation of Bayesian networks is as follows: for a set of variables \mathbf{X}= \{ X_1, \ldots, X_N \} and a directed acyclic graph \mathcal{G}, the joint distribution of \mathbf{X} can be written as \begin{align*}
  &\operatorname{P}(\mathbf{X}) = \prod_{i=1}^N \operatorname{P}(X_i \mid\operatorname{pa}(X_i))&
  &\text{where}&
  &\operatorname{pa}(X_i) = \{ \text{parents of $X_i$} \}.
\end{align*} Each node in the graph \mathcal{G} corresponds to one of the variables in \mathbf{X}, and the arcs in \mathcal{G} determine which other variables each variable depends on. If all the variables are continuous, as is the case for the medical conditions in the data, we typically assume that \mathbf{X} follows a multivariate normal distribution. As a result, the local distributions \operatorname{P}(X_i \mid\operatorname{pa}(X_i)) take the form of a linear regression model in which the node X_i is the response variable and its parents \operatorname{pa}(X_i) are the explanatory variables: \begin{align*}
  X_i = \mu_i + \operatorname{pa}(X_i) \boldsymbol{\beta}_i + \boldsymbol{\varepsilon}_i
\end{align*} where the \boldsymbol{\beta}_i are the regression coefficients associated with the parents \operatorname{pa}(X_i) and the \boldsymbol{\varepsilon}_i are independent, normally distributed errors with some variance \sigma^2_i specific to X_i. If a node has no parents, the equation above contains only the intercept and the error term.
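As a small illustration of this equivalence (a sketch that is not part of the original analysis; "Anxiety" is just a placeholder for any node with at least one parent), the parameters estimated for a node of a Gaussian Bayesian network match those of the corresponding linear regression.

# learn a static network over the symptoms and fit its parameters.
example.dag = hc(data[, symptoms])
example.fit = bn.fit(example.dag, data[, symptoms])

# pick a node with at least one parent: the coefficients stored in its local
# distribution ...
node = "Anxiety"
pa = parents(example.dag, node)
coef(example.fit[[node]])
# ... match those of the linear regression of the node on its parents.
coef(lm(as.formula(paste(node, "~", paste(pa, collapse = " + "))),
        data = data[, symptoms]))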

The assumption that residuals are independent, which implies that data points are independent, is violated by the data: as we noted above, the data points are dependent across time and space due to how they were collected. Replacing the linear regression models with more complex regressions, such as the random-effect models we used above, makes learning the graph \mathcal{G} from data computationally unfeasible for large data sets. However, we can use the dynamic formulation of Bayesian networks, which is constructed as follows:

  1. Include two nodes for each variable X_i in the graph \mathcal{G}: one for the variable as observed at a time t_0 and one for the variable as observed at the following time t_1.
  2. Only allow arcs going from nodes at time t_0 to nodes at time t_1 in the graph. Arcs between nodes associated with the same time point, sometimes called “instantaneous dependencies”, describe probabilistic dependencies that take effect without any time lag. Such dependencies are difficult to interpret as causal effects because Granger causality requires a time lag between cause and effect.

Arcs between nodes corresponding to the same variable at different time points model autocorrelation with lag 1: an arc from X_{i, t_0} to X_{i, t_1} implies that X_i is regressed against itself at the previous time point, as in an AR(1) time series. Crossing arcs model feedback loops: if both X_{i, t_0} \rightarrow X_{j, t_1} and X_{j, t_0} \rightarrow X_{i, t_1} are in the graph, then the relationship between the underlying variables is X_i \leftrightarrows X_j. Note that static Bayesian networks are unable to express feedback loops because nodes are not duplicated and the graph is acyclic by definition.

The local distributions \operatorname{P}(X_{i, t_1} \mid\operatorname{pa}(X_{i, t_1})) in a dynamic Bayesian network take the form \begin{align*}
  X_{i, t_1} = \mu_{i, t_1} + \operatorname{pa}(X_{i, t_1}) \boldsymbol{\beta}_{i, t_1} + 
                  \boldsymbol{\varepsilon}_{i, t_1}
\end{align*} where the parents \operatorname{pa}(X_{i, t_1}) will all be nodes for variables observed at time t_0. The local distributions for the nodes at time t_0 take the form \begin{align*}
  X_{i, t_0} = \mu_{i, t_0} + \boldsymbol{\varepsilon}_{i, t_0}
\end{align*} since nodes at time t_0 do not have any parents by construction. As a result, dynamic Bayesian networks focus completely on modelling the transition between any two consecutive time points and, if we again assume that \mathbf{X} follows a multivariate normal distribution, generalise vector autoregressive time series. The implication of using two generic time points (t_0, t_1) is that we are assuming that the time series are stationary.
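To make the connection with vector autoregressions explicit (a restatement that is not in the original text), we can stack the local distributions of all the nodes at time t_1 and write \begin{align*}
  \mathbf{X}_{t_1} = \boldsymbol{\mu} + \mathbf{A} \mathbf{X}_{t_0} + \boldsymbol{\varepsilon}_{t_1}
\end{align*} where the coefficient matrix \mathbf{A} has a non-zero entry in position (i, j) only if the arc X_{j, t_0} \rightarrow X_{i, t_1} is present in \mathcal{G}: a Gaussian dynamic Bayesian network is a VAR(1) model whose sparsity pattern is encoded by the graph.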

Dynamic Bayesian networks allow us to model the space dependence between the data points as well. Assume that each condition has a different baseline value in each state that does not change over time. Then the local distribution of each condition at time t_1 is regressed against itself at time t_0 (same county, previous time point). The different baseline for the state then appears on both sides of the equation and can be accounted for in the regression model.
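To sketch this argument (using notation that is not in the original text), write a condition in a county belonging to state s as X_{i, t} = \delta_s + Z_{i, t}, where \delta_s is the constant state baseline. If the deviations follow Z_{i, t_1} = \beta_i Z_{i, t_0} + \varepsilon_{i, t_1}, then \begin{align*}
  X_{i, t_1} = (1 - \beta_i) \delta_s + \beta_i X_{i, t_0} + \varepsilon_{i, t_1},
\end{align*} so the baseline enters only through the intercept and is largely absorbed by the autoregressive term when the dependence on the previous time point is strong.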

Learning the graph \mathcal{G} and the regression coefficients of a static Bayesian network from the data while disregarding the space-time dependence structure of the data produces strongly biased models.

decorrelated = parSapply(cl, symptoms, function(symptom, data) {

  model = lmer(as.formula(paste(symptom, "~ (1 | county)")), data = data)
  rr = residuals(model)
  date = as.integer(as.factor(data$date))
  model = try(lme(rr ~ 1, random = ~ 1 | date, correlation = corAR1(),
                  control = lmeControl(returnObject = TRUE)))

  return(residuals(model))

}, data = data[, c("county", "date", symptoms)])
decorrelated = as.data.frame(decorrelated)

dag = hc(data[, symptoms])
fitted.original = bn.fit(dag, data[, symptoms], cluster = cl)
fitted.decorrelated = bn.fit(dag, decorrelated[, symptoms], cluster = cl)

inflation = unlist(lapply(symptoms, function(node) {
  coef(fitted.original[[node]])[-1] / coef(fitted.decorrelated[[node]])[-1]
}))
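For reference, the two figures quoted in the next paragraph can be computed from the coefficient ratios above as follows (a sketch, not the original code).

mean(abs(inflation) >= 2)  # proportion of coefficients inflated by a factor of at least 2
mean(inflation < 0)        # proportion of coefficients whose sign changes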

We find that 63% of the regression coefficients are inflated by a factor of at least 2, and that the sign differs for 29% of the coefficients. The reason is that the model is misspecified: the dependence relationships between the variables are confounded with the space-time dependencies between the data points.

dag.original = hc(data[, symptoms])
dag.decorrelated = hc(decorrelated[, symptoms])
shd(dag.original, dag.decorrelated) / narcs(dag.original)
## [1] 0.7096774

Furthermore, we find that the graph learned from the original data and the graph learned from the decorrelated data, in which the space-time dependence has been removed, differ in 71% of their arcs.

Learning the Dynamic Bayesian Network

Learning the structure of the dynamic Bayesian network requires us to reshape the data to construct the pairs of variables at (t_0, t_1) for all the time series for the different counties.

reshaped = parLapply(cl, levels(data$county), function(each.county, data, symptoms) {

  county.data = subset(data, county == each.county)
  available.t0 = county.data$date[-length(county.data$date)]
  available.t1 = county.data$date[-1]
    
  t0 = subset(county.data, date %in% available.t0)[, symptoms]
  t1 = subset(county.data, date %in% available.t1)[, symptoms]
  names(t0) = paste0(symptoms, "_0")
  names(t1) = paste0(symptoms, "_1")
    
  return(cbind(county = each.county, date = available.t0, t0, t1))
  
}, data = data, symptoms = symptoms)

dyndata = do.call(rbind, reshaped)
dyndata$county = as.factor(dyndata$county)

t0.vars = grep("_0", names(dyndata), value = TRUE)
t1.vars = grep("_1", names(dyndata), value = TRUE)

We then use the reshaped data to learn the graph of the dynamic Bayesian network. As we discussed above, we blacklist all the arcs between nodes in the same time point, as well as arcs pointing backwards in time from nodes at t_1 to nodes at t_0. We define the optimal graph as that with the best BIC score, computed as the sum of the BIC values of the regression models in the local distributions.

learn.dbn = function(data, penalty = 1) {

  bl = rbind(tiers2blacklist(list(t0.vars, t1.vars)), 
             set2blacklist(t0.vars), set2blacklist(t1.vars))
  hc(data, blacklist = bl, score = "bic-g", k = penalty * log(nrow(data)) / 2)
  
}#LEARN.DBN

Furthermore, we use model averaging to increase the stability of the learned graph. We implement it using bagging:

  1. We produce 500 bootstrap samples from the data. We sample only 75% of the available counties and we shuffle the order of the variables to increase the heterogeneity of the samples, which improves the performance of bagging.
  2. We learn a graph from each of them.
  3. We construct a consensus graph that contains only those arcs that appear in a large proportion of the learned graphs, with a data-driven inclusion threshold.

bagging = function(i, data, counties, penalty = 1) {
  
  # sample 75% of the counties, keeping all of their observations ...
  keep = sample(levels(counties), 0.75 * nlevels(counties))
  boot.sample = data[counties %in% keep, ]
  # ... and shuffle the order of the variables.
  boot.sample = boot.sample[, sample(ncol(data), ncol(data))]
  
  learn.dbn(boot.sample, penalty = penalty)
  
}#BAGGING

averaging = function(data, penalty = 1) {

  dags = parLapply(cl, seq(500), bagging, data = data[, c(t0.vars, t1.vars)], 
                   counties = data$county, penalty = penalty)
  strength = custom.strength(dags, nodes = c(t0.vars, t1.vars))

  averaged.network(strength)

}#AVERAGING

avg.dag = averaging(dyndata, penalty = 1)

The consensus graph produced by the code above is the following.

The labels of the nodes here and in the following are abbreviated for readability.

original names (time 0) abbreviations (time 0) original names (time 1) abbreviations (time 1)
Acne_0 ACNE0 Acne_1 ACNE1
Asthma_0 ASTH0 Asthma_1 ASTH1
ADHD_0 ADHD0 ADHD_1 ADHD1
Burn_0 BURN0 Burn_1 BURN1
Erectile_dysfunction_0 ED0 Erectile_dysfunction_1 ED1
Scar_0 SCAR0 Scar_1 SCAR1
Alcoholism_0 ALC0 Alcoholism_1 ALC1
Anxiety_0 ANX0 Anxiety_1 ANX1
Depression_0 DEP0 Depression_1 DEP1
Dermatitis_0 DER0 Dermatitis_1 DER1
Sleep_disorder_0 SLD0 Sleep_disorder_1 SLD1
Obesity_0 OBE0 Obesity_1 OBE1

Merging the nodes that correspond to the same variable (at different time points) produces the following graph, in which feedback loops are explicitly shown using bi-directed arcs.

We observe that the graph is dense: as a result of the large sample size, even small effects are significant and therefore we learn a large number of arcs. However, we would like a sparser graph, with few enough arcs that we can interpret their meaning and reason about them qualitatively.

Recall that we use BIC to choose which arc to include. If we take two networks M_0 and M_1 with M_1 = M_0 \cup \{\text{one additional arc}\} the inclusion criterion for that arc is: \begin{align*}
  &BIC(M_1) - BIC(M_0) > 0& &\text{where}& 
  &BIC(M) = \log L(D \mid M) - \frac{\log n}{2} p(M)
\end{align*} Taking into account that BIC(M) \approx \log \operatorname{P}(M \mid D) if we assume all models are equally probable a priori, the above means: \begin{align*}
  \log BF(M_0, M_1) > 0
\end{align*} so we accept M_1 over M_0 if the Bayes factor for the two models is larger than 1. If we increase the BIC penalty, say, \begin{align*}
  &BIC^*(M) = \log L(D \mid M) - w \left[ \frac{\log n}{2} p(M) \right]& 
  &\text{with}& &w \geqslant 1
\end{align*} then \begin{align*}
  BIC^*(M) = BIC(M) - (w - 1) \left[ \frac{\log n}{2} p(M) \right]
\end{align*} and the resulting Bayes factor becomes \begin{align*}
  \log BF(M_0, M_1) > (w -1) \left[ \frac{\log n}{2} (p(M_1) - p(M_0))\right]
\end{align*} so that the strength of the evidence we need increases with the sample size at a rate w that we control, gradually excluding arcs that correspond to small effects. For instance, increasing w = 1 to w = 8 yields the following consensus graph.

Further increasing w = 8 to w = 32 yields the following consensus graph.
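In terms of the code above, w is just the penalty argument of learn.dbn() and averaging(), so the two consensus graphs can be obtained as follows (the object names are illustrative).

avg.dag.w8 = averaging(dyndata, penalty = 8)
avg.dag.w32 = averaging(dyndata, penalty = 32)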

Validating Predictive Accuracy

The best value of w is the one striking the right balance between the sparsity of the graph and predictive accuracy. Since all variables are continuous, we measure predictive accuracy as the proportion of variance in the symptoms explained by the predicted values.

dbn.accuracy = function(dag, training, validation) {

  fitted = bn.fit(dag, training[, c(t0.vars, t1.vars)])
  accuracy = structure(vector(length(t1.vars), mode = "numeric"), names = t1.vars)
  for (node in t1.vars) {

    pred = predict(fitted, data = validation[, c(t0.vars, t1.vars)], 
                   node = node, method = "parents")
    accuracy[node] = summary(lm(validation[, node] ~ pred))$adj.r.squared

  }#FOR

  accuracy["AVERAGE"] = mean(accuracy)

  return(accuracy)
  
}#DBN.ACCURACY

In particular:

  1. We create a training set comprising the first 53 weeks' worth of data, and we keep the remaining 21 weeks as a validation set.
  2. We learn the graph of the dynamic Bayesian network from the training set for w = 1, 2, 4, 8, 16, 32, 64, 128.
  3. We learn the regression coefficients for each node in each network from the training set.
  4. For each data point in the validation set, we use the symptoms at time t_0 to predict the symptoms at time t_1.
  5. For each symptom, we regress the observed values on the predicted values and we measure the proportion of explained variance.
  6. We average the proportions of explained variance in all symptoms to compute a summary measure of the overall predictive accuracy of the dynamic Bayesian network.

training = subset(dyndata, date < min(date) + 52 * 7)
validation = subset(dyndata, date >= min(date) + 52 * 7)

penalty = c(1, 2, 4, 8, 16, 32, 64, 128)
accuracy = vector(length(penalty), mode = "list")

for (i in seq_along(penalty)) {
  
  dag = averaging(training, penalty = penalty[i])
  accuracy[[i]] = dbn.accuracy(dag, training, validation)
  
}#FOR
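The arc counts discussed below can be tracked by also keeping the averaged graphs, for instance with this small variation on the loop above (not part of the original code).

dags = vector(length(penalty), mode = "list")

for (i in seq_along(penalty)) {

  dags[[i]] = averaging(training, penalty = penalty[i])
  accuracy[[i]] = dbn.accuracy(dags[[i]], training, validation)

}#FOR

# number of arcs in each consensus graph, one for each penalty value.
structure(sapply(dags, narcs), names = penalty)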

The proportion of variance explained in the validation set is comparable to that in the training set, which suggests that the model generalises well. Note that many of the data points in the validation set are measured weeks or months after the most recent data point in the training set, which confirms that the model can predict the symptoms at the next time point well even far beyond the end of the training period.

The number of arcs in the graph decreases linearly in \log(w), going down from 108 for w = 1 to 47 for w = 16 and to 32 for w = 32.

Furthermore, the proportion of variance explained in the validation set remains stable for all w between 1 and 64: it decreases by only 0.01 even though the number of arcs in the graph decreases from 108 to 21. The proportion of variance explained in the training set, which measures goodness of fit (as opposed to predictive accuracy), starts to fall sharply after w = 32. This suggests that values of w \in [16, 32] produce dynamic Bayesian networks with a good balance between sparsity, goodness of fit and predictive accuracy.

The dynamic Bayesian network learned with w = 32 is shown above. For reference, that learned with w = 16 is shown below.

Finally, we choose the Bayesian network learned with w = 4 as the best compromise between predictive accuracy and sparsity: it has 80 arcs and a predictive accuracy of 0.67 (99.96% of the predictive accuracy of the network learned with w = 1, with only 74.07% of its arcs).

The strengths of the arcs included in the final Bayesian network are reported below.

from to strength direction
12 Acne_0 Acne_1 1.000 1
14 Acne_0 ADHD_1 1.000 1
17 Acne_0 Scar_1 1.000 1
18 Acne_0 Alcoholism_1 1.000 1
19 Acne_0 Anxiety_1 1.000 1
21 Acne_0 Dermatitis_1 1.000 1
22 Acne_0 Sleep_disorder_1 1.000 1
23 Acne_0 Obesity_1 1.000 1
35 Asthma_0 Acne_1 1.000 1
36 Asthma_0 Asthma_1 1.000 1
38 Asthma_0 Burn_1 1.000 1
39 Asthma_0 Erectile_dysfunction_1 1.000 1
40 Asthma_0 Scar_1 1.000 1
42 Asthma_0 Anxiety_1 0.994 1
58 ADHD_0 Acne_1 1.000 1
59 ADHD_0 Asthma_1 1.000 1
60 ADHD_0 ADHD_1 1.000 1
61 ADHD_0 Burn_1 1.000 1
62 ADHD_0 Erectile_dysfunction_1 1.000 1
63 ADHD_0 Scar_1 1.000 1
64 ADHD_0 Alcoholism_1 1.000 1
65 ADHD_0 Anxiety_1 0.998 1
66 ADHD_0 Depression_1 1.000 1
67 ADHD_0 Dermatitis_1 0.688 1
69 ADHD_0 Obesity_1 0.998 1
82 Burn_0 Asthma_1 1.000 1
83 Burn_0 ADHD_1 0.996 1
84 Burn_0 Burn_1 1.000 1
85 Burn_0 Erectile_dysfunction_1 1.000 1
86 Burn_0 Scar_1 1.000 1
87 Burn_0 Alcoholism_1 1.000 1
88 Burn_0 Anxiety_1 1.000 1
90 Burn_0 Dermatitis_1 1.000 1
91 Burn_0 Sleep_disorder_1 0.972 1
105 Erectile_dysfunction_0 Asthma_1 1.000 1
106 Erectile_dysfunction_0 ADHD_1 1.000 1
107 Erectile_dysfunction_0 Burn_1 1.000 1
108 Erectile_dysfunction_0 Erectile_dysfunction_1 1.000 1
109 Erectile_dysfunction_0 Scar_1 1.000 1
114 Erectile_dysfunction_0 Sleep_disorder_1 0.938 1
127 Scar_0 Acne_1 1.000 1
128 Scar_0 Asthma_1 1.000 1
129 Scar_0 ADHD_1 1.000 1
130 Scar_0 Burn_1 1.000 1
131 Scar_0 Erectile_dysfunction_1 1.000 1
132 Scar_0 Scar_1 1.000 1
133 Scar_0 Alcoholism_1 1.000 1
135 Scar_0 Depression_1 0.746 1
136 Scar_0 Dermatitis_1 1.000 1
137 Scar_0 Sleep_disorder_1 0.992 1
150 Alcoholism_0 Acne_1 0.974 1
152 Alcoholism_0 ADHD_1 1.000 1
155 Alcoholism_0 Scar_1 1.000 1
156 Alcoholism_0 Alcoholism_1 1.000 1
161 Alcoholism_0 Obesity_1 0.984 1
173 Anxiety_0 Acne_1 1.000 1
175 Anxiety_0 ADHD_1 1.000 1
176 Anxiety_0 Burn_1 1.000 1
180 Anxiety_0 Anxiety_1 1.000 1
181 Anxiety_0 Depression_1 1.000 1
198 Depression_0 ADHD_1 1.000 1
201 Depression_0 Scar_1 0.732 1
203 Depression_0 Anxiety_1 1.000 1
204 Depression_0 Depression_1 1.000 1
206 Depression_0 Sleep_disorder_1 1.000 1
219 Dermatitis_0 Acne_1 0.906 1
221 Dermatitis_0 ADHD_1 0.996 1
222 Dermatitis_0 Burn_1 1.000 1
224 Dermatitis_0 Scar_1 1.000 1
226 Dermatitis_0 Anxiety_1 1.000 1
227 Dermatitis_0 Depression_1 0.966 1
228 Dermatitis_0 Dermatitis_1 1.000 1
229 Dermatitis_0 Sleep_disorder_1 0.998 1
230 Dermatitis_0 Obesity_1 1.000 1
245 Sleep_disorder_0 Burn_1 1.000 1
246 Sleep_disorder_0 Erectile_dysfunction_1 0.930 1
247 Sleep_disorder_0 Scar_1 0.998 1
249 Sleep_disorder_0 Anxiety_1 1.000 1
250 Sleep_disorder_0 Depression_1 1.000 1
252 Sleep_disorder_0 Sleep_disorder_1 1.000 1
253 Sleep_disorder_0 Obesity_1 1.000 1
265 Obesity_0 Acne_1 1.000 1
266 Obesity_0 Asthma_1 0.978 1
271 Obesity_0 Alcoholism_1 1.000 1
272 Obesity_0 Anxiety_1 0.986 1
273 Obesity_0 Depression_1 1.000 1
274 Obesity_0 Dermatitis_1 1.000 1
275 Obesity_0 Sleep_disorder_1 1.000 1
276 Obesity_0 Obesity_1 1.000 1

The predictive accuracies for the individual symptoms are as follows.

predictive accuracy
Acne_1 0.42
Asthma_1 0.85
ADHD_1 0.58
Burn_1 0.87
Erectile_dysfunction_1 0.76
Scar_1 0.88
Alcoholism_1 0.57
Anxiety_1 0.57
Depression_1 0.53
Dermatitis_1 0.74
Sleep_disorder_1 0.60
Obesity_1 0.66
AVERAGE 0.67

The final Bayesian network follows a multivariate normal distribution: we can therefore represent it with its marginal and partial correlation matrices. Below are the portions of those matrices that describe the correlations between variables across time, with correlations shown in absolute value.
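One way to obtain these matrices (a sketch rather than the original code, assuming fitted holds the final network produced by bn.fit()) is to draw a large sample from the network and compute the correlations from it.

# simulate from the fitted Gaussian network and take the marginal correlations ...
sim = rbn(fitted, n = 10^5)
marginal = abs(cor(sim))
# ... and the partial correlations, obtained by rescaling the precision matrix
# (the sign flip is irrelevant after taking absolute values).
partial = abs(cov2cor(solve(cov(sim))))

# the cross-time blocks: variables at time t_0 in the rows, time t_1 in the columns.
marginal[t0.vars, t1.vars]
partial[t0.vars, t1.vars]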

The proportions of variance explained by the parents of key conditions, discounting the variance explained by the condition itself due to autocorrelation, are reported below.

# here 'fitted' is assumed to be the final dynamic Bayesian network (w = 4),
# fitted with bn.fit() on the reshaped data.
for (node in c("Acne_1", "Dermatitis_1", "Depression_1", "Anxiety_1", "Sleep_disorder_1", "ADHD_1", "Obesity_1")) {

  # list the autoregressive term first, so that the sequential ANOVA attributes
  # to it the variance explained by the condition itself.
  pa = unique(c(sub("_1", "_0", node), parents(fitted, node)))
  formula = paste(node, "~", paste(pa, collapse = " + "))

  # drop the residuals and the autoregressive term, then normalise the sums of
  # squares of the remaining parents.
  aa = anova(lm(formula, data = dyndata))
  aa = aa[!(row.names(aa) %in% c("Residuals", sub("_1", "_0", node))), , drop = FALSE]
  cat("<p><strong>node:", node, "</strong></p>\n")
  print(knitr::kable(round(aa[, "Sum Sq", drop = FALSE] / sum(aa[, "Sum Sq", drop = FALSE]), 3)))
  cat("\n")

}#FOR

node: Acne_1

Sum Sq
Asthma_0 0.235
ADHD_0 0.066
Scar_0 0.127
Alcoholism_0 0.018
Anxiety_0 0.216
Dermatitis_0 0.074
Obesity_0 0.264

node: Dermatitis_1

Sum Sq
Acne_0 0.261
ADHD_0 0.005
Burn_0 0.185
Scar_0 0.160
Obesity_0 0.389

node: Depression_1

Sum Sq
ADHD_0 0.081
Scar_0 0.072
Anxiety_0 0.371
Dermatitis_0 0.109
Sleep_disorder_0 0.207
Obesity_0 0.160

node: Anxiety_1

Sum Sq
Acne_0 0.175
Asthma_0 0.219
ADHD_0 0.064
Burn_0 0.067
Depression_0 0.230
Dermatitis_0 0.128
Sleep_disorder_0 0.079
Obesity_0 0.039

node: Sleep_disorder_1

Sum Sq
Acne_0 0.090
Burn_0 0.331
Erectile_dysfunction_0 0.011
Scar_0 0.170
Depression_0 0.156
Dermatitis_0 0.145
Obesity_0 0.097

node: ADHD_1

Sum Sq
Acne_0 0.014
Burn_0 0.495
Erectile_dysfunction_0 0.138
Scar_0 0.121
Alcoholism_0 0.140
Anxiety_0 0.046
Depression_0 0.033
Dermatitis_0 0.011

node: Obesity_1

Sum Sq
Acne_0 0.622
ADHD_0 0.158
Alcoholism_0 0.043
Dermatitis_0 0.052
Sleep_disorder_0 0.125