A network analysis of the symptoms from the Zung depression scale, Briganti, Scutari and Linkowski, Psychological Reports (2020)
This is a short HOWTO describing the analysis in “Network Structures of Symptoms from the Zung Depression Scale” by Briganti, Scutari and Linkowski (2020, Psychological Reports).
Loading the required libraries
> library(qgraph)
> library(bootnet)
> library(pcalg)
> library(bnlearn)
> library(NetworkComparisonTest)
Loading and exploring the data
The data can be downloaded from here. They comprise 1090 observations of 20 variables, each taking values between 1 and 4.
> data = readRDS("zung.rds")
> names(data) =
+   c("Blue", "Morning", "Crying", "SleepP", "Eat", "Sex", "Weight", "Constipated",
+     "HeartR", "Tired", "Mind", "Things", "Restless", "Hope", "Irritable",
+     "Decision", "Useful", "LifeFull", "Dead", "Enjoy")
Exploring the data, we find that most of the variability is explained by the first 3 to 5 principal components. The
function cor_auto() from the qgraph package automatically recognizes that the variables
are on Likert scales and treats them as ordered factors; manually converting them with as.ordered() gives
the same results.
> plot(eigen(cor_auto(data))$values, type = "b", pch = 19)
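As a quick numerical check of both claims, here is a short sketch (the check.attributes = FALSE comparison and the ordered.data name are mine):
> # cumulative proportion of variance explained by the principal components
> ev = eigen(cor_auto(data))$values
> round(cumsum(ev) / sum(ev), 2)
> # converting the variables by hand should reproduce the same matrix
> ordered.data = as.data.frame(lapply(data, as.ordered))
> all.equal(cor_auto(data), cor_auto(ordered.data), check.attributes = FALSE)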
A correlation graph (from the qgraph() function in qgraph) shows that all variables are positively correlated, with the notable exception of Useful and Weight. A fair number of correlations appear to be strong in magnitude, but most are relatively weak.
> qgraph(cor_auto(data))
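If we want to confirm that Useful and Weight are indeed the exception, we can read the sign of their correlation off the matrix directly (a trivial sketch):
> # the Useful-Weight entry should be the only notable negative correlation
> cor_auto(data)["Useful", "Weight"]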
Learning an undirected network model
We can build more informative undirected network models than the correlation graph above by using partial correlations instead of marginal correlation coefficients. One way to learn such a model is the graphical LASSO, in this case via qgraph(). This method is described in detail in “The Elements of Statistical Learning” by Hastie, Tibshirani and Friedman and is implemented in the glasso package (on which qgraph depends).
> glasso = qgraph(cor(data), layout = "spring", graph = "glasso",
+                 sampleSize = nrow(data), labels = names(data), theme = "colorblind")
> qgraph(glasso)
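The partial correlations underlying the plot can also be inspected numerically: getWmat() from qgraph extracts the weights matrix of a qgraph object. A small sketch listing the strongest edges:
> # extract the estimated partial correlation (weights) matrix and list the
> # strongest edges in absolute value
> wmat = getWmat(glasso)
> head(sort(abs(wmat[upper.tri(wmat)]), decreasing = TRUE))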
We can also use the first part of the PC algorithm to learn the skeleton of a Bayesian network. This approach associates arcs with partial correlations, like the graphical LASSO, and produces an undirected network that can be refined into a Bayesian network. With the skeleton() function from the pcalg package:
> skel = skeleton(suffStat = list(C = cor(data), n = nrow(data)), indepTest = gaussCItest,
+                 alpha = 0.10, p = 20)
> qgraph(skel, layout = "spring", labels = names(data))
The same model is learned by pc.stable() in bnlearn, which provides an alternative
implementation of the PC algorithm.
> skel = pc.stable(data, alpha = 0.10, undirected = TRUE)
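To verify that the two implementations agree, one option is to re-run skeleton() with labels = names(data) (so that node names match) and compare the two edge sets with compare() from bnlearn; the as.bn() coercion from graphNEL objects is part of bnlearn's integration with the graph package. A sketch:
> # re-learn the pcalg skeleton with named nodes, coerce it to a bn object and
> # count true positives, false positives and false negatives
> skel2 = skeleton(suffStat = list(C = cor(data), n = nrow(data)),
+                  indepTest = gaussCItest, alpha = 0.10, labels = names(data))
> unlist(compare(as.bn(skel2@graph), skel))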
We can compute some descriptive statistics that summarize the network's structure using centrality()
from the qgraph package.
> glasso.stats = centrality(glasso)
> glasso.stats[c("InDegree", "Betweenness", "Closeness")]
$InDegree
Blue Morning Crying SleepP Eat Sex
1.1016105 0.1769498 0.9778115 0.6259414 0.9193397 0.4998780
Weight Constipated HeartR Tired Mind Things
0.3539278 0.3103145 0.7591750 0.9311434 1.1277081 0.6825825
Restless Hope Irritable Decision Useful LifeFull
0.5089186 0.7477680 0.8717946 0.8012730 0.7970176 0.8354880
Dead Enjoy
0.6784831 0.7287845
$Betweenness
Blue Morning Crying SleepP Eat Sex
56 0 6 4 52 16
Weight Constipated HeartR Tired Mind Things
4 0 8 34 70 0
Restless Hope Irritable Decision Useful LifeFull
0 0 18 28 40 32
Dead Enjoy
18 14
$Closeness
Blue Morning Crying SleepP Eat Sex
0.003241426 0.001451208 0.002823657 0.002371516 0.002965184 0.002553947
Weight Constipated HeartR Tired Mind Things
0.002368336 0.001898261 0.002682269 0.002782175 0.003477046 0.002805384
Restless Hope Irritable Decision Useful LifeFull
0.002477704 0.002513255 0.002823393 0.002952942 0.002714208 0.002628929
Dead Enjoy
0.002818637 0.002877713
> cor(glasso.stats$InDegree, glasso.stats$Betweenness, method = "spearman")
[1] 0.7589347
> cor(glasso.stats$InDegree, glasso.stats$Closeness, method = "spearman")
[1] 0.8195489
> cor(glasso.stats$Closeness, glasso.stats$Betweenness, method = "spearman")
[1] 0.7323985
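The same measures can also be compared visually: qgraph provides centralityPlot() for this purpose (the include argument below follows its documentation).
> # standardized centrality profiles, one panel per measure
> centralityPlot(glasso, include = c("Strength", "Closeness", "Betweenness"))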
Another function we considered for estimating an undirected network model is estimateNetwork() from the bootnet package, which provides another implementation of the graphical LASSO.
> glasso2 = estimateNetwork(data, default = "EBICglasso", corMethod = "cor",
+                           corArgs = list(use = "pairwise.complete.obs"))
We can pass the object returned by estimateNetwork() to the bootnet() function to
compute various bootstrap estimates of arc strength and other graphical summaries of the network.
> boot = bootnet(glasso2, nCores = 4, nBoots = 200, type = "nonparametric",
+                verbose = FALSE)
> plot(boot, labels = FALSE, order = "sample")
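Beyond the plot, calling summary() on the bootnet object tabulates the bootstrap distributions; restricting the output to the edge weights is one option (a sketch based on the bootnet documentation).
> # bootstrap means and confidence intervals for the individual edge weights
> summary(boot, statistics = "edge")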
Learning a directed network model
We can learn a directed network model (that is, a Bayesian network) using the PC algorithm in a similar way,
calling either the pc() function from pcalg:
> pc.fit = pc(suffStat = list(C = cor(data), n = nrow(data)), indepTest = gaussCItest,
+             alpha = 0.05, p = 20)
or pc.stable(), if we are using bnlearn:
> pc.fit = pc.stable(data, alpha = 0.05)
In order to improve the stability of the learned network, we perform bootstrap aggregation and model averaging
with boot.strength() and averaged.network() instead of calling pc.stable()
directly.
> bootstr = boot.strength(data, R = 100, algorithm = "pc.stable")
> bootstr[with(bootstr, strength >= 0.85 & direction >= 0.5), ]
from to strength direction
8 Blue HeartR 0.98 0.5714286
10 Blue Mind 0.93 0.5645161
14 Blue Irritable 0.95 0.8263158
39 Crying Blue 1.00 0.5300000
47 Crying Tired 1.00 0.6150000
71 SleepP Irritable 0.92 0.7391304
95 Eat Enjoy 0.98 0.6224490
100 Sex Eat 1.00 0.6750000
112 Sex LifeFull 0.97 0.6288660
114 Sex Enjoy 0.94 0.7819149
119 Weight Eat 1.00 0.6350000
156 HeartR SleepP 1.00 0.6250000
161 HeartR Tired 1.00 0.6050000
185 Tired Irritable 0.87 0.6954023
195 Mind Eat 0.99 0.5454545
201 Mind Things 1.00 0.5150000
204 Mind Irritable 0.85 0.6705882
209 Mind Enjoy 0.93 0.6774194
224 Things Decision 0.98 0.5000000
232 Restless SleepP 0.91 0.6538462
238 Restless Tired 1.00 0.6200000
242 Restless Irritable 1.00 0.7700000
296 Decision Mind 1.00 0.5050000
297 Decision Things 0.98 0.5000000
301 Decision Useful 1.00 0.5250000
318 Useful Hope 0.92 0.7173913
321 Useful LifeFull 1.00 0.5050000
337 LifeFull Hope 0.99 0.6414141
341 LifeFull Dead 0.85 0.5470588
356 Dead Hope 0.90 0.5111111
380 Enjoy Dead 0.94 0.5212766
> avgnet = averaged.network(bootstr, threshold = 0.85)
> avgnet
Random/Generated Bayesian network
model:
[partially directed graph]
nodes: 20
arcs: 28
undirected arcs: 1
directed arcs: 27
average markov blanket size: 4.20
average neighbourhood size: 2.80
average branching factor: 1.35
generation algorithm: Model Averaging
significance threshold: 0.85
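The summary above reports a single undirected arc, whose direction model averaging could not settle; bnlearn can list it directly.
> # list the arc(s) left undirected by model averaging
> undirected.arcs(avgnet)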
We can plot the result either with strength.plot() or with qgraph(), for consistency with
previous plots.
> sp = strength.plot(avgnet, bootstr, shape = "ellipse")
> qgraph(sp, layout = "spring", labels = nodes(avgnet))
Finally, we can estimate the causal effects of variables on each other using the implementation of the IDA algorithm in the pcalg package. For instance, we can check the effect of Sex (variable 6) on Morning (variable 2) as follows. (ida() identifies variables by position, not by name.)
> ida(6, 2, cov(data), as.graphNEL(avgnet), method = "global")
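Since positional indices are easy to get wrong, a small variation on the call above computes them with match() instead of hard-coding them (same computation, just safer indexing):
> # same ida() call, with the variable positions looked up by name
> ida(match("Sex", names(data)), match("Morning", names(data)),
+     cov(data), as.graphNEL(avgnet), method = "global")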
A network comparison test
Another interesting question we can investigate from the data is whether there is a difference in the symptoms
between men and women. We have split data sets for men (here) and for women
(here), and we will use the NCT() function in the
NetworkComparisonTest package to answer it.
> dataF = readRDS("zungF.rds")
> dataM = readRDS("zungM.rds")
> test = NCT(dataM, dataF, it = 500, binary.data = FALSE, test.edges = TRUE,
+            edges = "all", progressbar = FALSE)
NCT() uses permutation testing in combination with several graph summary statistics (network structure invariance, global strength invariance, edge invariance) to compare the undirected networks learned from dataM and dataF with the graphical LASSO (like the one stored in the glasso object above).
The p-value for the maximum difference in edge weights (nwinv.pval) and the p-value for the difference in global strength (glstrinv.pval) are both higher than any threshold we would typically use, so we cannot reject the null hypothesis that there is no difference between men and women.
> test$nwinv.pval
[1] 0.376
> test$glstrinv.pval
[1] 0.186
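For a visual sanity check, NetworkComparisonTest also provides a plot method for NCT objects; plotting the permutation distribution of the network structure statistic looks like this (the what argument follows the package documentation).
> # permutation distribution of the maximum edge-weight difference
> plot(test, what = "network")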
Fri Nov 25 18:59:25 2022 with bnlearn 4.9-20221107 and R version 4.2.2 Patched (2022-11-10 r83330).