A network analysis of the symptoms from the Zung depression scale components, Briganti, Scutari and Linkowski, Psychological Reports (2020)

This is a short HOWTO describing the analysis in “Network Structures of Symptoms from the Zung Depression Scale” by Briganti, Scutari and Linkowski (2020, Psychological Reports).

Loading the required libraries

> library(qgraph)
> library(bootnet)
> require(pcalg)
> library(bnlearn)
> library(NetworkComparisonTest)

Loading and exploring the data

The data are available and can be downloaded from here. They comprise 1090 observations and 20 variables taking values between 1 and 4.

> data = readRDS("zung.rds")
> names(data) =
+   c("Blue", "Morning", "Crying", "SleepP", "Eat", "Sex", "Weight","Constipated",
+     "HeartR","Tired", "Mind", "Things", "Restless", "Hope", "Irritable",
+     "Decision", "Useful", "LifeFull", "Dead", "Enjoy")

Exploring the data, we find that most of the variability is explained by the first 3 to 5 principal components. The function cor_auto() from the qgraph package automatically recognizes that the variables are on Likert scales and treats them as ordered factors; manually converting them with as.ordered() gives the same results.

> plot(eigen(cor_auto(data))$values, type = "b", pch = 19)
plot of chunk unnamed-chunk-4

A correlation graph (from the qgraph() function in qgraph) shows that all variables are positively correlated, with the notable exception of Useful and Weight. A fair number of correlations appear to be strong in magnitude, but mist are relatively weak.

> qgraph(cor_auto(data))
plot of chunk unnamed-chunk-5

Learning an undirected network model

We can build more informative undirected network models than the correlation graphs above using partial correlations instead of marginal correlation coefficients. One way to learn such a model is to use the graphical lasso, in this case via qgraph(). This method is described in detail in “The Elements of Statistical Learning” by Hastie, Tibshirani and Friedman and is implemented in the glasso (on which qgraph depends.)

> glasso = qgraph(cor(data), layout = "spring", graph = "glasso",
+          sampleSize = nrow(data), labels = names(data), theme = "colorblind")
plot of chunk unnamed-chunk-6
> qgraph(glasso)
plot of chunk unnamed-chunk-6

We can also use the first part of the PC algorithm to learn the skeleton of a Bayesian network. This approach associates arcs with partial correlations like the graphical LASSO, and produces an undirected network which may be refined to produce a Bayesian network. With the skeleton() function from the pcalg package:

> skel = skeleton(suffStat = list(C = cor(data), n = nrow(data)), indepTest = gaussCItest,
+            alpha = 0.10, p = 20)
> qgraph(skel, layout = "spring", labels = names(data))
plot of chunk unnamed-chunk-7

The same model is learned by pc.stable() in bnlearn, which provides an alternative implementation of the PC algorithm.

> skel = pc.stable(data, alpha = 0.10, undirected = TRUE)

We can compute some descriptive statistics that summarize the network's structure using centrality() from the qgraph package.

> glasso.stats = centrality(glasso)
> glasso.stats[c("InDegree", "Betweenness", "Closeness")]
$InDegree
       Blue     Morning      Crying      SleepP         Eat         Sex 
  1.1016105   0.1769498   0.9778115   0.6259414   0.9193397   0.4998780 
     Weight Constipated      HeartR       Tired        Mind      Things 
  0.3539278   0.3103145   0.7591750   0.9311434   1.1277081   0.6825825 
   Restless        Hope   Irritable    Decision      Useful    LifeFull 
  0.5089186   0.7477680   0.8717946   0.8012730   0.7970176   0.8354880 
       Dead       Enjoy 
  0.6784831   0.7287845 

$Betweenness
       Blue     Morning      Crying      SleepP         Eat         Sex 
         56           0           6           4          52          16 
     Weight Constipated      HeartR       Tired        Mind      Things 
          4           0           8          34          70           0 
   Restless        Hope   Irritable    Decision      Useful    LifeFull 
          0           0          18          28          40          32 
       Dead       Enjoy 
         18          14 

$Closeness
       Blue     Morning      Crying      SleepP         Eat         Sex 
0.003241426 0.001451208 0.002823657 0.002371516 0.002965184 0.002553947 
     Weight Constipated      HeartR       Tired        Mind      Things 
0.002368336 0.001898261 0.002682269 0.002782175 0.003477046 0.002805384 
   Restless        Hope   Irritable    Decision      Useful    LifeFull 
0.002477704 0.002513255 0.002823393 0.002952942 0.002714208 0.002628929 
       Dead       Enjoy 
0.002818637 0.002877713
> cor(glasso.stats$InDegree, glasso.stats$Betweenness, method = "spearman")
[1] 0.7589347
> cor(glasso.stats$InDegree, glasso.stats$Closeness, method = "spearman")
[1] 0.8195489
> cor(glasso.stats$Closeness, glasso.stats$Betweenness, method = "spearman")
[1] 0.7323985

Another function we considered to estimate an undirected network model is estimateNetwork() from the bootnet package, which provides another implementation of the graphical LASSO.

> glasso2 = estimateNetwork(data, default = "EBICglasso", corMethod = "cor",
+             corArgs = list(use = "pairwise.complete.obs"))

We can pass the object returned by estimateNetwork() to the bootnet() function to compute various bootstrap estimates of arc strength and other graphical summaries of the network.

> boot = bootnet(glasso2, ncores = 4, nboots = 200, type = "nonparametric",
+          verbose = FALSE)
> plot(boot, labels = FALSE, order = "sample")
plot of chunk unnamed-chunk-11

Learning a directed network model

We can learn a directed network model (that is, a Bayesian network) using the PC algorithm in a similar way, calling either the pc() function from pcalg:

> pc.fit = pc(suffStat = list(C = cor(data), n = nrow(data)), indepTest = gaussCItest,
+            alpha = 0.05, p = 20)

or pc.stable(), if we are using bnlearn:

> pc.fit = pc.stable(data, alpha = 0.05)

In order to improve the stability of the learned network, we perform bootstrap aggregation and model averaging with boot.strength() and averaged.network() instead of calling pc.stable() directly.

> bootstr = boot.strength(data, R = 100, algorithm = "pc.stable")
> bootstr[with(bootstr, strength >= 0.85 & direction >= 0.5), ]
        from        to strength direction
8       Blue    HeartR     0.98 0.5714286
10      Blue      Mind     0.93 0.5645161
14      Blue Irritable     0.95 0.8263158
39    Crying      Blue     1.00 0.5300000
47    Crying     Tired     1.00 0.6150000
71    SleepP Irritable     0.92 0.7391304
95       Eat     Enjoy     0.98 0.6224490
100      Sex       Eat     1.00 0.6750000
112      Sex  LifeFull     0.97 0.6288660
114      Sex     Enjoy     0.94 0.7819149
119   Weight       Eat     1.00 0.6350000
156   HeartR    SleepP     1.00 0.6250000
161   HeartR     Tired     1.00 0.6050000
185    Tired Irritable     0.87 0.6954023
195     Mind       Eat     0.99 0.5454545
201     Mind    Things     1.00 0.5150000
204     Mind Irritable     0.85 0.6705882
209     Mind     Enjoy     0.93 0.6774194
224   Things  Decision     0.98 0.5000000
232 Restless    SleepP     0.91 0.6538462
238 Restless     Tired     1.00 0.6200000
242 Restless Irritable     1.00 0.7700000
296 Decision      Mind     1.00 0.5050000
297 Decision    Things     0.98 0.5000000
301 Decision    Useful     1.00 0.5250000
318   Useful      Hope     0.92 0.7173913
321   Useful  LifeFull     1.00 0.5050000
337 LifeFull      Hope     0.99 0.6414141
341 LifeFull      Dead     0.85 0.5470588
356     Dead      Hope     0.90 0.5111111
380    Enjoy      Dead     0.94 0.5212766
> avgnet = averaged.network(bootstr, threshold = 0.85)
> avgnet

  Random/Generated Bayesian network

  model:
    [partially directed graph]
  nodes:                                 20 
  arcs:                                  28 
    undirected arcs:                     1 
    directed arcs:                       27 
  average markov blanket size:           4.20 
  average neighbourhood size:            2.80 
  average branching factor:              1.35 

  generation algorithm:                  Model Averaging 
  significance threshold:                0.85

We can plot the result either with strength.plot() or with qgraph(), for consistency with previous plots.

> sp = strength.plot(avgnet, bootstr, shape = "ellipse")
plot of chunk unnamed-chunk-15
> qgraph(sp, layout = "spring", labels = nodes(avgnet))
plot of chunk unnamed-chunk-15

Finally, we can test the causal effects of variables on each other using the implementation of the IDA algorithm in the pcalg package. For instance, we can check the effect of Morning on Sex as follows. (ida() identifies variables by position, not by name.)

> ida(6, 2, cov(data), as.graphNEL(avgnet), method = "global")

A network comparison test

Another interesting question we can investigate from the data is whether there is a difference in the symptoms between men and women. We have split data sets for men (here) and for women (here), and we will use the NCT() function in the NetworkComparisonTest package to answer it.

> dataF = readRDS("zungF.rds")
> dataM = readRDS("zungM.rds")
> 
> test = NCT(dataM, dataF, it = 500, binary.data = FALSE, test.edges = TRUE,
+          edges = "all", progressbar = FALSE)

NCT() uses permutation testing in combination with several graph summary statistics (network structure invariance, global strength invariance, edge invariance) to compare the undirected networks learned from dataM and dataM using graphical LASSO. (Like that in the glasso object above).

The p-value for maximum difference in edge weights (nwinv.pval) and the p-value for the difference in global strength are both higher than any threshold you would typically use, leading us to accept the null hypothesis that there is no significant difference between men and women.

> test$nwinv.pval
[1] 0.376
> test$glstrinv.pval
[1] 0.186
Last updated on Fri Nov 25 18:59:25 2022 with bnlearn 4.9-20221107 and R version 4.2.2 Patched (2022-11-10 r83330).