Introduction
Depression is one of the most prevalent and debilitating mental illnesses worldwide (World Health Organization, 2023). Despite the availability of effective treatments, a considerable proportion of individuals with depression still fail to achieve an adequate and sustained improvement (Cuijpers et al., 2021; Rush et al., 2006; Trivedi et al., 2006). Unfortunately, there are few reliable and robust characteristics that distinguish those who respond to treatment from those who do not, and most efforts to date have focused on standardised clinical measures and demographics (Maj et al., 2020; McMahon, 2014; Rost, Binder, & Brückl, 2023). Many have suggested this is a consequence of the way we conceptualise depression: as a latent phenomenon that causes observed symptoms such as sadness and anhedonia, which we typically sum to produce an overall depression score. The network theory of psychopathology puts forward a different perspective, positing that symptoms are interacting components of a dynamical system (Borsboom, 2017; Borsboom & Cramer, 2013) that can give rise to positive feedback loops, propelling people into episodes of illness. The greater the connectivity of these symptom networks, the lower one's psychological resilience: more connected networks react more strongly to perturbations and take longer to recover.
A key prediction of network theory emerging from this is that individuals with tightly connected networks should have greater vulnerability to depression, poorer prognosis, and more treatment resistance (Cramer et al., 2016; Pe et al., 2015; van Borkulo et al., 2015).
Several studies have tested this prediction using cross-sectional network analysis. van Borkulo and colleagues compared baseline network connectivity between persisters (n = 253) and remitters (n = 262) of depression after two years (van Borkulo et al., 2015). In line with network theory, persisters had tighter network connectivity than remitters at baseline. This was replicated in a child and adolescent sample (n = 566/174) (McElroy, Napoleone, Wolpert, & Patalay, 2019), but there have also been null findings, for example, in adolescents (n = 232/233) (Schweren, van Borkulo, Fried, & Goodyer, 2018), and when depression and anxiety symptoms were examined together (n = 956/1466) (O'Driscoll et al., 2021). At a more granular level, some studies have shown that the severity of more 'central' (i.e. important) symptoms is associated with non-response (Elliott, Jones, & Schmidt, 2020; Hagan et al., 2021), and that improvements in central symptoms predict changes in other symptoms (Papini, Rubin, Telch, Smits, & Hien, 2020; Robinaugh, Millner, & McNally, 2016; Rodebaugh et al., 2018).
Findings regarding the centrality hypothesis, however, are not univocal (O'Driscoll et al., 2021; Spiller et al., 2020), and it remains unclear whether centrality measures perform better than other network/non-network metrics when compared directly. Finally, contrary to network theory, a host of studies have reported that connectivity increases (rather than decreases) after treatment (Beard et al., 2016; Berlim, Richard-Devantoy, Dos Santos, & Turecki, 2021; Blanco et al., 2020; Bos et al., 2018; Curtiss et al., 2021).
One common critique of the network literature is its over-reliance on cross-sectional data and methods, which estimate correlations between symptoms across subjects rather than within subjects (Contreras, Nieto, Valiente, Espinosa, & Vazquez, 2019; Robinaugh, Hoekstra, Toner, & Borsboom, 2020), often in small samples (Schumacher, Burger, Echterhoff, & Kriston, 2022). This introduces two issues. First, it is uncertain whether cross-sectional relationships between symptoms correspond to intraindividual relationships (Epskamp & Fried, 2018; Fisher, Medaglia, & Jeronimus, 2018). Second, cross-sectional studies typically construct just two networks (one per treatment-response group) for comparison, which precludes controlling for potential confounds such as symptom severity and variance. Variance is of particular interest because it affects the strength of the association that can be observed between symptoms. Cross-sectional networks are typically estimated from the partial correlations between symptom pairs (Fried et al., 2016), and the correlation between any two symptoms is their covariance scaled by the product of their standard deviations. Network estimation is therefore susceptible to differences in variance, which can be introduced artificially when creating subgroups of participants (Bos & De Jonge, 2014; Fried et al., 2016; Terluin, de Boer, & de Vet, 2016).
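To make the variance argument concrete, the short simulation below draws two hypothetical 'symptom' scores with a population correlation of 0.5 (an assumption chosen for illustration, not a value from our data) and recomputes their correlation after keeping only cases in a narrow band of the first score, mimicking the range restriction that subgrouping can introduce. It is a plain-Python sketch, not part of the analyses reported here.

```python
import math
import random


def pearson(x, y):
    """Pearson r: the covariance scaled by the product of the two standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return cov / (sx * sy)


rng = random.Random(42)
RHO = 0.5  # assumed population correlation between the two hypothetical symptoms

x, y = [], []
for _ in range(20_000):
    a = rng.gauss(0, 1)
    # b shares variance with a, so corr(a, b) = RHO in the population
    b = RHO * a + math.sqrt(1 - RHO ** 2) * rng.gauss(0, 1)
    x.append(a)
    y.append(b)

r_full = pearson(x, y)

# Keep only cases whose first score falls in a narrow band, shrinking its variance
kept = [(a, b) for a, b in zip(x, y) if -0.5 < a < 0.5]
r_restricted = pearson([a for a, _ in kept], [b for _, b in kept])

print(f"full range: r = {r_full:.2f}; restricted range: r = {r_restricted:.2f}")
```

The linear relationship between the two scores is the same in both cases (the second score still depends on the first with slope 0.5); only the variance of the first score differs, yet the correlation in the restricted subgroup is markedly smaller.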
Prior research has shown that connectivity differences remain when groups are matched on baseline severity (McElroy et al., 2019; van Borkulo et al., 2015), but to our knowledge, none have assessed the impact of differences in variance.
Our study sought to fill this gap by examining baseline network differences in N = 40 518 patients who received internet-delivered cognitive behavioural therapy (iCBT) for depression. Leveraging this large sample, we adopted a novel subsampling approach so as to conduct parametric analyses across 160 independent responder and non-responder networks, each comprising n = 250 unique patients. Importantly, these subsamples naturally varied in baseline network connectivity, symptom severity, and variance. This allowed us to assess whether differences in cross-sectional network connectivity are better explained by differences in depression severity and/or variance, which cannot be separated using the standard approach of comparing a single pair of responder and non-responder networks. Additionally, using the independent networks from the subsampling method, we assessed whether other network metrics, such as symptom centrality, related to treatment success, and contextualised their effect sizes against simpler metrics such as the mean and variance of individual symptoms. Finally, we tested whether the findings generalised to partially overlapping samples receiving a longer course of iCBT (8–12 weeks) and to networks constructed from anxiety symptoms in patients receiving anxiety-focused iCBT.
Methods
Study setting and intervention
We examined an archival dataset of patients who received iCBT from SilverCloud Health between January 2015 and December 2020 as part of the Improving Access to Psychological Therapies programme within the National Health Service in England. The intervention follows NICE guidelines and has been shown to improve clinical outcomes, with sustained effects (Palacios et al., 2022; Richards et al., 2020). Patients consented to the use of their anonymised data in routine service evaluations.
Outcome measure
Depression was measured using the Patient Health Questionnaire-9 (PHQ-9) (Kroenke, Spitzer, & Williams, 2001). The PHQ-9 was administered to patients at the beginning of each iCBT session, but patients could skip these assessments and return to them later.
Study sample
Figure 1a illustrates the process by which we derived our final study sample. First, patients were excluded if they did not complete at least one PHQ-9 within 4–8 weeks of treatment initiation. We included patients completing relatively short durations of treatment (i.e. 4 weeks) because of the self-paced nature of iCBT (Lawler, Earley, Timulak, Enrique, & Richards, 2021). The last PHQ-9 completed within the 4–8 week window was taken as the follow-up assessment. As the study focused on the association between depression network characteristics and clinical change following treatment for depression, patients were further excluded if they scored <10 on the PHQ-9 at baseline (i.e. did not reach 'caseness' for depression), or if they were enrolled in an iCBT programme not intended to treat depression. Most patients were clinician-guided, meaning that treatment progress was monitored and facilitated by a clinician. As prior studies have shown that the efficacy of iCBT differs when guided v. unguided (Karyotaki et al., 2021), we excluded data from unguided patients. Furthermore, only patients who met our definition of either Responder or Non-Responder were included. Patients were classified as Responders if (1) they recovered (i.e. transitioned from 'caseness' to 'non-caseness' post-treatment), and (2) their score reduction reached the Reliable Change Index threshold (a reduction of ⩾6 points on the PHQ-9) (Jacobson & Truax, 1991). Patients were classified as Non-Responders if they met neither criterion; patients who met only one criterion were treated as intermediate cases and removed from the analyses.
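The classification rule above can be expressed as a small function. This is an illustrative sketch: the function name `classify_response` is hypothetical, the thresholds (caseness at PHQ-9 ⩾ 10, reliable change of ⩾ 6 points) are taken from the text, and it assumes baseline caseness is a precondition of inclusion.

```python
from typing import Optional

CASENESS_CUTOFF = 10  # PHQ-9 >= 10 counts as 'caseness' (from the text)
RELIABLE_CHANGE = 6   # reduction of >= 6 PHQ-9 points (from the text)


def classify_response(baseline: int, follow_up: int) -> Optional[str]:
    """Return 'Responder', 'Non-Responder', or None for intermediate cases.

    Hypothetical helper; assumes the patient met caseness at baseline,
    as required by the inclusion criteria.
    """
    if baseline < CASENESS_CUTOFF:
        raise ValueError("baseline PHQ-9 below caseness; patient would be excluded")
    recovered = follow_up < CASENESS_CUTOFF               # criterion (1)
    reliable = (baseline - follow_up) >= RELIABLE_CHANGE  # criterion (2)
    if recovered and reliable:
        return "Responder"
    if not recovered and not reliable:
        return "Non-Responder"
    return None  # met only one criterion: intermediate case, removed from analyses
```

For example, a patient moving from 12 to 9 recovers but does not show reliable change, so the function returns `None` and the patient would be excluded as an intermediate case.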
Finally, as network estimation is influenced by sample size (Burger et al., 2022), we created equal-sized Responder and Non-Responder groups using 1:1 propensity score matching (n = 20 259 per group): each Responder with a given number of days in treatment was matched to a Non-Responder with the same number of treatment days, independent of their clinical scores.
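Although the text refers to propensity score matching (performed with MatchIt), the criterion it describes is exact 1:1 matching on a single covariate, days in treatment. The sketch below implements that exact-matching reading in plain Python; it is not MatchIt's algorithm, and `match_on_days` is a hypothetical name.

```python
import random
from collections import defaultdict


def match_on_days(responders, non_responders, seed=0):
    """1:1 exact matching on days in treatment, ignoring clinical scores.

    responders / non_responders: lists of (patient_id, days_in_treatment).
    Returns (responder_id, non_responder_id) pairs; unmatched patients are dropped.
    """
    rng = random.Random(seed)
    pool = defaultdict(list)  # days in treatment -> unused Non-Responder ids
    for pid, days in non_responders:
        pool[days].append(pid)
    for ids in pool.values():
        rng.shuffle(ids)  # random choice among equally eligible controls
    pairs = []
    for pid, days in responders:
        if pool[days]:  # an unused Non-Responder with identical days exists
            pairs.append((pid, pool[days].pop()))
    return pairs
```

Each Non-Responder is used at most once, so the two matched groups are equal in size by construction.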
Data analysis
Baseline and pre-post score analyses
Differences in PHQ-9 sum and item scores at baseline and follow-up, along with treatment engagement, were compared across Responders and Non-Responders using t tests and ANOVA.
Network analysis
Cross-sectional networks were estimated as Gaussian Graphical Models for Responders and Non-Responders at baseline and follow-up, using all items of the PHQ-9 (Epskamp & Fried, 2018; Epskamp, Waldorp, Mõttus, & Borsboom, 2018). Relationships between symptoms (nodes) were estimated as partial correlations (edges), i.e. the relationship between two symptoms after controlling for all others in the same network. The graphical lasso (glasso) was used for regularisation, with model selection based on the Extended Bayesian Information Criterion (EBIC) (Chen & Chen, 2008). The EBIC tuning hyperparameter was set to γ = 0.5 to balance parsimony against goodness of fit. Network connectivity was defined as the weighted sum of the signed associations between nodes. For symptom centrality, we focused on node strength, one of the most evaluated and intuitive metrics in psychological networks, which quantifies the strength of a node's direct connections to other nodes in the network (Bringmann, 2021). Differences in network connectivity, individual edges, and centrality were tested for statistical significance using the Network Comparison Test (NCT) (van Borkulo et al., 2022). The NCT is a two-tailed, resampling-based permutation test that compares two independent cross-sectional networks (here, Responders and Non-Responders). Edge-difference networks (i.e. obtained by subtracting one network's edge-weight matrix from the other's) were used to illustrate significant edge differences between networks.
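For readers less familiar with Gaussian Graphical Models, the sketch below shows how edge weights, network connectivity, and node strength relate to one another. It is a simplification under stated assumptions: edges are unregularised partial correlations obtained from the inverse covariance (precision) matrix, so the glasso/EBIC step used in our analyses (via bootnet) is omitted; connectivity follows the definition above (sum of signed edge weights); and node strength is taken as the sum of absolute edge weights, as is conventional in qgraph. Function names are hypothetical.

```python
import numpy as np


def partial_correlation_network(data):
    """Unregularised partial-correlation network from an (n_patients, n_items) array.

    Edge weight w_ij = -p_ij / sqrt(p_ii * p_jj), where P is the precision
    (inverse covariance) matrix. NOTE: no glasso/EBIC regularisation is applied.
    """
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(precision))
    weights = -precision / np.outer(d, d)
    np.fill_diagonal(weights, 0.0)  # no self-loops
    return weights


def connectivity(weights):
    """Sum of signed edge weights, counting each undirected edge once."""
    return float(weights[np.triu_indices_from(weights, k=1)].sum())


def node_strength(weights):
    """Strength centrality: sum of absolute edge weights attached to each node."""
    return np.abs(weights).sum(axis=1)
```

With nine PHQ-9 items the network has 9 × 8 / 2 = 36 possible edges, which is why edge-level results are reported out of 36.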
Power estimation
To determine the required sample size to detect connectivity differences between Responders v. Non-Responders at baseline, we repeated the NCT 1000 times for random subsets of n = 250, n = 500, n = 750, and n = 1000 per group and reported the statistical power, i.e., the proportion of samples in which a significant difference was detected.
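The power procedure can be sketched as follows. For brevity, the sketch replaces the NCT with a simplified two-tailed permutation test on the difference in connectivity computed from unregularised partial correlations (no glasso), and uses far fewer repetitions and permutations than the 1000 repetitions reported above; all function names and default values are illustrative assumptions.

```python
import numpy as np


def connectivity(data):
    """Sum of signed, unregularised partial correlations (a simplification: no glasso)."""
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    return float(pcor[np.triu_indices_from(pcor, k=1)].sum())


def permutation_pvalue(a, b, n_perm, rng):
    """Two-tailed permutation p-value for the difference in connectivity between a and b."""
    observed = abs(connectivity(a) - connectivity(b))
    pooled = np.vstack([a, b])
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))  # shuffle group membership
        perm_a, perm_b = pooled[idx[:len(a)]], pooled[idx[len(a):]]
        if abs(connectivity(perm_a) - connectivity(perm_b)) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


def estimate_power(group_a, group_b, n_per_group, n_repeats=100, n_perm=100,
                   alpha=0.05, seed=0):
    """Proportion of random subsets (n_per_group per group) with p < alpha."""
    rng = np.random.default_rng(seed)
    significant = 0
    for _ in range(n_repeats):
        a = group_a[rng.choice(len(group_a), size=n_per_group, replace=False)]
        b = group_b[rng.choice(len(group_b), size=n_per_group, replace=False)]
        if permutation_pvalue(a, b, n_perm, rng) < alpha:
            significant += 1
    return significant / n_repeats
```

Calling `estimate_power` for each candidate subset size (e.g. 250, 500, 750, 1000) traces out a power curve; the smallest size whose power exceeds the desired level is the required sample size.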
Subsampling analysis
To test whether the relationship between connectivity and treatment response is explained by differences in baseline severity and/or variance, we divided our sample into 160 independent subsamples of n = 250, of which 80 comprised Responders and 80 comprised Non-Responders (Fig. 1b). Subsamples naturally varied in baseline PHQ-9 sum score mean and variance, which allowed us to treat their networks as unique observations in linear regressions predicting network connectivity from response status, with baseline PHQ-9 sum score mean and PHQ-9 sum score variance as covariates. Using these independent subsamples, we further tested the added prognostic value of network metrics for treatment success: we contextualised the magnitude (i.e. effect size) of the association between baseline network connectivity and treatment response by comparing it with that of other baseline measures in univariate regressions, with response status as the independent variable (IV) and, as dependent variables (DVs), the severity and variance of the PHQ-9 sum and item scores and the strength centrality of individual symptoms. We repeated this procedure to test for differences in network connectivity before and after treatment.
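The subsampling and regression logic can be sketched as below. This is an illustration in Python with NumPy, not the R code used for the analyses: `disjoint_subsamples` and `ols` are hypothetical helpers, and the regression returns unstandardised least-squares coefficients without standard errors or p-values.

```python
import numpy as np


def disjoint_subsamples(n_patients, n_subsamples, size, rng):
    """Split one group's patients into non-overlapping index sets of equal size."""
    if n_subsamples * size > n_patients:
        raise ValueError("not enough patients for fully independent subsamples")
    order = rng.permutation(n_patients)
    return [order[k * size:(k + 1) * size] for k in range(n_subsamples)]


def ols(y, *predictors):
    """Least-squares fit with an intercept; returns [intercept, slope_1, slope_2, ...]."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Example with sizes mirroring the text: 80 subsamples of 250 from one matched group
rng = np.random.default_rng(0)
responder_sets = disjoint_subsamples(20_259, 80, 250, rng)
```

Each subsample then contributes one row (response status coded 0/1, baseline sum score mean and variance, and the connectivity of its network), so `ols(connectivity, status, sum_mean, sum_var)[1]` is the group effect adjusted for both covariates.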
Generalisation test
To test whether our main results generalised, we applied the same analytical procedures to two other samples from our dataset: (1) a smaller group of patients (N = 22 952) who underwent a longer course of iCBT (8–12 weeks) for depression, to examine the effect of treatment duration (online Supplementary eAppendix 1); and (2) a larger group of patients (N = 70 620) who received iCBT for anxiety, to probe whether the observed findings were disorder-specific, with response status and networks based on the Generalised Anxiety Disorder-7 (GAD-7) (Spitzer, Kroenke, Williams, & Löwe, 2006) (online Supplementary eAppendix 2). The main dataset partially overlapped with both of these datasets (33% with the 8–12 week iCBT sample and 49% with the GAD sample).
All data processing and analyses were conducted using R (version 4.1.1). We used specific packages such as MatchIt for group matching (Ho, Imai, King, & Stuart, 2011), qgraph for network visualisation (Epskamp, Cramer, Waldorp, Schmittmann, & Borsboom, 2012), bootnet for network estimation (Epskamp & Fried, 2018), and NetworkComparisonTest for network comparisons (van Borkulo et al., 2022).
Results
Sample characteristics
Non-Responders had a significantly higher baseline PHQ-9 sum score mean and variance (M = 16.26, s.d. = 4.03) than Responders (M = 15.33, s.d. = 3.56) [mean difference: t(40 516) = 24.64, p < 0.001; variance difference: F = 1.28, p < 0.001]. Non-Responders also scored higher on all PHQ-9 items and had greater variance in 'loss of interest/pleasure', 'depressed mood', 'psychomotor problems', and 'suicidality' (Table 1, online Supplementary eFig. 1). By definition, Responders exhibited a larger post-treatment reduction in PHQ-9 sum score (M = 10.06, s.d. = 3.47) than Non-Responders (M = 0.13, s.d. = 3.36), t(40 516) = 292.83, p < 0.001, even after controlling for the imbalance in baseline PHQ-9 sum score mean, F(1, 40 515) = 121 473.12, p < 0.001 (Fig. 2a, online Supplementary eTable 1, eFig. 2). On average, Responders were in treatment one day longer (M = 44.17, s.d. = 7.93) than Non-Responders (M = 43.07, s.d. = 8.22), t(51 883) = −15.36, p < 0.001. A higher proportion of Non-Responders (68%) than Responders (65%) received depression-only rather than comorbid depression-anxiety iCBT, χ2(2) = 64.09, p < 0.001.
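As a quick consistency check on the reported statistics, assuming the variance test is an F ratio of the larger to the smaller sample variance and the mean comparison is an independent-samples t test with n = 20 259 per group, the summary values above reproduce the test statistics to within rounding:

```latex
\begin{aligned}
F &= \frac{s_{\mathrm{NR}}^{2}}{s_{\mathrm{R}}^{2}}
   = \frac{4.03^{2}}{3.56^{2}}
   = \frac{16.2409}{12.6736} \approx 1.28 \\[4pt]
t &= \frac{\bar{x}_{\mathrm{NR}} - \bar{x}_{\mathrm{R}}}{\sqrt{s_{\mathrm{NR}}^{2}/n + s_{\mathrm{R}}^{2}/n}}
   = \frac{16.26 - 15.33}{\sqrt{(16.2409 + 12.6736)/20\,259}}
   \approx \frac{0.93}{0.0378} \approx 24.6
\end{aligned}
```

The small discrepancy from the reported t = 24.64 is consistent with the means having been rounded to two decimals.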
Note: All p values indicated above for PHQ-9 item comparisons have been adjusted for multiple significance testing using the Hochberg method.
Full sample network differences at baseline
The Non-Responder network had greater connectivity than the Responder network at baseline (3.15 v. 2.70, S = 0.44, p < 0.001) (Fig. 2b). This effect was small: a power analysis revealed that n = 750 per group was required to achieve 85% power to detect it (Fig. 2c). When we further matched both groups on baseline PHQ-9 sum scores, thereby matching on both PHQ-9 sum score mean and variance [n = 18 281 per group; mean difference: t(36 560) = 0, p = 1; variance difference: F = 1.00, p = 1], connectivity differences between Responders and Non-Responders disappeared (2.73 v. 2.72, S = 0.008, p = 0.80), suggesting that sum score mean and/or variance drive the effect. We found that 10/36 edges differed significantly between groups (all p < 0.05) (online Supplementary eTable 3, eFig. 4a). The Non-Responder network had two more edges present, while the Responder network had five weaker positive edges and two stronger negative edges. With regard to strength centrality (Fig. 2d), 'depressed mood' was the most central symptom in both networks (1.18 v. 1.22, p = 0.17). Responders exhibited greater strength in 'worthlessness' (0.93 v. 0.83, p = 0.004) and 'loss of interest/pleasure' (0.89 v. 0.84, p = 0.047), while 'sleep' (0.66 v. 0.74, p = 0.02) and 'psychomotor problems' (0.61 v. 0.71, p = 0.002) were significantly more central in the Non-Responder network (online Supplementary eTable 2).
Parametric analysis of PHQ-9 sum score mean, variance and network connectivity
Responders and Non-Responders differed in baseline PHQ-9 sum score and symptom means, PHQ-9 sum score and symptom variances, and network connectivity. To disentangle these features, we drew 160 independent subsamples of n = 250 (80 of Responders and 80 of Non-Responders). Baseline PHQ-9 sum score mean and variance were positively correlated across the subsamples of both Responders (r = 0.47, p < 0.001) and Non-Responders (r = 0.35, p = 0.002): the greater the PHQ-9 sum score mean within a subsample, the higher its PHQ-9 sum score variance (Fig. 3a). We estimated a network for each subsample and found that networks were more connected in Non-Responders (β = −1.35, s.e. = 0.11, p < 0.001) (Fig. 3d). However, network connectivity across these subsamples was positively associated with baseline PHQ-9 sum score mean (Non-Responders, r = 0.23, p = 0.04; Responders, r = 0.20, p = 0.08; Fig. 3b) and PHQ-9 sum score variance (Responders, r = 0.41, p < 0.001; Non-Responders, r = 0.58, p < 0.001; Fig. 3c). In multiple linear regression, group differences in network connectivity survived after controlling for baseline PHQ-9 sum score mean (β = −0.71, s.e. = 0.26, p = 0.007, Fig. 3e), but not after controlling for PHQ-9 sum score variance (β = −0.28, s.e. = 0.19, p = 0.14, Fig. 3f).
Parametric analysis of symptom-level data
The subsampling analysis further revealed between-group differences in symptom strength: the centrality of every symptom was higher in the Non-Responder subsamples (all p < 0.001, Fig. 3g). To contextualise these differences, we compared their effect sizes with those of the mean and variance of individual symptoms and of the aggregate measures from the previous section. Baseline PHQ-9 sum score mean was most strongly associated with response status (β = −1.79, s.e. = 0.07, p < 0.001), with Non-Responders having greater baseline severity. This was followed by 'suicidality' mean (β = −1.74, s.e. = 0.08, p < 0.001) and baseline PHQ-9 sum score variance (β = −1.67, s.e. = 0.09, p < 0.001) (Fig. 3g). Notably, the mean score of every symptom (except 'depressed mood') was more strongly associated with treatment response than its centrality. The strength of 'depressed mood', the most central symptom at baseline in both groups, had the strongest association with treatment response of all symptom strengths, but was still weaker than 7/9 item means.
Network connectivity changes following treatment
Examining changes following treatment, overall network connectivity in the full sample increased from baseline to follow-up (2.97 v. 4.08, S = 1.10, p < 0.001). This increase was evident separately in both the Responder networks (2.70 v. 3.25, S = 0.55, p < 0.001) (online Supplementary eFigs 3a, 3c, 4c; eTable 4) and the Non-Responder networks (3.15 v. 3.52, S = 0.38, p < 0.001) (online Supplementary eFigs 3b, 3d, 4d; eTable 5). At follow-up, Non-Responders continued to have a more connected network than Responders (3.52 v. 3.25, S = 0.27, p < 0.001) (online Supplementary eFigs 3c, 3d, 4b; eTable 6). In the subsampling analysis, we examined network connectivity in both groups pre- and post-treatment. A repeated measures ANOVA revealed a significant main effect of Group, with Non-Responders having more connected networks overall, F(1, 156) = 197.23, p < 0.001. There was also a main effect of Time, with network connectivity increasing from baseline to follow-up, F(1, 156) = 545.45, p < 0.001. Finally, there was a Group by Time interaction, F(1, 156) = 37.44, p < 0.001, driven by greater increases in connectivity among Responders (M = 1.05, s.d. = 0.52), t(78) = −18.14, p.adj < 0.001, than among Non-Responders (M = 0.62, s.d. = 0.37), t(78) = −14.79, p.adj < 0.001. PHQ-9 sum score variance decreased over time for Responders but increased for Non-Responders, both likely a function of the restricted range of values required to qualify for 'response' (online Supplementary eFig. 5). Correlational analyses revealed that changes in network connectivity were not associated with changes in PHQ-9 sum score mean for either Responders (r = 0.09, p = 0.44) or Non-Responders (r = 0.06, p = 0.60). In both cohorts, changes in network connectivity were positively associated with changes in PHQ-9 sum score variance (Responders, r = 0.42, p < 0.001; Non-Responders, r = 0.49, p < 0.001).
Replication and generalisation
To test the robustness of our main findings, we repeated the core analyses for two partially overlapping datasets including (1) patients receiving iCBT for 8–12 weeks (N = 22 952) and (2) where networks were based on anxiety symptoms (N = 70 620). We replicated our results in both sensitivity analyses: at baseline, the full sample Non-Responder network was more connected than the full sample Responder network in both patients undergoing longer treatment (3.08 v. 2.74, S = 0.34, p < 0.001) and in those receiving iCBT for anxiety (2.68 v. 2.42, S = 0.26, p < 0.001). Parametric analyses revealed that, in both cases, connectivity differences between Responders and Non-Responders were no longer significant when sum score variance was accounted for in the model (patients undergoing longer treatment: β = 0.23, s.e. = 0.20, p = 0.25; patients undergoing treatment for anxiety: β = −0.19, s.e. = 0.12, p = 0.13). Baseline sum score mean and sum score variance were also once again more predictive of treatment response than baseline network connectivity in both patients undergoing longer iCBT (mean: β = −1.53, s.e. = 0.14, p < 0.001; variance: β = −1.50, s.e. = 0.14, p < 0.001, connectivity: β = −1.07, s.e. = 0.18, p < 0.001) and those receiving iCBT for anxiety (mean: β = −1.86, s.e. = 0.05, p < 0.001; variance: β = −1.78, s.e. = 0.06, p < 0.001; connectivity: β = −1.67, s.e. = 0.08, p < 0.001). Lastly, correlational analyses examining network changes following treatment confirmed an association between sum score variance changes and network connectivity changes in both patients undergoing longer iCBT (Non-Responders: r = 0.34, p = 0.02; Responders: r = 0.23, p = 0.127) and patients receiving iCBT for anxiety (Non-Responder: r = 0.59, p < 0.001; Responder: r = 0.49, p < 0.001) (see online Supplementary eAppendix 1 and 2 for a detailed report of sensitivity analyses findings).
Discussion
Prior work has suggested that patients with more tightly connected symptom networks are more treatment resistant (Cramer et al., 2016; Pe et al., 2015; van Borkulo et al., 2015). However, existing studies are based on comparisons of single responder v. non-responder cross-sectional networks (Fisher et al., 2018) with relatively small samples (Forbes, Wright, Markon, & Krueger, 2017), and do not account for symptom variance (Terluin et al., 2016). We addressed these gaps in a sample of N = 40 518 that was analysed both as a whole and divided into subsamples, thereby permitting parametric analyses of the role of variance in connectivity estimates, separately from that of severity. In the single network comparison, connectivity was greater for Non-Responders than for Responders at baseline. This effect was small, however, requiring n = 750 per group to detect reliably, and we identified two potential confounds: Non-Responders had greater depression severity and variance at baseline. To disentangle these effects, we created 160 independent networks of Responders and Non-Responders (n = 250 each) and tested across networks whether severity and/or variance explained connectivity differences. While the Non-Responder networks were on average more connected than the Responder networks at baseline, the association between connectivity and treatment response was no longer significant after controlling for sum score variance.
We replicated this result in two partially overlapping generalisation samples, one with patients undergoing a longer duration of iCBT (8–12 weeks) and another based on anxiety, not depression (4–8 weeks).
This paper highlights an important confound that is under-studied in the network literature (Bos & De Jonge, 2014; Terluin et al., 2016): network estimation is based on (partial) symptom correlations, which depend on the variance of these symptoms, not just their covariance. Imbalances in variance may be an inherent clinical characteristic of treatment-resistant groups (Friedman et al., 2012), but they can also easily be introduced when subgrouping patients by treatment response, which restricts the range of item scores (Linn, 1968). That said, it is important to recognise that variance contributes to, but is not the same as, network connectivity. For example, correlations between network connectivity and variance were moderate (r = 0.41–0.58), and recent work examining temporal, intraindividual networks found associations with symptom change over time that survived controlling for variance (Kelley et al., 2022). Moreover, networks actually became more connected following treatment despite reductions in both symptom severity and variance, a counter-intuitive yet consistent finding in the network literature (Beard et al., 2016; Berlim et al., 2021; Blanco et al., 2020; Bos et al., 2018; Curtiss et al., 2021).
One explanation is that increased symptom connectivity is not necessarily bad: in theory, a highly connected network is a more malleable system, which need not imply worsening mental health (Fried et al., 2016; McElroy et al., 2019), as therapeutic gains may arise from systems becoming less 'stuck' and more open to change. Recent work examining personalised network dynamics in healthy individuals supports this: those with more connected depression networks tended to show greater fluctuations in depression over 6 weeks, but these fluctuations went in both positive and negative directions (Kelley et al., 2022).
Consistent with prior work (Hagan et al., 2021; O'Driscoll et al., 2021; Robinaugh et al., 2016), 'depressed mood' had the highest strength centrality in both groups at baseline. However, we found that the severity of all symptoms (except 'worthlessness') was more strongly linked to treatment response than the strength of 'depressed mood', and that the severity of each symptom was more predictive than its corresponding strength centrality. Baseline severity and variance were also both more predictive of treatment success than network connectivity. A similar lack of added prognostic value of network metrics was previously reported by Spiller et al. (2020), who found that both mean symptom severity and symptom count were more predictive of symptom change than centrality indices. Together, these findings question the real-world prognostic utility of cross-sectional network metrics over and above basic self-report symptom data readily available at baseline.
This study has several limitations. First, it was a retrospective, observational study with no control group. Information on patient demographics and concurrent treatment, such as medication status, was also unavailable. It therefore remains unclear whether the observed results generalise to networks estimated in patients undergoing other treatments (e.g. antidepressant medication). Our main study sample was also limited to patients who, on average, scored at the cusp of the cut-off between less and more severe depression at baseline (i.e. 16 on the PHQ-9), and may therefore not be representative of all patients with depression in primary care (NICE, 2022). In addition, while the PHQ-9 is widely used for detecting and monitoring depression symptoms in routine care settings, the instrument is primarily intended for screening depression symptoms against the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) and combines related symptoms into single items (Harrison et al., 2021). Future research should consider gold-standard symptom assessments, such as semi-structured clinical interviews specifically designed and validated for in-depth assessment of individual symptoms of depression (e.g. Wing et al., 1990). As previously noted, networks estimated from cross-sectional data do not always generalise to the individual level (Hamaker, 2012; von Klipstein, Borsboom, & Arntz, 2021), and indeed, differences in baseline sum score mean and variance can be introduced systematically by the binary definition of 'response' required for cross-sectional network analysis.
The crucial next step for network theory is to move towards a dynamical account of psychopathology, afforded by personalised, within-subject networks for each patient undergoing treatment, tracked over time.
Conclusion
In a large sample of > 40 000 patients, we determined that network connectivity differences between iCBT responders and non-responders are small, requiring hundreds of patients to be appropriately powered. We highlighted that symptom variance is an important confound to interpreting cross-sectional network effects and may drive prior findings of increased baseline connectivity in treatment non-responders. The baseline mean and variance of depression sum and symptom scores fared better at predicting response than both overall network connectivity and individual symptom strength centrality.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0033291723001368
Acknowledgements
The authors wish to thank the employees of SilverCloud Health, Amwell for providing administrative and data processing support.
Authors’ contributions
CTL and CMG conceptualised the research; CTL performed data processing and both CTL and CMG performed data analyses. CTL, SWK, and CMG interpreted the results while CTL and CMG wrote the paper. CTL, SWK, JP, DR, and CMG advised on, edited, and produced the final manuscript.
Financial support
This work was supported by the PhD studentship of CTL funded by the Irish Research Council's Enterprise Partnership Scheme, in conjunction with SilverCloud Health, Amwell (Grant number: EPSPG/2020/8).
Competing interests
This study forms part of the PhD studentship of CTL, which is co-funded by SilverCloud Health, Amwell (the provider of the iCBT intervention in this study) and the Irish Research Council. DR is a current employee of, and holds shares in, SilverCloud Health, Amwell. JP is a former employee of SilverCloud Health, Amwell. CMG has no direct competing interests but is the primary supervisor of CTL and SWK. SWK has no direct competing interests.
Ethical standards
The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008.