Introduction
Depression is a leading cause of disability worldwide (GBD 2019 Diseases & Injuries Collaborators, 2020) and is associated with great psychological, physical, and economic burden (Herrman et al., in press). The primary first-line treatments for depression are psychotherapy, antidepressant medication (ADM), and their combination (Qaseem, Barry, & Kansagara, 2016). Most patients initiating depression treatment prefer psychotherapy (Gelhorn, Sexton, & Classi, 2011; Leung et al., 2021; van Schaik et al., 2004), but psychotherapy is substantially more expensive and time-consuming than ADM (Koeser, Donisi, Goldberg, & McCrone, 2015; Ross, Vijan, Miller, Valenstein, & Zivin, 2019) and is also less accessible, because primary care physicians can prescribe ADMs whereas psychotherapy requires access to a mental health specialist.
Fewer than half of patients respond to psychotherapy alone (Blais et al., 2013; Cuijpers et al., 2021). As randomized trials show that psychotherapy and ADM have comparable aggregate effects when used alone and that combined treatment has better aggregate effects than either alone (Cuijpers et al., 2013; Kappelmann et al., 2020), the typical response to nonresponse to psychotherapy would be either augmentation with an ADM or switching to an ADM. However, as depression treatment selection typically works through trial and error, patients who begin with psychotherapy often spend weeks or months trying this treatment before determining that it is not working (Blais et al., 2013), at which time they can either augment or switch to ADM if they have not already dropped out of treatment. A strategy to predict patients' likelihood of responding to psychotherapy before the beginning of treatment could help avoid these delays. Such a strategy, if it could be developed, might help reduce treatment dropout and facilitate more rapid receipt of helpful treatments for patients who are unlikely to respond to psychotherapy.
Previous research has documented diverse baseline risk factors that consistently predict psychotherapy response, such as depression severity, subtypes, and prior history; psychiatric comorbidity; and stressful life experiences (e.g. Bone et al., 2021; Coley, Boggs, Beck, & Simon, 2021; Serbanescu et al., 2020). However, none of these associations is strong enough to be used as a primary basis for treatment planning. For this reason, researchers have examined whether multivariable models that combine information across a range of individually significant predictors can improve the prediction of psychotherapy treatment response (DeRubeis et al., 2014; Huibers et al., 2015; Saunders et al., 2021). However, such models risk overfitting (van Klaveren, Balan, Steyerberg, & Kent, 2019). Machine learning (ML) methods can help protect against overfitting (Roelofs et al., 2019). Although some ML studies of psychotherapy treatment response have been carried out (Coley et al., 2021; Pearson, Pisner, Meyer, Shumake, & Beevers, 2019; Tymofiyeva et al., 2019), most were based on secondary analyses of randomized clinical trials.
The latter studies typically had limited predictor sets and reduced external validity because patients unwilling to be randomized or with psychiatric comorbidities were excluded. Observational samples resolve these problems, but the few studies that tried to predict depression psychotherapy treatment response in observational samples either had limited predictor sets (Bone et al., 2021; Delgadillo & Gonzalez Salas Duhne, 2020; Tymofiyeva et al., 2019), used predictors that would not be possible to collect in a routine clinical visit (Tymofiyeva et al., 2019), or based predictions only on administrative data (Coley et al., 2021).
The current report presents the results of a study designed to address the above limitations by collecting information on a rich baseline set of potential predictors from a self-report assessment and administrative data in a prospective observational sample of Veterans Health Administration (VHA) patients initiating treatment for major depressive disorder (MDD). Patients were followed for 3 months to assess treatment response. The VHA provides a unique opportunity to study MDD treatment response because it is the largest national US healthcare delivery system integrating mental health services into primary care (Leung et al., 2019). We focus here on baseline variables known or hypothesized to predict psychotherapy treatment response. We aimed to develop a parsimonious model with a small number of predictors that could feasibly be administered in routine clinical practice.
Methods
Sample
We recruited eligible VHA patients from weekly nationally representative probability samples between December 2018 and June 2020. As we aimed to focus on incident treatment encounters, we excluded patients who, in the 12 months before the focal visit, received any MDD treatment or attempted suicide. Outpatient settings included primary care and specialty mental health clinics. To be eligible, patients had to receive either a prescription for an ADM or a referral to psychotherapy in the focal visit. Focal visits were not counted as eligible if the record noted that the patient was depressed but that watchful waiting was being used rather than treatment. The present report considers only the subset of eligible patients who were referred to psychotherapy but did not receive an ADM prescription in the focal visit. We additionally excluded patients who had any lifetime diagnosis of bipolar disorder, nonaffective psychosis, dementia, intellectual disabilities, autism, Tourette's disorder, stereotyped movement disorders, or borderline intellectual functioning, or who had ever received a prescription for either antimanic or antipsychotic medication (see Online Supplementary Table S1 for ICD-9-CM and ICD-10-CM codes).
As described in more detail elsewhere (Puac-Polanco et al., 2021) and shown in Online Supplementary Fig. S1, recruitment letters were mailed to 55 106 eligible patients inviting them to participate in a study of depression treatment that would require completing one self-report web- or phone-based survey at baseline (taking approximately 45 min) and another self-report survey at 3-month follow-up (taking approximately 20 min). Patients received up to three recruitment calls over the next week. A total of 17 000 patients were reached within this period, 6298 of whom agreed to participate and 4164 of whom completed the baseline survey. Of these patients, 1554 were excluded after completing the baseline survey because they either reported being actively suicidal, did not report depression as a presenting problem, reported mania as a presenting problem, or did not report depression severity equal to at least 6 on the Quick Inventory of Depressive Symptomatology Self-Report (QIDS-SR; Rush et al., 2003). As reported previously (Puac-Polanco et al., 2021), patients who completed the baseline questionnaire were, on average, slightly older than non-respondents and somewhat more likely to be female, non-Hispanic White, and currently married (with odds ratios ranging between 1.2 and 1.7), but multivariate associations with participation were weak [area under the receiver operating characteristic curve (AUC) = 0.59].
Among the remaining 2610 baseline respondents, 989 received psychotherapy without ADM and 807 of the latter completed the 3-month follow-up survey. These are the patients included in the present report. Patients were compensated $50 and $25 for completing the baseline and 3-month surveys, respectively. The Institutional Review Board of Syracuse VA Medical Center, Syracuse, New York, approved these procedures. The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008. We followed the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) reporting guidelines (Collins, Reitsma, Altman, & Moons, 2015) for reporting analyses designed to develop predictive models.
Measures
Treatment response
Self-reports of depressive symptom severity and role impairment were assessed at baseline and 3 months. Depressive symptoms were assessed with the 16-item QIDS-SR (Rush et al., 2003) (Cronbach's α = 0.675), which asks about symptom severity in the past 2 weeks using a 0–3 response scale with embedded labels for each item (e.g. ranging from not feeling sad to feeling sad nearly all the time for depressed mood). Role impairment due to depression was assessed with a modified version of the Sheehan Disability Scale (Leon, Olfson, Portera, Farber, & Sheehan, 1997), in which patients rated how much their depression interfered with their ability to work, participate in family and home life, and participate in social activities in the past 2 weeks on a labeled 0–10 visual analog scale of not at all (0), mildly (1–3), moderately (4–6), markedly (7–9), and extremely (10) (Cronbach's α = 0.847).
Patients were classified as responding to treatment if they met any of the following criteria in their 3-month follow-up assessments: (1) a QIDS-SR score of 0–5 [indicating ‘remission’ of depressive symptoms (Rush et al., 2006)], (2) a QIDS-SR score that was half or less of its baseline value, or (3) a baseline score of 4 or more (i.e. moderate-severe) in one or more role impairment domains and a 3-month score of 0–3 (i.e. none-mild) on all role impairment domains.
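As an illustration only, the three response criteria could be encoded as follows. This is a minimal Python sketch, not the study's code (the analyses were run in SAS and R); the function and argument names are hypothetical, and the role impairment ratings are assumed to be passed as three 0–10 scores (work, family and home life, social activities).

```python
from typing import Sequence


def is_responder(
    qids_baseline: int,
    qids_followup: int,
    impairment_baseline: Sequence[int],
    impairment_followup: Sequence[int],
) -> bool:
    """Return True if any of the three response criteria is met (hypothetical helper)."""
    # Criterion 1: remission, i.e. a 3-month QIDS-SR score of 0-5.
    remission = 0 <= qids_followup <= 5
    # Criterion 2: 3-month QIDS-SR score is half or less of the baseline score.
    halved = qids_followup <= 0.5 * qids_baseline
    # Criterion 3: at least one domain rated 4+ at baseline and
    # every domain rated 0-3 at 3 months.
    role_improved = any(s >= 4 for s in impairment_baseline) and all(
        s <= 3 for s in impairment_followup
    )
    return remission or halved or role_improved


# Example: symptoms barely change, but role impairment drops to none-mild.
print(is_responder(12, 10, [6, 2, 1], [3, 2, 0]))
```

Because the criteria are combined with a logical OR, a patient can be counted as a responder on role functioning alone even when the symptom score changes little.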
Predictors
A recent review by Maj et al. (2020) recommended considering 14 risk factor domains to personalize depression treatment: symptom profile, clinical subtypes, severity, clinical staging, early environmental exposures, recent environmental stressors, family history, functioning and quality of life, physical comorbidities, personality, antecedent and concomitant psychiatric conditions, protective factors/resilience, neurocognition, and dysfunctional cognitive schemas. We included predictors from each of these 14 domains as well as from two additional domains that have been shown in previous research to predict depression treatment response: socio-demographics and treatment characteristics, the latter including information about prior treatment history as well as current expectations and preferences. In total, 2810 baseline predictors were included in our analysis, derived from the baseline survey, administrative records, and geospatial data based on patient residence (Online Supplementary Tables S2–S4). These 2810 predictors included transformations of the same variables, as described in more detail in Online Supplementary Tables S2–S4. Categorical variables were indicator-coded with dummy variables. Quantitative variables were standardized to a mean of 0 and variance of 1 for use in linear algorithms (described below) and discretized into ventiles for use in tree-based algorithms (described below) to avoid overfitting and to reduce computation time.
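The two transformations of quantitative variables can be sketched as follows. This is a hedged illustration in Python using only the standard library; the helper names are hypothetical, and the use of the population standard deviation and of the "inclusive" quantile method are assumptions, since the text does not specify these details.

```python
import statistics
from bisect import bisect_right


def standardize(values):
    """Rescale values to mean 0 and variance 1 (population variance assumed)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def ventile_codes(values):
    """Assign each value to one of 20 ordered bins using 19 quantile cut-points."""
    cuts = statistics.quantiles(values, n=20, method="inclusive")
    # bisect_right counts how many cut-points are <= v, giving codes 1..20.
    return [bisect_right(cuts, v) + 1 for v in values]


scores = list(range(1, 101))  # toy quantitative predictor
z = standardize(scores)
codes = ventile_codes(scores)
print(min(codes), max(codes))
```

Discretizing into 20 bins limits the number of candidate split points a tree has to evaluate for each variable, which is one way such a step can reduce computation time.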
Analysis methods
Analysis was limited to patients who completed both the baseline and 3-month follow-up self-report surveys. As detailed in a previous report (Puac-Polanco et al., 2021), we used the R program sbw (Zubizarreta, Li, Allouah, & Greifer, 2021) to account for potential selection bias from nonresponse by creating stable balancing weights to adjust for significant differences between baseline respondents and the full target sample on significant predictors of non-response (Zubizarreta, 2015). This procedure was then repeated in the weighted follow-up sample, applying a second sbw to adjust for discrepancies in baseline survey predictors between respondents in the follow-up sample and those lost to follow-up. These doubly weighted data were used in the analysis.
An ML model was developed based on the doubly weighted data to predict psychotherapy response. Rather than use a single algorithm, as has been done in previous ML studies of MDD treatment response, we used the Super Learner (SL) stacked generalization method (Polley, LeDell, Kennedy, Lendle, & van der Laan, 2021) to pool results across a library of multiple algorithms. This was done using a weight for each algorithm derived from a training sample (described below) via 10-fold cross-validation. The composite predicted outcome score based on this weight is guaranteed in expectation to perform at least as well as the best component algorithm in the library in terms of a prespecified criterion (Polley, Rose, & van der Laan, 2011), which in our case was non-negative least squares. Consistent with recommendations (LeDell, van der Laan, & Petersen, 2016; Naimi & Balzer, 2018), we included a diverse set of algorithms in the library to capture nonlinearities and interactions and reduce the risk of model misspecification (Kabir & Ludwig, 2019). These included linear algorithms (logistic regression, regularized regression, spline and polynomial spline regressions, support vector machines) and tree-based algorithms (boosting and bagging ensemble trees and Bayesian additive regression trees) (Online Supplementary Table S5). Similar stacking procedures have been used in prior computational psychiatric studies (Karrer et al., 2019; Ziobrowski et al., 2021a).
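The weighting step of stacked generalization can be illustrated with a small sketch. It assumes that each algorithm's cross-validated predictions are already available, estimates non-negative weights by coordinate descent on the squared error, and rescales the weights to sum to 1. The helper names, the toy data, and the choice of coordinate descent are assumptions made for illustration; this is not the SuperLearner package's implementation.

```python
def nnls_weights(cv_preds, y, n_iter=200):
    """Non-negative least-squares weights for a list of learners.

    cv_preds: one list of cross-validated predictions per learner.
    y: observed outcomes (0/1), same length as each prediction list.
    """
    k = len(cv_preds)
    w = [0.0] * k
    for _ in range(n_iter):
        for j in range(k):
            # Residual of y after removing every learner except j.
            resid = [
                yi - sum(w[m] * cv_preds[m][i] for m in range(k) if m != j)
                for i, yi in enumerate(y)
            ]
            num = sum(z * r for z, r in zip(cv_preds[j], resid))
            den = sum(z * z for z in cv_preds[j])
            # Closed-form update for weight j, clipped at zero.
            w[j] = max(0.0, num / den) if den > 0 else 0.0
    total = sum(w)
    # Rescale to sum to 1; fall back to equal weights if all are zero.
    return [wi / total for wi in w] if total > 0 else [1.0 / k] * k


def stacked_prediction(weights, learner_preds):
    """Weighted average of the learners' predictions for one patient."""
    return sum(w * p for w, p in zip(weights, learner_preds))


# Toy data: one informative learner and one uninformative learner.
y = [0, 1, 0, 1, 1, 0]
informative = [0.1, 0.9, 0.2, 0.8, 0.7, 0.3]
uninformative = [0.5] * 6
weights = nnls_weights([informative, uninformative], y)
print(weights)
```

In this toy example the uninformative learner contributes nothing once the informative learner is in the ensemble, so the non-negativity constraint drives its weight to zero.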
Hyperparameters were tuned by including individual algorithms multiple times in the library with different hyperparameter values. This tuning method allowed SL to weight relative importance across this range rather than using an external grid search or random search method. Feature selection was independently conducted in each 10-fold cross-validation training sample. To increase the feasibility of implementation in clinical practice and to reduce overfitting, we explored two feature reduction methods: least absolute shrinkage and selection operator penalized regression (lasso; Park & Casella, 2008) and variable importance ranking from Bayesian additive regression trees (BART; Chipman, George, & McCulloch, 2010). We then compared the predictive accuracy of the SL with a simpler lasso penalized regression model to see how much, if at all, SL improved prediction.
Models were estimated in a training sample that included 70% of patients selected with stratification to have the same distribution on a wide range of predictors and the outcome as the total sample. The remaining 30% of patients were used to evaluate model accuracy. We used a locally estimated scatterplot smoothed calibration curve (Austin & Steyerberg, 2014) to quantify calibration of predicted outcome probabilities from our best model in the test sample using the integrated calibration index (ICI) and expected calibration error (ECE) (Austin & Steyerberg, 2019; Naeini, Cooper, & Hauskrecht, 2015).
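To make the calibration metrics concrete, the following sketch computes a binned version of the ECE: the average absolute gap between mean predicted probability and observed response rate, weighted by the share of patients in each bin. The function name, the 10 equal-width bins, and the absence of survey weights are assumptions for illustration; the ICI, which averages the same kind of gap along a smoothed calibration curve, is not shown.

```python
def expected_calibration_error(probs, outcomes, n_bins=10):
    """Binned expected calibration error for predicted probabilities in [0, 1]."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # Equal-width bins; a probability of exactly 1.0 goes in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_pred = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        ece += (len(members) / n) * abs(mean_pred - observed)
    return ece


# Toy example: predictions of 0.9 where only half of the patients respond.
print(expected_calibration_error([0.9] * 4, [1, 1, 0, 0]))
```

A value near 0 means that, within each bin, predicted probabilities are close to the observed response rates.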
Model evaluation in the 30% test sample was carried out by examining the association between the predicted probability of treatment response and observed response across a range of cut-points derived from the training sample distribution. We evaluated model fairness, defined as whether model performance was comparable across important segments of the population (Yuan, Kumar, Ahmad, & Teredesai, 2021), by examining variation in the association of predicted probability of response with observed response across socio-demographic subgroups (age, sex, race/ethnicity, and education) using robust Poisson regression models (Zou, 2004). Lastly, we assessed predictor importance by examining standardized model coefficients from predictors selected by the final ML model.
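The cut-point evaluation can be sketched as follows, using tertiles as the example: cut-points are taken from the training sample's predicted probabilities and then applied unchanged to the test sample. The helper names and toy values are hypothetical, the quantile method is an assumption, and the sketch ignores the survey weights used in the study.

```python
import statistics
from bisect import bisect_right


def tertile_cutpoints(train_probs):
    """Two cut-points that split the training predictions into tertiles."""
    return statistics.quantiles(train_probs, n=3, method="inclusive")


def response_rate_by_tertile(cuts, test_probs, test_outcomes):
    """Observed response rate in each tertile, ordered lowest to highest."""
    counts = [0, 0, 0]
    responders = [0, 0, 0]
    for p, y in zip(test_probs, test_outcomes):
        tertile = bisect_right(cuts, p)  # 0 = lowest, 2 = highest
        counts[tertile] += 1
        responders[tertile] += y
    return [r / c if c else None for r, c in zip(responders, counts)]


cuts = tertile_cutpoints([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7])
rates = response_rate_by_tertile(
    cuts, [0.05, 0.15, 0.35, 0.45, 0.8, 0.9], [0, 0, 0, 1, 1, 1]
)
print(rates)
```

Deriving the cut-points from the training sample keeps the test sample untouched until evaluation, so the tertile-specific response rates are an out-of-sample check.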
Data were managed and the outcome prevalence and AUC were calculated using SAS statistical software, version 9.4 (SAS Institute Inc, 2013). ML models were estimated in R, version 4.0.5 (R Core Team, 2021).
Results
Sample characteristics and treatment response
The mean QIDS-SR score of depression symptom severity at baseline was 12.9 in the total weighted sample. When we transformed QIDS-SR scores into Hamilton Depression Rating Scale criteria, 32.3% of patients met criteria for mild depression, 33.9% for moderate depression, 18.6% for severe depression, and 15.2% for very severe depression. Most patients were male, non-Hispanic White, married, and living in a major metro area (Table 1). There were no statistically significant differences in baseline socio-demographics or depression severity between patients who completed the baseline and 3-month surveys v. those who completed the baseline but not the 3-month survey. In the total weighted sample, 32.0% [standard error (s.e.) = 2.0] of patients responded to psychotherapy after 3 months of treatment. For our three criteria of responding to treatment, 7.9% of patients met the criteria for remission, 14.1% of patients had a QIDS-SR score at 3 months that was half or less of their baseline score, and 23.5% of patients showed improvement in role functioning.
Notes to Table 1. DF, degrees of freedom; s.e., standard error.
a Patients who received psychotherapy and responded to the baseline survey.
b Patients who received psychotherapy and responded to both the baseline and 3-month surveys.
c Patients who received psychotherapy and responded to the baseline but not the 3-month survey.
d None of the χ2 tests is significant at the 0.05 level, two-sided test. p = 0.15–0.47.
Model performance
The AUC (s.e.) of the SL ensemble model in the test sample was 0.648 (0.039). However, the simpler lasso model had slightly better performance, with AUC (s.e.) of 0.652 (0.038). We consequently focused on the lasso model. This model had good calibration in the test sample [mean (s.e.) ICI, 0.056 (0.005); mean (s.e.) ECE, 0.054 (0.004)] (Fig. 1) as well as comparable prediction accuracy in terms of fairness across subgroups defined by age, sex, race/ethnicity, and education (Online Supplementary Table S6).
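For readers less familiar with the AUC, it can be read as the probability that a randomly chosen responder receives a higher predicted probability than a randomly chosen non-responder, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name and toy values are hypothetical, and it neither applies the survey weights nor estimates the standard error reported above.

```python
def auc(probs, outcomes):
    """Area under the ROC curve via pairwise comparison of predictions."""
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    # Each responder/non-responder pair scores 1 if ranked correctly, 0.5 if tied.
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))


# Two of the four responder/non-responder pairs are ranked correctly.
print(auc([0.9, 0.2, 0.8, 0.3], [1, 1, 0, 0]))
```

On this scale 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of responders from non-responders.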
Fifty percent of patients in the top tertile of predicted probability of treatment response did, in fact, respond to treatment (Table 2). In comparison, 23.5 and 21.1%, respectively, of patients in the second and third tertiles responded to treatment.
Notes to Table 2. CI, confidence interval; lasso, least absolute shrinkage and selection operator.
a Defined by tertiles of predicted probability of treatment response in the training sample of 566 patients.
Predictor importance
A total of 43 predictors were selected by the lasso model. Figure 2 displays these predictors, sorted from the strongest (defined by associations of predictors standardized to have a mean of 0 and variance of 1.0 with logits of the dichotomous outcome) at the top to the weakest at the bottom. Positive associations indicate higher likelihoods of treatment response, whereas negative associations indicate lower likelihoods of treatment response. The great majority of predictors (n = 39) were based on patient self-reports rather than administrative or geospatial data. Predictors came mainly from the risk factor domains of antecedent and concomitant psychiatric conditions (n = 7), clinical staging (n = 6), treatment characteristics (n = 6), protective factors/resilience (n = 5), and socio-demographics (n = 5). The top 5 predictors were: having a greater cognitive reappraisal score, being aged 74+, having a longer drive time to a VHA facility, having greater concerns about ADMs, and being in the current depressive episode for 3+ months before seeking treatment. The first three of these predictors were associated with increased likelihoods of treatment response, whereas the latter two were associated with decreased likelihoods of treatment response.
Discussion
Our finding that fewer than one-third of patients with MDD responded to psychotherapy after 3 months is lower than response rates observed in other observational studies (Blais et al., 2013) and randomized clinical trials (Cuijpers et al., 2021) from non-Veteran samples, where 40–50% of patients responded to psychotherapy. However, response rates for MDD treatment have been found to be similarly low for Veterans in other studies (Katz, Liebmann, Resnick, & Hoff, 2021), possibly due to the particularly high burden of psychiatric comorbidities and impairment in this population (Ziobrowski et al., 2021b). This low treatment response rate highlights the need to develop clinical tools that can support patients in treatment decision-making. To this end, our model is of potential value in finding that the one-third of patients with the highest predicted probability of treatment response had an observed probability of response (50%) more than twice that of patients in the lower tertiles (23.5% and 21.1%).
Although this level of discrimination is both statistically and substantively significant, it is not strong enough to be the primary arbiter of treatment selection. Nor does the model provide information on which other treatments would be optimal for a given patient (i.e. ADM-alone, combined ADM and psychotherapy, some other therapy). Our model could be useful, though, in the context of a broader shared decision-making conversation that informs patients and providers about a patient's likelihood of responding to psychotherapy. Such a tool could help guide patients with a low likelihood of response toward considering alternative treatment options, thus averting the costs and morbidity of ineffective psychotherapy monotherapy. Conversely, patients with a high likelihood of response could be reassured about deferring ADMs, thus limiting their potential for somatic side effects. This approach would be similar in concept to pharmacogenomic testing, in which patients' genetic information is used to pre-emptively identify specific ADMs that are more v. less likely to cause side effects or be effective (Greden et al., 2019). Notably, our model performs as well as or better than pharmacogenomic testing in terms of its predictive power, as indicated by the fact that in the largest trial to date of pharmacogenomic testing for ADM selection, patients receiving test-congruent v. test-incongruent medications had 29% v. 17% treatment response rates (Greden et al., 2019).
Caution is needed in interpreting the results reported above about predictor importance because these results do not reflect causal relationships and can be unstable when, as in our dataset, many of the predictors in the full set used to select the final lasso predictors are highly inter-related (Leeuwenberg et al., 2022). Nonetheless, several results are noteworthy. First, nearly all top predictors were self-report measures, suggesting that patient self-report data may be more useful for predicting psychotherapy treatment response than administrative or geospatial data. Moreover, these self-report variables would be feasible to collect in a primary care visit. Second, some of the top predictors [e.g. recency of traumatic brain injury (TBI) and age at baseline] may be specific to Veterans who served in Iraq and Afghanistan. Third, several variables about socio-demographics and treatment characteristics were among the most important predictors. This is noteworthy because socio-demographics and treatment characteristics were not among the categories included by Maj et al. (2020) as salient risk factor domains that should be considered in efforts to personalize depression treatment. Fourth, while most of the selected variables are not modifiable (e.g. age, lifetime histories of mental disorders, personality characteristics), several are potentially modifiable in psychotherapy treatment, such as variables related to emotion regulation. Fifth, the model did not select any variables related to depression severity, clinical subtypes, or family history of depression, and selected only one variable related to depression symptoms (late insomnia).
Although treatment decisions may be based on these factors in practice, these results suggest that these variables are not the most predictive of psychotherapy response among depressed patients in the VHA system.
The study has several strengths, including the large sample size, the rich and diverse set of predictors from self-reports, administrative records, and geospatial data, and rigorous ML methods used to develop the model and help reduce potential overfitting. However, there are also several limitations to note. First, the baseline survey response rate was low, although similar to rates reported in other studies examining mental health outcomes among VHA patients (King, Beehler, Buchholz, Johnson, & Wray, 2019; Stolzmann et al., 2019). We previously reported (Puac-Polanco et al., 2021) that there were minimal differences between responders and non-responders with regard to baseline administrative variables and we found here equally modest baseline self-report differences between baseline respondents who were followed and those lost to follow-up, both of which were adjusted for in the weighted analyses. However, we had no way to determine if response bias exists with respect to unmeasured variables. Second, our outcome measures were based on brief validated self-report scales rather than clinical interviews. Third, patients included those who had mild baseline QIDS-SR scores, whereas many other studies require baseline scores of at least moderate severity. It is noteworthy, though, that baseline symptom severity was not among the important predictors, which suggests that this broad definition of sample eligibility might not have influenced results. Fourth, psychotherapy response was assessed only up through 3 months of treatment. It is possible that some patients improved later and that some defined as responding at 3 months had recurrences of more severe baseline symptoms shortly thereafter. Fifth, it is unclear whether our findings are generalizable to non-VHA patients.
Sixth, we did not account for possible disruptions in care due to the COVID-19 pandemic, but 92.4% of study patients completed assessments before March 2020. Seventh, with more patients receiving telehealth care since the start of the COVID-19 pandemic, the important predictors observed in this analysis may have since changed. Lastly, our predictive model only provides information on patients' likelihood of responding to psychotherapy in the absence of ADMs. The model cannot tell us which alternative treatments would be optimal for a given patient nor the magnitude of benefit a patient would be expected to attain by receiving an alternative treatment.
Conclusions
We found that a parsimonious model to predict psychotherapy treatment response for depression can be developed using a battery of self-report questions along with some administrative variables in electronic health records and geospatial variables. This model could be used to inform depressed patients pre-emptively about their likelihood of responding to psychotherapy as part of a patient-centered treatment decision-making process. Our findings should be replicated before such a model is implemented in practice. More elaborate models are also needed to compare predicted probabilities of treatment response at the patient level across different types of treatment to determine the best treatment option for particular patients (Kessler & Luedtke, 2021).
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0033291722000228.
Financial support
This research was supported by the Office of Mental Health Services and Suicide Prevention and Center of Excellence for Suicide Prevention (Bossarte), the National Institute of Mental Health of the National Institutes of Health (R01MH121478, Kessler), the United States Department of Veterans Affairs Health Services Research & Development Service Career Development Award (IK2 HX002867, Leung), the PCORI Project Program Award (ME-2019C1–16172, Zubizarreta), and the Advanced Fellowship from the VISN 4 Mental Illness Research, Education, & Clinical Center (MIRECC, Cui). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health, the U.S. Department of Veterans Affairs, or the United States Government.
Conflicts of interest
In the past 3 years, Dr Kessler was a consultant for Datastat, Inc., Holmusk, RallyPoint Networks, Inc., and Sage Therapeutics. He has stock options in Mirah, PYM, and Roga Sciences. Dr Pigeon consulted for CurAegis Technologies and received clinical trial support from Pfizer, Inc. and Abbvie, Inc. Dr Zubizarreta consulted for Johnson & Johnson Real-World Data Analytics. The remaining authors report no conflict of interest.
Ethical standards
The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008.