Introduction
Antidepressant medications (ADMs) are widely prescribed and maintenance of ADM treatment is increasing. In the USA, 12% of the population 12 years and older use ADMs, and the percentage using ADMs ⩾24 months more than doubled from 3% to 7% between 1999 and 2010 (Mojtabai & Olfson, 2014; Pratt, Brody, & Gu, 2017). Similar trends have been observed in the UK (Lockhart & Guthrie, 2011) and the Netherlands (Wildeboer, van der Hoek, & Verhaak, 2016). Given this utilization, the hundreds of double-blind placebo-controlled ADM trials (Cipriani et al., 2018), and the prominent recommendation of ADMs in clinical guidelines, one would expect that ADMs are universally considered effective and worth their side effects. Surprisingly, that is not the case: fierce debate on their effectiveness continues in the scientific literature, media, and Internet (Moncrieff, 2018a). Here we explore the positions of the advocates and critics and examine why the debate continues and how it is best resolved. In addition, we examine the efficacy of evidence-based psychological treatments and their vulnerability to criticism.
Advocates point to the significant effect size of acute phase ADM treatment (aADM) of depression [number needed to treat (NNT) = 6–8], the ability of continuation and maintenance ADM treatment (cmADM) to prevent relapse/recurrence in those who remit on aADM (NNT = 3–4), and the reduced risk of suicide (attempts) associated with ADM treatment (Bridge, Barbe, Birmaher, Kolko, & Brent, 2005; Cooney et al., 2013; Geddes & Cipriani, 2015; Leucht, Hierl, Kissling, Dold, & Davis, 2012; Nutt, Goodwin, Bhugra, Fazel, & Lawrie, 2014; Quitkin, Rabkin, Gerald, Davis, & Klein, 2000). Advocates acknowledge that response differs between individuals, that many patients do not respond to ADMs, and that side effects and dropouts are substantial. However, they argue that most patients on cmADM choose to stay on ADMs, suggesting that the benefits prevail, and that good clinical management can keep dropouts and adverse side effects low.
Critics doubt whether antidepressants have clinically significant surplus value over and above placebo (PLA) (Antonuccio, Danton, DeNelsky, Greenberg, & Gordon, 1999; Fava, 2003; Gotzsche, 2013; Kirsch, 2014; Kirsch et al., 2008; Moncrieff, 2002, 2018b; Moncrieff & Kirsch, 2015; Turner, Matthews, Linardatos, Tell, & Rosenthal, 2008; Whitaker, 2015). The standardized mean difference (SMD) observed in recent meta-analyses is ~0.3 (Cipriani et al., 2018). On average, this translates to 2–3 points more on the Hamilton Depression Rating Scale (HAMD) than the reduction seen with placebo, a group difference that critics have argued is clinically insignificant (Moncrieff & Kirsch, 2015). Critics argue that trials (and thus meta-analyses) overestimate effectiveness due to a variety of biases, owing to unblinding, high dropout rates, selective patient recruitment, publication bias, etc. (Turner et al., 2008). They point to the heavy marketing and manipulation of research data by the pharmaceutical industry and the many conflicts of interest of ADM advocates (Gotzsche, 2013; Whitaker, 2015).
Unfortunately, both advocates and critics selectively cite studies in support of their positions but hardly address and analyze the evidence cited by the other side, with some exceptions (Quitkin et al., 2000). A nice illustration is the set of citations on efficacy and suicide risk offered by Nutt et al. (2014) and Gotzsche (2014) in their 2014 letters in the Lancet on benefits and harms of ADMs. Such discussions remain sterile and do not advance knowledge. They benefit neither patients nor providers.
Methods
Our editorial is not a systematic review or meta-analysis but rather a logical conceptual and methodological analysis. We address two important issues: why have antidepressants become so widely prescribed and how valuable are antidepressants in the treatment of depression. Both have multiple facets. It is virtually impossible (and largely irrelevant to what we set out to do) to address each facet in a full systematic way or by meta-analysis. A fully systematic approach would have required many literature searches, and would probably result in multiple papers. What we chose to do was to think rather than to count. We think that there is value in a logical analysis. While we have done our best to use systematic reviews and meta-analyses to inform our thinking as much as possible, what we set out to do was to describe where we thought the field currently is and where it ought to go. We have tried to be balanced in our selection of literature and to describe studies systematically.
Factors that keep fueling the dispute
Acute phase ADM trials: efficacy and limitations
Response rates (symptom reduction of 50% or more from baseline) in aADM arms exceed those in PLA arms, typically by 10–15 percentage points. The largest recent meta-analysis of aADM efficacy by Cipriani et al. included 522 trials, published and unpublished, comprising 116 477 participants and found that ADs had better response rates than placebo, with ORs ranging between 1.37 and 2.13 and an averaged SMD = 0.30, similar to NNT = 8 (Meadows et al., 2019), comparable with other recent meta-analyses (Cipriani et al., 2018). Although treatment settings were often unclear, most trials did not take place in primary care settings.
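The correspondence between SMD = 0.30 and an NNT of about 8 can be illustrated with a short calculation. The sketch below uses one common conversion, which assumes normally distributed outcomes with equal variance in both arms and takes the control-arm response rate as an input. It is not necessarily the method used in the cited work, and the function name and the control response rates of 35% and 40% are illustrative assumptions.

```python
from statistics import NormalDist


def nnt_from_smd(smd: float, control_response_rate: float) -> float:
    """Convert an SMD to an NNT under a normal, equal-variance assumption.

    Hypothetical helper for illustration; the control response rate is an
    assumption supplied by the user, not a value taken from a specific trial.
    """
    std_normal = NormalDist()
    # Response threshold on the latent scale implied by the control response rate
    z_control = std_normal.inv_cdf(control_response_rate)
    # Shifting the outcome distribution by the SMD gives the treated response rate
    treated_response_rate = std_normal.cdf(z_control + smd)
    # NNT is the reciprocal of the absolute difference in response rates
    return 1.0 / (treated_response_rate - control_response_rate)


# Assumed control-arm (placebo) response rates of 35% and 40%
for rate in (0.35, 0.40):
    print(f"control response {rate:.0%}: NNT ~ {nnt_from_smd(0.30, rate):.1f}")
```

Under these assumptions the NNT falls between 8 and 9. The result depends on the assumed placebo response rate, which is one reason why the same SMD can be reported with somewhat different NNTs.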
Cipriani et al.'s meta-analysis may even underestimate antidepressant efficacy as post-hoc analyses have found consistent superiority and dose-dependency for selective serotonin reuptake inhibitors (SSRIs) relative to PLA when using depressed mood as the major outcome (Hieronymus, Emilsson, Nilsson, & Eriksson, 2016a; Hieronymus, Nilsson, & Eriksson, 2016b) or the six prototypical depression items from the HAMD [depressed mood, guilt feeling, loss of interests, psychomotor retardation, psychic anxiety, and tiredness (Bech, 2010)]. The dose-dependency figures suggest that inclusion of trials using suboptimal doses in Cipriani et al.'s meta-analysis may have led to an underestimation of aADM efficacy. Likewise, the use of more narrow depression symptoms might have yielded higher efficacies, although caution is needed given the post-hoc nature of these findings. Because it is not fully clear whether findings from specialty settings can be generalized to primary care as only a minority of cases are referred to specialty settings, Linde et al. meta-analyzed 66 randomized controlled trials (RCTs) with 15 161 patients treated in primary care (Linde et al., 2015). ADs were found to be significantly superior to PLA, with estimated ORs between 1.69 and 2.03, rather similar to the Cipriani results. There were no significant differences between drug classes. Compared to general medicine drugs (e.g. ACE inhibitors and statins for prevention of cardiovascular events and stroke) ADMs are not in general less efficacious (Leucht et al., 2012), although the clinical significance of a given effect size is context-dependent and may differ between psychiatry and general medicine.
Such data should settle the debate, at least for the acute phase treatment. However, a few issues still produce uncertainties. First, the authors caution that the ‘certainty of evidence was moderate to very low’ with ‘46 (9%) of 522 trials rated as high risk of bias, 380 (73%) as moderate, and 96 (18%) as low’ (p. 1), leaving space for speculation about bias. Second, the analysis did not address full remission efficacy, only response, i.e. 50% or more improvement relative to baseline. That is, patients were better but not necessarily well. Third, the modest SMD, corresponding to a few HAMD points, has been argued to be of limited clinical significance (Jakobsen et al., 2017; Kirsch, 2014; Moncrieff & Kirsch, 2015; Turner et al., 2008). Fourth, it remains unclear how large the risk of breaking the blind is in SSRI trials because they have no ‘active’ placebo. Side effects of TCAs can be mimicked and a meta-analysis comparing TCA and active placebos, though limited by methodological problems, found no significant difference (Moncrieff, 2003). These limitations keep providing fuel for critics.
Prolonged acute phase trials: 6–8 months outcomes
Some acute phase placebo-controlled trials have prolonged follow-up assessments of response and remission up to 8 months post-baseline randomization. Deshauer et al.'s meta-analysis included six studies that randomized a total of 1299 patients to either ADM or PLA and followed both groups for 6–8 months (Deshauer et al., 2008). Response rates averaged 61% in the ADM and 48% in the PLA arm, a difference of 13 percentage points; rates for remission, assessed in only four trials, were 45% and 38%, a difference of 7 percentage points, yielding a non-significant trend (p < 0.1). It is important to note that at least half of the studies (Davidson et al., 2002; Detke et al., 2004; Murray et al., 2005) included in Deshauer et al.'s meta-analysis excluded patients from follow-up if they did not respond to the acute phase blinded treatment, even if their short-term data were used in the intention-to-treat analysis. It is unclear whether similar exclusions occurred in the other three studies. Differences in treatment dropouts and study completers were not significant between the arms but many participants did not complete the final outcome assessment, ranging from 23% to 73% with a mean of 52% (Deshauer et al., 2008). The studies evidenced moderate risk of bias and most failed to report key methodologic issues, for instance on what happened to those who did not improve during the acute phase. Hence it is unclear how valid and generalizable the results of these prolonged aADM trials are. Compared to remission rates in the 8–12 week trials, Deshauer et al.'s findings suggest that the placebo group tends to ‘catch up’, possibly due to the natural history of depression.
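To put these differences on the NNT scale used elsewhere in this editorial, the rates reported above can be converted directly. The snippet is a minimal sketch: the function name is hypothetical, and it uses only the averaged response and remission rates quoted in the preceding paragraph, ignoring between-study weighting.

```python
def number_needed_to_treat(rate_treated: float, rate_control: float) -> float:
    """Return the NNT as the reciprocal of the absolute rate difference."""
    risk_difference = rate_treated - rate_control
    if risk_difference == 0:
        raise ValueError("NNT is undefined when the two rates are equal")
    return 1.0 / abs(risk_difference)


# Averaged 6-8 month rates quoted above (ADM arm v. PLA arm)
response_nnt = number_needed_to_treat(0.61, 0.48)   # 13 percentage points
remission_nnt = number_needed_to_treat(0.45, 0.38)  # 7 percentage points

print(f"response NNT ~ {response_nnt:.1f}, remission NNT ~ {remission_nnt:.1f}")
```

On these figures roughly 8 patients would need to be treated for one additional response and roughly 14 for one additional remission, which illustrates why the choice between response and remission as the outcome matters for the debate.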
Placebo effects and spontaneous remission; net and gross efficacy
Response rates (⩾50% improvement) in PLA arms of RCTs of depression are large, often between 35% and 40% (Furukawa et al., 2016; Levkovitz, Tedeschini, & Papakostas, 2011; Rutherford & Roose, 2013). The response in PLA arms is due to multiple factors: (i) spontaneous remission referring to the phenomenon that some people get better in the absence of any treatment as part of the natural course of their depression; (ii) regression to the mean, in which individuals who are far from the mean at the baseline measurement tend to be closer to the mean at the next measurement, in as far as their extreme baseline scores were inflated by measurement error; and (iii) non-specific treatment effects, which consist of improvement due to being treated by a professional including explanation, attention, support, positive expectations, amongst others. It is possible that the placebo response in ADM trials is larger than with real-world patients because patients in regulatory trials are not fully representative of real-world patients (Rush et al., 2006; Van der Lem, van der Wee, Van Veen, & Zitman, 2012). The large placebo effect indicates that many respond not because of the specific ADM treatment, but because of other effects. This makes it difficult for trials to establish the excess value of specific treatments and provides critics with ammunition to minimize the benefits of ADMs by focusing on net efficacy and advocates with ammunition to maximize benefits by focusing on gross efficacy.
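Regression to the mean, factor (ii) above, can be made concrete with a small simulation. The sketch below is purely illustrative: the severity distribution, the size of the measurement error, and the entry threshold are arbitrary assumptions, not estimates from any trial. True severity is held constant between the two measurements, so any apparent improvement among those selected on a high baseline score is produced by measurement error alone.

```python
import random
from statistics import mean


def simulate_regression_to_mean(n=20000, threshold=20.0, seed=1):
    """Return (baseline mean, follow-up mean) for cases selected on baseline.

    All parameters are hypothetical: true severity ~ N(16, 4), measurement
    error ~ N(0, 3), and a trial entry criterion of baseline >= threshold.
    """
    rng = random.Random(seed)
    baseline_scores, followup_scores = [], []
    for _ in range(n):
        true_severity = rng.gauss(16.0, 4.0)  # does not change over time
        baseline = true_severity + rng.gauss(0.0, 3.0)
        followup = true_severity + rng.gauss(0.0, 3.0)
        if baseline >= threshold:  # only high scorers enter the "trial"
            baseline_scores.append(baseline)
            followup_scores.append(followup)
    return mean(baseline_scores), mean(followup_scores)


baseline_mean, followup_mean = simulate_regression_to_mean()
print(f"baseline {baseline_mean:.1f} -> follow-up {followup_mean:.1f}")
```

The selected group scores lower at follow-up than at baseline even though nobody was treated and nobody's true severity changed, which is the part of the placebo-arm response that is attributable to selection on an error-prone baseline score.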
Continuation and maintenance ADM trials: efficacy and limitations
The most recent meta-analysis evaluated 72 cADM and 37 mADM trials with 14 450 and 7253 participants, respectively (Sim, Lau, Sim, Sum, & Baldessarini, 2016). In the 72 cADM trials, lasting 33 weeks on average, the pooled advantage of cADM over discontinuation to placebo in terms of the relative response rate (RR) was 1.90, CI 1.73–2.08, NNT = 4.4; in the 37 mADM trials lasting 27 months on average, it was 2.03, CI 1.80–2.28, NNT = 3.8; with minor differences among drug types (Sim et al., 2016). The authors did not distinguish between the two designs that have been used to investigate the efficacy of continuation and maintenance treatment with ADMs (cmADMs): the discontinuation and the extension design (Fig. 1).
In the discontinuation (or placebo-substitution) design, patients who respond to aADM are randomized to cmADM v. PLA. Meta-analyses document substantially higher relapse/recurrence rates during the post-randomization 6–12 months in placebo-substitution arms (~42%) compared to the cmADM arms (~22%) (Geddes et al., 2003; Glue, Donovan, Kolluri, & Emir, 2010; Hansen et al., 2008; Kaymaz, van Os, Loonen, & Nolen, 2008). The major problem of the discontinuation design, however, is that AD withdrawal symptoms may masquerade as relapse/recurrence (Fava, Gatti, Belaise, Guidi, & Offidani, 2015; Greenhouse & Meyer, 1991). Its bias potential is unclear because of uncertainty about the risk period of withdrawal phenomena (Borges et al., 2014; El-Mallakh, Waltrip, & Peters, 1999; Kaymaz et al., 2008). Excess relapse/recurrence rates in the placebo-substitution arms compared to the cmADM arms are strongest in the first month after discontinuation and gradually drop to approximately zero in the next 3–6 months (Borges et al., 2014; El-Mallakh et al., 1999; Kaymaz et al., 2008).
Extension trials start as regular double-blind, placebo-controlled aADM trials that continue blinded treatment in responders to ADMs (~45%) and responders to PLA (~32%) for another 5–12 months (Zimmerman, Posternak, & Ruggero, 2007). In five double-blind extension trials including 901 participants and lasting on average 35 weeks, relapse/recurrence rates in those who initially remitted averaged 25% in the PLA arm v. 8% in the ADM arm, showing a strong benefit for cADM relative to cPLA (RR = 2.0) (Zimmerman et al., 2007). The extension design lacks the withdrawal issues of discontinuation trials, but still has the limitation that acute phase non-responders are not included and followed. This is a serious limitation as about half do not remit on acute phase treatment (Khin, Chen, Yang, Yang, & Laughren, 2011; Levkovitz et al., 2011; Pigott, Leventhal, Alter, & Boren, 2010; Rush et al., 2006). Treatment completion rates were often <50%.
In conclusion, compared to discontinuation trials, extension trials provide stronger evidence that cmADM in responders to aADM treatment reduce relapse/recurrence risk relative to cmPLA in aPLA responders because aADM responders were not switched to PLA, excluding confounding effects of ADM withdrawal. However, extension trials are relatively rare and patients on cmPLA, although they initially responded, may slowly realize that they are using placebos, with possibly negative psychological consequences.
Naturalistic long-term outcome studies
Many naturalistic outcome studies (i.e. non-randomized observational cohort or longitudinal studies) have charted the long-term course and outcome of depression (⩾2 years). Reviews typically report worse long-term outcomes for depressed patients who are managed in specialty settings compared to those managed in primary care (e.g. Brodaty, Luscombe, Peisah, Anstey, & Andrews, 2001; Hardeveld, Spijker, De Graaf, Nolen, & Beekman, 2010; Piccinelli & Wilkinson, 1994; Steinert, Hofmann, Kruse, & Leichsenring, 2014; van Weel-Baumgarten, Schers, van den Bosch, van den Hoogen, & Zitman, 2000). Piccinelli and Wilkinson, reviewing 50 outcome studies, reported weighted averages of 43% sustained recovery and 15% persistent depression at 1-year follow-up, and 24% sustained recovery and 12% persistent depression at >10-year follow-up. Other reviews report recurrence rates of up to 85% in 15 years for depression treated in specialty settings (Hardeveld et al., 2010; Mulder & Frampton, 2014). In contrast, two major reviews of adult depression in general practice and the community (20 studies of almost 6000 participants, most with 3–7 years follow-ups) found that although 10–17% had a chronic course, up to 85% recovered for some time and that 35–60% experienced stable recovery (Steinert et al., 2014; van Weel-Baumgarten et al., 2000).
Regarding outcome of non-treated community and general practice cases, multiple studies suggest that many depressive episodes (80–50%) are self-limiting as they remit within 3–12 months (Goldberg, Privett, Ustun, Simon, & Linden, 1998; Regier et al., 1998; Sareen et al., 2013; Spijker et al., 2002; Wang et al., 2017; Whiteford et al., 2013). For instance, Whiteford et al.'s (2013) meta-analysis of 19 studies of untreated depression in general practice settings estimated that 23% remit within 3 months, 32% within 6 months, and 53% within 12 months.
Observational studies typically report similar or worse outcomes for treated v. untreated depressive episodes in primary care and community samples (Boerema et al., 2017; Goldberg et al., 1998; Hughes & Cohen, 2009; Wang et al., 2017). Hughes and Cohen's review compared AD-treated cases in 12 naturalistic cohorts of treated patients (n = 3901 at final follow-up) and three non-ADM-treated samples of 1160 patients. Most participants were white females with one inpatient stay. Frequency, duration, and severity of episodes varied substantially. Outcomes were unrelated to treatment status, and not better in ADM-treated cohorts relative to non-ADM-treated samples. Heterogeneity of study designs and outcome definitions was large and hampered statistical analysis. Recent naturalistic long-term outcome studies comparing ADM-treated and untreated individuals with diagnosed depression consistently corroborate the findings of Hughes and Cohen's review (Bockting, Hollon, Jarrett, Kuyken, & Dobson, 2015; Hengartner, Angst, & Roessler, 2018; Nuijen, ten Have, Tuithof, van Dorsselaer, & van Bon-Martens, 2014; Verduijn et al., 2017; Vittengl, 2017; Wang et al., 2017). Statistical adjustment for confounders (e.g. educational level, marital status, baseline severity, family history) did not attenuate the association of ADM treatment with outcome (Hengartner et al., 2018; Vittengl, 2017). However, statistical adjustment cannot adjust for unmeasured differences between ADM-treated and non-ADM-treated cases (known as confounding by indication) and it is very likely that untreated cases with established major depressive disorder are less vulnerable and more resilient than treated cases (e.g. Goldberg et al., 1998; Ormel, Oldehinkel, Brilman, & van den Brink, 1993).
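How confounding by indication can make an effective treatment look useless or even harmful in observational data can be shown with a toy simulation. Everything in the sketch is an assumption chosen for illustration: a single binary vulnerability factor ('severe'), treatment probabilities of 80% v. 20%, recovery probabilities of 30% v. 70%, and a true treatment benefit of 10 percentage points. None of these values is taken from the cited studies.

```python
import random
from statistics import mean


def simulate_confounding(n=50000, true_benefit=0.10, seed=7):
    """Return (naive, stratified) treated-minus-untreated recovery differences."""
    rng = random.Random(seed)
    by_treatment = {True: [], False: []}
    by_stratum = {}  # (severe, treated) -> list of 0/1 recovery outcomes
    for _ in range(n):
        severe = rng.random() < 0.5
        # Severe cases are more likely to receive treatment
        treated = rng.random() < (0.8 if severe else 0.2)
        # Severe cases recover less often; treatment adds a fixed benefit
        p_recovery = (0.30 if severe else 0.70) + (true_benefit if treated else 0.0)
        outcome = 1 if rng.random() < p_recovery else 0
        by_treatment[treated].append(outcome)
        by_stratum.setdefault((severe, treated), []).append(outcome)
    # Naive comparison ignores who gets treated
    naive = mean(by_treatment[True]) - mean(by_treatment[False])
    # Comparison within severity strata, averaged over the two strata
    stratified = mean(
        mean(by_stratum[(s, True)]) - mean(by_stratum[(s, False)])
        for s in (True, False)
    )
    return naive, stratified


naive_diff, stratified_diff = simulate_confounding()
print(f"naive difference: {naive_diff:+.2f}, within-stratum: {stratified_diff:+.2f}")
```

In this simulation the naive comparison shows worse outcomes in treated cases although treatment helps in every stratum. The within-stratum comparison recovers the benefit only because severity is fully observed here; in real cohorts the relevant vulnerability is at best partly measured, which is why adjustment cannot be relied on to remove the bias.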
Conclusion naturalistic studies
We want to emphasize two points. First, RCTs are the gold standard of treatment evaluation and naturalistic outcome studies can never prove treatment (non-) effectiveness relative to no treatment. However, because naturalistic outcome studies typically do not find an association between ADM-treatment and long-term outcome, their findings offer ammunition to critics to express doubts regarding long-term ADM effects, in particular in combination with (i) the limitations of and (ii) the weaker evidence of prolonged aADM trials. Second, naturalistic long-term outcome studies typically follow prevalent cases without distinguishing between first-ever onset cases v. recurrent cases. As the latter are overrepresented in patient series and population samples, they may not reflect the prognosis of first-ever onsets, which is reasonably positive, with 50–60% achieving stable recovery, 35–40% experiencing at least one recurrence in the next 15 years, and 15% becoming chronic (Eaton et al., 2008; Mattisson, Bogren, Horstmann, Munk-Jorgensen, & Nettelbladt, 2007).
Efficacy and limitations of psychological treatments
Critics of ADMs often advocate psychological treatments, especially cognitive behavioral therapy (CBT), interpersonal therapy (IPT), mindfulness-based cognitive therapy (MBCT), and problem-solving therapy (PST). But how efficacious are these treatments, and do they offer a serious alternative to cmADM or are they vulnerable to similar criticisms as ADMs? Compared to PLA, care-as-usual (CAU), and waiting-list controls, meta-analyses indicate that short-term efficacy of psychological treatments is modest in magnitude and comparable to aADM (e.g. Barth et al., 2013; Cuijpers et al., 2014; Cuijpers, Cristea, Karyotaki, Reijnders, & Huibers, 2016). For instance, Barth et al.'s meta-analysis identified 198 studies, including 15 118 adult patients with depression. All major types of psychotherapeutic interventions were superior to waitlist control condition with moderate-to-large effects (range d = 0.62–0.92). Studies with larger samples and higher quality (blinded observers, self-report measures, adequate concealed allocation, etc.) reported substantially smaller effects of about d = 0.30. Cuijpers et al.'s meta-analysis of CBT for MDD, the most studied psychological treatment, including 63 studies and approximately 4000 participants, reported a pooled effect size of 0.75 (95% CI 0.64–0.87; NNT = 3.9). High-quality studies using CAU as a control had significantly smaller effect sizes (0.43, NNT = 7.3) and so did trials using PLA as the control (0.55, NNT = 5.5).
Direct comparisons between ADM and major psychological treatments indicate that they are about equally efficacious in the short-term treatment of major depression (Cuijpers et al., 2013a, 2013b), although an earlier meta-analysis found a slight excess benefit for SSRIs (Cuijpers, van Straten, van Oppen, & Andersson, 2008). Dropout rates are typically smaller in psychological treatments (Cuijpers et al., 2008). Similar to ADM trials, accumulating biases have also resulted in overestimation of the short-term efficacy of psychological treatments (Cuijpers et al., 2016; Driessen, Hollon, Bockting, Cuijpers, & Turner, 2015).
Three meta-analyses examined relapse/recurrence in responders to psychological treatments [largely CBT, behavioral activation (BA), and IPT] (Cuijpers et al., 2013a; Sim et al., 2016; Vittengl, Clark, Dunn, & Jarrett, 2007). Vittengl et al.'s meta-analysis of 28 studies including 1880 adults found substantial levels of symptom return after discontinuation of aCBT in CBT responders (29% within 1 year and 54% within 2 years). These rates appeared comparable to those associated with other depression-specific psychotherapies but lower than those associated with aADM. Seven trials compared aCBT and aADM. Averaging separately by treatment type yielded relapse/recurrence rates of 39% for aCBT and 61% for aADM over an average of 68 weeks.
Cuijpers et al.'s (2013b) meta-analysis examined the long-term (6–18 months) effects of aCBT v. aADM with and without cmADM (nine studies, 506 patients). Short-term outcomes were comparable although dropout was lower in CBT. Patients who received acute phase CBT were significantly less likely to relapse after discontinuation than patients who were withdrawn from ADM (OR = 2.61, 95% CI 1.58–4.31, NNT = 5). There was a non-significant trend favoring prior CBT over acADM (five studies) (p < 0.1; OR = 1.62, 95% CI 0.97–2.72; NNT = 10), suggesting that aCBT might be even more efficacious than continuation ADM in preventing relapse. Finally, Sim et al.'s meta-analysis included 22 psychosocial treatment trials (CBT, IPT, MBCT, psychoeducation) that followed 1969 (mostly) remitted patients with recurrent depression across 24 months (Sim et al., 2016). Treatments were slightly more effective than controls in preventing relapse/recurrence, with a pooled RR = 1.39 (1.13–1.70), which is substantially less effective than in Vittengl et al.'s meta-analysis.
Some trials have recently examined whether preventive variants of CBT/MBCT have better long-term outcomes than cmADM. These trials randomized aADM responders into two or three arms: continuing ADM v. preventive CBT/MBCT with or without continuing ADM (Bockting et al., 2018; Huijbers et al., 2016; Kuyken et al., 2015; Segal et al., 2010). A recent meta-analysis of four preventive MBCT trials (637 participants, 266 relapses during the 60-week follow-up period) found a statistically significant advantage for MBCT compared to cmADM, suggesting that MBCT appears efficacious as a treatment for relapse prevention for those with recurrent depression in (partial) remission (Kuyken et al., 2016). Additionally, in a recent trial not included in that review, cmADM was not superior to preventive CT administered while tapering off ADM [relapse/recurrence risks over 15–24 months (60% v. 63%) (Bockting et al., 2018)]. However, tapering was often unsuccessful as many patients continued or resumed cmADM, which may have decreased symptom return in the taper PCT/MBCT arm. On the other hand, misclassified withdrawal symptoms may have increased symptom return in this arm.
Conclusions
Although these findings suggest that acute phase and preventive psychotherapy represent viable alternatives to ADM treatment, the evidence is still somewhat uncertain due to the relatively small number of high-quality trials and the inherent problem of blinding in psychotherapy evaluation. Currently, the evidence base of cmADM in acute phase ADM responders is substantially larger than that for recurrence-prevention using CBT/MBCT, which may reflect registration requirements and funding opportunities. Similar to ADM trials, accumulating biases have also resulted in overestimation of the short- and long-term efficacy of psychological treatments (Cuijpers et al., 2016; Driessen et al., 2015). Effective blinding remains a problem and waitlist controls are often used even though this is known to inflate contrasts (Cuijpers & Cristea, 2015). Furthermore, in all trials comparing psychological treatments with discontinued ADM, misclassified withdrawal symptoms in the ADM arm may have inflated the difference. Hence, there is a clear need for replication by high-quality head-to-head trials.
How to resolve the controversy
New data are urgently needed to push the (partly political) antidepressant debate toward a balanced and evidence-based discussion of benefits and harms of treatment modalities. We recommend four approaches: (1) placebo-controlled RCTs with relevant long-term outcomes, (2) analysis of non-randomized treatment outcome data using instrumental variable analysis and propensity score analysis, (3) patient cohort studies including effect moderators to enhance personalized treatment, and (4) psychological interventions as a universal first-line treatment step. The first two approaches will inform regarding treatment effects on long-term outcome whereas the latter two will reduce the relevance of the controversy. Negotiations between advocates and critics of AD to achieve consensus on the exact data needed to resolve the controversy should precede concrete research steps.
Placebo-controlled trials with relevant long-term outcomes
Logistically extremely difficult but in our opinion highly desirable are randomized placebo-controlled long-term outcome studies of ADM and psychotherapy. Long-term means a follow-up of at least 1 year, preferably 2 years. If sufficiently powered, an additional arm of combined treatment (COM) could be added. An example of a top priority trial is randomization of currently ADM-free depressed individuals, stratified by history of ADM treatment (never v. ever), to ADM v. CBT v. PLA. Acute phase non-responders, irrespective of treatment, would be offered the treatment they prefer but kept in the study and administered the very same follow-up assessments, thereby addressing the major deficiency of previous trials. Alternatively, non-responders could be offered the other treatment modality in addition to the allotted treatment (COM), with the same offered to PLA non-responders. Primary outcomes should be psychopathology, role function, adverse events, and acceptability. Machine learning can be used to generate selection algorithms that identify the optimal intervention for each individual who does not remit on COM, and those algorithms can be used in patient-by-mechanism interactions to improve the power of the tests of mediation (Cohen & DeRubeis, 2018). If putative effect moderators are included at baseline, much can be learned about interactions between treatment modality and effect moderators. Finally, it is important to perform cost-effectiveness analyses as well because what drives shifts in practice are often shifts in reimbursement and what drives shifts in reimbursement is evidence of cost-effectiveness. As an anonymous program officer at NIMH once said: ‘You cannot herd cats but you can move their food.’
Multiple problems threaten the feasibility and validity of the proposed study: dropouts, protocol violations, imbalance between the arms at the entry of the continuation phase, complex logistics, and ethical concerns. Since 2000, it has been difficult to keep patients on PLA for more than a few months because it is considered unethical to withhold effective treatment. However, the modest excess efficacy of ADM and CBT relative to PLA implies that the majority of patients who respond to specific treatments do so for other reasons; that is, due to non-specific treatment effects and spontaneous remission, and not a ‘specific’ ADM or CBT effect. Furthermore, ethical concerns are manageable using patient-level stopping rules. The modest net efficacy of ADM or CBT (specificity) reduces the threat of imbalance as well, as many will respond to PLA, and balance can be managed with propensity analyses. Logistical problems necessitate the cooperation of many primary care providers from large catchment areas.
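The arithmetic behind the claim that most responders improve for non-specific reasons can be sketched as follows. The NNT range of 6–8 for acute phase ADM is the one cited in the Introduction; the response rate of 50% in the active arm is a purely hypothetical value chosen for illustration, and the function name is ours.

```python
def specific_share(nnt, active_response_rate):
    """Share of responders in the active arm attributable to the specific effect.

    nnt: number needed to treat (1 / absolute risk difference v. placebo).
    active_response_rate: response rate in the active arm (hypothetical input).
    """
    risk_difference = 1.0 / nnt  # absolute excess response over placebo
    if not 0 < risk_difference <= active_response_rate <= 1:
        raise ValueError("risk difference must not exceed the active response rate")
    return risk_difference / active_response_rate


# Illustrative assumption: half of the patients in the active arm respond.
ASSUMED_ACTIVE_RESPONSE = 0.5

for nnt in (6, 8):
    share = specific_share(nnt, ASSUMED_ACTIVE_RESPONSE)
    print(f"NNT = {nnt}: specific share of responders = {share:.2f}")
```

Under these assumptions, only a minority of responders in the active arm would owe their response to the specific treatment effect; a different assumed response rate changes the share but, for any rate above roughly one-third, not the conclusion.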
Inventive analyses of observational databases
The unsurpassed advantage of randomization is that the resulting groups are comparable (apart from differences due to chance). This allows the unambiguous attribution of differences in outcomes between groups to differences in treatment. Observational, non-randomized research can never rule out that the groups differ in characteristics (confounders) that affect both treatment and outcome. Notorious ‘confounders’ are severity and history. However, statisticians have recently developed methods that allow, under certain conditions and with reasonable certainty, causal statements about treatment effects based on observational data, even if not all potential ‘confounders’ are measured. The methods are instrumental variable analysis (Martens, Pestman, de Boer, Belitser, & Klungel, 2006) and propensity score analysis (Austin, 2011). Potentially relevant observational databases can be generated from routine outcome measurement (ROM), patient cohort data, and longitudinal population-based studies, provided they assess long-term outcome.
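To make the two approaches concrete, the following is a minimal sketch on simulated data, not an analysis of any real cohort. Everything in it is an illustrative assumption: the true effect of -2 points on a symptom score, a measured binary confounder (`severe`), an unmeasured confounder (`u`), and a binary instrument (`z`, standing in for something like prescriber preference) that is assumed to influence treatment choice but to affect outcome only through treatment. The propensity score is estimated as the treated proportion within each severity stratum and used for inverse-probability weighting; the instrumental variable estimate is the simple Wald ratio.

```python
import random

TRUE_EFFECT = -2.0  # assumed true treatment effect on a symptom score (illustrative)


def simulate(n, seed=0):
    """Simulate non-randomized data with one measured and one unmeasured confounder."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        severe = rng.random() < 0.5   # measured confounder (baseline severity)
        u = rng.gauss(0.0, 1.0)       # unmeasured confounder (e.g. unrecorded history)
        z = rng.random() < 0.5        # instrument (hypothetical prescriber preference)
        # Treatment is more likely with the instrument, severity, and high u.
        p_treat = 0.15 + 0.4 * z + 0.2 * severe + 0.2 * (u > 0)
        t = rng.random() < p_treat
        y = TRUE_EFFECT * t + 2.0 * severe + 2.0 * u + rng.gauss(0.0, 1.0)
        rows.append((int(z), int(severe), int(t), y))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def naive_difference(rows):
    """Unadjusted difference in mean outcome, treated minus untreated."""
    return mean(y for _, _, t, y in rows if t) - mean(y for _, _, t, y in rows if not t)


def ipw_estimate(rows):
    """Inverse-probability weighting on a propensity score from measured severity only."""
    propensity = {s: mean(t for _, sev, t, _ in rows if sev == s) for s in (0, 1)}
    num1 = den1 = num0 = den0 = 0.0
    for _, sev, t, y in rows:
        ps = propensity[sev]
        if t:
            num1 += y / ps
            den1 += 1.0 / ps
        else:
            num0 += y / (1.0 - ps)
            den0 += 1.0 / (1.0 - ps)
    return num1 / den1 - num0 / den0


def wald_iv_estimate(rows):
    """Wald estimator: effect of z on outcome divided by effect of z on treatment."""
    y_diff = mean(y for z, _, _, y in rows if z) - mean(y for z, _, _, y in rows if not z)
    t_diff = mean(t for z, _, t, _ in rows if z) - mean(t for z, _, t, _ in rows if not z)
    return y_diff / t_diff


rows = simulate(60000)
naive = naive_difference(rows)
ipw = ipw_estimate(rows)
iv = wald_iv_estimate(rows)
print(f"true effect {TRUE_EFFECT:.2f} | naive {naive:.2f} | IPW {ipw:.2f} | IV {iv:.2f}")
```

In this simulation the naive comparison understates the benefit because sicker patients are treated more often, weighting on the measured confounder removes only part of that bias, and the instrumental variable estimate recovers the assumed effect. The last result depends entirely on the instrument being valid, which in real data can be argued but not proven.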
Cohort studies including effect moderators
Predicting who will benefit from which treatment is difficult. As a result, personalized treatment (‘precision medicine’) is not yet possible with depression although there are interesting developments in this direction (Cohen & DeRubeis, 2018; DeRubeis et al., 2014; Fournier et al., 2009; Kessler, 2018). Cohort studies of patients with the same diagnosis can provide insight if putative effect moderators are included at baseline. If treatments become more precise and personalized, the relevance of the ADM controversy will decline. Potential effect moderators that are relatively easy to measure include clinical characteristics, personality, biomarkers determined from bodily fluids (including DNA), and social network and support characteristics.
Psychological help as a first-line treatment
Another approach that may reduce the relevance of the controversy is to offer short-term psychological treatment as the first-line treatment step for all patients with depression unless they refuse or cannot receive psychological treatment or have characteristics shown to predict a better response to ADM. ADM then remains reserved for patients who do not want, get, or respond to this first step. It is of great importance to establish which of the non-responders to psychological treatment improve on ADM and whether psychological treatment can subsequently contribute to the phasing out of their ADM use. Severely ill patients could be started on a combination of ADM and psychotherapy. Some guidelines already advise psychological treatment as the first-line treatment step for adolescents with depression, and other guidelines, such as the UK NICE guideline for depression and the Dutch multidisciplinary guideline for depression, also recommend psychological treatment as the default first step for adults with mild-to-moderate depression (Clark, 2018; NICE, 2011; Spijker et al., 2019).
Concluding comments
The ADM debate lingers because of mixed messages and differences between ADM critics and advocates in their interpretations of the research findings, the weaknesses and strengths of the various studies, and the lack of high-quality RCTs with long-term follow-ups; not a single large trial investigated head-on the long-term outcomes of acute phase CBT v. ADM v. PLA. The problems of generalizability to real-world patients, unblinding, dropout, withdrawal symptoms, exclusion of acute phase non-responders, selective reporting, and publication bias make it easy for critics and advocates to minimize the relevance of study findings that do not fit their interests and beliefs and emphasize the relevance of findings that do fit. It is worth noting that these problems affect both psychotherapy and ADM. What also plays a role is that we do not fully understand what depression is, or what the precise mechanism of action of ADM is. The combination of multiple theories with little substrate is not helping to resolve the controversy.
Overall, relatively strong evidence (in terms of low bias risk) in favor of ADM relative to PLA comes from the ‘classical’ trials with relatively long follow-ups (6–8 months) and the extension studies targeting relapse in responders to acute phase ADM v. PLA. However, the extension studies are limited to acute phase responders and the ‘classical’ trials are few in number and show statistically significant associations only for response and not remission. This suggests that long-term follow-up effects of ADM may be smaller than 8-week post-treatment effects; control groups tend to ‘catch up’ with the treatment group across time.
It is likely to prove especially important to conduct cost-effectiveness analyses of these studies. With the advent of the SSRIs, the pendulum has swung toward a wholesale reliance on ADM with the vast majority of the prescriptions now written in general practice. The UK has bucked this trend by bypassing the providers and going straight to the funders in the government with relevant cost-effectiveness data (Clark, 2018). The National Health Service invested £700 million to train psychotherapists to deliver interventions judged to be efficacious in clinical practice guidelines developed by the National Institute for Health and Care Excellence (NICE). Patient outcomes are monitored at every session and anonymized aggregate data are posted on a publicly accessible website. That recovery rates have risen from 35% to over 50% across the last decade speaks to the power of shifting reimbursement to the most cost-efficient interventions and monitoring outcomes in a wholly transparent fashion, and to the utility of this English Improving Access to Psychological Therapies (IAPT) program (Clark, 2018). Although the overall outcomes achieved by IAPT until 2017 are encouraging, it is unclear to what extent gains are maintained as systematic long-term follow-ups are lacking. Hence, it is important to wait for the final evaluation of the IAPT program.
Despite countless studies on the short-term effectiveness of ADMs and hundreds of billions in revenue, the ADM debate continues. The lack of long-term outcome information hampers a balanced consideration. RCTs comparing the long-term outcome of ADM v. PLA and psychological treatments can provide the necessary data. Research consortia with researchers from varied fields and different allegiances are best situated to perform such ‘mega-trials’ so as to guarantee balanced composition (Quitkin et al., 2000) in accordance with the principle of ‘adversarial collaboration’ (Mellers, Hertwig, & Kahneman, 2001). Given the substantial prevalence of depression and cmADM treatment and the availability of alternative treatments, it is crucial that the lacking data become available.
Author contributions
JO conceived and managed the study, collected and interpreted the literature, developed the argument, and wrote various drafts of the report. SH added important material. All co-authors contributed to the interpretation of relevant literature, provided critical feedback on earlier drafts, and approved the final version of the manuscript.
Conflict of interest
All authors declare that they have no competing interests. JO is an epidemiologist and sociologist with a record of accomplishment in psychiatric epidemiology, and has been involved in a range of studies including a few ADM and psychotherapy trials. PS is a clinical psychologist who has been strongly involved in psychotherapy research. CLHB and SDH are clinical psychologists involved in both ADM and psychotherapy trials. YAdeV is an epidemiologist, not involved in trials, and GJS and AOJC are psychologists who have not been involved in randomized trials of ADM or psychotherapy.