Background
Globally, depression is the most common psychiatric disorder in the general population (Liu et al., 2020). It is a major contributor to the disease burden and a leading cause of disability worldwide (Wang et al., 2017). In primary care settings, depression is the most prevalent mental health condition. Prevalence rates for major depressive disorder range from 3.2% to 27.2% in primary care settings (Craven & Bland, 2013).
Studies have reported under-recognition of depression in primary care (Hirschfeld et al., 1997; Fekadu et al., 2020). While symptoms are prevalent, primary care patients often do not discuss them with their doctors. Barriers to diagnosis include patients’ lack of awareness and understanding of the nature of the disease and its symptoms, as well as the variability in clinical presentation. It is estimated that 50% of patients with major depressive disorder are not identified (Mitchell et al., 2009). Screening for depression in primary care to provide early identification and intervention is supported by a strong body of evidence (Siniscalchi et al., 2020). Given the large estimates of underdiagnosed and undertreated depression, routine screening in primary care can improve the detection rate and reduce the disease burden.
Increased use of screening tools can help improve identification and treatment of depression in primary care settings. The most commonly used tool to screen for depression in primary care is the Patient Health Questionnaire (PHQ) (Mitchell et al., 2016). There are three main formats of the PHQ: PHQ-9 (linear), PHQ-9 (algorithm) and PHQ-2. PHQ-9 includes nine questions, whereas PHQ-2 includes the first two questions of PHQ-9. The PHQ-2 is designed as an initial screening tool to be followed by the more comprehensive PHQ-9 and diagnostic interviews.
In Qatar, the current clinical guidelines recommend the use of PHQ-2 as a screening tool for all patients visiting a primary healthcare setting. If the overall PHQ-2 score is ≥ 3, the comprehensive PHQ-9 and diagnostic interviews are undertaken. This study was designed to establish diagnostic accuracy of the PHQ-2 in Qatar’s primary care population. Its findings will inform local and international guideline development for depression screening in primary healthcare settings.
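The two-step rule described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the 0 to 3 scoring of each item is the standard PHQ convention, assumed here because the text does not state it.

```python
def phq2_score(item1: int, item2: int) -> int:
    """Sum the first two PHQ-9 items (assumed to be scored 0-3 each)."""
    for item in (item1, item2):
        if item not in range(4):
            raise ValueError("each PHQ item is assumed to be scored 0-3")
    return item1 + item2


def needs_phq9(score: int, cutoff: int = 3) -> bool:
    """Return True when the PHQ-2 score meets the referral threshold.

    The default cutoff of 3 mirrors the current guideline described
    in the text; it is a parameter so other thresholds can be tried.
    """
    return score >= cutoff


if __name__ == "__main__":
    score = phq2_score(1, 2)
    print(score, needs_phq9(score))
```

Keeping the cut-off as a parameter makes it easy to compare the current threshold with alternatives without changing the scoring logic.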
Methods
Study setting
The study was conducted in Primary Health Care Corporation (PHCC) in Qatar. PHCC is a public sector organisation that delivers primary care to approximately 70% of the country’s population through 28 health centres.
Study population and data collection
PHCC operates a single electronic medical record (EMR) system across all its health centres. The data required for the study were extracted in anonymised form from PHCC’s EMR system. The eligibility criteria for inclusion were (1) age >18 to <65 years and (2) a completed PHQ-2 score in the electronic records between January 2017 and December 2019. Individuals with other mental health conditions (personality disorder, schizophrenia, mental disability and dementia) were excluded.
Data analysis
Descriptive analysis of age, gender and diagnosis of depression was undertaken. Individuals with no diagnostic codes for depression on the EMR were considered not depressed. A mean PHQ-2 score was calculated. The sensitivity, specificity, positive and negative predictive values and optimal cut-off points were calculated for the tool. Youden’s index, the area under the receiver operating characteristic curve and the gain in certainty metric were calculated to estimate performance.
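As a rough illustration of these calculations, the following Python sketch computes sensitivity, specificity, predictive values and Youden’s index at each cut-off and selects the cut-off with the highest index. It is not the authors’ analysis code; the function and class names are hypothetical, and the example data are invented toy values, not study data.

```python
from dataclasses import dataclass


@dataclass
class Accuracy:
    cutoff: int
    sensitivity: float
    specificity: float
    ppv: float
    npv: float

    @property
    def youden(self) -> float:
        # Youden's index J = sensitivity + specificity - 1
        return self.sensitivity + self.specificity - 1


def _ratio(numerator: int, denominator: int) -> float:
    # Undefined ratios (empty denominator) are reported as NaN.
    return numerator / denominator if denominator else float("nan")


def accuracy_at(scores, depressed, cutoff) -> Accuracy:
    """Treat score >= cutoff as screen-positive and compare with the diagnosis."""
    pairs = list(zip(scores, depressed))
    tp = sum(1 for s, d in pairs if s >= cutoff and d)
    fp = sum(1 for s, d in pairs if s >= cutoff and not d)
    fn = sum(1 for s, d in pairs if s < cutoff and d)
    tn = sum(1 for s, d in pairs if s < cutoff and not d)
    return Accuracy(
        cutoff,
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
    )


def best_cutoff(scores, depressed, cutoffs=range(1, 7)) -> Accuracy:
    """Pick the cut-off with the highest Youden's index.

    Assumes the data contain at least one depressed and one
    non-depressed individual, so sensitivity and specificity are defined.
    """
    results = (accuracy_at(scores, depressed, c) for c in cutoffs)
    return max(results, key=lambda a: a.youden)


if __name__ == "__main__":
    # Invented toy data for illustration only.
    toy_scores = [0, 0, 1, 1, 2, 3, 4, 6]
    toy_depressed = [False, False, False, False, True, False, True, True]
    best = best_cutoff(toy_scores, toy_depressed)
    print(best.cutoff, round(best.youden, 2))
```

The area under the curve and the gain in certainty metric are not covered by this sketch.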
Ethical considerations
The study presented a minimal risk of harm to its subjects, and the data collected for it were anonymised. None of the subjects’ personal information was available to the research team. Overall, the study was conducted with integrity according to generally accepted ethical principles and was reviewed and approved under the exempt review category by the PHCC’s research subcommittee (PHCC/DCR/2020/03/017).
Results
A total of 6921 individuals met the study’s inclusion criteria. The mean age of those included was 40.4 years and 63% were women. Depression was diagnosed in 17.9% of the study population. The mean PHQ-2 score was 1.6.
The diagnostic accuracy of cut-off values was calculated for scores 1–6 (see Table 1). Based on Youden’s index (0.58), a score of 2 was identified as the optimal cut-off. It offers a sensitivity of 88.73% and a specificity of 69.31%.
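For reference, the reported Youden’s index follows directly from the sensitivity and specificity at this cut-off, using the standard definition of the index (sensitivity plus specificity minus one):

```latex
\[
J = \text{sensitivity} + \text{specificity} - 1
  = 0.8873 + 0.6931 - 1
  = 0.5804 \approx 0.58
\]
```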
Discussion
This study is the first to report the diagnostic accuracy of the PHQ-2 in Qatar’s primary care population and potentially in the Gulf Cooperation Council countries (which have population characteristics similar to Qatar’s). Its findings demonstrate that the PHQ-2 tool has high diagnostic accuracy in Qatar’s primary care settings.
The tool was found to be very sensitive for a diagnosis of depression, with sensitivities of 95% and 88% for thresholds of ≥ 1 and ≥ 2, respectively. However, it had a modest specificity of 57% and 69%, respectively, at these cut-off values. The finding that a score of ≥ 2 was more successful in detecting cases of depression than the current score of ≥ 3 suggests that the current threshold may be too high for clinical practice. A systematic review and meta-analysis also concluded that ≥ 2 may be preferable if clinicians want to ensure that few cases of depression are missed. Another systematic review and meta-analysis reported that the combination of PHQ-2 (with cut-off ≥ 2) followed by PHQ-9 (with cut-off ≥ 10) had similar sensitivity but higher specificity compared with PHQ-9 cut-off scores of 10 or greater alone (Levis et al., 2020). Cut-off scores of ≥ 2 are supported by other studies and healthcare settings (Yu et al., 2011; Thombs et al., 2014; Carey et al., 2016; Gelaye et al., 2016; Scoppetta et al., 2021). Therefore, it is recommended that clinical guidelines be reviewed and revised taking these findings into consideration. Using a higher cut-off value may be one reason for the underdiagnosis of depression in primary healthcare settings reported by previous studies (Mitchell et al., 2009).
The study’s strengths include a large sample and reliable data recorded by qualified healthcare professionals and extracted from an EMR system. The study reports an overview of PHQ-2 diagnostic accuracy in Qatar’s primary care settings. This facilitates development of clinical guidelines that can enhance diagnosis of depression. The study also has limitations. Its cross-sectional design provides only a snapshot in time. Furthermore, the study included only patients who were 18 years and above and who had completed a PHQ-2 questionnaire. Finally, the clinical diagnosis of depression is subject to diagnostic variability among clinicians.
The study demonstrates that the PHQ-2 is most effective with a cut-off score of ≥ 2 in Qatar’s primary care settings. Clinical guidelines in the country should be aligned with the findings. Further studies should aim to confirm the results using alternative study designs and to report them according to population characteristics both in Qatar and internationally.
Acknowledgements
The authors acknowledge the support received from the Primary Health Care Corporation’s Department of Clinical Research.
Authors’ Contribution
STS, MJMA and EH conceptualised the study. EH undertook the data analysis. MAS drafted the manuscript. All authors revised and approved the final manuscript.
Financial support
Funding for the study was approved by Primary Health Care Corporation’s Research Budget Working Group (PHCC/DCR/2020/03/017).
Conflicts of interest
None.