Introduction
There is growing clinical and academic interest in investigating symptoms that cut across multiple mental disorders (Eaton et al., 2023; Forbes et al., 2024; Gibbons, Farmer, Shaw, & Chung, 2023), such as anxiety, insomnia, depressed mood, and hallucinations. Both authoritative texts (American Psychiatric Association, 2013) and empirical studies (Forbes, Tackett, Markon, & Krueger, 2016; Waszczuk et al., 2017) have encouraged dimensional approaches to psychopathology assessment, with the aim of elucidating shared features among different disorders. An emerging tool is the Diagnostic and Statistical Manual of Mental Disorders-5 (DSM-5) Level 1 Cross-Cutting Symptom Measure (DSM-XC), which consists of 23 self-reported questions that explore 13 different mental health domains. However, the current body of evidence supporting the utility of cross-cutting symptom assessments, including the DSM-XC, remains limited. This gap presents an unexplored opportunity not only for understanding the connections between different diagnostic categories but also for optimizing community-based mental disorder screening.
The U.S. Preventive Services Task Force (USPSTF) supports the use of disorder-specific scales in screening recommendations for anxiety (US Preventive Services Task Force et al., 2023a), depression (US Preventive Services Task Force et al., 2023b), and unhealthy drug and alcohol use among adults (US Preventive Services Task Force et al., 2018, 2020). Notable scales presenting good diagnostic accuracy for these purposes (Mulvaney-Day et al., 2018) include the Patient Health Questionnaire-9 (PHQ-9, sensitivity = 0.85, specificity = 0.85) and Center for Epidemiological Studies Depression (CES-D, sensitivity = 0.83, specificity = 0.78) for major depression (Negeri et al., 2021; Vilagut, Forero, Barbaglia, & Alonso, 2016), the Generalized Anxiety Disorder-7 (GAD-7) for generalized anxiety disorder (sensitivity = 0.83, specificity = 0.84) (Plummer, Manea, Trepel, & McMillan, 2016), the National Institute on Alcohol Abuse and Alcoholism (NIAAA) screen for current alcohol use disorder (sensitivity = 0.88, specificity = 0.67) (Smith, Schmidt, Allensworth-Davies, & Saitz, 2009), the Alcohol, Smoking and Substance Involvement Screening Test (ASSIST) for tobacco, alcohol, and cannabis use disorders (sensitivities = 0.95–1.00; specificities = 0.79–0.93) (Gryczynski et al., 2015), and the Tobacco, Alcohol, Prescription Medication and Other Substance Use (TAPS, sensitivities = 0.70–0.74, specificities = 0.85–0.95 for tobacco, alcohol and marijuana) (McNeely et al., 2016), among others. Nevertheless, current recommendations do not include multi-disorder tools. Only a handful of such screening instruments are available, such as the PHQ-4, the Hospital Anxiety and Depression Scale (HADS), and the Mental Health Index (MHI-5) (Means-Christensen et al., 2005). The PHQ-4 (Löwe et al., 2010) is an ultra-short assessment that combines the PHQ-2 (sensitivity = 0.89, specificity = 0.75) (Mitchell, Yadegarfar, Gill, & Stubbs, 2016) and GAD-2 (sensitivity = 0.80, specificity = 0.81) (Plummer et al., 2016), facilitating simultaneous screening for depression and anxiety. The HADS is a lengthier but commonly used tool for identifying depression and anxiety in medically ill patients, with sensitivities and specificities for depression and anxiety subscales of approximately 0.80 (Bjelland, Dahl, Haug, & Neckelmann, 2002). However, the extent of diagnostic insight provided by comprehensive assessments remains largely unexplored for psychiatric disorders.
The symptoms included in the DSM-XC exhibit similarities to the PHQ-2 (Löwe, Kroenke, & Gräfe, 2005; Mitchell et al., 2016) and the GAD-2 (Plummer et al., 2016). However, the utility of these scales has been ascertained only for the specific diagnoses they were designed to screen for. Given the recognized high comorbidity among mental disorders (Plana-Ripoll et al., 2019) and the occurrence of overlapping symptoms across diagnoses (Forbes et al., 2024), it is pertinent to consider expanding the use of these brief screeners to identify multiple disorders. In addition, a comprehensive examination of the screening properties of the 13 DSM-XC domains can help pinpoint which cross-cutting symptoms are more effective in either confirming or ruling out specific diagnoses.
Previous studies suggested the DSM-XC (Bastiaens & Galus, 2018; Doss & Lowmaster, 2022; Mahoney et al., 2020) could be used for screening, but they did not use diagnostic interviews and assessed convenience samples with limited generalizability and ecological validity. Additionally, the DSM-XC merges questioning about the intensity (‘how much’) and frequency (‘how often’) of symptoms, which could impact its psychometric and screening properties (Gong et al., 2013; Krabbe & Forkmann, 2014; Krishnakumar et al., 2021; Levin, Schneider, & Gaeth, 1998), but, to our knowledge, no empirical study has examined how these different framings affect screening utility.
Our aim was to assess the screening properties (sensitivity, specificity, and likelihood ratios [LRs]) of the DSM-XC's domains to detect eight mental disorders, representative of both internalizing and externalizing spectra, in the community. We addressed this by assessing the instrument's original operationalization, which merges intensity and frequency of symptoms, and by analyzing each framing separately.
Methods
Assessments
Participants
We analyzed data from the most recent assessment, at 22 years of age, of an ongoing longitudinal population-based study in the city of Pelotas, Brazil, that recruited all children born alive in 1993 (Victora et al., 2008), with a retention rate of 76.3%. Detailed information on the cohort is described elsewhere (Gonçalves et al., 2018; Victora et al., 2006). Only individuals with completed DSM-XC and MINI data were included in the present study. Written informed consent was obtained from all participants. The study was approved by the Institutional Review Board of the School of Medicine, Universidade Federal de Pelotas (approval number 1.250.366).
The DSM-5 Level 1 cross-cutting symptom measure
The DSM-XC is a self-reported measure that examines different mental health domains to assist clinicians in identifying comorbidity that could impact a patient's treatment and prognosis (American Psychiatric Association, 2013). The cross-cutting measures for adults involve two levels of inquiry. At Level 1, the questionnaire consists of 23 items distributed in 13 domains: depression, anger, mania, anxiety, somatic symptoms, suicidal ideation, psychosis, sleep problems, memory, repetitive thoughts and behaviors, dissociation, personality functioning, and substance use. Items ask, ‘how much (or how often) have you been bothered by’ a given problem during the previous two weeks and provide a 5-point Likert scale ranging from 0 = ‘None (not at all)’ to 4 = ‘Severe (nearly every day)’. A rating of mild (i.e. 2) or greater on any item within a domain, except for substance use, suicidal ideation, and psychosis is considered a positive screen. For substance use, suicidal ideation, and psychosis, a rating of slight (i.e. 1) or greater on any item within the domain is considered a positive domain. The threshold scores in any of its domains indicate the need for a thorough clinical investigation with the respective Level 2 Cross-Cutting Symptom Measure. Level 2 questions offer a more detailed assessment of specific domains. Except for suicidal ideation, psychosis, memory problems, dissociation, and personality functioning, each domain in the adult version of the DSM-XC corresponds to a Level 2 measure, which contains more questions about each screened disorder. Additional details on Level 1 and Level 2 measures can be found online at www.psychiatry.org/dsm5.
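The Level 1 positive-screen rule described above can be sketched as follows. This is a minimal illustration rather than an official scoring routine: the function and constant names are hypothetical, and the caller is assumed to pass in the 0–4 ratings of the items that belong to the domain.

```python
# Domains screened at "slight" (1) or greater; every other domain uses "mild" (2) or greater.
LOW_THRESHOLD_DOMAINS = {"substance use", "suicidal ideation", "psychosis"}


def domain_threshold(domain: str) -> int:
    """Return the minimum item rating that counts as a positive screen for a domain."""
    return 1 if domain in LOW_THRESHOLD_DOMAINS else 2


def is_positive_screen(domain: str, item_scores: list[int]) -> bool:
    """True if any item rating in the domain reaches that domain's threshold."""
    if any(not 0 <= score <= 4 for score in item_scores):
        raise ValueError("item scores must be on the 0-4 scale")
    threshold = domain_threshold(domain)
    return any(score >= threshold for score in item_scores)


if __name__ == "__main__":
    print(is_positive_screen("depression", [1, 0]))  # slight only: below threshold
    print(is_positive_screen("psychosis", [1, 0]))   # slight is enough for this domain
```

Keeping the threshold lookup separate from the screening function makes it straightforward to test alternative cut-offs without touching the rest of the logic.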
In the original DSM-XC, there are two simultaneous framings when asking about symptoms: intensity (i.e. how much) and frequency (i.e. how often), requiring the respondent to choose a single answer. After a pilot test in Brazil, we identified that some participants had difficulty understanding the double-barreled questions, which led to the decision to split each question into two, and answers were recorded on two separate 5-point scales: one for intensity and another for frequency. To compute the score as intended by the original DSM-XC ‘OR’ rule, the highest score in each question was considered when analyzed jointly. For example, an answer of 0 (none) for how much and 2 (several days) for how often, or vice versa, was coded as 2. An additional adaptation was made to the instrument by dividing the last item (item 23) into two (items 23 and 24, as shown in online Supplementary Table S1) to allow for the exploration of substance use and self-prescription/misuse separately.
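The ‘OR’ rule amounts to taking, for each question, the higher of the two ratings. A minimal sketch, with a hypothetical function name and assuming both ratings are already coded 0–4:

```python
def or_rule_score(intensity: int, frequency: int) -> int:
    """Combine the split ratings of one question by keeping the higher of the two."""
    for rating in (intensity, frequency):
        if not 0 <= rating <= 4:
            raise ValueError("ratings must be on the 0-4 scale")
    return max(intensity, frequency)


if __name__ == "__main__":
    # The example from the text: 0 for "how much" and 2 for "how often" is coded as 2.
    print(or_rule_score(0, 2))
    print(or_rule_score(2, 0))
```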
Assessment of psychiatric diagnoses
Participants were also assessed for the presence of psychiatric disorders with an adapted Brazilian version of the Mini International Neuropsychiatric Interview (MINI) (Amorim, 2000), administered by trained clinical psychologists, who did not have access to the DSM-XC responses. The MINI 5.0.0 is a structured and standardized diagnostic interview based on the 4th edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV) (American Psychiatric Association, 1994), and was used to assess the following disorders: antisocial personality disorder (APD), attention-deficit/hyperactivity disorder (ADHD), bipolar disorder (BD), generalized anxiety disorder (GAD), major depressive disorder (MDD), post-traumatic stress disorder (PTSD) and social anxiety disorder (SAD). Moreover, to capture the DSM's impairment criterion, we included an additional question asking participants how much impairment the reported symptoms caused in their life, with response options of none, mild, moderate, or severe (Manfro et al., 2021; Matte et al., 2015). The presence of clinical impairment was operationally defined as having a score of moderate or severe. The Alcohol Use Disorders Identification Test (AUDIT), a 10-question instrument, was used to identify participants with active alcohol use disorder (AUD). A score of ≥ 15 for men and ≥ 13 for women was considered positive. The validity of the AUDIT has been previously investigated in Brazil (Meneses-Gaya et al., 2010).
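The two operational definitions in this paragraph, the impairment criterion and the sex-specific AUDIT cut-offs, can be written down as a short sketch. The function names and the string labels for sex and impairment are hypothetical conventions chosen for illustration, not part of the MINI or the AUDIT.

```python
# Sex-specific AUDIT cut-offs as stated in the text.
AUDIT_CUTOFFS = {"male": 15, "female": 13}


def audit_positive(total_score: int, sex: str) -> bool:
    """True if the AUDIT total reaches the cut-off for the participant's sex."""
    return total_score >= AUDIT_CUTOFFS[sex]


def clinically_impaired(rating: str) -> bool:
    """True for the two highest options of the added impairment question."""
    if rating not in {"none", "mild", "moderate", "severe"}:
        raise ValueError(f"unknown impairment rating: {rating}")
    return rating in {"moderate", "severe"}


if __name__ == "__main__":
    print(audit_positive(13, "female"), audit_positive(13, "male"))
    print(clinically_impaired("mild"), clinically_impaired("moderate"))
```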
Data analyses
We followed the Standards for Reporting of Diagnostic Accuracy Studies (STARD) 2015 guidelines (Bossuyt et al., 2015), reported in online Supplementary Table S2.
Screening properties
We estimated the number of true positives, false positives, true negatives, and false negatives for each of the 13 DSM-XC domains to detect individuals with psychiatric disorders. We then estimated the sensitivity, specificity, positive (LR+), and negative (LR−) likelihood ratios (Altman & Bland, 1994; Grimes & Schulz, 2005) for each domain-diagnosis pair.
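For readers less familiar with these quantities, the sketch below derives all four from a 2 × 2 table using the standard definitions. The analyses in this study were run with the epiR package in R, so this Python version is only an illustration; the class name, function name, and cell counts are hypothetical, with the counts chosen to give a sensitivity of 0.95 and a specificity of 0.59.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # domain positive, disorder present
    fp: int  # domain positive, disorder absent
    fn: int  # domain negative, disorder present
    tn: int  # domain negative, disorder absent


def screening_properties(t: TwoByTwo) -> dict[str, float]:
    """Sensitivity, specificity, LR+ and LR- from the four cell counts."""
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    # LR+ = sens / (1 - spec); undefined (infinite) when there are no false positives.
    lr_pos = sensitivity / (1 - specificity) if t.fp > 0 else math.inf
    # LR- = (1 - sens) / spec; undefined (infinite) when there are no true negatives.
    lr_neg = (1 - sensitivity) / specificity if t.tn > 0 else math.inf
    return {"sensitivity": sensitivity, "specificity": specificity,
            "lr_pos": lr_pos, "lr_neg": lr_neg}


if __name__ == "__main__":
    # Hypothetical counts: 100 participants with the disorder, 1000 without.
    result = screening_properties(TwoByTwo(tp=95, fp=410, fn=5, tn=590))
    for name, value in result.items():
        print(f"{name}: {value:.2f}")
```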
Sensitivities and specificities higher than 0.75 (Power, Fell, & Wright, 2013), LR+ values higher than 2, and LR− values lower than 0.5 were considered clinically useful (Grimes & Schulz, 2005; Jaeschke, Guyatt, & Sackett, 1994). Exact binomial 95% confidence intervals (95% CI) were calculated for sensitivity and specificity (Collett, 2002). Confidence intervals for LR+ and LR− values are based on formulae provided by Simel, Samsa, and Matchar (1991).
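The two interval methods can be sketched with the Python standard library alone. The exact binomial (Clopper-Pearson) limits are found here by bisection on the binomial distribution function, and the likelihood ratio limits use the log-scale standard errors commonly attributed to Simel et al. This is an illustrative reimplementation under those assumptions, not the epiR code used in the study; all function names and the example counts are hypothetical, and the LR interval assumes no empty cell in the 2 × 2 table.

```python
import math


def _binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), with terms computed in log space."""
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)


def _root_of_increasing(f, iterations: int = 60) -> float:
    """Bisection on (0, 1) for a function that rises from negative to positive."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Clopper-Pearson interval for a proportion k / n."""
    # Lower limit: P(X >= k) = alpha / 2; upper limit: P(X <= k) = alpha / 2.
    lower = 0.0 if k == 0 else _root_of_increasing(
        lambda p: (1 - _binom_cdf(k - 1, n, p)) - alpha / 2)
    upper = 1.0 if k == n else _root_of_increasing(
        lambda p: alpha / 2 - _binom_cdf(k, n, p))
    return lower, upper


def likelihood_ratio_cis(tp: int, fp: int, fn: int, tn: int, z: float = 1.96):
    """Log-method intervals for LR+ and LR-; every cell must be non-zero."""
    sens, spec = tp / (tp + fn), tn / (tn + fp)
    lr_pos, lr_neg = sens / (1 - spec), (1 - sens) / spec
    se_pos = math.sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    se_neg = math.sqrt(1 / fn - 1 / (tp + fn) + 1 / tn - 1 / (fp + tn))
    ci_pos = (lr_pos * math.exp(-z * se_pos), lr_pos * math.exp(z * se_pos))
    ci_neg = (lr_neg * math.exp(-z * se_neg), lr_neg * math.exp(z * se_neg))
    return ci_pos, ci_neg


if __name__ == "__main__":
    print(exact_binomial_ci(95, 100))               # interval for a sensitivity of 0.95
    print(likelihood_ratio_cis(95, 410, 5, 590))    # hypothetical cell counts
```

The bisection approach avoids a dependency on a statistics library; where SciPy is available, the same limits can be obtained from the beta distribution quantile function instead.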
Analysis of variability in screening properties by framing
We first described the by-item response differences and calculated the polychoric correlation and Cohen's kappa to measure the reliability between frames for each DSM-XC item. Second, we calculated and compared all screening properties between framings. Overlapping confidence intervals were considered statistically equivalent. Paired Wilcoxon tests with DSM-XC domains as dependent variables were conducted to compare the differences in screening properties between the two framings. Screening properties of the two framings were also compared with the OR rule (reference group) with paired Wilcoxon tests with DSM-XC domains as dependent variables.
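As an illustration of the agreement statistic, the sketch below computes Cohen's kappa between the intensity and frequency ratings of a single item. It assumes the unweighted form of kappa, which the text does not specify, and the function name and example ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a: list[int], ratings_b: list[int]) -> float:
    """Unweighted Cohen's kappa for two paired sets of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both rating lists must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of participants giving the same rating under both framings.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the marginal proportions, summed over categories.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    intensity = [0, 1, 2, 2, 4, 0, 1, 3]   # hypothetical "how much" ratings
    frequency = [0, 1, 1, 2, 3, 0, 0, 3]   # hypothetical "how often" ratings
    print(round(cohens_kappa(intensity, frequency), 2))
```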
Logistic regression models were used to test potential interactions between intensity and frequency that could indicate an advantage of a particular framing. The odds ratios for specific combinations, namely Intensity (−)/Frequency (+), Intensity (+)/Frequency (−), and Intensity (+)/Frequency (+) were calculated with Intensity (−)/Frequency (−) as the reference group for each domain with a corresponding disorder.
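The grouping behind these models can be illustrated with crude odds ratios computed directly from counts. For a logistic regression whose only predictor is the four-level group, the exponentiated coefficients equal these crude odds ratios; whether the study's models included further terms is not stated here, so the sketch should be read as a simplified stand-in. The function names, group labels, and counts are hypothetical.

```python
def framing_group(intensity_positive: bool, frequency_positive: bool) -> str:
    """Label a participant by which framing(s) put the domain above threshold."""
    return (f"I{'+' if intensity_positive else '-'}"
            f"/F{'+' if frequency_positive else '-'}")


def crude_odds_ratios(counts: dict[str, tuple[int, int]],
                      reference: str = "I-/F-") -> dict[str, float]:
    """Odds of the disorder in each group relative to the reference group.

    `counts` maps a group label to (participants with the disorder,
    participants without the disorder); all counts must be non-zero.
    """
    ref_cases, ref_non_cases = counts[reference]
    ref_odds = ref_cases / ref_non_cases
    return {group: (cases / non_cases) / ref_odds
            for group, (cases, non_cases) in counts.items()
            if group != reference}


if __name__ == "__main__":
    hypothetical_counts = {
        "I-/F-": (10, 990),
        "I-/F+": (5, 95),
        "I+/F-": (4, 196),
        "I+/F+": (40, 160),
    }
    for group, odds_ratio in crude_odds_ratios(hypothetical_counts).items():
        print(group, round(odds_ratio, 2))
```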
We also tested if clinical impairment could be a factor for potential differences in framings instead of the framing effect itself. Because of this, secondary analyses were conducted by removing the clinical impairment criteria. The diagnosis of AUD was based on AUDIT scores and was not included in this analysis.
All analyses were performed using the software R, version 4.1.3. Screening properties were evaluated using the epiR package, version 2.0.44 (Stevenson et al., 2020). Paired Wilcoxon tests were conducted using the rstatix package, version 0.7.0 (Kassambara, 2021). Code is available at http://osf.io/q54yu.
Results
Sample description
A total of 3578 subjects with completed DSM-XC and MINI data participated in the 2015 follow-up study. The mean age was 22.57 years. Just over half of the participants were female (53.58%). Most individuals had working experience (94.10%), 63.30% had finished high school, and 84.13% were single. At least one psychiatric diagnosis was present in 813 subjects (22.72%). Sample characteristics are provided in Table 1. The prevalence of specific diagnoses ranged from 1.00% (APD) to 10.56% (GAD) (Table 1). A positive screen in the anxiety domain was observed in 50% of participants, followed by substance use at 47% and depression at 43%. Other prevalent domains included somatic symptoms (36%), anger (32%), and personality functioning (30%). Suicidal ideation (9%), psychosis (14%), and dissociation (16%) were the least frequent domains. Full details of the number and proportion of positive screens for each domain are provided in online Supplementary Table S3. Cross-tabulations of the DSM-XC domains by the results of the reference standards (MINI diagnoses and AUDIT) are provided in online Supplementary Tables S4–S19.
Note: NEET, Not in Education, Employment, or Training; BRL, Brazilian Real (Currency Unit).
Sensitivity
The anxiety domain demonstrated excellent sensitivity across all internalizing disorders, with sensitivity values ranging from 0.86 to 0.95 (Table 2). In addition, the depression domain exhibited good sensitivity for both GAD (0.80, 95% CI 0.76–0.84) and MDD (0.95, 95% CI 0.89–0.98). The personality functioning domain showed adequate sensitivity for MDD (0.86, 95% CI 0.78–0.92). For externalizing disorders, only the substance use domain displayed good sensitivity for AUD (0.87, 95% CI 0.81–0.91).
Sensitivity represents the ability of the test to designate an individual with the disorder as positive, and values greater than 0.75 are shown in bold if crossing the confidence interval. Specificity represents the ability of the test to designate an individual without the disorder as negative, and values greater than 0.75 are shown in bold if crossing the confidence interval.
Specificity
Various domains, including suicidal ideation, psychosis, memory, repetitive thoughts and behaviors, and dissociation, exhibited good specificity across all tested disorders (Table 2). Specifically, suicidal ideation had a specificity range of 0.91–0.92, psychosis ranged from 0.86 to 0.87, memory from 0.83 to 0.86, repetitive thoughts and behaviors from 0.81 to 0.84, and dissociation from 0.84 to 0.87. Sleep problems showed good specificity for both GAD (0.79, 95% CI 0.77–0.80) and PTSD (0.77, 95% CI 0.76–0.78).
Likelihood ratios
Differences in LR+ values were observed between internalizing and externalizing disorders (Table 3). Several domains, namely suicidal ideation, repetitive thoughts and behaviors, dissociation, and personality functioning, exhibited clinically meaningful LR+ values for all internalizing disorders, but not for externalizing disorders.
LR values for positive tests greater than two (shown in bold if crossing the confidence interval) represent substantial changes in the probability of the target disorder, while LR values for negative tests less than 0.5 (also if crossing the interval) indicate considerable changes in the opposite direction.
The anxiety domain demonstrated meaningful LR− values for all internalizing disorders, ranging from 0.09 to 0.27. The depression domain also displayed good LR− values for most internalizing disorders, specifically GAD, MDD, and PTSD. Other domains, such as anger, personality functioning, and repetitive thoughts and behaviors showed meaningful LR− values for specific disorders. For externalizing disorders, only the substance use domain presented acceptable LR− values for AUD.
Variability in above-threshold domains and screening properties by adverbial framing
Polychoric correlation coefficients between ‘intensity’ and ‘frequency’ framings ranged from 0.77 to 0.90, and kappa coefficients ranged from 0.50 to 0.66. Detailed results are shown in online Supplementary Table S1, and the item response distribution for the DSM-XC is shown in online Supplementary Fig. S1.
Intensity-framed questions resulted in a similar number of above-threshold domains compared to the original rule (online Supplementary Table S2). Frequency-framed questions showed a lower number of above-threshold domains, which was accompanied by significant improvements in LR+ values (Fig. 1), mainly for the depression, anger, and anxiety domains. LR− values were not significantly impacted by the framing variation (Fig. 2). Wilcoxon tests comparing framings across all DSM-XC domains showed that intensity-framed questions present marginally better sensitivity and LR− values, but worse specificity and LR+ values, when compared to frequency-framed questions (online Supplementary Figs S2 and S3, Supplementary Tables S20–S23). The intensity framing showed minor differences in sensitivity, specificity, and LR+ when compared to the OR rule, while the frequency framing showed lower sensitivity, but higher specificity and LR+ compared to the OR rule. Differences in LR− were very small for both framings compared to the OR rule (online Supplementary Tables S24–S27).
Logistic regressions of specific framing combinations (online Supplementary Table S28) indicate that for the mania-BD, personality functioning-APD, and substance use-AUD pairs, the Intensity (−)/Frequency (+) group had significantly higher odds compared to the reference group, while the Intensity (+)/Frequency (−) group did not. For the depression domain to detect MDD and the anxiety domain to detect GAD and SAD, all operationalizations had a higher likelihood of the corresponding disorder.
Full details of the screening properties by framing are described in online Supplementary Tables S29–S52.
Secondary analyses showed that the findings remained stable even without the MINI clinical impairment criterion (online Supplementary Figs S4–S7, Supplementary Tables S53–S78).
Discussion
This is the first study to describe the screening properties of the DSM-XC for a broad spectrum of psychiatric diagnoses in a large community sample. Results suggest that screening for psychiatric conditions using simple and objective operational criteria increases the likelihood of detection or may help to rule out specific diagnoses. The results comprise three main findings. First, the DSM-XC domains are not diagnosis-specific; rather, they function as transdiagnostic screening tools. For example, a negative anxiety domain is useful in decreasing and a positive suicidal ideation domain is useful in increasing the probability of many internalizing disorders. Second, the screening utility of the DSM-XC is lower for externalizing disorders. Third, ‘intensity’ and ‘frequency’ framings as measured by the DSM-XC are only moderately correlated, and the ‘frequency’ framing exhibited a lower number of above-threshold domains and improved screening properties compared to ‘intensity’ framing.
While several screening tools exist for specific diagnoses, few address multiple diagnostic categories simultaneously. Despite limitations in externalizing disorder assessment, e.g. due to the lack of an ADHD domain in the DSM-XC and the lack of a diagnostic interview for AUD in the present cohort, the DSM-XC performance for depression and anxiety is comparable to existing scales, justifying its potential applicability in community settings. The Patient Health Questionnaire-2 (PHQ-2) (Kroenke, Spitzer, & Williams, 2003) assesses similar symptoms to the depression domain of the DSM-XC, querying low mood and loss of interest over the past two weeks, despite differences in instructions and scoring scales. In the present study, we found screening properties of the DSM-XC ‘OR’ rule depression domain to detect MDD (LR+ = 2.32, LR− = 0.08, sensitivity = 0.95, specificity = 0.59) to be similar to recent meta-analytic evidence examining the PHQ-2 (LR+ = 2.97, LR− = 0.26, sensitivity = 0.89, specificity = 0.75) (Mitchell et al., 2016). The Generalized Anxiety Disorder two questions scale (GAD-2) with a cut-off of three presents more informative properties (LR+ = 4.31, LR− = 0.25, sensitivity = 0.80, specificity = 0.81) (Plummer et al., 2016) compared to the DSM-XC anxiety domain to detect GAD (original ‘OR’ rule LR+ = 1.90, LR− = 0.24, sensitivity = 0.87, specificity = 0.54). The differences in performance between legacy measures with similar conceptual content and DSM-XC domains may be attributed to variations in item content, scale, and scoring methods (linear vs algorithm). For instance, the DSM-XC anxiety domain comprises three items, includes more symptoms per item and assesses different symptoms, such as panic and anxious avoidance, while the GAD-2 has only two items focusing on anxiety and worrying. Additionally, both PHQ-2 and GAD-2 use 4-point scales, while the DSM-XC is rated on a 5-point scale.
A lack of direct correspondence was observed across domains and psychiatric diagnoses. Moreover, some domains have a higher capability than others for screening, and some domains showed utility in screening for multiple disorders (transdiagnostic utility). For example, the depression domain showed an LR+ of 2.32 for detecting MDD, 2.11 for detecting GAD, and 1.94 for PTSD. Conversely, the anxiety domain showed significant LR− values for all internalizing disorders, suggesting its potential as a first screening step. This finding contrasts with the current understanding of the DSM-XC process, in which an individual who screens positive in a domain is further assessed with disorder-specific questions. However, the DSM-XC did not show meaningful utility for externalizing disorders, except for a good likelihood ratio of a negative screen in the substance use domain to rule out AUD. Due to the identified shortcomings, specific recommended scales for alcohol and substance use should supplement primary care assessments using the DSM-XC. Further modifications to the tool are necessary to address externalizing symptoms, such as impulsivity and disruptive behaviors, with the aim of increasing overall screening utility. These targeted improvements could broaden the DSM-XC's scope and solidify its place as a component of community screening strategies.
Strong connections observed in certain domains' screening capabilities likely reflect the interconnectedness of the disorders they capture. The presence of clinical features that are not represented in the diagnostic criteria but are frequent among individuals with certain mental disorders is a potential explanation for some domains demonstrating screening utility for multiple disorders. For instance, somatic symptoms, while not part of diagnostic criteria, may be more commonly experienced among individuals with GAD and MDD, while anxious distress is a notable feature in both bipolar and major depressive disorders (Bartoli et al., 2024). Another contributing factor is the conceptual similarity and symptom overlap across various diagnostic criteria, with insomnia and irritable mood being among the top non-specific symptoms (Forbes et al., 2024). In our study, sleep problems and anger, for example, demonstrated significant positive likelihood ratios across anxiety, mood, and post-traumatic stress disorder categories. Comorbidity and common temperamental risk factors can contribute to this phenomenon, as individuals with high levels of negative affectivity (neuroticism) are likely to endorse multiple internalizing symptom domains in a non-specific manner due to being oriented towards negative affectivity (Griffith et al., 2010). Additionally, substantial evidence supports the idea that common neural and genetic mechanisms contribute to the association between these disorders (Jami et al., 2022; Xie et al., 2023).
We also examined the contributions of framing questions as intensity or frequency of symptoms, as the questionnaire was revised based on participant feedback regarding the challenge of understanding double-barreled questions. The DSM-XC showed a moderate correlation between framings, with intensity-framed questions leading to a significantly higher proportion of positive screens for most domains compared to frequency-framed questions. For some domains, such as depression, anger, and memory, framing questions as ‘intensity’ approximately doubled the number of individuals that would have been needed to complete Level 2 measures, compared with ‘frequency’. Thus, using frequency-framed questions may be a more resource-efficient approach.
Framings also differed in screening properties, with frequency-framed questions presenting generally higher specificity but equivalent sensitivity to detect their corresponding specific disorders. Differences in LR+ favoring the use of the ‘frequency’ framing were found, with minimal compromise in LR− values. The value of likelihood ratios for clinical decision-making is significant (Grimes & Schulz, 2005). For example, a positive result in the anxiety domain, with a likelihood ratio of 1.95, translates to a shift from the pre-test probability of GAD from 10.5% to a post-test probability of 18.3%. However, by employing the ‘frequency’ framing, with a LR+ value of 3.11, the post-test probability substantially increases to 26.8% for GAD. As the conversion from pre-test to post-test probability involves a transformation into odds, we have developed an accessible tool, available at https://pacheco-jpg.shinyapps.io/dsm_xc_shiny_app/ that enables quick estimation of the shift from pre- to post-test probability for the tested disorders within the DSM-XC domains.
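The conversion mentioned above follows the standard odds form of Bayes' theorem: the pre-test probability is converted to odds, multiplied by the likelihood ratio, and converted back to a probability. The sketch below is a minimal illustration with a hypothetical function name and is not the code behind the online tool. With the rounded inputs quoted in the text, it returns values within about half a percentage point of the reported figures; small differences are to be expected if the published values were computed from unrounded prevalence and LR estimates.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Update a pre-test probability with a likelihood ratio via odds."""
    if not 0 < pre_test_probability < 1:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    if likelihood_ratio < 0:
        raise ValueError("likelihood ratio cannot be negative")
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


if __name__ == "__main__":
    prevalence_gad = 0.105  # pre-test probability of GAD quoted in the text
    for label, lr in [("OR rule, LR+ = 1.95", 1.95), ("frequency framing, LR+ = 3.11", 3.11)]:
        print(f"{label}: {post_test_probability(prevalence_gad, lr):.1%}")
```

A likelihood ratio of 1 leaves the probability unchanged, which is a convenient sanity check when applying the function to other domain-disorder pairs.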
Apart from establishing improved screening utility for the DSM-XC, defining standards for adverbial framing of questions can reduce the substantial heterogeneity in mental health assessment (Newson, Hunter, & Thiagarajan, 2020). Previous work suggests higher reliability, criterion validity (Krishnakumar et al., 2021), and stability over time (Krabbe & Forkmann, 2014) for frequency in comparison to intensity, which could explain the higher accuracy of the frequency framing that we found in the present study.
This study has three main strengths. First, we examined a community-based sample, which allows for real-world estimates of the screening properties of the DSM-XC. Second, we investigated a broad spectrum of psychiatric diagnoses, with examples of both internalizing and externalizing disorders. Third, we removed a previous restriction imposed by double-barreled phrasing, which allowed for examining the contributions of intensity and frequency framings. However, interpreting these results requires considering several limitations. First, results should not necessarily be extrapolated to other age groups, as the sample was limited to the age of 22. Second, although the presence of most disorders was determined by a qualified clinical psychologist with a structured interview, we used the AUDIT to evaluate AUD. Third, further research is needed to directly compare the implemented ‘OR’ rule with the original double-barreled scale. The ‘OR’ rule allowed us to characterize the intended meaning of the DSM-XC, which is important because it addresses the cognitively demanding phrasing of the original scale. A notable concern with the implemented approach is that the order in which questions were presented could have affected how individuals rated each item. As the ‘intensity’ framing was always presented first, this piece of information could have influenced how participants answered the following ‘frequency’ question, introducing anchoring bias. However, anchoring bias tends to make results look more similar, and we found only moderate correlations between framings. Additionally, the frequency framing, presented after intensity, showed better screening properties, and logistic regression analyses of specific framing subgroups suggest the frequency framing captures relevant information about specific corresponding disorders on more occasions than the intensity framing alone. This suggests that anchoring bias is unlikely to have had a substantial effect, as the frequency framing does not appear to be conditioned on the intensity framing.
The present results demonstrated that the DSM-XC is useful in screening multiple domains of psychopathology, increasing or decreasing the likelihood of detection of specific diagnoses with simple objective questions. Due to the low direct correspondence between domain and diagnosis, more than one Level 2 Cross-Cutting Symptom Measure could apply after a positive screening in some domains. We also found that adverbial framings have an impact on the DSM-XC's screening performance. Framing questions as ‘frequency’ may be more resource-efficient and could lead to improved detection of positive cases. We recommend that these findings be replicated in other samples to allow implementation in routine practice and that scoring thresholds be examined in future work to further optimize the DSM-XC as a screening tool for mental health conditions.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0033291724000849.
Acknowledgements
This work was supported by the Brazilian public funding agency Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) grant 466826/2014-1. C. K. is a CNPq researcher and a UK Academy of Medical Sciences Newton Advanced Fellow. M. S. H. is a postdoctoral research fellow at UFRGS, supported by the United States National Institutes of Health grant R01MH120482-01.
Funding statement
This work was supported by the Brazilian public funding agency Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) grant 466826/2014-1.
Competing interests
C. K. is the founder of Wida, a digital mental health platform. L. A. R. has received grant or research support from, served as a consultant to, and served on the speakers' bureau of Abbott, Aché, Bial, Medice, Novartis/Sandoz, Pfizer/Upjohn, and Shire/Takeda in the last three years. The ADHD and Juvenile Bipolar Disorder Outpatient Programs chaired by Dr Rohde have received unrestricted educational and research support from the following pharmaceutical companies in the last three years: Novartis/Sandoz and Shire/Takeda. He has received authorship royalties from Oxford Press and ArtMed.
Ethical standards
The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008.