1. Introduction
Psychosis is a severe psychiatric condition and there is limited evidence that treatments are successful in improving patients’ functioning once the disorder is established Reference Insel[1]. Intervening in the earlier phases is therefore the only viable possibility to substantially alter the course of the disorder [Reference Fusar-Poli, McGorry and Kane2, Reference Millan, Andrieux, Bartzokis, Cadenhead, Dazzan and Fusar-Poli3]. Within early intervention, a key focus for improving the outcome has been primary indicated prevention [Reference Fusar-Poli, McGorry and Kane2, Reference McGorry4, Reference Fusar-Poli5]. Primary indicated prevention allows for early intervention for those at clinical high risk of developing psychosis (CHR-P), with greater scope for improving outcomes. To do this effectively, the first necessary step is to reach an accurate, robust prognostic identification of individuals meeting CHR-P criteria who will subsequently develop psychosis or not. Ideally, all subjects who will actually develop psychosis should be classified as “at risk” (CHR-P+) while those not developing an established psychosis should be classified as “not at risk” (CHR-P–). These key concepts involved in prognostic reasoning in the CHR-P have been detailed and presented in a recent paper by our group Reference Fusar-Poli and Schultze-Lutter[6].
Prognostic prediction is used in many branches of medicine to identify individuals who may develop a particular disease Reference Arbyn, Verdoodt, Snijders, Verhoef, Suonio and Dillner[7]. For example, fasting glucose, oral glucose tolerance test and glycated haemoglobin are used to detect individuals at high risk for developing diabetes (pre-diabetes or intermediate hyperglycaemia) Reference Tabák, Herder, Rathmann, Brunner and Kivimäki[8] and systolic blood pressure and ratio of total serum cholesterol to high density lipoprotein cholesterol levels are used to detect individuals at high risk for developing cardiovascular disease Reference Hippisley-Cox, Coupland, Vinogradova, Robson, May and Brindle[9]. However, unlike these other fields, there are no biological tests to assess the risk of developing mental disorders Reference Fusar-Poli and Meyer-Lindenberg[10], which is instead reliant on semi-structured CHR-P psychometric interviews, such as the CAARMS (Comprehensive Assessment for At Risk Mental States) Reference Yung, Yuen, McGorry, Phillips, Kelly and Dell’Olio[11]. Recently, the CAARMS has become the mainstream tool to detect CHR-P individuals in the UK, recommended by international bodies, such as NICE [12]. Therefore, understanding its exact psychometric properties is of paramount clinical relevance. The CAARMS shows excellent inter-rater reliability when performed by trained raters (0.85) Reference Yung, Stanford, Cosgrave, Killackey, Phillips and Nelson[13]. However, its prognostic accuracy is uncertain. A recent meta-analysis by our lab Reference Fusar-Poli, Cappucciati, Rutigliano, Schultze-Lutter, Bonoldi and Borgwardt[14] investigated the prognostic accuracy of CHR-P instruments, showing generally excellent prognostic performance of these instruments. 
However, CHR-P tools were grouped together including the CAARMS Reference Yung, Yuen, McGorry, Phillips, Kelly and Dell’Olio[11], the SIPS (Structured Interview for Prodromal Syndromes) Reference Miller, McGlashan, Rosen, Cadenhead, Cannon and Ventura[15] and the SPI-A (Schizophrenia Proneness Instrument-Adult Version) Reference Schultze-Lutter, Ruhrmann, Picker and Klosterkötter[16]. This was due to the fact that there were not enough studies contributing data to assess the meta-analytical prognostic accuracy of the CAARMS specifically. Given the marked differences between the CAARMS and other CHR-P instruments Reference Fusar-Poli, Cappucciati, Rutigliano, Lee, Beverly and Bonoldi[17], in particular with respect to the functional deterioration criterion Reference Fusar-Poli, Rocchetti, Sardella, Avila, Brandizzi and Caverzasi[18], it is possible that the previously reported meta-analytical prognostic accuracy is not completely accurate. In addition, the previous meta-analysis combined multiple follow-up time points, and even though meta-regressions of this variable found no significant effect, validity of the prognostic accuracy results would be improved by using a more defined and consistent follow-up time Reference Fusar-Poli, Cappucciati, Rutigliano, Schultze-Lutter, Bonoldi and Borgwardt[14].
The current study tackles these caveats and advances knowledge in the psychometric properties of the CAARMS. We capitalize on recently published CAARMS studies reporting useful and innovative meta-analytical data to conduct a meta-analytical prognostic accuracy analysis of the CAARMS at two-year follow-up. This is the period of time during which most transitions to psychosis occur Reference Kempton, Bonoldi, Valmaggia, McGuire and Fusar-Poli[19]. The results will hopefully support the refinement of psychosis prediction and therefore facilitate indicated primary prevention in CHR-P individuals.
2. Methods
2.1. Search strategy
Two investigators (DO, PFP) conducted a two-step literature search. In the first step, the Web of Knowledge database was searched, incorporating both the Web of Science and Medline. The search was extended until August 2017 and included only abstracts in English. The electronic search adopted several combinations of the following keywords: “at risk mental state”, “psychosis risk”, “prodrome”, “prodromal psychosis”, “ultra-high risk”, “high risk”, “help-seeking”, “diagnostic accuracy”, “sensitivity”, “specificity”, “psychosis prediction”, “psychosis onset”. The second step involved the use of Scopus to investigate citations of previous systematic reviews on transition outcomes in CHR-P subjects and a manual search of the reference lists of the retrieved articles.
Articles identified through these two steps were then screened for the selection criteria on the basis of abstract reading. The articles surviving this selection were assessed for eligibility on the basis of full text reading. To achieve a high standard of reporting, we adopted the Meta-analysis Of Observational Studies in Epidemiology (MOOSE) checklist Reference Stroup, Berlin, Morton, Olkin, Williamson and Rennie[20].
2.2. Selection criteria
Studies were eligible for inclusion if:
• they were reported in original articles, written in English;
• they had used the CAARMS (index test) in the same pool of referrals;
• they had followed up both CHR-P+ and CHR-P– subjects for psychosis onset (reference index) using established international diagnostic manuals (ICD or DSM);
• they had reported sufficient prognostic accuracy data at 2-year follow-up.
With respect to this last point, when data were not directly presented, they were indirectly extracted from associated data. Additionally, we contacted all corresponding authors to request additional data when needed.
We excluded:
• abstracts, reviews, articles in a language other than English;
• studies in which interviews were not conducted in the same pool of referrals or that used an external CHR-P group of healthy controls;
• studies with overlapping datasets.
In case of multiple publications deriving from the same study population, we selected the article reporting the largest and most recent data set. The literature search was summarized according to PRISMA guidelines Reference Moher, Liberati, Tetzlaff and Altman[21].
2.3. Recorded variables
Data extraction was independently performed by two investigators (DO, PFP). Data included author, year of publication, characteristics of subject samples (baseline sample sizes, mean age and age range, proportion of females), diagnostic criteria used at follow-ups to assess the psychotic outcome, prognostic accuracy data (number of true and false positives, true and false negatives or associated data) and quality assessment conducted with the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) checklist Reference Whiting, Rutjes, Westwood, Mallett, Deeks and Reitsma[22].
2.4. Statistical analysis
The statistical analysis followed the Cochrane Guidelines for Systematic Reviews of Diagnostic Test Accuracy, Version 1.0 Reference Macaskill, Gatsonis, Deeks, Harbord, Takwoingi, Deeks, Bossuyt and Gatsonis[23] and the Methods Guide for Authors of Systematic Reviews of Medical Tests by the Agency for Healthcare Research and Quality (chapter 8) Reference Smetana, Umscheid, Chang and Matchar[24]. Evaluating test accuracy requires knowledge of two quantities: the test's sensitivity (Se) and specificity (Sp). Meta-analysis methods for diagnostic test accuracy thus have to deal with two summary statistics simultaneously rather than one Reference Macaskill, Gatsonis, Deeks, Harbord, Takwoingi, Deeks, Bossuyt and Gatsonis[23]. Methods for undertaking analyses, which account for both Se and Sp, the relationship between them, and the heterogeneity in test accuracy, require fitting advanced hierarchical random effects models Reference Macaskill, Gatsonis, Deeks, Harbord, Takwoingi, Deeks, Bossuyt and Gatsonis[23].
For each study, we constructed a two-by-two table, which included true positive, false positive, true negative, and false negative values. The baseline sample size was conservatively used as the base reference.
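As a minimal sketch of how Se and Sp follow from one study's two-by-two table, the snippet below uses a small helper class. The class name and the counts are hypothetical and serve only as illustration; they are not data from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts for one study: index test (CHR-P+/CHR-P-) against outcome."""

    tp: int  # CHR-P+ at baseline, developed psychosis
    fp: int  # CHR-P+ at baseline, did not develop psychosis
    fn: int  # CHR-P- at baseline, developed psychosis
    tn: int  # CHR-P- at baseline, did not develop psychosis

    def sensitivity(self) -> float:
        # Proportion of those who developed psychosis that were CHR-P+.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of those who did not develop psychosis that were CHR-P-.
        return self.tn / (self.tn + self.fp)


# Hypothetical counts, for illustration only.
study = TwoByTwo(tp=30, fp=120, fn=5, tn=145)
print(f"Se = {study.sensitivity():.2f}, Sp = {study.specificity():.2f}")
```

The sketch does not guard against empty margins (for example, a study with no transitions), which a real analysis would need to handle.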
Data were then analysed with MIDAS (Meta-analytical Integration of Diagnostic Accuracy Studies) Reference Dwamena[25], a comprehensive program of statistical and graphical routines for undertaking meta-analysis of diagnostic/prognostic test performance in STATA 14 software Reference StataCorp.[26]. The index tests of CHR-P status (CHR-P+ or CHR-P–) and reference tests of transition to psychosis according to international diagnostic manuals (ICD or DSM as gold standard) were dichotomous.
Primary data synthesis was performed within the bivariate mixed-effects regression framework for the logit transforms of Se and Sp. In addition to accounting for study size, the bivariate model estimates and incorporates the intrinsic negative correlation that may arise between Se and Sp within studies (threshold effect) Reference Harbord and Whiting[27], as a result of differences in the test threshold between studies Reference Janda, Shahidi, Gin and Swiston[28]. The bivariate model allows for heterogeneity beyond chance as a result of clinical and methodological differences between studies Reference Janda, Shahidi, Gin and Swiston[28].
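For orientation, a bivariate random-effects model of this kind is commonly written as below. This is the generic textbook form, not a transcription of the MIDAS implementation; the study index i and the symbols μ and σ are notation assumed here.

```latex
\begin{align*}
TP_i &\sim \mathrm{Binomial}\bigl(TP_i + FN_i,\ Se_i\bigr), \\
TN_i &\sim \mathrm{Binomial}\bigl(TN_i + FP_i,\ Sp_i\bigr), \\
\begin{pmatrix} \operatorname{logit}(Se_i) \\ \operatorname{logit}(Sp_i) \end{pmatrix}
&\sim \mathcal{N}\!\left(
\begin{pmatrix} \mu_{Se} \\ \mu_{Sp} \end{pmatrix},
\begin{pmatrix}
\sigma_{Se}^{2} & \sigma_{Se,Sp} \\
\sigma_{Se,Sp} & \sigma_{Sp}^{2}
\end{pmatrix}
\right)
\end{align*}
```

In this form, the summary Se and Sp are obtained by back-transforming the two means from the logit scale, the two variances express heterogeneity beyond chance, and the covariance term carries the correlation between logit Se and logit Sp.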
We estimated the summary Se and Sp and the hierarchical SROC (summary receiver operating characteristic) curves [Reference Macaskill, Gatsonis, Deeks, Harbord, Takwoingi, Deeks, Bossuyt and Gatsonis23, Reference Higgins, Thompson, Deeks and Altman32]. For each predictor, an SROC graph, with the y-axis representing the predictor's Se and the x-axis representing 1–specificity, was used to plot a 95% confidence region and a 95% prediction region around the summary estimates. The confidence region (confidence ellipse of a mean) illustrates the precision with which the summary values were estimated, and the prediction region (prediction ellipse; the likely range of values for a new study) shows the amount of between-study variation. We also estimated the AUC (area under the curve), which serves as a global measure of test performance. Values in the range of 0.9–1 are considered outstanding, values between 0.8 and 0.9 excellent, and values between 0.7 and 0.8 acceptable Reference Hosmer, Lemeshow and May[29].
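The AUC bands above can be expressed as a small lookup. This is only a sketch: the function name is hypothetical, the treatment of the shared boundaries (0.8 and 0.9 assigned to the higher band) is an assumption, and the label for values under 0.7 is a placeholder because the text does not name that band.

```python
def classify_auc(auc: float) -> str:
    """Map an AUC value onto the descriptive bands quoted in the text."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie between 0 and 1")
    if auc >= 0.9:  # assumption: boundary values go to the higher band
        return "outstanding"
    if auc >= 0.8:
        return "excellent"
    if auc >= 0.7:
        return "acceptable"
    return "below acceptable"  # placeholder label, not from the source


for value in (0.95, 0.85, 0.75):
    print(value, classify_auc(value))
```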
Heterogeneity across studies was assessed using the I², with values of 25%, 50% and 75% representing mild, moderate and severe inconsistency, respectively Reference Lipsey and Wilson[30]. Within MIDAS, forest plots and heterogeneity statistics can be created for each test performance parameter individually or may be displayed as paired plots. Meta-regressions were used to examine the influence of mean age, gender (% females), sample size, and quality assessment (QUADAS) on meta-analytical estimates. Furthermore, we investigated the prognostic accuracy difference between CAARMS-based studies and studies employing the SIPS, as detected in the previous meta-analysis Reference Fusar-Poli, Cappucciati, Rutigliano, Schultze-Lutter, Bonoldi and Borgwardt[14]. To control for biases associated with imbalanced datasets Reference Bekkar, Djemaa and Alitouche[31], we further tested the impact of the proportion of CHR-P+ subjects in the overall samples. Meta-regressions were performed only if there was substantial heterogeneity (I² > 50%) Reference Higgins, Thompson, Deeks and Altman[32] and more than 10 studies were available.
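As an illustration of the statistic, the sketch below computes I² in its generic form from a Cochran-type Q and its degrees of freedom, then applies the 25%, 50% and 75% cut-offs. Whether MIDAS computes exactly this quantity for the bivariate model is not stated here, so the formula should be read as the standard definition rather than the software's routine; the function names, the example Q value and the "low" label for values under 25% are assumptions.

```python
def i_squared(q: float, df: int) -> float:
    """Generic I² (%): share of total variation beyond what chance would give."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def describe_inconsistency(i2: float) -> str:
    """Apply the 25/50/75% cut-offs quoted in the text (treated as lower bounds)."""
    if i2 >= 75:
        return "severe"
    if i2 >= 50:
        return "moderate"
    if i2 >= 25:
        return "mild"
    return "low"  # placeholder label, not from the source


# Hypothetical Q statistic, for illustration only.
value = i_squared(q=20.0, df=5)
print(f"I2 = {value:.1f}% ({describe_inconsistency(value)})")
```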
Sensitivity analyses (i.e., exclusion of outliers and rerunning of the model) were conducted to further explore heterogeneity. We did not test publication bias Reference Deeks, Macaskill and Irwig[33], because no proven statistical method exists for this type of meta-analysis Reference Wanders, East, Uitentuis, Leeflang and Dekker[34].
In a second step, we employed the probability-modifying plot to estimate the clinical or patient-relevant utility of the CAARMS in subjects seeking help at CHR-P services.
The clinical utility was evaluated using the positive and negative likelihood ratios (LR+ and LR–) to calculate the post-test probability (post-TP) based on Bayes’ theorem (with the pre-test probability, pre-TP, being the prevalence of the condition in the target population), as follows: post-TP = LR × pre-TP/[(1 – pre-TP) + (pre-TP × LR)] Reference Harbord and Whiting[27]. Specifically, the probability-modifying plot Reference Dwamena[25] is a graphical sensitivity analysis of the test's predictive values across a baseline psychosis risk continuum in people seeking help at CHR-P services. It depicts separate curves for positive and negative tests and uses general summary statistics (i.e., unconditional positive and negative predictive values, PPV and NPV, which permit underlying psychosis risk heterogeneity) to evaluate the effect of the CHR-P assessment on predictive values Reference Li, Fine and Safdar[35]. The pre-TP of psychosis in subjects seeking help at early detection services was computed in the current dataset as the proportion of subjects developing psychosis out of the total baseline sample (CHR-P+ plus CHR-P–) Reference Dwamena[25].
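The Bayes formula above can be sketched in a few lines, together with the kind of curve a probability-modifying plot traces across the pre-test risk continuum. The function names and the grid of pre-test values are hypothetical, and requiring a strictly positive LR is a simplification made here to avoid a zero denominator; this is not the MIDAS routine.

```python
def post_test_probability(pre_tp: float, lr: float) -> float:
    """post-TP = LR x pre-TP / [(1 - pre-TP) + (pre-TP x LR)]."""
    if not 0.0 <= pre_tp <= 1.0:
        raise ValueError("pre-test probability must lie between 0 and 1")
    if lr <= 0:
        raise ValueError("this sketch assumes a strictly positive LR")
    return lr * pre_tp / ((1.0 - pre_tp) + pre_tp * lr)


def probability_modifying_curve(lr: float, steps: int = 11):
    """Return (pre-TP, post-TP) pairs on an evenly spaced grid from 0 to 1."""
    grid = [i / (steps - 1) for i in range(steps)]
    return [(p, post_test_probability(p, lr)) for p in grid]


# One curve per test result; the LR values here are illustrative inputs.
for label, lr in (("positive test", 1.9), ("negative test", 0.25)):
    curve = probability_modifying_curve(lr, steps=6)
    print(label, [(round(p, 2), round(q, 3)) for p, q in curve])
```

An LR above 1 bends the curve above the diagonal (a positive result raises the probability), and an LR below 1 bends it below the diagonal.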
Statistical tests were two-sided and statistical significance was defined as P values < 0.05.
3. Results
3.1. Database
The literature search produced 6 independent studies [Reference Fusar-Poli, Rutigliano, Stahl, Davies, De Micheli and Ramella-Cravaro36–Reference Lee, Rekhi, Mitter, Bong, Kraus and Lam41] that met inclusion criteria, with a total of 1876 subjects (CHR-P+: n = 892; CHR-P–: n = 984) referred to clinical high risk services. The dataset was balanced, with CHR-P+ individuals composing 47.5% of the total subjects. The characteristics of the studies are reported in Table 1, while the PRISMA diagram is depicted in Fig. 1. The MOOSE checklist is reported in eTable 1. The detailed QUADAS assessment is reported in eTable 2 and eFig. 1.
3.2. Prognostic accuracy of the CAARMS at 2 years
The summary meta-analytical estimate of Se at 2 years (0.86, 95% CI = 0.76–0.92) was outstanding and the 2-year AUC (0.79, 95% CI = 0.75–0.83) was acceptable, but the estimate of 2-year Sp (0.55, 95% CI = 0.48–0.63) was poor (Fig. 2). Heterogeneity in this analysis was severe (I² = 93.28%, 95% CI = 89.42–97.15%).
3.3. Clinical utility of the CAARMS at 2 years
The 2-year psychosis transition risk in the 1876 subjects was 0.09 (95% CI = 0.05–0.13). On the basis of the prior distribution, the continuous relationship between pre-TP and post-TP is summarized in Fig. 3. Being CHR-P+ was associated with a 0.16 (95% CI = 0.10–0.22) risk of developing psychosis within 2 years, with a small LR+ of just 1.9 (95% CI = 1.5–2.4), while being CHR-P– was associated with a 0.03 (95% CI = 0.02–0.05) risk of transition to psychosis, with a moderate LR– of 0.25 (95% CI = 0.13–0.48).
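As a rough consistency check, the likelihood ratios and post-test probabilities can be approximated by hand from the rounded summary estimates (Se = 0.86, Sp = 0.55, pre-TP = 0.09). This is a back-of-envelope sketch using the standard definitions of LR+ and LR–, not the model-based computation that produced the reported values.

```latex
\begin{align*}
LR^{+} &= \frac{Se}{1 - Sp} = \frac{0.86}{0.45} \approx 1.91, \\
LR^{-} &= \frac{1 - Se}{Sp} = \frac{0.14}{0.55} \approx 0.25, \\
\text{post-TP}^{+} &= \frac{1.9 \times 0.09}{(1 - 0.09) + (0.09 \times 1.9)}
  = \frac{0.171}{1.081} \approx 0.16, \\
\text{post-TP}^{-} &= \frac{0.25 \times 0.09}{(1 - 0.09) + (0.09 \times 0.25)}
  = \frac{0.0225}{0.9325} \approx 0.02
\end{align*}
```

The likelihood ratios and the positive post-test probability agree with the reported figures. The negative post-test probability comes out slightly below the reported 0.03; the gap may reflect rounding of the inputs or the model-based estimation, which the text does not detail.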
3.4. Meta-regressions and sensitivity analyses
Sensitivity analysis suggested that one study Reference Lee, Rekhi, Mitter, Bong, Kraus and Lam[41] was influential, with a Cook's distance > 1. While we hypothesised that this was due to the study reporting 0 false negatives, we were unable to test the effect of false negatives through meta-regression owing to the low number of studies. Similarly, we were unable to perform meta-regressions for age, gender, QUADAS score or sample size, as there were fewer than 10 studies contributing data. As indicated in the methods, we were able to perform a meta-regression comparing the prognostic accuracy of the CAARMS with that of the SIPS, using only the studies reporting SIPS data [Reference Woods, Addington, Cadenhead, Cannon, Cornblatt and Heinssen42–Reference Simon, Grädel, Cattapan-Ludewig, Gruber, Ballinari and Roth46] identified in our previous study (n = 5, CHR-P+: n = 783; CHR-P–: n = 360). As indicated in Fig. 4, Se was significantly higher (P < 0.001) for the SIPS (n = 5, mean = 0.95, 95% CI = 0.91–0.99) than for the CAARMS (n = 6, mean = 0.87, 95% CI = 0.79–0.96), while Sp was comparably low (P = 0.27) in the SIPS (n = 5, mean = 0.45, 95% CI = 0.38–0.53) and in the CAARMS (n = 6, mean = 0.55, 95% CI = 0.48–0.62).
4. Discussion
This is the first meta-analysis specifically investigating the prognostic accuracy of the CAARMS for the prediction of psychosis. We found 6 studies that investigated prognostic accuracy of the CAARMS at two-year follow-up, which contributed a relatively large database of 1876 subjects overall, with 892 considered CHR-P+ and 984 CHR-P–. Prognostic accuracy of the CAARMS in terms of AUC was found to be only acceptable (0.79), mostly mediated by its substantial ability to rule out psychosis (i.e. LR– was relatively small and Se high). However, this was at the expense of ruling in psychosis (i.e. LR+ was small and Sp was poor). While prognostic accuracy was overall acceptable, this study indicates that refining the prediction of outcomes should be the key priority of future research in this field.
The primary aim of the study was to synthesize available data for the prognostic accuracy of the CAARMS in determining psychosis risk 2 years after young help-seeking subjects presented to CHR-P services. As noted in the introduction, our recent meta-analysis Reference Fusar-Poli, Cappucciati, Rutigliano, Schultze-Lutter, Bonoldi and Borgwardt[14] looked into the prognostic accuracy of CHR-P instruments as a collective. The current study advances knowledge indicating that the exact prognostic accuracy of the CAARMS alone is weaker (0.79) than the overall value previously observed when the CHR-P instruments were pooled together (0.90). Although not as outstanding as before, the AUC value here reported is still considered to be acceptable for a diagnostic test and is comparable to other prognostic tools used in different areas of medicine, such as the AUC = 0.76 attributed to the Cambridge risk score for pre-diabetes Reference Thomas, Hyppönen and Power[47]. In a similar fashion, we found that the Se (0.86) of the CAARMS alone was less impressive than the Se (0.96) of CHR-P instruments assessed in the previous meta-analysis. Interestingly, there was an apparent minor increase in Sp (0.55 for CAARMS alone compared to 0.47 for CHR-P instruments generally) Reference Fusar-Poli, Cappucciati, Rutigliano, Schultze-Lutter, Bonoldi and Borgwardt[14]. The lower AUC compared to the previous general estimate may reflect profound operationalization differences between the CAARMS and the other CHR-P instruments. 
For example, a comparative analysis between the CAARMS and the SIPS confirmed caseness discrepancies between the two instruments Reference Fusar-Poli, Cappucciati, Rutigliano, Lee, Beverly and Bonoldi[17], mostly due to different definition of brief limited intermittent psychotic cases, ascertainment of comorbidities Reference Fusar-Poli, Cappucciati, Rutigliano, Lee, Beverly and Bonoldi[17] and of functional level at intake Reference Fusar-Poli, Rocchetti, Sardella, Avila, Brandizzi and Caverzasi[18]. To directly test the effect of these differences on the prognostic performance, in the current study, we performed the first meta-analytical comparison of Se and Sp across the CAARMS and SIPS, using previously published SIPS data Reference Fusar-Poli, Cappucciati, Rutigliano, Schultze-Lutter, Bonoldi and Borgwardt[14]. We found that Se was higher in the SIPS compared to the CAARMS, while there were no substantial differences in Sp. Overall, it is unlikely that these differences may account for significant differences in the positive predictive values of the two instruments, as confirmed by previous meta-analyses in CHR-P+ samples Reference Fusar-Poli, Rocchetti, Sardella, Avila, Brandizzi and Caverzasi[18].
In a second step, we estimated the clinical utility of the CAARMS. As previously reported by our lab, clinical utility is not static but depends on the underlying pre-test risk in any given population [Reference Fusar-Poli, Rutigliano, Stahl, Schmidt, Ramella-Cravaro and Hitesh49–Reference Fusar-Poli, Palombini, Davies, Oliver, Bonoldi and Ramella-Cravaro51]. We found that being classified as CHR-P+ by the CAARMS is associated with a 16.4% risk of developing psychosis within 2 years, which is lower than the 29.1% 2-year transition risk previously reported Reference Fusar-Poli, Bonoldi, Yung, Borgwardt, Kempton and Valmaggia[52]. This was driven by a small LR+ (1.9), similar to the LR+ seen previously (1.82) Reference Fusar-Poli, Cappucciati, Rutigliano, Schultze-Lutter, Bonoldi and Borgwardt[14]. CHR-P– individuals had a 3.38% 2-year transition rate, driven by a moderate LR– (0.25), which was weaker (i.e. closer to 1) than the LR– for CHR-P assessments as a whole (0.09) Reference Fusar-Poli, Cappucciati, Rutigliano, Schultze-Lutter, Bonoldi and Borgwardt[14]. Taken together, these findings indicate that the merely acceptable prognostic accuracy is due to an imbalance between Se and Sp and between LR+ and LR–: the CAARMS is a valuable tool for correctly identifying individuals who will develop psychosis, but shows only modest ability to identify those who will not.
On a pragmatic level, the results of this meta-analysis show that the merely acceptable prognostic accuracy of the CAARMS needs improving through a refined assessment of psychosis risk. An improved detection of individuals who will transition would lead to improved clinical and research opportunities. For example, a greater proportion of true positives would lead to more efficient primary indicated prevention as well as a more homogeneous CHR-P group [Reference Fusar-Poli, Cappucciati, Borgwardt, Woods, Addington and Nelson48, Reference Fusar-Poli, Cappucciati, De Micheli, Rutigliano, Bonoldi and Tognin53] for developing putative treatments. This manuscript has the clinical potential to be the reference point for refining future versions of the CAARMS or for the development of refined prognostic tools and assessments. To improve prediction of psychosis, it seems necessary to tailor it to the individual level. To date, the CAARMS has considered CHR-P+ individuals as belonging to a single group. However, it is now clear that such an assumption is incorrect, given the profound difference in level of psychosis risk observed across different CHR-P+ subgroups [Reference Fusar-Poli, Cappucciati, Borgwardt, Woods, Addington and Nelson48, Reference Fusar-Poli, Cappucciati, De Micheli, Rutigliano, Bonoldi and Tognin53]. Furthermore, to date, psychosis prediction has been limited to the assessment and rating of CHR-P symptoms and signs. However, it is evident that these are only epiphenomena of underlying neurobiological and psychological processes that may characterize the onset of psychosis in vulnerable individuals. Research evidence in the field of risk and protective factors associated with an impending vulnerability to psychosis has accumulated over the past few decades and only recently has it been systematically assessed.
In a recent large-scale meta-analysis, our lab has stratified the level of evidence for associations between several risk or protective factors and established psychotic disorders Reference Ramella-Cravaro, Radua, Ionnidis, Reichenberg, Phiphopthatsanee and Amir[54]. This study may lay the groundwork for investigating how specific risk or protective factors accumulate in CHR-P+ individuals, explaining their increased liability to develop psychosis. In a first attempt by our lab Reference Fusar-Poli, Tantardini, De Simone, Ramella-Cravaro, Oliver and Kingdon[55], we reviewed forty-four studies encompassing 170 independent datasets and 54 risk/protective factors in CHR-P+ individuals. We showed that CHR-P+ individuals were more likely than controls to show obstetric complications, tobacco use, physical inactivity, childhood trauma/emotional abuse/physical neglect, high perceived stress, childhood and adolescent low functioning, affective comorbidities, male gender, single status, unemployment and low educational level. The differential accumulation of these factors in each CHR-P+ individual is likely to account for the different outcomes observed in these samples, such as psychosis onset, persistence of CHR-P+ features or remission. A refinement of psychosis prediction in these samples would inevitably require a careful investigation of these factors beyond the rating of severity and frequency of CHR-P+ symptoms as currently required by the CAARMS.
4.1. Limitations
Some limitations of this meta-analysis need to be acknowledged. Firstly, only 6 studies could be synthesised, and although they supplied a healthy number of subjects, statistical power could be questioned. However, conducting longitudinal studies in individuals assessed for a CHR-P state but not meeting intake criteria is logistically challenging and therefore only a few studies are currently available. Secondly, heterogeneity was very high and could potentially have been reduced through a greater pool of studies. Thirdly, this heterogeneity remains unexplained, as we were unable to perform meta-regressions because there were not enough studies.
5. Conclusion
The 2-year meta-analytical prognostic accuracy of the CAARMS in predicting psychosis is only acceptable. A refined prediction of psychosis risk is necessary to advance clinical research in this area.
Disclosure of interest
The authors declare that they have no competing interest.
Acknowledgements
DO is supported by the Medical Research Council.
Appendix A. Supplementary data
Supplementary data associated with this article can be found, in the online version, at https://doi.org/10.1016/j.eurpsy.2017.10.001.