Background
Psychosis risk has been extensively studied during the last few decades, with the aim of identifying people who are in the prodromal period for psychosis. Psychosis transition rates vary greatly among studies, with the validity of psychosis risk assessment depending on the population in which it is studied (Fusar-Poli et al., 2016b). Most psychosis risk research has focused on help-seeking patients in specialized psychosis risk care, a population that fulfills, and is enriched for, the criteria for clinical high-risk (CHR; Fusar-Poli et al., 2012; Schultze-Lutter et al., 2015). The predictive precision has been lower in more diverse populations, which limits the generalizability of findings from specialized clinics to general psychiatric practice (van Os & Guloksuz, 2017). In addition, non-CHR groups are often overlooked as psychiatric controls (Millman, Gold, Mittal, & Schiffman, 2019).
The most established methods of psychosis risk detection, Comprehensive Assessment of At-Risk Mental States (CAARMS; Yung et al., Reference Yung, Stanford, Cosgrave, Killackey, Phillips, Nelson and McGorry2006) and the Structured Interview for Prodromal Syndromes (SIPS; Miller et al., Reference Miller, McGlashan, Rosen, Cadenhead, Cannon, Ventura and Woods2003), assess various symptom domains. However, only positive symptoms are actually used for determining psychosis risk status, in addition to a combination of genetic risk and functional level, which are also components of CHR syndrome definitions. Some studies have investigated the predictiveness of symptoms from the other domains – negative and disorganized symptoms (Addington & Heinssen, Reference Addington and Heinssen2012) – and various psychosis prediction models have been developed combining positive symptoms or CHR status with the other symptom dimensions (Studerus, Ramyead, & Riecher-Rössler, Reference Studerus, Ramyead and Riecher-Rössler2017). Since impairments in neurocognitive abilities might be connected to psychosis transition (Bora et al., Reference Bora, Lin, Wood, Yung, McGorry and Pantelis2014), these are also included in some models. Other commonly suggested predictors of psychosis include substance abuse, trauma exposure, and various biological markers (Studerus et al., Reference Studerus, Ramyead and Riecher-Rössler2017).
As fulfilling criteria for psychosis can be seen as an arbitrary line on the psychosis continuum, transition to a psychotic disorder as the sole outcome of interest has been criticized (van Os & Guloksuz, Reference van Os and Guloksuz2017), especially in the context of positive symptoms. Although positive symptoms at the CHR level may be precursors of psychosis, especially milder positive symptoms may more often be unspecific risk markers expressing general psychiatric vulnerability (Healy et al., Reference Healy, Brannigan, Dooley, Coughlan, Clarke, Kelleher and Cannon2019; Jeppesen et al., Reference Jeppesen, Clemmensen, Munkholm, Rimvall, Rask, Jørgensen and Skovgaard2015). Therefore, in addition to the psychosis outcome, the existing CHR assessment tools are also being used to predict unspecific outcomes such as everyday level of functioning or hospitalization – which they may predict better than psychosis (Carrión et al., Reference Carrión, McLaughlin, Goldberg, Auther, Olsen, Olvet and Cornblatt2013; Cotter et al., Reference Cotter, Drake, Bucci, Firth, Edge and Yung2014; Lin et al., Reference Lin, Wood, Nelson, Brewer, Spiliotacopoulos, Bruxner and Yung2011). In previous prediction algorithms applied to young people at CHR, both psychosis transition and functioning outcomes have repeatedly been best predicted by suspicion/paranoia, delusions, and social functional decline (Worthington, Cao, & Cannon, Reference Worthington, Cao and Cannon2019). Furthermore, we have previously found that, in a general adolescent psychiatric population, the ability of the dichotomous CHR status to predict psychosis was poor; positive symptom intensity predicted psychotic disorders, whereas CHR status predicted psychiatric hospitalization (Lindgren et al., Reference Lindgren, Manninen, Kalska, Mustonen, Laajasalo, Moilanen and Therman2014). 
We now use a longer follow-up and previously published prediction algorithms derived from other samples to predict clinical outcomes in a confirmatory manner.
In the current study, the performance of some of the previously suggested psychosis prediction models was tested to determine the generalizability of previous findings to clinical practice (1) in a psychiatric sample consisting of both CHR and non-CHR adolescents, and (2) in the CHR subsample. In addition, the models were used to predict functional outcome, operationalized as first psychiatric hospitalization. Finally, we also fitted exploratory models to identify the best predictors in the current sample.
Methods
Participants and study procedure
The Helsinki Prodromal Study is a prospective psychosis risk study among adolescents in psychiatric care. All 15‒18-year-olds entering any public psychiatric clinic or ward in the city of Helsinki during a 3-year period in 2003‒2004 and 2007‒2008 were invited to fill out the Finnish version of the Prodromal Questionnaire (PQ; Loewy, Bearden, Johnson, Raine, & Cannon, 2005) on their first or second appointment. A total of 819 questionnaires (~75% of the eligible participants in psychiatric treatment) were completed (Therman et al., 2014). Those who had a current or previous psychotic disorder at baseline or were unable to communicate in Finnish were excluded from the sample.
The review boards of the Finnish Institute for Health and Welfare and the Ethics Committee of the Hospital District of Helsinki and Uusimaa approved the study procedure. The study was carried out in accordance with the sixth version of the Declaration of Helsinki (World Medical Association, 2000). The PQ was part of the standard assessment in the units. Those entering the in-depth assessment gave written informed consent.
In-depth assessment
All 145 adolescents who scored 18 or higher on the Positive Symptoms subscale of the PQ, indicating elevated psychosis risk (Loewy et al., Reference Loewy, Bearden, Johnson, Raine and Cannon2005), were invited to the in-depth baseline assessment, together with 87 block-randomized adolescents who scored below this cut-off. Of these invited participants, 174 adolescents completed the in-depth assessment, which included extensive neuropsychological testing (Lindgren et al., Reference Lindgren, Manninen, Laajasalo, Mustonen, Kalska, Suvisaari and Therman2010), of which the current study utilizes raw scores of WAIS-III Matrix Reasoning (Wechsler, Reference Wechsler1997), measuring non-verbal reasoning, as well as WMS-R Visual Reproduction I (Wechsler, Reference Wechsler1987), measuring visual episodic memory. In addition, the adolescents taking part in the in-depth assessment were interviewed with the Finnish translation of the SIPS 3.0 (Miller et al., Reference Miller, McGlashan, Rosen, Cadenhead, Cannon, Ventura and Woods2003). The SIPS also includes the modified Global Assessment of Functioning (GAF-M) scale, rated from 1 to 100 (Miller et al., Reference Miller, McGlashan, Rosen, Cadenhead, Cannon, Ventura and Woods2003). Medical records preceding the psychiatric treatment were available for most interviewed participants (97.7%). The Global Functioning: Social scale (Cornblatt et al., Reference Cornblatt, Auther, Niendam, Smith, Zinberg, Bearden and Cannon2007) was rated from 1 to 10 based on interview data and medical records. Substance abuse and duration of psychiatric symptoms before treatment contact were also rated based on all available information. All interview- and record-based ratings were made by two or more trained reviewers independently, and differences resolved by consensus.
Follow-up
Follow-up information was collected from the national Care Register for Health Care, which includes psychiatric treatment in any public outpatient or inpatient clinic or ward in Finland. As the last participants were enrolled in the year 2008 and the data on the use of psychiatric services were available until the end of the year 2015, a 7-year follow-up was possible for all participants. A psychosis outcome was defined as a psychotic disorder diagnosis of ICD-10 codes F20, F22–F29, F30.2, F31.2, F31.5, F32.3, or F33.3. The psychiatric hospitalization outcome was defined as staying in a psychiatric hospital or in any hospital with a primary psychiatric diagnosis during the follow-up period, excluding those with any such hospitalizations before or at the time of the baseline assessment.
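The psychosis outcome definition above can be written as a small lookup. The following Python sketch is illustrative only: the function name is hypothetical, and it assumes register codes are stored as strings with a dot before the fourth character (for example `F32.3`), which the text does not specify.

```python
# Hypothetical helper; assumes dotted ICD-10 strings such as "F20.0" or "F32.3".
AFFECTIVE_PSYCHOSIS_CODES = ("F30.2", "F31.2", "F31.5", "F32.3", "F33.3")
# F20 plus the block F22-F29; F21 is not part of the listed outcome codes.
SCHIZOPHRENIA_SPECTRUM_BLOCKS = {"F20"} | {f"F{n}" for n in range(22, 30)}


def is_psychosis_outcome(icd10_code: str) -> bool:
    """Return True if the code belongs to the psychosis outcome set named in the text."""
    code = icd10_code.strip().upper()
    if code.startswith(AFFECTIVE_PSYCHOSIS_CODES):
        return True
    return code[:3] in SCHIZOPHRENIA_SPECTRUM_BLOCKS
```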
Prediction models
The prediction models were selected from exhaustive reviews by Studerus et al. (Reference Studerus, Ramyead and Riecher-Rössler2017) and Montemagni, Bellino, Bracale, Bozzatello, and Rocca (Reference Montemagni, Bellino, Bracale, Bozzatello and Rocca2020). They accepted studies that developed or validated a multivariable prediction model on psychosis transition among people estimated to be at high risk for psychosis, some of the studies proposing more than one prediction model. In the current study we did not test models from small samples (N < 100) or from studies finding no evidence supporting any tested model. As we were only able to test models with variables available in the Helsinki Prodromal Study, we included 19 prediction models (Table 1).
a In the Ruhrmann model, education was left out because of lack of variance in the current adolescent sample. Schizotypal personality disorder criteria were not met by anyone in the sample, so a cutoff of at least three out of nine schizotypal personality features was used instead (11 adolescents meeting this criterion). The model included a criterion of sum of positive symptoms >16, but as only one person fulfilled this criterion, the model was run with this predictor replaced by the SIPS positive symptoms sum (used in other models) and these results are presented as ‘Ruhrmann modified’.
b In the Velthorst model, GAF < 61 predicted psychosis in the opposite direction to that intended, so a ‘Velthorst modified’ model was also calculated without this predictor.
c The Walder model originated from a male sample, but here it was used for both genders.
If the original model used CAARMS or Scale for the Assessment of Negative Symptoms (SANS) scales, we used the SIPS equivalent and thus, for example, CAARMS ‘Unusual thought content’ was substituted with SIPS scale ‘P1 Unusual thought content and delusional ideas’, and SANS ‘Attention’ was considered equivalent to SIPS scale ‘D3 Trouble with focus and attention’.
Some models used an equation such as the sum of predictor scores; these models were tested both as the equation and with the predictors entered individually (Perkins et al., 2015; Ruhrmann et al., 2010; Thompson, Nelson, & Yung, 2011). Furthermore, the prediction model proposed by Ruhrmann et al. (2010) included a criterion of positive symptoms >16; the value of this parameter, however, was questionable in our sample, as only one person scored this high. The model was therefore also estimated with this dichotomous predictor replaced by the SIPS positive symptoms sum, presented separately as a modified model. The models used by Velthorst et al. (2013) used both baseline functioning level and decline in functioning level over the follow-up, controlling for symptoms. However, we only tested the model with baseline functioning and symptom levels, as functioning decline at the outcome time point is confounded with the outcome itself. We also estimated the model without the general functioning variable (Velthorst modified model), because the estimated coefficient of that variable had the opposite sign to that of the original publication. In addition, information on alcohol use in our sample was limited to alcohol use and dependency, and we coded their absence as ‘low alcohol use’ for testing the model by Buchy, Perkins, Woods, Liu, and Addington (2014), which used a wider spectrum of severity of alcohol use. For two models the required factor score variables were estimated with confirmatory factor analysis (online Supplement S1). As Cannon et al. (2008) had published a wide variety of models without stated preference, we selected the best combinations of both three and four predictors, as measured by the published hazard ratios.
Statistical analysis
The confirmatory factor analyses of two models (Demjaha, Valmaggia, Stahl, Byrne, & McGuire, 2012; Raballo, Nelson, Thompson, & Yung, 2011) were fitted using Mplus 8.3 (Muthén & Muthén, 2017), as described in detail in online Supplement S1.
The performance of each previously proposed model in predicting psychosis transitions and first lifetime hospitalization was tested with penalized (Firth) logistic regression using the logistf package (version 1.24; Heinze and Ploner, Reference Heinze and Ploner2018) in R (version 4.0.4; R Core Team, 2021), with standardized predictors. This method is suitable for small samples. The original coefficients reported by the previous studies were used to calculate the risk predictions when this information was available. In almost all cases, however, the original study had not reported the individual coefficients, and we estimated coefficients based on the current data. To make model comparison fair, we additionally estimated coefficients using the variables of the one model (Ruhrmann et al., Reference Ruhrmann, Schultze-Lutter, Salokangas, Heinimaa, Linszen, Dingemans and Klosterkötter2010) for which we had the original coefficients. For comparison purposes, we also tested how well CHR status alone (as assessed by the SIPS) predicted the outcomes, and how well previous hospitalization alone predicted rehospitalization during follow-up (in this analysis, those with previous hospitalizations were not excluded, in contrast to all other hospitalization analyses). All regression models using the whole sample were weighted to compensate for participation rate and under-sampling of questionnaire screen-negatives and males (online Supplement S2; Lindgren et al., Reference Lindgren, Manninen, Kalska, Mustonen, Laajasalo, Moilanen and Therman2014). As all the models have been developed within CHR samples, we repeated the analyses in the subsample of CHR participants.
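To illustrate what the Firth penalty does, the following is a minimal, unweighted Python sketch of Firth-type penalized logistic regression. It is not the logistf implementation used in the study: the function names are hypothetical, sample weights and predictor standardization are omitted, and the design matrix is assumed to already contain an intercept column. The estimate is obtained by iterating on the score function modified with the hat-matrix diagonals.

```python
import math


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def firth_logistic(X, y, max_iter=100, tol=1e-10, max_step=5.0):
    """Firth-penalized logistic regression; rows of X must include the intercept."""
    n, k = len(X), len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        prob = [1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row))))
                for row in X]
        w = [p * (1.0 - p) for p in prob]
        # Fisher information I = X' W X
        info = [[sum(w[i] * X[i][j] * X[i][m] for i in range(n)) for m in range(k)]
                for j in range(k)]
        # Hat-matrix diagonals h_i = w_i * x_i' I^{-1} x_i
        hat = [w[i] * sum(a * z for a, z in zip(X[i], solve(info, X[i])))
               for i in range(n)]
        # Firth-modified score: sum_i x_i * (y_i - p_i + h_i * (0.5 - p_i))
        score = [sum(X[i][j] * (y[i] - prob[i] + hat[i] * (0.5 - prob[i]))
                     for i in range(n)) for j in range(k)]
        step = solve(info, score)
        biggest = max(abs(s) for s in step)
        if biggest > max_step:  # cap very large steps for stability
            step = [s * max_step / biggest for s in step]
        beta = [b + s for b, s in zip(beta, step)]
        if biggest < tol:
            break
    return beta
```

With a single binary predictor, this estimate coincides with the log odds ratio obtained after adding 0.5 to each cell of the 2 × 2 table, so the coefficient stays finite even when one cell is empty, which is the situation that makes ordinary maximum likelihood fail in small samples.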
The logarithmic odds ratio is presented for each unstandardized predictor. Akaike information criterion (AIC) values are reported to compare the models, representing the models' quality in the current sample, with a lower AIC value indicating a better fit. The AIC was chosen over the Bayesian information criterion on the recommendation of Vrieze (Reference Vrieze2012). In addition, the discriminative ability of models was quantified with the weighted area under the curve (AUC) statistics, as calculated with the R package PRROC (version 1.3.1; Grau, Grosse, and Keilwagen, Reference Grau, Grosse and Keilwagen2015), and secondarily with Cohen's κ coefficient, obtained with the cohen.kappa function of the R package psych (version 2.0.12; Revelle, Reference Revelle2021), as presented in the online Supplementary workbook.
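As a reminder of what the two main comparison statistics measure, the sketch below computes the AIC from a log-likelihood and a weighted AUC as the weighted probability that a randomly chosen case receives a higher predicted value than a randomly chosen non-case. This is a generic pairwise definition written in Python for illustration; it is not the PRROC algorithm, and the function names are hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values indicate a better fit."""
    return 2 * n_params - 2 * log_likelihood


def weighted_auc(scores, labels, weights):
    """Weighted share of case/non-case pairs ranked correctly (ties count half)."""
    num = den = 0.0
    for s1, y1, w1 in zip(scores, labels, weights):
        if y1 != 1:
            continue
        for s0, y0, w0 in zip(scores, labels, weights):
            if y0 != 0:
                continue
            pair_weight = w1 * w0
            den += pair_weight
            if s1 > s0:
                num += pair_weight
            elif s1 == s0:
                num += 0.5 * pair_weight
    return num / den
```

Setting all weights to 1 recovers the ordinary AUC, so the weighted and unweighted results can be compared with the same function.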
For diagnostic performance assessment, cut-offs for the predicted values of the logistic regressions were set to obtain a fraction of test-positives equal to or higher than the outcome prevalence at follow-up (prevalence matching), with the exception of models where this would have resulted in no screen-negatives, in which case we chose the value corresponding to the closest lower available prevalence with that combination of variables. Setting the cut-off in this way without post-hoc data-driven optimization reduces bias (Ewald, Reference Ewald2006) and makes models more comparable. Net benefit or relative utility were considered outside the scope of this paper. Calculations based on the confusion matrix were made both with and without the Haldane correction (0.5 observations added to each cell), and with and without sample weights, with the corrected and weighted results considered primary (Brown & White, Reference Brown and White2005). Diagnostic odds ratios (DORs) are reported as the primary single indicator of diagnostic performance, with distribution separation (Cohen's d) and the phi coefficient rφ as secondary indicators.
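The cut-off rule and the corrected DOR can be sketched as follows. This Python example is a simplified, unweighted illustration with hypothetical function names; it does not handle the exception described above for models that would otherwise produce no screen-negatives, and it assumes outcomes are coded 0/1.

```python
import math


def prevalence_matched_cutoff(predicted, outcome):
    """Highest cut-off whose test-positive fraction is at least the outcome prevalence."""
    n = len(predicted)
    prevalence = sum(outcome) / n
    for cutoff in sorted(set(predicted), reverse=True):
        if sum(p >= cutoff for p in predicted) / n >= prevalence:
            return cutoff
    return min(predicted)


def diagnostic_odds_ratio(predicted, outcome, cutoff, haldane=True):
    """DOR = (TP * TN) / (FP * FN), optionally adding 0.5 to every cell."""
    tp = fp = fn = tn = 0.0
    for p, y in zip(predicted, outcome):
        if p >= cutoff:
            if y == 1:
                tp += 1
            else:
                fp += 1
        else:
            if y == 1:
                fn += 1
            else:
                tn += 1
    if haldane:
        tp, fp, fn, tn = (cell + 0.5 for cell in (tp, fp, fn, tn))
    if fp * fn == 0:
        # Without the correction an empty FP or FN cell leaves the ratio
        # undefined; it is treated as infinite in this sketch.
        return math.inf
    return (tp * tn) / (fp * fn)
```

For example, with ten participants, two events, and one event among the two highest predicted values, the corrected DOR is (1.5 × 7.5) / (1.5 × 1.5) = 5.0, compared with 7.0 without the correction.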
Finally, we performed exploratory analyses using all the predictors of the proposed models (online Supplement S3), with the exception of the two cognitive variables, as including them would have reduced the number of participants to an unacceptably low level for the exploratory modeling. As these analyses required complete data, there were 141 cases available for the psychosis outcome and 120 for the hospitalization outcome. As recommended by Studerus et al. (2017), we employed LASSO for these exploratory analyses. Specifically, we used the function glmnet of the R package glmnet (version 4.1-1; Friedman, Hastie, & Tibshirani, 2010), fitting weighted logistic regression models and allowing up to four coefficients, which was the maximum used in the previously published models, with the exception of the five-parameter model of Ruhrmann et al. (2010). All other analysis parameters were left at their defaults.
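The idea of a LASSO path limited to a maximum number of predictors can be illustrated with a small Python sketch. This is not glmnet: it swaps in a plain proximal-gradient solver for the L1-penalized logistic objective, omits sample weights, and uses hypothetical function names, a fixed step size, and a user-supplied grid of λ values. Predictors are assumed to be standardized.

```python
import math


def soft_threshold(value, threshold):
    """Shrink a value toward zero by the threshold (the L1 proximal step)."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)


def lasso_logistic(X, y, lam, n_iter=2000, lr=0.1):
    """L1-penalized logistic regression with an unpenalized intercept."""
    n, k = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * k
    for _ in range(n_iter):
        resid = []
        for row, yi in zip(X, y):
            eta = b0 + sum(b * x for b, x in zip(beta, row))
            resid.append(1.0 / (1.0 + math.exp(-eta)) - yi)
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(k)]
        b0 -= lr * sum(resid) / n
        beta = [soft_threshold(b - lr * g, lr * lam) for b, g in zip(beta, grad)]
    return b0, beta


def path_with_max_predictors(X, y, lambdas, max_nonzero=4):
    """Return (lam, b0, beta) for the smallest lambda with at most max_nonzero predictors."""
    best = None
    for lam in sorted(lambdas, reverse=True):
        b0, beta = lasso_logistic(X, y, lam)
        if sum(b != 0.0 for b in beta) > max_nonzero:
            break
        best = (lam, b0, beta)
    return best
```

Walking the path from large to small λ shows the order in which predictors enter the model, which is the information plotted against λ in a coefficient path figure.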
Results
The sample consisted of 153 adolescents who were starting psychiatric treatment (Fig. 1), most often with a mood disorder diagnosis. Of these participants, 120 (78.4%) were girls (which is representative of the gender distribution of patients in these units). The mean age was 16.5 years (range 15–18), and 10 (6.5%) were inpatients at baseline, the rest being outpatients. Cognitive data were available for 146 adolescents and this smaller sample was used when testing models with cognitive variables.
During the 7-year follow-up, 18 (11.8%) adolescents were diagnosed with a psychotic disorder (11 of whom were CHR at baseline) and 25 (16.3%) had their first psychiatric hospitalization. We predicted these outcomes with separate models. We then applied the same models to the CHR subsample of 50 adolescents (of whom 40 [80.0%] were girls), predicting the 11 psychotic disorders and 11 first psychiatric hospitalizations among them.
Psychosis transition
Most of the models predicted psychosis better than the CHR status alone (Table 2). There were considerable differences in predictive ability between the models, with one of the best performing models using the positive symptom sum score only (Auther et al., 2012). The sample-fitted Velthorst model had good AIC and AUC values, but poor general functioning (GAF < 61) predicted not developing psychosis, that is, in the opposite direction of the original model. The modified Velthorst model without the GAF predictor (using merely sums of positive and negative symptoms) was still the best performing model. Other models with good AIC values consisted of the positive symptom sum score and the social functioning scale (Walder et al., 2013), or of the negative symptoms sum score (Piskulic et al., 2012).
AIC, Akaike information criterion; AUC, area under the curve; APS, Attenuated Psychotic Symptoms risk group; CHR, clinical high-risk; GAF, Global Assessment of Functioning; GRD, genetic risk and deterioration (functional decline) risk group; HR, hazard ratio; log OR, logarithmic odds ratio; SIPS, Structured Interview for Prodromal Syndromes.
a For the whole model. The five best values in the AIC and AUC columns are in boldface.
b Models presented as an equation such as sum of the predictor scores were used both as an equation as well as using the predictors individually.
c The Ruhrmann model included a criterion of sum of positive symptoms >16, but as only one person fulfilled this criterion, the model was run with this predictor replaced by the SIPS positive symptoms sum (used in other models) and these results are presented as ‘Ruhrmann modified’ model.
d Ruhrmann equation: (1.571 × SIPS positive symptoms sum >16) + (0.865 × SIPS D2 >2) + (0.793 × SIPS G1 >2) + (1.037 × SIPS schizotypal personality) + (0.033 × (100 − GAF 12 month max. − 34.64)).
e SIPS negative symptoms sum, SIPS positive symptoms sum, SIPS D2 >2, SIPS positive symptoms sum >16.
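The Ruhrmann equation in footnote d can be expressed as a short function. This Python sketch is a literal transcription of the footnote, without the education term that was left out in this sample; the function and argument names are hypothetical, the dichotomous criteria are assumed to contribute 1 when met and 0 otherwise, and the functioning term is read as (100 − GAF) − 34.64.

```python
def ruhrmann_score(positive_sum, d2, g1, schizotypal, gaf_12m_max):
    """Linear score from footnote d; each criterion counts 1 if met, else 0."""
    return (
        1.571 * (positive_sum > 16)      # SIPS positive symptoms sum > 16
        + 0.865 * (d2 > 2)               # SIPS D2 (bizarre thinking) > 2
        + 0.793 * (g1 > 2)               # SIPS G1 (sleep disturbance) > 2
        + 1.037 * bool(schizotypal)      # schizotypal personality criterion met
        + 0.033 * (100 - gaf_12m_max - 34.64)  # highest GAF in the past 12 months
    )
```

For the ‘Ruhrmann modified’ variant described in footnote c, the first term would instead use the SIPS positive symptoms sum itself with a coefficient estimated from the sample, so the published weight of 1.571 no longer applies.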
The discriminability order of the prediction models differed somewhat from that of the model fit, with a natural advantage for multi-predictor models. Discriminative ability also varied considerably between the models. The highest AUC estimates between 0.71 and 0.75 were observed for models that used the presence of marked positive symptoms, bizarre thinking, sleep disturbance, and schizotypal personality, along with the general functioning score (Ruhrmann et al., Reference Ruhrmann, Schultze-Lutter, Salokangas, Heinimaa, Linszen, Dingemans and Klosterkötter2010); intensity of delusions, suspiciousness, decreased ideational richness, and trouble with attention (Perkins et al., Reference Perkins, Jeffries, Cornblatt, Woods, Addington, Bearden and McGlashan2015); or the negative symptoms sum score (Piskulic et al., Reference Piskulic, Addington, Cadenhead, Cannon, Cornblatt, Heinssen and McGlashan2012). The original and modified Velthorst models (2013) were also among the models with best AUC values. Figure 2 shows the discrepancy between AIC and AUC values of the models in predicting psychosis, using the whole sample.
When the prevalence-matched cut-offs were applied, the most effective classifier, the Thompson et al. (2011) single-criterion model, reached a DOR of 10, with a sensitivity of 26% and a specificity of 97%. For comparison, the geometric mean of the other models' DORs was 4.8, and the SIPS CHR criterion had a DOR of 3.7 (online Supplementary workbook).
The models performed a little better in the CHR subsample than in the unselected psychiatric sample. In the CHR subsample, the two best models had both the best model fit and high discriminability (AUC between 0.81 and 0.83), using the disorganization symptoms latent factor (Raballo et al., Reference Raballo, Nelson, Thompson and Yung2011) and delusions, social functioning, and substance abuse (Cannon et al., Reference Cannon, Cadenhead, Cornblatt, Woods, Addington, Walker and Heinssen2008), respectively. In addition, among the best models were those consisting of the sums of positive and negative symptoms (modification of Velthorst et al., Reference Velthorst, Nelson, Wiltink, De Haan, Wood, Lin and Yung2013), delusion intensity and visual reasoning score (Lin et al., Reference Lin, Yung, Nelson, Brewer, Riley, Simmons and Wood2013), or positive symptoms, bizarre thinking, sleep disturbance, schizotypal personality, and functioning (Ruhrmann et al., Reference Ruhrmann, Schultze-Lutter, Salokangas, Heinimaa, Linszen, Dingemans and Klosterkötter2010).
With cut-offs specified (matching the 22% transition rate in this subsample), the Perkins et al. (2015) model with sample-estimated parameters obtained a DOR of 13, in contrast to a geometric mean of 5.1 for the other models.
First hospitalization
Hospital care for any psychiatric disorder was best explained by models including presence of marked positive symptoms, bizarre thinking, sleep disturbance, schizotypal personality, and functioning (Ruhrmann et al., Reference Ruhrmann, Schultze-Lutter, Salokangas, Heinimaa, Linszen, Dingemans and Klosterkötter2010), or delusion intensity and nonverbal reasoning score (Lin et al., Reference Lin, Yung, Nelson, Brewer, Riley, Simmons and Wood2013), both models having the best AIC and AUC values. In addition, the model with the disorganized symptoms latent factor (Raballo et al., Reference Raballo, Nelson, Thompson and Yung2011) had one of the best AIC values, and a high AUC value was estimated for a model using intensity of delusions, suspiciousness, decreased ideational richness, and trouble with attention (Perkins et al., Reference Perkins, Jeffries, Cornblatt, Woods, Addington, Bearden and McGlashan2015). However, the existing models had quite poor predictive abilities. As a comparison, previous hospitalization explained rehospitalizations better than most of these models predicted the first hospitalization (online Supplement S4).
When applied only to the CHR subsample, hospitalizations in our sample were best predicted by positive symptoms, bizarre thinking, sleep disturbance, schizotypal personality, and functioning (Ruhrmann et al., 2010), disorganized communication (DeVylder et al., 2014), or delusions and nonverbal reasoning (Lin et al., 2013) (online Supplement S4). As assessed with the AUC, models consisting of delusions, suspiciousness, decreased ideational richness, and trouble with attention (Perkins et al., 2015), or genetic risk and deterioration (GRD) syndrome, delusions, suspiciousness, and substance abuse (Cannon et al., 2008) were also among the most discriminative models.
Exploratory models
The exploratory model results can be found in the online Supplement S3. Online Supplementary Fig. S1 illustrates the standardized coefficients predicting psychosis and hospitalizations with the LASSO at the relevant ranges of λ.
Psychosis transition was best predicted by the SIPS negative symptoms sum score, the SIPS positive symptoms sum score, having a positive symptoms sum score >16, and the presence of bizarre thinking (using a cut-off score of >2 on SIPS D2). Unstandardized coefficients gave similar results. The performance of the one-, two-, and three-predictor exploratory models can be found in online Supplement S3. The four- and three-parameter exploratory models were not substantially better than the two-parameter model including only the negative and positive symptom sum scores (all AUC = 0.72 and DOR = 10) (Table 2; online Supplementary workbook).
Hospitalizations were predicted by low functioning (using a cut-off score of <50 on the GAF), the Demjaha et al. negative symptom factor score, presence of delusions, and sleep problems (using a cut-off of >2 in SIPS G1) (online Supplement S3). The three- and four-predictor exploratory models had AUC values of 0.70 and 0.72, respectively, and the same DOR of 3.1 (online Supplement S4 and Supplementary workbook).
Conclusions
This study tested previously published psychosis prediction models chosen from the exhaustive reviews by Studerus et al. (2017) and Montemagni et al. (2020). Numerous prediction models have been suggested to improve prediction of psychosis incidence beyond CHR status alone, using different combinations of symptoms, functioning, neurocognition, and substance use. However, these models have seldom been tested or externally validated. The models have been developed in CHR clinic settings, and we wanted to test their generalizability in a general adolescent psychiatric context.
The models predicted psychosis better than CHR status alone, but not as well as in the original studies. Comparison is difficult, however, as the overall predictive model accuracy has seldom been reported. The best models used sum of positive and negative symptoms (Velthorst et al., Reference Velthorst, Nelson, Wiltink, De Haan, Wood, Lin and Yung2013) or just the positive symptom sum score (Auther et al., Reference Auther, McLaughlin, Carrión, Nagachandran, Correll and Cornblatt2012). Among other best performing models in our sample were the ones with severity of positive symptoms and level of social functioning (Walder et al., Reference Walder, Holtzman, Addington, Cadenhead, Tsuang, Cornblatt and Walker2013) or severity of negative symptoms (Piskulic et al., Reference Piskulic, Addington, Cadenhead, Cannon, Cornblatt, Heinssen and McGlashan2012) as predictors. Additionally, the best discriminating models used the severity of positive symptoms, bizarre thinking, level of functioning, sleep problems, and schizotypal traits (Ruhrmann et al., Reference Ruhrmann, Schultze-Lutter, Salokangas, Heinimaa, Linszen, Dingemans and Klosterkötter2010), or delusions, suspiciousness, decreased ideational richness, and trouble with attention (Perkins et al., Reference Perkins, Jeffries, Cornblatt, Woods, Addington, Bearden and McGlashan2015). Many of the best models thus avoided dichotomization of baseline variables and explicitly included the severity of positive and negative symptoms. Our results are in line with a recent review, summing prognostic evidence among CHR individuals, and highlighting the significance of baseline severity of positive and negative symptoms as well as functional level as predictors of psychosis (Fusar-Poli et al., Reference Fusar-Poli, Salazar De Pablo, Correll, Meyer-Lindenberg, Millan, Borgwardt and Arango2020). 
The highest discriminability estimates were in the range of AUC = 0.71–0.82, which exceeds the threshold above which a prediction model can be considered clinically useful (AUC = 0.7 or even AUC = 0.8; Schummers, Himes, Bodnar, & Hutcheon, 2016).
Because a psychosis risk state may be more indicative of imminent deterioration in functional outcome or severity of general psychopathology than transition to psychosis, prediction of first psychiatric hospitalization was additionally used as a clinically relevant proxy. ‘Psychosis risk symptoms’, especially in their milder form, are not specific to the psychosis prodrome, but appear as markers of unspecific psychiatric symptomatology and lowered functioning (Healy et al., 2019; Trotta et al., 2020; Werbeloff et al., 2012; Wigman et al., 2012). Predicting transdiagnostic functional outcomes of these symptoms may therefore be useful in detecting those most in need of help. Although the models were originally not used to predict hospitalization, they seemed to predict this kind of nonspecific functional decline to a certain extent. The proposed risk indicator combinations that predicted psychosis best were often also among the best in predicting hospitalization for psychiatric disorder: positive symptoms, bizarre thinking, sleep disturbance, schizotypal personality, and general functioning (Ruhrmann et al., 2010); delusional ideas and visual reasoning (Lin et al., 2013); the disorganization symptom factor (Raballo et al., 2011); or delusions, suspiciousness, decreased ideational richness, and attention (Perkins et al., 2015).
Although baseline negative symptoms emerged as significant predictors of psychosis, they did not seem to be associated with later hospitalizations in the same way. The best single predictor of rehospitalization was, unsurprisingly, previous hospitalization.
Psychosis risk predictions may work at an acceptable level only in very enriched populations, diminishing their usefulness in clinical practice (van Os & Guloksuz, Reference van Os and Guloksuz2017). The original models predicted psychosis specifically in CHR populations suspected to have psychosis risk symptoms already at referral, and many false positives have been anticipated to emerge if the models are used in non-selected clinical populations (Ruhrmann et al., Reference Ruhrmann, Schultze-Lutter, Salokangas, Heinimaa, Linszen, Dingemans and Klosterkötter2010). The current sample consisted of adolescents in their initial psychiatric help-seeking phase, who were at somewhat heightened risk for psychosis based on the questionnaire screening, but also included screen-negatives, and the use of weights allows for generalization to the whole base population of adolescents in psychiatric care. Psychosis outcomes are less likely within this sample than in psychosis risk clinics, and it was thus expected that the accuracy of the models would be reduced. However, we also tested the predictiveness of the models in our CHR subsample, and the models still performed worse than in the samples they were derived from, even though their parameters were estimated from the current sample. Among the CHR adolescents, a common feature of the best performing models in predicting psychosis was the inclusion of the intensity of positive symptoms, especially delusional thought. Psychotic illnesses in the CHR group were best explained by models using the disorganization symptom factor by Raballo et al. (Reference Raballo, Nelson, Thompson and Yung2011) consisting of odd behavior, cognitive difficulties, disorganized communications, and delusions, and a model combining delusions, social functioning, and substance abuse (Cannon et al., Reference Cannon, Cadenhead, Cornblatt, Woods, Addington, Walker and Heinssen2008).
Using delusion intensity and visual reasoning score as predictors (Lin et al., Reference Lin, Yung, Nelson, Brewer, Riley, Simmons and Wood2013) best discriminated CHR adolescents with and without psychosis during follow-up. First psychiatric hospitalizations in the CHR group were best explained by models combining positive symptoms, bizarre thinking, sleep disturbance, schizotypal personality, and functioning (Ruhrmann et al., Reference Ruhrmann, Schultze-Lutter, Salokangas, Heinimaa, Linszen, Dingemans and Klosterkötter2010).
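The anticipated increase in false positives outside enriched populations follows from Bayes' rule: with fixed sensitivity and specificity, the share of positive predictions that are true cases falls as the base rate of the outcome falls. The sketch below illustrates this; the sensitivity, specificity, and base rates are hypothetical round numbers and are not estimates from this or any cited study.

```python
def positive_predictive_value(sensitivity, specificity, base_rate):
    """Share of positive predictions that are true cases, by Bayes' rule."""
    for value in (sensitivity, specificity, base_rate):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must lie between 0 and 1")
    true_positives = sensitivity * base_rate
    false_positives = (1.0 - specificity) * (1.0 - base_rate)
    if true_positives + false_positives == 0.0:
        raise ValueError("no positive predictions under these inputs")
    return true_positives / (true_positives + false_positives)


# Hypothetical test characteristics, held constant across settings.
SENSITIVITY = 0.80
SPECIFICITY = 0.80

# Hypothetical base rates: an enriched risk clinic vs. a general clinical sample.
ppv_enriched = positive_predictive_value(SENSITIVITY, SPECIFICITY, 0.20)
ppv_general = positive_predictive_value(SENSITIVITY, SPECIFICITY, 0.03)
print(round(ppv_enriched, 2), round(ppv_general, 2))
```

Under these assumed values, half of the positive predictions are true cases in the enriched setting, whereas roughly one in nine is in the low base-rate setting, even though the model itself is unchanged.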
Estimating the weights of the individual predictors from the current data (sample-optimized version) (Perkins et al., Reference Perkins, Jeffries, Cornblatt, Woods, Addington, Bearden and McGlashan2015; Ruhrmann et al., Reference Ruhrmann, Schultze-Lutter, Salokangas, Heinimaa, Linszen, Dingemans and Klosterkötter2010; Thompson et al., Reference Thompson, Nelson and Yung2011) led to better prediction than using the original equations. A modified version of the Ruhrmann model was additionally used, in which the sum of positive symptoms replaced the dichotomized >16 criterion, which was rarely met in the sample; this modification led to the best fit. The Ruhrmann model also had to be modified because of the lack of variance in education and schizotypal personality disorder in our adolescent sample (Table 1).
That the cut-off of functioning in the Velthorst model predicted psychosis in the ‘wrong direction’ stresses how the predictors depend on the sample used: in this sample, the adolescents with higher baseline functioning were more likely to develop a psychotic disorder. This is also an example of overfitting, highlighting the need for replicable models with published parameters.
In the exploratory analyses using the predictors proposed by the previous studies, psychotic disorders were best predicted by a combination of high overall positive and negative symptoms scores, with a larger weight on the negative symptoms (this was due to the penalty applied in the LASSO, in contrast to the modified Velthorst model, where the weights were approximately equal). There was a modest improvement in model fit when including the third and fourth criteria (at least moderately severe bizarre thinking and exceptionally severe positive psychotic-like symptoms), but adding these predictors did not result in better prediction rates. Our results are in line with a recent meta-analysis among CHR individuals which suggested that despite the large number of putative risk factors evaluated, only positive symptoms and level of functioning predicted transition to psychosis with highly suggestive evidence, whereas negative symptoms showed suggestive evidence, and no other factor showed convincing evidence (Oliver et al., Reference Oliver, Reilly, Baccaredda Boy, Petros, Davies, Borgwardt and Fusar-Poli2020). In another study, employing machine learning among UHR individuals, psychosis was predicted by unusual thought content, severity of positive symptoms, and level of functioning (Mechelli et al., Reference Mechelli, Lin, Wood, McGorry, Amminger, Tognin and Yung2017). This previous study also predicted functional outcome with the same symptom measures, finding that it was best predicted by attention deficits, anhedonia-asociality, and unusual thought content (Mechelli et al., Reference Mechelli, Lin, Wood, McGorry, Amminger, Tognin and Yung2017), which were somewhat different from the variables best predicting hospitalization in the present study. In our sample, psychiatric hospitalization in the upcoming years was best predicted by serious impairment in functioning, Demjaha's negative symptom factor, delusions, and baseline sleep problems. 
Demjaha's negative symptom factor was most heavily loaded on avolition, blunted expression of emotion, impaired role functioning, social anhedonia, dysphoria, and odd behavior. Sleep disturbance, on the other hand, is an unspecific general symptom, common in psychiatric illnesses, which may independently add risk to worsening mental health as well as act as a marker of underlying problems, related to both sleep and mental health.
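The remark above that the LASSO penalty affected the relative weights of the predictors can be illustrated with the soft-thresholding operator, which gives the LASSO estimate for standardized, uncorrelated predictors in a linear model. This is a simplified sketch under that assumption and not the penalized fitting procedure used in the study; the predictor names, coefficients, and penalty value are hypothetical.

```python
def soft_threshold(coefficient, penalty):
    """Shrink a coefficient toward zero by `penalty`; set it to zero if it is smaller."""
    if penalty < 0:
        raise ValueError("penalty must be non-negative")
    magnitude = max(abs(coefficient) - penalty, 0.0)
    return magnitude if coefficient >= 0 else -magnitude


# Hypothetical unpenalized coefficients for standardized predictors.
unpenalized = {
    "positive_total": 0.30,
    "negative_total": 0.45,
    "bizarre_thinking": 0.08,
}
PENALTY = 0.10  # hypothetical L1 penalty strength

penalized = {
    name: soft_threshold(value, PENALTY) for name, value in unpenalized.items()
}
for name, value in penalized.items():
    print(name, round(value, 2))
```

Because every coefficient is reduced by the same absolute amount, the ratio between the surviving weights changes (here from 1.5 to 1.75 in favor of the larger one), and a weak predictor is removed from the model altogether.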
In the current study, comprehensive register data were used for the follow-up. Only information from public services was obtained, but private service use is uncommon among adolescents in Finland and there are no private psychiatric hospitals. The same 7-year follow-up time was used for all participants, who were thus followed until age 22‒25. New psychosis cases may still emerge after that age, but for those participants with a longer follow-up time we noticed that only four new transitions emerged in this sample during the next 3 years (data not used).
We used the thorough and extensive reviews by Studerus et al. and Montemagni et al. to select the prediction models. As original coefficients were usually not available, the coefficients were based on the moderate-sized current sample, limiting the comparability with the original studies. SIPS equivalents were substituted for CAARMS variables, but these two methods have been found to be highly comparable (Fusar-Poli et al., Reference Fusar-Poli, Cappucciati, Rutigliano, Lee, Beverly, Bonoldi and McGuire2016a). We studied adolescents, although most of the original studies also included adults, or only adults. The positive symptoms of adolescents are more transitory than those of adults (Gerstenberg et al., Reference Gerstenberg, Theodoridou, Traber-Walker, Franscini, Wotruba, Metzler and Heekeren2016; Welsh & Tiffin, Reference Welsh and Tiffin2014) and caution is recommended when assessing psychosis risk among adolescents (Schultze-Lutter et al., Reference Schultze-Lutter, Michel, Schmidt, Schimmelmann, Maric, Salokangas and Klosterkötter2015), but psychosis risk has also been found to be a useful concept in help-seeking adolescents (Spada et al., Reference Spada, Molteni, Pistone, Chiappedi, McGuire, Fusar-Poli and Balottin2016; Ulhaq, Thevan, & Adams, Reference Ulhaq, Thevan and Adams2017). There have also been studies predicting psychosis with baseline diagnostic categories both in routine secondary mental health care (Fusar-Poli et al., Reference Fusar-Poli, Rutigliano, Stahl, Davies, Bonoldi, Reilly and McGuire2017) and in the general population (Guloksuz et al., Reference Guloksuz, Pries, ten Have, de Graaf, van Dorsselaer, Klingenberg and van Os2020), but these models were not included in the systematic reviews used.
Common methodological problems have been noticed in psychosis prediction studies (Fusar-Poli et al., Reference Fusar-Poli, Cappucciati, Rutigliano, Schultze-Lutter, Bonoldi, Borgwardt and McGuire2015; Steyerberg & Vergouwe, Reference Steyerberg and Vergouwe2014; Studerus et al., Reference Studerus, Ramyead and Riecher-Rössler2017), including overfitting of models, insufficient reporting of analysis procedures and results, and lack of commonly used and easily comparable indicators. Not all of these problems could be avoided in the current study; a major limitation is our sample size, which makes overfitting of models a concern. This is especially relevant for models with many predictors, and was exemplified by the Velthorst et al. (Reference Velthorst, Nelson, Wiltink, De Haan, Wood, Lin and Yung2013) model. Furthermore, categorical scales such as the SIPS subscales (non-psychotic range 0–5) had to be treated as linear in our logistic models; this limitation was offset, however, by several models specifying cut-offs.
To conclude, the applicability and generalizability of psychosis prediction models were found to be only moderate in a general psychiatric sample of adolescents. We were not able to generalize the predictiveness of the majority of previous models, which were based on samples from specialized psychosis risk clinics. A clinically significant functional outcome could, however, be partially predicted by models developed for psychosis risk detection, highlighting the importance of assessing psychosis risk factors. Transition to psychosis was best predicted by a parsimonious combination of positive and negative symptom total severity.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0033291721001938
Data
Data are from the Helsinki Prodromal Study at the Finnish Institute for Health and Welfare. Sharing of the data is possible with research collaborations if it is in agreement with the consent given by the participants and with the General Data Protection Regulation (GDPR) and other applicable law. Collaborations require a separate agreement and local ethical committee approval.
Acknowledgements
The authors wish to thank all the adolescents and clinicians who participated in the Helsinki Prodromal Study, and Marjut Grainger for her contribution to data management.
Financial support
This study was supported by the Jalmari and Rauha Ahokas Foundation (ML) and the Academy of Finland (No. 311578 to MJ, No. 317363 to ST). The funding sources had no role in the study design, in the collection, analysis, or interpretation of data, in writing of the paper, or the decision to submit the article for publication.
Conflict of interest
None.