Background
Lithium remains a commonly used first-line treatment for bipolar disorder,Reference Miura, Noma and Furukawa1–Reference Malhi, Adams and Berk3 highly effective for both acute manic episodesReference Cipriani, Barbui and Salanti4,Reference Yildiz, Vieta, Leucht and Baldessarini5 and maintenance treatment.Reference Malhi, Adams and Berk3 However, in around 65% of acute manic presentations, response is incomplete and 35% of patients do not respond to treatment at all.Reference Miller, Tanenbaum, Griffin and Ritvo6,Reference Machado-Vieira, Luckenbaugh and Soeiro-de-Souza7 In maintenance treatment, approximately 30% of patients report an excellent long-term response, around 30% report an intermediate response and 30% respond poorly.Reference Hou, Heilbronner and Degenhardt8 In addition, lithium has a range of serious acute and chronic side-effects, including increased risk of renal failure and suppression of thyroid and parathyroid function.Reference Malhi, Tanious, Das and Berk2 Moreover, lithium can be toxic at high doses so plasma levels often need to be monitored.Reference Schubert and Wisdom9 These varying response rates and side-effect profiles suggest the need to better tailor lithium treatment for individual patients, ensuring timely prescription of the right drug for the right patient at the right time. Better understanding of the link between genetic and clinical factors may assist in achieving this personalised approach.
Regarding associated clinical factors, variables reflecting an episodic pattern of mania–depression intervals, a later age at onset and fewer hospital admissions preceding treatment have shown significant associations with lithium response.Reference Kleindienst, Engel and Greil10,Reference Tighe, Mahon and Potash11 Using a large range of clinical factors, Nunes et alReference Nunes, Ardau and Berghöfer12 demonstrated the ability of machine-learning models to classify lithium responders from non-responders. Beyond these clinical factors, a genetic basis to lithium response has also been found. In genome-wide association studies (GWASs), multiple genetic variants have been associated with lithium response. However, the effects of these variants have been too small to facilitate lithium response prediction.Reference Hou, Heilbronner and Degenhardt8 Combining these variants into polygenic risk scores (PRS) has improved their performance; however, these scores still only explain ~1% of variance in lithium response.Reference Amare, Schubert, Hou, Clark, Papiol and Heilbronner13
Given this biopsychosocial basis to lithium response, one solution may be to combine clinical factors with PRS to improve lithium response prediction. This genotype–phenotype approach was recently used by Antonucci et alReference Antonucci, Pergola and Pigoni14 to classify patients with schizophrenia (SCZ) from healthy controls. Using environmental and genetic data, they a priori stratified patients into tertiles (thirds) based on decision scores from two support vector machines. Making predictions with patients in the lower and upper tertiles led to an increase in balanced accuracy from 77.7% to 89.4%.
In the context of lithium response prediction, Amare et al found evidence for a non-linear stratified relationship between SCZ PRS and lithium response.Reference Amare, Schubert, Hou, Clark, Papiol and Heilbronner13 In particular, patients in the lowest decile of the SCZ PRS distribution were 3.46 times more likely to be lithium responders when compared with patients in the tenth decile. Amare et al also found that patients with bipolar disorder with a low polygenic load for major depressive disorder (MDD) were more likely to respond to lithium treatment, with the largest differences observed between the quartiles of the MDD PRS distribution.Reference Amare, Schubert and Hou15 This transdiagnostic and polygenic basis to lithium response is not without precedent. For example, recent studies have shown significant genetic overlap and shared biological pathways between SCZ, MDD and bipolar disorder.Reference Lee, Ripke and Neale16,Reference Maier, Moser and Chen17 Exploiting this genetic overlap, Maier et alReference Maier, Moser and Chen17 used multitrait models that draw on the correlations between SCZ, MDD and bipolar disorder, as well as on a participant's individual risk for each disorder. This multivariate approach led to an equivalent increase in sample size of 34% for SCZ, 68% for bipolar disorder and 76% for MDD when compared with single-trait models.
Aims
Given these findings, we have conducted a range of analyses to test the predictive ability of combined transdiagnostic genetic and clinical data for lithium response prediction. First, to measure the predictive contribution of PRS alongside clinical variables, we trained both uni- and multimodal prediction models of lithium response in patients with bipolar disorder. Second, we trained uni- and multimodal models containing interaction terms within and across variables from each data modality to measure non-linear and biopsychosocial effects between each modality. Third, to measure the effects of a patient's PRS loadings on clinical model accuracy,Reference Antonucci, Pergola and Pigoni14 we used MDD and SCZ PRS, as well as a combined MDD and SCZ meta-PRS to a priori stratify patients according to their polygenic loadings prior to the supervised prediction of lithium response with clinical data. This approach was then directly compared with the traditional method of including PRS and clinical data modalities together in a single predictive model.Reference Cearns, Opel and Clark18 Finally, to test the effects of model linearity on prediction, we compared the use of linear and non-linear machine-learning models for regression analyses and validated all findings on a geographically stratified test set. We then assessed the best performing models in a classification framework.
Method
The International Consortium on Lithium Genetics
The International Consortium on Lithium Genetics (ConLi+Gen, www.ConLiGen.org) is an initiative by the National Institute of Mental Health and the International Group for the Study of Lithium-Treated Patients (www.IGSLI.org) and was established with the aim of studying the genetic basis of lithium treatment response in patients with bipolar disorder.Reference Schulze, Alda and Adli19 The ConLi+Gen study involved patients with bipolar disorder from Europe, South America, USA, Asia and AustraliaReference Malhi, Bassett and Boyce20 who have been treated with lithium. A series of quality control procedures were implemented on the genotype data before and after imputation as described below. Sample characteristics have been published in previous works.Reference Hou, Heilbronner and Degenhardt8,Reference Amare, Schubert, Hou, Clark, Papiol and Heilbronner13 This study used consortium data through an international collaboration.
The University of Heidelberg Ethics Committee provided central ethics approval for the consortium. Written consent was obtained from each patient according to the study protocols of the participating cohorts.
Computing PRS
PRS were computed for SCZ and MDD, two traits previously associated with lithium response.Reference Amare, Schubert, Hou, Clark, Papiol and Heilbronner13,Reference Amare, Schubert and Hou15 Each PRS was calculated using individual genetic data from ConLi+GenReference Hou, Heilbronner and Degenhardt8 and summary statistics from the previous largest GWASs available for MDDReference Wray, Ripke and Mattheisen21 and SCZ.22 PRS were calculated at different GWAS P-value thresholds; however, the best predictive score was selected for each trait based on previous analysis. More details on PRS calculation, genotyping, imputation and quality control steps can be found in previous publicationsReference Amare, Schubert, Hou, Clark, Papiol and Heilbronner13,Reference Amare, Schubert and Hou15,Reference Amare, Schubert, Klingler-Hoffmann, Cohen-Woods and Baune23 and in the Supplementary Methods available at https://doi.org/10.1192/bjp.2022.28.
Study participants
As our aim was to assess both uni- and multimodal regression models of lithium response, a requirement for inclusion was complete PRS data and no more than 20% missingness on clinical predictors. Any clinical predictors above this threshold were removed (Missing data table in Supplementary Table 1). Only ConLi+Gen GWAS 1 was used for analysis (n = 1163) as it contained both clinical and genetic data, whereas GWAS 2 only contained genetic data. Fifteen demographic, clinical, substance use and comorbid psychiatric illness predictors were used in analyses. To ensure geographic homogeneity, only samples of European descent were used (Halifax n = 240, University of California San Diego n = 216, Cagliari n = 196, Poznan n = 97, Wuerzburg n = 91, Geneva n = 46, Prague n = 45, Dresden n = 43, National Institute of Mental Health n = 36, Johns Hopkins University n = 24), leaving a final sample of n = 1034 patients.
To ensure the unbiased approximation of the model's generalisability to new patients, we partitioned 33% (n = 342) of our sample into a holdout set for model testing, leaving 692 observations for training and validation. This partitioning was stratified by data-collection site to ensure the same distribution of sites across partitions.
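A site-stratified holdout of this kind can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `site_stratified_split`, the seed and the toy site labels are hypothetical, and per-site rounding means the overall test fraction is only approximately the requested one.

```python
import random
from collections import defaultdict


def site_stratified_split(sites, test_fraction=0.33, seed=0):
    """Return (train_idx, test_idx) so that every site contributes
    roughly `test_fraction` of its patients to the holdout set."""
    rng = random.Random(seed)
    by_site = defaultdict(list)
    for idx, site in enumerate(sites):
        by_site[site].append(idx)

    train_idx, test_idx = [], []
    for site in sorted(by_site):
        members = by_site[site][:]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)  # per-site rounding
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy site labels (hypothetical counts, not the study's cohort).
sites = ["A"] * 60 + ["B"] * 30 + ["C"] * 10
train_idx, test_idx = site_stratified_split(sites)
```

Splitting within each site, rather than across the pooled sample, is what keeps the site distribution the same in both partitions.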
Regression and classification target
Lithium treatment response was assessed using the validated ‘Retrospective Criteria of Long-Term Treatment Response in Research Subjects with Bipolar Disorder’ scale, also known as the Alda scale.Reference Manchia, Adli and Akula24 To arrive at a total Alda score, this scale measures symptom improvement over the course of treatment (A score, range 0–10), which is then weighted against five criteria (B score) that assess the quality of evidence for the response score.Reference Hou, Heilbronner and Degenhardt8 For the predictive regression analyses, the total Alda score was used.Reference Scott, Etain and Manchia25 For a subset of models assessed in a classification framework, patients with a score ≥7 were coded as responders, whereas patients scoring <7 were coded as non-responders.Reference Scott, Etain and Manchia25
Unimodal and multimodal machine-learning pipelines
On our training sample of 692 patients, we fit pipelines that conducted imputation, polynomial feature engineering (interaction terms only), standardisation, feature selection, hyperparameter optimisation, and the fitting of linear regression (regularised with ridge and the elastic net) and random forest models. For the unimodal linear regression models, we imputed predictors using multivariate imputation by chained equations (MICE) with the ten nearest predictors used in the imputation process.Reference Azur, Stuart, Frangakis and Leaf26 As regularised linear regression models may perform poorly if variables are on different scales, we standardised all predictors to have a mean of 0 and a s.d. of 1. As the number of predictors was low in the unimodal and multimodal PRS and clinical models (PRS 2, clinical 15), all predictors were included in analyses. Finally, we fit a linear regression model with ridge regularisation to the training sample. For the interaction term models, interaction terms between all predictors were engineered in the pipeline prior to feature selection and hyperparameter tuning. As the number of predictor variables grew rapidly in these analyses, we conducted feature selection with the elastic net, a form of penalised regression that removes highly correlated predictors from the model while retaining the most predictive subset for model fitting.Reference Zou and Hastie27
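The linear pipelines described above could be approximated with scikit-learn roughly as follows. This is a hedged sketch, not the authors' code: the use of `IterativeImputer` as a stand-in for MICE, the fixed penalty values, the choice of `ElasticNet` as the final estimator in the interaction variant, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (activates IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import ElasticNet, Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler


def make_linear_pipeline(with_interactions=False):
    """Impute -> (optional pairwise interactions) -> standardise -> penalised linear model."""
    steps = [("impute", IterativeImputer(n_nearest_features=10, random_state=0))]
    if with_interactions:
        # interaction_only=True adds products of distinct predictors, no squared terms.
        steps.append(("interactions", PolynomialFeatures(
            degree=2, interaction_only=True, include_bias=False)))
    steps.append(("scale", StandardScaler()))
    # Assumed penalty values; in practice these would be tuned by cross-validation.
    model = ElasticNet(alpha=0.1, l1_ratio=0.5) if with_interactions else Ridge(alpha=1.0)
    steps.append(("model", model))
    return Pipeline(steps)


# Synthetic stand-in data: 17 predictors (15 clinical + 2 PRS), about 5% missing.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 17))
y = 4.0 + X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=120)
X[rng.random(X.shape) < 0.05] = np.nan

ridge_pipeline = make_linear_pipeline().fit(X, y)
interaction_pipeline = make_linear_pipeline(with_interactions=True).fit(X, y)
predictions = ridge_pipeline.predict(X)
```

Keeping imputation and scaling inside the pipeline means both are re-estimated on each training fold, which avoids leaking information from validation data. With 17 predictors, the interaction variant expands to 17 + 136 = 153 columns, which is why a sparsity-inducing penalty becomes useful.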
This same procedure was then repeated using a random forest model; however, as random forest models are scale invariant, we did not scale the predictors prior to model training.Reference Biau, Devroye and Lugosi28 Instead of regularised linear predictor selection, we conducted non-linear predictor selection according to the mean decrease in variance provided by each predictor in the random forest model.Reference Biau, Devroye and Lugosi28 For the subset of classification models, equivalent classifiers replaced the regression models in each pipeline.
For all pipelines, we used a random search of 60 iterations to tune model hyperparameters. When fewer than 60 hyperparameter combinations were present, we used an exhaustive grid search. See Supplementary Table 2 for the tuned hyperparameter values. This process and all steps inside the pipelines were completed using leave-site-out cross-validation. This method trains on all data-collection sites minus one. The excluded site is then used to assess the selected features and hyperparameters, with the combination that minimises the root mean squared error on the held-out site selected.Reference Cearns, Hahn, Clark and Baune29 As there were ten collection sites, this equates to tenfold cross-validation for model selection. This site-based stratification protects against the optimisation of hyperparameters and selection of features that may proxy for disparities in feature and outcome distributions across sites and result in ungeneralisable estimates.Reference Cearns, Hahn and Baune30 All tuned and selected models from training and validation were then further tested in the a priori held-out set of 342 patients.
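The leave-site-out selection logic can be illustrated without any machine-learning library. The sketch below is a simplified assumption of how such a loop might look: `leave_site_out_splits`, `select_by_leave_site_out` and the toy 'shrunken mean' predictor are hypothetical names and models chosen only to keep the example self-contained. In scikit-learn, `LeaveOneGroupOut` produces the same kind of splits.

```python
import math


def leave_site_out_splits(sites):
    """Yield (held_out_site, train_idx, val_idx), holding out one site per fold."""
    for held_out in sorted(set(sites)):
        train_idx = [i for i, s in enumerate(sites) if s != held_out]
        val_idx = [i for i, s in enumerate(sites) if s == held_out]
        yield held_out, train_idx, val_idx


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def select_by_leave_site_out(candidates, fit_predict, y, sites):
    """Return the candidate name with the lowest mean RMSE across held-out sites."""
    mean_rmse = {}
    for name, params in candidates.items():
        fold_errors = []
        for _, train_idx, val_idx in leave_site_out_splits(sites):
            preds = fit_predict(params, [y[i] for i in train_idx], len(val_idx))
            fold_errors.append(rmse([y[i] for i in val_idx], preds))
        mean_rmse[name] = sum(fold_errors) / len(fold_errors)
    return min(mean_rmse, key=mean_rmse.get), mean_rmse


def shrunken_mean(shrink, y_train, n_val):
    """Toy model: predict a shrunken training mean for every validation patient."""
    return [shrink * sum(y_train) / len(y_train)] * n_val


# Toy Alda-like scores from three hypothetical sites.
y = [4, 5, 3, 4, 5, 3]
sites = ["A", "A", "B", "B", "C", "C"]
candidates = {"no_shrink": 1.0, "half_shrink": 0.5}
best, scores = select_by_leave_site_out(candidates, shrunken_mean, y, sites)
n_folds = len(list(leave_site_out_splits(sites)))
```

Because every validation fold is an entire unseen site, a hyperparameter setting is only rewarded if it transfers across sites, not if it exploits site-specific quirks.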
We then re-ran all analyses with clinical variables on patients who were in the lower and upper quartiles of the PRS distributions for MDD, SCZ, and MDD and SCZ combined. See Supplementary Methods. In addition, we ran supplementary analyses to control for sample size effects and changes in the number of predictor variables across analyses (Supplementary Methods).
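As a rough illustration of the stratification step, the sketch below builds a combined score and keeps only the extreme quartiles. The exact construction of the meta-PRS is given in the Supplementary Methods; summing z-scored MDD and SCZ PRS here is an assumption, as are the function names and the toy values.

```python
from statistics import mean, pstdev, quantiles


def zscore(values):
    """Standardise values to mean 0 and s.d. 1."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def extreme_quartile_indices(mdd_prs, scz_prs):
    """Return (low, high) patient indices in the bottom and top quartiles
    of an assumed meta-PRS (sum of the two standardised scores)."""
    meta = [a + b for a, b in zip(zscore(mdd_prs), zscore(scz_prs))]
    q1, _, q3 = quantiles(meta, n=4)
    low = [i for i, m in enumerate(meta) if m <= q1]
    high = [i for i, m in enumerate(meta) if m >= q3]
    return low, high


# Toy scores for eight hypothetical patients.
mdd_prs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
scz_prs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
low, high = extreme_quartile_indices(mdd_prs, scz_prs)
```

Only the patients indexed by `low` and `high` would then be passed to the clinical prediction models, which is why this approach halves the available sample.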
Results
Cohort characteristics
The final analysis cohort contained 1034 patients with an average age of 47.7 years (s.d. = 14) and an average age at onset of bipolar disorder of 24.9 years (s.d. = 11). Of these patients, 627 (60.6%) were male and 803 (77.7%) were classified as having bipolar I disorder. The average Alda score for lithium response was 4.3 (s.d. = 3.3) out of 10. See Supplementary Fig. 1 for the full distribution of Alda scores. See Table 1 for more information on participant characteristics.
JHU, Johns Hopkins University; NIMH, National Institute of Mental Health; UCSD, University of California San Diego; MDD, major depressive disorder; OCD, obsessive–compulsive disorder; PTSD, post-traumatic stress disorder.
a. Statistics calculated using one-way ANOVA for continuous variables and Fisher exact tests for categorical variables. All P-values were false discovery rate-corrected using the Benjamini and Hochberg method.
Unimodal and multimodal models
According to the coefficient of determination (R²), the unimodal linear regression PRS and clinical models explained 1.2% and 1.8% of variance in lithium response, respectively, and the combined multimodal model explained 4.7% of variance in lithium response. Re-running the three models including interaction terms between all variables resulted in 1.4%, 4.5% and 5.1% explained variance. For the non-linear random forest models, the unimodal PRS and clinical models explained 2% and 8.1% of variance in lithium response, and the combined multimodal model explained 7.4% of variance in lithium response. Re-running the three models and including interaction terms between all variables resulted in −0.9%, 6.7% and 5.2% explained variance.
Stratified PRS analyses
For the stratified analysis using patients in the upper and lower quartiles of the MDD PRS distribution, the clinical linear and clinical linear interaction models explained −2.8% and 2.7% of variance in lithium response, whereas the non-linear random forest and random forest interaction models explained 3.5% and 1.8% of variance. For the stratified SCZ PRS analyses, the clinical linear and clinical linear interaction models explained 7.1% and 9% of variance in lithium response, and the non-linear random forest and random forest interaction models explained 7.2% and 9.3% of variance. Finally, for the stratified meta-PRS analyses, the clinical linear and clinical linear interaction models explained 12.1% and 9.2% of variance in lithium response, and the non-linear random forest and random forest interaction models explained 13.7% and 4.5% of variance. All models were statistically significant after false discovery rate (FDR) corrections. See Fig. 1 and Table 2 for all model results.
PRS, polygenic risk scores; MDD, major depressive disorder; SCZ, schizophrenia.
a. Unimodal, multimodal and interaction term predictor spaces were measured using both linear regression (ridge and the elastic net) and random forest regression models. In addition, PRS stratified models composed of MDD PRS, SCZ PRS, and their standardised combinations in the form of a meta-PRS were assessed across model types and feature interaction combinations. Mean (s.d.) values are from the leave-site-out train and validation procedures. All P-values were false discovery rate-corrected with the Benjamini and Hochberg method.
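For readers unfamiliar with the Benjamini and Hochberg correction referred to in the table notes, a minimal pure-Python sketch of the adjustment follows. The function name and the toy P-values are illustrative only and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest P first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy P-values (hypothetical).
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Each raw P-value is scaled by the number of tests divided by its rank, so the smallest P-values are penalised most and the largest is left unchanged.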
Completing 1000 runs of the Monte-Carlo sampling procedure to control for decreases in sample size on the stratified meta-PRS model, we attained an average R² = 2.7% (s.d. = 5, P = 0.002). Therefore, the superior performance of our meta-PRS stratified model was not explainable by increased performance variability resulting from decreased sample size.Reference Flint, Cearns, Opel, Redlich, Mehler and Emden31 In addition, increases in R² were not explained by changes in the number of predictor variables across models (Pearson's r = 0.17, P = 0.44) (Supplementary Fig. 1). See Table 2 for results and Supplementary Tables 3–6 for all model metrics. After controlling for these confounds, the best performing meta-PRS stratified model explained 69% more variance (R² = 13.7%, P = 0.0001) than the equivalent model containing no a priori meta-PRS stratification (R² = 8.1%, P = 0.0001). In this model, all clinical variables were retained in model selection (see Supplementary Table 7). Re-running these two best performing models in a classification framework led to balanced accuracies of 58.95% for the non-stratified model and 63.65% for the meta-PRS stratified model. See Supplementary Table 8 for all classification metrics.
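The two summary quantities in this paragraph, the relative gain in explained variance and balanced accuracy, can be reproduced with simple arithmetic. The sketch below uses the reported R² values for the first and hypothetical toy labels for the second; the function names are illustrative.

```python
def relative_increase(new, baseline):
    """Percentage increase of `new` over `baseline`."""
    return 100.0 * (new - baseline) / baseline


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = responder)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


# Reported R² values: 13.7% (meta-PRS stratified) versus 8.1% (non-stratified).
gain = relative_increase(13.7, 8.1)  # about 69%

# Toy labels only: sensitivity 1/2, specificity 3/4.
toy_score = balanced_accuracy([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
```

Balanced accuracy is the natural choice here because responders (Alda score ≥7) are the minority class, so plain accuracy would reward a model that simply predicts non-response.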
Patient characteristics in the genetically stratified cohort
After FDR corrections, significant differences in clinical characteristics were found between those in quartiles 1 (low meta-PRS load) and 4 (high meta-PRS load) of the combined meta-PRS distribution for binary lithium response (Alda score ≥ 7) (χ² = 12.214, P = 0.005), bipolar I disorder versus rest (bipolar II disorder and schizoaffective disorder) (χ² = 12.755, P = 0.005) and DSM diagnosis (bipolar I disorder, bipolar II disorder and schizoaffective disorder) (χ² = 13.33, P = 0.027).
In quartile 1 of the meta-PRS distribution, 70% had bipolar I disorder, 26% had bipolar II disorder and 4% had schizoaffective disorder. In total, 39% of these patients were lithium responders. In quartile 4, 86% had bipolar I disorder, 12% had bipolar II disorder and 3% had schizoaffective disorder. In total, 22% of these patients were lithium responders.
Overall, those in quartile 1 were 67.7% more likely to be lithium responders compared with those in quartile 4 (odds ratio 1.677, 95% CI 1.14–2.47, P = 0.009). For all other clinical characteristics, including variables that attained nominal significance, see Table 3.
FDR, false discovery rate.
a. Statistic: calculated using independent samples t-tests for continuous variables and χ² tests for categorical variables. All P-values were FDR-corrected using the Benjamini and Hochberg method.
b. Nominal and FDR-corrected P-values in bold.
Discussion
Main findings
This is the first study to provide evidence for the combined predictive ability of routine clinical data and PRS for lithium response. Specifically, we show that first using PRS to stratify patients according to their polygenic loadings, followed by training with clinical data explains more variance in lithium response and improves model accuracy in a classification setting.Reference Amare, Schubert, Hou, Clark, Papiol and Heilbronner13,Reference Antonucci, Pergola and Pigoni14 Interestingly, the combination of PRS with clinical data performed best in the linear models, but not in the non-linear models. Outside of the best performing stratified meta-PRS model, neither of these multimodal models performed best overall. Moreover, unimodal clinical models outperformed their PRS equivalents.
Interpretation of our findings
This observation of clinical variables outperforming their biological counterparts has been repeatedly demonstrated across a range of multimodal machine-learning studies.Reference Cearns, Opel and Clark18,Reference Dinga, Marquand and Veltman32 The most intuitive explanation is that the small effect sizes yielded by biological variables, when compared with clinical variables, lead to overfitting and/or their lack of selection in cross-validation, resulting in underperformance for biomarker models when tested out of sample.
The next consideration is why effects are smaller for biological variables. To answer this, we need to consider how psychiatric traits are constructed and the implications this has for studies attempting to elucidate a biological basis for psychiatric phenomena. In comparison with other disorders, psychiatric phenotypes are defined by deviations from normative behaviours and emotional–cognitive experiences, rather than from well-defined physiological processes.Reference Cearns, Hahn and Baune30 Therefore, it is plausible that they bias towards larger effect sizes for clinical variables that correlate with clinical data already used in the construction of phenotypes and illness trajectories. Consequently, this tautology in the formation of diagnostic and prognostic constructs may limit the predictive contribution of biological data. In theory, this problem could be circumvented by first parsing patient heterogeneity at the biological level, and then using clinical variables in secondary analyses.
This rationale informed our stratified analyses where we a priori partitioned patients based on their polygenic loadings for MDD, SCZ and their combination in the form of a standardised meta-PRS. Interestingly, this method was most predictive of lithium response overall, explaining 69% more variance than the equivalent model with no a priori meta-PRS stratification. These results support the view that first parsing biological heterogeneity may improve the prediction of bipolar disorder lithium response with clinical data.
When assessing the best performing non-stratified model, the clinical random forest model, and the best performing overall model, the meta-PRS stratified model, we also observed an increase in model performance in a classification framework, albeit a smaller percentage change. This observation warranted an inspection and interpretation of the lithium response distributions between patients in the low and high meta-PRS quartiles (Supplementary Fig. 1). Here, we observe disproportionate densities of very low (0) and moderate response (5–7) scores for patients with high meta-PRS loadings. Conversely, for patients with low meta-PRS loadings, we observe disproportionate densities of very low (0) and very high response (7–10) scores. Both quartiles of meta-PRS loadings demonstrated high densities of very low response, yet differences between moderate and high response scores were evident. More specifically, patients with low meta-PRS loadings belonged to a continuous bimodal response distribution, whereas those with high meta-PRS loadings appeared to be mixed across the distribution and skewed towards lower response. When dichotomising lithium response, this nuanced understanding between a patient's genetic loadings and lithium response was lost.
This observation is interesting in light of recent work that quantified the asymmetrical reliability of the Alda scale, finding higher interrater reliability in the upper tail of the response distribution.Reference Nunes, Trappenberg, Alda and Genetics33 Therefore, a dichotomous representation of lithium response was generally argued for, even after considering the resultant loss in statistical power. However, rather than deciding a priori to discretise this distribution, an alternative approach would be to tune and select models in a leave-site-out cross-validation framework, as was done in the current work. This is because we would expect to see the highest amount of interrater disagreement between data-collection sites, as purported by Nunes et al.Reference Nunes, Trappenberg, Alda and Genetics33 If it was high enough to warrant a priori discretisation, these across-site models would not generalise because of their disagreement in lithium response. However, in the current work nearly all models generalised across sites to the out-of-sample test sets that were excluded from model construction, demonstrating that the use of leave-site-out cross-validation ensured that each model was tuned to learn parameters and relationships that generalised regardless of any disagreement between raters across sites. In addition, this established that there was enough agreement between raters to learn meaningful, informative and generalisable patterns in the continuous lithium response distribution.
In future works, an alternative to dichotomising the Alda scale would be to use the full scale and run analyses using spline regression.Reference Durrleman and Simon34 With this technique, we would not build one model for the entire data-set, but instead, divide the data-set into multiple bins and fit each bin with its own model. Some of these models may be linear, whereas others may be polynomial. This approach would allow us to fit PRS to the lithium response distribution and account for the linear and non-linear relationships between different strata of the PRS and lithium response distributions.Reference Durrleman and Simon34
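To make the spline idea concrete, the sketch below fits a continuous piecewise-linear (linear spline) regression with NumPy using a truncated-power basis. It is an illustration under assumptions, not a proposed analysis: the single knot at 0, the function names and the synthetic 'PRS versus response' data are all hypothetical, and higher-order splines would replace the hinge terms with polynomial ones.

```python
import numpy as np


def linear_spline_basis(x, knots):
    """Design matrix with an intercept, a slope and one hinge term per knot."""
    x = np.asarray(x, dtype=float)
    columns = [np.ones_like(x), x]
    columns += [np.maximum(x - k, 0.0) for k in knots]  # slope change after each knot
    return np.column_stack(columns)


def fit_linear_spline(x, y, knots):
    """Least-squares coefficients for the linear spline."""
    design = linear_spline_basis(x, knots)
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef


def predict_linear_spline(x, knots, coef):
    """Predicted response at x for fitted coefficients."""
    return linear_spline_basis(x, knots) @ coef


# Synthetic example: response falls with PRS below 0, then rises gently above 0.
prs = np.linspace(-2.0, 2.0, 41)
response = np.where(prs < 0, 5.0 - 2.0 * prs, 5.0 + 0.5 * prs)
knots = [0.0]
coef = fit_linear_spline(prs, response, knots)
fitted = predict_linear_spline(prs, knots, coef)
```

The hinge coefficient captures the change in slope at the knot, so each stratum of the PRS distribution gets its own slope while the fitted curve stays continuous across strata. The number and position of knots could themselves be tuned by cross-validation.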
Regarding the clinical characteristics of patients in each meta-PRS quartile, we observed significant differences between the types of psychiatric diagnosis. Quartile 1 (low meta-PRS load) had lower proportions of bipolar I disorder diagnoses and higher proportions of bipolar II disorder and schizoaffective disorder, whereas the opposite was true for those in quartile 4 (high meta-PRS load). Given that higher meta-PRS loadings are associated with poorer lithium response, and that people with ‘purer’ forms of bipolar I disorder are considered better responders to lithium,Reference Malhi, Tanious, Das and Berk2,Reference Sportiche, Geoffroy and Brichant-Petitjean35,Reference Nunes, Stone and Ardau36 this is an unexpected finding: one might have hypothesised that there would be a higher proportion of people with bipolar I disorder in the low, and therefore less ‘contaminated’, meta-PRS group. Our finding suggests that the relationship between meta-polygenic disposition for SCZ and MDD and actual phenotypical expression of bipolar spectrum disorders is more complex,37 and that people with seemingly unfavourable genetic constellations may still benefit from lithium once other clinical and environmental parameters come into play. Similarly, patients with seemingly less favourable diagnoses for lithium response (i.e. bipolar II disorder and schizoaffective disorder) may still benefit if their polygenic disposition points towards better responsiveness.
Considerations for clinical use
This leads to two main considerations for clinical use. The overall increase in variance from combining clinical and genetic data may be of use for clinicians to improve the accuracy of their clinical decision-making overall, especially when combined with other PRS and biomarkers in future works and then incorporated into classification models. Further and more immediate benefit could be derived from using this added genetic data to reconsider patients who would be traditionally ruled out as favourable responders to lithium based solely on their clinical presentation if their meta-polygenic loadings suggest otherwise.
Limitations
A number of limitations exist in the current work. First, there was a limited amount of clinical data available for analysis in this cohort. Future studies should aim to collect a wider range of clinical data (for example symptom scales) to elucidate the relationship between PRS and clinical characteristics, as well as their combined predictive ability. Ideally, prospective studies of lithium response will be required in the future to quantify the predictive ability of machine-learning models in an environment that is analogous to clinical practice.
Second, correctly operationalising bipolar I disorder, bipolar II disorder and schizoaffective disorder DSM phenotypes is difficult in real-world practice. Relying on patient's retrospective reporting of symptoms and past episodes to form these diagnoses, as was done in the current study, can lead to misdiagnosis of bipolar subtypes.Reference Malhi, Tanious, Das and Berk2,Reference Malhi, Adams and Berk3
Another consideration concerns the selection of quartile-based PRS stratification over tertileReference Nunes, Ardau and Berghöfer12 and decileReference Amare, Schubert, Hou, Clark, Papiol and Heilbronner13 stratification used in previous works. Choosing the number of PRS strata involves considering the trade-offs between a higher number of bins (i.e. decile stratification) that would likely contain larger differences in clinical characteristics, lithium response and polygenic risk, but result in a smaller sample (only 20% of the original sample would be retained when taking the extreme deciles). Alternatively, a lower number of bins (i.e. tertiles) would result in the opposite being true. To balance this trade-off, we chose quartile-based stratification. When taking the two extreme quartiles, we retained 50% of the original sample, while removing the middle of the PRS distributions that shows the smallest genetic differences in lithium response.Reference Amare, Schubert, Hou, Clark, Papiol and Heilbronner13,Reference Amare, Schubert and Hou15 However, future studies could attempt to find the optimal number of strata through the use of cross-validation with spline regression, where the optimal number of strata could be tuned and selected according to the minimisation of a loss function. If completed in a leave-site-out framework, between-site rater disagreement would be controlled for and the full lithium response distribution could be modelled.
The next limitation pertains to the patients who do not fall in the tails of the PRS distributions and who would therefore be excluded from prediction with this model. However, such a stratified model that confers superior predictive ability could first be used for patients who fall within these strata, and for patients who do not, models without stratification could be usedReference Nunes, Ardau and Berghöfer12 or other stratifying biomarkers could be incorporated.Reference Clark, Baune and Schubert38 Through this lens, we envision a stepwise process in clinical deployment where the choice of model itself would be tailored to individual patients depending on their unique clinical and biological characteristics (Supplementary Fig. 2). An alternative approach to parse biological heterogeneity would be to use unsupervised machine-learning models.Reference Dwyer, Kalman and Budde39 However, the disproportionately small effect sizes afforded by PRS,Reference Ivleva, Clementz and Dutcher40 the large risk of overfitting on unlabelled data,Reference Buhmann and Held41 the high level of polygenic collinearity across psychiatric traitsReference Lee, Ripke and Neale16,Reference Martin, Taylor and Lichtenstein42 and the resultant demands these considerations impose on statistical power,Reference Kasiulevičius, Šapoka and Filipavičiūtė43 led us to take a simpler approach informed by previous findings.Reference Amare, Schubert, Hou, Clark, Papiol and Heilbronner13,Reference Antonucci, Pergola and Pigoni14
Implications
In conclusion, using PRS to stratify patients genetically and then training machine-learning models with clinical predictors led to large improvements in lithium response prediction over other forms of unimodal and multimodal modelling. Clinical data explained the most variance, and both clinical and PRS data showed non-linear relationships with lithium response. To adequately model the linear and non-linear relationships between these PRS and lithium response across different genetic strata, future work should consider modelling these relationships using spline regression. Moreover, engineering a direct lithium response PRS and using this to parse heterogeneity may further improve model performance. In addition, parsing heterogeneity with biomarkers from neuroimaging and omics domains should be considered. Finally, datasets with a larger range of clinical variables will likely improve prediction following genetic stratification.
Supplementary material
Supplementary material is available online at https://doi.org/10.1192/bjp.2022.28.
Data availability
All data used in the analysis are available to ConLi+Gen members. See http://www.conligen.org/ for more information.
Author contributions
Micah Cearns analysed the data, trained all models and drafted the manuscript. Azmeraw Amare calculated the PRS variables. Bernhard Baune, Oliver Schubert, and Scott Clark acted as senior authors providing supervision and overall guidance in the drafting of the manuscript. All ConLi+Gen members contributed clinical and genetic data and provided overall feedback on the manuscript.
Funding
The primary sources of funding were grants RI 908/7-1, FOR2107 and RI 908/11-1 from the Deutsche Forschungsgemeinschaft (Marcella Rietschel) and grant NO 246/10-1 (Markus M. Nöthen) and grant ZIA-MH00284311 from the Intramural Research Program of the National Institute of Mental Health (ClinicalTrials.gov identifier: NCT00001174). The genotyping was funded in part by the German Federal Ministry of Education and Research through the Integrated Network IntegraMent (Integrated Understanding of Causes and Mechanisms in Mental Disorders), under the auspices of the e:Med Programme (Thomas G. Schulze, Marcella Rietschel and Markus M. Nöthen). This study was supported by National Institutes of Health grants P50CA89392 from the National Cancer Institute and 5K02DA021237 from the National Institute of Drug Abuse. The Canadian part of the study was supported by grant 64410 from the Canadian Institutes of Health Research (Martin Alda). Collection and phenotyping of the Australian University of New South Wales sample was funded by program grant 1037196 from the Australian National Health and Medical Research Council (Philip B. Mitchell, Peter R. Schofield, Janice M. Fullerton). The collection of the Barcelona sample was supported by grants PI080247, PI1200906, PI12/00018, 2014SGR1636, 2014SGR398, and MSII14/00030 from the Centro de Investigación en Red de Salud Mental, Institut d'Investigacions Biomèdiques August Pi i Sunyer, the Centres de Recerca de Catalunya Programme/Generalitat de Catalunya, and the Miguel Servet II and Instituto de Salud Carlos III. The Swedish Research Council, the Stockholm County Council, Karolinska Institutet and the Söderström-Königska Foundation supported this research through grants awarded to Lena Backlund, Louise Frisen, Catharina Lavebratt and Martin Schalling. The collection of the Geneva sample was supported by grants Synapsy–The Synaptic Basis of Mental Diseases 51NF40-158776 and 32003B-125469 from the Swiss National Foundation. 
The work by the French group was supported by INSERM (Institut National de la Santé et de la Recherche Médicale), AP-HP (Assistance Publique des Hôpitaux de Paris), the Fondation FondaMental (RTRS Santé Mentale), and the labex Bio-PSY (Investissements d'Avenir program managed by the ANR under reference ANR-11-IDEX-0004-02). The collection of the Romanian sample was supported by a grant from Unitatea Executiva pentru Finantarea Invatamantului Superior, a Cercetarii, Dezvoltarii si Inovarii (Maria Grigoroiu-Serbanescu). The collection of the Czech sample was supported by the project Nr. LO1611 with financial support from the MEYS under the NPU I program and by the Czech Science Foundation, grant Nr. 17-07070S. Azmeraw T. Amare is supported by a 2019–2021 National Alliance for Research on Schizophrenia and Depression (NARSAD) Young Investigator Grant from the Brain & Behaviour Research Foundation (BBRF) and a National Health and Medical Research Council (NHMRC) Emerging Leadership Investigator Grant 2021 – 2008000.
Declaration of interest
Eduard Vieta has received grants and served as consultant, advisor or CME speaker for the following entities: AB-Biotics, Allergan, Angelini, AstraZeneca, Bristol-Myers Squibb, Dainippon Sumitomo Pharma, Farmindustria, Ferrer, Forest Research Institute, Gedeon Richter, Glaxo-Smith-Kline, Janssen, Lundbeck, Otsuka, Pfizer, Roche, Sanofi-Aventis, Servier, Shire, Sunovion, Takeda, the Brain and Behaviour Foundation, the Spanish Ministry of Science and Innovation (CIBERSAM), and the Stanley Medical Research Institute. Michael Bauer has received grants from the Deutsche Forschungsgemeinschaft (DFG), and Bundesministeriums für Bildung und Forschung (BMBF), and served as consultant, advisor or CME speaker for the following entities: Allergan, Aristo, Janssen, Lilly, Lundbeck, neuraxpharm, Otsuka, Sandoz, Servier and Sunovion outside the submitted work. Sarah Kittel-Schneider has received grants and served as consultant, advisor or speaker for the following entities: Medice Arzneimittel Pütter GmbH and Shire. Bernhard Baune has received grants and served as consultant, advisor or CME speaker for the following entities: AstraZeneca, Bristol-Myers Squibb, Janssen, Lundbeck, Otsuka, Servier, the National Health and Medical Research Council, the Fay Fuller Foundation, the James and Diana Ramsay Foundation. Tadafumi Kato received honoraria for lectures, manuscripts, and/or consultancy, from Kyowa Hakko Kirin Co, Ltd, Eli Lilly Japan K.K., Otsuka Pharmaceutical Co, Ltd, GlaxoSmithKline K.K., Taisho Toyama Pharmaceutical Co, Ltd, Dainippon Sumitomo Pharma Co, Ltd, Meiji Seika Pharma Co, Ltd, Pfizer Japan Inc., Mochida Pharmaceutical Co, Ltd, Shionogi & Co, Ltd, Janssen Pharmaceutical K.K., Janssen Asia Pacific, Yoshitomiyakuhin, Astellas Pharma Inc, Wako Pure Chemical Industries, Ltd, Wiley Publishing Japan, Nippon Boehringer Ingelheim Co Ltd, Kanae Foundation for the Promotion of Medical Science, MSD K.K., Kyowa Pharmaceutical Industry Co, Ltd and Takeda Pharmaceutical Co, Ltd. 
Tadafumi Kato also received a research grant from Takeda Pharmaceutical Co, Ltd. Peter Falkai has received grants and served as consultant, advisor or CME speaker for the following entities: Abbott, GlaxoSmithKline, Janssen, Essex, Lundbeck, Otsuka, Gedeon Richter, Servier and Takeda, as well as the German Ministry of Science and the German Ministry of Health. Eva Reininghaus has received grants and served as consultant, advisor or CME speaker for the following entities: Janssen and Institut Allergosan. Mikael Landén declares that, over the past 36 months, he has received lecture honoraria from Lundbeck and served as scientific consultant for EPID Research Oy; no other equity ownership, profit-sharing agreements, royalties or patents. Kazufumi Akiyama has received consulting honoraria from Taisho Toyama Pharmaceutical Co, Ltd. The other authors have no other conflict of interest to disclose.
ICMJE forms are in the supplementary material, available online at https://doi.org/10.1192/bjp.2022.28.