- Psychiatric research applies statistical methods from two different frameworks: causal inference and prediction.
- If prediction methods (i.e., machine learning algorithms) are causally agnostic, their ability to inform clinical decision-making is limited.
- Common pitfalls can be avoided by considering key causal structures such as confounders, mediators, and colliders.
- The relative need for prediction vs. causal inference is context-dependent.
- New methods combining prediction and causal inference may hold great promise for precision psychiatry.
The recent move towards prediction in psychiatry
Psychiatric care involves deciding upon the optimal course of action for individual patients (i.e., clinical decision-making). Psychiatric research aids clinical care by developing diagnostic methods and new treatments, evaluating safety and efficacy, and much more. This research depends on the application of appropriate statistical frameworks to answer specific research questions. These may generally be divided into frameworks for causal inference and frameworks for prediction, as previously described (Breiman, Reference Breiman2001; Bzdok et al., Reference Bzdok, Altman and Krzywinski2018). Causal inference aims to determine the effect of one variable on another, which is crucial for selecting between alternative courses of action. In contrast, prediction aims at forecasting, independently of whether the patterns observed cause the predicted data or simply correlate with them. Colloquially, it is one thing to build a barometer to predict a storm (prediction) and another to know how to improve the weather (causal inference). Importantly, answering causal questions does not require a granular understanding of mechanisms. For example, randomised trials may provide evidence for the effects of a given medication on an outcome, irrespective of whether the mechanism of that medication is known. Likewise, knowing whether smoking causes cancer does not require extensive knowledge of all mediating mechanisms. Traditionally, psychiatric research has focused mostly on understanding rather than predicting, and on population-level rather than individual-level questions, but this focus may be shifting.
Advances in machine learning (ML) and the increasing availability of large datasets have led to wide-ranging proposals about the potential applications of ML-based prediction at the level of individual patients in healthcare (Matheny et al., Reference Matheny, Whicher and Thadaney Israni2020). ML methods are able to incorporate large amounts of data, detect complex dependencies, and often focus explicitly on optimising generalisability. These strengths are suggested to enable reliable predictions for individual patients, and thus more individually tailored treatments – that is, ‘precision medicine’ (Bzdok et al., Reference Bzdok, Varoquaux and Steyerberg2021). In psychiatry, ML prediction methods have been argued to be superior to ‘traditional’ methods (aimed at causal inference) when it comes to individualising psychiatric care, and several authors thus propose a wider adoption of ML methods in psychiatric research (Paulus, Reference Paulus2015; Bzdok et al., Reference Bzdok, Varoquaux and Steyerberg2021). For example, Bzdok et al. argue that ‘Prediction, not association, paves the road to precision medicine’ and Paulus proposes ‘…that we shift from a search for elusive mechanisms to implementing studies that focus on predictions to help patients now’. These viewpoints not only suggest a wider application of the prediction framework, but also a down-prioritisation of causal inference. The growing interest in prediction is also reflected in the rising number of studies applying ML methods in psychiatry (Chekroud et al., Reference Chekroud, Bondar, Delgadillo, Doherty, Wasil, Fokkema, Cohen, Belgrave, DeRubeis, Iniesta, Dwyer and Choi2021; Salazar de Pablo et al., Reference Salazar de Pablo, Studerus, Vaquerizo-Serrano, Irving, Catalan, Oliver, Baldwin, Danese, Fazel, Steyerberg, Stahl and Fusar-Poli2021; Koutsouleris et al., Reference Koutsouleris, Hauser, Skvortsova and Choudhury2022). The promise of prediction is great: if we combine ‘big data’ with ML methods, we may be able to make individual-level predictions that can guide clinical psychiatric care. However, prediction may be insufficient. As argued recently (Prosperi et al., Reference Prosperi, Guo, Sperrin, Koopman, Min, He, Rich, Wang, Buchan and Bian2020; Wilkinson et al., Reference Wilkinson, Arnold, Murray, Smeden, Carr, Sippy, Kamps, Beam, Konigorski, Lippert, Gilthorpe and Tennant2020), precision medicine – that is, choosing the optimal course of action for individual patients – cannot be built on prediction alone, but requires causal inference. Here, we elaborate why, and how this applies to psychiatry.
Prediction is not enough
In this Viewpoint, we claim that the direction set forth by Bzdok et al., Paulus, and others may not deliver precision psychiatry as intended. We argue that prediction models may very well yield accurate prognostic predictions, but that this is insufficient for improving clinical decision-making. This claim rests on the central difference between determining prognosis – a factual prediction – and deciding between alternative treatment options – a causal (counterfactual) question. Here, we elaborate the claim that precision psychiatry needs causal inference: first, by describing key causal structures; second, by showing how prediction models may be misinterpreted if causality is neglected; and third, by describing how causal inference and prediction may complement each other in psychiatric research going forward.
Key causal structures
The field of causal inference highlights three fundamental causal structures which, if not handled appropriately during analysis, will result in incorrect conclusions about the intervention of interest (Hernán & Robins, Reference Hernán and Robins2016; Pearl et al., Reference Pearl, Glymour and Jewell2016). These are 1) confounders, 2) mediators, and 3) colliders. We will discuss each in turn, placing a particular focus on their role in psychiatric research.
Confounders
Confounders are variables that have a causal effect on both the exposure (e.g., an intervention) and the outcome (Figure 1A). If these variables are not conditioned on, estimates of the effect of the intervention on the outcome will be biased. Note that ‘conditioned on’ can mean that a) participant selection depends on the variable or b) the variable is included in the statistical model. A typical example in healthcare is confounding by indication, where the intervention is administered based on some criterion (typically a disease). For example, we may study patients with depression, some of whom have been treated with electroconvulsive therapy (ECT). As a predictor of post-treatment depressive symptoms, ECT would likely perform tremendously well (Kellner et al., Reference Kellner, Obbels and Sienaert2020). However, we know that ECT is indicated for patients with severe depression. Initial symptom severity causes both the intervention and post-treatment symptom severity (Figure 1B). Pre-treatment symptom severity thus confounds the association between ECT and post-treatment symptoms. This association would, if interpreted incorrectly, lead us to the erroneous conclusion that ECT harms patients, even though randomised trials show consistent benefit. To remove bias from confounding, we can add the confounding variable to our model. If we have measured the variable with sufficient granularity and precision, this will solve the problem. So, is causal inference just a problem of measuring a sufficient number of variables? Can we solve our problems with big(ger) data? Unfortunately, not necessarily.
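To make this structure concrete, the following minimal simulation sketches confounding by indication under purely hypothetical effect sizes; the variable names (severity, ect, post) and the numbers are illustrative, not empirical estimates of the effect of ECT.

```python
# Minimal sketch of confounding by indication (hypothetical effect sizes):
# baseline severity drives both ECT assignment and post-treatment symptoms,
# so the crude ECT-outcome association is biased unless severity is conditioned on.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 10_000

severity = rng.normal(0, 1, n)                  # baseline depression severity (confounder)
p_ect = 1 / (1 + np.exp(-2 * severity))         # more severe patients are more likely to receive ECT
ect = rng.binomial(1, p_ect)
# True effect of ECT is beneficial (-1.0 symptom points); severity worsens the outcome (+2.0)
post = 2.0 * severity - 1.0 * ect + rng.normal(0, 1, n)

crude = sm.OLS(post, sm.add_constant(ect)).fit()
adjusted = sm.OLS(post, sm.add_constant(np.column_stack([ect, severity]))).fit()

print(f"crude ECT coefficient:    {crude.params[1]:+.2f}  (ECT looks harmful)")
print(f"adjusted ECT coefficient: {adjusted.params[1]:+.2f}  (recovers the true benefit of -1.0)")
```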
Mediators
When a variable is caused by the exposure of interest (i.e., an intervention) and has a causal effect on the outcome, it is a mediator (Figure 1C). If this variable is conditioned on, estimates of the intervention’s effect on the outcome will be biased. Mediators, as well as colliders, are examples where adding more variables to your model will not only decrease statistical precision but also lead you to draw wrong conclusions. For example, a theory of cognitive behavioural therapy (CBT) assumes that it changes a patient’s cognitive schemas, which changes rumination, which in turn alleviates depressive symptoms (Watkins, Reference Watkins2009). In this case, CBT will be a strong predictor of depressive symptoms if rumination is not included in the analyses, but may or may not be a strong predictor if rumination is included (Figure 1D). The association then depends on the particularities of the prediction model chosen, not on the actual effect of treatment. For causal estimates of the total treatment effect, mediators should not be included in the models.
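A minimal sketch of the same point, again with hypothetical effect sizes: conditioning on the mediator (rumination) removes most of the treatment signal from the CBT coefficient, even though the simulated total effect of CBT is substantial.

```python
# Minimal sketch of mediator bias (hypothetical effect sizes): CBT lowers
# rumination, and rumination drives depressive symptoms; conditioning on the
# mediator hides the total treatment effect.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 10_000

cbt = rng.binomial(1, 0.5, n)                       # randomised CBT vs. control
rumination = -1.0 * cbt + rng.normal(0, 1, n)       # CBT reduces rumination (mediator)
symptoms = 1.5 * rumination + rng.normal(0, 1, n)   # rumination worsens symptoms

total = sm.OLS(symptoms, sm.add_constant(cbt)).fit()
conditioned = sm.OLS(symptoms, sm.add_constant(np.column_stack([cbt, rumination]))).fit()

print(f"CBT coefficient, mediator excluded: {total.params[1]:+.2f}  (total effect, about -1.5)")
print(f"CBT coefficient, mediator included: {conditioned.params[1]:+.2f}  (treatment signal largely gone)")
```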
Colliders
When a variable is caused by the exposure (i.e., intervention) and by the outcome (or by another variable with a causal effect on the outcome), it is a collider (Figure 1E). If this variable is conditioned on, estimates of the exposure’s effect on the outcome will be biased. An example of this is the paradoxical observation that post-traumatic stress disorder (PTSD) is negatively associated with suicide in some studies (Zivin et al., Reference Zivin, Kim, McCarthy, Austin, Hoggatt, Walters and Valenstein2007), despite strong evidence that PTSD increases the risk of suicide in other studies (Gradus et al., Reference Gradus, Qin, Lincoln, Miller, Lawler, Sørensen and Lash2010; Fox et al., Reference Fox, Dalman, Dal, Hollander, Kirkbride and Pitman2021). As described by H. Jiang et al. (Reference Jiang, Huang, Tian, Shi, Yang and Pu2022), this paradox may arise from conditioning on mediators (e.g., depression) that share a common cause (e.g., other mental illness, lack of social support, etc.) with the outcome (Figure 1F). In this case, the mediator acts as a collider. For causal estimates, colliders should not be conditioned on.
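The sketch below illustrates this paradox with hypothetical effect sizes: PTSD truly increases risk, yet conditioning on depression (which is caused by both PTSD and an unmeasured common cause of the outcome) flips the sign of the estimated association.

```python
# Minimal sketch of collider bias (hypothetical effect sizes): PTSD raises
# suicide risk, but conditioning on depression, which shares an unmeasured
# common cause with the outcome, reverses the estimated association.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 100_000

u = rng.normal(0, 1, n)                                      # unmeasured cause (e.g., lack of social support)
ptsd = rng.binomial(1, 0.2, n)
depression = 1.5 * ptsd + 1.5 * u + rng.normal(0, 1, n)      # caused by PTSD and by u
suicide_risk = 0.5 * ptsd + 1.5 * u + rng.normal(0, 1, n)    # PTSD truly increases risk (+0.5)

crude = sm.OLS(suicide_risk, sm.add_constant(ptsd)).fit()
conditioned = sm.OLS(suicide_risk, sm.add_constant(np.column_stack([ptsd, depression]))).fit()

print(f"PTSD coefficient, depression not conditioned on: {crude.params[1]:+.2f}  (close to the true +0.5)")
print(f"PTSD coefficient, depression conditioned on:      {conditioned.params[1]:+.2f}  (sign flips: spuriously 'protective')")
```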
With these concepts in hand, we can turn to how modern ML-based prediction models can be misinterpreted if causality is neglected.
Prediction pitfalls
Pitfall #1 - mistaking feature importance for causal importance
When ML methods are used to predict an outcome, researchers may wish to know which variables were most important for the prediction, that is, to evaluate ‘feature importance’. Here, an important pitfall is to mistake feature importance for causal importance: to forget the key causal structures described above and assume that modifying the variable will necessarily alter the risk of the outcome. An example of such misinterpretation is seen in Liu et al., who identified higher body mass index (BMI) as an important variable for predicting cognitive impairment. They concluded: ‘Therefore, interventions for cognitive function among the elderly should target weight management’. This statement assumes a causal effect of BMI on cognitive function, without considering alternative explanations. High feature importance could just as well arise if BMI and lowered cognitive function share a common cause (i.e., confounding), for example a historical lack of exercise (Pitrou et al., Reference Pitrou, Vasiliadis and Hudon2022). While weight management is likely beneficial to most patients, there may be cases where mistaking feature importance for causal importance is less beneficial, or even harmful, to patients (Guglin et al., Reference Guglin, Li, Kanwar, Abraham, Kataria, Bhimaraj, Vallabhajosyula and Kapur2023).
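As an illustration, the following sketch uses a hypothetical data-generating process in which BMI has no causal effect on cognition, yet still receives high feature importance because an unmeasured common cause (lack of exercise) drives both; the variables and the importance method (permutation importance with a random forest) are chosen for illustration only.

```python
# Minimal sketch (hypothetical data-generating process): BMI has no causal
# effect on cognition here, but a shared cause (physical inactivity) makes
# BMI the dominant feature in an ML prediction model.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(3)
n = 5_000

inactivity = rng.normal(0, 1, n)                      # historical lack of exercise (common cause, unmeasured)
bmi = 2.0 * inactivity + rng.normal(0, 1, n)          # inactivity raises BMI
noise = rng.normal(0, 1, n)                           # an irrelevant comparison feature
cognition = -1.0 * inactivity + rng.normal(0, 1, n)   # inactivity lowers cognition; BMI does not

X = np.column_stack([bmi, noise])                     # inactivity is not included in the model
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, cognition)
imp = permutation_importance(model, X, cognition, n_repeats=10, random_state=0)

for name, value in zip(["bmi", "noise"], imp.importances_mean):
    print(f"{name:>5}: permutation importance = {value:.3f}")
# BMI dominates, yet intervening on BMI would not change cognition in this simulation.
```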
Pitfall #2 - mistaking symptom changes for treatment effects
ML-based methods are suggested to support the selection of treatment. This calls for a model that can determine which treatment will improve the patient’s state the most. Research in prediction of ‘treatment response’ attempts to answer this question by training models to predict which patients will improve after being administered a treatment. If the patient improves, the logic goes, they were given the right treatment. However, improvement after a treatment can be due to a plethora of factors besides the treatment itself, for example confounding, regression to the mean, placebo effects, the natural course of illness, etc. An example of this pitfall is seen in Redlich et al. (Reference Redlich, Opel, Grotegerd, Dohm, Zaremba, Bürger, Münker, Mühlmann, Wahl, Heindel, Arolt, Alferink, Zwanzger, Zavorotnyy, Kugel and Dannlowski2016), who recruited patients with major depressive disorder (MDD) and obtained baseline structural magnetic resonance imaging (sMRI) data before treatment with ECT plus antidepressants. Crucially, the ML algorithms were trained solely on the ECT group. Redlich et al. found that ML applied to baseline sMRI could predict treatment response and concluded: ‘Although determining which ECT recipients will respond remains difficult in clinical practice, a routine assessment with structural MRI before treatment could serve as a decision guide for clinical psychiatrists’. While these predictions may hint at outcomes after treatment with ECT, they do not estimate how patients would have fared had they not been given ECT. For example, the MRI may simply identify patients who would have recovered on their own, irrespective of whether they were treated. If the study had been designed for causal inference, it could have served as a decision guide for treatment.
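The sketch below illustrates the gap between response prediction and treatment effect under a hypothetical data-generating process: a baseline marker predicts improvement regardless of treatment, so a model trained only on treated patients ‘predicts response’ while saying nothing about whether the treatment helped.

```python
# Minimal sketch (hypothetical effect sizes): a baseline marker predicts
# improvement whether or not ECT is given, so 'response prediction' in a
# treated-only cohort does not estimate the counterfactual effect of ECT.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 20_000

marker = rng.normal(0, 1, n)                 # e.g., a baseline imaging-derived feature
ect = rng.binomial(1, 0.5, n)
# Improvement = spontaneous recovery driven by the marker + a constant ECT benefit
improvement = 1.0 * marker + 0.5 * ect + rng.normal(0, 1, n)

treated = ect == 1
resp_model = sm.OLS(improvement[treated], sm.add_constant(marker[treated])).fit()
print(f"marker coefficient among treated patients: {resp_model.params[1]:+.2f}  (predicts 'response')")

# But the marker does not modify the ECT effect: the counterfactual benefit is 0.5 for everyone.
effect_model = sm.OLS(improvement, sm.add_constant(np.column_stack([ect, marker, ect * marker]))).fit()
print(f"ECT x marker interaction (effect modification): {effect_model.params[3]:+.2f}  (about zero)")
```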
Pitfall #3 - avoiding causal inference altogether
To avoid the pitfalls described above, researchers applying ML methods for prediction purposes may rightfully refrain from making causal interpretations of their predictive models, and many studies indeed do so. For example, Jiang et al. applied ML methods to register-based data to predict suicide in the 30 days after discharge from a psychiatric hospital and stated: ‘It is noteworthy that these predictors should be interpreted as risk markers and not causal risk factors, given that our analyses were not intended to quantify the causal effect of any of these predictors, but rather to examine their contribution to accurate prediction of postdischarge suicide’ (T. Jiang et al., Reference Jiang, Rosellini, Horváth-Puhó, Shiner, Street, Lash, Sørensen and Gradus2021). With this statement, they acknowledge that a crucial question is left unanswered: how can we better prevent suicide? Brief suicide prevention interventions reduce the number of suicide attempts by roughly 30% (Doupnik et al., Reference Doupnik, Rudd, Schmutte, Worsley, Bowden, McCarthy, Eggan, Bridge and Marcus2020), but what of the remaining 70%? Prediction models can identify which patients are missed and should receive interventions, and when we know which interventions to administer, this is valuable. However, as prediction accuracy improves and fewer patients are missed, intervention efficacy becomes the limiting factor for clinical care, and research may need to centre on methods to identify causal mechanisms and develop more effective interventions.
Ways forward
When is prediction enough?
In the example above (i.e., suicide prevention), the challenges lie both in knowing whom to act upon and in knowing how to act. However, there may be contexts where we know how to act (i.e., treat/prevent), but systematically miss patients whom we should act upon. This may be the case for type 2 diabetes (T2D). The causal mechanisms underlying T2D development and the effectiveness of different interventions are well established through RCTs (i.e., causal inference) (Knowler, 2002), but some populations, for example psychiatric patients, are systematically undertreated (Scott & Happell, Reference Scott and Happell2011). Hence, ML-based prediction need not provide causal knowledge; it only needs to identify at-risk individuals. In this case, the lack of causal inference in the prediction is compensated for by the strong causal understanding of the mechanisms involved in T2D, and this may generalise to many clinical issues.
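As a minimal illustration of prediction-only screening (using entirely synthetic data, hypothetical features, and an arbitrary threshold), a simple risk model only needs to rank patients well enough that established preventive interventions can be offered; it does not need to explain the risk.

```python
# Minimal sketch of a purely predictive risk screen (synthetic data, hypothetical
# features): the model ranks patients by simulated T2D risk so that causally
# established preventive interventions can be targeted at those flagged.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(5)
n = 20_000
X = rng.normal(0, 1, (n, 4))                    # e.g., age, BMI, antipsychotic use, family history (standardised, illustrative)
logit = 0.8 * X[:, 0] + 0.6 * X[:, 1] + 0.4 * X[:, 2] + 0.3 * X[:, 3] - 2.0
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))   # synthetic T2D onset

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
screen = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
risk = screen.predict_proba(X_te)[:, 1]

print(f"AUC on held-out data: {roc_auc_score(y_te, risk):.2f}")
flagged = risk > 0.2                            # illustrative screening threshold
print(f"patients flagged for preventive intervention: {flagged.mean():.1%}")
```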
Inferring causality
The development of psychiatric disorders is highly complex, and the underlying causal effects are typically not known, motivating causal inference. Although RCTs remain the gold standard for causal inference, they are often unfeasible or unethical (e.g., for determining the effects of childhood trauma or substance abuse on mental health). Methods to infer causal effects from observational data are thus needed. Algorithm-based identification of causal networks is a promising, ongoing research field (Eberhardt, Reference Eberhardt2017), but the literature in psychiatry is scarce. Instead, the dominant approach relies on experts specifying an agreed-upon set of assumptions and then acquiring data to estimate causal effects (Hernán & Robins, Reference Hernán and Robins2016). Interactive tools have been developed for exactly this purpose (Textor et al., Reference Textor, van der Zander, Gilthorpe, Liskiewicz and Ellison2016). Broader approaches to causal inference in the health sciences have also been described, such as ‘inference to the best explanation’, ‘triangulation’, and the classical Hill criteria (Krieger & Davey Smith, Reference Krieger and Davey Smith2016; Ohlsson & Kendler, Reference Ohlsson and Kendler2020). Regardless of the exact approach, we are convinced that causal inference frameworks will play a defining role in shaping the future of psychiatric care.
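As a minimal sketch of this expert-driven workflow (with a hypothetical causal structure and effect sizes), once experts assert that a given adjustment set closes all backdoor paths between exposure and outcome, the causal effect can be estimated from observational data by simple regression adjustment; the validity of the estimate rests entirely on the assumed causal structure.

```python
# Minimal sketch of causal effect estimation from observational data under an
# expert-specified causal structure (hypothetical DAG and effect sizes): the
# expert asserts that {confounder} is a sufficient adjustment set, and the
# exposure effect is then estimated via backdoor (regression) adjustment.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 50_000

confounder = rng.normal(0, 1, n)                                     # expert-specified common cause
exposure = rng.binomial(1, 1 / (1 + np.exp(-confounder)))
outcome = 0.7 * exposure + 1.2 * confounder + rng.normal(0, 1, n)    # true causal effect = 0.7

# Backdoor adjustment: condition on the expert-specified adjustment set
fit = sm.OLS(outcome, sm.add_constant(np.column_stack([exposure, confounder]))).fit()
print(f"adjusted effect estimate: {fit.params[1]:+.2f}  (true value 0.7; valid only if the assumed DAG is correct)")
```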
Acknowledgements
None.
Author contributions
MJ and OHJ wrote the paper.
Funding statement
OHJ is funded by the Health Research Foundation of the Central Denmark Region (Grant no. R64-A3090-B1898).
Competing interests
The authors declare no competing interests.