Introduction
The quest for objective measures of mental disorders has been a long-standing ambition of psychiatry (Kapur, Phillips, & Insel, 2012; Singh & Rose, 2009). Given the notorious difficulties of classifying mental disorders and the challenge of establishing psychiatric biomarkers, many recent efforts place their hopes in approaches using machine learning (ML) as a paradigm-shifting way forward (Bzdok & Meyer-Lindenberg, 2018; Janssen, Mourao-Miranda, & Schnack, 2018; Shatte, Hutchinson, & Teague, 2019). By applying ML to large-scale datasets, it seems feasible to distinguish between healthy controls and patients diagnosed with major depressive disorder or schizophrenia on an individual level, although reported diagnostic accuracies differ substantially across studies (Ebdrup et al., 2019; Gao, Calhoun, & Sui, 2018; Kambeitz et al., 2015). Furthermore, ML techniques can differentiate successfully between subgroups within psychiatric categories (Drysdale et al., 2017; Dwyer et al., 2018) and predict the success of specific psychopharmacological interventions for single subjects (Chekroud et al., 2016; Webb et al., 2018). Of high clinical interest are ML applications that provide robust probabilistic estimates regarding the future onset of psychosis (Borgwardt et al., 2013; Chung et al., 2018; Koutsouleris et al., 2018) or the risk of suicide (Franklin et al., 2017; Just et al., 2017; Walsh, Ribeiro, & Franklin, 2017). However, to allow translation to current clinical practice, further multicenter imaging studies that integrate clinical measures and multivariate imaging data are needed to replicate promising initial findings (Giordano & Borgwardt, 2019).
Currently, there is no established ML application in psychiatric clinical practice. The drastic increase in FDA approvals for medical applications of artificial intelligence (AI) in the past 2 years (Topol, 2019) suggests that some ML programs could soon be integrated into standard clinical care, improving prediction and early detection, diagnostic certainty, and individual treatment outcomes in the sense of personalized psychiatry (Perna, Grassi, Caldirola, & Nemeroff, 2018). Unfortunately, the majority of ML applications in psychiatry still lack in-depth ethical analysis. With few exceptions discussing specific case studies (Martinez-Martin, Dunn, & Roberts, 2018), ethical concerns are often voiced only in a general form (Char, Shah, & Magnus, 2018; Topol, 2019; Vayena, Blasimme, & Cohen, 2018), thus necessarily neglecting the particular intricacies of potential psychiatric applications.
ML is an extremely broad term, covering many distinct computational approaches for even more heterogeneous real-world problems. We aim to demonstrate that any categorical rejection of the use of ML in psychiatry would be ethically wrong given its potential benefits, but that careful evaluation is needed of whether a particular procedure improves clinical care or merely constitutes a nifty computational exercise. Using schizophrenia as a paradigmatic case, we will first sketch some fundamental distinctions between different ML methods before turning to three (hypothetical) case studies. To support our main claim, we will discuss these cases following the principlist framework of Beauchamp and Childress (2013), which has recently been embraced as providing suitable principles for the ethical use of AI as well (AI HLEG, 2019; Floridi et al., 2018).
Machine learning in psychiatry
The meaning of the term ‘machine learning’ is often ambiguous. In the present paper, we use ML to describe learning algorithms that improve their performance on a certain task based on prior computation (Iniesta, Stahl, & McGuffin, 2016; Mitchell, 1997). ML in this sense is a narrower field than AI, which includes generalized AI and incidentally describes ‘whatever hasn't been done yet’ (Hofstadter, 1980, p. 601). At the same time, ML itself encompasses many specific computational approaches, from deep learning (DL) using artificial neural networks to algorithms relying on support vector machines. Across the many different methods of ML, a common distinction is drawn between three types: supervised, unsupervised, and reinforcement learning.
Typical tasks performed by supervised learning are problems of discriminative classification, where the ML algorithm assigns a probability of belonging to a certain category Y based on features X. To do so, supervised learning requires labeled training data, matching the training instances to labels such as ‘diseased’–‘healthy’, ‘developed psychosis’–‘did not develop psychosis’, or ‘positive treatment outcome’–‘negative treatment outcome’. After training, the ML algorithm can then assign these labels to new, previously unseen data. Unsupervised learning, on the other hand, does not require labeled training data. Instead, it can make use of often more readily available, unlabeled data, such as whole-genome sequences or cell phone metadata, to find clusters within these data points. In real-life settings, applications may fall between these two approaches and are described as ‘semi-supervised’ or, as recently suggested by LeCun, as ‘self-supervised’ (LeCun, 2018), complementing labeled training data with large amounts of unlabeled data (Chapelle, Schölkopf, & Zien, 2010). Finally, reinforcement learning denotes ML programs that optimize their interaction with an environment by trying to maximize reward over time (Mnih et al., 2015). While this approach, inspired by neuroscientific accounts of learning, does not require fully labeled data, it needs some formalization of rewards, e.g. winning an Atari game.
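To make the supervised setting concrete, the following minimal sketch trains a classifier on synthetic data that stands in for clinician-labeled cases; the features, labels, and parameters are illustrative assumptions rather than a validated diagnostic pipeline.

```python
# A minimal sketch of supervised classification on synthetic data
# (all names and values are illustrative assumptions).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical feature matrix X: one row per subject, one column per
# imaging-derived measure (e.g. regional grey-matter volumes).
X = rng.normal(size=(200, 10))
# Labels y as supplied by clinicians: 1 = 'diseased', 0 = 'healthy'.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(probability=True).fit(X_train, y_train)  # learn from labeled data
proba = clf.predict_proba(X_test)[:, 1]            # probability of the 'diseased' label
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")
```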
The schematic distinction between these three general ML types can also be instructive for the ethical debate on applied ML in psychiatry. As we will show, differences in methodology not only have a considerable impact on feasibility, since labelling data often requires costly and labor-intensive effort, but may also carry important ethical implications (Table 1).
Before turning to the potential of ML techniques to improve clinical care, some methodological limitations of psychiatric ML need to be mentioned, as recently stressed by Vieira et al. (2020). Some of these concerns, such as small sample sizes or publication bias, are pervasive across different research areas and neuroscientific research in particular (Button et al., 2013; Kellmeyer, 2017; Schnack & Kahn, 2016). Other methodological issues arise with specific regard to ML, e.g. the failure to rigorously employ nested cross-validation or to test the predictions of an ML program on a fully independent sample (Stahl & Pickles, 2018). In addition, psychiatry's high-dimensional and often noisy data demand particular consideration and may hinder the adoption of computational strategies popular in other medical areas. While DL is frequently considered the method of choice for medical image analysis (Shen, Wu, & Suk, 2017), some recent results suggest that for imaging-based predictions of cognitive and behavioral measures, classical kernel regression is at least as successful as DL (He et al., 2020; Mihalik et al., 2019), rendering a linear and more interpretable approach (Heinrichs & Eickhoff, 2020) potentially preferable. These methodological challenges may partially account for inconsistent results across different studies, e.g. the widely varying accuracies reported for potential biomarkers of schizophrenia based on ML and neuroimaging (Kambeitz et al., 2015).
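As an illustration of the nested cross-validation mentioned above, the sketch below tunes hyperparameters in an inner loop and estimates performance in an outer loop on synthetic data; the classifier and parameter grid are arbitrary assumptions, not a recommendation for any particular study design.

```python
# A minimal sketch of nested cross-validation: the inner loop tunes
# hyperparameters, the outer loop estimates performance on data the
# tuned model has never seen. Data and parameter grid are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)

# Inner loop: GridSearchCV selects C and gamma on the training folds only.
tuned_model = GridSearchCV(
    SVC(),
    param_grid={"C": [0.1, 1, 10], "gamma": ["scale", 0.01]},
    cv=inner_cv,
)

# Outer loop: each outer test fold plays the role of an independent sample.
scores = cross_val_score(tuned_model, X, y, cv=outer_cv)
print(f"nested CV accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```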
The deepest challenge for implementing ML in psychiatry, though, may lie in its long-embattled nosology (Kendler, 2016; Kendler, Zachar, & Craver, 2011; Zachar, 2015), which calls into question the choice of appropriate training data. Given that psychiatry arguably still lacks a diagnostic scheme that is both valid and reliable (Barron, 2019), establishing psychiatric ML programs rests on a shaky ground truth. This problem is exacerbated by fundamental doubts about whether a reductionist framework, which considers psychiatric disorders as mere brain diseases to be investigated with neuroimaging and genetics, is convincing (Borsboom, Cramer, & Kalis, 2019). While we largely focus on neuroimaging studies in our examples for the sake of simplicity, researchers should thus be careful not to restrict their input a priori to biological data but should also include social and idiosyncratic information on individual patients. Using natural language processing on narrative electronic health records could provide a starting point for such an endeavor (Rumshisky et al., 2016).
Applications of ML for schizophrenia
Future ML applications for patients with schizophrenia may differ widely. For research purposes, using unsupervised learning to identify altered brain structures in patients with schizophrenia is common. In some of these approaches, which have been described as data- or discovery-oriented (Huys, Maia, & Frank, 2016; Krystal et al., 2017), the algorithm is provided with neuroimaging data of patients with schizophrenia and left to find clusters (Dwyer et al., 2018; Schnack, 2019). Hence, apart from sample choice, little human labeling shapes the data. Instead, the algorithm identifies clusters that may or may not map onto a given hypothesis and that can, in some cases, be correlated with clinical data. Indeed, given the manifold disputes over psychiatric categorizations, some authors hope that embracing such a data-driven ML approach may provide new insights into the neurobiological mechanisms of psychiatric diseases (Adams, Huys, & Roiser, 2016; Huys et al., 2016; Madsen, Krohne, Cai, Wang, & Chan, 2018; Skatun et al., 2017). A recent study that associated neuroanatomically distinct subtypes of schizophrenia with different illness durations and degrees of negative symptoms may serve as an example of this aspiration (Dwyer et al., 2018).
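The following sketch illustrates the general shape of such a discovery-oriented analysis on fully simulated data: subjects are clustered without any labels, and cluster membership is only afterwards related to a clinical variable. The subgroup structure, feature count, and clinical measure are invented for illustration and do not reproduce the cited study.

```python
# A minimal sketch of a discovery-oriented, unsupervised analysis:
# cluster subjects on (synthetic) imaging-derived features, then check
# whether cluster membership relates to a clinical variable.
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Two simulated neuroanatomical subgroups, unknown to the algorithm.
features = np.vstack([
    rng.normal(loc=0.0, size=(60, 8)),
    rng.normal(loc=1.0, size=(60, 8)),
])
illness_duration = np.concatenate([
    rng.normal(loc=5, scale=2, size=60),   # shorter illness duration
    rng.normal(loc=12, scale=3, size=60),  # longer illness duration
])

# Unsupervised step: no diagnostic labels are given to the algorithm.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(features)
)

# Post-hoc step: relate the discovered clusters to clinical data.
_, p = mannwhitneyu(illness_duration[labels == 0], illness_duration[labels == 1])
print(f"Mann-Whitney U p-value for illness duration across clusters: {p:.3g}")
```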
ML also presents new opportunities for psychiatric diagnosis. Several groups have shown that, based on specific changes in brain volume, ML can distinguish non-medicated, first-episode patients with schizophrenia from healthy controls using volumetric MRI data (Chin, You, Meng, Zhou, & Sim, 2018; Gould et al., 2014; Haijma et al., 2013; Lee et al., 2018; Rozycki et al., 2018; Xiao et al., 2019). As noted, findings so far have been rather inconsistent, and one should avoid overoptimistic interpretations of these results (Kambeitz et al., 2015; Vieira et al., 2020). Still, it seems reasonable to assume that in the future some ML techniques could assist physicians in their diagnostic process. Such applications could provide probabilistic estimates regarding one or several diagnostic labels, such as schizophrenia, based on overlap with previously diagnosed patients. Arguably, most such methods would fall under the label of supervised learning, since the training data need to be labelled, consisting of vectors of individual data, such as brain measures, each assigned to the category ‘diseased’ or ‘healthy’.
Finally, recent psychiatric advances employing ML have seen a turn toward predicting certain quantifiable events beyond diagnostic labels, e.g. providing probabilities for the onset of psychosis (Koutsouleris et al., 2015, 2018) or for the treatment success of one particular drug (Chekroud et al., 2016; Webb et al., 2018). While the majority of these approaches draw on supervised or unsupervised ML, some also use reinforcement learning to derive recommendations for optimal dynamic treatment regimes, using, e.g., longitudinal data from so-called Sequential Multiple Assignment Randomized Trials (SMARTs). For example, by considering the treatment success of specific antipsychotics from the CATIE study (Stroup et al., 2003), Ertefaie, Shortreed, and Chakraborty (2016) have constructed a Q-learning approach that optimizes treatment outcome based on a patient's characteristics. Even more to the point, Koutsouleris et al. (2016) have shown that a cross-validated ML tool trained on diverse data from 334 patients could identify individuals who were more likely to benefit from treatment with amisulpride or olanzapine than with haloperidol, quetiapine, or ziprasidone. Such studies should be taken with a grain of salt, though, given that there is no agreement on what constitutes useful measures of treatment outcomes in psychiatry (Zimmerman & Mattia, 1999; Zimmerman, Morgan, & Stanton, 2018), a conundrum the introduction of ML seems unlikely to solve.
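To give a flavor of the reinforcement-learning setting, the sketch below runs a tabular Q-learning loop on a crudely simulated two-state, two-treatment problem. It is not the approach of Ertefaie et al.; the states, treatments, and improvement probabilities are invented solely for illustration.

```python
# A toy tabular Q-learning loop on a simulated two-treatment decision
# problem. Patient states, treatment effects, and rewards are invented
# purely for illustration.
import numpy as np

rng = np.random.default_rng(7)

N_STATES, N_ACTIONS = 2, 2  # e.g. symptom severity {low, high}; treatment {A, B}
# Hypothetical probability of symptom improvement (reward = 1) per state/treatment.
P_IMPROVE = np.array([[0.7, 0.4],
                      [0.3, 0.6]])

Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, epsilon = 0.1, 0.9, 0.1

state = rng.integers(N_STATES)
for step in range(20_000):
    # Epsilon-greedy choice of treatment.
    if rng.random() < epsilon:
        action = int(rng.integers(N_ACTIONS))
    else:
        action = int(np.argmax(Q[state]))

    reward = float(rng.random() < P_IMPROVE[state, action])
    next_state = 0 if reward else 1  # improvement moves the patient to the 'low' state

    # Standard Q-learning update.
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state

print("learned Q-values:\n", Q)
print("preferred treatment per state:", np.argmax(Q, axis=1))
```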
To highlight the dissimilarities between different usages, we provide three schematic cases that fall within the range of possible applications, from research to diagnosis and choice of treatment (Table 2). All three cases, we hold, touch upon important ethical concerns that can be discussed in accordance with the four principles put forth by Beauchamp and Childress: beneficence, non-maleficence, respect for autonomy, and justice (Beauchamp & Childress, 2013).
Beneficence
The principle of beneficence expresses an aspiration to further the welfare and interests of others, potentially implying particular obligations to act (Beauchamp & Childress, 2013, pp. 165–176). As our previous points and cases indicate, patients may benefit from applied ML in many different ways, both directly and indirectly.
Direct
Firstly, ML-supported diagnostic tools aim at improving diagnostic certainty. Techniques such as the one in the case of D (case 2) may serve as an automated second opinion, confirm a psychiatrist's judgement, and help with unclear cases. In fact, if the algorithm is trained on data of the highest quality, e.g. data labeled independently by several internationally leading and experienced psychiatrists, it could provide patients with a reliable diagnosis. Considering the difficulty of establishing whether schizophrenia is accurately diagnosed, and given the considerable inter-rater disagreement among experts (Mokros, Habermeyer, & Kuchenhoff, 2018), a diagnostic algorithm supporting psychiatrists in their decision-making could increase the likelihood of patients receiving a correct diagnosis and hence adequate treatment. By providing prognostic estimates concerning the future course of a disorder, such as the occurrence of psychotic episodes, or the success of specific treatments, ML applications may also help to reduce unnecessary psychopharmacological interventions (Martinez-Martin et al., 2018) and track the progression of the disorder. This is the case for T (case 3), who may be spared an arduous trial-and-error regime of medication by an algorithm suggesting one potentially ideal medication early on. Of course, the benefits of a correct diagnosis might be dramatically undermined by additional risks, to which we turn later, if these diagnostic or predictive processes were to be left unchecked. However, at least for now, such a development seems rather unlikely, both technically and socially, in most medical specialties (Topol, 2019).
Indirect
Beyond these immediate clinical uses, patients may also benefit from research projects similar to our first case, leading to more accurate diagnostic categories. After all, most current psychiatric diagnoses as enshrined in the DSM or ICD are purely descriptive, optimized primarily for validity and inter-rater reliability, not for underlying pathophysiology; this neglect of etiological underpinnings has long troubled many in the field (Hyman, 2011). In contrast, computational approaches based on ML aspire ‘to automatically segregate brain disorders into natural kinds’ (Bzdok & Meyer-Lindenberg, 2018, p. 223). Notwithstanding conceptual questions regarding the nature of psychiatric disorders (Kendler, 2016; Zachar, 2015), ML may be eminently suited to develop biologically more plausible diagnostic categories, allowing for more specific treatment options. After all, concerns about insufficiently grasping psychiatric complexity have long accompanied the development of psychiatric biomarkers (Singh & Rose, 2009). ML drawing on rich data, from detailed biological information such as (f)MRI scans or whole-genome sequences to demographic data and electronic health records, could arguably accommodate such complexity. Still, the concern remains that applications drawing on ML may overly reify diagnostic categories designed as heuristic constructs (Hyman, 2010) and thus end up harming patients.
Non-maleficence
Abstaining from harm is a bedrock of clinical practice (Smith, 2005). How does ML in psychiatry fare with regard to this crucial principle? Firstly, privacy concerns may come to mind here (Vayena et al., 2018). How is sensitive medical information disclosed to an algorithm, and how can data created by the algorithm be protected appropriately? These are essential questions, but they only concern ML techniques indirectly, via the data used and produced by their applications. Since privacy issues of big data have been addressed extensively elsewhere (Price & Cohen, 2019), we will leave them aside here to focus on harm potentially caused by ML in psychiatry. As in the case of benefits, there are both direct and indirect ways in which its use may harm patients.
Direct
First, using an algorithm may bring about harm directly, e.g. when the diagnoses or predictions made by the ML application are erroneous. Previous shortcomings of health-related ML can be instructive here. IBM's ML-based computer system Watson, advertised as a revolutionary tool for cancer care, has been shown to recommend unsafe treatments that endanger patients' safety and health (Ross & Swetlitz, 2018). Such errors are particularly worrying if the recommendations of algorithms are readily accepted by medical staff, as in T's case, or if the process were to become fully automated. Although an erroneous algorithm is likely to affect more patients than an individual mistake made by a physician, errors are far from exclusive to algorithms (McLennan et al., 2013), and these concerns could be tackled by a model of shared responsibility in which competent human agents check the ML-based suggestions (Topol, 2019). However, as opposed to human physicians, a trained ML algorithm may not be flexible enough to account for contextual changes such as the swift rise of smartphone usage or altered eating habits. Given the dependency of psychiatric conditions on contingent societal contexts, even a tested and approved program may thus require regular overhauling and retraining to avoid systematic misjudgments.
Indirect
The more intricate questions seem to arise from indirect effects of using ML in patients with schizophrenia. By potentially modifying the expectations of doctors, a computationally assigned risk category will most likely influence downstream diagnostic and therapeutic decision-making. For example, in mammography screening, risk stratification affects the detection performance of radiologists: a known BRCA mutation strongly decreases the number of missed visible breast cancer lesions in MRI scans (Vreemann et al., 2018). Timing the disclosure of ML-based computations to the physician is thus crucial: should she have to decide on one diagnosis first before being confronted with the results of ML diagnostics? Furthermore, the impact of incorporating ML in the clinical setting will require additional scrutiny regarding its effects on the therapeutic relationship. How do patients perceive the use of ML by their physicians to arrive at diagnostic judgements or prognostic estimates? Does it impair their trust in health care professionals, and if so, could it harm their compliance and the therapeutic outcome? These questions are of particular importance for psychiatric patients, who are especially vulnerable to so-called ‘diagnostic overshadowing’, i.e. health care professionals falsely attributing somatic symptoms to known mental health issues (Callard, Bracken, David, & Sartorius, 2013; Jones, Howard, & Thornicroft, 2008; Shefer, Henderson, Howard, Murray, & Thornicroft, 2014). These challenges merit ongoing attention and require that clinical ML implementation be accompanied by corresponding empirical bioethical research exploring its potential negative impact.
Patients' autonomy and clinicians' judgement
Respect for autonomy demands conveying sufficiently detailed and understandable information to patients about planned medical procedures and asking for their consent (Manson & O'Neill, 2007). Such disclosure may be particularly challenging in cases of applied ML, used by medical practitioners who may themselves not fully understand the mathematical underpinnings of an algorithm. Does the, to some extent, unavoidable opacity of ML, commonly discussed as the ‘black box’ problem, clash with the requirement to appropriately inform patients? And should one ask patients for their explicit consent before providing their (existing) data to the algorithm at all? After all, obtaining informed consent for the use of predictive analytics is not legally mandatory at the moment (Cohen, Amarasingham, Shah, Xie, & Lo, 2014). One could wonder whether discussing ML algorithms with a group as vulnerable as patients at risk of psychosis or paranoid symptoms might not exacerbate their situation and cause severe additional psychological stress (Martinez-Martin et al., 2018).
Questions of autonomy also extend to the domain of medical doctors' discernment, and respecting clinicians' judgement is vital in the context of modern health care systems (Faden et al., 2013). Much depends on the conceptualization of the relation between human expert and ML algorithm. One analogy, recently proposed by Topol (2019), suggests that we conceptualize the relation of clinician and algorithm similarly to that of assisted driving and increasingly autonomous cars. While the machine may take over some tasks, the drivers or physicians need to remain in charge as a backup, checking the machine's output by comparing it to their own judgements. This would facilitate attributing degrees of responsibility to health care personnel, clarifying important issues of accountability and liability. It implies that human agents need to remain able to weigh ML recommendations and potentially decide against them. Ideally, as a safeguard against bad judgements by single individuals, one could envision provisions in which disagreements between physicians and the ML application lead to consultations with other clinicians, e.g. during departmental meetings, providing an opportunity to sharpen the clinical skills of everyone involved. Furthermore, an institutional framework may be needed to test and approve ML applications in a similar fashion to pharmaceutical products (Paulus, Huys, & Maia, 2016).
Fair allocation and systematic biases
Finally, using ML in psychiatry also raises important issues concerning justice, from financial aspects to systematic biases. Does increased diagnostic certainty justify the allocation of scarce financial means to additional computational efforts and vindicate even highly expensive exams such as (f)MRI? Integrating data from examinations such as MRI into psychiatric routines may pose additional serious challenges for equal treatment if certain patients cannot undergo scanning due to limited availability or contraindications such as claustrophobia. Arguably, any new technique needs to establish a measurable clinical benefit over a conventional psychiatric assessment to justify its cost (Iwabuchi, Liddle, & Palaniyappan, 2013), or show that it can avoid costs elsewhere. With regard to discerning different diagnostic entities, research based on ML could also lead to issues commonly known as salami slicing: even without understanding the underlying pathophysiological mechanisms, pharmaceutical companies and their lobbyists might have an interest in splitting psychiatric disorders into many distinct categories to gain advantages in the approval of new drugs. On the other hand, we should not forget that in many countries, only a very limited share of the overall healthcare budget is allocated to mental health (World Health Organization, 2018). More precise diagnoses and better treatments might convince policymakers to overcome this health disparity, ultimately empowering psychiatric patients.
Of further concern are systematic biases, easily induced by poor training data and particularly worrisome in diagnostic contexts (Vayena et al., 2018). The example of schizophrenia is a case in point, with its long-standing disproportionate number of diagnoses in African-Americans and Latin-Americans, arguably influenced by stereotypes, the clinician's own ethnicity, or the under-diagnosis of other psychiatric diseases (Schwartz & Blankenship, 2014). ML trained on data with these or other biases could further perpetuate and reify misconceptions (Tandon & Tandon, 2018). If training data are less than carefully curated, ML applications might hence not constitute an independent diagnostic tool for enhancing diagnostic accuracy, undermining the endeavor's very aim. To avoid perpetuating pathophysiologically misleading biases, developing appropriate supervision strategies for the ML algorithm thus seems key to a successful clinical implementation. Such supervision should (1) track which parameters are taken into account by the algorithm to arrive at its recommendations and (2) compare the results of algorithms trained on different databases. Such strategies would also help to foster explicability, which the AI4People initiative cited above rightly suggests as a fifth principle for ethical AI use, enabling the other four (Floridi et al., 2018). The implementation of such safety measures will be critical for minimizing biases in decision-making, but it remains unclear to what extent ML algorithms will nonetheless capitalize on existing biases in the data.
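A minimal sketch of what such supervision could look like in code is given below, on fully simulated data: it compares models trained on two hypothetical databases, one with a deliberately injected labelling bias, by inspecting which features each model relies on and how often each labels subjects as ‘diseased’. Every variable and parameter here is an assumption for illustration, not a proposed auditing standard.

```python
# A minimal sketch of one possible supervision strategy: (1) track which
# inputs a trained model relies on, and (2) compare models trained on
# different databases. Data and the injected bias are simulated.
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def make_database(label_shift, n=500):
    """Simulate a training database; `label_shift` injects a labelling bias."""
    X = rng.normal(size=(n, 6))
    y = ((X[:, 0] + label_shift + rng.normal(scale=0.5, size=n)) > 0).astype(int)
    return X, y

X_a, y_a = make_database(label_shift=0.0)  # database A: unbiased labels
X_b, y_b = make_database(label_shift=0.8)  # database B: systematically over-diagnosed

model_a = LogisticRegression().fit(X_a, y_a)
model_b = LogisticRegression().fit(X_b, y_b)

# (1) Track which inputs each model relies on.
X_test, y_test = make_database(label_shift=0.0, n=300)
for name, model in [("A", model_a), ("B", model_b)]:
    imp = permutation_importance(model, X_test, y_test, random_state=0)
    print(f"model {name} feature importances:", np.round(imp.importances_mean, 3))

# (2) Compare the two models' outputs on the same held-out sample.
rate_a = model_a.predict(X_test).mean()
rate_b = model_b.predict(X_test).mean()
print(f"fraction labelled 'diseased': model A {rate_a:.2f} vs. model B {rate_b:.2f}")
```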
Conclusion
A plethora of context-specific ethical issues might arise in applied ML in psychiatry and the treatment of schizophrenia. For now, ML remains in the domain of research and should be accompanied by an exploration of its ethical aspects, as there is no standard rule for determining when an application is ethically permissible, given the complexity of each singular case. Further, empowering psychiatric patients can only happen with the help of important support systems such as family, peer, and community members. Still, if some of the vast potential benefits of psychiatric ML can indeed lead to tangible improvements for patients, we believe it is not only permissible but may in fact be a moral obligation to pursue them further and to aim at their successful clinical implementation.
Acknowledgements
The authors would like to thank Andrea Martani and Christopher Poppe for their helpful comments as well as the three anonymous reviewers for substantial improvements on a previous draft.
Financial support
This research received no specific grant from any funding agency, commercial or not-for-profit sectors.
Conflict of interest
None.