Significant outcomes
- The number of studies on the use of generative AI in psychiatry is growing rapidly, but the field is still at an early stage.
- Most studies are early feasibility tests or pilot projects, while only very few involve prospective experiments with participants.
- The field suffers from a lack of clear reporting and would benefit from adherence to reporting guidelines such as TRIPOD-LLM.
Limitations
- There is no clear definition of generative AI in the literature, which means that some relevant studies might have been omitted.
- The study represents a snapshot of a rapidly moving field as of February 2024, i.e., recent developments might not have been captured.
- Due to the relative immaturity of the field, no formal quantitative analyses or quality assessments were made.
Introduction
The recent launch of ChatGPT (OpenAI, 2024a) demonstrated the potential of generative artificial intelligence (AI) to the world (Hu and Hu, 2023). Generative AI encompasses models that produce content, such as text, images, or video, as opposed to rule-based models, which are constrained to providing predetermined outputs. There already seems to be wide consensus that generative AI has the potential to transform many aspects of modern society, including the field of medicine (Haug and Drazen, 2023), where it may aid, e.g., the training of medical professionals (Kung et al., 2023), informing/educating patients (Ayers et al., 2023), diagnostic processes (Lee et al., 2023), clinical note-taking/summarisation (Denecke et al., 2018; Schumacher et al., 2023) and the reporting of research findings (Else, 2023).
At present, the medical potential of generative AI is probably most clearly manifested via generative natural language processing, i.e., the use of computational techniques to process speech and text (Nadkarni et al., 2011; Gao et al., 2022). This makes generative AI particularly appealing for the field of psychiatry, where language plays an important role, for three primary reasons. First, spoken language is the primary means of communication between patient and clinician, forming the basis for both the diagnostic process and the assessment of treatment efficacy and safety (Hamilton, 1959; Hamilton, 1960; Kay et al., 1987; Lingjærde et al., 1987). Second, several core symptoms of mental disorders manifest via spoken language, such as disorganised speech or mutism (schizophrenia in particular), slowed speech (depression), increased talkativeness (mania) or repetitive speech (autism) (World Health Organization, 1993; American Psychiatric Association, 2013). Third, due to the near-total absence of clinically informative biomarkers, psychiatry is the medical specialty in which written language plays the most prominent role in documenting clinical practice (Hansen et al., 2021).
Generative AI, however, is not restricted to language. Indeed, the technology is also able to generate, e.g., images and videos, as showcased by services such as DALL·E (OpenAI, 2023) and Sora (OpenAI, 2024b). These output formats could also be tremendously useful for the field of psychiatry. As an example, they may allow patients with hallucinations and delusions to visualise their experiences for relatives, friends and clinical staff, which may be beneficial for a variety of reasons (for instance, to increase understanding/reduce stigma and to assess symptom severity/guide treatment) (Østergaard, 2024).
While systematic reviews have been published on the use of artificial intelligence and/or conversational agents/chatbots in psychiatry (Graham et al., 2019; Vaidyam et al., 2021; Li et al., 2023), we are not aware of analogous studies focusing on generative AI – both more narrowly in terms of the technology (much more sophisticated/flexible compared to, e.g., rule-based approaches) and more broadly in terms of output formats (not restricted to text/speech). Therefore, the aim of this study was to systematically review the literature on the current use/application of generative AI in the context of psychiatry and mental health care.
Methods
We performed a systematic review in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline (Moher et al., 2009). The screening and data extraction process was supported by Covidence (‘Covidence systematic review software’, 2024). The protocol was preregistered on the Open Science Framework: https://osf.io/mrws8.
Search strategy
The search was conducted across PubMed, Embase and PsycINFO. The search terms used for PubMed were as follows: (“generative ai*”[All Fields] OR “generative artificial*”[All Fields] OR “conversational ai*”[All Fields] OR “conversational artificial*”[All Fields] OR “large language model*”[All Fields] OR “chatbot*”[All Fields] OR “chatgpt*”[All Fields]) AND (“psychiatry”[MeSH Terms] OR “mental disorders”[MeSH Terms] OR “mental health”[MeSH Terms] OR “Psychotherapy”[MeSH Terms] OR “psychiatr*”[Title/Abstract] OR “mental disorder*”[Title/Abstract] OR “mental health”[Title/Abstract] OR “mental disease*”[Title/Abstract] OR “Psychotherap*”[Title/Abstract]). Analogous searches were conducted in Embase and PsycINFO (the search terms are available in the protocol: https://osf.io/mrws8). The search was conducted on February 23, 2024 (an update from the September 12, 2023, search date mentioned in the preregistration).
Screening of identified records
Two authors (SK and RML) independently screened the identified records. Screening was first performed at the title/abstract level, followed by full-text screening. Conflicts in screening results were resolved by RML and SK, with consultation of SDØ in cases of doubt. The following inclusion criteria were used when screening the literature:
- Research articles reporting original data on the use/application (understood broadly) of generative AI* (for instance, chatbots such as ChatGPT) in the context of psychiatry or mental health care (including, but not limited to, treatment/psychotherapy and psychoeducation).
- Only articles published in peer-reviewed journals were included.
- No language restriction was enforced.
- No time restriction (year of publication) was enforced.
*By generative AI, we refer to artificial intelligence/machine learning models capable of generating content such as text, speech, images, etc. Examples include, but are not limited to, transformer architectures (Vaswani et al., 2017) such as ChatGPT (OpenAI, 2024a) and diffusion models (Sohl-Dickstein et al., 2015) such as DALL·E (OpenAI, 2023), which produce output that has not been predefined. During the screening process, we discovered that some studies referred to rule-based systems (i.e., systems selecting predetermined responses from, e.g., decision trees) as ‘generative’. We do not consider such systems to be generative in the sense implied by generative AI and, therefore, did not include them in the review.
Conference abstracts, books and theses were not considered (unless also published as research articles).
Data extraction
For the articles identified via the screening procedure, the following data were extracted (by SK, LH, and RML): Author, publication year, country, psychiatric focus, participants (e.g., general population, clinical sample or patients with a specific mental disorder), generative AI model used, study aim, study design (e.g., randomised controlled trial or case report) and findings.
Data analysis
As we assumed that the literature on this topic would not be sufficiently mature to allow for quantitative analysis, a qualitative synthesis was performed.
Results
The identification and screening of the literature is illustrated by the PRISMA flowchart in Figure 1.
A total of 1156 studies were identified in the search. Of 432 duplicate records, 349 were identified as database duplicates during the search, 77 were automatically marked by Covidence, and six were manually marked by the authors. The titles and abstracts of the remaining 724 studies were screened, based on which 525 studies were excluded. Of the 199 studies that underwent full-text review, 40 were included in the review, while 159 were deemed ineligible, predominantly due to irrelevant interventions (e.g., the body image chatbot KIT, which allows users to select predefined responses, triggering content from a decision tree (Beilharz et al., 2021), or a conversational system for smoking cessation, which selects a predefined response based on the classification of free-text messages from users (Almusharraf et al., 2020)).
The 40 included studies were published between 2022 and 2024, with the median publication year being 2023. The studies stem from 18 individual countries across six geographical regions (determined by the first author’s first affiliation). Most countries only appear once, the most prominent contributor being the USA (n = 14), followed by Israel (n = 5) and Australia (n = 4). Among the regions, North America was most heavily featured (n = 14), followed by Europe (n = 10), the Middle East (n = 7), Oceania (n = 4), Asia (n = 4), and Africa (n = 1). The studies covered seven overall themes, listed in Table 1.
The characteristics and main findings of the 40 included studies are listed in Table 2.
The studies predominantly pertained to mental health and well-being more broadly (n = 13), while another frequent focus was addiction and substance use (n = 7). Some studies explored topics related to specific mental disorders, including schizophrenia (n = 3), bipolar disorder (n = 2), and depression (n = 2).
The majority of studies were designed as prompt experiments (n = 25), wherein the factualness and/or quality of AI responses to various queries were assessed. The designs of the remaining studies included surveying users regarding their experiences with generative AI, pilot studies, and case reports. Consequently, most studies did not enlist participants (n = 33). The ones that did either recruited participants for surveys (n = 3) or enlisted participants to use/test generative AI as part of an experimental setup (n = 3).
Of the 40 identified studies, 39 either implemented or surveyed opinions about models for language generation, while the remaining study used DALL·E 2 for image generation. Thirty-two studies investigated applications of ChatGPT, while the remaining studies examined the use of Bard (n = 4), Bing.com (n = 2), Claude.ai (n = 1), LaMDA (n = 1), ES-Bot (n = 1), Replika (n = 1), GPT models not accessed through the ChatGPT interface (n = 4), and 25 mental health-focused agents from FlowGPT.com (n = 1). Of the studies interacting with generative AI through the ChatGPT interface, 15 studies used a version of ChatGPT that relied on GPT-3.5, while nine studies investigated versions relying on GPT-4. For 10 of the studies, we could not find specifications of the underlying GPT model used.
Below, the main findings for each of the identified themes are described in brief.
Knowledge verification
A total of 12 studies investigated generative AI’s ‘understanding’ of psychiatric concepts. Heinz et al. (2023) assessed domain knowledge and potential demographic biases of generative AI, finding variable diagnostic accuracy across disorders and noting gender and racial discrepancies in outcomes. de Leon and De Las Cuevas (2023), along with Parker and Spoelma (2024), evaluated ChatGPT’s knowledge of specific medications, such as clozapine, and treatments for bipolar disorder, revealing both strengths in general information provision and weaknesses in providing up-to-date scientific references. McFayden et al. (2024) and Randhawa and Khan (2023) examined ChatGPT’s utility for patient education on autism and bipolar disorder, respectively, finding mostly accurate and clear responses but noting issues with linking relevant sources and references. Lundin et al. (2023) and Amin et al. (2023) explored ChatGPT’s potential in psychoeducation for ECT and vaping cessation, respectively, observing generally accurate and empathic responses. Similarly, Luykx et al. (2023) and Prada et al. (2023) evaluated the quality of ChatGPT’s responses to various questions regarding epidemiology, diagnosis, and treatment in psychiatry and found the answers to be accurate and nuanced. Comparative studies by Hristidis et al. (2023) and Sezgin et al. (2023) showed ChatGPT often outperforming traditional search engines in relevance and clinical quality of responses, but with lower reliability due to a lack of references. Lastly, Herrmann-Werner et al. (2024) assessed ChatGPT’s performance on psychosomatic exam questions, demonstrating high accuracy but some limitations in cognitive processing at higher levels of Bloom’s taxonomy.
Education and research applications
Eight studies fell within the category of educational and research applications. While some studies revealed generative AI’s potential to assist in tasks such as providing hypothetical case studies for social psychiatry education (Smith et al., 2023) and generating drug abuse synonyms to enhance pharmacovigilance (Carpenter and Altman, 2023), other applications uncovered significant limitations. McGowan et al. (2023) found that both ChatGPT and Bard exhibited poor accuracy in literature searches and citation generation. Furthermore, Spallek et al. (2023) observed inferior quality of ChatGPT’s responses for mental health and substance use education compared to expert-created material. Similarly, Draffan et al. (2023) found that generative AI struggled to adapt symbols for augmentative communication, and Rudan et al. (2023) noted that ChatGPT provided unreliable output when interpreting bibliometric analyses. Additionally, Wang, Feng and We (2023) highlighted the need for vigilance when using ChatGPT due to the potential for inaccurate information. However, they also noted that ChatGPT served as an effective partner for understanding theoretical concepts and their relations. Moreover, Takefuji (2023) found ChatGPT to be helpful for generating code for rudimentary data analysis.
Clinician-facing tools
Seven studies examined the performance of AI models in tasks typically performed by mental health professionals, such as diagnosing, treatment planning, risk assessment, and making prognoses. While some studies found that ChatGPT demonstrated proficiency in diagnosing various conditions (D’Souza et al., 2023) and creating treatment plans for treatment-resistant schizophrenia in alignment with clinical standards (Galido et al., 2023), others highlighted limitations, including inappropriate recommendations for complex cases (Dergaa et al., 2024) and errors in nursing care planning (Woodnutt et al., 2024). A version of ChatGPT based on GPT-4 was deemed capable of generating appropriate psychodynamic formulations from case vignettes and of tailoring its responses to the specific wording and interpretations associated with various schools of psychodynamic theory (Hwang et al., 2024). However, studies also revealed performance discrepancies between generative AI and clinicians in areas like suicide risk assessment (Elyoseph and Levkovich, 2023) and prognosis (Elyoseph et al., 2024), with ChatGPT generally underestimating risk compared to clinicians.
Ethics and safety
Four studies fell under the heading of ‘Ethics and safety’. These studies included perspectives on ethical and safety concerns surrounding generative AI. Østergaard and Nielbo (2023) addressed the use of stigmatising language in the field of AI. Instead of ‘hallucination’ to describe AI errors, they suggest alternative and more specific phrasing, to avoid further stigmatisation of individuals experiencing genuine hallucinations and to provide more clarity about AI errors. The three remaining studies explored the safety of generative AI. Haman and Školník (2023) and Heston (2023) tested the likelihood of generative AI responses promoting and identifying risky behaviour (e.g., suggesting alcohol- or drug-related activities (Haman and Školník, 2023), or recognising suicidality (Heston, 2023)). They found that, although the AI did not suggest risky behaviour, it was slow to react appropriately to user messages that should elicit immediate referral to health services. De Freitas et al. (2024) evaluated how users respond to interactions with generative AI and determined that users react negatively to harmful responses perceived to originate from an AI. This includes both nonsensical or unrelated AI replies that disregard sensitive user messages, as well as risky AI responses that contain, e.g., name-calling or encouragement of harmful behaviour (De Freitas et al., 2024).
Cognitive process imitation
Three studies investigated AI imitation of cognitive processes, focusing on emotional awareness and interpretation. Elyoseph et al. (2023) compared ChatGPT’s emotional awareness to that of the general population, while Elyoseph et al. (2024) evaluated the ability of ChatGPT and Bard (now Gemini) to interpret emotions from visual and textual data. They found that ChatGPT demonstrated significantly higher emotional awareness than human norms and performed comparably to humans in facial emotion recognition. Hadar-Shoval et al. (2023) examined ChatGPT’s ability to mimic mentalizing abilities specific to personality disorders, finding that the AI could tailor its emotional responses to match characteristics of borderline and schizoid personality disorders. These findings suggest that generative AI models can imitate certain aspects of human cognitive processes, particularly in emotional comprehension and expression.
Patient/consumer-facing tools
Three studies examined patient-facing solutions for mental health. Alanezi (2024) conducted a qualitative study to evaluate ChatGPT’s effectiveness in supporting individuals with mental disorders and found that it can provide self-guided support, though some ethical, legal, and reliability concerns remain. Similarly, Gifu and Pop (2022) explored users’ perceptions of virtual assistants for mental health support, revealing that users believe these tools could be useful for reducing mental health problems. Sabour et al. (2023) evaluated the influence of a chatbot intervention on symptoms of mental distress. Their study found that the intervention decreased depressive symptoms, negative affect, and insomnia. However, the study did not find significant differences between generative and non-generative AI interventions in the short term, suggesting that the specific AI technology may be less critical than the overall digital support approach.
User perceptions and experiences
Under the category of user perceptions and experiences, three studies examined how both patients and mental health staff interact with generative AI. Two studies explored how individuals with mental health issues engaged with AI, while the remaining study investigated clinicians’ experiences with AI. Ma et al. (2023) examined interactions with the AI companion chatbot Replika (Luka, Inc., 2024), based on user comments from an online forum. Users appreciated Replika for its non-judgmental, on-demand support, which aided in boosting confidence and self-discovery. However, Replika also had significant limitations, including the production of inappropriate content, inconsistent communication, and the inability to retain new information. In an online survey examining perceptions of stereotyping by ChatGPT, Salah et al. (2023) found correlations between perceived AI stereotyping and user self-esteem.
Blease et al. (2024) conducted an online survey of psychiatrists’ experiences with generative AI. The results portrayed a range of opinions on the harms and benefits of generative AI. The majority of psychiatrists were interested in the potential of generative AI to reduce the burden of documentation and administration, and were under the impression that most of their patients ‘will consult these tools before first seeing a doctor’, raising concerns over patient privacy (Blease et al., 2024).
Discussion
This systematic review of the use of generative AI in psychiatry identified 40 studies that met the criteria for inclusion. The vast majority of studies were designed as prompt experiments, in which researchers asked a series of questions to a language model – predominantly ChatGPT – and assessed the responses for correctness and usefulness in relation to specific tasks.
The review clearly demonstrates that the study of generative AI in mental health is a nascent yet exponentially growing field: the oldest study included in this review is from 2022, with 39 out of 40 studies being from 2023 or 2024 (the final search was conducted February 23, 2024). As a consequence, this review represents a snapshot of a field in rapid expansion. Indeed, most studies included in this review were pilot studies or feasibility studies exploring potential use cases, investigating user perceptions, or identifying potential ethical and safety concerns of prospective generative AI tools.
The relative immaturity of the field is evident in the absence of consensus on the definition of AI and generative AI in the studies screened as part of this review. The term ‘AI’ is used very loosely, often simply to describe a classification model. The majority of studies excluded based on the type of intervention claimed to be ‘powered by AI’, which in practice meant having a classification model tag, e.g., the sentiment of free-text input, which would then, in turn, trigger a pre-specified response. While this might fall under the broadest definition of generative AI, as the input does result in a textual output, we deemed it necessary to narrow our definition of generative AI to only include content generated in a less deterministic/pre-established manner (e.g., as seen in the transformer and diffusion models powering ChatGPT, DALL·E, Sora and their equivalents).
Most of the identified studies focused on natural language implementations of generative AI, particularly ChatGPT, either by testing its psychiatric knowledge base or by evaluating its capabilities as a mental health conversational companion. Though most of the included studies found that generative AI performed well at various tasks, some studies also highlighted potential safety issues. That is, due to the inherent lack of predefinition in generative AI output, responses cannot be reliably predicted, and, thus, protection from ethical and safety breaches cannot be guaranteed. For these reasons, it is crucial for users, patients, practitioners, and their organisations to carefully consider and scrutinise the legal and ethical aspects of using generative AI.
While we did not conduct a formal quality assessment of the studies included in the review (a large proportion of studies were too preliminary/informal to allow for such assessment), it was our impression that many studies were of relatively low quality and had limited clinical relevance. Specifically, most studies were severely underspecified, both in terms of the technology used (such as the type and version of models) and the study design (e.g., specification of the specific prompts), limiting reproducibility. Additionally, although many studies could be considered pilot studies, their results were often overgeneralised and overstated beyond what could reasonably be claimed from the data. Therefore, to advance the field of generative AI for mental health, we propose the following guidelines for future research: First, to facilitate reproducibility and clarity of findings, we highly recommend that studies follow a set of reporting guidelines for generative AI, such as TRIPOD-LLM, to ensure that all relevant items are reported (Gallifant et al., 2024). Second, we encourage the field to move beyond simple ‘knowledge testing’ and prompt experiments and towards rigorously planned clinical trials involving users/patients and tasks with greater clinical relevance. Indeed, it is noteworthy that only a handful of studies recruited participants to interact with the technology, while even fewer structured the interaction (intervention) in a systematic manner. Also, future studies should ideally take the user/patient perspective into account in the design phase (i.e., co-design).
While several studies deemed the responses from generative AI to be clear and in accordance with scientific knowledge, some studies found that generative AI underestimates the risk of, e.g., suicide (Haman and Školník, 2023; Heston, 2023) and handles crisis scenarios in a less-than-ideal manner (Heston, 2023). Therefore, it is essential that chatbots developed for mental health/patient support ensure adequate handling of all levels of illness/symptom severity – including suicidal ideation.
This study should be interpreted in light of its limitations. First, the field is in its nascence and tangible new developments may happen quickly. This review merely represents a snapshot of the state of the field as of February 23, 2024, and new developments are likely to have emerged since the data collection concluded. Second, we implemented a broad search strategy; however, we cannot rule out the possibility that some relevant studies may have been overlooked. Third, it was not feasible to perform a quantitative analysis due to the heterogeneity of the studies. Fourth, while the literature identified in this review predominantly emphasised the clinical/care potential of generative AI in the context of mental health/psychiatry (likely due to the databases used for the search), it is apparent that there are important legal/ethical challenges that need to be addressed. An exhaustive review of the literature on these challenges would require a broader search strategy than employed here.
In conclusion, the field of generative AI in psychiatry and mental health is in its infancy, though evolving and growing exponentially. Unfortunately, many of the identified studies investigating the potential of generative AI in the context of mental health/psychiatry were poorly specified (particularly with regard to the methods). Therefore, moving forward, we suggest that studies using generative AI in psychiatric settings should aim for more transparency of methods, experimental designs (including clinical trials), clinical relevance, and user/patient inclusion in the design phase.
Acknowledgements
The authors are grateful to librarian Helene Sognstrup (Royal Danish Library) for her assistance with the search strategy and to Arnault-Quentin Vermillet (Aarhus University) and Jean-Christophe Philippe Debost (Aarhus University Hospital – Psychiatry) for translation from French.
Author contribution
Conception and design: SDØ, RML, and SK. Provision of study data: SDØ. Screening of data: SK and RML. Data analysis: SK, LH, and RML. Interpretation: All authors. Manuscript writing: All authors. Final approval of the manuscript: All authors.
Financial support
There was no specific funding for this study. Outside this study, SDØ is supported by the Novo Nordisk Foundation (grant number: NNF20SA0062874), the Lundbeck Foundation (grant numbers: R358-2020-2341 and R344-2020-1073), the Danish Cancer Society (grant number: R283-A16461), the Central Denmark Region Fund for Strengthening of Health Science (grant number: 1-36-72-4-20), The Danish Agency for Digitisation Investment Fund for New Technologies (grant number: 2020-6720), and the Independent Research Fund Denmark (grant numbers: 7016-00048B and 2096-00055A).
Competing interests
SDØ received the 2020 Lundbeck Foundation Young Investigator Prize. SDØ owns/has owned units of mutual funds with stock tickers DKIGI, IAIMWC, SPIC25 KL and WEKAFKI, and owns/has owned units of exchange traded funds with stock tickers BATE, TRET, QDV5, QDVH, QDVE, SADM, IQQH, USPY, EXH2, 2B76, IS4S, OM3X and EUNL. The remaining authors report no conflicts of interest.