Early prediction of ADHD symptoms from perinatal characteristics: A machine learning study

Yee-Lam Ho; Bonnie Auyeung; Aja Murray

doi:10.1017/S0954579425100783

Published online by Cambridge University Press: 10 November 2025

Yee-Lam Ho*: Affiliation:
Department of Psychology, School of Philosophy, Psychology and Language Sciences, University of Edinburgh, Edinburgh, UK
Bonnie Auyeung: Affiliation:
Department of Psychology, School of Philosophy, Psychology and Language Sciences, University of Edinburgh, Edinburgh, UK
Aja Murray: Affiliation:
Department of Psychology, School of Philosophy, Psychology and Language Sciences, University of Edinburgh, Edinburgh, UK
*: Corresponding author: Yee-Lam Ho; Email: elimhylacademic@gmail.com

Abstract

Early identification of risk for attention-deficit hyperactivity disorder (ADHD) symptoms can enable more timely interventions and improve long-term outcomes. While previous research has linked various maternal and perinatal factors to ADHD, few studies have examined these predictors collectively in a single comprehensive analysis. This study aimed to assess whether later ADHD symptoms can be predicted from information available at birth, specifically ethnicity, maternal metabolic markers, mental health, and socioeconomic status. It additionally aimed to identify the most influential predictors. Using data from the Born in Bradford (BiB) study, we applied multiple linear regression (LR) and machine learning techniques to predict ADHD symptoms as measured by the Hyperactivity/Inattention subscale of the Strengths and Difficulties Questionnaire (SDQ). A 10-fold cross-validated LR model explained 6.97% of the variance in SDQ scores. In the random forest model, infant male sex and maternal smoking during pregnancy emerged as the top predictors. These findings provide proof of principle for early identification of children at risk of ADHD. Future models may benefit from incorporating additional perinatal data to improve predictive accuracy.

Keywords

attention-deficit hyperactivity disorder (ADHD); early prediction; machine learning

Information

Type: Regular Article
Information: Development and Psychopathology, First View, pp. 1-14

DOI: https://doi.org/10.1017/S0954579425100783
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: © The Author(s), 2025. Published by Cambridge University Press

Introduction

Attention-deficit hyperactivity disorder (ADHD) symptoms are defined by difficulties in the domains of attention and/or hyperactivity/impulsivity (American Psychiatric Association, 2013). The estimated prevalence of ADHD in children is around 5% globally (Sayal et al., 2018), though figures vary, with Thomas et al. (2015) suggesting a prevalence of 7% and Polanczyk et al. (2015) reporting a prevalence rate of 3.4%. In the UK, prevalence rates have been estimated to range from 1.4% (Russell et al., 2014) to 5% (NICE, 2008) in children. However, symptoms are dimensional in nature, impacting individuals at both clinical and sub-clinical levels (Coghill & Sonuga-Barke, 2012; Salum et al., 2014).

While ADHD has been associated with certain strengths (Sedgwick et al., 2019), it has also been linked to poorer outcomes in a range of domains, such as peer, academic, occupational, and addiction issues (Cherkasova et al., 2022; Sedgwick, 2018; Strine et al., 2006). There is growing recognition that outcomes for young people with ADHD can be improved with earlier identification and intervention provision (Arnold et al., 2020; DuPaul et al., 2020; Halperin & Marks, 2019; Shephard et al., 2022; Sonuga-Barke & Halperin, 2010), benefitting young people, their families, and healthcare systems.

Unfortunately, despite advances in this area, it is estimated that approximately 50% of children and adolescents who meet validated diagnostic criteria for ADHD remain undiagnosed (Cuffe et al., 2005; Froehlich et al., 2007; Madsen et al., 2018; Okumura et al., 2019), prompting continued efforts to improve the early identification of ADHD. Accumulating evidence suggests that ADHD symptoms may be predictable based on information available very early in life, which could facilitate earlier identification and intervention. Studies have linked ADHD to a wide range of maternal, sociodemographic, and perinatal variables about which information can be known at or shortly after birth. For example, some evidence has suggested that younger maternal age (Chang et al., 2014), lower socioeconomic status (Russell et al., 2015), membership in some ethnic groups (Bax et al., 2019; Coker et al., 2016; Zilanawala et al., 2018), and birth parity (being first-born) (Marín et al., 2014; Reimelt et al., 2021) may be linked to a greater risk of ADHD.
Maternal health and health behaviors during pregnancy, such as prenatal infection (Hall et al., 2022; Walle et al., 2022), metabolic syndrome (Kwok et al., 2023), instrumental delivery (e.g., forceps or ventouse use) (Ben Amor et al., 2005; Romero et al., 2023), pre-eclampsia (Sun et al., 2020), anemia (Wiegersma et al., 2019), stress (Ronald et al., 2011), mental health (Clements et al., 2015; Speyer et al., 2022), alcohol use and smoking (He et al., 2020; Langley et al., 2012) have also been linked to a greater risk of ADHD. Finally, birth and early infant outcomes such as prematurity, low birth weight (Franz et al., 2018; Pettersson et al., 2015), and small head circumference (Lahti et al., 2006) have additionally been linked to ADHD. Male infants are also at greater risk of developing ADHD than females (Ramtekkar et al., 2010; Willcutt, 2012).
Building a prediction model utilizing these and other candidate exploratory factors related to ADHD could help identify those who could be prioritized for monitoring and early intervention. These commonly recorded factors in healthcare datasets can provide a highly practical means to gain early information on later ADHD risk.

It is important to note that when used as a means to promote early identification, these factors need not be causal in ADHD (e.g., Sciberras et al., 2017; Thapar et al., 2013). Indeed, it is valuable to distinguish causal and predictive modeling. In fields such as ecology, healthcare, and machine learning, there is a growing discussion about the differences between “causal” and “predictive” modeling, even if these terms are not explicitly used (Arif & MacNeil, 2022; Prosperi et al., 2020; Young, 2019). Causal modeling endeavors to explain “why,” that is, the mechanism behind the relationship between the independent and outcome variables and why the latter will change with the alteration of the former (Arif & MacNeil, 2022; Prosperi et al., 2020; Young, 2019). This technique requires a thorough control and analysis of all factors related to the variables of interest (Arif & MacNeil, 2022; Prosperi et al., 2020). In contrast, even “confounders” can be considered valuable predictors in predictive modeling contexts (Young, 2019). Rather than having an explanatory focus, this approach aims to describe correlations and forecast outcomes based on known inputs. A good generalization of predictive results to new observations generally indicates robust predictions.
Predictive modeling is significant in identifying potential risk factors in health research contexts (e.g., Ng et al., 2015), while causal modeling is critical for decision-making in clinical settings, such as interventions and validating medication effects (e.g., Almeda et al., 2019; Bica, 2022).

Nearly 90% of studies on ADHD prediction modeling included in a recent review focus on diagnosis (Salazar de Pablo et al., 2024). These studies often emphasize detecting whether an individual has an ADHD diagnosis using longitudinal data or early developmental information (Salazar de Pablo et al., 2024). However, increasing evidence suggests that ADHD symptoms are dimensional (e.g., Marcus & Barry, 2011; Panagiotidi et al., 2024). Additionally, diagnostic cut-offs can vary by context, affecting the identification of preclinical risks (e.g., Harrison & Edwards, 2023; Miyasaka et al., 2018). Research predicting continuous ADHD symptom scores helps address these issues but is limited.

In one recent exception, Dooley et al. (2024) analyzed secondary data from a cohort study to investigate whether 40 pre- or perinatal factors generally known at birth, including pregnancy complications and maternal demographic information, could predict continuous ADHD symptom scores in children aged 9–10. Elastic net regression identified 17 predictors, which collectively explained 8% of ADHD symptom variance. The study found that predictive accuracy varied by income and sex, but suggested that continuous ADHD symptom prediction is possible to an extent from birth. Nevertheless, the study was limited to the US, and the regression model applied was restricted to linear relationships.

Traditional regression assumes linearity, which may not be suitable for examining prediction that involves complex interactions. This is important in the context of ADHD symptom prediction because many studies have suggested that the development of ADHD is multifactorial, involving genetic and environmental factors and their interactions. It is challenging to accurately define the complex interplay between them (e.g., Faraone & Larsson, 2019; Thapar et al., 2013). Therefore, complexity, nonlinearity, and interactive effects are likely to exist in the development of ADHD, and traditional linear regression (LR) has limitations in capturing these. To enhance a predictive model for studying ADHD, it is better to incorporate a wide range of predictors and apply a model that can automatically detect their intricate interactions without manual specification of interaction terms.

Given the potential complexity of relationships between predictive factors and ADHD, machine learning techniques could offer advantages, enhance predictive power relative to regression models, and bypass their restrictive assumptions. Tree-based methods, such as classification and regression trees (CART) and random forest (RF), do not assume additivity and can detect nonlinear relationships, the most salient interactions, and even highly diverse structures without the manual specification required in traditional LR (e.g., Banerjee et al., 2019; Uddin & Lu, 2024). Importantly, they may improve on the predictive power of regression models. Certain machine learning methods also provide high interpretability, such that a straightforward understanding of the model findings is not sacrificed (Dwyer et al., 2018). A recent study by Garcia-Argibay et al. (2023) utilized registry data based on Sweden’s population, supporting the application of machine learning techniques to large-scale data that provides early-life information. This approach can yield good predictions regarding the diagnosis of ADHD and identify particular early-life risk factors.

Aims

The aims of the current study were to use the UK-based Born in Bradford (BiB) cohort study to examine the overall “predictability” of ADHD from information typically available at birth, and to examine which predictors were the most important.

Methods

Participants

Participants are from the BiB study. BiB was established in 2007 as a longitudinal cohort study examining the multiple factors that impact pregnant individuals’ physical and mental well-being and their children. It is based in Bradford, a city in northern England with an ethnically and socioeconomically diverse population. Approximately half of the mothers in the region are of non-UK origin, primarily South Asian. The cohort study has been found to be approximately representative of the maternal population in Bradford.

The BiB project linked the pregnant individual’s records, obtained during their recruitment while receiving routine procedures at the Bradford Royal Infirmary, with their children’s educational and developmental outcomes through subsequent research. Hence, researchers can use the data to study the relationship between early factors and children’s developmental outcomes (Raynor, 2008; Wright et al., 2013). The current study uses ADHD symptom data from children recruited in the “Starting School” study, who were originally from the BiB project cohorts and thus have linked perinatal and ADHD symptom data (Pettinger et al., 2020; Shire et al., 2020).

From 2007 to 2010, 12,453 women and 13,776 children were involved in the complete BiB cohort study. For the current analyses, 2063 cases with complete outcome variable data were derived. For pregnant individuals with multiple pregnancies, we utilized only the first child.

Mothers

Between March 2007 and November 2010, 12,453 pregnant individuals were recruited from the Bradford Royal Infirmary between 26 and 28 weeks of gestation while receiving routine care. Baseline measures were obtained through interviews and linked to their and their children’s primary and secondary care records. Information on biological, social, economic, educational and general health was collected. In addition to data obtained through interviews, further research was conducted to extract records from maternal paper notes, providing details on antenatal care, delivery notes, and the biological characteristics of newborns, such as gestational age, maternal blood pressure, delivery complications, and infant birth weight (Wright et al., 2013).

Children

A subset of BiB children aged 4–5 took part in the “Starting School” study, which included 94 out of 142 primary schools in Bradford during two consecutive academic years from 2012 to 2014. Overall, 3,444 BiB cohort children participated in “Starting School.” “Starting School” aims to predict children’s physical, mental, and educational development by examining their physical motor, cognitive language, and socio-emotional development via various in-school assessments. Assessments include the Strengths and Difficulties Questionnaire (SDQ). It was completed once by teachers during each child’s Reception year (the first year of primary school), when children were between 4 and 5 years old. Each child was assessed at a single time point within this age range (Pettinger et al., 2020; Shire et al., 2020). The 2063 children with completed data on the Hyperactivity/Inattention (H/I) subscale of the SDQ formed the initial analytic sample, and their H/I scores served as the outcome variable in our study. This sample was created by linking the pregnant individual’s data with their first-born child’s biological characteristics at birth, based on completed H/I SDQ subscale data.

Of the initial analytic sample, 51% of child participants were female, and 48.7% were male (sample size = 2042; missing rate = 1.0%). The mean age of the pregnant individuals was 27.3 years (SD = 5.67) (sample size = 1560; missing rate = 24.4%). For the pregnant individuals’ ethnicity (sample size = 1558; missing rate = 24.5%), 51.9% were Pakistani, 36.3% were White British, and 11.8% were from other ethnic groups. Missingness occurred due to incomplete data across a list of predictors. The overall missing rate was around 30%, and missingness ranged from 1% to 84% across predictors; the outcome variable had zero missingness. Missingness was highest for maternal smoking (84%), alcohol exposure (78%), and children’s cord blood biomarkers other than leptin (70.4%). Full descriptive statistics, including the proportions of missing data for all variables included in the analyses, are provided in Tables 1 and 2.

Table 1. Descriptive statistics of continuous variables

Note. The missing rate for each variable is based on the initial analytical sample of 2063. Details of how the sample of 2063 was derived are described in the main text.

Table 2. Descriptive statistics of categorical variables

Note. The distribution of each level of the categorical variable is provided. The missing rate for each variable is based on the initial analytical sample of 2063. Details of how the sample of 2063 was derived are described in the main text.

Measures

Predictor variables

A set of predictors was prioritized, guided by prior literature both theoretically and empirically linking these factors to an increased risk of ADHD and other neurodevelopmental difficulties (e.g., Chang et al., 2014; He et al., 2020; Speyer et al., 2022). Use in previous predictive modeling and machine learning studies (e.g., Dooley et al., 2024; Garcia-Argibay et al., 2023) was an additional selection criterion. Availability from hospital routine data, birth records, and sub-studies in the BiB study was a constraint on predictor inclusion.

Our predictor selection was also guided by the aim of developing a predictive model using early-life risk factors that are commonly observable in routine perinatal data. For example, although male sex and maternal smoking differ in clinical modifiability, they are significant as well as prevalent predictors relevant to ADHD symptom development (e.g., Lawder et al., 2019; Pietersma et al., 2022; Willcutt, 2012). Additionally, biological and psychosocial variables, including socioeconomic status and maternal and infant health indicators, can reflect the multidimensional influences on ADHD and have proven valuable predictors in the machine learning study by Garcia-Argibay et al. (2023).

Based on the above considerations, predictor variables were the pregnant individual’s age, ethnicity, country of birth, marital status, cohabitation status, educational level, socioeconomic position, Index of Multiple Deprivation (IMD), exposure to alcohol and smoking, mental health well-being (General Health Questionnaire; GHQ), metabolic markers and syndrome (pregnant individual’s BMI, HDL, triglycerides, systolic and diastolic blood pressure and fasting glucose levels, existing diabetes and hypertension), maternal infection and conditions related to adverse pregnancy outcomes. Assistance required during birth (obstetric intervention at birth) was also considered a maternal predictor. Predictors related to the infant were their sex, cord blood biomarkers, birth weight, gestational age at birth and abdominal and head circumference. Notably, some categorical variables, typically IMD, were recoded into three levels due to the observation of zero variance in some levels of the original five-level structure. Definitions and coding methods of specific predictors are available in the Supplementary Materials Tables S1 to S3 (pp. 1–7).

Outcome variable

ADHD was measured using the Hyperactivity/Inattention (H/I) subscale of the Strengths and Difficulties Questionnaire (SDQ) (descriptive information for the H/I SDQ score is available in Table 1). Scores ranged from 0 to 10, with an average of 2.67 (SD = 2.81). The H/I subscale in SDQ is widely used internationally in a range of contexts, including epidemiological and clinical studies, for assessing children’s (3–16 years old) ADHD symptoms (e.g., Brandt et al., 2021; Carballo et al., 2018). In BiB, the questionnaires were completed by teachers, who were required to have known the child for at least half a year (Shire et al., 2020). The H/I subscale contains five items: “restless, overactive, cannot stay still for long,” “constantly fidgeting or squirming,” “easily distracted, concentration wanders,” “thinks things out before acting,” “sees tasks through to the end, good attention span.” Each item was rated on a three-point Likert scale (0: Not True, 1: Somewhat True, 2: Certainly True; some items are reverse-scored). The H/I scale has shown good predictive validity, test–retest reliability and internal consistency in previous research (e.g., Algorta et al., 2016; Almeda et al., 2019; Brandt et al., 2021; Carballo et al., 2018; Hall et al., 2019). However, there is debate over the cut-off score for ADHD.
For example, based on a study in Spain, the suggested score is 8 (Carballo et al., 2018), while in the UK, the suggested score has varied. A lower score of ≥ 4 or ≥ 5 is suggested for youth or younger adults, and a score of ≥ 7 or ≥ 8 for children (Bryant et al., 2020; Riglin et al., 2021; Ullebø et al., 2011). Recommendations are also provided by the test developers: http://www.sdqinfo.org/. Given ongoing uncertainty regarding optimal cut points and the fact that ADHD symptoms are dimensional, we did not introduce a cut-off score and instead analyzed the scores on a continuous scale.
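As an illustration of how a 0 to 10 H/I total arises from five items rated 0 to 2, the following Python sketch sums the ratings after reversing the two positively worded items. It is not the scoring code used in the study (which was conducted in R). The item keys are hypothetical labels, and the choice of which items are reversed is an assumption based on their wording.

```python
# Hypothetical short keys for the five H/I items listed in the text.
ITEMS = (
    "restless",
    "fidgeting",
    "distracted",
    "thinks_before_acting",
    "sees_tasks_through",
)

# Assumption: the two positively worded items are the reverse-scored ones.
REVERSED = {"thinks_before_acting", "sees_tasks_through"}


def hi_score(ratings):
    """Return the 0-10 H/I total from a dict of item ratings (0, 1 or 2)."""
    total = 0
    for item in ITEMS:
        rating = ratings[item]
        if rating not in (0, 1, 2):
            raise ValueError(f"rating for {item!r} must be 0, 1 or 2")
        # Reverse-scored items contribute 2 - rating, so that a higher
        # total always indicates more hyperactivity/inattention.
        total += (2 - rating) if item in REVERSED else rating
    return total


example = {
    "restless": 2,
    "fidgeting": 1,
    "distracted": 2,
    "thinks_before_acting": 1,
    "sees_tasks_through": 0,
}
print(hi_score(example))
```

Keeping the total on its full 0 to 10 range, rather than thresholding it, matches the continuous treatment of the outcome described above.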

Table 3. Multiple LR results

Note: Reference categories were as follows – ethnicity: other, for example, Asian; marital status: unmarried; cohabitation status: not living with a partner; education level: equal or higher than A’ levels; socioeconomic status: higher dependency or financially difficulty; metabolic syndrome: no; assistance required during birth: no; infant sex: male; maternal smoking: no; maternal alcohol: no; maternal infection: no; conditions associated with adverse pregnancy outcomes: no.

The H/I SDQ total score used in the main analyses was fully complete in the analytic sample of 2063, and no imputation was required. Internal consistency of the H/I SDQ score was assessed using item-level data from the same analytic sample (using the “psych” R package; Revelle, 2023). Some H/I SDQ items had missing data; multiple imputation was thus conducted to ensure complete data availability for assessing internal consistency. Cronbach’s alpha was .89, indicating excellent internal reliability.
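For readers unfamiliar with the statistic, Cronbach’s alpha for k items is k/(k − 1) × (1 − sum of item variances / variance of the total score). The Python sketch below implements that standard formula on complete item-level data. It is an illustration only: the study computed alpha with the “psych” R package, and the toy ratings here are invented.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete item-level data.

    rows: list of respondents, each a list of k item scores
    (already reverse-scored where needed).
    """
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Invented toy data: four respondents, five items rated 0-2.
toy = [
    [0, 0, 1, 0, 0],
    [1, 1, 1, 0, 1],
    [2, 1, 2, 2, 1],
    [2, 2, 2, 1, 2],
]
print(round(cronbach_alpha(toy), 3))
```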

Analysis

Statistical analyses included correlation, unadjusted LR models, multiple LR models, CART, and RF models.

Correlation and unadjusted regressions were used for descriptive purposes to show the “raw” associations between each predictor and the outcome. Correlation analysis also allowed us to identify potentially problematic levels of multicollinearity. For predictive analyses, LR was included because of its interpretability. Further, because it is widely used for prediction, it provides a useful baseline against which machine learning methods can be compared. Given the advantages discussed earlier, machine learning methods were also employed. Both CART and RF were used because of their complementary strengths and weaknesses. CART provides higher interpretability because it involves fitting only a single tree; however, RF is an ensemble method and has the associated advantages of fitting and aggregating multiple trees.

Multiple imputation using chained equations (MICE) was used to deal with missing data, with a single imputed dataset analyzed due to the complexities (and computational intensity) of combining multiple imputation with RF. Missingness diagnosis and the application of MICE were performed in accordance with Newman’s (2014) guidelines. Little’s MCAR (Missing Completely At Random) test was conducted using the “misty” R package (Yanagida, 2024), which indicated that the data were not MCAR (χ² = 13,793.39, df = 10,431, p < .001). The use of multiple imputation is based on an assumption of “missing at random,” meaning that the missingness can be predicted based on modeled data. Given that we had relatively comprehensive baseline data and in the absence of any strong reason to assume that the data were subject to a missing not at random (MNAR) mechanism, we judged this assumption to be reasonable. The data distribution before and after imputations is provided in Supplementary Materials Tables S4 to S5 (pp. 8–9).

All continuous predictor variables, including the pregnant individual’s age, GHQ total score (mental health), the infant’s gestational age at birth, cord blood biomarkers, and other information, such as birth weight recorded after birth, were standardized by z-standardization using the scale() function in R prior to initial analysis (i.e., they were rescaled to have Mean = 0, SD = 1).
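The rescaling described above can be sketched in a few lines of Python. This is a stand-in for R’s scale(), not the study’s code. It centers on the mean and divides by the sample standard deviation (n − 1 denominator). Passing missing values through as None is an assumption made for this illustration, and the example ages are invented.

```python
from statistics import mean, stdev


def z_standardize(values):
    """Rescale values to mean 0 and SD 1, leaving None (missing) untouched."""
    observed = [v for v in values if v is not None]
    centre = mean(observed)
    spread = stdev(observed)  # sample SD, n - 1 denominator
    return [None if v is None else (v - centre) / spread for v in values]


# Invented maternal ages, one of them missing.
ages = [22, 27, None, 32]
print(z_standardize(ages))
```

After this step a regression coefficient for a continuous predictor can be read as the expected change in the outcome per one standard deviation of that predictor.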

Initial analysis: correlation and unadjusted regression model

Before analyzing the various predictors’ predictive capabilities, initial analyses including correlation and unadjusted LR models were conducted to evaluate the basic relationships between the variables and to inform the selection of predictors for later analysis by identifying highly collinear predictors that may present issues later.

The “hetcor” function in the “polycor” R package was used to calculate a mixed (Pearson, polyserial, and polychoric) correlation matrix for 136 pairs of variables, excluding nominal variables. An unadjusted LR model was run for each of the 25 predictor variables using the lm() function in R, which handles missing values by conducting a complete-case analysis.
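To make the complete-case behavior concrete, the sketch below fits an unadjusted (single-predictor) least-squares line after dropping any case with a missing predictor or outcome. It is a minimal Python illustration with invented data, not a reimplementation of R’s lm(), and it returns only the intercept, slope, and number of cases used.

```python
def unadjusted_lr(x, y):
    """Least-squares fit of y on one predictor x using complete cases only.

    Missing values are represented as None. Returns (intercept, slope, n_used).
    """
    # Complete-case analysis: keep pairs where both values are observed.
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope, n


# Invented data: the third case is dropped because the predictor is missing.
predictor = [0, 1, None, 2, 3]
outcome = [1, 3, 100, 5, 7]
print(unadjusted_lr(predictor, outcome))
```

Because each unadjusted model drops only the cases missing on its own predictor, the number of cases used can differ from one predictor to the next.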

Predictive analyses

The study employed three types of predictive analysis using a single imputed dataset: multiple LR, CART, and RF.

There are a few advantages in applying multiple LR in our study. First, it is well recognized as an interpretable method in explanatory and predictive research. Second, the LR model can still be considered pragmatic even when the residuals are not normally distributed in large samples. Third, it can be used to handle various types of variables (Schmidt & Finan, 2018; Yang et al., 2019). Nevertheless, linearity assumed by LR means that the effect of a predictor on the outcome remains constant and additive without being modified by other factors unless the interaction relationships are specified explicitly (McClelland & Judd, 1993; West et al., 1991). Assuming linearity and equal variance in a model may limit its predictive power in complex real-world contexts (Ernst & Albers, 2017). Hence, it is beneficial to have multiple LR serve as a baseline for demonstrating the direction and strength of the associations, as well as for comparison with ML models.

The ML approaches, namely CART and RF, were applied to complement multiple LR and capture more complex patterns. They are flexible models that can accommodate potential nonlinear relationships and real-world, mixed-type data (including numeric, ordinal, and nominal variables) while examining the importance of specific predictors. For a more detailed and technical account of ML and the tree-based methods, see Banerjee et al. (2019) and Uddin and Lu (2024).

CART takes the form of a tree with “branches” representing different paths. Each branch represents a decision based on features that split the data into distinct subsets. These decisions can be based on either categorical (e.g., male or female) or continuous variables (e.g., older or younger than 20). The tree continues to create splits that maximize the similarity of cases within each split until reaching the endpoints, known as “leaves,” where final predictions are made based on the path followed by the data (Breiman et al., 1984). CART identifies the optimal breakpoint in a continuous variable by examining all possible values of a predictor and selecting the one that best separates the outcome variable into groups with more similar values. In essence, it minimizes differences within groups by reducing the average error, which is measured as the mean squared error (MSE). A primary advantage of CART is that the structure of the predictive relationships can be easily visualized; however, it is prone to overfitting (Breiman et al., 1984).
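The breakpoint search described above can be illustrated with a minimal sketch. It is not the authors' implementation (the study used an R pipeline); the function names and the toy data are hypothetical, and only a single split on one continuous predictor is shown.

```python
def mse(values):
    """Mean squared error of values around their own mean (0.0 if empty)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split(x, y):
    """Return (threshold, weighted_mse) for the best binary split of y on x."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best_threshold, best_score = None, float("inf")
    for i in range(1, n):
        left_x, right_x = pairs[i - 1][0], pairs[i][0]
        if left_x == right_x:
            continue  # no breakpoint exists between identical values
        threshold = (left_x + right_x) / 2
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        # Size-weighted average of the within-group MSEs.
        score = (len(left) * mse(left) + len(right) * mse(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Hypothetical toy data: a continuous predictor (e.g., age in years)
# and a continuous outcome score.
ages = [18, 19, 20, 25, 30, 35]
scores = [6.0, 5.0, 6.0, 3.0, 2.0, 3.0]
threshold, score = best_split(ages, scores)
```

On these toy values the search places the breakpoint midway between 20 and 25, because that split leaves the two groups with the most similar outcome values. A full tree repeats the same search within each resulting subset.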

RF uses an ensemble of trees to improve upon CART and overcome its limitations. These trees are generated from a bootstrapped dataset, and their splits are based on a random subset of features. Hence, each tree in the RF model generates a different prediction, and these predictions are aggregated through averaging (for regression; continuous variable) or majority voting (for classification; binary variable) to produce a final prediction (Breiman, 2001; Cutler et al., 2012). This approach results in a more stable and accurate estimation than a single tree in CART. Compared to LR, it relaxes the assumptions of linearity and equal variance and can yield more accurate prediction (Ali et al., 2012; Marchese Robinson et al., 2017; Prajwala, 2015; Schonlau & Zou, 2020). It is, however, difficult to visualize the structure of the predictive relationships from an RF model, because it is an ensemble of many individual classification and regression trees.
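As a rough illustration of the bootstrap-and-average idea for a continuous outcome, the sketch below fits many one-split trees ("stumps"), each on a bootstrap sample and a single randomly chosen feature, and averages their predictions. This is a deliberate simplification under stated assumptions: real RF implementations grow deeper trees and sample a feature subset at every split, and all names and toy data here are hypothetical.

```python
import random


def fit_stump(rows, targets, feature):
    """Fit a one-split regression tree on a single feature."""
    best = None
    values = sorted(set(row[feature] for row in rows))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [t for row, t in zip(rows, targets) if row[feature] <= threshold]
        right = [t for row, t in zip(rows, targets) if row[feature] > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # The feature is constant in this bootstrap sample: predict the mean.
        mean = sum(targets) / len(targets)
        return (feature, float("inf"), mean, mean)
    return (feature, best[1], best[2], best[3])


def fit_forest(rows, targets, n_trees=50, seed=0):
    """Fit n_trees stumps, each on a bootstrap sample and one random feature."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feature = rng.randrange(n_features)         # random feature subset of size 1
        forest.append(fit_stump([rows[i] for i in idx],
                                [targets[i] for i in idx], feature))
    return forest


def predict(forest, row):
    """Average the stump predictions (the regression form of aggregation)."""
    preds = [left if row[f] <= thr else right for f, thr, left, right in forest]
    return sum(preds) / len(preds)


# Hypothetical toy data: two binary features and a continuous outcome.
rows = [[0, 0], [0, 0], [0, 1], [0, 1], [1, 0], [1, 0], [1, 1], [1, 1]]
targets = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
forest = fit_forest(rows, targets, n_trees=50, seed=0)
prediction = predict(forest, [1, 1])
```

Because every stump predicts a mean of observed outcomes, the averaged prediction always lies within the observed outcome range, which is one reason the ensemble is more stable than any single tree.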

To facilitate a more direct comparison, LR, CART, and RF were all implemented using a common pipeline: the mikropml pipeline (Topçuoğlu et al., 2021), using the package of the same name in R. The dataset was pre-processed by scaling the continuous variables and creating dummy variables for the categorical variables (Topçuoğlu et al., 2021). The pipeline split the dataset into training and testing sets with a typical proportion of 70:30. Additionally, 10-fold cross-validation with 100 partitions was conducted on the three models. The best tuning parameters for each model (α and λ for LR; maxdepth for CART and mtry for RF) were automatically selected based on the performance statistics calculated by the pipeline (see Table 4).
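The pre-processing and resampling steps described above can be sketched generically as follows. This is not the mikropml code and does not reproduce its partitions: the helper names, the z-score form of scaling, and the choice of reference level are assumptions made for illustration, and only one 10-fold partition is shown rather than repeated partitions.

```python
import random


def train_test_split(n_rows, train_frac=0.7, seed=0):
    """Shuffle row indices and split them into training and test sets."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    cut = round(train_frac * n_rows)
    return indices[:cut], indices[cut:]


def k_folds(indices, k=10):
    """Return k (fit, validation) index pairs that partition `indices`."""
    folds = [indices[i::k] for i in range(k)]
    pairs = []
    for i, held_out in enumerate(folds):
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        pairs.append((fit, held_out))
    return pairs


def standardize(values):
    """Centre to mean 0 and scale to unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in values]


def dummy_code(labels, reference):
    """Create one 0/1 column per non-reference level of a categorical variable."""
    levels = sorted(set(labels) - {reference})
    return {level: [1 if lab == level else 0 for lab in labels] for level in levels}


train_idx, test_idx = train_test_split(2063)  # sample size reported in the Results
cv_pairs = k_folds(train_idx, k=10)           # a single 10-fold partition
scaled = standardize([1.0, 2.0, 3.0])
dummies = dummy_code(["male", "female", "male"], reference="male")
```

Tuning parameters would then be chosen by fitting each candidate on the `fit` rows of every pair, scoring it on the held-out rows, and keeping the candidate with the best average score; the test rows stay untouched until the final evaluation.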

Table 4. Accuracy metrics of the multiple LR, CART, and RF models

Note. CV = Cross-validation; Train = Training dataset; Test = Test dataset; RMSE = Root Mean Squared Error; MAE = Mean Absolute Error.
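For readers less familiar with these accuracy metrics, the sketch below computes RMSE, MAE, and R-squared on hypothetical values. The R-squared shown is one minus the ratio of residual to total sum of squares; this definition is an assumption for illustration, since some pipelines instead report the squared correlation between observed and predicted values.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error: penalizes large errors more heavily."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error: the average size of the prediction error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Proportion of outcome variance explained by the predictions."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical observed and predicted scores, not study data.
observed = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 6.0]
metrics = {
    "RMSE": rmse(observed, predicted),
    "MAE": mae(observed, predicted),
    "R2": r_squared(observed, predicted),
}
```

RMSE and MAE are on the scale of the outcome, so lower values indicate better prediction, whereas R-squared is unitless and higher values are better.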

Results

Initial analyses

Imputation allowed the use of a sample size of 2063 for these analyses. Correlation and univariate regression analyses indicated weak correlations between most predictors and H/I SDQ scores (r < .24). An unadjusted LR using complete-case data explained minimal variance (R² = .50), and no individual predictors exhibited statistically significant effects. However, strong intercorrelations were noted between certain predictors, such as head circumference and birth weight, as well as between maternal education and socioeconomic status (r > .60). Additional multicollinearity checks using variance inflation factors (VIF) from a multiple LR revealed high collinearity (VIF > 5) between some predictors, which may bias parameter estimates. As a result, two predictors, country of birth and the IMD, were excluded due to redundancy. After these exclusions, the final model included 23 of the 25 predictors originally considered, all of which had acceptable VIF values (VIF < 5). The results of the correlation and univariate regression analyses are provided in the Supplementary Materials (see Tables S6 and S7, pp. 12–14). Diagnostic information for the multiple LR using complete-case data is provided in Figures S1 to S2 (pp. 15–16), while that using imputed data is provided in Figures S3 to S5 (pp. 17–27).
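To make the VIF threshold concrete, the sketch below uses the standard result that the VIF of each predictor equals the corresponding diagonal element of the inverse of the predictors' correlation matrix. The correlation values are hypothetical, the helper names are illustrative, and the study itself obtained VIFs from a multiple LR in R.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_from_correlation(corr):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inverse = invert(corr)
    return [inverse[i][i] for i in range(len(corr))]


# Two predictors correlated at r = .60 (hypothetical): VIF = 1 / (1 - .36).
moderate = vif_from_correlation([[1.0, 0.6], [0.6, 1.0]])
# Two predictors correlated at r = .90 (hypothetical): VIF = 1 / (1 - .81).
severe = vif_from_correlation([[1.0, 0.9], [0.9, 1.0]])
```

With only two predictors, a correlation of .60 yields a VIF of about 1.56, well under 5, whereas .90 yields about 5.26. A VIF above 5 in a larger model therefore usually reflects a predictor that is well explained by several others jointly, not just by one pairwise correlation.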

Predictive models

The multiple LR model (F(25, 2037) = 9.26, p < .001) explained 10% of the variance in the H/I SDQ score (R² = .10, adj R² = .09). Significant associations with offspring’s H/I SDQ score were found for White British ethnicity (B = .28, 95% CI [.12, .43], p = .001; reference = “other ethnicity,” e.g., Asian), Pakistani ethnicity (B = .17, 95% CI [.02, .31], p = .022), being married (B = .15, 95% CI [<.001, .29], p = .045; reference = “unmarried”), infant’s cord blood triglycerides level (B = .10, 95% CI [.05, .15], p < .001), female infant (B = −.53, 95% CI [−.62, −.44], p < .001; reference = “male”), infant’s head circumference (B = −.08, 95% CI [−.15, −.02], p < .012), and maternal smoking (B = .23, 95% CI [.14, .33], p < .001; reference = “non-smoker”) (Table 3).
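The relationship between the reported R², adjusted R², and F statistic can be checked with the standard formulas, as in the sketch below. The inputs are the rounded values reported above, so the recomputed F (about 9.05) is only approximate; an unrounded R² slightly above .10 would be consistent with the reported 9.26. The function names are illustrative.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p model terms."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def f_statistic(r2, n, p):
    """Overall F statistic implied by R-squared, with (p, n - p - 1) df."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))


model_df, residual_df = 25, 2037   # degrees of freedom reported for the F test
n = residual_df + model_df + 1     # recovers the analysis sample size
adj = adjusted_r2(0.10, n, model_df)
f_value = f_statistic(0.10, n, model_df)
```

The recovered sample size matches the 2063 reported for the imputed analyses, and the adjusted value rounds to the reported .09.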

Table 4 provides the model adequacy metrics for the multiple LR, CART, and RF models. These metrics suggest that the three models performed similarly, but the LR model achieved the best prediction, with a slightly lower RMSE and the highest R-squared. Using 23 predictors, the multiple LR with 10-fold cross-validation explained 6.97% of the variation in the offspring’s H/I SDQ score in the test dataset. This was higher than CART (5.26%) and RF (5.81%). Although the multiple LR model had the lowest cross-validated RMSE (2.67), this differed only slightly from the 2.70 obtained from the CART and RF models. These findings suggest that information available at birth can help predict later ADHD symptoms; however, the variation explained was modest.

The feature importance plot from the optimal model (the multiple LR) showed that infant male sex and maternal smoking were the two most important predictors, followed by ethnicity = “Others”, infant’s cord blood triglycerides, head circumference, and other marital status (Figure 1). The pattern of results, as well as the feature importance statistics (for the multiple LR, provided in Table S8; p. 28), was similar for CART and RF (Figures 2, 3), which also identified male sex and maternal smoking as the most important predictors. Notably, the LR and RF models produce importance scores for all predictors, whereas the CART model assigns nonzero importance only to those used in the final tree splits. As such, the CART feature importance plot highlights only the most discriminative predictors, namely, male sex and maternal smoking, and typically produces a more concise set of variables compared to regression- or ensemble-based approaches.

Figure 1. Bar plot of the feature importance for the multiple linear regression (LR) model.

Figure 2. Bar plot of the feature importance for the classification and regression trees (CART) model.

Figure 3. Bar plot of the feature importance for the random forest (RF) model.

Discussion

Our study found that around 7% of the variation in ADHD symptoms, measured by children’s H/I SDQ subscale score at the age of five, could be predicted from perinatal and sociodemographic predictors that are typically easy to gather around the time of birth. Male sex, maternal smoking, and infant’s cord blood leptin emerged as the most influential predictors. The results suggest that prediction models based on data available around the time of birth could be used to help identify those at risk of later ADHD symptoms.

The modest variance explained is consistent with the complex etiology of ADHD, which involves multiple genetic and environmental factors as well as complex gene–environment interplay (Balogh et al., 2022; Leffa et al., 2023). Numerous studies suggest that ADHD is a highly heritable but complex condition influenced by multiple factors (Faraone & Larsson, 2019; Gizer et al., 2009; Thapar et al., 2013). Although genetics play a crucial role in its development, with approximately 74% heritability (Faraone & Larsson, 2019), common individual DNA risk variants each contribute only a tiny effect (Luo et al., 2019). Findings from twin studies have highlighted that even combining the impact of these DNA risk variants only explains around 22% of heritability. Additionally, polygenic risk scores (PRS), which estimate and combine the effects of thousands of genetic variants, predict only 5.5% of the variance in ADHD symptoms (Faraone & Larsson, 2019), similar to the ~7% of variance explained here by factors feasible to collect around birth.

A similar amount of explained variance, around 7 to 8%, was found in a similar study by Dooley et al. (2024). Unlike our focus on ML with 23 predictors related to perinatal and sociodemographic characteristics, they used elastic net regression and included 40 detailed pre- and perinatal variables, such as maternal substance use, obstetric complications, child demographics, maternal drug use and vitamin intake. Furthermore, Dooley et al. measured child ADHD with the parent-reported Child Behaviour Checklist (CBCL) score at ages 9–10, while our study analyzed the teacher-rated H/I subscale from the SDQ when children were 4 to 5 years old. Despite differences in the choice and definition of predictors, the similar variance explained arguably highlights the inherent difficulty in predicting ADHD symptoms from perinatal factors alone.

In terms of the predictors that emerged as highest in feature importance, we found similar results to Dooley et al. Their study identified infant male sex and the pregnant individual’s smoking during pregnancy as two of the three most significant predictors of ADHD among the 40 variables; these predictors also emerged with the highest feature importance in the current analysis. Garcia-Argibay et al. (2023), who applied ML models to registry-based data in Sweden, also found male sex to be one of the five top predictors (the others were: criminal convictions of parents, history of ADHD in the family, communication and learning difficulties, and academic performance of the child). However, results varied in their sex-stratified models. The next most important features in the current analysis (based on the RF model selected as the optimal model) were the infant’s cord blood leptin, the pregnant individual’s age, the infant’s cord blood triglycerides, head circumference, and the infant’s cord blood adiponectin. These predictors are rarely investigated in prior ADHD prediction studies and represent a novel contribution of the present study. Taken together, the results suggest that these eight variables might be prioritized in future studies aiming to develop predictive models.

The superiority of the LR model over CART and RF in the present study is also consistent with some previous findings. Garcia-Argibay et al. (2023) analyzed the predictive performance of a range of ML models and found that the RF model, which yielded an area under the curve (AUC) of .68 and showed signs of overfitting, had poorer predictive accuracy than the logistic regression model (AUC = .74). This suggests that complex algorithms, particularly nonlinear models, may not outperform traditional regression models when analyzing perinatal predictors, which may indicate a lack of nonlinearity and/or complex interactions.

The use of predictive modeling in ADHD research has been a burgeoning area in recent years; however, the use cases have mostly concerned the classification of ADHD presence, rather than early prediction. A systematic review by Salazar de Pablo et al. (2024) highlighted that nearly 90% of predictive modeling studies in ADHD had high predictive accuracy, as defined by an AUC ranging from 0.50 to 0.97. Nevertheless, these studies focused on the binary classification of ADHD presence. Further development of the literature on early ADHD symptom predictors (especially treating symptoms as continuous, reflecting contemporary understandings of ADHD symptoms) holds potential for leveraging the advances in predictive modeling for clinically meaningful applications. Future investigations could also examine different subdimensions of ADHD symptoms, given that previous research suggests that inattention and hyperactivity/impulsivity may show different developmental trajectories and outcomes, and that profiles of symptoms may differ by gender (e.g., Stibbe et al., 2020; Vergunst et al., 2019). They could also address the prediction of different “developmental subtypes.” For example, Murray et al. (2022) suggest that distinct early-life risks relate to different ADHD trajectories (e.g., earlier vs. later onset and remitting vs. persistent). It will also be important to assess how predictive accuracy changes during development, given that those with later onsets of ADHD symptoms may not have been captured in the present sample. Similarly, given that additional influences on ADHD come into play at different stages of development, it would be valuable to compare models at different developmental stages that include measures of emerging influences.

Limitations and future directions

A main limitation of the present study concerns the scope of predictors included. It lacks measures of predictors previously associated with children’s developmental outcomes that may improve prediction, including drug and medication usage (Dooley et al., 2024), vitamin D deficiency (Tahir et al., 2023), postnatal factors such as parenting styles (Hutchison et al., 2016) and adverse childhood experiences (Brown et al., 2017), and the contribution of genetic factors quantified by the PRS (Green et al., 2022; Ronald et al., 2021). A key predictor that would also be likely to improve prediction is parental ADHD symptoms. These omissions were due to data availability in the cohort study and the complex multifactorial nature of ADHD. It is challenging to comprehensively measure all of these without imposing a significant burden on the participants. Importantly, this also explains why it is difficult to determine what to measure and include in predictive models, as the etiology of ADHD is still not fully understood. The measurement of predictor variables and ADHD symptoms could also be enhanced, since measurement errors might exist.

These considerations imply two potential future directions: gathering and analyzing more comprehensive data available around birth, and building dynamic prediction models that are updated as more information becomes available over the course of the child’s development. However, this must be balanced against feasibility, and any data collection used to predict later ADHD in clinical practice needs to minimize clinician and patient time and burden.

A second limitation concerns the high proportion of missingness across several perinatal predictors, particularly in maternal smoking, alcohol use, and cord blood biomarkers. This reflects real-world challenges in birth cohort data collection. The missingness was addressed through multiple imputation under the missing at random (MAR) assumption adopted by MICE models. If, for example, parents declined their children’s participation in the “Starting School” project due to variables that may be related to ADHD symptoms, then this missing data could be considered non-random. Applying MICE is appropriate after diagnostic tests for missingness, following Newman (2014) for assessing item- and construct-level missing data. In line with recommendations from general psychological research (e.g., Enders, 2022), the observed missingness patterns supported the MAR assumption, justifying the use of MICE. Potential biases due to missingness could be mitigated using datasets with high population coverage; however, this is likely to involve a trade-off with the depth of information available for each participant.

Conclusions

LR models provided the best prediction and explained a modest amount (6.97%) of the variance in the H/I SDQ score from 23 maternal and neonatal factors, providing proof of principle for predicting ADHD from non-genetic information available at birth. Maternal smoking and the infant’s male sex were the top two predictors in the RF models. Future research could build on the present study to develop improved prediction models by including additional variables omitted here.

Supplementary material

The supplementary material for this article can be found at https://doi.org/10.1017/S0954579425100783.

Data availability statement

Due to data access restrictions associated with the Born in Bradford study, the data and materials used in the study are not publicly available. All analyses were conducted using publicly available R packages, as referenced in the manuscript. No custom code was developed for this project. However, statistical code and analysis scripts can be made available upon reasonable request.

Pre-registration statement

The analyses presented in this manuscript were not preregistered.

Funding statement

Bonnie Auyeung was supported by the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No. 813546, the Baily Thomas Charitable Fund TRUST/VC/AC/SG/469207686, the Data Driven Innovation, and the UK Economic and Social Research Council (ES/W001519/1) during the course of this work.

Competing interests

The author(s) declare none.

Ethical standards

Ethical approval for the current study was obtained from the School of Philosophy, Psychology and Language Sciences Ethics Committee at the University of Edinburgh.

References

Algorta, G. P., Dodd, A. L., Stringaris, A., & Youngstrom, E. A. (2016). Diagnostic efficiency of the SDQ for parents to identify ADHD in the UK: A ROC analysis. European Child & Adolescent Psychiatry, 25(9), 949–957. https://doi.org/10.1007/s00787-015-0815-0 CrossRef Google Scholar

Ali, J., Khan, R., Ahmad, N., & Maqsood, I. (2012). Random forests and decision trees. International Journal of Computer Science Issues (IJCSI), 9(5), 272–278.Google Scholar

Almeda, N., García-Alonso, C. R., Salinas-Pérez, J. A., Gutiérrez-Colosía, M. R., & Salvador-Carulla, L. (2019). Causal modelling for supporting planning and management of mental health services and systems: A systematic review. International Journal of Environmental Research and Public Health, 16(3), 332. https://doi.org/10.3390/ijerph16030332 CrossRef Google Scholar

American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders. https://doi.org/10.1176/appi.books.9780890425596.CrossRef Google Scholar

Arif, S., & MacNeil, A. (2022). Predictive models aren’t for causal inference. Ecology Letters, 25, 1741–1745. https://doi.org/10.1111/ele.14033 CrossRef Google Scholar

Arnold, L. E., Hodgkins, P., Kahle, J., Madhoo, M., & Kewley, G. (2020). Long-term outcomes of ADHD: Academic achievement and performance. Journal of Attention Disorders, 24(1), 73–85. https://doi.org/10.1177/1087054714566076 CrossRef Google Scholar PubMed

Balogh, L., Pulay, A. J., & andRéthelyi, J. M. (2022). Genetics in the ADHD clinic: How can genetic testing support the current clinical practice? Frontiers in Psychology, 13, 751041. https://doi.org/10.3389/fpsyg.2022.751041 CrossRef Google Scholar

Banerjee, M., Reynolds, E., Andersson, H. B., & Nallamothu, B. K. (2019). Tree-based analysis: A practical approach to create clinical decision-making tools. Circulation: Cardiovascular Quality and Outcomes, 12(5). https://doi.org/10.1161/CIRCOUTCOMES.118.004879 Google Scholar

Bax, A. C., Bard, D. E., Cuffe, S. P., McKeown, R. E., & Wolraich, M. L. (2019). The association between race/Ethnicity and socioeconomic factors and the diagnosis and treatment of children with attention-deficit hyperactivity disorder. Journal of Developmental & Behavioural Paediatrics, 40(2), 81–91. https://doi.org/10.1097/dbp.0000000000000626 Google Scholar

Ben Amor, L., Grizenko, N., Schwartz, G., Lageix, P., Baron, C., Ter-Stepanian, M., Zappitelli, M., Mbekou, V., & Joober, R. (2005). Perinatal complications in children with attention-deficit hyperactivity disorder and their unaffected siblings. Journal of Psychiatry & Neuroscience, 30(2), 120–126.10.1139/jpn.0518CrossRef Google Scholar PubMed

Bica, I. (2022). Causal inference methods for supporting, understanding, and improving decision-making (Doctoral thesis, University of Oxford). Oxford University Research Archive. https://ora.ox.ac.uk/objects/uuid:b0b9b45c-7f61-48ed-ab35-ebb7f3cb43a3.Google Scholar

Brandt, V., Patalay, P., & Kerner auch Koerner, J. (2021). Predicting ADHD symptoms and diagnosis at age 14 from objective activity levels at age 7 in a large UK cohort. European Child & Adolescent Psychiatry, 30(6), 877–884. https://doi.org/10.1007/s00787-020-01566-9 CrossRef Google Scholar

Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Wadsworth.Google Scholar

Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32. ∼https://www.stat.berkeley.edu/~breiman/randomforest2001.pdf.10.1023/A:1010933404324CrossRef Google Scholar

Brown, N. M., Brown, S. N., Briggs, R. D., Germán, M., Belamarich, P. F., & Oyeku, S. O. (2017). Associations between adverse childhood experiences and ADHD diagnosis and severity. Academic Pediatrics, 17(4), 349–355. https://doi.org/10.1016/j.acap.2016.08.013 CrossRef Google Scholar

Bryant, A., Guy, J., Team, C. A. L. M., & Holmes, J. (2020). The strengths and difficulties questionnaire predicts concurrent mental health difficulties in a transdiagnostic sample of struggling learners. Frontiers in Psychology, 11, 587821. https://doi.org/10.3389/fpsyg.2020.587821 CrossRef Google Scholar

Carballo, J. J., Rodríguez-Blanco, L., García-Nieto, R., & Baca-García, E. (2018). Screening for the ADHD phenotype using the strengths and difficulties questionnaire in a clinical sample of newly referred children and adolescents. Journal of Attention Disorders, 22(11), 1032–1039. https://doi.org/10.1177/1087054714561858 CrossRef Google Scholar

Chang, Z., Lichtenstein, P., D’Onofrio, B. M., Almqvist, C., Kuja-Halkola, R., Sjölander, A., & Larsson, H. (2014). Maternal age at childbirth and risk for ADHD in offspring: A population-based cohort study. International Journal of Epidemiology, 43(6), 1815–1824. https://doi.org/10.1093/ije/dyu204 CrossRef Google Scholar PubMed

Cherkasova, M. V., Roy, A., Molina, B. S., Scott, G., Weiss, G., Barkley, R. A., Biederman, J., Uchida, M., Hinshaw, S. P., & Owens, E. B. (2022). Review: Adult outcome as seen through controlled prospective follow-up studies of children with attention-deficit/hyperactivity disorder followed into adulthood. Journal of the American Academy of Child & Adolescent Psychiatry, 61(3), 378–391. https://doi.org/10.1016/j.jaac.2021.05.019 Google Scholar

Clements, C. C., Castro, V. M., Blumenthal, S. R., Rosenfield, H. R., Murphy, S. N., Fava, M., Erb, J. L., Churchill, S. E., Kaimal, A. J., & Doyle, A. E. (2015). Prenatal antidepressant exposure is associated with risk for attention-deficit hyperactivity disorder but not autism spectrum disorder in a large health system. Molecular Psychiatry, 20(6), 727–734. https://doi.org/10.1038/mp.2014.90 CrossRef Google Scholar PubMed

Coghill, D., & Sonuga-Barke, E. J. (2012). Annual research review: Categories versus dimensions in the classification and conceptualisation of child and adolescent mental disorders–implications of recent empirical study. Journal of Child Psychology and Psychiatry, 53(5), 469–489. https://doi.org/10.1111/j.1469-7610.2011.02511.x CrossRef Google Scholar PubMed

Coker, T. R., Elliott, M. N., Toomey, S. L., Schwebel, D. C., Cuccaro, P., Tortolero Emery, S., Davies, S. L., Visser, S. N., & Schuster, M. A. (2016). Racial and ethnic disparities in ADHD diagnosis and treatment. Pediatrics, 138(3), e20160407. https://doi.org/10.1542/peds.2016-0407.CrossRef Google Scholar

Cuffe, S. P., Moore, C. G., & McKeown, R. E. (2005). Prevalence and correlates of ADHD symptoms in the national health interview survey. Journal of Attention Disorders, 9(2), 392–401. https://doi.org/10.1177/1087054705280413 CrossRef Google Scholar

Cutler, A., Cutler, D. R., & Stevens, J. R. (2012). Random forests. In Zhang, C., & Ma, Y. (Eds.), Ensemble machine learning: Methods and applications (pp. 157–175). Springer. https://doi.org/10.1007/978-1-4419-9326-7_5 CrossRef Google Scholar

Dooley, N., Healy, C., Cotter, D., Clarke, M., & Cannon, M. (2024). Predicting childhood ADHD-linked symptoms from prenatal and perinatal data in the ABCD cohort. Development and Psychopathology, 36(2), 1–14. https://doi.org/10.1017/S0954579423000238 CrossRef Google Scholar PubMed

DuPaul, G. J., Evans, S. W., Mautone, J. A., Owens, J. S., & Power, T. J. (2020). Future directions for psychosocial interventions for children and adolescents with ADHD. Journal of Clinical Child & Adolescent Psychology, 49(1), 134–145. https://doi.org/10.1080/15374416.2019.1689825 CrossRef Google Scholar

Dwyer, D. B., Falkai, P., & Koutsouleris, N. (2018). Machine learning approaches for clinical psychology and psychiatry. Annual Review of Clinical Psychology, 14, 91–118. https://doi.org/10.1146/annurev-clinpsy-032816-045037 CrossRef Google Scholar

Enders, C. K. (2022). Applied missing data analysis (2nd edn.). Guilford Publications.Google Scholar

Ernst, A. F., & Albers, C. J. (2017). Regression assumptions in clinical psychology research practice – a systematic review of common misconceptions. PeerJ, 5, e3323. https://doi.org/10.7717/peerj.3323 CrossRef Google Scholar

Faraone, S. V., & Larsson, H. (2019). Genetics of attention deficit hyperactivity disorder. Molecular Psychiatry, 24(4), 562–575. https://doi.org/10.1038/s41380-018-0070-0 CrossRef Google Scholar

Franz, A. P., Bolat, G. U., Bolat, H., Matijasevich, A., Santos, I. S., Silveira, R. C., Procianoy, R. S., Rohde, L. A., & Moreira-Maia, C. R. (2018). Attention-deficit/hyperactivity disorder and very preterm/very low birth weight: A meta-analysis. Pediatrics, 141(1). https://doi.org/10.1542/peds.2017-1645 CrossRef Google Scholar PubMed

Froehlich, T. E., Lanphear, B. P., Epstein, J. N., Barbaresi, W. J., Katusic, S. K., & Gilman, S. E. (2007). Prevalence, recognition, and treatment of attention-deficit/hyperactivity disorder in a national sample of US children. Archives of Pediatrics & Adolescent Medicine, 161(9), 857–864. https://doi.org/10.1001/archpedi.161.9.857 CrossRef Google Scholar

Garcia-Argibay, M., Zhang-James, Y., Cortese, S., Lichtenstein, P., Larsson, H., & Faraone, S. V. (2023). Predicting childhood and adolescent attention-deficit/hyperactivity disorder onset: A nationwide deep learning approach. Molecular Psychiatry, 28(3), 1232–1239. https://doi.org/10.1038/s41380-022-01918-8 CrossRef Google Scholar

Gizer, I. R., Ficks, C., & Waldman, I. D. (2009). Candidate gene studies of ADHD: A meta-analytic review. Human Genetics, 126, 51–90. https://doi.org/10.1007/s00439-009-0694-x CrossRef Google Scholar PubMed

Green, A., Baroud, E., DiSalvo, M., Faraone, S. V., & Biederman, J. (2022). Examining the impact of ADHD polygenic risk scores on ADHD and associated outcomes: A systematic review and meta-analysis. Journal of Psychiatric Research, 151, 315–324. https://doi.org/10.1016/j.jpsychires.2022.07.032 Google Scholar

Hall, C. L., Guo, B., Valentine, A. Z., Groom, M. J., Daley, D., Sayal, K., & Hollis, C. (2019). The validity of the strengths and difficulties questionnaire (SDQ) for children with ADHD symptoms. PLoS ONE, 14(6), e0218518. https://doi.org/10.1371/journal.pone.0218518 CrossRef Google Scholar

Hall, H. A., Speyer, L. G., Murray, A. L., & Auyeung, B. (2022). Prenatal maternal infections and children’s neurodevelopment in the UK millennium cohort study: A focus on ASD and ADHD. Journal of Attention Disorders, 26(4), 616–628. https://doi.org/10.1177/10870547211015422 CrossRef Google Scholar

Halperin, J. M., & Marks, D. J. (2019). Practitioner review: Assessment and treatment of preschool children with attention-deficit/hyperactivity disorder. Journal of ChildPsychology and Psychiatry, 60(9), 930–943. https://doi.org/10.1111/jcpp.13014 CrossRef Google Scholar PubMed

Harrison, A. G., & Edwards, M. J. (2023). The ability of self-report methods to accurately diagnose attention deficit hyperactivity disorder: A systematic review. Journal of Attention Disorders, 27(12), 1343–1359. https://doi.org/10.1177/10870547231177470 CrossRef Google Scholar

He, Y., Chen, J., Zhu, L.-H., Hua, L.-L., & Ke, F.-F. (2020). Maternal smoking during pregnancy and ADHD: Results from a systematic review and meta-analysis of prospective cohort studies. Journal of Attention Disorders, 24(12), 1637–1647. https://doi.org/10.1177/1087054717696766 CrossRef Google Scholar PubMed

Hutchison, L., Feder, M., Abar, B., & Winsler, A. (2016). Relations between parenting stress, parenting style, and child executive functioning for children with ADHD or autism. Journal of Child and Family Studies, 25, 3644–3656. https://doi.org/10.1007/s10826-016-0518-2 CrossRef Google Scholar

Kwok, J., Speyer, L. G., Soursou, G., Murray, A. L., Fanti, K. A., & Auyeung, B. (2023). Maternal metabolic syndrome in pregnancy and child development at age 5: Exploring mediating mechanisms using cord blood markers. BMC Medicine, 21(1), 124. https://doi.org/10.1186/s12916-023-02835-5 CrossRef Google Scholar

Lahti, J., Räikkönen, K., Kajantie, E., Heinonen, K., Pesonen, A. K., Järvenpää, A. L., & Strandberg, T. (2006). Small body size at birth and behavioural symptoms of ADHD in children aged five to six years. Journal of Child Psychology and Psychiatry, 47(11), 1167–1174. https://doi.org/10.1111/j.1469-7610.2006.01661.x CrossRef Google Scholar

Langley, K., Heron, J., Smith, G. D., & Thapar, A. (2012). Maternal and paternal smoking during pregnancy and risk of ADHD symptoms in offspring: Testing for intrauterine effects. American Journal of Epidemiology, 176(3), 261–268. https://doi.org/10.1093/aje/kwr510

Lawder, R., Whyte, B., Wood, R., Fischbacher, C., & Tappin, D. M. (2019). Impact of maternal smoking on early childhood health: A retrospective cohort linked dataset analysis of 697,003 children born in Scotland 1997–2009. BMJ Open, 9, e023213. https://doi.org/10.1136/bmjopen-2018-023213

Leffa, D. T., Caye, A., Belangero, S. I., Gadelha, A., Pan, P. M., Salum, G. A., & Rohde, L. A. (2023). The synergistic effect of genetic and environmental factors in the development of attention-deficit/hyperactivity disorder symptoms in children and adolescents. Development and Psychopathology, 36(3), 1–11. https://doi.org/10.1017/s0954579423000366

Luo, Y., Weibman, D., Halperin, J. M., & Li, X. (2019). A review of heterogeneity in attention deficit/hyperactivity disorder (ADHD). Frontiers in Human Neuroscience, 13, 42. https://doi.org/10.3389/fnhum.2019.00042

Madsen, K. B., Ravn, M. H., Arnfred, J., Olsen, J., Rask, C. U., & Obel, C. (2018). Characteristics of undiagnosed children with parent-reported ADHD behaviour. European Child & Adolescent Psychiatry, 27(2), 149–158. https://doi.org/10.1007/s00787-017-1029-4

Marchese Robinson, R. L., Palczewska, A., Palczewski, J., & Kidley, N. (2017). Comparison of the predictive performance and interpretability of random forest and linear models on benchmark data sets. Journal of Chemical Information and Modeling, 57(8), 1773–1792. https://doi.org/10.1021/acs.jcim.6b00753

Marcus, D. K., & Barry, T. D. (2011). Does attention-deficit/hyperactivity disorder have a dimensional latent structure? A taxometric analysis. Journal of Abnormal Psychology, 120(2), 427–442. https://doi.org/10.1037/a0021405

Marín, A. M., Seco, F. L., Serrano, S. M., García, S. A., Gaviria Gómez, A. M., & Ney, I. (2014). Do firstborn children have an increased risk of ADHD? Journal of Attention Disorders, 18(7), 594–597. https://doi.org/10.1177/1087054712445066

McClelland, G. H., & Judd, C. M. (1993). Statistical difficulties of detecting interactions and moderator effects. Psychological Bulletin, 114(2), 376–390. https://doi.org/10.1037/0033-2909.114.2.376

Miyasaka, M., Kajimura, S., & Nomura, M. (2018). Biases in understanding attention deficit hyperactivity disorder and autism spectrum disorder in Japan. Frontiers in Psychology, 9, 244. https://doi.org/10.3389/fpsyg.2018.00244

Murray, A. L., Hall, H. A., Speyer, L. G., Carter, L., Mirman, D., Caye, A., & Rohde, L. (2022). Developmental trajectories of ADHD symptoms in a large population-representative longitudinal study. Psychological Medicine, 52(15), 3590–3596. https://doi.org/10.1017/S0033291721000349

National Institute for Health and Clinical Excellence (2008). Attention deficit hyperactivity disorder: Diagnosis and management of ADHD in children, young people and adults (NICE Clinical Guideline 72). National Institute for Health and Clinical Excellence.

Newman, D. A. (2014). Missing data: Five practical guidelines. Organizational Research Methods, 17(4), 372–411. https://doi.org/10.1177/1094428114548590

Ng, K., Sun, J., Hu, J., & Wang, F. (2015). Personalized predictive modeling and risk factor identification using patient similarity. AMIA Summits on Translational Science Proceedings, 2015, 132–136. PMCID: PMC4525240.

Okumura, Y., Usami, M., Okada, T., Saito, T., Negoro, H., Tsujii, N., Fujita, J., & Iida, J. (2019). Prevalence, incidence and persistence of ADHD drug use in Japan. Epidemiology and Psychiatric Sciences, 28(6), 692–696. https://doi.org/10.1017/S2045796018000252

Panagiotidi, M., Zavlis, O., Jones, M., & Stafford, T. (2024). The three-dimensional community structure of attention-deficit hyperactivity disorder (ADHD) traits captured by the adult ADHD self-report scale: An exploratory graph analysis. International Journal of Methods in Psychiatric Research, 33(1). https://doi.org/10.1002/mpr.1997

Pettersson, E., Sjölander, A., Almqvist, C., Anckarsäter, H., D’Onofrio, B. M., Lichtenstein, P., & Larsson, H. (2015). Birth weight as an independent predictor of ADHD symptoms: A within-twin pair analysis. Journal of Child Psychology and Psychiatry, 56(4), 453–459. https://doi.org/10.1111/jcpp.12299

Pettinger, K. J., Kelly, B., Sheldon, T. A., Mon-Williams, M., Wright, J., & Hill, L. J. (2020). Starting school: Educational development as a function of age of entry and prematurity. Archives of Disease in Childhood, 105(2), 160–165. https://doi.org/10.1136/archdischild-2019-317124

Pietersma, C. S., Mulders, A. G. M. G. J., Sabanovic, A., Willemsen, S. P., Jansen, M. S., Steegers, E. A. P., Steegers-Theunissen, R. P. M., & Rousian, M. (2022). The impact of maternal smoking on embryonic morphological development: The Rotterdam Periconception Cohort. Human Reproduction, 37(4), 696–707. https://doi.org/10.1093/humrep/deac018

Polanczyk, G. V., Salum, G. A., Sugaya, L. S., Caye, A., & Rohde, L. A. (2015). Annual research review: A meta-analysis of the worldwide prevalence of mental disorders in children and adolescents. Journal of Child Psychology and Psychiatry, 56(3), 345–365. https://doi.org/10.1111/jcpp.12381

Prajwala, T. (2015). A comparative study on decision tree and random forest using R tool. International Journal of Advanced Research in Computer and Communication Engineering, 4(1), 196–199. https://doi.org/10.17148/IJARCCE.2015.4142

Prosperi, M., Guo, Y., Sperrin, M., Koopman, J. S., Min, J. S., He, X., Rich, S., Wang, M., Buchan, I. E., & Bian, J. (2020). Causal inference and counterfactual prediction in machine learning for actionable healthcare. Nature Machine Intelligence, 2(7), 369–375. https://doi.org/10.1038/s42256-020-0197-y

Ramtekkar, U. P., Reiersen, A. M., Todorov, A. A., & Todd, R. D. (2010). Sex and age differences in attention-deficit/hyperactivity disorder symptoms and diagnoses: Implications for DSM-V and ICD-11. Journal of the American Academy of Child & Adolescent Psychiatry, 49(3), 217–228. https://doi.org/10.1016/j.jaac.2009.11.011

Raynor, P. (2008). Born in Bradford, a cohort study of babies born in Bradford, and their parents: Protocol for the recruitment phase. BMC Public Health, 8(1), 327. https://doi.org/10.1186/1471-2458-8-327

Reimelt, C., Wolff, N., Hölling, H., Mogwitz, S., Ehrlich, S., Martini, J., & Roessner, V. (2021). Siblings and birth order – Are they important for the occurrence of ADHD? Journal of Attention Disorders, 25(1), 81–90. https://doi.org/10.1177/1087054718770020

Revelle, W. (2023). psych: Procedures for personality and psychological research (R package version 2.5.6) [Computer software]. Northwestern University. https://CRAN.R-project.org/package=psych

Riglin, L., Agha, S. S., Eyre, O., Bevan Jones, R., Wootton, R. E., Thapar, A. K., Collishaw, S., Stergiakouli, E., Langley, K., & Thapar, A. (2021). Investigating the validity of the strengths and difficulties questionnaire to assess ADHD in young adulthood. Psychiatry Research, 301, 113984. https://doi.org/10.1016/j.psychres.2021.113984

Romero, S., Lindström, K., Listermar, J., Westgren, M., & Ajne, G. (2023). Long-term neurodevelopmental outcome in children born after vacuum-assisted delivery compared with second-stage caesarean delivery and spontaneous vaginal delivery: A cohort study. BMJ Paediatrics Open, 7(1), e002048. https://doi.org/10.1136/bmjpo-2023-002048

Ronald, A., de Bode, N., & Polderman, T. J. (2021). Systematic review: How the attention-deficit/hyperactivity disorder polygenic risk score adds to our understanding of ADHD and associated traits. Journal of the American Academy of Child & Adolescent Psychiatry, 60(10), 1234–1277. https://doi.org/10.1016/j.jaac.2021.01.019

Ronald, A., Pennell, C. E., & Whitehouse, A. J. (2011). Prenatal maternal stress associated with ADHD and autistic traits in early childhood. Frontiers in Psychology, 1, 223. https://doi.org/10.3389/fpsyg.2010.00223

Russell, A. E., Ford, T., & Russell, G. (2015). Socioeconomic associations with ADHD: Findings from a mediation analysis. PLoS One, 10(6), e0128248. https://doi.org/10.1371/journal.pone.0128248

Russell, G., Rodgers, L. R., Ukoumunne, O. C., & Ford, T. (2014). Prevalence of parent-reported ASD and ADHD in the UK: Findings from the millennium cohort study. Journal of Autism and Developmental Disorders, 44, 31–40. https://doi.org/10.1007/s10803-013-1849-0

Salazar de Pablo, G., Iniesta, R., Bellato, A., Caye, A., Dobrosavljevic, M., Parlatini, V., Garcia-Argibay, M., Li, L., Cabras, A., & Haider Ali, M. (2024). Individualized prediction models in ADHD: A systematic review and meta-regression. Molecular Psychiatry, 29(12), 1–9. https://doi.org/10.1038/s41380-024-02606-5

Salum, G., Sonuga-Barke, E., Sergeant, J., Vandekerckhove, J., Gadelha, A., Moriyama, T., Graeff-Martins, A., Manfro, G., Polanczyk, G., & Rohde, L. (2014). Mechanisms underpinning inattention and hyperactivity: Neurocognitive support for ADHD dimensionality. Psychological Medicine, 44(15), 3189–3201. https://doi.org/10.1017/S0033291714000919

Sayal, K., Prasad, V., Daley, D., Ford, T., & Coghill, D. (2018). ADHD in children and young people: Prevalence, care pathways, and service provision. The Lancet Psychiatry, 5(2), 175–186. https://doi.org/10.1016/S2215-0366(17)30167-0

Schmidt, A. F., & Finan, C. (2018). Linear regression and the normality assumption. Journal of Clinical Epidemiology, 98, 146–151. https://doi.org/10.1016/j.jclinepi.2017.12.006

Schonlau, M., & Zou, R. Y. (2020). The random forest algorithm for statistical learning. The Stata Journal, 20(1), 3–29. https://doi.org/10.1177/1536867X20909688

Sciberras, E., Mulraney, M., Silva, D., & Coghill, D. (2017). Prenatal risk factors and the etiology of ADHD – Review of existing evidence. Current Psychiatry Reports, 19, 1–8. https://doi.org/10.1007/s11920-017-0753-2

Sedgwick, J. (2018). University students with attention deficit hyperactivity disorder (ADHD): A literature review. Irish Journal of Psychological Medicine, 35(3), 221–235. https://doi.org/10.1017/ipm.2017.20

Sedgwick, J. A., Merwood, A., & Asherson, P. (2019). The positive aspects of attention deficit hyperactivity disorder: A qualitative investigation of successful adults with ADHD. ADHD Attention Deficit and Hyperactivity Disorders, 11, 241–253. https://doi.org/10.1007/s12402-018-0277-6

Shephard, E., Zuccolo, P. F., Idrees, I., Godoy, P. B., Salomone, E., Ferrante, C., Sorgato, P., Catao, L. F., Goodwin, A., & Bolton, P. F. (2022). Systematic review and meta-analysis: The science of early-life precursors and interventions for attention-deficit/hyperactivity disorder. Journal of the American Academy of Child & Adolescent Psychiatry, 61(2), 187–226. https://doi.org/10.1016/j.jaac.2021.03.016

Shire, K., Andrews, E., Barber, S., Bruce, A., Corkett, J., Hill, L. J., Kelly, B., McEachan, R. R., Mon-Williams, M., & Tracey, L. (2020). Starting school: A large-scale start of school assessment within the Born in Bradford longitudinal cohort. Wellcome Open Research, 5, 47. https://doi.org/10.12688/wellcomeopenres.15610.1

Sonuga-Barke, E. J., & Halperin, J. M. (2010). Developmental phenotypes and causal pathways in attention deficit/hyperactivity disorder: Potential targets for early intervention? Journal of Child Psychology and Psychiatry, 51(4), 368–389. https://doi.org/10.1111/j.1469-7610.2009.02195.x

Speyer, L. G., Neaves, S., Hall, H. A., Hemani, G., Lombardo, M. V., Murray, A. L., Auyeung, B., & Luciano, M. (2022). Polygenic risks for joint developmental trajectories of internalizing and externalizing problems: Findings from the ALSPAC cohort. Journal of Child Psychology and Psychiatry, 63(8), 948–956. https://doi.org/10.1111/jcpp.13549

Stibbe, T., Huang, J., Paucke, M., Ulke, C., & Strauss, M. (2020). Gender differences in adult ADHD: Cognitive function assessed by the test of attentional performance. PLoS One, 15(10), e0240810. https://doi.org/10.1371/journal.pone.0240810

Strine, T. W., Lesesne, C. A., Okoro, C. A., McGuire, L. C., Chapman, D. P., Balluz, L. S., & Mokdad, A. H. (2006). Emotional and behavioral difficulties and impairments in everyday functioning among children with a history of attention-deficit/hyperactivity disorder. Preventing Chronic Disease, 3(2), A52. PMCID: PMC1563970.

Sun, B. Z., Moster, D., Harmon, Q. E., & Wilcox, A. J. (2020). Association of preeclampsia in term births with neurodevelopmental disorders in offspring. JAMA Psychiatry, 77(8), 823–829. https://doi.org/10.1001/jamapsychiatry.2020.0306

Tahir, H., Munir, N., Iqbal, S. S., Bacha, U., Amir, S., Umar, H., Riaz, M., Tahir, I. M., Ali Shah, S. M., & Shafiq, A. (2023). Maternal vitamin D status and attention deficit hyperactivity disorder (ADHD), an under diagnosed risk factor; A review. European Journal of Inflammation, 21. https://doi.org/10.1177/1721727X231161013

Thapar, A., Cooper, M., Eyre, O., & Langley, K. (2013). What have we learnt about the causes of ADHD? Journal of Child Psychology and Psychiatry, 54(1), 3–16. https://doi.org/10.1111/j.1469-7610.2012.02611.x

Thomas, R., Sanders, S., Doust, J., Beller, E., & Glasziou, P. (2015). Prevalence of attention-deficit/hyperactivity disorder: A systematic review and meta-analysis. Pediatrics, 135(4), e994–e1001. https://doi.org/10.1542/peds.2014-3482

Topçuoğlu, B. D., Lapp, Z., Sovacool, K. L., Snitkin, E., Wiens, J., & Schloss, P. D. (2021). Mikropml: User-friendly R package for supervised machine learning pipelines. Journal of Open Source Software, 6(61), 3073. https://doi.org/10.21105/joss.03073

Uddin, S., & Lu, H. (2024). Confirming the statistically significant superiority of tree-based machine learning algorithms over their counterparts for tabular data. PLoS One, 19(4), e0301541. https://doi.org/10.1371/journal.pone.0301541

Ullebø, A. K., Posserud, M.-B., Heiervang, E., Gillberg, C., & Obel, C. (2011). Screening for the attention deficit hyperactivity disorder phenotype using the strength and difficulties questionnaire. European Child & Adolescent Psychiatry, 20, 451–458. https://doi.org/10.1007/s00787-011-0198-9

Vergunst, F., Tremblay, R. E., Galera, C., Nagin, D., Vitaro, F., Boivin, M., & Côté, S. M. (2019). Multi-rater developmental trajectories of hyperactivity–impulsivity and inattention symptoms from 1.5 to 17 years: A population-based birth cohort study. European Child & Adolescent Psychiatry, 28, 973–983. https://doi.org/10.1007/s00787-018-1258-1

Walle, K. M., Askeland, R. B., Gustavson, K., Mjaaland, S., Ystrom, E., Lipkin, W. I., Magnus, P., Stoltenberg, C., Susser, E., & Bresnahan, M. (2022). Risk of attention-deficit hyperactivity disorder in offspring of mothers with infections during pregnancy. JCPP Advances, 2(2). https://doi.org/10.1002/jcv2.12070

West, S. G., Aiken, L. S., Wu, W., & Taylor, A. B. (1991). Multiple regression. In Rosenzweig, M. R., & Porter, L. W. (Eds.), Handbook of research methods in personality psychology (pp. 573–613). Cambridge University Press.

Wiegersma, A. M., Dalman, C., Lee, B. K., Karlsson, H., & Gardner, R. M. (2019). Association of prenatal maternal anemia with neurodevelopmental disorders. JAMA Psychiatry, 76(12), 1294–1304. https://doi.org/10.1001/jamapsychiatry.2019.2309

Willcutt, E. G. (2012). The prevalence of DSM-IV attention-deficit/hyperactivity disorder: A meta-analytic review. Neurotherapeutics, 9(3), 490–499. https://doi.org/10.1007/s13311-012-0135-8

Wright, J., Small, N., Raynor, P., Tuffnell, D., Bhopal, R., Cameron, N., Fairley, L., Lawlor, D. A., Parslow, R., & Petherick, E. S. (2013). Cohort profile: The born in Bradford multi-ethnic family cohort study. International Journal of Epidemiology, 42(4), 978–991. https://doi.org/10.1093/ije/dys112

Yanagida, T. (2024). misty: Miscellaneous functions (R package version 0.6.3) [Computer software]. CRAN. https://CRAN.R-project.org/package=misty

Yang, K., Tu, J., & Chen, T. (2019). Homoscedasticity: An overlooked critical assumption for linear regression. General Psychiatry, 32(5), e100148. https://doi.org/10.1136/gpsych-2019-100148

Young, C. (2019). The difference between causal analysis and predictive models: Response to “Comment on Young and Holsteen, 2017”. Sociological Methods & Research, 48(2), 431–447. https://doi.org/10.1177/0049124118782542

Zilanawala, A., Sacker, A., & Kelly, Y. (2018). Mixed ethnicity and behavioural problems in the Millennium Cohort Study. Archives of Disease in Childhood, 103(1), 61–64. https://doi.org/10.1136/archdischild-2015-309701

Table 1. Descriptive statistics of continuous variables

Table 2. Descriptive statistics of categorical variables

Table 3. Multiple LR results

Table 4. Accuracy metrics of the multiple LR, CART, and RF models

Figure 1. Bar plot of the feature importance for the multiple linear regression (LR) model.

Figure 2. Bar plot of the feature importance for the classification and regression trees (CART) model.

Figure 3. Bar plot of the feature importance for the random forest (RF) model.

Ho et al. supplementary material

DOI: https://doi.org/10.1017/S0954579425100783.sm001

