Introduction
Attention-deficit/hyperactivity disorder (ADHD) is characterized by inattention and/or hyperactivity/impulsivity (HI) (APA, 2022; WHO, 2019). Three primary presentations are described: predominantly inattentive, predominantly hyperactive/impulsive, and combined. To meet the diagnostic thresholds, symptoms must persist over time, be pervasive across situations, and cause significant impairment (Asherson, 2016).
The overall prevalence of ADHD is estimated to be 7.1% in children (Thomas, Sanders, Doust, Beller, & Glasziou, 2015) and 2.5–5% in adults (Simon, Czobor, Bálint, Mészáros, & Bitter, 2009; Willcutt, 2012). A preponderance of males with ADHD is widely recognized both in clinical samples, where male/female ratios range from 3:1 to 16:1 (Nøvik et al., 2006), and in community samples, where a ratio of 3:1 is reported (Willcutt, 2012).
Empirical research reports heterogeneity over time in the symptom presentation for both sexes. Epidemiological samples indicate that the hyperactive/impulsive subtype predominates in young children, whereas the inattentive subtype is the more common presentation in adolescents and adults (Simon et al., 2009; Willcutt, 2012). In contrast, studies of clinical samples identify a greater prevalence of combined-type ADHD, perhaps reflecting that individuals with a more severe presentation are more likely to be referred for diagnosis (Du Rietz et al., 2016; Larsson, Dilshad, Lichtenstein, & Barker, 2011; Michielsen et al., 2012).
To date, there have been two meta-analyses of sex effects reporting the severity of ADHD symptoms. The first, published in 1997, included 18 studies (one of them an unpublished dissertation) comparing sex differences in children with ADHD (the search criteria were for children aged 13 years and younger) (Gaub & Carlson, 1997). Although published in 1997, all the included studies were conducted prior to 1992 and therefore predated the Diagnostic and Statistical Manual of Mental Disorders IV (DSM-IV) nomenclature. Only five studies reported outcomes for symptoms of inattention, nine for hyperactivity, and three for impulsivity. Girls were reported to have significantly lower symptoms of hyperactivity and inattention; there were no significant differences in symptoms of impulsivity. Furthermore, girls had greater intellectual impairments and lower ratings of externalizing and internalizing problems.
The second meta-analysis (Gershon, 2002) imposed no age limit and included 38 studies (including unpublished studies) and two research reports. Few studies included adult samples or data reporting ADHD subtypes. Females with ADHD were found to have less severe symptoms of inattention, hyperactivity, and impulsivity than males with ADHD (small effect sizes). Females had significantly fewer externalizing problems, more internalizing problems, and a lower IQ than males. No significant sex differences emerged on measures of academic achievement or social functioning.
Gershon's (2002) study also separated effect sizes from clinical and community samples. Using community samples only, females were rated as significantly less severe than males on all the dependent measures, particularly regarding hyperactivity (large effect size). The author recommended that future studies include community-based samples as well as clinical samples.
These early reviews suggest that girls have substantially less severe ADHD symptoms than boys; however, they are subject to methodological limitations (including limited power, a lack of adult studies, and the use of unpublished non-peer-reviewed data). Nevertheless, the reported sexual dimorphism in the prevalence and presentation of ADHD has fueled speculation over possible contributory factors (including genetic, endocrine, and psychosocial) (Greven, Richards, & Buitelaar, 2018; Hinshaw, Owens, Sami, & Fargeon, 2006).
ADHD is now recognized to be a condition affecting many individuals over their lifespan. In parallel, there has been an exponential increase in international research reporting the prevalence of ADHD, associated impairments, and long-term outcomes (Arnold, Hodgkins, Caci, Kahle, & Young, 2015; Hinshaw et al., 2006; Hodgkins et al., 2012; Shaw et al., 2012). This research has resulted in substantial revisions to the diagnostic criteria since the early DSM-II, DSM-III, and DSM-IIIR studies included by Gaub and Carlson (1997) and extended to include some DSM-IV studies by Gershon (2002).
Willcutt (2012) conducted a meta-analytic review of the literature published between 1994 and 2010 investigating the prevalence rates of ADHD subtypes in children and adults diagnosed using the DSM-IV criteria. Unlike the previous studies, the primary aim was to investigate prevalence rates, which were generally similar across the rating sources for children (parent, teacher, diagnostic procedure: 5.9–7.1%) and for self-reporting in young adults (5%). The diagnostic outcomes, however, differed between the sexes. A significantly greater proportion of females than males met the diagnostic criteria for the inattentive subtype. In contrast, males were more likely than females to meet the diagnostic criteria for the combined type.
It has become clear that we need to develop better insights into the presentation and difficulties of ADHD in girls and women (Willcutt, 2012). Our understanding has been hampered, however, because the existing evidence predominantly draws on male samples (Gershon, 2016; Willcutt, 2012). It is unknown to what extent ADHD is missed or misdiagnosed in females. A key question is whether the presentation of ADHD symptoms differs between boys and girls in childhood and/or in adulthood. A prevalence study does not directly answer this question, whereas a study investigating sex differences in the severity of symptoms is more informative because it provides a dimensional perspective rather than focusing on the outcomes of a categorical threshold (Kraemer, Noda, & O'Hara, 2004).
In the past decade, there have been substantial changes in the diagnostic nomenclature. This study therefore provides an updated systematic review of the data reporting the severity of ADHD symptoms, which is not available in the current literature. The unique features of the present study are the reliance on more refined symptom criteria (using the DSM-IV [APA, 1994], DSM-IV(TR) [APA, 2000], and DSM-5 [APA, 2022]), the separation of clinical diagnostic assessment data from rating scale data, and the inclusion of a broader age range of adults.
Given the exploratory nature of the study, a priori hypotheses were not formulated. Rather, we examined the following research question: Does the severity of ADHD symptoms differ between boys and girls in childhood (aged <18) and/or in adulthood?
Materials and methods
The main objective of the analysis was to compare the severity of ADHD between females and males. The secondary objective was to evaluate possible gender differences in children and in adults. The severity of ADHD was stated as the core ADHD symptoms score (inattention, hyperactivity, impulsivity) based on established rating scales or diagnostic criteria scores.
We followed the recommendations of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (Page et al., 2021). The protocol of the present study was registered on PROSPERO (registration number: CRD42020103830). The protocol on PROSPERO also includes a review question on the difference in ADHD outcomes between males and females, which will be published in another paper. We also followed the MOOSE guidelines for reporting meta-analyses of observational studies (Stroup et al., 2000).
We searched the PubMed, PsycINFO, and Scopus literature databases to include articles published through 28 May 2021. The specific search terms related to (1) attention-deficit and hyperactivity disorder and (2) either gender or sex (the articles included studies that compared males/females + only male + only female) were applied, including ‘Attention Deficit Hyperactivity Disorder’, ‘ADHD’, ‘attention deficit disorder’, ‘disturbance of activity and attention’, ‘TDAH’, ‘hyperkine’, ‘Hyperkinetic disorder’, and ‘Hyperkinetic syndrome’. Comparator terms included ‘female’, ‘girl’, ‘woman’, ‘women’, ‘mother’, ‘male’, ‘boy’, ‘men’, ‘man’, ‘father’, ‘gender’, and ‘sex’. Term indexing using free Boolean operators was employed. No age restrictions were applied. The entire search strings are provided in online Supplementary File S1.
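The two-part search logic described above (ADHD terms combined with sex or gender terms) can be sketched as follows. This is only an illustration built from the terms listed in the text; the actual database-specific strings are those in online Supplementary File S1, and the helper names `or_group` and `build_query`, the quoting, and the absence of field tags or truncation symbols are assumptions made for the sketch.

```python
# Terms as listed in the text; the real search strings are in Supplementary File S1.
ADHD_TERMS = [
    "Attention Deficit Hyperactivity Disorder", "ADHD",
    "attention deficit disorder", "disturbance of activity and attention",
    "TDAH", "hyperkine", "Hyperkinetic disorder", "Hyperkinetic syndrome",
]
SEX_TERMS = [
    "female", "girl", "woman", "women", "mother",
    "male", "boy", "men", "man", "father", "gender", "sex",
]


def or_group(terms):
    """Join terms with OR inside one parenthesized group (hypothetical helper)."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(adhd_terms=ADHD_TERMS, sex_terms=SEX_TERMS):
    """Combine the two concept groups with AND (generic Boolean form only)."""
    return or_group(adhd_terms) + " AND " + or_group(sex_terms)


if __name__ == "__main__":
    print(build_query())
```

Each database would need its own syntax on top of this generic form, which is why the sketch should not be read as the strings actually submitted.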
Study selection
The inclusion criteria were peer-reviewed articles written in the English language; documentation of empirical, primary research regarding ADHD symptoms assessed using the DSM-IV, DSM-IV-TR, or DSM-5 criteria; and results reported separately for males and females. As the diagnostic criteria for ADHD were substantially different before DSM-IV, we excluded the studies where patients were diagnosed using earlier versions of DSM (i.e. DSM-III and earlier versions).
The search results were imported from the databases into Covidence (covidence.org), which was used to store the search results, identify duplicates, and track screening decisions. After removing duplicate articles, a first round of screening titles and abstracts was used to eliminate the articles that did not meet the inclusion criteria. A second round of screening was carried out by reading the complete text of the articles. Articles that reported data from the same dataset, that focused on conditions other than ADHD, or that presented aggregated data for females and males were excluded.
For both rounds of screening, any two of eight reviewers (two psychiatrists, four clinical psychologists, a neuroscientist, and a medical student: O.K., J.K., A.S.-K., J.H., B.G., N.S., K.C., and/or U.E.Y.) independently evaluated the potential articles for inclusion. Disagreements were resolved by discussion with a lead author (O.K.). When articles reported data from the same dataset, data from the most recent and largest dataset were considered.
Figure 1 shows the PRISMA flow chart of the articles yielded by the initial search and the article screening process.
Data extraction
The following data were extracted into an Excel spreadsheet by any two of the following reviewers independently (O.K., A.S.-K., B.G., N.S., K.C., J.H., and/or U.E.Y.): first author; year of publication (we used this information instead of the year of data collection because the latter was missing in 43.1% of the studies); geographic location; sample size; population (adults v. youths); study setting (clinic v. community); participant characteristics (age, sex); rater source (parent-report, self-report); assessment tools (e.g. rating scales, clinical diagnostic assessment tools such as interviews based on DSM-IV, DSM-IV-TR; K-SADS; DISC); diagnostic criteria (DSM-IV, DSM-IV-TR, or DSM-5); and the scores and standard deviations from the assessment tool as the severity of ADHD measure. All rating scales used to measure the severity of ADHD in the included articles were established and valid measures in the field. For children, we included parent-rated data only for consistency across the studies, as this is the most common source of reporting for children with ADHD. Child and adult studies were distinguished by the mean age of the sample reported in the study, the cut-off being <17 for children (as applied by DSM-5). If neither the mean nor the median age of the sample was published, a mid-range value was calculated as ((high age − low age)/2) + low age = mid-range age (Cochran, 1954; Viechtbauer, 2010).
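The mid-range rule and the child/adult cut-off described above can be written out as a short sketch. The function names are hypothetical and the example ages are illustrative only, not data from any included study.

```python
def mid_range_age(low_age: float, high_age: float) -> float:
    """Mid-range age as defined in the text: ((high - low) / 2) + low."""
    if high_age < low_age:
        raise ValueError("high_age must be >= low_age")
    return (high_age - low_age) / 2 + low_age


def is_child_sample(representative_age: float, cutoff: float = 17.0) -> bool:
    """Apply the <17 cut-off stated in the text to a mean or mid-range age."""
    return representative_age < cutoff


if __name__ == "__main__":
    # Illustrative age range only (6 to 12 years).
    age = mid_range_age(6, 12)
    print(age, is_child_sample(age))
```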
Statistical analysis
We used Jamovi statistical software (version 1.8) for analysis (The Jamovi Project, 2022). The main planned meta-analysis combining all 52 selected studies was performed to determine whether females differ from males in the severity of ADHD throughout their lifespan. Subgroup analyses were performed to determine (1) sex differences in the severity of ADHD during childhood and adulthood and (2) differences when severity was determined using clinical interviews (diagnostic) v. rating scales (not diagnostic). The standardized mean difference was used as the outcome variable and effect size. The data were fitted with a random effects model, chosen because of the variability of individual studies regarding the diagnostic criteria and other sample characteristics, as recommended in these cases (e.g. Borenstein, Hedges, Higgins, & Rothstein, 2021) and consistent with previous meta-analyses in the field (e.g. Cortese et al., 2018). Forest plots were generated to visualize the outcome variables: differences in severity of inattention and hyperactivity/impulsivity. The restricted maximum-likelihood estimator was used to estimate the degree of heterogeneity (i.e. tau²) (Arcia & Conners, 2007). Along with the tau² estimate, the Q-test for heterogeneity (Ebejer et al., 2013) and the I² statistic, indicating the percentage of total variability due to true heterogeneity, were calculated. Heterogeneity describes the extent to which study outcomes vary between studies. An I² value higher than 75% represented high heterogeneity, whereas an I² value lower than 25% represented low heterogeneity. We used I² and Q to explore the heterogeneity of the studies.
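To make the quantities in this section concrete, the sketch below computes a bias-corrected standardized mean difference (female minus male) and pools a set of effects under a random effects model. It is a simplified illustration, not the Jamovi procedure: it substitutes the closed-form DerSimonian-Laird estimator of tau² for the restricted maximum-likelihood estimator used in the study, it derives I² directly from Q, and the function names are hypothetical. Results would therefore differ somewhat from those reported.

```python
import math


def smd_and_variance(mean_f, sd_f, n_f, mean_m, sd_m, n_m):
    """Hedges' g (female minus male) and its approximate sampling variance."""
    pooled_sd = math.sqrt(
        ((n_f - 1) * sd_f ** 2 + (n_m - 1) * sd_m ** 2) / (n_f + n_m - 2)
    )
    d = (mean_f - mean_m) / pooled_sd
    j = 1 - 3 / (4 * (n_f + n_m) - 9)  # small-sample bias correction
    var_d = (n_f + n_m) / (n_f * n_m) + d ** 2 / (2 * (n_f + n_m))
    return j * d, j ** 2 * var_d


def random_effects(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird tau^2 (not REML)."""
    k = len(effects)
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {
        "pooled": pooled, "se": se, "z": pooled / se,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "tau2": tau2, "Q": q, "df": df, "I2": i2,
    }


if __name__ == "__main__":
    # Two made-up studies, for illustration only.
    print(random_effects([0.0, 0.5], [0.01, 0.01]))
```

A negative pooled value under this sign convention means that females scored lower than males, which matches how the results are reported.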
A publication bias analysis was conducted, and the funnel plots were inspected for inattention, hyperactivity/impulsivity, and combined presentations of ADHD. Egger's test was conducted (Egger, Davey Smith, Schneider, & Minder, 1997). Egger's regression with significance was given in each meta-analytic result. Fifteen separate meta-analyses were conducted.
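For readers unfamiliar with Egger's test, the following sketch shows its classical form: the standardized effect of each study is regressed on its precision, and an intercept far from zero suggests funnel plot asymmetry. This is an assumption-laden illustration; the variant computed by Jamovi may differ (for example, a weighted or mixed-effects regression), and the function name is hypothetical.

```python
import math


def egger_intercept(effects, std_errors):
    """Classical Egger regression: (effect / SE) regressed on (1 / SE).

    Returns the intercept and its t statistic (k - 2 degrees of freedom).
    """
    k = len(effects)
    if k < 3:
        raise ValueError("need at least three studies")
    x = [1 / se for se in std_errors]                    # precision
    y = [e / se for e, se in zip(effects, std_errors)]   # standardized effect
    mx, my = sum(x) / k, sum(y) / k
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r ** 2 for r in residuals) / (k - 2)        # residual variance
    se_intercept = math.sqrt(s2 * (1 / k + mx ** 2 / sxx))
    t = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, t


if __name__ == "__main__":
    # Made-up effects and standard errors, for illustration only.
    print(egger_intercept([-0.30, -0.10, 0.05, -0.45], [0.10, 0.15, 0.20, 0.30]))
```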
Sensitivity analysis of the main analysis in total sample data was performed excluding those articles reporting ‘poor-quality’ studies, as determined in the risk of bias assessment.
Risk of bias assessment
The risk of bias and methodological quality of the included observational studies were assessed using the Newcastle–Ottawa Scale for observational studies and the modified version of the Newcastle–Ottawa Scale for cross-sectional studies, assessing representativeness, sample size, respondents and non-respondents, ascertainment of the presentation of ADHD, comparability of subjects, assessment of outcome, and quality of the statistics reported (seven items with three subscales and a total maximum score of 9) (Wells et al., 1997). Any two of the authors independently assessed the quality of the studies (O.K., A.S.-K., B.G., N.S., K.C., J.H., and/or U.E.Y.). The Newcastle–Ottawa Scale for case–control or cohort studies consists of eight items with three subscales, with a total maximum score of 9. A standard criterion for a high-quality study has not yet been universally established. In the present study, we considered a score ⩾7 to indicate a high-quality study, 5–6 a fair-quality study, and ⩽4 a poor-quality study. The National Institutes of Health assessment tool for controlled intervention studies was used for the one controlled study. The results of this classification and individual scoring can be found in online Supplementary File S2.
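The quality thresholds stated above map directly onto a small classification rule. The sketch below encodes only those thresholds; the function name and the example scores are hypothetical.

```python
def nos_quality(score: int) -> str:
    """Classify a Newcastle-Ottawa total score (0-9) using the study's thresholds."""
    if not 0 <= score <= 9:
        raise ValueError("Newcastle-Ottawa total score must be between 0 and 9")
    if score >= 7:
        return "high"
    if score >= 5:
        return "fair"
    return "poor"


if __name__ == "__main__":
    # Illustrative scores only; studies rated "poor" are the ones dropped
    # in the sensitivity analysis.
    scores = {"study_a": 8, "study_b": 5, "study_c": 3}
    retained = [name for name, s in scores.items() if nos_quality(s) != "poor"]
    print(retained)
```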
Results
From a total of 10 562 potentially eligible references, 51 manuscripts were retained, consisting of 52 studies (two independent studies were drawn from the study by DuPaul et al., 2001) and comprising a total of 18 408 participants (8423 females and 9985 males). The studies were separated into groups of articles that present data for inattention, hyperactivity/impulsivity, and combined domains. For clarity and succinctness, Table 1 provides a summary of the findings presented in Figs 2–16, representing 15 separate meta-analyses. See online Supplementary Files S3 and S4 for a summary of the included child (Bianchini et al., 2013; Bröring, Rommelse, Sergeant, & Scherder, 2008; Castellanos et al., 2002; Chen et al., 2008; DuPaul et al., 1998, 2016; El Hamrawy, El Sayed, Soltan, & Abd El-Gwad, 2017; Fliers et al., 2013; Gabel, Schmitz, & Fulker, 1996; Gadow, Sprafkin, & Nolan, 2001; Ghanizadeh, Mohammadi, & Moini, 2008; Graetz, Sawyer, Baghurst, & Ettridge, 2006; Gudjonsson, Sigurdsson, Adalsteinsson, & Young, 2013; Hartung et al., 2002; Hellström, Wagner, Nilsson, Leppert, & Åslund, 2017; Hogue, Dauber, Lichvar, & Spiewak, 2014; Kean et al., 2017; Kim et al., 2018; Lahey et al., 2007; Lefler, Hartung, Bartgis, & Thomas, 2015; Major, Martinussen, & Wiener, 2013; Nøvik et al., 2006; Øie, Hovik, Andersen, Czajkowski, & Skogli, 2018; Paavonen et al., 2009; Riddle et al., 2013; Rosch, Dirlikov, & Mostofsky, 2015; Serra-Pinheiro, Mattos, & Angélica Regalla, 2008; Seymour, Mostofsky, & Rosch, 2016; Sihvola et al., 2011; Skogli, Teicher, Andersen, Hovik, & Øie, 2013; Thorell & Rydell, 2008; Tseng et al., 2012; Waschbusch & King, 2006; Willcutt & Pennington, 2000; Yoo et al., 2004) and adult (Amador-Campos, Gómez-Benito, & Ramos-Quiroga, 2014; DuPaul et al., 2001a; DuPaul et al., 2001b; Ebejer et al., 2013; Edebol, Helldin, & Norlander, 2013; Fedele, Hartung, Canu, & Wilkowski, 2010; Fredriksen et al., 2014; Gomez, 2016; Jaconis et al., 2016; Levitan, Jain, & Katzman, 1999; Millenet et al., 2018; Mosalanejad, Mosalanejad, & Lashkarpour, 2013; Murphy & Barkley, 1996; Onnink et al., 2014; Park & Park, 2016; Retz-Junginger, Rösler, Jacob, Alm, & Retz, 2010; Robison et al., 2008) studies, respectively.
Using these data, we conducted meta-analyses in three broad areas: (1) total sample data (three analyses), (2) rating scale data (six analyses), and (3) clinical interview diagnostic data (six analyses). Fifteen meta-analyses were conducted in total. The results report sex differences in the severity of symptoms of ADHD: (1) in the ‘total sample’ across the entire lifespan when aggregating rating scale and clinical diagnostic assessment data (52 studies); (2) in a ‘rating scales sample’ that we obtained by extracting the rating scale data sample (18 studies; 83.3% community-based studies), and (3) in a ‘clinical diagnostic interview sample’ which exclusively focused on studies employing clinical diagnostic assessment data in samples of child and adult study participants (33 studies; 57.6% in clinical settings).
As shown in Table 1, only three of the 15 meta-analyses showed significant sex differences (Figs 3, 6, and 7). The strongest sex difference was found in the rating scale data for hyperactivity/impulsivity, where male children exhibited significantly higher symptoms (z value = −4.62).
Meta-analyses of the total sample
For the initial analysis, no study was excluded based on age, mode of assessment, or setting, yielding a sample that aggregated all data across the lifespan (i.e. including both rating scale and clinical diagnostic interview data). This comprised a total of 52 studies: n = 8451 male and n = 7304 female participants. The most common adult rating scales used in the studies were the Adult ADHD Rating Scale and the Wender Utah rating scale for adults. For children, the most common were the Conners’ scales, Child Behavior Checklist, Barkley Current Symptoms Scale, and the Strength and Difficulties Questionnaire. The most common clinical interviews used in the studies were reported to be based on DSM-IV and DSM-IV-TR criteria for adults and, for children, the Kiddie Schedule for Affective Disorders and Schizophrenia and the Diagnostic Interview Schedule for Children.
Sex differences in symptoms of inattention: total sample
When we included all studies in both children and adults that used either diagnostic interviews or rating scales for assessment (48 studies; n = female 6890; male 7711), the overall observed standardized mean sex difference of −0.0819 (95% confidence interval [CI] −0.1988 to 0.0350) was not significant (test of overall effect: z = −1.3726, p = 0.1699). Heterogeneity: I² = 89.9%, tau² = 0.13, Q(47) = 471.3260, p < 0.001. Egger's regression value = 2.483, p = 0.013. The data are presented in Fig. 2.
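The test statistic reported with each pooled estimate can be recovered from the standardized mean difference and its 95% CI, which is a convenient way to check the reported values. The sketch below assumes a normal (Wald-type) interval, which may not match the software's exact method; the function name is hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def z_from_ci(smd, ci_low, ci_high):
    """Recover the standard error, z statistic, and two-sided p from an SMD and 95% CI."""
    se = (ci_high - ci_low) / (2 * Z_95)
    z = smd / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return se, z, p


if __name__ == "__main__":
    # Values reported for inattention in the total sample.
    se, z, p = z_from_ci(-0.0819, -0.1988, 0.0350)
    print(round(se, 4), round(z, 2), round(p, 2))
```

Applied to the inattention result for the total sample, this reproduces the reported z and p values to within rounding.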
Sex differences in symptoms of hyperactivity/impulsivity: total sample
When we included all studies in both children and adults that used either diagnostic interviews or rating scales for assessment (43 studies; n = female 6014; male 6860), the overall observed standardized mean sex difference for hyperactivity/impulsivity of −0.1489 (95% CI −0.2658 to −0.0321) was significant (z = −2.4982, p = 0.0125), indicating that females scored significantly lower than males in the domain of hyperactivity/impulsivity. Heterogeneity: I² = 88%, tau² = 0.11, Q(42) = 347.8804, p < 0.001. Egger's regression value = 0.297, p = 0.767. The data are presented in Fig. 3.
Sex differences in combined presentation: total sample
When we included all studies in both children and adults that used either diagnostic interviews or rating scales for assessment (19 studies; n = female 3422; male 3850), the overall observed standardized mean difference was not significant (z = −0.21, p = 0.835), with a standardized mean difference of −0.0161 (95% CI −0.1682 to 0.1360). Heterogeneity: I² = 86%, tau² = 0.08, Q(18) = 134.1339, p < 0.001. Egger's regression value = 2.231, p = 0.026. The data of one study (DuPaul et al., 2001) conducted in New Zealand were removed because the study was an extreme outlier (this study also reported data for the USA and Italy). The data are presented in Fig. 4.
Meta-analyses of rating scale data (n = 17)
A secondary analysis of the ‘total sample’ data was performed by extracting the rating scale data across the lifespan, representing a ‘rating scales sample’. The sample consisted of a total of 17 studies (n = female 3890; male 4451).
Sex differences in inattention for children: rating scales
A total of 12 studies (n = female 1865; male 2113) were included in this analysis. The overall observed standardized mean difference of −0.2531 (95% CI −0.5170 to 0.0108) was not significant (z = −1.88, p = 0.060). Heterogeneity: I² = 91.9%, tau² = 0.17, Q(11) = 101.2418, p < 0.0001. Egger's regression value = 2.075, p = 0.038. The data are presented in Fig. 5.
Sex differences in inattention for adults: rating scales
A total of four studies (n = female 890; male 697) were included in this analysis. The overall observed standardized mean difference was significant (z = −3.25, p = 0.001), with a standardized mean difference of −0.1716 (95% CI −0.2751 to −0.0682), indicating that women scored significantly lower than men in the domain of inattention. Heterogeneity: I² = 0%, tau² = 0, Q(3) = 1.7343, p = 0.6293. Egger's regression value = 0.353, p = 0.724. The data are presented in Fig. 6.
Sex differences in hyperactivity/impulsivity for children: rating scales
A total of 12 studies (n = female 1867; male 2050) were included in the analysis. The overall observed standardized mean sex difference was significant (z = −4.62, p < 0.001), indicating that girls scored significantly lower than boys in the domain of hyperactivity/impulsivity. Standardized mean difference = −0.3587 (95% CI −0.5109 to −0.2065). Heterogeneity: I² = 75.4%, tau² = 0.05, Q(11) = 45.961, p < 0.001. Egger's regression value = 2.574, p = 0.01. The data are presented in Fig. 7.
Sex differences in hyperactivity/impulsivity for adults: rating scales
A total of three studies (n = female 576; male 466) were included in the analysis. The overall observed standardized mean difference was not significant (z = 0.85, p = 0.3941). Standardized mean difference = 0.2287 (95% CI −0.2972 to 0.7546). Heterogeneity: I² = 83.4%, tau² = 0.17, Q(2) = 6.23477, p = 0.0443. Egger's regression value = 1.433, p = 0.152. The data are presented in Fig. 8.
Sex differences in combined presentation for children: rating scales
A total of five studies (n = female 688; male 951) were included in the analysis. The overall observed standardized mean difference slightly favored girls, who exhibited a less severe combined symptom presentation than boys, but this difference (−0.1629 [95% CI −0.4287 to 0.1029]) was not significant (z = −1.20, p = 0.2297). Heterogeneity: I² = 72.4%, tau² = 0.05, Q(4) = 7.6175, p = 0.1066. Egger's regression value = 0.648, p = 0.517. The data are presented in Fig. 9.
Sex differences in combined presentation for adults: rating scales
A total of three studies (n = female 576; male 466) were included in the analysis. The overall observed standardized mean difference was not significant (z = 0.64, p = 0.524). Standardized mean difference = 0.1866 (95% CI −0.3877 to 0.7609). Heterogeneity: I² = 86%, tau² = 0.21, Q(2) = 8.1081, p = 0.0174. Egger's regression value = 1.791, p = 0.073. The data are presented in Fig. 10.
Meta-analyses of clinical interview diagnostic sample
This analysis exclusively analyzed data from child and adult samples obtained from clinical diagnostic assessment tools. This ‘clinical interview diagnostic sample’ consisted of 34 studies. The child studies had 2121 female and 3110 male participants with ADHD and the adult studies had 2266 female and 2290 male participants with ADHD. The most common clinical interviews used in the studies were DSM-IV and DSM-IV-TR for adults and the K-SADS and DISC for children.
Sex differences in inattention for children: clinical interview
When we included the studies in children that used clinical interview data for assessment (23 studies; n = female 2073; male 2834), the overall observed standardized mean difference was not significant. Standardized mean difference = −0.0089 (95% CI −0.1788 to 0.1610). Test of overall effect: z = −0.10, p = 0.918. Heterogeneity: I² = 84.7%, tau² = 0.13, Q(22) = 142.7029, p < 0.0001. Egger's regression value = 1.353, p = 0.176. The data are presented in Fig. 11.
Sex differences in inattention for adults: clinical interview
When we included the studies in adults that used interview data for assessment (9 studies; n = female 1148; male 1223), the overall observed standardized mean difference was not significant. Standardized mean difference = −0.0231 (95% CI −0.2721 to 0.2258). Test of overall effect: z = −0.18, p = 0.855. Heterogeneity: I² = 85.4%, tau² = 0.11, Q(8) = 61.6453, p < 0.0001. Egger's regression value = −0.015, p = 0.988. The data are presented in Fig. 12.
Sex differences in hyperactivity/impulsivity for children: clinical interview
When we included the studies in children assessing hyperactivity/impulsivity with interview data (22 studies; n = female 1979; male 2647), the overall observed standardized mean difference was not significant. Standardized mean difference = −0.0928 (95% CI −0.3152 to 0.1296). Test of overall effect: z = −0.82, p = 0.413. Heterogeneity: I² = 90.5%, tau² = 0.23, Q(21) = 181.5585, p < 0.0001. Egger's regression value = −1.276, p = 0.202. The data are presented in Fig. 13.
Sex differences in hyperactivity/impulsivity for adults: clinical interview
When we included the studies in adults assessing hyperactivity/impulsivity with interview data (8 studies; n = female 1085; male 1090), the overall observed standardized mean difference was not significant. Standardized mean difference = −0.0535 (95% CI −0.1405 to 0.0334). Test of overall effect: z = −1.2, p = 0.227. Heterogeneity: I² = 0%, tau² = 0, Q(7) = 25.8101, p = 0.0005. Egger's regression value = 0.153, p = 0.878. The data are presented in Fig. 14.
Sex differences in combined presentation for children: clinical interview
When we included the studies in children assessing a combined presentation with interview data (6 studies; n = female 501; male 771), the overall observed standardized mean difference was not significant. Standardized mean difference = 0.0366 (95% CI −0.1144 to 0.1876). Test of overall effect: z = 0.47, p = 0.634. Heterogeneity: I² = 28%, tau² = 0.01, Q(5) = 8.7297, df = 5, and p = 0.1203. Egger's regression value = 0.006, p = 0.995. The data are presented in Fig. 15.
Sex differences in combined presentation for adults: clinical interview
When we included the studies in adults assessing a combined presentation with interview data (4 studies; n = female 720; male 748), the overall observed standardized mean difference was not significant. Standardized mean difference = 0.1473 (95% CI −0.2138 to 0.5085). Test of overall effect: z = 0.7996, p = 0.4239. Heterogeneity: I² = 84.7%, tau² = 0.09, Q(3) = 33.3378, df = 3, and p < 0.0001. Egger's regression value = 0.665, p = 0.506. The data are presented in Fig. 16.
Sensitivity analyses excluding ‘poor-quality’ studies
Sensitivity analyses removing ‘poor-quality’ studies from the meta-analyses in the total sample that used diagnostic interviews and rating scales for assessment demonstrated no differences in the results in the comparison of hyperactivity/impulsivity and combined symptoms. One comparison that was not significantly different in the total sample that used diagnostic interviews and rating scales for assessment exhibited significance in the sensitivity analysis (females exhibited less severe inattention than males). The effect size in this comparison was small in both analyses (−0.08 and −0.16). The results of the sensitivity analyses are presented in online Supplementary File 5.
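The logic of such a sensitivity analysis can be sketched as follows: pool the per-study standardized mean differences under a random-effects model, then repeat the pooling after dropping the studies rated as poor quality. The Python sketch below is an illustration under stated assumptions, not the authors' code. It uses the DerSimonian-Laird estimator of tau² (the estimator used in the original analysis is not stated here), computes I² from Q and df, and runs on made-up study values; the names `pool_random_effects` and `studies` are hypothetical.

```python
import math


def pool_random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooling of per-study SMDs."""
    k = len(effects)
    w = [1.0 / v for v in variances]                       # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0        # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0    # Q-based I^2, in percent
    w_re = [1.0 / (v + tau2) for v in variances]           # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "smd": pooled,
        "se": se,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "z": pooled / se,
        "q": q,
        "df": df,
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical data: (SMD, sampling variance, quality rating). Not from the review.
studies = [
    (-0.30, 0.02, "good"),
    (0.10, 0.03, "poor"),
    (-0.15, 0.01, "good"),
    (0.25, 0.04, "poor"),
    (-0.05, 0.02, "fair"),
]

# Main analysis: all studies
all_studies = pool_random_effects([s[0] for s in studies], [s[1] for s in studies])

# Sensitivity analysis: exclude studies rated as poor quality
kept = [s for s in studies if s[2] != "poor"]
sensitivity = pool_random_effects([s[0] for s in kept], [s[1] for s in kept])

print(f"All studies:    SMD = {all_studies['smd']:.3f}, z = {all_studies['z']:.2f}")
print(f"Excluding poor: SMD = {sensitivity['smd']:.3f}, z = {sensitivity['z']:.2f}")
```

Comparing the two pooled estimates and their z statistics shows whether a conclusion depends on the lower-quality studies. Software that derives I² from the estimated tau² rather than from Q can report somewhat different heterogeneity values from the Q-based formula used in this sketch.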
Discussion
Our review of the literature identified 51 manuscripts (52 studies) that included male (n = 9985) and female (n = 8423) participants with ADHD across childhood and/or adulthood. The included studies used only diagnoses based on DSM-IV or DSM-5, reflecting an updated understanding of ADHD. Drawing on this substantially greater and updated sample than those applied over 20 years previously, our findings offer a different perspective on the symptom presentation of females with ADHD compared with males with ADHD.
First, we examined all 52 studies by aggregating all the available data across the lifespan, all settings, and modes of assessment. Next, we examined rating scale data only, which was based predominantly on community samples in children and adults. Then, we exclusively focused on clinical diagnostic interview data, analyzing the data independently for child and adult study participants. Most of the studies in this group were conducted in clinical settings.
As can be seen from Table 1, significant differences between groups were found only in the comparison of rating scale data. When solely clinical interview data were analyzed, there were no significant differences between sexes (for both child and adult data).
There were three main findings. First, the average standardized mean difference, based on a random effects model, in the pooled sample data of all 52 studies, aggregating both rating scale and clinical diagnostic interview data, showed that males had significantly more severe hyperactivity and impulsivity symptoms than females (small effect size). Second, using the same statistical method, a further analysis that investigated solely rating scale studies of symptoms showed that among children, boys had significantly more severe symptoms of hyperactivity/impulsivity (small effect size). In contrast, in the adult sample, men had significantly more severe inattention than women (small effect size), with no difference in the hyperactivity/impulsivity dimension. Third, the average standardized mean difference showed no significant difference between females and males in clinical diagnostic interview sample studies, either for children or adults. A common feature of the three significant findings was that the effect sizes estimating the population parameters using the standardized mean difference were all small (i.e. below 0.50).
The sex difference in the pattern of rating scale findings among the children and adult samples in ADHD symptoms raises important questions about the role of sex in the remittance of symptoms over time. Children are evaluated for ADHD mainly using scales completed by their parents or teachers, for whom hyperactivity/impulsivity is a more obvious reason to refer a child for clinical evaluation. In contrast, inattention, being a more subtle symptom, may be less noticeable. Rating scales are used to screen, evaluate, and monitor ADHD symptoms but are not diagnostic, and they may not be ideal for capturing the complex clinical picture of inattention in adults. As shown in a study by Young et al. (2009), rating scales are associated with both false-positive and false-negative symptoms; however, when diagnostic interviews are conducted, it may become clearer that the severity of inattention is similar in males and females both in children and adults.
The findings from our rating scale data, drawing on childhood samples, are consistent with previous meta-analyses reporting significant differences in symptom severity between males and females with ADHD (Gershon, 2002; Thomas et al., 2015). Importantly, the largest sex difference among the childhood rating scale data was the lower rates of hyperactivity/impulsivity in girls. This subset of symptoms is particularly elevated in childhood and remits more quickly with age than inattention (Willcutt, 2012).
The rating scale samples drew predominantly on community samples (82%) and the larger sex differences identified may reflect participants presenting with a broader range of ADHD symptoms with the peak being at the lower end of the symptom dimension. Clinical samples are a more selective population and typically more severe in presentation. Indeed, only a small proportion of participants in community-based studies meet the diagnostic screening criteria for ADHD (Willcutt, 2012). In contrast, the distribution of scores among clinical diagnostic samples is likely to be the reverse of those found in community-based studies (Amador-Campos et al., 2014). This important point needs further research.
A key finding from the present study, however, was that the severity of ADHD symptoms among females and males did not differ significantly when only clinical diagnostic interviews were included in the analysis. This was evident for both children and adults. This diverges from clinical prevalence data; Willcutt (2012) investigated clinical prevalence (as opposed to the severity of symptoms) in a clinical population (in the literature published between 1994 and 2010) and reported that the clinical and community diagnostic patterns were consistent, i.e. that diagnostic outcomes differed between sexes with females being more likely to be diagnosed with the inattentive domain and males with the combined domains.
The present results are largely consistent with those reported by the Child and Adolescent Twin Study (CATSS), a large Swedish study that investigated sex differences in the severity and presentation of ADHD symptoms, conduct problems, and learning problems in boys and girls with and without clinically diagnosed ADHD. The participants were parents of 19 804 twins (50.64% male) who were assessed at 9 years of age (Anckarsäter et al., 2011). Children were linked to Patient Register data on clinical ADHD diagnosis and medication prescriptions. At a population level, boys had higher scores for all symptom domains, but a similar severity was observed in clinically diagnosed boys and girls. For girls, prediction analyses found that hyperactivity/impulsivity and conduct problems were stronger predictors of clinical diagnosis and the prescription of medication. The authors concluded that ‘females with ADHD may be more easily missed in the ADHD diagnostic process and less likely to be prescribed medication unless they have prominent externalizing problems’.
If the severity of symptom presentation is similar between boys and girls with ADHD, how can one understand the substantial difference in male/female ratios (Willcutt, 2012)? One possibility is that there may be a sex bias in the process of receiving a clinical diagnosis of ADHD, but we cannot confirm this from the current data. The CATSS study reported that externalizing behaviors drive the referrals for ADHD; hence, females with ADHD who are without prominent externalizing problems may be missed or misdiagnosed (Anckarsäter et al., 2011).
Another explanation may be the ‘female protective effect’ theory, which posits that girls and women may need to reach a higher threshold of genetic and environmental exposures for ADHD to be expressed (Taylor et al., 2016). This may also contribute to the lower prevalence of ADHD and less severe externalizing hyperactive-impulsive problems in females (Rhee & Waldman, 2016).
It is possible that the ‘zeitgeist’ of early findings, albeit important at the time, contributed to perceived differences in symptom profiles that then led to expectations of how females with ADHD present. Females may present with different forms of behavioral problems, yet the severity of symptom ratings remains similar. It may be the underlying expression, in terms of functional behavior, that differs; males may present with greater disruptive, aggressive, and conduct-related problems whereas the focus for females may be more social, relational, and emotional in nature, including deliberate self-harming behaviors (Young et al., 2020). Moreover, referrals, which are usually initiated by parents or teachers, may prioritize children who are difficult to manage.
Indeed, the different expectations of how females present with ADHD can be an influential factor. When teachers were presented with ADHD-like vignettes that varied solely in the use of male or female names and pronouns, the children described with boys’ names were more likely to be referred for additional support (Sciutto, Nolfi, & Bluhm, 2004) and considered more suitable for treatment (Pisecco, Huzinec, & Curtis, 2001). Parents may also overrate the severity of hyperactive/impulsive symptoms and impairments in boys and underestimate these same symptoms in girls (Mowlem, Agnew-Blais, Taylor, & Asherson, 2019a; 2019b).
Once referred for assessment, the expectation of sexual dimorphism may also influence the outcome for girls (Young et al., 2020). ADHD symptoms may not be detected because females may be more likely to camouflage and/or engage in compensatory behaviors (Mowlem et al., 2019a; 2019b; Quinn & Madhoo, 2014); this may delay the time to referral as well as misdirect the assessment process. For females, ADHD may be missed or misdiagnosed due to comorbid problems. Females with ADHD have been reported to have a higher presence of mental health problems (such as anxiety, depression, eating disorders, and self-harming behaviors), leading to admission for inpatient care (Cortese, Faraone, Bernardi, Wang, & Blanco, 2016; Dalsgaard et al., 2014; Dalsgaard, Mortensen, Frydenberg, & Thomsen, 2002).
In the last decade, it has become clear that by adulthood the gap in prevalence rates by sex is substantially reduced. Young adult studies (both epidemiological and clinical) show a more balanced distribution of prevalence of ADHD total symptoms and subtypes (combined, hyperactivity/impulsivity, and inattention) in men and women (Willcutt, 2012). Social relational difficulties (infrequently mentioned in academic reports) are unlikely to be seen as symptomatic, and the narrowing gap as girls become young women may be explained by an increasing interface with health services and/or associated self-referral.
There are several reasons why our findings differ from those reported in previous meta-analyses. Our study applied a more refined methodology and drew on a much greater sample size than earlier meta-analyses conducted over 20 years ago (Gaub & Carlson, 1997; Gershon, 2002). The difference may also reflect the substantial revisions made to the diagnostic nosology in the intervening years, including the recognition that for some individuals ADHD persists well into adulthood.
Furthermore, previous reports of studies that have found sex differences in presenting symptoms have discussed ideas that may explain this; for example, suggesting differences in hormones (Waddell & McCarthy, 2012) or chromosomes (Greven, Rijsdijk, & Plomin, 2011). The results of our present analysis, however, indicate that there are not large differences in presenting symptoms between the sexes, but rather suggest a systemic bias in the referral and assessment of people for ADHD. In other words, the previously described sex differences in presenting symptoms may not exist so much within the sexual biology of the people with ADHD but in the gender biases of their surrounding caretakers. It also seems reasonable to us that documented differences in biology may well affect other aspects of ADHD and the resulting behavior; but it appears from our study that presenting symptoms are not largely affected by sex differences.
Much of the community data appears to be driven by information drawn from rating scales. Rating scales are helpful indicators that may be usefully applied to assess treatment outcomes; however, they are not diagnostic (Kooij et al., 2010; Young et al., 2020). There is a variation in how well they map onto clinical diagnostic criteria; indeed, many are not compliant with current diagnostic nosology (e.g. DSM-5 criteria). They may overclassify ADHD symptoms as a ‘current’ presenting problem; the etiology of which may be another distinct and predominant clinical presentation (e.g. anxiety, bipolar, and/or personality disorders). For example, an empirical study of ADHD in male prisoners identified that rating scales are associated with both false-positive and false-negative symptoms (Young et al., 2009), emphasizing the need for practitioners to move to a clinical diagnostic interview when borderline scores are obtained.
Importantly, a scoping search in PubMed limited to the papers published from May 28, 2021 to January 6, 2024 identified 30 studies. None of those contained relevant gender-disaggregated data on ADHD symptom severity. This aligns with our expectations, as the lack of separate examination of the characteristics of males and females is the gap in the literature that our meta-analysis has addressed. We therefore encourage future studies to prioritize gender-specific analyses in this area.
Another research direction that we suggest is acknowledging and exploring the effect of ethnicity. We expect that as more primary studies including reports of outcomes broken out by ethnicity are published, future systematic reviews and meta-analyses will be able to approach this research question, with the foundation laid by the present study of gender.
Strengths and limitations
The main advantages of the current meta-analysis study over the previous studies were the reliance on more refined symptom criteria (using the DSM-IV, DSM-IV-TR, and DSM-5), the separation of clinical diagnostic assessment and rating scale samples data, and the inclusion of both child and adult populations (the latter including a broader age range of adults). To provide a comprehensive picture of the issues we wished to investigate, we included a wide age range in our search which generated a substantial amount of data. The number of studies across the 15 meta-analyses shown in Table 1 ranged from 3 to 52 and the proportion of females v. males varied across the analysis. Some studies did not compare all three ADHD presentations (inattention, hyperactivity/impulsivity, and combined).
We only included studies in the English language as we did not have funding for translation. We did not include unpublished study data, because we wanted to maintain the level of data quality provided by the peer-review process. Nevertheless, 9 of the 51 manuscripts were rated as poor quality. When the analysis was rerun after removing the poor-quality studies in the total sample, the only change was that inattention reached significance in the sensitivity analysis (females exhibited less severe inattention than males), still with a small effect size. Significant heterogeneity was found in all but four meta-analyses (see Figs. 6, 9, and 14), indicating diversity across the studies in terms of possible outliers, populations, methodology, comorbid factors, and/or publication bias. Publication bias was present for the data of inattention and combined presentations in studies that aggregated both children and adults and both rating scale and interview data (p < 0.05). For children, according to the rating scale data, publication bias was present for inattention and HI (p < 0.05). Once more publications are available, it would be helpful for future research to conduct similar analyses drawing solely on strong/good quality studies.
Another limitation is the small number of studies included in some of the sub-analyses, which used subsets of the full number of studies included in the systematic review. In particular, because too few adult studies met the inclusion criteria for the community rating scale population, we were unable to analyze the contribution of these data for adults. Furthermore, there were insufficient data for a meta-regression, which would be informative in establishing which variables influence the outcomes of inattention, hyperactivity/impulsivity, and combined classifications. Future research should focus on this to increase our understanding of the different types of ADHD compared with a general population sample.
Conclusions
The present study extended early systematic reviews and meta-analyses of data comparing the symptom presentation of males and females with ADHD. Our review of the literature identified 51 manuscripts (52 studies) that in total included male (n = 8451) and female (n = 7304) participants with ADHD across childhood and/or adulthood. Drawing on this substantially greater sample than those applied over 20 years previously, our findings yield a different perspective and an important insight on the symptom presentation of females with ADHD.
Both males and females appear to equally endorse the severity of ADHD symptoms when assessed using clinical diagnostic interview data (the great majority of which were from clinical settings). By contrast, rating scale data (predominantly drawn from community samples) showed males present with more severe symptoms than females, but the type of symptoms was different for the child and adult studies, suggesting an important sex differential in the remittance of symptoms from childhood into adulthood. This is a novel finding and needs further investigation. Taking the findings as a whole, there may be a sex bias in the initiation of the process of receiving a clinical diagnosis of ADHD possibly due to the perception that ADHD is a behavioral conduct-related problem. The bias may influence the ratings of clinicians and/or of individuals completing them. Perhaps it is the underlying expression and functional behaviors associated with ADHD symptoms that differ between sexes leading to the under-recognition and underestimation of the prevalence of ADHD in females. This means that many girls and women with ADHD are likely to be unidentified and untreated, which in turn may have implications for social, educational, and mental health outcomes.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0033291724001600.
Data availability statement
The template data collection forms, data extracted from included studies, and data used for all analyses are not publicly available. Requests for accessing the datasets should be directed to O.K., drozgekilic@gmail.com.
Acknowledgments
We thank Victoria Williams, Patrycja Garasimczyk, Emma Bush, Sonya McCrea, Sanushiya Ananthakumar, and Tamsin Crook for their support in the initial screening of abstracts.
Author contributions
S. Y. led the project in collaboration with O. K., J. K., and J. H. in the planning and scientific input of the study. O. K., J. K., A. S.-K., J. H., B. G., N. S., K. C., and U. E. Y. substantially assisted with the data screening and extraction process under the supervision of O. K. O. K. and O. U. conducted the statistical analysis. S. Y. wrote the manuscript with input from O. K., J. K., S. C., G. H. G., and B. S. All authors have read and agreed to the published version of the manuscript.
Competing interests
In the last 5 years, S. Y. has received honoraria for consultancy and educational talks from HB Pharma, Medice, and Takeda. She is the author of the ADHD Child Evaluation (ACE) and ACE+ for adults. J. K. is the owner of BPS International and IHS International, which have received consultancy fees in the last 5 years from Sage Therapeutics, Cognition Therapeutics, Jazz Pharmaceuticals, and Greenwich Biosciences; no projects were related to ADHD. S. C. declares honoraria and reimbursement for travel and accommodation expenses for lectures from the following non-profit associations: Association for Child and Adolescent Mental Health (ACAMH), Canadian ADHD Resource Alliance (CADDRA), British Association for Psychopharmacology (BAP), and from Healthcare Convention for educational activity on ADHD. O. K., O. U., G. H. G., J. H., A. S.-K., B. G., K. C., N. S., U. E. Y., and B. S. declare no competing interests.