Introduction
Mild cognitive impairment (MCI) represents an intermediate stage of cognitive impairment that is often, but not always, a transitional phase between normal aging and dementia (Petersen et al., 1999, 2014; Scharre, 2019; Winblad et al., 2004). The original construct of MCI was intended to identify and evaluate individuals for suspected Alzheimer’s disease (AD) (Petersen et al., 1999; Petersen, 2004), and indeed those who carry a diagnosis of MCI are at increased risk for further cognitive and functional decline, with the rate of conversion to dementia ranging from 5 to 15% annually (Angevaare et al., 2022; Dunne et al., 2021; Thomas et al., 2020). However, not everyone who carries a diagnosis of MCI progresses to AD or other subtypes of dementia, and longitudinal studies show that 5–53% of individuals with MCI at baseline no longer meet diagnostic criteria at their subsequent study visit.
Higher rates of reversion occur in community-based settings (Ganguli et al., 2011; Manly et al., 2008; Pandya, Lacritz, Weiner, Deschner, & Woon, 2017; Angevaare et al., 2022). Importantly, it is now widely recognized that the underlying causes of MCI are heterogeneous (e.g., stroke, traumatic brain injury, medication side effects, low vitamin B12 levels, etc.; Al-Qazzaz, Ali, Ahmad, Islam, & Mohamad, 2014; Calvillo & Irimia, 2020; Marvanova, 2016; Moore et al., 2012), sometimes reversible (as with medication side effects or recovery from neurological insult), and not necessarily due to a neurodegenerative disorder (Albert et al., 2011; Dunne et al., 2021; Petersen et al., 2001).
In the absence of methods to slow or halt many causes of neurodegeneration, a diagnosis of MCI without a clear understanding about the likely prognosis carries the risk of causing confusion, concern, and potential overtreatment of individuals who may never progress to dementia (Kaduszkiewicz et al., 2014; Gomersall et al., 2015; Visser et al., 2020).
Although diagnostic criteria for “MCI” and international consensus guidelines have been developed and refined over decades beginning in the 1990s (Albert et al., 2011; Artero et al., 2006; Chen et al., 2021; Dunne et al., 2021; Petersen et al., 1999; Winblad et al., 2004), several definitions of MCI are still widely used (Bondi et al., 2014; Edmonds et al., 2021; Graves et al., 2020; Vuoksimaa et al., 2020; Wong et al., 2018).
Fluid biomarkers, alone or in combination with cognitive assessments, are now recommended for earlier identification of individuals in the preclinical or prodromal stages of AD (Dubois et al., 2021; Hansson et al., 2022; Jack et al., 2018; Scharre, 2019), although their use outside research settings or in low-income countries is limited. There are no specific biomarkers to confirm the presence of MCI (Giau et al., 2019), nor does an abnormal result provide certainty of decline (Dubois et al., 2021; Mormino & Papp, 2018). Rather, MCI diagnosis continues to rely largely on phenotypic presentation and clinical judgment (Donders, 2020; Dubois et al., 2021; Hansson et al., 2022; Scharre, 2019). Current conventional clinical criteria are largely iterations of criteria first proposed by Petersen et al. (1999) and updated by Winblad et al. (2004) at the First Key Symposium on MCI. Notably, these criteria were never explicitly operationally defined, leaving them open to divergent interpretations and applications. There is no consensus, for example, on how objective impairment should be operationalized. This results in wide variation in the application of the definition, including which neurocognitive domains to include, the number of tests within a domain required to demonstrate impairment, the specific tests to use, and the level of impairment required (Ritchie et al., 2001; de Vent et al., 2020). Differences in the operationalization of various case definitions contribute to variations in prevalence estimates, reversion rates, and predictive validity for subsequent dementia (Bondi & Smith, 2014).
With no guidance from the conventional Petersen/Winblad definition on the number of tests that must show impairment within a cognitive domain, the minimum requirement is often assumed (i.e., one impaired score). However, requiring only one impaired score has been criticized (Jak et al., 2016; Loewenstein et al., 2009; Trittschuh et al., 2011), as upwards of 55% of cognitively normal adults have one or more impaired scores when given a battery of neurocognitive tests (Binder et al., 2009; Blackford & LaRue, 1989; Brooks et al., 2007; Brooks et al., 2008). The likelihood of a false-positive diagnosis increases with the number of tests administered. In addition, the number of false positives followed by reversion rather than progression might also increase due to day-to-day variation in cognitive performance if testing occurs on a bad day (Brose et al., 2012; von Stumm, 2016; Facer-Childs, Boiling, & Balanos, 2018).
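The link between battery size and false positives can be illustrated with a small calculation. The sketch below is not taken from any of the cited studies; it assumes, purely for illustration, that test scores are independent and normally distributed, which real batteries are not (scores are usually positively correlated, so observed base rates will differ). The function name `prob_at_least_one_low` is hypothetical.

```python
from statistics import NormalDist


def prob_at_least_one_low(n_tests: int, cutoff_sd: float = -1.5) -> float:
    """Chance that a cognitively normal person shows >= 1 low score.

    Illustrative assumption: scores are independent standard-normal z-scores.
    """
    # Per-test base rate of scoring below the cutoff
    p_low = NormalDist().cdf(cutoff_sd)
    # Complement of "no low score on any of the n tests"
    return 1.0 - (1.0 - p_low) ** n_tests


for n in (1, 5, 10):
    print(n, round(prob_at_least_one_low(n), 3))
```

Under these simplifying assumptions, the chance of at least one score below the cutoff rises steadily as tests are added, which is the concern raised about single-score criteria.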
To improve diagnostic stability and reduce the number of false-positive diagnostic errors, Jak, Bondi, and colleagues proposed an MCI classification using comprehensive neuropsychological criteria requiring two impaired scores to establish reliable cognitive impairment (Jak, Bondi et al., 2009). Using this comprehensive neuropsychological approach, the Jak/Bondi definition increased diagnostic accuracy (i.e., had a similar strength of association with incident dementia to Petersen/Winblad criteria despite diagnosing 30% fewer participants with MCI) and correlated more strongly with AD biomarkers when compared to Petersen/Winblad criteria in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (Bondi et al., 2014; Jak et al., 2016).
Despite progress in refining MCI criteria, there is still no consensus as to (1) which specific cognitive tests (e.g., list learning, story memory, or both, within the memory domain) should be used; (2) how many tests across cognitive domains should comprise a test battery; or (3) what norms or thresholds should indicate cognitive impairment (Vuoksimaa et al., 2018; Oltra-Cucarella et al., 2018). Moreover, sole reliance on quantitative data may omit important contextual information for interpreting whether an abnormal result truly reflects a decline (Ganguli et al., 2011). For example, it is well documented that racially/ethnically diverse individuals, on average, tend to obtain lower scores on cognitive tests than their non-Hispanic White counterparts due to a variety of socioeconomic factors such as quality of early education, acculturation, and health environments (Boone et al., 2007; Byrd et al., 2006; Manly & Echemendia, 2007; Zahodne et al., 2021).
In the oldest-old (aged 80+), there may be several reasons beyond incipient dementia for low cognitive scores (Hong et al., 2003; Kravitz et al., 2012; Wong et al., 2018), such as age-related accumulation of comorbidities, fatigue, or sensory impairment (Arosio et al., 2017). Furthermore, Brooks and colleagues (2007) demonstrated that, using a 1 SD cutoff, low scores on cognitive tests occur in 44% of individuals with high average or extremely high intellectual functioning and 80% of older adults with low average intellectual abilities. The risk of overinterpreting low test scores because of high base rates and normal variability in these various populations is high, and as such, all definitions, including the Jak/Bondi definition, may be subject to false-positive errors.
To account for the effects of normal variability in cognitive test performance and to address certain criticisms of Petersen and Jak/Bondi classification criteria, Oltra-Cucarella and colleagues (2018) proposed to classify individuals with MCI based on the number of impaired tests (NIT). The researchers first defined a base rate of low scores as the average number of low scores in the worst-performing 10% of cognitively normal individuals. They then theorized that individuals with a greater number of low scores than expected would be more likely to demonstrate true impairment (thereby minimizing the number of false positives). When applied to the ADNI dataset, the use of this classification led to better prediction of progression from MCI to AD (34%) than both the Petersen/Winblad (29%) and the Jak/Bondi criteria (31%) (Oltra-Cucarella et al., 2018). However, the following concerns were raised regarding the wider utility of this approach: (1) the cutoff for impairment is based on the ADNI sample rather than external or nationwide norms, and it may not be as easy to calculate a base rate of low scores in practice; (2) the number of impaired scores in the lowest 10% will vary depending on the number of measures and scores available; and (3) this study used multiple scores from the same test, possibly inflating the number of impaired scores (i.e., individuals with impaired list-learning recognition will likely have impaired recall as well) (Vuoksimaa et al., 2020).
Most recently, Alfano and colleagues (2022) suggested using a clinical ratings (Global CR) approach, which has been validated and is considered a “gold standard” for detecting cognitive impairment in the human immunodeficiency virus literature (Heaton et al., 1995; Malaspina et al., 2011; Blackstone et al., 2012) and closely parallels the profile analysis strategy seen in the clinical decision-making process (Alfano et al., 2022). Using this classification method, 26% of the sample was diagnosed with MCI and there was substantial agreement between Petersen/Winblad and Jak/Bondi definitions (Alfano et al., 2022). Notably, this study was cross-sectional, so there is no information regarding the predictive validity of the Global CR approach for incident dementia. Moreover, Alfano and colleagues (2022) acknowledged the importance of further validating the approach in individuals with known or suspected MCI and of incorporating demographic variables such as race/ethnicity.
Much of the work on defining MCI criteria has been carried out at the Mayo Clinic in Rochester, Minnesota, and in ADNI, where over 90% of participants are non-Hispanic White and highly educated with college degrees (Bondi et al., 2014; Oltra-Cucarella et al., 2018; Petersen et al., 2010; Vuoksimaa et al., 2020). Comparisons between conventional and neuropsychological criteria have similarly been attempted in demographically homogeneous samples (Bondi et al., 2014; Edmonds et al., 2021; Graves et al., 2020; Jak et al., 2016; Oltra-Cucarella et al., 2018; Thomas et al., 2019; Wong et al., 2018).
However, diverse older adults tend to obtain lower test scores across cognitive domains (Díaz-Venegas et al., 2016; Gasquoine, 2009; Zahodne, Manly, Azar, Brickman, & Glymour, 2016; Werry et al., 2019) due to non-neurological factors including, but not limited to, quality of education, acculturation, or bias (Boone et al., 2007; Byrd et al., 2006; Manly & Echemendia, 2007; Manly, 2005). Thus far, two studies have compared the diagnostic agreement of MCI definitions in diverse older adults. However, these studies utilized individuals from a specialized memory clinic and measured Jak/Bondi criteria against the consensus approach and not Petersen/Winblad criteria alone (Devlin et al., 2022; Graves et al., 2022); furthermore, neither study included NIT or Global CR approaches. Thus, it remains unclear (1) how Petersen/Winblad (not a consensus definition) and Jak/Bondi criteria compare in predicting incident dementia in a community-based sample and (2) how other methods of MCI measurement that may account for normal variability (i.e., NIT or Global CR) would fare relative to these two widely used definitions.
In the current study, we compared the four definitions of MCI described above: conventional Petersen/Winblad criteria, Jak/Bondi’s comprehensive neuropsychological criteria, NIT, and Global CR within the Einstein Aging Study (EAS), a community-residing, racially/ethnically and educationally diverse cohort of older adults from Bronx, NY. Specifically, we sought to address gaps in the literature by comparing the prevalence of MCI diagnosis and predictive validity for incident dementia to determine which MCI definition approach offers the greatest diagnostic and prognostic accuracy, particularly in diverse, community-based samples. These results will inform how MCI will be operationalized in the EAS sample and may be relevant to other samples with similar demographic characteristics.
Methods
Participants
The EAS is a longitudinal study of community-residing individuals in a racially and ethnically diverse urban setting (Katz et al., 2012). Details about study recruitment have been described elsewhere (Katz et al., 2012). In brief, participants were systematically recruited using Bronx County Voter Registration lists, mailed introductory letters, and given a telephone screen to determine study eligibility. Those who met preliminary eligibility criteria were invited for further in-person evaluations. In-person assessments were then conducted annually and included comprehensive neurological, medical, psychosocial, and neuropsychological evaluations. All protocols were approved by the Einstein Institutional Review Board, and written informed consent was obtained at the initial clinic visit. This research was completed in accordance with the Helsinki Declaration. Inclusion criteria were age 70 and above, resident of Bronx, NY, community-dwelling, and English speaking. Exclusion criteria at enrollment included severe audiovisual or physical impairments, or active psychiatric symptomatology, which may interfere with the ability to complete assessments (Katz et al., 2012). Participants eligible for the analysis were enrolled between October 1993 and June 2016, free of dementia at baseline, and had at least one annual follow-up. Due to the limited number of participants who were Hispanic or of other racial/ethnic groups, the study was further restricted to those who self-reported as non-Hispanic White or non-Hispanic Black.
Neuropsychological assessment
Participants completed standardized neuropsychological testing at baseline and all follow-up visits, which were conducted annually (Katz et al., 2012). Five cognitive domains were used for MCI diagnosis: memory, attention, executive functioning, language, and visuospatial functioning, with two tests included in each domain. The memory domain included the free recall measure from the Free and Cued Selective Reminding Test (Buschke, 1984) and the Wechsler Memory Scale-Revised Logical Memory I subtest (WMS-R-LMI) (Wechsler, 1987). Attention/processing speed was measured using the Digit Span subtest of the Wechsler Adult Intelligence Scale-III (WAIS-III; Wechsler, 1997) and the Trail Making Test, part A (Reitan, 1958). Executive functioning tests included the Trail Making Test, part B (Reitan, 1958), and the Letter Fluency “FAS” task (Spreen & Strauss, 2006). Language was measured with the Category Fluency task (animals, vegetables, fruits) (Rosen, 1980) and the Boston Naming Test (Kaplan et al., 1983). Visuospatial functioning tests included the Block Design and Digit Symbol subtests from the WAIS-III (Wechsler, 1997).
Group/diagnostic MCI classification
MCI classifications were made using four sets of criteria: Petersen/Winblad standard criteria (Petersen et al., 1999; Winblad et al., 2004), Jak/Bondi’s comprehensive neuropsychological criteria (Jak, Bondi et al., 2009; Bondi et al., 2014), NIT (Oltra-Cucarella et al., 2018), and Global CR (Alfano et al., 2022). Normative data were calculated using local norms derived for cognitively unimpaired (CU) individuals in the sample (described in Supplementary Methods), and scores for all cognitive tests used were adjusted for age, sex, education, and race/ethnicity.
Petersen/Winblad
Updated Petersen criteria (Artero et al., 2006; Winblad et al., 2004) were used. A diagnosis of MCI required (1) objective cognitive impairment (>1.5 SD below the age-, sex-, education-, and race/ethnicity-adjusted mean) on at least one test; (2) subjective cognitive concern (SCC), operationalized as any concern indicated by self- or informant-report; (3) absence of functional decline as measured by the IADL Lawton Brody scale (Lawton & Brody, 1969); and (4) no diagnosis of dementia.
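The four criteria above can be expressed as a simple rule. The following is a minimal sketch, not the study's code (the analyses were run in SAS); the function name, argument names, and the assumption that scores are already demographically adjusted z-scores are all illustrative.

```python
from typing import Iterable


def meets_petersen_winblad(
    z_scores: Iterable[float],
    has_scc: bool,
    functional_decline: bool,
    has_dementia: bool,
    cutoff: float = -1.5,
) -> bool:
    """Apply the four Petersen/Winblad criteria as operationalized in the text.

    z_scores are assumed to be demographically adjusted z-scores
    (hypothetical input format).
    """
    # (1) Objective impairment: at least one test more than 1.5 SD below the mean
    objective_impairment = any(z < cutoff for z in z_scores)
    # (2) SCC present, (3) no functional decline, (4) no dementia diagnosis
    return (
        objective_impairment
        and has_scc
        and not functional_decline
        and not has_dementia
    )


# Hypothetical participant: one low score plus a self-reported concern
print(meets_petersen_winblad([-1.7, 0.2, -0.4], True, False, False))
```

Note that a single low score is sufficient for criterion (1) in this operationalization, which is why the rule uses `any` rather than a count.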
Jak/Bondi
The most widely used Jak/Bondi definition (Bondi et al., 2014) for MCI was adapted for use in our study: (1) a low score (>1 SD below the age-, sex-, education-, and race/ethnicity-adjusted mean) on both measures within at least one cognitive domain or (2) at least one low score (>1 SD below the age-, sex-, education-, and race/ethnicity-adjusted mean) across at least two cognitive domains. The three-domain criterion that was previously used in a study conducted in the ADNI sample (Bondi et al., 2014) was modified to account for a larger battery of five cognitive domains and follows an approach similar to the one Graves and colleagues (2020, 2022) used to apply the Jak/Bondi criteria in the National Alzheimer’s Coordinating Center cohort. Although Functional Activities Questionnaire (FAQ; Pfeffer et al., 1982) scores were included in the Jak/Bondi definition from this previous ADNI study (Bondi et al., 2014), this was a standalone criterion used to classify MCI (i.e., individuals had to meet one of three criteria; the third being FAQ scores), and most recent studies using Jak/Bondi do not include it in their operationalizations. Moreover, there are no guidelines on cutoff scores (i.e., 9 as a standalone criterion in the original definition, whereas more recent studies incorporate FAQ + objective impairment, but only for the multiple-domain criteria).
Lastly, FAQ is an informant-reported measure that may be affected by demographic and relational characteristics (Hackett et al., 2020; Jessen et al., 2014). In a recent study examining sources of discrepancy on FAQ in a diverse sample, informants who identified as Black/African American endorsed lower ratings (less impairment) (Hackett et al., 2020), suggesting there may be some cross-cultural differences that reduce the validity of this measure. As such, FAQ scores were not included in the Jak/Bondi criteria. Further, SCC is not used in this definition.
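The adapted Jak/Bondi rule described in this section can be sketched as follows. This is an illustration only: the input format (a mapping from domain name to its two adjusted z-scores) and the function name are hypothetical, and criterion (2) is read here as "a low score in each of at least two domains", which is one interpretation of the wording above.

```python
from typing import Dict, Tuple


def meets_jak_bondi(
    z_by_domain: Dict[str, Tuple[float, float]], cutoff: float = -1.0
) -> bool:
    """Adapted Jak/Bondi rule without FAQ or SCC, as described in the text.

    z_by_domain maps each domain to its two demographically adjusted
    z-scores (hypothetical input format).
    """
    low_flags = {d: [z < cutoff for z in zs] for d, zs in z_by_domain.items()}
    # Criterion 1: both measures low within at least one domain
    both_low_in_a_domain = any(all(flags) for flags in low_flags.values())
    # Criterion 2 (assumed reading): a low score in each of >= 2 domains
    domains_with_a_low_score = sum(any(flags) for flags in low_flags.values())
    return both_low_in_a_domain or domains_with_a_low_score >= 2


# Hypothetical participant with one low score in each of two domains
example = {
    "memory": (-1.2, 0.1),
    "attention": (0.3, 0.0),
    "executive": (0.5, -0.2),
    "language": (-1.1, 0.3),
    "visuospatial": (0.0, 0.4),
}
print(meets_jak_bondi(example))
```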
Number of impaired tests (NIT)
For the NIT definition (Oltra-Cucarella et al., 2018), MCI was diagnosed when the NIT (>1.5 SD below the age-, sex-, education-, and race/ethnicity-adjusted mean) equaled or exceeded the NIT obtained by the worst-performing 10% of the CU group (inclusion criteria described in Supplementary Methods). Accordingly, three or more impaired scores were required to diagnose MCI. Functional impairment and SCC are not included in this definition.
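One way to derive such a threshold from a reference group is sketched below. It assumes a particular reading of the rule, namely the smallest number of impaired scores reached by no more than 10% of cognitively unimpaired individuals; the exact procedure is described in the Supplementary Methods and may differ. The function names and the reference counts are hypothetical, made-up values chosen only to show the mechanics.

```python
from typing import Iterable, List


def count_impaired(z_scores: Iterable[float], cutoff: float = -1.5) -> int:
    """Number of impaired tests (NIT) for one participant."""
    return sum(z < cutoff for z in z_scores)


def nit_threshold(cu_counts: List[int], tail: float = 0.10) -> int:
    """Smallest k such that at most `tail` of the CU group has NIT >= k.

    This is an assumed reading of 'the NIT obtained by the
    worst-performing 10% of the CU group'.
    """
    if not cu_counts:
        raise ValueError("cu_counts must not be empty")
    n = len(cu_counts)
    # At k = max + 1 the proportion is 0, so the loop always returns
    for k in range(1, max(cu_counts) + 2):
        if sum(c >= k for c in cu_counts) / n <= tail:
            return k
    raise AssertionError("unreachable")


# Made-up reference distribution of NIT in 100 CU individuals
cu_counts = [0] * 60 + [1] * 20 + [2] * 12 + [3] * 6 + [4] * 2
threshold = nit_threshold(cu_counts)
print(threshold, count_impaired([-1.6, -1.8, -2.0, 0.1]) >= threshold)
```

Because the threshold depends on the reference distribution and on how many scores the battery produces, a different battery or sample could yield a different cutoff.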
Clinical ratings (Global CR)
For the Global CR approach, MCI was diagnosed according to Alfano and colleagues (2022). Individual scores were converted to demographically (age-, sex-, education-, race/ethnicity-) corrected T-scores and assigned a CR ranging from 1 (above average; T ≥ 55) to 9 (severe impairment; T ≤ 19), with a CR of 5 indicating “definite mild impairment” (35 < T < 39). Domain CRs were derived from the highest individual test CR in a given domain. If the two tests in a given domain had different CRs, then the domain CR was defined as the higher CR minus one. If not (i.e., both tests had the same CR), that rating was the domain CR. Global CR was derived from the highest domain CR such that if only one domain had the highest CR, then Global CR was the highest domain CR minus one. If two or more domains had the same highest CR, that rating was the Global CR. MCI was diagnosed if an individual’s Global CR equaled or exceeded a rating of five. Functional impairment and SCC are not included in this definition.
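The roll-up from test-level ratings to a Global CR can be sketched as below. The sketch starts from test-level CRs (integers from 1 to 9) rather than T-scores, because the full T-score-to-CR conversion table is not given here. Function names and the example ratings are hypothetical.

```python
from typing import Dict, List, Tuple


def domain_cr(cr_a: int, cr_b: int) -> int:
    """Domain CR from the two test CRs in a domain."""
    # Same rating: use it; different ratings: higher (worse) rating minus one
    return cr_a if cr_a == cr_b else max(cr_a, cr_b) - 1


def global_cr(domain_crs: List[int]) -> int:
    """Global CR from the domain CRs."""
    worst = max(domain_crs)
    # If two or more domains share the highest rating, keep it; else minus one
    return worst if domain_crs.count(worst) >= 2 else worst - 1


def is_mci_global_cr(
    test_crs: Dict[str, Tuple[int, int]], threshold: int = 5
) -> bool:
    """MCI if the Global CR equals or exceeds the threshold (five in the text)."""
    domains = [domain_cr(a, b) for a, b in test_crs.values()]
    return global_cr(domains) >= threshold


# Hypothetical participant: test-level CRs for the five domains
ratings = {
    "memory": (6, 6),
    "attention": (5, 3),
    "executive": (6, 4),
    "language": (2, 2),
    "visuospatial": (3, 2),
}
print(is_mci_global_cr(ratings))
```

The "minus one" steps mean that a single isolated poor score is discounted at both the domain and the global level, which is how this approach tempers the effect of normal variability.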
Dementia
Incident dementia of any etiology at follow-up was the main study outcome. DSM-IV criteria were used for dementia diagnosis (APA, 1994), and participants were classified as having incident dementia if (1) there was substantial cognitive impairment – that is, scores at least 1.5 standard deviations below the age-adjusted mean across multiple cognitive domains; (2) the participant or study informant reported changes in cognitive function; (3) there was functional decline determined at a case conference based on information from the IADL Lawton Brody Scale (Lawton & Brody, 1969) and clinical evaluation; and (4) cognitive impairment was not better explained by the effects of a substance or medication.
Demographic and clinical characteristics
Demographic information from the EAS included self-reported race/ethnicity as defined by the U.S. Census Bureau in 1994 (recategorized to non-Hispanic White, non-Hispanic Black), number of years of education, sex, and age. Subclinical symptoms of depression were assessed using the Geriatric Depression Scale (GDS) short form, excluding the memory item (Sheikh & Yesavage, 1986).
Data analysis
Demographic information, neuropsychological test scores, and all definitions of MCI status at baseline were summarized using descriptive statistics by incident dementia status during follow-up. Preliminary comparisons of these characteristics were performed using the Wilcoxon rank sum test and chi-square test for continuous and categorical variables, respectively. Time to event was defined as the time in years from the baseline clinic visit to the visit when dementia was diagnosed or to the last follow-up visit. Cox proportional hazards models were used to assess the associations of various definitions of MCI at baseline with the risk of incident dementia. Additional models controlling for covariates including age, sex, education, race/ethnicity, and depressive symptoms at baseline were also applied. The modification effects of race/ethnicity on the associations of each MCI definition with the risk of dementia were examined by testing the interaction between each MCI definition and race/ethnicity in the Cox models. To evaluate the accuracy of discriminating those at risk of developing dementia within specific time periods using each MCI definition at baseline, time-dependent sensitivity and specificity were examined at 2, 3, 5, and 7 years, and Youden’s index was calculated. Variations and confidence intervals (CIs) of these statistics were obtained using the bootstrap method (Tibshirani & Efron, 1993) with 1000 resamples, with the 2.5th and 97.5th percentiles used as limits of the 95% CI. Comparisons of sensitivity, specificity, and Youden’s index among different MCI definitions were performed by obtaining bootstrap confidence intervals of the differences and checking whether zero was covered by the intervals.
The analysis was further stratified by race group, and comparisons of sensitivity, specificity, and Youden’s index for each MCI definition between racial groups were performed using Z-tests based on the estimates and their variances from the bootstrap method. All statistical analyses were performed using SAS 9.4 (SAS Institute Inc., Cary, NC).
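As a rough illustration of the percentile bootstrap comparison described above, the following Python sketch resamples participants with replacement and builds a 95% interval for the difference in Youden’s index between two MCI definitions. It is not the SAS code used in this study: it assumes a simple binary outcome at a fixed horizon and ignores censoring, whereas the reported estimates are time-dependent. All function and variable names are hypothetical.

```python
import random


def sens_spec_youden(flagged, developed):
    """Return (sensitivity, specificity, Youden's index) for two binary lists.

    flagged[i]   -- 1 if participant i met the MCI definition at baseline
    developed[i] -- 1 if participant i developed dementia by the horizon
    """
    tp = fn = tn = fp = 0
    for f, d in zip(flagged, developed):
        if d:
            if f:
                tp += 1
            else:
                fn += 1
        else:
            if f:
                fp += 1
            else:
                tn += 1
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return sens, spec, sens + spec - 1.0


def percentile(sorted_values, q):
    """Nearest-rank percentile of an already sorted list (q in 0..100)."""
    idx = round(q / 100.0 * (len(sorted_values) - 1))
    return sorted_values[idx]


def bootstrap_youden_difference(flag_a, flag_b, developed, n_boot=1000, seed=0):
    """Percentile bootstrap 95% CI for Youden(A) - Youden(B).

    Participants are resampled as whole units, so both definitions are
    always evaluated on the same resample (a paired comparison).
    Assumption: censoring is ignored; this is a simplification.
    """
    if not any(developed) or all(developed):
        raise ValueError("need both cases and non-cases")
    rng = random.Random(seed)
    n = len(developed)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        d = [developed[i] for i in idx]
        if not any(d) or all(d):
            continue  # skip degenerate resamples lacking cases or non-cases
        j_a = sens_spec_youden([flag_a[i] for i in idx], d)[2]
        j_b = sens_spec_youden([flag_b[i] for i in idx], d)[2]
        diffs.append(j_a - j_b)
    diffs.sort()
    lower, upper = percentile(diffs, 2.5), percentile(diffs, 97.5)
    return lower, upper, lower <= 0.0 <= upper
```

Under this sketch, an interval that excludes zero would indicate that the two definitions differ at the 5% level; the same resampling loop could return differences in sensitivity or specificity instead of Youden’s index.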
Results
At baseline (N = 1073), participants’ ages ranged from 70 to 100 (mean = 78.4 ± 5.3) years. The sample was 62.5% female, and educational achievement averaged 13.9 ± 3.5 years (range 2–24 years), with 42.6% obtaining 12 or fewer years of education. Notably, nearly one-fifth of participants (17%) did not complete high school, indicating educational diversity within the sample. Most participants identified as White (70.0%), though Black participants were well represented (30.0%). The total number and proportion of participants diagnosed with MCI using each classification method are shown in Table 1. During a mean of 4.5 (median 3.3, maximum 19) years of follow-up, 118 individuals developed incident dementia. As shown in Table 1, those who developed dementia were older at baseline. The groups did not differ in number of depressive symptoms, sex, years of education, race/ethnicity, or follow-up time.
Note. NIT = number of impaired tests; CR = clinical ratings.
^ Participants free of prevalent dementia at baseline who had at least one follow-up evaluation, completed neuropsychological tests and self-reported subjective cognitive concern at enrollment and self-reported as non-Hispanic White or non-Hispanic Black.
+ Excluding the dichotomous memory item from the short form of the GDS.
Mean time between baseline and conversion to dementia or end of study participation was 4.5 ± 3.5 years.
Hazard ratios from unadjusted models were similar among the four definitions (Table 2), and all classifications significantly predicted incident dementia (all p < 0.001). Results were similar when adjusted for age, sex, education, and race/ethnicity (Model 2), and when further adjusted for depressive symptoms at baseline (Model 3). Results of time-dependent Receiver Operating Characteristic (ROC) statistics for MCI classifications at 2, 3, 5, and 7 years of follow-up, and the number of participants at risk at each time point, are shown in Table 3. The tests of interactions between race/ethnicity and MCI on the risk of dementia in the aforementioned Cox models were not statistically significant for any MCI definition. Nonetheless, we further evaluated the time-dependent ROC statistics within each racial/ethnic group in a secondary analysis due to a specific interest in exploring any disparities between groups.
Note. HR = hazard ratio; CI = confidence interval; NIT = number of impaired tests; CR = clinical ratings.
Note. MCI = mild cognitive impairment; CI = confidence interval; NIT = number of impaired tests; CR = clinical ratings.
The bootstrap comparisons showed that sensitivity and specificity differed significantly across most pairs of MCI classifications. Compared with the Petersen/Winblad definition, the Jak/Bondi classification had significantly higher specificity (p < 0.05), with no significant difference in sensitivity. All other pairs of definitions differed significantly in both sensitivity and specificity (p < 0.05), with the Jak/Bondi and Petersen/Winblad classifications demonstrating intermediate values of sensitivity and specificity relative to the NIT and Global CR classifications.
In the overall sample, Youden’s index, defined as sensitivity plus specificity minus 1, was highest for the Jak/Bondi classification for dementia incidence within 2, 3, 5, and 7 years of follow-up, followed by the Global CR classification (Table 3). Although NIT had significantly higher specificity than all other definitions, this gain was more than offset by sensitivity that was significantly lower than that of all other definitions. At the opposite extreme, the Global CR classification had significantly higher sensitivity at all follow-up intervals; however, this came at a significant cost to specificity (i.e., specificity was significantly lower than in all other definitions). Between the Jak/Bondi and Petersen/Winblad classification methods, Jak/Bondi demonstrated significantly higher specificity; its sensitivity was slightly lower, but not significantly so. Importantly, differences in Youden’s index were not statistically significant across definitions, as judged by the bootstrap confidence intervals. Specificity, sensitivity, and Youden’s index across MCI classifications and follow-up years for non-Hispanic Blacks and non-Hispanic Whites are shown in Appendix Tables S1a and S1b. Within the non-Hispanic Black group, the Global CR classification showed the highest Youden’s index, though this was not significantly different from other MCI classifications. Comparisons of sensitivity, specificity, and Youden’s index among the MCI definitions within each racial group were overall similar to those of the whole sample. Between the two racial/ethnic groups, MCI based on the Global CR classification had significantly higher sensitivities at 2, 3, 5, and 7 years and lower specificities at 2, 3, and 5 years in the non-Hispanic Black group when compared with the non-Hispanic White group.
MCI based on Petersen/Winblad had significantly lower specificities at 2, 3, and 5 years in the non-Hispanic Black group, but differences in sensitivities and specificities for other definitions, and differences in Youden’s index for all definitions, were not statistically significant.
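The between-group comparisons above rest on Z-tests built from bootstrap estimates and variances. The exact form of the test is not spelled out here, so the Python sketch below assumes the usual two-sample version for independent groups with a normal approximation; the function name and the numbers in the comments are hypothetical and are not study results.

```python
import math
import statistics


def z_test_independent(est_1, var_1, est_2, var_2):
    """Two-sided Z-test for the difference between two independent estimates.

    est_1, est_2 -- point estimates (e.g., sensitivity in each racial group)
    var_1, var_2 -- their variances (e.g., the sample variance of the
                    bootstrap replicates within each group)
    Assumption: the groups are independent, so the variances add.
    """
    z = (est_1 - est_2) / math.sqrt(var_1 + var_2)
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


def bootstrap_variance(replicates):
    """Sample variance of bootstrap replicates of a statistic."""
    return statistics.variance(replicates)


# Hypothetical usage with made-up bootstrap replicates for two groups:
# var_a = bootstrap_variance(replicates_group_a)
# var_b = bootstrap_variance(replicates_group_b)
# z, p = z_test_independent(estimate_a, var_a, estimate_b, var_b)
```

Because each racial group is resampled separately, treating the two estimates as independent is a natural simplification, though the study’s SAS implementation may differ in detail.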
Discussion
The present study investigated four specific approaches to MCI classifications (Petersen, Jak/Bondi, NIT, and Global CR) in an educationally and racially/ethnically diverse cohort of older adults. Overall, we found that the MCI classifications varied significantly in sensitivity and specificity for incident dementia, with Jak/Bondi and Petersen classifications demonstrating more intermediate values of sensitivity and specificity relative to NIT and Global CR classifications, which demonstrated the greatest specificity and greatest sensitivity values, respectively. However, when examining Youden’s index, reflecting the balance between sensitivity and specificity, all four classifications performed comparably. Jak/Bondi MCI classification demonstrated the highest Youden’s index, but this was not significantly different from the other classification methods. Comparisons of sensitivity, specificity, and Youden’s index among the MCI definitions within each racial group were overall similar to those of the whole sample.
This is an important finding because researchers or clinicians could have increased confidence in choosing one classification method over another, depending on their referral or research question of interest. For example, if a group was interested in examining SCC, they might select the Jak/Bondi classification to avoid circularity, as other criteria, such as Petersen/Winblad, include SCC in the definition of MCI. The optimal balance between sensitivity and specificity depends upon the research, clinical, or public health setting of intended use. Sensitivity might be a priority if those who screen positive receive additional cognitive testing or low-cost, blood-based biomarkers. Specificity might be a priority if those who meet the MCI definition will receive invasive or expensive follow-up and resources are constrained. Professionals could consider using the cost-benefit matrix proposed by Manly and Echemendia (Reference Manly and Echemendia2007) when considering whether they require increased sensitivity or specificity and then select their classification method accordingly. Due to the benefits of early detection and treatment for dementia, increased diagnostic sensitivity and consideration of Global CR may be warranted. Moreover, when evaluating non-Hispanic Black or non-Hispanic White individuals, professionals might choose to utilize one definition over another in terms of differing sensitivity or specificity within these groups.
These results also replicate and extend previous findings comparing Jak/Bondi and Petersen/Winblad classifications in non-Hispanic White samples (Bondi et al., Reference Bondi, Edmonds, Jak, Clark, Delano-Wood, McDonald, Nation, Libon, Au, Galasko and Salmon2014; Edmonds et al., Reference Edmonds, Smirnov, Thomas, Graves, Bangen, Delano-Wood, Galasko, Salmon and Bondi2021; Jak et al., Reference Jak, Preis, Beiser, Seshadri, Wolf, Bondi and Au2016; Thomas et al., Reference Thomas, Eppig, Weigand, Edmonds, Wong, Jak, Delano‐Wood, Galasko, Salmon, Edland and Bondi2019; Wong et al., Reference Wong, Thomas, Edmonds, Weigand, Bangen, Eppig, Jak, Devine, Delano-Wood, Libon, Edland, Au and Bondi2018). In our overall sample, the Jak/Bondi definition significantly increased specificity (decreased false positives) while maintaining comparable sensitivity when compared to Petersen/Winblad. These findings suggest Petersen/Winblad may result in more false positive MCI diagnoses. This is consistent with literature demonstrating that (1) defining cognitive impairment based on a single impaired score does not account for normal variability and may inflate MCI prevalence rates (Loewenstein et al., Reference Loewenstein, Acevedo, Small, Agron, Crocco and Duara2009; Trittschuh et al., Reference Trittschuh, Crane, Larson, Cholerton, McCormick, McCurry, Bowen, Baker and Craft2011), and (2) the subjective cognitive concern criterion in Petersen/Winblad is not additive to predicting incident dementia and may contribute to misdiagnosis of MCI (Chang et al., Reference Chang, Wang, Nester, Katz, Byrd, Lipton and Rabin2023 in press; Edmonds et al., Reference Edmonds, Delano‐Wood, Clark, Jak, Nation, McDonald, Libon, Au, Galasko, Salmon and Bondi2015, Reference Edmonds, Weigand, Thomas, Eppig, Delano-Wood, Galasko, Salmon and Bondi2018; Ilardi et al., Reference Ilardi, Chieffi, Iachini and Iavarone2021).
Although the NIT classification (Oltra-Cucarella et al., Reference Oltra‐Cucarella, Sánchez‐SanSegundo, Lipnicki, Sachdev, Crawford, Pérez‐Vicente, Cabello‐Rodríguez and Ferrer‐Cascales2018) performed well in the ADNI sample, its criteria appeared to be too stringent for our participants. In our study, the specificity for incident dementia was highest using the NIT approach, ranging from 0.86 to 0.91; however, sensitivity for incident dementia was substantially reduced (0.48–0.54), resulting in a failure to identify almost half of the individuals who ultimately progressed to dementia. It is important to note certain methodological differences between the studies of interest. Oltra-Cucarella and colleagues’ original approach was previously criticized for possibly inflating the number of impaired scores by using multiple scores from the same test (Vuoksimaa et al., Reference Vuoksimaa, McEvoy, Holland, Franz and Kremen2020). For example, if delayed recall of a list-learning test is impaired, it is possible that recognition may also be impaired. To avoid this possibility, the current study used 10 separate measures and their corresponding scores. To diagnose MCI, three or more impaired scores were required, a notably higher cutoff than the two impaired tests required for the Jak/Bondi classification. Oltra-Cucarella and colleagues (Reference Oltra‐Cucarella, Sánchez‐SanSegundo, Lipnicki, Sachdev, Crawford, Pérez‐Vicente, Cabello‐Rodríguez and Ferrer‐Cascales2018) also only adjusted for age, sex, and education, while the current study additionally adjusted for race/ethnicity. Lastly, compared to our sample of demographically diverse adults, over 90% of ADNI participants are non-Hispanic White and have a mean education corresponding to a four-year university degree. Further study using non-overlapping measures and diverse populations is needed to fully understand the possible utility of this approach.
The Global CR approach (Alfano et al., Reference Alfano, Grummisch, Gordon and Hadjistavropoulos2022) demonstrated good sensitivity for incident dementia, possibly supporting the authors’ claim that this approach may parallel the profile analysis strategies in the clinical decision-making process (Alfano et al., Reference Alfano, Grummisch, Gordon and Hadjistavropoulos2022). However, there was a substantial loss of specificity, resulting in false positives. The authors acknowledged the risk of Type I error due to the relatively high number of comparisons that occur throughout the classification’s formulation (Alfano et al., Reference Alfano, Grummisch, Gordon and Hadjistavropoulos2022). Important to consider is the scale used by Alfano and colleagues to convert T-scores to their respective clinical ratings, wherein a T-score between 35 and 39 (equivalent to -1.0 to -1.5 SD) is converted to a clinical rating of 5 and labeled as “definite mild impairment,” despite some of these T-scores falling within the “low average” range. Because the Global CR approach places greater importance on CRs that are in the “impaired” range (Alfano et al., Reference Alfano, Grummisch, Gordon and Hadjistavropoulos2022), it is possible the current conversion scale may misdiagnose individuals who, at a cognitively unimpaired baseline, fall within the “low average” range as having “definite mild impairment.” Clinical decision-making also draws on behavioral observations and contextual interpretation of a low score. Sole reliance on quantitative data and labeling may therefore omit important contextual information for interpreting whether an “abnormal” result truly reflects impairment (Ganguli et al., Reference Ganguli, Snitz, Saxton, Chang, Lee, Vander Bilt, Hughes, Loewenstein, Unverzagt and Petersen2011).
That said, because of its high sensitivity to incident dementia, clinicians may still choose to consider using the Global CR approach if the referral question requires greater sensitivity. Future work could improve this classification by revising the conversion scale to flag performance more appropriately as “impaired.”
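To make the T-score range discussed above concrete, the short Python sketch below converts T-scores to standard deviation units using the conventional T-score scaling (mean 50, SD 10). The function name is hypothetical, and the sketch does not reproduce the Alfano et al. conversion scale itself.

```python
def t_score_to_sd_units(t_score, mean=50.0, sd=10.0):
    """Convert a T-score to SD units relative to the normative mean.

    Assumes the conventional T-score metric (mean 50, SD 10).
    """
    return (t_score - mean) / sd


# The band labeled "definite mild impairment" in the text spans T = 35 to 39.
for t in (35, 39, 40):
    print(t, t_score_to_sd_units(t))
```

Under this scaling, a T-score of 35 sits 1.5 SD below the mean, whereas a T-score of 39 sits only about 1.1 SD below it, close to the boundary of the band at T = 40 (1.0 SD below the mean).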
To our knowledge, this was the first longitudinal study directly comparing four specific MCI criteria in a diverse community-based sample. Unfortunately, although non-Hispanic Blacks were well represented, the number of Hispanic older adults and those of other racial/ethnic groups was insufficient to include in these analyses. There was a relatively low number of participants with incident dementia at follow-up and a lower proportion of non-Hispanic Blacks who converted compared to non-Hispanic Whites. Also, we used normative data derived from our study. We previously showed that the EAS sample performs more poorly relative to national normative data even after adjusting for demographic variables (Wang et al., Reference Wang, Katz, Chang, Qin, Lipton, Zwerling, Sliwinski, Derby, Rabin and Abner2021). It will be important to replicate these results in larger, demographically diverse samples.
Study limitations related to our test battery also warrant mention. For example, because we did not have Logical Memory II data available, we included Logical Memory I, which is not a pure measure of episodic memory and can be impacted by attentional and working memory abilities. In addition, to enable us to include a visuospatial domain for the Jak/Bondi criteria, we utilized the Digit Symbol task as a task of visuospatial functioning. While it is typically considered a measure of psychomotor processing speed, previous research has demonstrated that the Digit Symbol test is sensitive to visuoperceptual and spatial rotation variables (Bigelow et al., Reference Bigelow, Semenov, Trevino, Ferrucci, Resnick, Simonsick, Xue and Agrawal2015; Crowe et al., Reference Crowe, Benedict, Enrico, Mancuso, Matthews and Wallace1999; Glosser et al., Reference Glosser, Butters and Kaplan1977; Shao et al., Reference Shao, Wang, Zhang, Zhen, Dong, Tian and Yu2023), and is most strongly related to spatial navigation when compared to other cognitive measures (Wei et al., Reference Wei, Anson, Resnick and Agrawal2020). However, performance on this task may still be impacted by other domains (e.g., working memory, processing speed, motor function) making it suboptimal as a pure measure of visuospatial functioning.
In sum, the Jak/Bondi MCI classification provided the highest balance of sensitivity and specificity (Youden’s index) for predicting incident dementia across all years of follow-up considered, although its advantage over the other classifications was not statistically significant. This classification method significantly decreased false positives while maintaining comparable sensitivity to the Petersen/Winblad classification. Clinicians and researchers should consider designing their neuropsychological batteries to allow for the utilization of this classification method. Beyond this, clinicians and researchers may have increased confidence in using one classification method over another, as the methods were comparable in Youden’s index for predicting incident dementia. This work provides an important step toward improving the generalizability of the MCI diagnosis to underrepresented or health-disparate populations and can inform researchers and clinicians on the most appropriate MCI classification method for their populations or questions of interest. Importantly, this research can lead to meaningful discussions among experts to arrive at an optimal MCI definition that promotes diagnostic accuracy for incident dementia in diverse older adult populations.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S1355617724000729.
Acknowledgments
We thank the EAS participants, investigators, and staff for their contribution to this study.
Financial support
This work was supported by the National Institute on Aging at the National Institutes of Health (grant numbers P01 AG03949, R21 AG056920), the Leonard and Sylvia Marx Foundation, and the Czap Foundation.
Competing interests
The authors report there are no conflict-of-interest disclosures.