Background
Various assessment tools are available for screening cognitive impairment or dementia. The most commonly used tests directly assess cognition via questions or ‘pencil and paper’ tasks (Harrison, Noel-Storr, Demeyere, Reynish, & Quinn, 2016a). These direct assessments provide a ‘snapshot’ of cognitive function that does not capture change in cognition, yet cognitive deterioration is a fundamental component of dementia diagnosis. In addition, direct assessments are often compromised, or not possible, in various acute secondary care settings (Elliott et al., 2019). There is a need, therefore, to identify measures that can provide an alternative to traditional ‘direct’ cognitive screening methods.
An attractive approach is to assess cognition using informant-based interview tools. Through this method, a close relative or friend of the patient (i.e. an informant) is asked to identify temporal change in the patient's cognition and related function.
There are several informant tools available that are used in practice, such as the informant questionnaire on cognitive decline in the elderly (IQCODE) (Jorm & Jacomb, 1989), the eight-item interview to ascertain dementia (AD8) (Galvin et al., 2005) and the general practitioner assessment of cognition (GPCOG) (Brodaty et al., 2002). Current guidelines recommend the use of structured informant interviews for cognitive assessment, but do not recommend a particular tool in preference to others (NICE, 2020).
A number of systematic reviews have attempted to establish the diagnostic accuracy of informant-based tools in order to inform best tool selection (Harrison et al., 2014, 2015, 2016a, 2016b; Quinn et al., 2014). However, this rapidly growing literature may be overwhelming for clinicians and decision-makers, and to date has only considered available tools in isolation, precluding an answer to the question: which tool is best?
Novel evidence synthesis techniques (Owen, Cooper, Quinn, Lees, & Sutton, 2018) allow for comparative assessment and are well suited to the analysis of the accuracy of the various informant tools. A synthesis of published systematic reviews, i.e. an overview of systematic reviews, combined with a comparative summary could help to concisely summarise the broader evidence-base, improving clinicians' and policy makers' ability to select or recommend tools for cognitive assessment.
Aims and objectives
We performed an overview of systematic reviews to draw together results from systematic reviews of the diagnostic properties of informant-based cognitive screening tools.
Our primary question was: what is the comparative accuracy of informant-based screening tools for identifying cognitive impairment or dementia?
Secondary objectives
Where possible, we used this overview of systematic reviews to inform a number of secondary objectives:
- To determine variability in informant tool diagnostic test accuracy across various settings and cognitive syndromes.
- To evaluate the quality of systematic reviews of diagnostic test accuracy research such that common methodological issues can be highlighted, and standards improved.
- To produce an ‘evidence map’ that reveals gaps in the evidence-base where new primary research is needed.
Methods
Design
We used the PRISMA (preferred reporting items for systematic reviews and meta-analyses) checklist for reporting in this overview of systematic reviews (see online Supplementary materials e-1).
Design, conduct and interpretation of overviews of systematic reviews are evolving; we followed recent best practice guidance (Higgins et al., 2019; McKenzie & Brennan, 2017).
All aspects of searching, data extraction and review assessment were performed by two reviewers independently, with recourse to a third arbitrator where disagreement could not be resolved.
A detailed description of our methodology is found in the previously published protocol (Taylor-Rowan, Nafisi, Patel, Burton, & Quinn, 2020). A summary of our methodology is provided in the sections below.
Inclusion and exclusion criteria
We included systematic reviews that investigated the diagnostic properties (test accuracy) of an informant-based cognitive screening tool. We included reviews conducted in any setting or patient population. We operationalised the settings in which informant tools are used as secondary care, primary care and community. We made no exclusions on the basis of methodological quality, use of best practice methods, or approach to data synthesis.
Reviews were excluded if they exclusively reported on the diagnostic test accuracy of telephone-based assessment, prognostic accuracy, or ‘functional’ informant tools that measure the ability to perform activities of daily living, rather than cognition per se. We also excluded non-English reviews.
Search methods for identification of reviews
We searched EMBASE (Ovid); Health and Psychosocial Instruments (Ovid); MEDLINE (Ovid); CINAHL (EBSCO); PsycINFO (EBSCO) and the PROSPERO registry of review protocols. All databases were searched from inception to December 2019. Search syntax is provided in Supplementary materials e-2.
We additionally contacted authors working in the field of dementia test accuracy to identify other relevant systematic reviews, and studied reference lists of all included reviews in order to identify additional titles not found by our search (Greenhalgh & Peacock, 2005).
Data collection and analysis
Title selection and data extraction
Titles were screened using Covidence systematic review software (Veritas Health Innovation, Melbourne, Australia; www.covidence.org). Data were extracted onto a data collection proforma specifically designed by the author team (see Supplementary materials e-3).
Assessment of methodological and reporting quality of included reviews
The methodological quality of included reviews was evaluated using a modified version of the AMSTAR-2 (assessment of multiple systematic reviews) measurement tool (Shea et al., 2017), which considered the following key domains: clarity of review objective; description of study eligibility criteria; extent of searching undertaken; transparency of assessment process; assessment of publication bias; and assessment of heterogeneity. Overall study quality conclusions were established based on guidance from Shea et al. (2017). However, as this guidance is based on reviews of healthcare interventions, we modified the critical domains to include only: adequacy of the literature search (item 4); risk of bias from individual studies included in the review (item 9); appropriateness of meta-analytical methods (item 11); and consideration of the risk of bias when interpreting the results of the review (item 13) (see Supplementary materials e-4).
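For illustration, the sketch below shows how per-domain AMSTAR-2 judgements can be rolled up into an overall confidence rating following the general logic of Shea et al. (2017), under which flaws in critical domains drive the rating down. The function, its thresholds and the example input are assumptions for exposition, not the exact decision rule applied in this overview.

```python
# Hedged sketch of an AMSTAR-2 style overall rating: flaws in critical
# domains dominate, per the general scheme of Shea et al. (2017). The
# exact mapping used in this overview may differ; illustration only.
CRITICAL = {4, 9, 11, 13}  # the modified critical domains (item numbers)

def overall_rating(flaws: dict[int, bool]) -> str:
    """flaws maps AMSTAR-2 item number -> True if that domain is flawed."""
    critical_flaws = sum(flaws.get(i, False) for i in CRITICAL)
    non_critical_flaws = sum(v for i, v in flaws.items() if i not in CRITICAL)
    if critical_flaws > 1:
        return "critically low"
    if critical_flaws == 1:
        return "low"
    if non_critical_flaws > 1:
        return "moderate"
    return "high"

# Example: one critical flaw (risk of bias, item 9) -> 'low'
print(overall_rating({4: False, 9: True, 11: False, 13: False}))
```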
AMSTAR-2 assessment was complemented with an evaluation of the reporting standards of included reviews, utilising the PRISMA-DTA (preferred reporting items for a systematic review and meta-analysis of diagnostic test accuracy studies) checklist (McInnes et al., 2018).
Data synthesis
We extracted data for analyses directly from original papers identified within respective reviews. We calculated summary estimates for each informant questionnaire using the bivariate approach (Reitsma et al., 2005). Where suitable data (defined below) were available, we then conducted comparative analyses, creating a network in which each questionnaire at a particular threshold score is a node and inferences about relative test performance can be made through indirect comparison and ranking. We used a bivariate network meta-analysis model accounting for the correlations between multiple test accuracy measures from the same study (O'Sullivan, 2019; Owen et al., 2018). All models were estimated in a Bayesian framework using Markov chain Monte Carlo (MCMC) simulation and implemented in WinBUGS 1.4.3 software (Lunn, Thomas, Best, & Spiegelhalter, 2000). Non-informative prior distributions were specified for test- and threshold-specific accuracy parameters. At each MCMC iteration, the informant-based screening tools with the highest sensitivity and specificity were ranked first. Overall rankings were calculated as a summary of the individual ranks across iterations, and the probability that each screening tool was the best overall was calculated as the proportion of MCMC iterations in which that tool ranked first. Further details on the analyses are available in the original paper describing the method (Owen et al., 2018).
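To make the ranking step concrete, the following minimal sketch shows how ‘best test’ probabilities and summary ranks can be derived from posterior MCMC draws of sensitivity. This is not the authors' WinBUGS code: the simulated draws and all variable names are hypothetical stand-ins for output from the fitted bivariate network meta-analysis model.

```python
import numpy as np

# Hypothetical posterior draws of sensitivity for three tools, shape
# (n_iterations, n_tools); in practice these would come from the fitted
# Bayesian model rather than being simulated from Beta distributions.
rng = np.random.default_rng(0)
draws = rng.beta(a=[90, 85, 80], b=[10, 15, 20], size=(10_000, 3))

# At each MCMC iteration, rank tools by sensitivity (rank 1 = best).
ranks = draws.shape[1] - draws.argsort(axis=1).argsort(axis=1)

# 'Best test' probability: proportion of iterations in which a tool ranks first.
p_best = (ranks == 1).mean(axis=0)

# Summary ranking: mean rank of each tool across iterations.
mean_rank = ranks.mean(axis=0)

for i, (p, r) in enumerate(zip(p_best, mean_rank)):
    print(f"Tool {i}: P(best) = {p:.2f}, mean rank = {r:.2f}")
```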
We only included studies that evaluated informant tool test accuracy against a diagnostic standard consistent with recognised criteria for diagnosis of dementia or mild cognitive impairment (MCI) (e.g. ICD-10, DSM III–V). We attempted meta-analysis where informant tools were assessed in at least two studies. Case-control studies were excluded due to their potential to over-inflate test accuracy. For our primary analysis, we restricted the analysis to the cut-points that were most regularly used and of most clinical relevance (3.3 and 3.6 for the IQCODE; 2 and 3 for the AD8). As our primary question concerned the accuracy of tools as measures of cognitive impairment or dementia (all-inclusive), we did not discriminate between the forms of cognitive impairment evaluated in included studies. However, where single studies provided sensitivity and specificity data for multiple forms of cognitive screening (e.g. values for screening of dementia v. no dementia and values for screening of ‘any cognitive impairment’ v. normal cognition), we selected one reported sensitivity and specificity figure based on the following hierarchy: ‘any cognitive impairment v. normal cognition’ > ‘dementia v. no dementia’ > ‘MCI v. normal cognition’.
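A minimal sketch of this selection hierarchy, assuming a simple mapping from contrast labels to reported (sensitivity, specificity) pairs; the record structure and labels are hypothetical and only illustrate the rule stated above.

```python
# Illustrative implementation of the hierarchy rule applied when a study
# reports accuracy for multiple target conditions. Contrast labels and
# the data structure are assumptions, not the review's actual data files.
HIERARCHY = [
    "any cognitive impairment v. normal cognition",
    "dementia v. no dementia",
    "MCI v. normal cognition",
]

def select_accuracy(reported: dict) -> tuple:
    """Return the (sensitivity, specificity) pair highest in the hierarchy."""
    for contrast in HIERARCHY:
        if contrast in reported:
            return reported[contrast]
    raise ValueError("No recognised contrast reported")

# Example: a study reporting two contrasts; the broader contrast is kept.
study = {
    "dementia v. no dementia": (0.91, 0.70),
    "MCI v. normal cognition": (0.78, 0.65),
}
print(select_accuracy(study))  # -> (0.91, 0.70)
```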
We employed GRADE (grading of recommendations assessment, development and evaluation) (Guyatt et al., 2008) to evaluate the overall strength of sensitivity and specificity evidence for each tool in our meta-analysis, following recommended guidelines on the application of GRADE to diagnostic test accuracy evidence (Singh, Chang, Matchar, & Bass, 2012).
Subgroup analysis
In addition to our primary analysis, we conducted subgroup analyses designed to provide specific data on the performance of tools when used to screen for cognitive syndromes of differing severity and when used in particular settings. Specifically, we evaluated the performance of respective informant tools when used to differentiate between people with and without dementia (dementia v. no dementia) and between people with MCI and normal cognition (MCI v. normal cognition). For each analysis, we sub-grouped by setting (primary care, secondary care and community care), where possible.
Sensitivity analysis
We conducted a sensitivity analysis restricted to studies with no domains rated high risk of bias and at least 50% of domains rated low risk of bias (based on individual study-level data within the included reviews).
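A small sketch of this restriction rule, assuming each study carries a list of per-domain risk of bias ratings; the domain count and rating labels are illustrative only.

```python
# Hypothetical sketch of the sensitivity-analysis filter: keep studies
# with no 'high' risk-of-bias domain rating and at least 50% of domains
# rated 'low'. Rating labels and domain counts are assumptions.
def passes_filter(domain_ratings: list[str]) -> bool:
    no_high = "high" not in domain_ratings
    mostly_low = domain_ratings.count("low") / len(domain_ratings) >= 0.5
    return no_high and mostly_low

print(passes_filter(["low", "low", "unclear", "low"]))          # True
print(passes_filter(["low", "high", "low", "low"]))             # False (one high)
print(passes_filter(["low", "unclear", "unclear", "unclear"]))  # False (25% low)
```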
Method for generation of evidence map
In addition to our search for relevant reviews, we identified individual (i.e. non-review) informant-based diagnostic test accuracy studies to generate an ‘evidence heat-map’.
Search strategy for evidence map
We accessed referenced studies in included reviews and supplemented this with a search of study reference lists and, where provided, review exclusion lists for further available studies.
Inclusion/exclusion criteria for evidence map
To be included in the evidence heat-map, individual studies could be either cohort or case-control, but were required to be published in a peer-reviewed scientific journal and to report on the diagnostic test accuracy (i.e. sensitivity and specificity) of an informant tool. We included non-English papers in our evidence heat-map, but studies were excluded if they reported participant numbers <20; were abstracts; were repeat data sets; assessed prognostic accuracy; described a ‘functional’ informant measure only (e.g. the instrumental activities of daily living scale); or if the informant tool was completed by patients rather than informants.
The extent of available evidence was depicted via a shading scheme ranging from dark (0–10 studies; limited evidence) to light (>40 studies; substantial evidence).
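A sketch of how study counts could map onto the shading scheme; only the stated endpoints (0–10 dark, >40 light) come from the text, so the intermediate bin below is an assumption.

```python
# Illustrative binning of study counts into the heat-map shading scheme.
# The intermediate bin edge (11-40) is assumed for illustration.
def shade(n_studies: int) -> str:
    if n_studies <= 10:
        return "dark (limited evidence)"
    elif n_studies <= 40:
        return "intermediate"
    return "light (substantial evidence)"

for n in (3, 25, 52):
    print(n, "->", shade(n))
```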
Results
Our search identified 4865 titles. After screening, we found 25 reviews (including 93 studies) that met our inclusion criteria (see Table 1). Details of the screening process and reasons for each exclusion are provided in Supplementary materials e-5.
IQCODE, Informant Questionnaire on Cognitive Decline in the Elderly; AD8, 8-item interview to Ascertain Dementia; CIDS, Concord informant dementia scale; DECO, Détérioration Cognitive Observée (observed cognitive deterioration); B-ADL, Bayer Activities of Daily Living scale; DQ, Dementia Questionnaire; SDS, Symptoms Dementia Screener; SMQ, Short Memory Questionnaire; GPCOG, General Practitioner Assessment of Cognition; BCS, Brief Cognitive Scale; PAS, Psychogeriatric Assessment Scale; FAQ, Functional Activities Questionnaire; IADL, Instrumental activities of daily living; BDS, Blessed dementia rating scale; KDSQ, Korean Dementia Screening Questionnaire.
a Diagnostic test accuracy properties of the informant tool are not described in the review.
b Informant tool designed to measure activities of daily living rather than cognition per se.
Summary of reviews' findings
Thirteen informant-based assessment tools were discussed in included reviews. The diagnostic test accuracy properties of 11 of these tools were described. Each reviewed tool is presented below.
IQCODE
The most comprehensively assessed informant tool was the IQCODE, which was included in 18 reviews and 52 original studies. Five distinct versions of the IQCODE were described, based on the number of component question items (IQCODE-32, IQCODE-26, IQCODE-16, IQCODE-17 and IQCODE-7); the most commonly used were the 26-item and 16-item adaptations.
Pooled estimates of IQCODE accuracy for dementia diagnosis ranged from 80% to 91% for sensitivity and from 66% to 85% for specificity. Review evaluations of IQCODE diagnostic test accuracy studies suggested that study quality was generally poor. In Cochrane reviews (Harrison et al., 2014, 2015; Quinn et al., 2014), just 2/25 IQCODE studies were judged to have no high risk of bias categories. Typical issues were lack of blinding and unnecessary patient exclusions – particularly removal of those who may benefit most from an informant-based assessment (e.g. patients with comorbidities that make traditional cognitive assessments challenging).
AD8
The AD8 was assessed in five reviews (20 studies). Pooled sensitivity rates for dementia diagnosis ranged from 88% to 97% and pooled specificity rates ranged from 64% to 81%. Cochrane review evaluations (Hendry et al., 2019) determined that 4/10 AD8 studies had no high risk of bias categories. Study limitations included inadequate reporting, inappropriate exclusion of participants, and high participant drop-out rates due to inability to complete tests.
GPCOG
The GPCOG was evaluated in six reviews, describing five distinct studies.
All but two reviews evaluated the diagnostic test accuracy of the GPCOG based on the evidence of just one ‘fair quality’ study (Lin, O'Connor, Rossom, Perdue, & Eckstrom, 2013). A more recent review (Tsoi, Chan, Hirai, Wong, & Kwok, 2015) evaluated five GPCOG studies and reported a pooled sensitivity of 92% and specificity of 87%. However, the risk of bias was substantial (25% of studies were rated high risk of bias in three out of four domains). Unlike most other informant tools, the GPCOG combines patient and informant assessments. When the informant component of the GPCOG was used in isolation, it appeared to have poor specificity (49–66%) (Kansagara & Freeman, 2010).
Other informant-based assessment tools
Ten additional informant tools were described in at least one included review. A summary of the diagnostic test accuracy evidence for each is provided in Table 2.
DECO, Détérioration Cognitive Observée (observed cognitive deterioration); BDS, Blessed dementia rating scale; CIDS, Concord informant dementia scale; SMQ, Short Memory Questionnaire; PAS, Psychogeriatric Assessment Scale; DQ, Dementia Questionnaire; KDSQ, Korean Dementia Screening Questionnaire; BCS, Brief Cognitive Scale; SDS, Symptoms Dementia Screener.
Network meta-analysis
From the included reviews, we identified a total of 37 suitable studies (11 052 participants) for evaluating the comparative performance of respective tools. One study (Jorm et al., 1996) provided direct (within-study) comparative data on the IQCODE-26 and IQCODE-16; two studies (Jackson, MacLullich, Gladman, Lord, & Sheehan, 2016; Razavi et al., 2014) provided direct comparative data on the IQCODE-16 and AD8. All other studies provided test accuracy properties of single informant tools in isolation, meaning indirect (between-study) comparisons were predominant in our network meta-analyses.
Primary analysis
Our primary network meta-analysis examined the performance of informant tools as measures of cognitive impairment or dementia (all-inclusive). Only three informant tools had sufficient data for comparative analysis (IQCODE-26, IQCODE-16 and AD8).
Results suggest the AD8 at cut-point 2 may have the highest sensitivity [90%; 95% credible interval (CrI) = 82–95; ‘best test’ probability = 36%] for detecting cognitive impairment or dementia, although there was little difference between the AD8 at cut-point 2, the AD8 at cut-point 3 and the IQCODE-16 at cut-point 3.6, with ‘best test’ probabilities of 36%, 23% and 22%, respectively. The IQCODE-26 at cut-point 3.6 may have the highest specificity (81%; 95% CrI = 66–90; ‘best test’ probability = 29%), although again there was little difference between the IQCODE-26 at cut-point 3.6, the IQCODE-16 at cut-point 3.6 and the IQCODE-16 at cut-point 3.3, with ‘best test’ probabilities of 29%, 26% and 17%, respectively. We noted that two studies (de Jonghe, 1997; Jackson et al., 2016) were conducted in distinct populations (delirious and depressed patients, respectively) that could alter diagnostic test accuracy properties. We therefore conducted an additional sensitivity analysis removing these two studies. Results were unchanged (see Supplementary materials e-6).
Comparative performance for each tool at respective cut-points is provided in Table 3.
IQCODE, Informant Questionnaire on Cognitive Decline in the Elderly; AD8, 8-item interview to Ascertain Dementia.
Subgroup analysis
We evaluated the performance of tools when screening for a specific cognitive syndrome in a particular setting. Sufficient data for pooling in this subgroup analysis were only available for respective tools at certain cut-points (see Table 4).
IQCODE, Informant Questionnaire on Cognitive Decline in the Elderly; AD8, 8-item interview to Ascertain Dementia.
Comparative data on tool performance for ‘dementia v. no dementia’ screening suggest that the AD8 at cut-point 2 may have the highest sensitivity for dementia in both secondary care (96%; 95% CrI = 72–99; ‘best test’ probability = 76%) and community settings (86%; 95% CrI = 64–95; ‘best test’ probability = 48%). The IQCODE-16 at cut-point 3.3 had the greatest specificity for dementia assessment in secondary care (71%; 95% CrI = 35–93; ‘best test’ probability = 73%), while the IQCODE-26 at cut-point 3.6 had the highest specificity (93%; 95% CrI = 81–98; ‘best test’ probability = 90%) in the community.
Comparisons of general tool performance across settings suggest that the sensitivity of each tool is consistently higher in secondary care than in the community (secondary care sensitivity range: 82–96%; community sensitivity range: 68–86%), whereas specificity is comparatively reduced (secondary care specificity range: 39–71%; community specificity range: 71–93%).
There were insufficient studies to compare tool performance when used in primary care or for assessing MCI v. normal cognition.
Risk of bias sensitivity analysis
We evaluated reported accuracy when restricting to studies deemed to be at lower risk of bias. Seven studies were available in total; however, there was too much heterogeneity to pool the data, so individual study findings were assessed (Supplementary materials e-6). The general trend of informant tool performance was consistent with our pooled analyses.
Strength of overall evidence
Our GRADE rating of the strength of the IQCODE and AD8 diagnostic test accuracy evidence was ‘low’ for sensitivity and specificity of both tools, primarily due to the risk of bias present in included studies and the imprecision apparent in our pooled rates (see Supplementary materials e-7).
Overview of systematic reviews – evaluation of review methodological and reporting quality
Our AMSTAR-2 evaluations highlighted a number of methodological issues in included reviews. Overall review quality was mixed: 8/25 (32%) reviews were of ‘critically low’ quality, 6/25 (24%) were rated moderate and 3/25 (12%) high methodological quality. All reviews rated moderate or above were conducted from 2010 onwards (see online Supplementary materials for AMSTAR-2 evaluations, e-8). All reviews performed a comprehensive search, and study inclusion criteria were generally adequately explained. However, a number of reviews (9/25; 36%) did not perform the systematic search and/or data extraction in duplicate via two independent investigators; errors in data extraction were frequent; and very few reviews (5/25; 20%) pre-registered a protocol.
Meta-analyses were performed in 11/25 (44%) reviews and appropriate statistical methods were used in each – although it was common for reviews to include case-control studies in pooled analyses, potentially exaggerating diagnostic test accuracy (Higgins et al., 2019).
The risk of bias was not adequately investigated in 9/25 (36%) reviews. Where a risk of bias assessment was conducted, conclusions regarding individual studies often conflicted. For instance, Chen et al. (2017) rated all seven of their included AD8 studies as ‘high quality’, identifying no high risk of bias domains in any study, whereas Hendry et al. (2019) rated 4/7 of the same studies as having at least one high risk of bias domain. No reviews conducted a sensitivity analysis gauging the impact of high risk of bias studies upon reported pooled results, and only one review (Chen et al., 2017) investigated possible publication bias.
Evaluation of reporting standards via PRISMA-DTA revealed that the main issues concerned explicit statement of objectives [12/25 (48%) reviews], description of information sources in adequate detail [12/25 (48%)] and reporting of sufficient test accuracy details from individual included studies [11/25 (44%)].
Evidence map findings
A total of 93 distinct informant tool studies were identified and diagnostic test accuracy properties were described across a range of settings and populations (Fig. 1). Our findings suggest that IQCODE and AD8 have a greater evidence-base than other available tools, but there is a lack of diagnostic test accuracy evaluations in primary care and specialised populations (e.g. stroke). References of included papers, along with the risk of bias judgements for each included study are provided in Supplementary materials (e-9).
Discussion
Comparative evidence for available tools
At least 13 informant tools for cognitive assessment are available, although there is a lack of evidence to justify the use of all but two of these tools: the IQCODE and the AD8. The reviewed literature suggests that both tools have reasonable diagnostic test accuracy for assessment of cognitive impairment or dementia, comparable with other popular cognitive screening tools such as the mini-mental state examination and Montreal cognitive assessment (Tsoi et al., 2015). Our network meta-analysis indicates the AD8 may be the more sensitive of the two tools, and the IQCODE the more specific; however, the CrIs were overlapping and estimates of ‘best test’ probability were close for both sensitivity and specificity, implying little performance difference between respective tools. The overall strength of the available evidence was also low according to our GRADE evaluation, tempering conclusions.
Our findings highlight that the general performance of each tool is variable and typically lower than originally suggested by the developers (Galvin et al., 2005; Jorm & Jacomb, 1989). Moreover, although both tools appear capable of screening for dementia, test performance may vary by setting. When used in specialised secondary care settings, where specificity may be the preferred property, neither tool at traditional clinical thresholds appears well suited to differentiating patients with dementia from those with mild or age-related cognitive changes. Although the IQCODE-16 demonstrated a reasonable specificity of 73% in secondary care at cut-point 3.3, this value was inconsistent with the suggested performance (57%) of the longer IQCODE-26 at a cut-point (3.6) that prioritises specificity; thus, this may be an example of study bias exaggerating tool performance. Specificity may be comparatively higher in community settings; however, in this setting, sensitivity may be the preferred property.
We therefore suggest that neither informant tool is well suited for use as a solitary cognitive screening tool. However, these tools can still be useful as solitary assessments where patients are unable or unwilling to complete a more direct test; thus, where clinicians seek to employ an informant tool, selection of the IQCODE or AD8 should be guided by whether sensitivity or specificity is the priority. The AD8 at cut-point 2 will likely provide the greatest sensitivity, while the IQCODE-26 at cut-point 3.6 will provide the greatest specificity.
It is important to emphasise that our analyses were designed to assess test accuracy only. Other properties are also important when selecting an appropriate tool for cognitive screening. Feasibility, inter-rater reliability, responsiveness to change and suitability for use in special populations are all important test characteristics that may influence the selection of one test over another in clinical practice. Although it is beyond the scope of this review to discuss each respective tool in these terms, we encourage further research on this topic to supplement the test accuracy findings we present here.
The state of diagnostic test accuracy literature
Previous overviews of systematic reviews have highlighted significant issues with regards to review methodological quality (Arevalo-Rodriguez et al., 2014). We similarly found prevalent methodological issues, but also some promising signs.
In contrast to previous diagnostic test accuracy overviews of systematic reviews, the majority of our included reviews conducted formal risk of bias assessments and the higher quality reviews were all conducted within the previous decade, suggesting increasing standards.
However, the inconsistency of risk of bias assessments across reviews indicates a poor understanding of the ways in which diagnostic test accuracy study design can introduce bias. Existing risk of bias assessment tools typically require investigators to tailor the presented questions to the topic of interest. The robustness of this modification process depends heavily on investigators' experience in the topic area; thus, subjectivity influences the assessment of risk of bias even when formal rating tools are operationalised. Furthermore, study bias is generally under-considered when results are discussed: conclusions and recommendations are frequently made in reviews without full exploration of the potential impact biased studies may have had on pooled results. Clinicians should be mindful of these limitations when consuming the evidence provided in a review.
Gaps in the evidence-base
Our evidence map highlights the main areas in which informant tool test accuracy studies are a priority. Primary care has comparatively little evidence relative to other healthcare settings, despite being arguably the most important location for cognitive screening or triage (Quinn et al., 2014). Similarly, informant tool diagnostic test accuracy evaluations are lacking in specialised populations that typically struggle with more traditional cognitive tests (e.g. stroke populations). We would therefore encourage further research to determine the accuracy of available informant tools in these populations.
Future directions
Although our data suggest that informant tools may not generally be suitable as solitary screening tools, they may have utility when combined with direct screening tests. Most available evidence suggests that direct and informant tools perform better when used together (e.g. Narasimhalu, Lee, Auchus, & Chen, 2008; Srikanth et al., 2006; Tew, Ng, Cheong, & Yap, 2015). Thus, informant tools may make ideal supplements to the standard cognitive assessment, yet no reviews exist on this topic.
This type of evaluation is very much needed if we are to confirm the value of a dual (i.e. direct and informant) approach to assessment. It is important to note that available tests (both direct and informant) typically cover varying cognitive domains (Cullen, O'Neill, Evans, Coen, & Lawlor, 2007); hence, the best combinations of tests may change dependent upon the types of cognitive problems that are present in a given population.
Strengths and limitations
We have conducted a comprehensive overview of systematic reviews that brings together the findings of 25 distinct reviews, depicts an extensive evidence map, and employs new statistical techniques that allow formal statistical comparisons, ranking, and ‘best test’ probability estimates between informant tools – addressing a major limitation of this literature.
However, our overview of systematic reviews has some limitations. First, the CrIs in our network meta-analysis are wide for our specificity estimates and most included studies are at risk of bias; hence, resultant rankings should not be viewed as definitive and uncertainty in these estimates should be considered.
Second, our comparisons between tools are overwhelmingly based on indirect comparisons, reliant upon statistical adjustment for random variation between study populations – although our findings are strengthened by their consistency with the studies that directly compared the IQCODE and AD8 within the same participant pool (Jackson et al., 2016; Razavi et al., 2014).
Third, due to limited study numbers, we were unable to conduct some of our pre-specified analyses, such as evaluations of tool performance in primary care settings.
Finally, our evidence map is restricted to studies referenced in published systematic reviews; thus, some recently published studies and informant tools that have not yet been reviewed, such as the quick dementia rating system (Galvin, 2015), do not feature.
Conclusion
Our findings suggest that only the IQCODE and AD8 have had their diagnostic test accuracy properties widely evaluated. Based on available data, the AD8 at cut-point 2 may be the most sensitive available tool for detecting cognitive impairment or dementia, while the IQCODE-26 at cut-point 3.6 is the most specific. However, there is little evidence to suggest an important difference in overall tool performance, and neither tool performs well enough to be used alone for dementia assessment. Further evaluations of test accuracy in primary care and specialised populations are a priority.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0033291721002002.
Acknowledgements
The authors thank the Cochrane test accuracy methods group; and contributors to CRSU workshops.
Author contributions
TQ conceived the idea. MT and TQ designed the study and drafted the manuscript. SN and RD were the second and third reviewers on the paper. JB dealt with disagreements between reviewers. RO performed statistical analysis for the review. AP contributed to data interpretation and writing. MT is the guarantor and all authors have read and commented on the final draft.
Financial support
This work is funded by the National Institute of Health Research. The funders played no part in the conduct of this review.
Conflict of interest
Dr Owen is a member of the NICE Technology Appraisals Committee and the NICE Decision Support Unit (DSU). Dr Owen has served as a paid consultant to the pharmaceutical industry, not in relation to this research.
Ethical standards
Ethics approval and consent to participate not required.