Introduction
The reduction in opportunities for in-person cognitive assessments during the COVID-19 pandemic has accelerated research into the feasibility and reliability of remote cognitive testing, including assessment through video teleconference (VTC) (Bilder et al., 2020; Geddes et al., 2020; Marra et al., 2020; Owens et al., 2020; Pulsifer et al., 2021; Rochette, Rahman-Filipiak, Spencer, Marshall & Stelmokas, 2021; Singh & Germine, 2021; Underwood et al., 2021). In addition to maintaining sufficient access to crucial neuropsychological services under exceptional circumstances (Owens et al., 2020; Underwood et al., 2021), implementing solid protocols for remote, at-home cognitive testing can also facilitate diagnostic care and monitoring in the memory clinic in the long term by removing several barriers to (repeated) testing (Hewitt et al., 2022; Tsiakiri et al., 2022).
Among supervised remote neuropsychological assessments, VTC may be preferable to telephone-based assessment for a variety of reasons, including reliability (Hunter et al., 2021). Undergoing assessment through VTC from home can be an alternative to assessment at the clinic for patients who live in rural communities (Vestal et al., 2006), struggle with health or mobility limitations, have lower motivation for testing (Castanho et al., 2014), or experience anxiety surrounding clinic visits, which may also affect test performance (Dorenkamp et al., 2021).
In a memory clinic setting, it is crucial to know not only which tests can be dependably administered through VTC, but also which patients are more (or less) suitable for and open to this method of assessment. Research has shown that VTC assessment is likely feasible in community-dwelling older individuals (Hildebrand et al., 2004), individuals with self-perceived cognitive decline but no major impairment (Gnassounou et al., 2021), and even those with cognitive impairment (Cullum et al., 2014; Hunter et al., 2021; Parikh et al., 2013; Wadsworth et al., 2018). At the same time, the reliability and validity of neuropsychological testing administered through VTC in a clinic-to-home setting are not yet adequately established (Marra et al., 2020; Parsons et al., 2022), in particular for complete, comprehensive assessment protocols used in regular clinical diagnostics. Furthermore, we lack an adequate picture of other important issues relating to the integration of VTC into memory clinic neuropsychological services for individuals with varying levels of cognitive impairment (Bilder et al., 2020; Hildebrand et al., 2004), specifically the user experiences of both administrators and patients.
Research investigating the reliability of VTC assessment commonly focuses on tests that have existing adaptations for remote administration (König et al., 2021) or that are based on verbal instruction and response, such as verbal memory and fluency tests. Such tests are relatively easy to implement in remote assessment, appear to have largely sufficient reliability metrics (Brearly et al., 2017; Cullum et al., 2014; Fox-Fuller, Ngo, et al., 2022), and produce scores that are not significantly different from face-to-face assessment (Marra et al., 2020), although test-retest reliability of remote assessment may deviate slightly from that of in-person administration (Fox-Fuller, Ngo, et al., 2022; Marra et al., 2020). VTC administration of other tests that are often part of a neuropsychological workup, such as timed tests, tests relying on visual or motor modalities, or tests that tap into executive functions, may come with challenges concerning administration, accuracy and performance monitoring (Bloch et al., 2021; Fox-Fuller, Ngo, et al., 2022).
Findings regarding the reliability of such tests and the deviation in scores compared to regular administration vary, and are sometimes based on adapted versions of the original tests or too limited in scope for clinical implementation (Brearly et al., 2017; Hunter et al., 2021; Lunardini et al., 2020; Marra et al., 2020; Owens et al., 2020; Wadsworth et al., 2018).
Alongside sufficient reliability, an important part of evaluating the value of VTC assessment involves the user experiences of both administrators and patients (Appleman et al., 2021; Parsons et al., 2022). A survey among US-based neuropsychological test administrators (Fox-Fuller, Rizer, et al., 2022) identified connection issues, limited technological access and external distractions as some of the major challenges. Such challenges are likely to exist across countries and settings (Hewitt et al., 2022; Sumpter et al., 2023). Still, in memory clinics we may also encounter challenges inherent to the patient population, including high average age, varying technological literacy, and substantial cognitive and functional decline. All of these factors potentially affect patients’ ability to navigate VTC systems as well as the cognitive profile derived from remote assessment (Brearly et al., 2017; Parks et al., 2021). More research is needed to understand how previously identified challenges with VTC relate to the memory clinic context, and to gauge the specific challenges and opportunities at play.
The aims of the current study were to 1) investigate the stability of performance on a VTC assessment composed of commonly used, non-adapted neuropsychological tests relative to in-person assessment, using a test-retest design, and 2) evaluate the user experiences of neuropsychologists and of cognitively impaired and unimpaired patients at a Dutch memory clinic.
Methods
Participants and study design
The current study is a mixed-methods, observational, prospective study with a test-retest design. Participants considered for inclusion were older adults who visited Alzheimer Center Amsterdam between August 2020 and February 2021 for a multidisciplinary diagnostic evaluation. The standard diagnostic workup includes a neurological evaluation, neuroimaging, lumbar puncture and a face-to-face in-clinic neuropsychological assessment. Details regarding the complete diagnostic procedure and criteria are described in van der Flier and Scheltens (2018). The neuropsychological test protocol is described in Table 1.
* This test is also used for clinical classification of performance in the corresponding domain (see Statistical analyses: Clinical relevance of differences in performance across modalities).
† Baseline visit only.
Inclusion criteria were a completed face-to-face in-clinic neuropsychological assessment and access to a computer for VTC at home. Exclusion criteria were insufficient knowledge of the Dutch language and insufficient clinical condition as judged by the neuropsychologist based on information gathered at the diagnostic evaluation (e.g., advanced dementia or behavioral problems). Diagnosis was known at the time of inclusion but was not by itself a deciding factor for inclusion or exclusion in the current study.
VTC assessments took place within 4 months after the face-to-face assessment. Data were collected between August 2020 and April 2021. The study was conducted in accordance with the Declaration of Helsinki (World Medical Association, 2013) and was approved by the Institutional Review Board of VU University Medical Center (VUmc). All participants signed informed consent.
Measures of the neuropsychological assessment
The same standardized neuropsychological assessment protocol was used at both the in-clinic assessment and the VTC assessment, except for the Mini Mental State Examination (MMSE; in-clinic only). The assessment comprised a standard clinical workup used for diagnostic purposes at Alzheimer Center Amsterdam (see also van der Flier & Scheltens, 2018). The protocol consisted of 19 tests covering six cognitive domains: global cognition, memory, attention, executive function, language, and visuospatial function. See Table 1 for an overview of the tests and of the measures derived from them that were used in the analyses.
We used both raw scores and standardized T-scores, depending on the analysis performed (see Statistical analyses). Once standardized, the T-scores were classified according to published guidelines for Dutch clinical practice adapted from the American Academy of Clinical Neuropsychology (Hendriks & Kessels, 2020) into the following performance classes: very low/impaired (T score < 30), below average (30 ≤ T score ≤ 36), low average (37 ≤ T score ≤ 42), average (43 ≤ T score ≤ 56), high average (57 ≤ T score ≤ 63), above average (64 ≤ T score ≤ 71), and high (T score ≥ 72). Because Dutch norms for the Letter Digit Substitution Test (LDST) (van der Elst et al., 2006) are based on the 60-second version of the test, whereas our protocol uses the 90-second version, we rescaled the LDST raw scores (0.67 × score at 90 s) before standardizing. Performance classes for the Visual Association Test (VAT) and Number Location tests were limited to impaired vs unimpaired because of the skewed distribution of scores (ceiling effect).
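The LDST rescaling and the seven-class T-score classification described above can be expressed compactly. The Python sketch below is illustrative only and is not the software used in this study (analyses were run in SPSS); the function names are hypothetical, and it assumes that the 0.67 factor is applied to the 90-second raw score before norming and that a non-integer T-score falls into the class whose published lower bound it reaches (e.g., 36.5 counts as below average).

```python
LDST_SCALING = 0.67  # 60-s norms applied to a 90-s administration (factor from the text)


def ldst_to_60s(raw_90s):
    """Rescale a 90-second LDST raw score to the 60-second metric of the norms."""
    return LDST_SCALING * raw_90s


def performance_class(t):
    """Map a T-score onto the seven performance classes listed in the text.

    Assumption: each class starts at its published lower bound, so
    non-integer scores between two integer ranges go to the lower class.
    """
    if t < 30:
        return "very low/impaired"
    if t < 37:
        return "below average"
    if t < 43:
        return "low average"
    if t < 57:
        return "average"
    if t < 64:
        return "high average"
    if t < 72:
        return "above average"
    return "high"


if __name__ == "__main__":
    print(round(ldst_to_60s(45), 2))  # 30.15
    for t in (25, 33, 40, 50, 60, 68, 75):  # hypothetical T-scores
        print(t, performance_class(t))
```

Using explicit lower bounds keeps the mapping total, so every possible T-score receives exactly one class.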
VTC assessment procedure
VTC assessment was clinic-to-home, i.e., the neuropsychologist was at the clinic and the patient at home. For the VTC assessment, instructions and, where required, response forms (TMT, Rey Complex Figure test, LDST, MoCA) were sent to patients’ homes with explicit instructions not to open them until the start of the VTC visit. Patients were instructed to use a laptop or desktop computer for VTC and to limit the risk of distraction by undergoing the assessment in a quiet room in their home and asking cohabitants not to disturb them. For patients who were unsure of how to set up the system, a partner or family member was allowed in the room to help. Audio and video settings were tested with the neuropsychologist in a brief preparatory meeting and again at the start of the neuropsychological assessment.
The VTC assessment took place via secure conferencing using Microsoft Teams®, which is the primary conferencing method at the Amsterdam UMC. For some paper-and-pencil tests that required performance monitoring, including the TMT and Rey Complex Figure test, the patient was asked to adjust their camera to allow the neuropsychologist to view their actions during the task. Technological or logistic issues and consequences for the assessment (e.g., missing test data) were noted by the neuropsychologist.
Missing data
We inspected whether one of the modalities produced more missing test scores than the other, and for what reason (e.g., whether there were systematically more missing test scores due to lack of test comprehension in one modality). Neuropsychologists recorded the number of and reasons for missing test measures at baseline and VTC. Furthermore, they recorded any issues that arose during the VTC assessment that could influence the test outcome, e.g., a lagging connection. Reasons for missing data were dichotomized into 0) patient was unable to perform the test (due to lack of understanding of the instructions) or 1) test not administered (e.g., due to time restrictions of the assessment). We evaluated with chi-square tests whether the number of and main reason for missing test scores differed between the assessment modalities.
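A chi-square comparison of this kind reduces, for a modality × main-reason table, to a Pearson test on a 2 × 2 contingency table. The standard-library Python sketch below is not the SPSS procedure used in the study: the function name is hypothetical, no continuity correction is applied (some packages apply Yates' correction to 2 × 2 tables by default, which changes the result), every row and column total is assumed to be non-zero, and the counts in the usage example are hypothetical.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2 x 2 table.

    Returns (statistic, p_value). With 1 degree of freedom, the upper tail
    of the chi-square distribution equals erfc(sqrt(statistic / 2)).
    Assumes every row and column total is non-zero.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical counts of missing scores:
    # rows = modality (face-to-face, VTC),
    # columns = main reason (unable to perform, not administered).
    counts = [[5, 14], [6, 8]]
    stat, p = chi_square_2x2(counts)
    print(f"chi2(1) = {stat:.2f}, p = {p:.3f}")
```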
User friendliness and system usability
After the VTC assessment, both the participant and the test administrator completed a Dutch version of the User Satisfaction and Ease of use (USE) questionnaire (Lund, 2001) and the System Usability Scale (SUS) (Brook, 1995). The SUS consists of 10 statements aimed at evaluating general usability, rated on a 5-point Likert scale ranging from strongly disagree (score of 1) to strongly agree (score of 5). The USE questionnaire consists of 30 statements rated on a 7-point Likert scale and covers the domains of usefulness, ease of use, ease of learning and satisfaction. A focus group was held with the neuropsychologists (LW, RJ, PZ) after the final assessment to evaluate their experiences with the VTC assessment, including main issues, opportunities and their experience with testing patients with different diagnoses.
Statistical analyses
Statistical analyses were performed with SPSS, version 27. A p-value < .05 was considered statistically significant unless otherwise specified.
Descriptive statistics (mean, median or proportions, depending on the variable) were used to describe sociodemographic and clinical characteristics. Comparisons between diagnostic groups (Subjective Cognitive Decline [SCD] versus Mild Cognitive Impairment [MCI]/dementia) were made using independent-samples t-tests and chi-square tests, as appropriate.
Stability of performances across modalities
Raw scores on the test measures were used to investigate performance stability.
Intraclass correlation coefficients (ICCs; absolute agreement, two-way mixed model) were computed to measure the agreement of scores between the baseline and VTC assessments for each test with continuous scores. We adopted published cutoffs to evaluate ICC values as “poor” (<0.50), “moderate” (0.50–0.74), “good” (0.75–0.90) or “excellent” (>0.90) (Koo & Li, 2016). Tests with restricted score ranges (i.e., VAT Naming and the Visual Object and Space Perception [VOSP] Fragmented Letters and Number Location subtests) were considered non-continuous; for these, we calculated the percentage of absolute agreement between measurements. ICC values were calculated for the total sample and plotted by clinical diagnosis (SCD and MCI/dementia).
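To illustrate how an absolute-agreement ICC is obtained from a patients × modalities score matrix, the Python sketch below computes the single-measures, absolute-agreement ICC from two-way ANOVA mean squares, labels it with the cutoffs cited above, and computes percentage absolute agreement for non-continuous measures. It is a minimal sketch, not the SPSS routine used in the study; the function names are hypothetical, and it assumes single-measures ICCs were reported (the text does not say whether single or average measures were used). The absolute-agreement point estimate is computed the same way under two-way random and two-way mixed models; the models differ in interpretation.

```python
import statistics


def icc_absolute_agreement(scores):
    """Single-measures, absolute-agreement ICC from a two-way ANOVA.

    `scores` holds one row per patient; each row contains the scores for
    the k modalities (here k = 2: face-to-face, VTC). Requires n >= 2.
    """
    n, k = len(scores), len(scores[0])
    grand = statistics.mean(x for row in scores for x in row)
    row_means = [statistics.mean(row) for row in scores]
    col_means = [statistics.mean(row[j] for row in scores) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between modalities
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols  # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def koo_li_label(icc):
    """Label an ICC with the cutoffs cited in the text."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


def percent_agreement(baseline, vtc):
    """Percentage of patients with identical scores at both assessments."""
    same = sum(1 for b, v in zip(baseline, vtc) if b == v)
    return 100 * same / len(baseline)


if __name__ == "__main__":
    # Hypothetical raw scores: (face-to-face, VTC) for six patients.
    scores = [(12, 13), (8, 9), (15, 14), (10, 12), (6, 6), (11, 12)]
    icc = icc_absolute_agreement(scores)
    print(f"ICC(A,1) = {icc:.2f} ({koo_li_label(icc)})")
    print(percent_agreement([12, 11, 12, 10], [12, 12, 12, 10]))  # 75.0
```

Because the between-modality variance enters the denominator, a constant shift between face-to-face and VTC scores lowers this ICC even when patients keep their rank order; a consistency-type ICC would not be affected.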
To investigate the stability of rank ordering between assessments, and to allow comparison with published literature, we ran correlational analyses between the in-person baseline and VTC assessment scores (Pearson product-moment or Spearman’s rho for continuous scores). For non-continuous measures, chi-square or Fisher’s exact tests were performed.
We investigated differences in mean test scores from the in-person baseline to the VTC assessment with paired-samples t-tests or Wilcoxon signed-rank tests, depending on the data distribution. Effect sizes of the differences were calculated as Cohen’s d (mean difference / SD of the differences) or r (Wilcoxon Z / √N).
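Both effect-size formulas are simple ratios. The Python sketch below is illustrative and uses hypothetical function names; it assumes differences are taken as VTC minus baseline (so positive values mean higher scores at VTC) and that N in r = Z / √N is the number of patient pairs. Some authors use the total number of observations instead, which gives a smaller r.

```python
import math
import statistics


def cohens_d_paired(baseline, vtc):
    """Cohen's d for paired scores: mean difference / SD of the differences.

    Differences are VTC minus baseline (an assumed sign convention).
    """
    diffs = [v - b for b, v in zip(baseline, vtc)]
    return statistics.mean(diffs) / statistics.stdev(diffs)


def wilcoxon_r(z, n_pairs):
    """Effect size r = Z / sqrt(N), with N assumed to be the number of pairs."""
    return z / math.sqrt(n_pairs)


if __name__ == "__main__":
    # Hypothetical raw scores for five patients.
    baseline = [30, 33, 28, 35, 31]
    vtc = [32, 34, 30, 35, 33]
    print(round(cohens_d_paired(baseline, vtc), 2))
    print(wilcoxon_r(-2.0, 16))  # -0.5
```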
For tests with continuous scores, we used Bland–Altman plots to visualize, for each individual, the difference in scores between the two measurements against the average of the two measurements. These plots indicate whether there are systematic differences between scores on the face-to-face and remote assessments (Altman & Bland, 1983). T-scores of the measures (see Measures of the neuropsychological assessment) were used so that all tests were on a common scale. 95% limits of agreement (mean difference ± 1.96 SD of the differences) were plotted.
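The quantities behind a Bland–Altman plot can be computed before any plotting. The standard-library Python sketch below returns the per-patient means (x-axis) and differences (y-axis), the mean difference (bias) and the 95% limits of agreement, defined as bias ± 1.96 × SD of the differences. The function name is hypothetical, differences are assumed to be VTC minus face-to-face (the sign convention behind the reported mean differences is not stated), and the plotting step is omitted to keep the example dependency-free.

```python
import statistics


def bland_altman(face_to_face, vtc, z=1.96):
    """Bland-Altman summary for paired T-scores.

    Returns per-patient means (x-axis), differences (y-axis), the mean
    difference (bias) and the limits of agreement, bias +/- z * SD.
    Differences are VTC minus face-to-face (an assumed sign convention).
    """
    means = [(f + v) / 2 for f, v in zip(face_to_face, vtc)]
    diffs = [v - f for f, v in zip(face_to_face, vtc)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return means, diffs, bias, (bias - z * sd, bias + z * sd)


if __name__ == "__main__":
    # Hypothetical T-scores for four patients.
    f2f = [45, 52, 38, 60]
    vtc = [47, 50, 40, 63]
    means, diffs, bias, (lower, upper) = bland_altman(f2f, vtc)
    print(f"bias = {bias:.2f}, 95% LoA = [{lower:.2f}, {upper:.2f}]")
```

Plotting `diffs` against `means` shows whether the spread of differences changes with performance level, which is the pattern described for individual tests in the Results.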
Clinical relevance of differences in performance across modalities
Standardized scores were used to evaluate the clinical relevance of differences between modalities. For tests with continuous scores, a clinically relevant difference was defined as a shift of at least two of the previously mentioned performance classes between measurements, e.g., below average at the baseline assessment vs. average at the VTC assessment. For tests with non-continuous scores, any shift between performance classes (impaired/unimpaired) was considered clinically relevant. We calculated the proportion of patients showing a clinically relevant difference in performance (worse or better) for each test.
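For tests with continuous scores, the decision rule above amounts to comparing class indices. The Python sketch below is a minimal illustration with hypothetical names: it maps each T-score to an ordinal class index (0 = very low/impaired, 6 = high) using the lower bounds of the class thresholds listed under Measures of the neuropsychological assessment, flags a shift of two or more classes as clinically relevant, and tabulates proportions per test. It does not cover the binary impaired/unimpaired tests, for which any change of class counts.

```python
from bisect import bisect_right
from collections import Counter

# Lower T-score bounds of the classes above "very low/impaired":
# below average, low average, average, high average, above average, high.
CLASS_LOWER_BOUNDS = [30, 37, 43, 57, 64, 72]


def class_index(t):
    """Ordinal class: 0 = very low/impaired ... 6 = high."""
    return bisect_right(CLASS_LOWER_BOUNDS, t)


def classify_shift(t_baseline, t_vtc, min_shift=2):
    """Label a patient's change as stable or as a clinically relevant shift."""
    delta = class_index(t_vtc) - class_index(t_baseline)
    if delta >= min_shift:
        return "higher at VTC"
    if delta <= -min_shift:
        return "higher at baseline"
    return "stable"


def shift_proportions(pairs):
    """Proportion of patients in each category for one test."""
    counts = Counter(classify_shift(b, v) for b, v in pairs)
    labels = ("stable", "higher at baseline", "higher at VTC")
    return {label: counts[label] / len(pairs) for label in labels}


if __name__ == "__main__":
    # Hypothetical (baseline, VTC) T-scores for one test.
    pairs = [(33, 45), (50, 48), (41, 44), (58, 40)]
    print(shift_proportions(pairs))
```

In the example, the first pair moves from below average to average (two classes up, as in the text's example), the last pair moves from high average to low average, and the middle two stay within one class.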
Usability
Total and domain scores on the USE questionnaire were computed for patients and neuropsychologists and compared using Mann–Whitney U tests because of non-normal distributions. We also compared domain scores between the groups without vs with cognitive impairment (i.e., SCD vs MCI/dementia diagnosis), as reported by patients and neuropsychologists separately. Responses to each SUS item were treated as categorical and compared between the groups without vs with cognitive impairment using chi-square or Fisher’s exact tests, for patients and neuropsychologists separately. Focus group responses were summarized (author: SS) and are described alongside the quantitative results.
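As an illustration of the Mann–Whitney U comparison for questionnaire scores with many ties, the standard-library Python sketch below ranks the pooled scores (tied values share their average rank), computes U for the first group, and derives a two-sided p-value from the normal approximation with a tie-corrected variance. It is not the SPSS implementation: the function names are hypothetical, no continuity correction is applied, exact p-values for small samples are not computed, the data are assumed not to be all identical (which would make the variance zero), and the example scores are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for x, two-sided p) via the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u_x = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Each group of t tied values reduces the variance by (t^3 - t).
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))
    return u_x, p_two_sided


if __name__ == "__main__":
    # Hypothetical USE total scores.
    patients = [150, 162, 171, 140, 158, 166]
    neuropsychologists = [138, 145, 160, 149, 152, 141]
    u, p = mann_whitney_u(patients, neuropsychologists)
    print(f"U = {u:.1f}, p = {p:.3f}")
```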
Results
Sample
A total of 133 individuals who underwent a face-to-face assessment as part of clinical care were considered for inclusion. Thirty-two patients met the inclusion criteria and agreed to participate. Of the 101 patients who were excluded, 12 did not have the appropriate setup for VTC at home and 2 reported that they did not think they had sufficient technological skills for VTC. Other main reasons for exclusion were poor clinical status or psychological symptoms (e.g., stress after diagnosis or depressive symptoms), visual or language impairments, unwillingness to participate, or logistical reasons (e.g., not being able to schedule a VTC assessment within the appropriate timeframe).
One patient withdrew after giving informed consent but before the VTC assessment. The final sample thus consisted of 31 patients (mean age 62.2 ± 6.7 years, 45.2% female, 84.2% higher education; see Table 2 for an overview of sample characteristics). The group without cognitive impairment was younger (59.7 ± 5.2 years) than the group with cognitive impairment (65.9 ± 7.0 years, p < .01). As expected, MMSE score was higher in the former (28.7 ± 1.3) than in the latter group (25.5 ± 3.2, p < .01). There were no differences between groups in other baseline characteristics. One assessment was conducted on an iPad; all other assessments were conducted via laptop or personal computer.
Note: MMSE = Mini Mental State Examination.
**Significant difference between diagnostic groups, p < .01.
Verhage coding can be translated as follows: lower education = elementary school up to lower vocational training (generally 6 to 13 years of total education); middle education = intermediate vocational training (generally 10–14 years of total education); higher education = preparatory scientific education, a bachelor’s degree or higher (generally 12 to 16+ years of total education).
† With or without vascular pathology.
Analysis of missing data for face-to-face and VTC administration
Supplementary table 1 provides an overview of missing data per test (in total and stratified according to primary reason for absence).
Face-to-face
Twelve patients had at least one missing test value. Four of these patients (diagnoses MCI N = 1; AD N = 2; vascular dementia N = 1) had not completed at least one task because of insufficient understanding or execution of the test instructions. In the eight other cases, missing data were not related to test understanding or execution but were due to time restrictions.
VTC
Nine patients had at least one missing test value. Four of these patients (diagnoses AD N = 3; vascular dementia N = 1) had not completed at least one task because of insufficient understanding or execution of the test instructions. In the other five cases, missing values were not related to test understanding or execution but were due to time restrictions.
Chi-square tests indicated no difference in the total number of missing scores between the two assessments (p’s > .05).
Neuropsychological assessment-stability of performances
Test scores at the face-to-face and remote assessments and the stability indicators (change in mean scores with effect sizes, ICCs, correlations) in the total sample are displayed in Tables 3 and 4. ICC values for the total sample and stratified by diagnosis (SCD vs MCI/dementia) are also depicted in Figure 1.
Note: BL = baseline assessment in-person, VTC = video-teleconference assessment, VAT = Visual Association Test, RAVLT = Rey Auditory Verbal Learning Test, Rey CFT = Rey Complex Figure Test, TMT = Trail Making Test, VOSP = Visual Object & Space Perception test. *p < .05, **p < .01.
ǂ Difference between baseline and remote assessment (paired).
† Cohen’s d (Mdiff/SDdiff) or r (Wilcoxon Z/√(test sample size)).
Note: BL = baseline assessment in-person, VTC = video-teleconference assessment, VAT = Visual Association Test, RAVLT = Rey Auditory Verbal Learning Test, Rey CFT = Rey Complex Figure Test, TMT = Trail Making Test, VOSP = Visual Object & Space Perception test.
˥ Pearson product-moment or Spearman rho.
† Reliability coefficients (ICC or r) across multiple face-to-face assessments reported in the literature (Bruijnen et al., 2020; Jutten et al., 2018; Lezak et al., 2012; Schmand et al., 2008).
All tests showed moderate to excellent ICC values (range 0.63–0.93) in the total sample. Absolute agreement for non-continuous test measures ranged from 19% (VOSP Number Location) to 87% (VAT Naming). Pearson and Spearman correlations between face-to-face and VTC administration were significant (all p’s < .05) for all test measures. Chi-square tests indicated similar score distributions across modalities for VAT Naming and VOSP Fragmented Letters (p’s > .05), but not for VOSP Number Location (p < .01).
Paired-samples t-tests/Wilcoxon signed-rank tests showed higher mean scores at the VTC assessment for Rey Auditory Verbal Learning Test (RAVLT) recognition false negatives (i.e., more false negatives on average) and the Rey Complex Figure Test (Rey CFT) copy condition (i.e., more correctly drawn parts of the figure), with small to medium effect sizes. After stratifying for diagnosis, the significant difference in Rey CFT copy mean scores in the total sample was explained by higher scores among the cognitively unimpaired patients (SCD group face-to-face: 34.35 ± 1.50; VTC: 35.41 ± 1.12; p = .02, vs. MCI/dementia group face-to-face: 30.8 ± 37.04; VTC: 33.42 ± 4.72; p > .05). The difference in RAVLT recognition false negative scores in the total sample was significant in neither the SCD group alone (face-to-face: 0.59 ± 0.71; VTC: 1.24 ± 1.2.02; p = .13) nor the MCI/dementia group alone (face-to-face: 2.33 ± 2.87; VTC: 3.42 ± 2.54; p = .07).
Bland–Altman plots
Mean differences between the assessment modalities were close to 0 T-score points for all tests except the Rey CFT delayed condition (mean difference of −10 T-score points, i.e., 1 standard deviation). For most tests, differences in scores between the assessments appeared fairly stable as a function of patients’ mean performance across measurements (i.e., the differences between face-to-face and VTC scores did not become larger as patients’ mean performance across assessments was lower or higher). This stability was especially evident for the RAVLT conditions, Stroop test 2 and Digit Span forward (verbal tests of memory, processing speed and attention), but also for the LDST (a written test of visuomotor speed). For the Rey CFT delayed condition and TMT A, there appeared to be less variability in difference scores when mean performance across assessments was worse. For Letter Fluency, the opposite seemed to be the case: variability decreased somewhat as mean performance improved.
Bland–Altman plots for the LDST, Rey CFT delayed and Letter fluency are shown in Figure 2, and those for the other tests are shown in Supplementary figure 1.
Clinical interpretation of stability and differences in test scores
Supplementary figure 2 displays the proportions of stable classifications and clinically relevant shifts in performance classes from baseline to VTC follow-up per test, according to the predefined cutoffs.
For all tests, most patients showed no clinically relevant difference. Number Location (100%), LDST (96%), and Letter Fluency and Stroop 2 (both 93%) showed the highest proportions of stable classifications from baseline to VTC assessment. TMT A (17%) and TMT B (15%) (and their derived measures) were the tests with the highest proportions of patients showing a higher classification at baseline, while Rey CFT copy (34%), Digit Span Backward (27%) and Letter Fluency (21%) showed the highest proportions of patients with a higher classification at the VTC assessment.
VTC assessment – usability
Figure 3 shows total scores on the USE after the VTC assessment, stratified by reporter (patient vs neuropsychologist; a) and by clinical diagnosis (b, c). Mann–Whitney U tests indicated no significant differences in USE total scores between SCD and MCI/dementia diagnoses, whether reported by patients or by neuropsychologists (p’s > .05). Supplementary figure 2 shows scores on the Ease of use and learning, Usefulness and Satisfaction domains. Neither the neuropsychologist- nor the patient-reported mean scores on these domains differed between SCD and MCI/dementia diagnoses.
Figure 4 shows the scores on each item of the SUS as judged by patients and neuropsychologists. As can be seen in the figure, the vast majority of patients agreed (i.e., responded with “somewhat agree” or “strongly agree”) with positively formulated items, namely those stating that they would use the system frequently (84% agreed, item 1), that the VTC setup was easy to use (93% agreed, item 3) and that they felt confident using the VTC setup (81% agreed, item 9). Twenty-three percent of patients agreed that they would need help from someone with technical knowledge to use the VTC system (item 4). For the patient-reported SUS scores, Fisher’s exact tests indicated significant differences between SCD and MCI/dementia patients in responses to item 4 (SCD patients had a higher proportion of “strongly disagree” responses than MCI/dementia patients) and item 7 (SCD patients had a higher proportion of “somewhat agree” responses than MCI/dementia patients).
Similar to patients, most neuropsychologist responses agreed (i.e., “somewhat agree” or “strongly agree”) with items stating that they would want to use the system frequently (87%, item 1), that the VTC setup was easy to use (83%, item 3), and that they felt confident using the VTC setup (77%, item 9). Fisher’s exact tests revealed that neuropsychologists’ SUS responses differed significantly between administrations with SCD and MCI/dementia patients, specifically for item 8 (a higher proportion of “strongly disagree” for SCD patients compared to MCI/dementia, and a higher proportion of “somewhat disagree” for MCI/dementia patients compared to SCD).
In the focus group session, neuropsychologists reported technical issues as the biggest barrier, with connection delays and audio problems being most prevalent. The home environment was perceived as more distracting than the consultation room, and it was harder to monitor and correct mistakes when needed (e.g., for the TMT). Clinical observations could be made sufficiently, and the neuropsychologists who touched on this topic mentioned that they did not experience differences in clinical observations between the VTC and face-to-face assessments. Some tests, including the VAT, were described as easier to administer digitally. In terms of patient suitability, neuropsychologists reported that patients without cognitive impairment (i.e., SCD diagnosis) functioned more independently during the VTC assessment and needed less help with the technical setup than patients with (extensive) impairment. An additional preparation session with the patient before the VTC assessment helped to prevent issues or resolve them in time, e.g., with the MS Teams setup or the internet connection. However, it was also mentioned that this added to the workload.
Discussion
In this study, we aimed to add to the growing literature on the reliability and usability of remote neuropsychological assessment by administering a standard neuropsychological assessment protocol, used for clinical diagnostic purposes in a Dutch memory clinic, in both face-to-face (baseline) and clinic-to-home VTC (follow-up) modalities in a repeated-measures design. All test measures used in the workup showed moderate to excellent reliability in the total patient sample. For all tests, the clinical interpretation of performance was stable for most patients. Usability scores were positive, and most usefulness and usability indicators were rated similarly for patients with and without cognitive impairment (MCI/dementia vs SCD diagnosis).
Previous studies have shown similar results for face-to-face and VTC neuropsychological testing regarding performance (Gnassounou et al., 2021; Vestal et al., 2006; Wadsworth et al., 2018), correlations (Galusha-Glasscock et al., 2016), and validity (Martin-Khan et al., 2012). Our results support and extend this research by using the comprehensive test protocol administered for clinical purposes in our memory clinic rather than a selection of tests. The reliability findings (ICC or Pearson’s r/Spearman’s rho) for the 24 test measures in the total sample were largely similar to those reported in the literature for repeated face-to-face assessment in adults (Bruijnen et al., 2020; Lezak et al., 2012; Schmand et al., 2008). For RAVLT direct and delayed recall, the standard MoCA and the LDST, ICC values were somewhat higher. That performance was overall more stable in the MCI/dementia group than in the SCD group, and that mostly SCD patients showed better scores at the VTC follow-up, is consistent with the notion that practice effects associated with repeated test administration tend to attenuate as cognition worsens (Jutten et al., 2020). We found no strong indication of a systematic difference in scores across modalities, except for the Rey Complex Figure test. Still, the correlation between the scores of the two assessments was similar to that of repeated face-to-face assessment (Lezak et al., 2012).
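The reliability coefficient discussed above can be illustrated with a short sketch. The text does not state which ICC form was used, so the code below assumes the two-way random-effects, absolute-agreement, single-measure form (ICC(2,1) in Shrout and Fleiss’s notation), treating the face-to-face baseline and the VTC follow-up as two measurement occasions. Unlike a Pearson correlation or a consistency ICC, this form is lowered by a systematic shift between sessions, which matters when a modality difference such as the one observed for the Rey Complex Figure is suspected. The function name `icc_2_1` and the example scores are hypothetical.

```python
"""Minimal ICC(2,1) sketch: two-way random effects, absolute agreement, single measure."""

from statistics import fmean


def icc_2_1(scores: list[tuple[float, ...]]) -> float:
    """Return ICC(2,1) for n subjects, each measured in k sessions.

    scores[i][j] is subject i's score in session j (e.g., j=0 face-to-face
    baseline, j=1 VTC follow-up). Assumes complete data, n >= 2 and k >= 2.
    """
    n = len(scores)
    k = len(scores[0])
    grand = fmean(x for row in scores for x in row)
    row_means = [fmean(row) for row in scores]
    col_means = [fmean(row[j] for row in scores) for j in range(k)]

    # Sums of squares for subjects (rows), sessions (columns) and in total.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # A systematic session effect (ms_cols) enlarges the denominator,
    # so a constant shift between modalities lowers absolute agreement.
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    # Hypothetical (baseline, VTC follow-up) scores for five patients.
    baseline_vs_vtc = [(10, 11), (14, 15), (8, 8), (12, 14), (9, 9)]
    print(round(icc_2_1(baseline_vs_vtc), 3))
```

With identical scores at both sessions the function returns 1.0, whereas adding the same constant to every follow-up score (e.g., the pairs (1, 2), (2, 3), (3, 4)) leaves the Pearson correlation at 1 but lowers this ICC to 2/3.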
Compared to previous findings in a design involving repeated VTC assessment of English-speaking, cognitively healthy adults (Fox-Fuller, Ngo, et al., 2022), reliability for verbal tests tapping into semantic fluency, attention and working memory (Fluency-Animals, Digit Span Forward and Backward) seems somewhat higher in our study of older adults in the memory clinic. Reliability may benefit from patients first familiarizing themselves with the neuropsychological tests in a face-to-face setting. However, the higher reliability could also be driven by other factors, such as the VTC setup session with the neuropsychologist before the actual assessment and, likely, the high performance stability specifically present in our impaired patients.
It should be noted that the reliability of some reaction time-based measures, such as Stroop test 1, was relatively low compared to other measures. This is not surprising, as these are granular scores that are particularly susceptible to intraindividual fluctuations from one assessment to the next, especially in older adults (Bielak et al., 2014). We hypothesize that these scores could be affected by subtle connection delays during remote assessment, as reaction times at the VTC follow-up were somewhat longer than at the face-to-face baseline. The neuropsychologists also reported that correcting mistakes on the TMT in a timely manner was more challenging remotely, which could contribute to differences in completion times and accuracy. The number of clinically relevant performance differences on these tests was, however, comparable to that on other tests.
A comprehensive evaluation of the applicability of a novel assessment modality requires evaluating its psychometric properties in conjunction with the experiences of those who undergo and administer it in daily practice (Parsons et al., 2022). Earlier research in cognitively unimpaired adults indicates patient and clinician satisfaction with remote testing in a Scottish neuropsychology service catering to a diverse (non-demented) patient population (Sumpter et al., 2023). While encouraging, this result cannot simply be generalized to the memory clinic setting, where patients present with widely varying degrees and types of functional and cognitive impairment that can influence their ability to learn and use technology systems. Furthermore, one study found that community-based individuals with MCI and Alzheimer’s disease dementia preferred face-to-face assessment more often than cognitively unimpaired individuals (Parikh et al., 2013).
We found positive system usability scores, and patients with and without cognitive impairment rated VTC assessment equally in terms of usefulness, ease of use and satisfaction. Although patients’ confidence in using the system corresponds with the literature (Narasimha et al., 2017), patients without cognitive impairment (SCD) were, unsurprisingly, more inclined than cognitively impaired patients (MCI/dementia diagnosis) to report confidence in their ability to learn the VTC system quickly. The more efficient learning of new skills in cognitively unaffected individuals should be taken into account, for example by allowing more time for system instructions and supervision when a patient has cognitive impairment. Interestingly, the neuropsychologists who performed the assessments reported similar user experiences for both diagnostic groups on the questionnaires, but in the focus group they did report more problems and less independence in patients with dementia. Overall, their experience suggests that suitability for VTC assessment declines as cognitive and functional impairments progress.
The core experiences and challenges of US-based test administrators identified in previous survey research (Fox-Fuller, Rizer, et al., 2022), namely having to adapt to a new mode of administration, occasional technological problems, environmental distractions in patients’ homes, and differences in technological fluency between individual patients, largely correspond with the reports of our neuropsychologists. In addition, our neuropsychologists reported that some tests were easier to administer through VTC and that their ability to make clinical observations during the assessment was largely unaffected. While additional preparation in the form of a practice session with patients was seen as potentially helpful to prevent issues and resolve them in a timely manner, it was also mentioned that this added to the workload. The benefits and added efficiency for patients have to be weighed against such costs, which can also place additional strain on care planning and administration.
Decisions about when, how and for whom remote assessment in the memory clinic is a suitable option require in-depth evaluation (Bloch et al., 2021). The current study addresses several issues that are key to such decisions, including reliability, clinical interpretability and the experiences of key user groups. Although caution is warranted in drawing definitive conclusions from this study alone due to the small sample size, our and previous investigations paint a fairly consistent picture: VTC administration of some of the most commonly used neuropsychological tests can be done reliably (Fox-Fuller, Ngo, et al., 2022; Marra et al., 2020) and to users’ satisfaction (Parsons et al., 2022), also in a memory clinic setting. VTC assessment may be particularly useful for follow-up assessments of patients who did not present with formal cognitive impairment at a first visit but should be monitored so that decline is identified in time, as these patients may more often prefer remote testing and be more independent, requiring less instruction or help. This group could include individuals with amyloid positivity, subjective complaints that correspond with SCDplus, or a family history of degenerative disorders. Patients with more advanced dementia, in particular, may not be suitable candidates (Hunter et al., 2021).
Based on the results, tests that are timed, rely on speed and/or are performed with pencil and paper can seemingly be administered through VTC with largely sufficient reliability and little deviation from repeated face-to-face administration. This suggests that memory clinic test administrators can assess a broader range of functions beyond verbal (and) memory tests, such as attention, speed and cognitive flexibility. Naturally, administering assessments remotely will require adjustments to regular care (Hewitt et al., 2022): patients need to receive assessment forms on time, a practice session may be needed that requires time and personnel, and privacy concerns should be taken seriously. Furthermore, we should remain aware that clinic-to-home assessment introduces factors that can influence the assessment but can be controlled in in-clinic assessment, such as external distractions and fluctuations in internet connection during the assessment.
It should be noted that the reliability of neuropsychological testing can be investigated from different perspectives. Our design allows us to conclude that VTC assessment administered after an initial in-person assessment has reliability similar to that of repeated in-person assessment, but it does not allow a direct comparison of the modalities in the absence of effects related to repeated testing, such as (varying) practice effects and test familiarity. In a heterogeneous sample like ours, consisting of patients with and without clinical diagnoses, these factors can vary. Still, this is inherent to all administration methods for neuropsychological assessment, not just remote testing. Studies with a counterbalanced design, in which patients receive either modality first, would allow a direct comparison between assessment modalities and could show whether reliability is indeed better when the first assessment is done face-to-face. It is also important to note that we included 31 patients with varied diagnoses during our inclusion period in the memory clinic. Similar published studies with different samples are somewhat larger (Fox-Fuller, Ngo, et al., 2022; Wadsworth et al., 2016). Our results need to be replicated in a larger sample, ideally one large enough to investigate more extensively how reliability and user experiences differ as a function of socioeconomic and clinical characteristics, to get a better picture of patient suitability. Importantly, our study was performed in a tertiary diagnostic center in a Western country and included a fairly high proportion of highly educated individuals with relatively good clinical status. Our sample therefore likely overrepresents access to and experience with technology among older individuals seeking help for (experienced) cognitive dysfunction, and may not reflect the entire memory clinic population. Finally, due to a lack of normative data for tests administered through VTC, we used norms derived from in-person assessment, and our cutoff for clinically relevant change was based on clinical consensus.
In conclusion, the reliability of remote clinic-to-home administration of the tests in a standard neuropsychological workup was largely similar to that of repeated face-to-face assessment for most tests, with most patients showing no clinically relevant difference between modalities. Systematic and clinically relevant differences from face-to-face assessment appear limited. While we should remain aware of differences in technological access and literacy among older adults, more memory clinic patients, especially those at risk of future cognitive decline, may be candidates for VTC assessment.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S1355617724000432.
Acknowledgements
Research of Alzheimer Center Amsterdam is part of the neurodegeneration research program of Amsterdam Neuroscience. Alzheimer Center Amsterdam is supported by Stichting Alzheimer Nederland and Stichting Steun Alzheimercentrum Amsterdam. WF is supported by the Pasman stichting. SS and WF are recipients of TAP-dementia (www.tap-dementia.nl), receiving funding from ZonMw (#10510032120003) in the context of Onderzoeksprogramma Dementie, part of the Dutch National Dementia Strategy. No funding was provided for this specific study.