BACKGROUND
Cognitive impairment is increasingly recognized as a core feature of multiple sclerosis (MS) (Langdon, 2011), with prevalence rates ranging from 43 to 70% (Chiaravalloti & DeLuca, 2008). A characteristic pattern of impaired cognitive domains has been identified in this population, with some interpatient variability (Amato, Zipoli, & Portaccio, 2008). Deficits in processing speed and episodic memory, and to a lesser extent, in executive and visuospatial functions (Benedict et al., 2006; Chiaravalloti & DeLuca, 2008; Deloire et al., 2011; Rao, Leo, Bernardin, & Unverzagt, 1991; Strober et al., 2009; Sumowski et al., 2018; Trenova et al., 2016) are the most common, while language abilities and comprehension are usually spared (Brassington & Marsh, 1998; Calabrese, 2006; Piras et al., 2003; Prosiegel & Michael, 1993; Rao, 1995; Rocca et al., 2015). Cognitive impairment is present in all types and stages of MS, independent of physical disability.
It negatively influences several aspects of the patient’s life (Amato, Zipoli, & Portaccio, 2006), including health-related quality of life (Mitchell, Benito-León, González, & Rivera-Navarro, 2005), activities of daily living (Goverover, Genova, Hillary, & DeLuca, 2007), social life (Bobholz & Rao, 2003), and employment (Clemens & Langdon, 2018). Recent longitudinal studies showed that cognitive impairment in MS predicts disability progression and cortical thinning (Moccia et al., 2016; Pitteri, Romualdi, Magliozzi, Monaco, & Calabrese, 2017). Consequently, valid assessments of cognition in this population could contribute to more accurate care plans, thereby enhancing patients’ functioning, productivity, and quality of life (Cheng et al., 2010; Kalb et al., 2018).
Several batteries for assessing cognitive function in MS have been developed and validated (Sumowski et al., 2018). The most commonly used are the Brief Repeatable Battery of Neuropsychological Tests (BRB-N) (Rao, 1990), the Minimal Assessment of Cognitive Function in MS (MACFIMS) (Benedict et al., 2002), and the Brief International Cognitive Assessment for MS (BICAMS) (Langdon et al., 2012). The BICAMS has several advantages over the BRB-N and MACFIMS: non-neuropsychology specialists can administer it, it requires less time to administer, and international use was taken into consideration during its development. Moreover, the battery was shown to have ecological validity; it predicted the real-life functional performance of MS patients (Goverover, Chiaravalloti, & DeLuca, 2016). The BICAMS examines the cognitive functions commonly affected in MS using three standardized tests with stable psychometric properties (Langdon et al., 2012): the Symbol Digit Modalities Test (SDMT) (Smith, 1982), the California Verbal Learning Test-Second Edition (CVLT-II) (Delis, Kramer, Kaplan, & Ober, 2000), and the Brief Visuospatial Memory Test-Revised Edition (BVMT-R) (Benedict, 1997).
Although the BICAMS scores have shown evidence of validity across 11 languages (Corfield & Langdon, 2018), such evidence is currently lacking among Arabic-speaking populations. An Arabic BICAMS with evidence of reliability exists (Kishk et al., 2017); however, the psychometric validation of the battery has not been completed, and the battery might not be appropriate for other Arab countries, as noted by Paul, Brown, and Hughes (2019) in their recent systematic review. Results point to the need for additional translation, adaptation, and validation of standard MS cognitive measures for use in Arabic-speaking populations, although some limitations of BICAMS exist (El Ghoneimy, Hassan, Homos, Farghaly, & Dahshan, 2015; Hamdy et al., 2013; Kishk et al., 2017; Paul et al., 2019). In our study, we examined a new Arabic version of the battery following the BICAMS international standards for validation (aim 1) (Benedict et al., 2012). This version of the BICAMS includes a newly developed Verbal Memory Arabic Test (VMAT) that could be generalized to most Arab cultures (Zeinoun, Farran, Khoury, & Darwish, 2020). Additionally, given the importance of producing normative data relevant to local populations (Smerbeck et al., 2018), we provide normative values for the BICAMS in a Lebanese sample (aim 2).
Design and Methods
This cross-sectional observational study was approved by the American University of Beirut Institutional Review Board, and all participants signed an informed consent form. Human data were obtained in compliance with the Helsinki Declaration. The study was conducted between 2017 and 2019. The methods were primarily based on other studies that provided evidence of BICAMS validity (Filser et al., 2018; Goretti et al., 2014; Ozakbas et al., 2017; Polychroniadou et al., 2016; Sousa et al., 2018).
Sample
For the sample sizes of both healthy participants and MS patients, we followed the recommendations of Benedict et al. (2012). The authors stated that 150 or more healthy persons are needed for data applicable to persons of all ages and diverse ethnicity, that an additional 35 healthy participants could be recruited to round out the normalization sample, and that ideally 65 healthy individuals should be matched with MS patients. Nevertheless, in our study, the number of MS patients was limited.
Initially, we recruited participants using flyers posted at the hospital, in clinics, and on various social media platforms; we then relied on participants’ word of mouth and snowball techniques, which we found to be more efficient. Around 75% of the final sample was recruited through word of mouth and snowballing. We purposefully targeted potential participants from different areas of Lebanon and different age groups. There was no incentive for participating.
The final sample included 180 healthy participants recruited from the community. From this group, a subsample of 63 individuals was retested after 1–3 weeks. We subjected all participants to the same rigorous two-phase inclusion/exclusion screening process. During the first phase of enrollment, healthy participants older than 16 years without a history of neurological disorders, traumatic brain injury, or psychiatric disorders, including alcohol and/or drug dependence, were included in the study. Men who consumed more than 15 drinks/week and women who consumed more than eight drinks/week were considered excessive alcohol consumers and were excluded from the study (McGuire, 2011). Individuals who had taken antidepressants, mood stabilizers, or medications known to affect cognitive performance during the 3 months prior to screening were excluded as well.
Excerpts from the Behavioral Risk Factor Surveillance System (BRFSS) questionnaire were used to complete the screening (Centers for Disease Control and Prevention, 2009). The BRFSS is a screening questionnaire that enquires about the participant’s current health status, medical history, physical activity (weekly frequency, type, and duration of activity), smoking habits (current and previous), age, years of education, and alcohol consumption (weekly and monthly). The following data were gathered using the BRFSS: age, years of education, marital status, educational attainment, current employment, annual household income, area of residence, primary language, diagnosis with any illness (if yes, which illness), current use of medications, current participation in volunteering activities, frequency of physical activity (if any), and presence of any difficulties that limit one’s activities. Data were also gathered on factors associated with cognitive function, such as smoking, including hubble bubble (if yes, amount smoked per day), alcohol intake in the past 30 days (frequency and amount), leisure activities, and cognitive performance in the last 12 months (presence of any difficulties and their impact on daily activities). Information about these variables was obtained through participants’ self-report.
In the second phase of enrollment, which was administered post-consenting, we used the Arabic version of the Hopkins Symptoms Checklist-25 (HSCL-25) to screen for symptoms of depression during the past week. Participants were excluded if they scored 3.3 or more on the depression subscale (Fares, Dirani, & Darwish, 2019; Mahfoud et al., 2013; Winokur, Winokur, Rickels, & Cox, 1984).
The Montreal Cognitive Assessment (MoCA) was used post-consenting to screen for cognitive impairment (Nasreddine et al., 2005; Rahman & El Gaafary, 2009). A cutoff score of 26 was used for individuals below the age of 60 years and 24 for those 60 years or older (Carson, Leach, & Murphy, 2018). In other words, individuals below the age of 60 years were excluded if they scored below 26, while those 60 years or older were excluded if they scored below 24.
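The quantitative exclusion thresholds described above can be summarized in a short Python sketch. The `Candidate` record, its field names, and the `exclusion_reasons` function are hypothetical names introduced for illustration; only the thresholds (weekly alcohol intake, the HSCL-25 depression cutoff of 3.3, and the age-dependent MoCA cutoffs) come from the text. The history-based criteria (neurological, psychiatric, and medication history) are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical screening record; field names are illustrative only."""
    age: int
    sex: str                 # "male" or "female"
    drinks_per_week: float
    hscl_depression: float   # HSCL-25 depression subscale score
    moca: int                # MoCA total score


def moca_cutoff(age: int) -> int:
    # 26 for participants younger than 60 years, 24 for those 60 or older.
    return 26 if age < 60 else 24


def exclusion_reasons(c: Candidate) -> list:
    """Return the quantitative exclusion criteria that a healthy candidate meets."""
    reasons = []
    alcohol_limit = 15 if c.sex == "male" else 8
    if c.drinks_per_week > alcohol_limit:
        reasons.append("excessive alcohol consumption")
    if c.hscl_depression >= 3.3:
        reasons.append("HSCL-25 depression score of 3.3 or more")
    if c.moca < moca_cutoff(c.age):
        reasons.append("MoCA below age-specific cutoff")
    return reasons


if __name__ == "__main__":
    candidate = Candidate(age=62, sex="female", drinks_per_week=2,
                          hscl_depression=1.4, moca=25)
    print(exclusion_reasons(candidate))
```

An empty list means that none of the three quantitative criteria applies; it does not by itself establish eligibility.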
MS patients were recruited from the MS center at the American University of Beirut Medical Center. Patients had been diagnosed by a neurologist according to the McDonald 2010 criteria (Polman et al., 2011), and the diagnosis was checked in the patient’s medical record. Only patients with a disease duration greater than 1 year, established from the date of symptom onset, were enrolled in this study. Following the same eligibility criteria as for the healthy subjects, 43 MS patients were individually matched to 43 healthy participants based on age, sex, and years of education (Table 1, Part A). The matching bands were age ± 3 years, a 1:1 ratio on sex, and education ± 2 years. We did not exclude MS patients with cognitive deficits or depression, given their high prevalence in MS patients and their importance for examining the discriminative abilities of the tests.
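The matching bands can be illustrated with a minimal Python sketch. The text does not describe how pairs were selected, so the greedy nearest-candidate strategy below is an assumption, and the record layout (`id`, `sex`, `age`, `education`) and function names are hypothetical. A greedy pass can leave a patient unmatched even when a different assignment would pair everyone.

```python
def within_bands(patient, control, age_band=3, edu_band=2):
    """Same sex, age within +/- 3 years, education within +/- 2 years."""
    return (patient["sex"] == control["sex"]
            and abs(patient["age"] - control["age"]) <= age_band
            and abs(patient["education"] - control["education"]) <= edu_band)


def match_one_to_one(patients, controls):
    """Greedy 1:1 matching (assumed strategy): each control is used at most once."""
    used = set()
    pairs = {}
    for p in patients:
        candidates = [c for c in controls
                      if c["id"] not in used and within_bands(p, c)]
        if not candidates:
            continue  # patient stays unmatched
        # Prefer the closest control on age, then on education.
        best = min(candidates,
                   key=lambda c: (abs(p["age"] - c["age"]),
                                  abs(p["education"] - c["education"])))
        used.add(best["id"])
        pairs[p["id"]] = best["id"]
    return pairs


if __name__ == "__main__":
    patients = [{"id": "P1", "sex": "F", "age": 40, "education": 16},
                {"id": "P2", "sex": "M", "age": 30, "education": 12}]
    controls = [{"id": "C1", "sex": "F", "age": 42, "education": 15},
                {"id": "C2", "sex": "F", "age": 41, "education": 17},
                {"id": "C3", "sex": "M", "age": 35, "education": 12},
                {"id": "C4", "sex": "M", "age": 29, "education": 14}]
    print(match_one_to_one(patients, controls))
```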
* Learning slope of trials 1 to 5.
† The difference between the lowest value and the highest value.
‡ Value transformed based on Cohen’s 1998 and 2008 recommendations, to meet Mann–Whitney U test statistics requirements. Results of measures with significant differences between groups are reported.
PROCEDURE
Data Collection
Training for data collectors
We trained two data collectors who were either in their senior year or holders of an undergraduate degree in psychology. Training consisted of a 3-hr practical workshop that included a live demonstration of flawed and flawless administrations. Then, the data collectors were observed in three mock administrations, until no errors were detected. Authors PZ and NF supervised the training and ongoing data collection; NF also collected data and was previously trained by PZ. All tests were scored by NF.
All tests were administered in a standardized manner, in a quiet room, using the Lebanese Arabic dialect. Tests were performed in a fixed order: screening (including the BRFSS and HSCL-25), followed by the SDMT, the VMAT learning trials and short delay recalls, and then the BVMT-R. The remaining segments of the verbal memory test were performed last (25 min after the VMAT short delay recalls). The total duration of the BICAMS administration was around 55 min.
The oral version of the SDMT (Smith, 1982) was administered to all participants. Using a test form that contains a key of nine symbol-digit pairs and a sequence of symbols (stimuli), the participant was required to respond by voicing the digit associated with each symbol as quickly as possible. A sequence of 10 symbols was first used for practice. The participant was then given 90 s to complete as many of the items following the practice items as possible (Smith, 1982). The dependent variable was the number of correct responses in 90 s.
For the BVMT-R (Benedict, 1997), participants were asked to recall a matrix of six simple abstract designs after 10 s of visual exposure. The participants reproduced the designs using paper and pencil as accurately as possible in their correct positions. In total, the test was repeated three times for each participant. Each figure received a score of 0, 1, or 2, based on accuracy and location scoring criteria (Benedict, 1997). The dependent variable was the total score across the three trials.
We recently developed and validated the VMAT (Zeinoun et al., 2020), which substituted for the CVLT-II in our study. The VMAT was developed indigenously in Arabic using quantitative and qualitative methods. Following a rigorous process, words that are more or less familiar across all Arab regions were selected during the development of the VMAT to facilitate its use in other Arab countries. The instrument measures verbal learning, short-term memory, long-term memory, and recognition. Similar to other standardized verbal learning/memory tests, and in line with the recommendations of Benedict et al. (2012), the examinee is presented with 15 words (List A) to be recalled freely across 5 trials and is then presented with another 15 words (List B), which serve as an interference trial. Following the recall of List B, the participant is required to recall List A with and without semantic cues. Following a 25-min delay, the participant is required to recall List A with and without cues and then to recognize the words from List A in an array of 45 words that includes List A, List B, and additional distractors. Several scores can be derived from the VMAT, such as the number of words recalled per trial, the total number of words recalled on trials 1 to 5, and a recognition discriminability index (i.e., the ability to endorse the 15 target items and reject all 30 distractors) (Zeinoun et al., 2020). However, the regression model and normative data included only the total number of words recalled on trials 1 to 5 as the dependent variable.
We chose this variable for the VMAT based on the methods of other studies (Filser et al., 2018; Ozakbas et al., 2017; Polychroniadou et al., 2016; Sousa et al., 2018).
Medical charts were also reviewed to collect the MS patients’ disease type, disease duration, and Expanded Disability Status Scale score.
Normative Values
Regression-based norms were calculated following the procedure previously described for the MACFIMS (Parmenter, Testa, Schretlen, Weinstock-Guttman, & Benedict, 2010), which has more recently been applied to the BICAMS (Goretti et al., 2014). To ensure the normal distribution of the raw test scores of healthy participants, we first retrieved the cumulative frequency distribution of the SDMT, BVMT-R, and VMAT (trials 1 to 5) scores. Each resulting distribution was converted into a standard scaled score with a mean (M) of 10 and a standard deviation (SD) of 3 (actual scaled score). Next, regression equations for the predicted scaled scores were modeled: stepwise regression analyses were performed with age, age², sex (1 = male, 2 = female), and years of education as predictors. A squared term of age was used to adjust for the nonlinear relationship between age and cognition (Goretti et al., 2014; Parmenter et al., 2010). We used stepwise regression following Parmenter et al. (2010) for the MACFIMS and Goretti et al. (2014) for the BICAMS, and we also performed a forced entry analysis to compare the results. The derived equations of the statistically significant models include the unstandardized β-coefficients of the predictors and the constant. The assumptions of homoscedasticity and normality of residuals were evaluated and met.
These regression models were used to generate normative values. First, the equations using specific demographic information were derived. Next, the predicted scaled score was subtracted from the actual scaled score. The difference was then divided by the residual SD of the healthy group tests. Finally, the derived value could be converted to other standardized scores, such as Z scores, to classify performance (Goretti et al., 2014).
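The conversion from a cumulative frequency distribution to scaled scores can be sketched as follows. This is an illustration under stated assumptions, not the exact procedure used for the published tables: the mid-point percentile convention, the use of the inverse normal distribution, and the rounding to whole scaled scores are all assumptions, and `scaled_score_table` is a hypothetical name.

```python
from collections import Counter
from statistics import NormalDist


def scaled_score_table(raw_scores, mean=10.0, sd=3.0):
    """Map each observed raw score to a normalized scaled score (M = 10, SD = 3).

    Assumed convention: the cumulative proportion for a raw score is the share
    of the sample scoring below it plus half of the share scoring exactly at it.
    """
    n = len(raw_scores)
    counts = Counter(raw_scores)
    table = {}
    below = 0
    for raw in sorted(counts):
        proportion = (below + counts[raw] / 2) / n   # strictly between 0 and 1
        z = NormalDist().inv_cdf(proportion)         # normal deviate for that proportion
        table[raw] = round(mean + sd * z)
        below += counts[raw]
    return table


if __name__ == "__main__":
    # Toy data, not study data.
    print(scaled_score_table([40, 50, 50, 60]))
```

With real normative data, the resulting table plays the role of a raw-to-scaled lookup, and the scaled scores then serve as the dependent variable in the regression models.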
Statistical Analysis
Descriptive statistics were calculated: mean (M), standard deviation (SD), and median for continuous data, and frequencies and percentages for categorical data. For the test–retest reliability analysis, Pearson’s correlations between test scores on both sessions were calculated (coefficient: r). When data violated the assumption of normality, the nonparametric alternative, Spearman’s Rho, was used.
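For readers who want to reproduce the two reliability coefficients outside SPSS, the following dependency-free Python sketch computes Pearson’s r and Spearman’s Rho (as Pearson’s r on average ranks, which handles ties). The function names and the toy scores are illustrative; the choice between the two coefficients is left to the caller because the normality check used in the study is not specified here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    test_scores = [50, 55, 60, 65, 70]      # toy data, not study data
    retest_scores = [52, 58, 61, 70, 90]
    print(pearson_r(test_scores, retest_scores))
    print(spearman_rho(test_scores, retest_scores))
```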
To evaluate whether the BICAMS score could differentiate between known groups membership, scores were compared between matched healthy subjects and MS patients using Mann–Whitney U-test, since the data were significantly skewed. Effect sizes were computed for variables that discriminated between the groups.
To derive normative values, stepwise multiple regression analyses were conducted for each of the dependent variables, SDMT, BVMT-R, and VMAT (total score on trials 1 to 5), with age, age², sex, and years of education entered as predictors (see the previous section on “normative values” for more details). Regression analyses using the forced entry method were also run to compare results, and similar outcomes were obtained. All analyses were performed in SPSS version 25; tests were two-tailed, and results with p < 0.05 were considered statistically significant.
RESULTS
Descriptives
Two hundred and thirty-four healthy participants from the community were screened for eligibility, and 54 were excluded. Two individuals from the MS group were also excluded (psychiatric illnesses). Figure 1 is a flow chart of participants’ recruitment. Table 2 summarizes demographic information of the full healthy sample and the MS group.
Healthy Individuals (Full Sample)
The average age of healthy individuals (n = 180) was 45.01 ± 19.36 years; the youngest participant was 16 and the oldest was 80 years old. Table 3 reports the scores obtained on the different tests of the BICAMS. The 63 individuals who were retested had a mean age of 33.15 ± 15.98 years, a similar sex distribution (54% males), and high educational attainment (71.4% completed university).
MS Patients and the Matched Healthy Sample
The mean MoCA score of the MS patients was 25.15 ± 2.81; 25 patients (61%) scored less than 26 on the MoCA, suggestive of cognitive impairment. The mean symptoms of depression score was 1.91 ± 0.61; 27 patients showed low symptoms of depression (64.3%), 15 showed moderate levels (35.7%), and none showed severe symptoms. Mean years of education were 14.63 ± 3.17 years.
The 43 healthy individuals matched with the 43 MS patients had a mean age of 36.7 ± 13.06 years and comprised 8 males and 35 females. Average years of education were 15.13 ± 3 years. Symptoms of depression scores were 1.52 ± 0.38, and MoCA scores were 28.19 ± 1.20.
There were no significant differences between the two matched samples on age (t = 0.234, p = 0.816) and education (t = 0.751, p = 0.455). There were also equal numbers of males and females. Nevertheless, the samples differed based on symptoms of depression scores (t = -3.56, p = 0.001) and MoCA (t = 5.85, p < 0.001).
MS Patients Retested
The mean age of the 10 MS patients who underwent retesting was 38.71 ± 12.87 years, and most were females (n = 8). Six individuals were diagnosed with relapsing-remitting MS (RRMS), three with secondary progressive MS (SPMS), and one with primary progressive MS (PPMS). Most of this subsample completed university education (n = 7), one completed high school, one obtained vocational education, and one had some high school education.
Reliability (Test–Retest)
Among the healthy participants, the test–retest agreement on the SDMT and BVMT-R was 73 and 60%, respectively (p < 0.001). SDMT scores of the subsample were 59.22 ± 12.27 on test and 63.62 ± 12.45 on retest, while BVMT-R scores were 24.23 ± 6.66 on test and 29.87 ± 4.81 on retest. The test–retest reliability was good on the VMAT total learning trials (1 to 5), r = 0.69 (test = 54.10 ± 8.71, retest = 61.9 ± 8.26). On the VMAT short delay free and cued recall trials, the Spearman’s Rho values were 0.64 (test = 11 ± 2.46, retest = 12.81 ± 2.06) and 0.73 (test = 11.33 ± 2.31, retest = 12.81 ± 2.09), respectively. Similar results were obtained on the long delay segments: Rho = 0.60 for free recall (test = 11.51 ± 2.26, retest = 13.11 ± 2.18) and Rho = 0.70 for cued recall (test = 11.71 ± 2.43, retest = 12.97 ± 2.27). The lowest test–retest reliability was on the VMAT recognition trial (Rho = 0.39, test = 43.52 ± 1.64, retest = 44.14 ± 1.38).
Among the MS patients, the test–retest reliability on both the SDMT and BVMT-R was excellent (Rho = 0.92 and r = 0.83, respectively), with SDMT scores of 47.2 ± 17.98 on test and 47.3 ± 15.79 on retest, and BVMT-R scores of 22 ± 9.79 on test and 26 ± 8.79 on retest. On the VMAT short delay free recall and cued recall, the test–retest agreement was fair (59 and 43%, respectively), as opposed to higher values on the total score of learning trials 1 to 5 (Rho = 0.64) and the long delay segments (free = 0.68, cued = 0.67). At test, MS patients scored 56.9 ± 10.04 on learning trials 1 to 5, 11.1 ± 4.98 on short delay free recall, 12.5 ± 3.21 on short delay cued recall, 13.2 ± 2.3 on long delay free recall, and 13.1 ± 2.56 on long delay cued recall. At retest, they scored 65.8 ± 8.7 on the learning trials, 13.1 ± 2.28 on short delay free recall, 13.9 ± 1.73 on short delay cued recall, 13.5 ± 2.12 on long delay free recall, and 13.7 ± 1.95 on long delay cued recall. The highest test–retest reliability coefficient was on the recognition trial, Rho = 0.92 (test = 43 ± 3.3, retest = 43.6 ± 2.27).
Criterion-Related Validity (Group Differences)
Forty-three MS patients were matched with 43 healthy individuals. MS patients scored lower than healthy participants on the SDMT, BVMT-R, VMAT trials 1 to 5, short delay free and cued recall, and long delay free and cued recall. The SDMT and BVMT-R discriminated the most between the groups, with a larger effect size for the SDMT (Table 1).
Because the difference in scores between the groups on the VMAT did not reach statistical significance, for further validation, we examined the scores of the MS patients on the verbal memory section of the MoCA (total score 5), and we found a higher than average mean score of 3.31 ± 1.07.
Normative Values
Our data suggest that age and education were the strongest demographic contributors to test performance. Specifically, age or age² and education predicted the SDMT and BVMT-R scores. Age² mainly predicted the VMAT total score on trials 1 to 5. Table 4 reports the raw to scaled scores (M = 10, SD = 3) conversion using the BICAMS cumulative frequency distribution derived from our sample (Part A) and the results of the regression models, which were statistically significant (Part B). The derived equations used to compute predicted scaled scores are also listed in the table, alongside an example of how to apply them. To facilitate the adoption and usage of the proposed norms in the clinical setting, the reader can access an Excel file with built-in formulas through https://arabicbicams.wixsite.com/website to calculate the following parameters: actual scaled score, predicted scaled score, Z score, and t score.
Equations derived from Part B to convert MS raw scores to regression-based t scores or any other type of standardized scores:
Predicted SDMT scaled score = 11.569 − 0.001 × age² + 0.082 × education.
Predicted BVMT-R scaled score = 12.121 − 0.067 × age + 0.085 × education.
Predicted VMAT (trials 1 to 5) scaled score = 12.282 − 0.001 × age².
An example of how to apply the normative values:
Consider a 40-year-old female MS patient with 16 years of education who scored 42 on the SDMT. From Part A, we know that her SDMT raw score corresponds to an actual scaled score of 7, and the predicted scaled score is 11.281 based on the formula provided above [11.569 − 0.001 (40²) + 0.082 (16)]. Next, to obtain the Z score, we subtract the predicted scaled score from the patient’s actual scaled score and then divide the difference by the residual SD of the healthy participants; the residual SD is provided in Part B. Here, we have (7 − 11.281) ÷ 2.132, which equals approximately −2 (i.e., t score: 30) and suggests that her SDMT performance can be classified as impaired (Z ≤ −2). To facilitate the adoption and usage of the proposed norms in the clinical setting, the corresponding author, upon request, can provide a spreadsheet with built-in formulas that calculate the above parameters.
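The worked example can also be reproduced with a small Python sketch. The coefficients are those of the three equations listed above, and the SDMT residual SD (2.132) is the value used in the example; the function names are hypothetical. The actual scaled score must be read from Part A of Table 4, and the residual SDs for the BVMT-R and VMAT must be taken from Part B, so both are passed in as arguments rather than hard-coded.

```python
def predicted_scaled_score(test, age, education):
    """Predicted scaled score from the regression equations reported above."""
    if test == "SDMT":
        return 11.569 - 0.001 * age ** 2 + 0.082 * education
    if test == "BVMT-R":
        return 12.121 - 0.067 * age + 0.085 * education
    if test == "VMAT":  # total score on trials 1 to 5; education is not a predictor
        return 12.282 - 0.001 * age ** 2
    raise ValueError(f"unknown test: {test}")


def z_score(actual_scaled, predicted_scaled, residual_sd):
    """Demographically adjusted Z score: (actual - predicted) / residual SD."""
    return (actual_scaled - predicted_scaled) / residual_sd


def t_score(z):
    """Convert a Z score to a t score (M = 50, SD = 10)."""
    return 50 + 10 * z


if __name__ == "__main__":
    # 40-year-old patient, 16 years of education, actual SDMT scaled score of 7.
    predicted = predicted_scaled_score("SDMT", age=40, education=16)
    z = z_score(7, predicted, residual_sd=2.132)
    print(round(predicted, 3), round(z, 2), round(t_score(z)))
```

Because the published coefficients are rounded to three decimals, results may differ slightly from those produced by the authors’ spreadsheet.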
DISCUSSION
This study contributes to the international utility of the BICAMS by providing partial evidence of validity and fair evidence of reliability for the Arabic version of the battery. The SDMT and BVMT-R tests mostly showed good psychometric properties. Here, we followed the recommendations and standards of the BICAMS consensus committee (Benedict et al., 2012). We also provide Lebanese normative data.
Although the BICAMS does not replace a more comprehensive evaluation of cognitive function in MS, it is valuable as a brief tool that can be integrated into broader assessments. The availability of the BICAMS increases accessibility to cognitive assessments in nonspecialized centers (Langdon et al., 2012). Translation and validation of the BICAMS have been performed in many regions and languages of the world (Corfield & Langdon, 2018), and normative data have been established for populations in several countries, such as Italy (Goretti et al., 2014) and Greece (Polychroniadou et al., 2016).
In this study, the BICAMS showed good evidence of test–retest reliability, which is essential for longitudinal assessments of cognition in MS. The reliability of the SDMT in particular, for which different test forms were utilized across the test and retest sessions, was higher than that of other measures. This result is in line with other studies (Benedict, 2005; Goretti et al., 2014; Sousa et al., 2018).
The VMAT showed fair reliability results (including the total learning trials 1 to 5 score). Regarding this main outcome variable, results are comparable to other BICAMS studies that provided evidence of validity (Polychroniadou et al., 2016; Sousa et al., 2018). The highest values were on the cued recall trial. It should be noted that while good test–retest results were present for the healthy group, weaker results were found in the MS group on the short delay free and cued recall trials. The use of one form of the VMAT in our study most likely elevated practice effects, thus hindering reliability analyses. We are currently developing alternative forms of the test and recruiting a larger and more diverse sample for additional psychometric validation.
The BVMT-R, on the other hand, showed slightly lower evidence of reliability in the healthy sample. This could be due to several participants reaching ceiling scores in the first testing session, while others significantly improved their performance in the second testing session. Nonetheless, the BVMT-R is known to be adequate for international use (Benedict et al., 2012) and showed good test–retest reliability among persons with MS.
The matched MS sample performed lower than the healthy participants on the Arabic BICAMS, but only the SDMT and BVMT-R could significantly discriminate between the samples. We also compared the current results on the VMAT with our prior work using the same tool in a smaller MS sample. In Zeinoun et al. (2020), results indicated that the VMAT could significantly discriminate between MS patients and healthy individuals on various subscores. In our current study, this significance in discrimination was absent, although MS patients scored lower than healthy individuals. We partly attribute this discrepancy in results to the different sampling methods and performance of the patients included in the studies. In the current study, 43 patients were matched with 43 healthy individuals, as opposed to the original VMAT study, in which 16 MS patients were matched with 32 healthy individuals. The scores of the healthy participant groups used in both studies were similar but differed for MS patients. In this study, the MS patients’ scores on the VMAT were higher; for example, on the long delay free recall, MS patients in the original VMAT study scored 8.62 ± 3.07, while in the current study, they scored 10.83 ± 3.2. This could suggest that the examined MS sample had little or no impairment in verbal memory, which was confirmed when examining their performance on the verbal section of the MoCA, where the majority scored above average. The better performance of the MS group in the current study compared with the original VMAT study could have contributed to the loss of statistical significance when examining the discrimination of the test.
In essence, we encourage the use of the battery in Arabic-speaking MS patients. Nevertheless, caution should be exercised with regard to the VMAT. In particular, while the full battery might be useful during an initial assessment of the patient, we advise administering the battery again after a substantial period of time (such as 6 months). In addition, we are in the process of building and examining an alternative version of the VMAT, which might yield more reliable results in clinical or research settings.
In this study, we also provide regression-based norms for a Lebanese sample. This approach has several advantages over conventional norming methods, including linear adjustment for covariates and applicability to smaller samples (Oosterhuis, van der Ark, & Sijtsma, Reference Oosterhuis, van der Ark and Sijtsma2016). Although the scores and the impact of a few demographic variables on test results were similar to other studies (Goretti et al., Reference Goretti, Niccolai, Hakiki, Sturchio, Falautano, Minacapelli and Murgia2014; Vanotti, Smerbeck, Benedict, & Caceres, Reference Vanotti, Smerbeck, Benedict and Caceres2016), not all variables contributed equally. This is not surprising in light of cultural differences in testing. The current results, with age and education as the main predictors of cognitive performance, are similar to findings from another study in the same country in which we validated and normed a visual memory test (Rey Complex Figure Test) (Darwish, Zeinoun, Farran, & Fares, Reference Darwish, Zeinoun, Farran and Fares2018). Furthermore, the Portuguese BICAMS validation study, which also provided regression-based norms, reached similar conclusions (Sousa et al., Reference Sousa, Rigueiro-Neves, Miranda, Alegria, Vale, Passos and Sá2018). This highlights the importance of national validation of cognitive tests across cultures, as subtle differences in test performance could be present.
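As an illustration of the general idea behind regression-based norms, the following is a minimal sketch and not the procedure used in this study. It assumes a plain ordinary least squares model with age and years of education as the only predictors, and it uses synthetic data generated purely for demonstration. The function names `fit_norms` and `z_score` and all numeric values are hypothetical.

```python
import numpy as np

def fit_norms(scores, age, education):
    """Fit raw score ~ intercept + age + education by ordinary least squares.

    Returns the coefficient vector and the residual standard deviation.
    Assumption: a simple linear model; published norms may use other
    predictors or transformed (scaled) scores.
    """
    X = np.column_stack([np.ones(len(scores)), age, education])
    beta, *_ = np.linalg.lstsq(X, scores, rcond=None)
    residuals = scores - X @ beta
    # Residual SD with n - p degrees of freedom (p = number of coefficients)
    sd = np.sqrt(np.sum(residuals ** 2) / (len(scores) - X.shape[1]))
    return beta, sd

def z_score(raw, age, education, beta, sd):
    """Demographically adjusted z-score: (observed - expected) / residual SD."""
    expected = beta[0] + beta[1] * age + beta[2] * education
    return (raw - expected) / sd

# Synthetic normative sample (made-up values, for illustration only)
rng = np.random.default_rng(0)
age = rng.uniform(18, 70, 200)
education = rng.uniform(6, 20, 200)
scores = 60 - 0.3 * age + 0.8 * education + rng.normal(0, 8, 200)

beta, sd = fit_norms(scores, age, education)

# Adjusted z-score for a hypothetical 50-year-old with 12 years of
# education and a raw score of 45
z = z_score(45.0, 50.0, 12.0, beta, sd)
print(f"adjusted z-score: {z:.2f}")
```

In this kind of scheme, an individual's score is interpreted relative to the score expected for their demographic profile rather than relative to a fixed age or education band, which is why a separate normative cell is not needed for each subgroup.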
Several limitations need to be taken into account when interpreting our findings. The majority of the MS sample were diagnosed with RRMS, and the sex distribution was not balanced. Further evidence on the validity of the new Arabic version of the BICAMS, with a more diverse profile and a larger sample, will be pursued in future studies. In a similar vein, examining external validity could support wider use of the battery, especially in other Arab countries.
Lastly, the proportion of individuals who had attained a university degree in this study was higher among younger than among older participants. This is in line with the age distribution, literacy, and education rates in Lebanon. The literacy rate is 99.24% for those between the ages of 15 and 24 years, and 60.15% for those aged 65 years and older. Sixteen percent of the population is between the ages of 15 and 24 years; the largest share, 45.27%, is between the ages of 25 and 54; and 8.3% and 7% are 55–64 and 65 years or older, respectively, with a school life expectancy of around 11–12 years (CIA, 2020). Also, in 2017, it was reported that 93% of the population finished primary education, 63–70% secondary education, and 45–49% tertiary education (UNESCO, 2017).
The current study shows encouraging psychometric properties of a new Arabic version of the BICAMS and provides regression-based norms for a Lebanese sample. The findings and data presented can enhance MS-related clinical practice in this region, and we encourage the use of this battery in both research and clinical settings.
ACKNOWLEDGMENT
Novartis, MENACTRIMS, funded by SANOFI-AVENTIS GROUPE (DMCC Branch), the representative office of Sanofi – Aventis Groupe SA.
CONFLICT OF INTEREST
The authors have nothing to disclose.