From 2000 to 2021, approximately 450,000 US service members were diagnosed with a traumatic brain injury (TBI), with the majority (82%) being mild in severity (Traumatic Brain Injury Center of Excellence (TBICoE), 2021). The prevalence of mild traumatic brain injury (mTBI), also known as concussion, in recent-era veterans has led to an increased need for assessments, including neuropsychological evaluations for diagnostic conclusions, which in part determine the distribution of disability benefits, service connection, and access to health care. As of 2015, approximately 100,000 veterans were receiving VA disability compensation for TBIs (Denning & Shura, 2017).
Performance validity tests (PVTs) are a crucial component of neuropsychological evaluations as they measure credible or valid performance and ensure that the results of testing are a true representation of cognitive functioning (Sweet et al., 2021). In addition to the recommendations of the American Academy of Clinical Neuropsychology (Sweet et al., 2021), the Military Traumatic Brain Injury Task Force has recommended the inclusion of validity measures in neuropsychological evaluations given possible external motivation or incentives that may impact the assessment and recovery processes (McCrea et al., 2008). Failing PVTs suggests atypical patterns of test performance that are likely noncredible; in other words, interpreting the neuropsychological testing results may lead to a misdiagnosis. For example, an individual may incorrectly be diagnosed with a neurocognitive disorder. Serious adverse consequences from misdiagnosis may include individuals being referred to inappropriate and costly treatments, depleting healthcare resources, and creating financial burden (Denning & Shura, 2017). A misdiagnosis can also cause significant emotional distress for individuals and their families as well as lead to the unnecessary restriction of independent activities of daily living. Additionally, a misdiagnosis can lead to iatrogenic effects, erroneously reinforcing symptoms that would otherwise not be present, and exacerbating functional decline. Therefore, PVTs are an imperative component in neuropsychological assessment, including TBI assessment.
Invalid performance, or PVT failure, among post-9/11 veterans and service members has ranged widely from 6% to 68% across studies (Armistead-Jehle & Hansen, 2011; Armistead-Jehle, 2010; McCormick et al., 2013; Russo, 2012). Studies in those with TBI have demonstrated that poor performance on validity tests accounts for much of the variability in neuropsychological testing (Green et al., 2001; Meyers et al., 2011). One study showed that patients with an active compensation claim (72%) demonstrated poorer performance validity compared to those without an active claim (15%; Critchfield et al., 2019). A recent study suggested that applying for disability benefits, which is associated with the motivation for secondary gain, can impact performance validity (Horner et al., 2022). In contrast, when assessments are completed outside of a clinical setting, where there are no potential external incentives or financial compensation (e.g., in a research context), PVT failures among post-9/11 veterans are much lower, ranging from 4% to 9% (Clark et al., 2014).
Although suboptimal performance validity can be due to external incentives such as those seeking disability benefits (e.g., increase of service connection), poor performance validity does not equate to malingering and may also be associated with internal, psychiatric factors. For example, among the more than half (58%) of veterans who were positive on TBI screening and performed below the cutoffs on the Medical Symptom Validity Test (MSVT), approximately 69% had depression (Armistead-Jehle, 2010). Another recent study demonstrated that severity of posttraumatic stress (PTS; formerly posttraumatic stress disorder; PTSD) symptoms was associated with MSVT failure (Miskey et al., 2020). Veterans who failed the Word Memory Test (WMT), a verbal memory task similar to the MSVT, had greater prevalence of current PTS and Major Depressive Disorder compared to those who passed (Shura et al., 2016). Furthermore, those with comorbid psychiatric diagnoses (e.g., TBI, PTS, depression) have increased rates of negative response bias (Lange et al., 2012).
Young et al. (2016) reported that 45% of psychologists from the VA (Veterans Affairs) Healthcare System determined that failing even one PVT was sufficient to deem a performance invalid, while 47% used at least two PVT failures as a minimum benchmark. One study examining veterans with mTBI found that there were significant differences on tests of verbal memory, processing speed, and cognitive flexibility among those who passed versus those who failed one PVT (WMT). However, those who failed one PVT compared to two PVTs only differed in one measure of delayed free recall, suggesting that clinicians should consider a performance invalid if individuals failed even a single PVT (Proto et al., 2014). However, several other studies have suggested that failure of two or more PVTs has high specificity and the use of several PVTs increases sensitivity without compromising specificity (Martin et al., 2015; Schroeder & Marshall, 2011). Therefore, noting a failure in two or more independent (e.g., no two embedded PVTs from the same measure) well-validated PVTs is the recommended threshold for detecting invalid cognitive performances (Jennette et al., 2022), as relying on a single PVT may result in high false positive rates (Victor et al., 2009).
Whereas PVTs evaluate the validity of objective cognitive abilities, symptom validity tests (SVTs) evaluate the credibility of subjective reports. Symptom validity tests are used to identify symptom exaggeration or overreporting in self-report measures and should also be regularly utilized in neuropsychological assessments (Boe & Evald, 2022; Larrabee, 2012). However, SVTs are not consistently or routinely used in conjunction with clinical assessments such as the Clinician-Administered PTSD Scale for DSM-5 (CAPS-5) or the Structured Clinical Interview for DSM-5 (SCID-5). One study utilizing the Minnesota Multiphasic Personality Inventory-2-Restructured Form (MMPI-2-RF) found that approximately 5–27% of a veteran sample failed validity scales that detect overreporting (Ingram et al., 2020), highlighting the need to include SVTs in all clinical assessments and not limit their use to the field of neuropsychology.
The Neurobehavioral Symptom Inventory (NSI), which assesses self-report of postconcussive symptoms, has been widely used by the DoD and VA in TBI evaluations. The Validity-10 is the most recommended and effective scale within the NSI to detect noncredible reporting of symptoms (Ashendorf, 2019; Lange et al., 2015; Vanderploeg et al., 2014). Symptom validity tests and PVTs are related such that those who perform suboptimally on cognitive testing are more likely to express greater subjective complaints; however, they measure independent constructs (Boe & Evald, 2022; Clark et al., 2014; Ord et al., 2021). Aase et al. (2021) examined performances on four embedded validity measures and their relationship with the Validity-10 in a sample of post-9/11 veterans. Veterans who passed PVTs were more likely to pass the Validity-10 (at ≥13 and ≥19 cutoffs), while veterans who failed at least one embedded PVT were more likely to fail the Validity-10. Additionally, veterans who had both PTS and mTBI were more likely to fail the Validity-10.
The current study first examines cognitive performance based on PVT failure to determine whether there are significant differences in failing one versus two PVTs among a research sample of post-9/11 veterans. Second, the clinical characteristics and functional outcomes of those who failed 2+ PVTs (stand-alone and embedded measures) are examined within this population. Last, we examine if PVT failure is associated with SVT (NSI; Validity-10) failure using three distinct cutoffs (Lange et al., 2015), and whether SVT failure predicts PVT failure.
Method
Participants
Participants included 813 veterans and National Guard/Reservists who deployed to post-9/11 conflicts (Operations Enduring Freedom, Iraqi Freedom, and New Dawn; this sample will be collectively labeled as “veterans” for simplicity) and who were enrolled in the Translational Research Center for Traumatic Brain Injury and Stress Disorders (TRACTS) longitudinal cohort study. Participants were recruited primarily from Boston, Massachusetts (New England area) and Houston, Texas by a recruitment specialist who attended military events, augmented by the distribution of flyers within the VA Healthcare Systems and the greater community (for more details, please see McGlinchey et al., 2017). The sample includes veterans from over 30 U.S. states and is reflective of post-9/11 era military demographics. Veterans were excluded for a history of neurological disorder (with the exception of TBI), seizure disorder (not related to TBI), significant psychiatric conditions (e.g., bipolar disorder, psychotic disorders), or active suicidal or homicidal ideations. Participants are from a research sample where primary and secondary gain have been minimized; they were informed that research evaluations were not documented in clinical medical records and therefore had no impact on establishing or increasing disability benefits. This study has been approved by the VA Boston Institutional Review Board for human participants’ protection. All study procedures were completed in accordance with the principles of the Declaration of Helsinki.
For the present study, we removed participants who were only administered a limited set of PVTs (MSVT, CVLT-II) at the Houston assessment site (n = 177). We further excluded participants with a moderate or severe TBI (e.g., loss of consciousness >30 minutes, alteration of mental status >24 hours, posttraumatic amnesia >24 hours; n = 26), non-native English speakers (n = 2), and participants who had a personality disorder or other significant psychiatric concern (n = 4), neurologic condition (e.g., heavy metal exposure, brain atrophy evident in imaging scan; n = 3), or concerns related to the accuracy of the clinical interview (n = 1). An additional 84 participants did not complete PVTs (e.g., MSVT, CVLT-II, BVMT-R, Digit Span) due to time constraints and were therefore excluded from the current analysis, yielding a final sample size of 516.
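The exclusion counts above can be checked with simple arithmetic. The following is a minimal sketch that assumes the exclusion categories are mutually exclusive (each participant is removed only once); the dictionary keys and the function name are illustrative labels, not terms from the study protocol.

```python
# Exclusion counts as reported in the paragraph above.
# Assumption: categories do not overlap, so counts can simply be summed.
enrolled = 813
exclusions = {
    "limited PVT set at Houston site": 177,
    "moderate or severe TBI": 26,
    "non-native English speakers": 2,
    "personality disorder or other psychiatric concern": 4,
    "neurologic condition": 3,
    "clinical interview accuracy concerns": 1,
    "did not complete PVTs": 84,
}


def final_sample_size(start, removed):
    """Subtract every exclusion count from the starting sample size."""
    return start - sum(removed.values())


print(final_sample_size(enrolled, exclusions))  # 516
```

Under that assumption, the counts reconcile with the reported final sample of 516.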
Measures
Psychological assessments
The diagnoses of PTS, TBI, and other psychiatric conditions were assessed via clinical interviews administered by a doctoral-level clinician. The Clinician-Administered PTSD Scale for DSM-IV (CAPS-IV) assessed for PTS (Blake et al., 1995), the Boston Assessment of Traumatic Brain Injury-Lifetime (BAT-L) assessed history of TBI (Fortier et al., 2014), and the Structured Clinical Interview for DSM-IV/V (SCID-IV/V; First et al., 1997) assessed mental health disorders including mood and anxiety disorders. Clinical interviews at both sites were reviewed in diagnostic consensus meetings with at least three doctoral-level clinicians.
Neuropsychological testing
Participants in TRACTS were administered a fixed neuropsychological battery measuring the cognitive domains of verbal (California Verbal Learning Test – Second Edition; CVLT-II; Delis et al., 2000) and visual memory (Brief Visuospatial Memory Test – Revised; BVMT-R; Benedict, 1997), attention/working memory (e.g., digit span and coding from the Wechsler Adult Intelligence Scale – Fourth Edition (WAIS-IV; Wechsler, 2008)), executive functioning (e.g., verbal fluencies including letter, category, and category switching and trail making tests including number sequencing and number letter sequencing from Delis-Kaplan Executive Function System (D-KEFS; Delis et al., 2001)), Grooved Pegboard (Tiffen, 1968), and Auditory Consonant Trigram (ACT; Stuss et al., 1985). The Wechsler Test of Adult Reading (WTAR; Wechsler, 2001) was administered to provide a measure of premorbid functioning.
Performance validity tests
Participants were given a stand-alone, computer-administered PVT, the Medical Symptom Validity Test (MSVT), which evaluated level of test engagement (Green, 2004). Cutoffs suggesting suboptimal performances on the MSVT are described in the manual. All neuropsychological tests and PVTs in the standard battery were administered in the same order to all participants.
Among the embedded PVTs, a systematic review informed a cutoff score of ≤14 (sensitivity 50% and specificity 93%) on the CVLT-II forced choice (Schwartz et al., 2016). In the BVMT-R, a cutoff score of ≤4 in the recognition discrimination index (sensitivity 50% and specificity 93%) or ≤4 recognition hits (sensitivity 45% and specificity 89%) identified noncredible performances (Bailey et al., 2018; Denning, 2012). A retention rate of ≤58% in the BVMT-R (sensitivity 31% and specificity 92%) was also identified as a cutoff for embedded PVT failure (Sawyer et al., 2017). Lastly, a cutoff score of ≤6 (sensitivity 54% and specificity 91%) on the reliable digit span (RDS) from the WAIS-IV Digit Span, which measures attention and working memory, was identified as a PVT failure (Webber & Soble, 2018; Wechsler, 2008).
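To make the failure rules concrete, the sketch below applies the cutoffs quoted above, together with the MSVT rule given in the table footnotes (immediate recognition, delayed recognition, or consistency index ≤85%), to one participant's scores. It is an illustration only: the field names are hypothetical, and counting failures at the level of the four tests (so that several BVMT-R indices count as one failure) is an assumption based on the statement that independent PVTs should not come from the same measure.

```python
from dataclasses import dataclass


@dataclass
class PvtScores:
    # Hypothetical field names; units follow the cutoffs quoted in the text.
    msvt_immediate_pct: float
    msvt_delayed_pct: float
    msvt_consistency_pct: float
    reliable_digit_span: int
    cvlt_forced_choice: int
    bvmt_discrimination_index: int
    bvmt_recognition_hits: int
    bvmt_percent_retained: float


def failed_pvts(s):
    """Return the names of the tests (at most four) whose cutoff was met."""
    failed = []
    if min(s.msvt_immediate_pct, s.msvt_delayed_pct, s.msvt_consistency_pct) <= 85:
        failed.append("MSVT")
    if s.reliable_digit_span <= 6:
        failed.append("RDS")
    if s.cvlt_forced_choice <= 14:
        failed.append("CVLT-II forced choice")
    # Assumption: any failed BVMT-R index counts as a single BVMT-R failure.
    if (s.bvmt_discrimination_index <= 4
            or s.bvmt_recognition_hits <= 4
            or s.bvmt_percent_retained <= 58):
        failed.append("BVMT-R")
    return failed


def pvt_group(s):
    """Assign the comparison group used in the analyses: none, one, or 2+."""
    n = len(failed_pvts(s))
    return "none" if n == 0 else "one" if n == 1 else "2+"


# Made-up scores for illustration: fails the MSVT and two BVMT-R indices.
example = PvtScores(80, 100, 100, 9, 16, 4, 3, 95.0)
print(failed_pvts(example), pvt_group(example))
```

Grouping indices by test keeps a single weak BVMT-R recognition trial from being counted as two or three separate failures.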
Symptom validity test
The Validity-10 from the NSI includes unlikely and low-frequency items (e.g., items that are uncommonly endorsed) that can identify symptom exaggeration; failure of the Validity-10 may prompt further follow up (Lange et al., 2015; Vanderploeg et al., 2014). Lange and colleagues (2015) suggested that a cutoff score of ≥19 indicated “possible exaggeration” (59% sensitivity; 89% specificity; 74% positive predictive value (PPV); 80% negative predictive value (NPV)), ≥23 indicated “probable exaggeration” (41% sensitivity; 96% specificity; 75% PPV; 83% NPV), and ≥28 indicated “highly probable exaggeration” (22% sensitivity; 99% specificity; 94% PPV; 70% NPV).
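The three Validity-10 cutoffs are nested, so a score that fails a stricter cutoff also fails the more lenient ones. A minimal sketch of that mapping follows; the function names and the "below cutoff" label are illustrative and not taken from Lange and colleagues.

```python
# Cutoffs and labels as quoted in the text, ordered from strictest to most lenient.
VALIDITY10_CUTOFFS = [
    (28, "highly probable exaggeration"),
    (23, "probable exaggeration"),
    (19, "possible exaggeration"),
]


def validity10_category(score):
    """Return the strictest exaggeration label met by a Validity-10 total score."""
    for cutoff, label in VALIDITY10_CUTOFFS:
        if score >= cutoff:
            return label
    return "below cutoff"  # illustrative label for scores under 19


def fails_cutoff(score, cutoff):
    """True when the score meets or exceeds the chosen cutoff."""
    return score >= cutoff


for score in (12, 19, 25, 30):
    print(score, validity10_category(score))
```

Because each analysis in the study is run separately per cutoff, `fails_cutoff` is the form that matches the reported failure counts, while `validity10_category` gives the descriptive label.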
Self-report questionnaires
Self-report questionnaires included the Depression Anxiety Stress Scale-21 (DASS-21; Henry & Crawford, 2005), Lifetime Drinking History (LDH; Skinner & Sheu, 1982), McGill Pain Questionnaire (short form; Melzack, 1975), Pittsburgh Sleep Quality Index (PSQI; Buysse et al., 1989), Neurobehavioral Symptom Inventory (NSI; Cicerone, 1995), and the WHO Disability Assessment Scale-II (WHODAS-II; Üstün et al., 2010).
Statistical analyses
To compare PVT cutoffs, independent t-tests and effect sizes (Cohen’s d) were used to examine differences in neuropsychological performance for the following pairwise combinations of PVT groups: (1) no failed PVTs vs. failed 1 PVT, (2) no failed PVTs vs. failed 2+ PVTs, and (3) failed 1 vs. 2+ PVTs (similar to Proto et al., 2014). Cohen’s d for unequal variance was calculated when comparison groups did not meet equal variance assumptions. Analyses were conducted on norm-standardized scores. Since the RDS score was derived from Digit Span, Digit Span was not included as part of the neuropsychological variables (Table 2). However, the CVLT-II was included because CVLT-II forced choice is a separate trial within the CVLT-II and not directly derived from the total recall and long delay trials. Similarly, the embedded measures from the BVMT-R are not derived from the total or delayed recall. Additionally, we calculated the area under the curve and 95% confidence intervals using logistic regression models to evaluate the use of the Validity-10 to predict failure for 1+ and 2+ PVTs.
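For readers unfamiliar with the two effect size variants, the sketch below computes Cohen’s d with either a pooled standard deviation or, for unequal variances, the square root of the average of the two group variances. The text does not state which unequal-variance formula was used, so the second branch is an assumption (one common choice), and the scores are made-up values rather than study data.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b, equal_variance=True):
    """Standardized mean difference between two independent groups."""
    var_a, var_b = variance(group_a), variance(group_b)  # sample variances
    n_a, n_b = len(group_a), len(group_b)
    if equal_variance:
        # Pooled SD, weighting each variance by its degrees of freedom.
        sd = sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    else:
        # Assumed unequal-variance form: root of the unweighted mean variance.
        sd = sqrt((var_a + var_b) / 2)
    return (mean(group_a) - mean(group_b)) / sd


# Hypothetical T-scores for a "no failed PVTs" group and a "failed 2+" group.
passed = [50, 52, 54, 56, 58]
failed = [38, 41, 44, 47, 50, 44]
print(round(cohens_d(passed, failed), 2))
print(round(cohens_d(passed, failed, equal_variance=False), 2))
```

The two variants diverge most when group sizes are very unequal, as is the case here when a small 2+ failure group is compared with a large no-failure group.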
Differences in demographic and clinical characteristics between PVT groups (e.g., passed vs. failed 2+ PVTs) were determined using independent t-tests for continuous variables and chi-square for categorical variables. Fisher’s exact test was used for categorical variables when an expected cell count was less than 5. Similar to Clark et al. (2014), we used linear regression models to examine differences in psychological symptom severity, somatic, and functional outcomes after controlling for age and education. For outcomes that did not meet linear regression assumptions, we applied a square root transformation to normalize the residuals. As a sensitivity analysis, we examined whether differences in outcomes persisted after removing SVT failures using all three cutoffs. Additionally, we explored whether standalone or embedded performance validity measures better predicted differences in outcomes. All p-values refer to two-tailed tests. Statistical analyses were conducted in SAS (version 9.4).
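The rule for switching from chi-square to Fisher’s exact test can be sketched for a 2×2 table as follows. This is a pure-Python illustration rather than the SAS procedure used in the study: it assumes an uncorrected Pearson chi-square with one degree of freedom and a two-sided Fisher p-value that sums all tables no more probable than the observed one. The function names and example tables are hypothetical.

```python
from math import comb, erfc, sqrt


def expected_counts(table):
    """Expected cell counts under independence for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    return [[r * k / n for k in cols] for r in rows]


def chi_square_p(table):
    """Pearson chi-square p-value (1 df, no continuity correction)."""
    exp = expected_counts(table)
    chi2 = sum((table[i][j] - exp[i][j]) ** 2 / exp[i][j]
               for i in range(2) for j in range(2))
    return erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


def fisher_exact_p(table):
    """Two-sided Fisher's exact p-value from the hypergeometric distribution."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c

    def prob(x):
        return comb(r1, x) * comb(r2, c1 - x) / comb(r1 + r2, c1)

    p_obs = prob(a)
    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    return sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-9))


def categorical_p(table):
    """Use Fisher's exact test when any expected cell count is below 5."""
    if min(min(row) for row in expected_counts(table)) < 5:
        return "fisher", fisher_exact_p(table)
    return "chi-square", chi_square_p(table)


print(categorical_p([[3, 1], [1, 3]]))      # small expected counts
print(categorical_p([[20, 10], [10, 20]]))  # all expected counts are 15
```

With only 17 participants in the 2+ failure group, low expected counts are likely for rarer characteristics, which is where the exact test matters.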
Results
Participants were largely male (88.8%) and white (75.6%) and representative of U.S. military demographics. The average education was 14.1 years (Standard Deviation [SD] = 2.1), and estimated premorbid intelligence measured by the Wechsler Test of Adult Reading (WTAR; Wechsler, 2001) was 104.2 (SD = 11.8). Demographic information is presented in Table 1.
Note. SD = Standard Deviation; WTAR = Wechsler Test of Adult Reading.
Participants who failed one PVT performed significantly worse than those who failed none on the CVLT-II total trials and long delay free recall; D-KEFS letter fluency, category fluency, category switching, number sequencing, and number/letter switching; WAIS-IV coding; ACT total score on 0–36 s delay; Grooved Pegboard dominant hand trial; and BVMT-R total recall and delayed recall. Effect sizes for these differences ranged from small to medium (Cohen’s d = 0.28–0.62; Cohen, 1988). Similarly, participants who failed 2+ PVTs performed significantly worse on all neuropsychological measures except for the Grooved Pegboard compared to counterparts who failed none. Effect sizes were larger for the 2+ PVT failure group, with Cohen’s d estimates ranging from 0.82 to 2.02. Participants who failed two or more PVTs performed significantly worse than those who failed one PVT on all measures except Grooved Pegboard and D-KEFS letter fluency and number/letter switching. These effect sizes ranged from medium to large (Cohen’s d = 0.60–1.10; see Table 2).
Note. CVLT TL = California Verbal Learning Test – Second Edition (CVLT-II) trials 1–5 total learning; CVLT LD = CVLT-II long delay free recall; FAS LF = letter fluency; FAS CF = category fluency; FAS CS = category switching; TMT NS = trail making test number sequencing total time; TMT NLS = trail making test number/letter switching total time; DS = Wechsler Adult Intelligence Scale – Fourth Edition Coding; ACT = Auditory Consonant Trigram 0–36 s delay total; GRV DH = Grooved Pegboard dominant hand total time; GRV NDH = Grooved Pegboard non-dominant hand total time; BVMT TR = Brief Visuospatial Memory Test – Revised (BVMT-R) total recall; BVMT DR = BVMT-R delayed recall. T indicates t-score. Z indicates z-score. SS indicates standard score. Italics = Cohen’s d for unequal variance.
*p < .05
Among the 516 participants, 5.4% (n = 28) failed the RDS, 4.8% (n = 25) failed the MSVT, 4.8% (n = 25) failed the BVMT-R recognition discrimination index, 2.1% (n = 11) failed the CVLT-II forced choice, 1.9% (n = 10) failed the BVMT-R recognition hits, and 1.4% (n = 7) failed the BVMT-R percent retention. Veterans who failed 2+ PVTs (n = 17) had less education (Mean = 12.9 years vs. 14.2 years; p = .0114) and lower WTAR standard scores (Mean = 97.6 vs. 104.4; p = .0183; see Table 3). They also differed in clinical characteristics such that those with multiple PVT failures were more likely to have PTS diagnoses (88.2% vs. 55.4%; p = .0073) as well as greater PTS severity (Mean = 77.7 vs. 47.3; p < .0001), mood disorders (64.7% vs. 25.1%; p = .0008), and deployment trauma phenotype (DTP; also known as comorbid depression, PTS, and military-related mTBI diagnoses; 35.3% vs. 14.6%; p = .0327). Participants who failed 2+ PVTs also reported greater pain (Mean = 51.9 vs. 30.8; p = .0012), sleep disturbances (Mean = 13.6 vs. 9.8; p = .0036), and functional impairment (Mean = 36.9 vs. 17.9; p < .0001). Notably, there were no differences in the prevalence of lifetime or military-related mTBI based on PVT failure. After adjusting for age and education, CAPS-IV PTS symptom severity (β = 0.16; p = .0002) and self-reported depression (β = 0.17; p = .0001) and anxiety symptoms (β = 0.15; p = .0007) were higher among those who failed 2+ PVTs (see Table 4). Furthermore, they had greater sleep disturbances (β = 0.10; p = .0233) and worse functional impairment (β = 0.15; p = .0009).
Note. PVT = performance validity test; WTAR = Wechsler Test of Adult Reading; CAPS-IV = Clinician-Administered PTSD Scale for DSM-IV; PTS = posttraumatic stress; DASS-21 = Depression, Anxiety, and Stress Scale – 21 items; LDH = Lifetime Drinking History; PSQI = Pittsburgh Sleep Quality Index; WHODAS = World Health Organization Disability Assessment Scale; mTBI = mild traumatic brain injury; SCID = Structured Clinical Interview for DSM-IV.
* A PVT failure was considered a (1) Medical Symptom Validity Test (MSVT) immediate recognition, delayed recognition, or consistency index ≤85%; or a (2) Wechsler Adult Intelligence Scale - Fourth Edition (WAIS-IV) reliable digit span score ≤6; or a (3) California Verbal Learning Test – Second Edition (CVLT-II) forced choice score ≤14; or a (4) Brief Visuospatial Memory Test – Revised (BVMT-R) recognition discrimination index score ≤4, recognition hits score ≤4, or percent retained ≤58%.
Note. PVT = performance validity test; SE = standard error; CAPS-IV = Clinician-Administered PTSD Scale for DSM-IV; PTS = posttraumatic stress; DASS-21 = Depression, Anxiety, and Stress Scale – 21 items; PSQI = Pittsburgh Sleep Quality Index; WHODAS = World Health Organization Disability Assessment Scale.
* A PVT failure was considered a (1) Medical Symptom Validity Test (MSVT) immediate recognition, delayed recognition, or consistency index ≤85%; or a (2) Wechsler Adult Intelligence Scale - Fourth Edition (WAIS-IV) reliable digit span score ≤6; or a (3) California Verbal Learning Test – Second Edition (CVLT-II) forced choice score ≤14; or a (4) Brief Visuospatial Memory Test – Revised (BVMT-R) recognition discrimination index score ≤4, recognition hits score ≤4, or percent retained ≤58%.
Models are adjusted for age and education.
Among a subset of 488 participants who completed the SVT, 7.8% (n = 38) failed using a Validity-10 cutoff score of ≥19, 3.3% (n = 16) failed using a cutoff score of ≥23, and 1.4% (n = 7) failed using a cutoff score of ≥28. Multiple PVT failures were significantly associated with Validity-10 failure when using the ≥19 and ≥23 cutoffs (p’s < .0012), but not the ≥28 cutoff. Additionally, we looked at the area under the curve (AUC) to evaluate how well the Validity-10 predicted PVT failures. AUC values greater than 0.9 indicate high discrimination, values between 0.7 and 0.9 indicate moderate discrimination, and values below 0.7 indicate poor discrimination between measures (Fischer et al., 2003; Swets, 1988). The Validity-10 had poor correspondence with failing one or more PVTs (AUC = 0.65; 95% Confidence Interval [CI] = 0.58, 0.73). However, the Validity-10 had moderate correspondence with failing two or more PVTs (AUC = 0.83; 95% CI = 0.76, 0.91).
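As an illustration of how an AUC relates to the discrimination bands above, the sketch below computes the rank-based AUC: the probability that a randomly chosen participant who failed PVTs has a higher Validity-10 score than one who did not, with ties counted as one half. The study derived AUCs and confidence intervals from logistic regression models in SAS; with a single predictor and a positive slope the two approaches rank participants identically, but this sketch omits the confidence interval, and the scores shown are invented.

```python
def auc(scores_failed, scores_passed):
    """Rank-based AUC: share of failed/passed pairs ordered correctly."""
    wins = 0.0
    for f in scores_failed:
        for p in scores_passed:
            if f > p:
                wins += 1.0
            elif f == p:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(scores_failed) * len(scores_passed))


def discrimination_label(value):
    """Map an AUC onto the bands described in the text."""
    if value > 0.9:
        return "high"
    if value >= 0.7:
        return "moderate"
    return "poor"


# Hypothetical Validity-10 totals for PVT-fail and PVT-pass participants.
value = auc([20, 25, 30], [5, 10, 20])
print(round(value, 3), discrimination_label(value))
print(discrimination_label(0.65), discrimination_label(0.83))
```

Applied to the reported estimates, 0.65 falls in the poor band and 0.83 in the moderate band, matching the interpretation given above.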
We conducted a sensitivity analysis by removing Validity-10 failures using all three cutoffs (≥19, ≥23, and ≥28). Once the Validity-10 failures were removed, we examined the association between multiple PVT failures and clinical characteristics to see if any associations changed. After removing Validity-10 scores ≥19, failing 2+ PVTs was associated with higher PTS symptom severity (β = 0.14; p = .0036), self-reported depression symptoms (β = 0.12; p = .0107), and functional impairment (β = 0.10; p = .0361). However, self-reported anxiety symptoms and sleep disturbances were no longer significant. After removing Validity-10 scores ≥23, PTS symptom severity (β = 0.13; p = .0041), self-reported depression (β = 0.14; p = .0027) and anxiety symptoms (β = 0.10; p = .0407), and functional impairment (β = 0.12; p = .0072) were higher among those with multiple failures, but sleep disturbances were no longer associated with multiple PVT failures. Finally, after Validity-10 scores ≥28 were removed, 2+ PVT failure was associated with higher PTS symptom severity (β = 0.15; p = .0008), self-reported depression (β = 0.16; p = .0005) and anxiety symptoms (β = 0.14; p = .0031), sleep disturbances (β = 0.10; p = .0363), and functional impairment (β = 0.14; p = .0017).
We further examined the associations of outcomes with failing the standalone MSVT versus failing an embedded measure within the WAIS-IV (RDS), CVLT-II, or BVMT-R. PTS symptom severity and self-reported depression and anxiety were higher among participants regardless of whether they failed the standalone MSVT or one of the embedded measures (p’s < .02). Any failure was associated with greater pain severity and worse sleep disturbances and functional impairment for both standalone and embedded measures (p’s < .02). For all psychiatric, somatic, and functioning outcomes, failure on the standalone MSVT was associated with a greater increase in impairment scores as compared to an embedded measure.
Discussion
With the high prevalence of head injuries sustained during post-9/11 conflicts, there is a demand for TBI assessment including neuropsychological evaluations. PVTs are necessary components of TBI assessment as they can detect suboptimal performances affecting the interpretation of the test data and ultimately clinical decision making and service connection status (Sweet et al., 2021). Approximately 15% of veterans with a TBI failed at least one PVT as did 10% of veterans without TBI. TBI was not associated with failing 2+ PVTs, further suggesting that history of TBI did not play a significant role in PVT failure in our sample. Our findings were similar to previous studies showing that PVT failure rates were much lower in a research setting (ranging from 1.4% to 5.4% failure rates in any one of the PVTs administered) compared to forensic or clinical settings where medical records may be used to determine disability compensation (Clark et al., 2014; Denning & Shura, 2017; McCormick et al., 2013). Only 17 veterans in the research sample failed 2+ PVTs; due to the low rate of failures, there are limits to generalizability in other study populations as well as clinical veteran populations where there may be motivation for secondary gain. It remains unclear what proportion of participants believed that there were no potential external incentives as a participant in research.
Proto et al. (2014) suggested that failing even one PVT, specifically the WMT, could invalidate neuropsychological results; however, our findings strengthen the recommendation of using a threshold of 2+ PVT failures for detecting noncredible cognitive performances in a veteran research sample. This study examined the incidence of failure across PVTs from four different tests. Effect sizes were larger when comparing the no PVT failure group to the 2+ PVT failure group. Among our veteran research sample, those who failed multiple PVTs performed worse on most cognitive measures compared to those who failed one PVT, with medium to large effect sizes, suggesting that the cutoff of 2+ PVTs should be used to determine assessment invalidity in this population. Since performance and testing engagement may change over time and throughout the evaluation (Boone, 2009), clinicians are advised to use multiple PVTs (both standalone and embedded; Critchfield et al., 2019; Sweet et al., 2021) across various neuropsychological domains. Clinicians are also advised to use the appropriate cutoffs considering the sensitivity and specificity (as well as positive and negative predictive value) of measures in a given population (e.g., intellectual disability, mild cognitive impairment or dementia (Dean et al., 2009), English as a second language (Lippa, 2018)). PVTs are designed to have greater specificity (at least 90%) than sensitivity because this minimizes the number of false positives and avoids erroneously labeling someone as potentially malingering.
In our sample, failing 2+ PVTs increases certainty that performance on cognitive testing is invalid and should not be interpreted, as results likely underestimate true ability (Boone, 2021). Providers who only use a single PVT failure as a minimum criterion may be overclassifying test performances as invalid (Young et al., 2016).
Post-9/11 veterans who failed 2+ PVTs had significantly higher rates of PTS and diagnosable mood disorders, greater severity of PTS symptoms (e.g., higher CAPS-IV scores), and higher depression and anxiety symptoms (based on self-report questionnaires) compared to those who passed. Greater physical pain, poorer sleep quality, and lower overall functional outcomes were also significantly associated with 2+ PVT failures. Results are consistent with several prior studies highlighting the link between poor PVT performances and clinical psychiatric factors (Armistead-Jehle, 2010; Miskey et al., 2020). Furthermore, having multiple PVT failures was also associated with a trio of diagnoses consisting of PTS, mTBI, and mood disorders (e.g., Major Depressive Disorder, Persistent Depressive Disorder), also known as the deployment trauma phenotype (DTP; Lippa et al., 2015). In the current study, approximately 35% of those who failed 2+ PVTs had DTP, suggesting that these particular comorbid psychiatric conditions may be highly linked to poorer performance validity (Clark et al., 2014; Greiffenstein & Baker, 2008). Prior research has also suggested that DTP was linked to poorer functional and cognitive outcomes (Amick et al., 2018; Kim et al., 2022; Lippa et al., 2015). Even when adjusted for age and education, 2+ PVT failures were associated with greater PTS severity and self-reported depression/anxiety as well as sleep and functional impairment.
The only standalone measure, the MSVT, was comparable to the embedded PVT measures, as both were associated with negative clinical outcomes (Table 5).
Note. MSVT = Medical Symptom Validity Test; WAIS-IV = Wechsler Adult Intelligence Scale – Fourth Edition; CVLT-II = California Verbal Learning Test – Second Edition; BVMT-R = Brief Visuospatial Memory Test – Revised; SE = standard error; CAPS-IV = Clinician-Administered PTSD Scale for DSM-IV; PTS = posttraumatic stress; DASS-21 = Depression, Anxiety, and Stress Scale – 21 items; PSQI = Pittsburgh Sleep Quality Index; WHODAS = World Health Organization Disability Assessment Scale.
Models are adjusted for age and education.
To further verify that psychiatric factors predicted PVT failure rates, a sensitivity analysis was conducted that removed those who failed SVTs at each of the three cutoffs; it showed that 2+ PVT failures were associated with greater PTS severity, depression symptoms, and functional impairment. When removing SVT failures at the most conservative cutoff score (≥28; denoting highly probable symptom exaggeration), PVT failures were additionally linked to increased self-reported anxiety and sleep problems. The results highlight that the relationship between poor PVT performance and psychiatric factors remained in the absence of those who were prone to highly probable symptom exaggeration (Table 6). In sum, findings indicate that clinicians should consider clinical diagnoses and clinical symptom severity when interpreting validity measures, given their strong association with PVT failures.
Note. PVT = performance validity test; SE = standard error; CAPS-IV = Clinician-Administered PTSD Scale for DSM-IV; PTS = posttraumatic stress; DASS-21 = Depression, Anxiety, and Stress Scale – 21 items; PSQI = Pittsburgh Sleep Quality Index; WHODAS = World Health Organization Disability Assessment Scale.
* A PVT failure was defined as (1) a Medical Symptom Validity Test (MSVT) immediate recognition, delayed recognition, or consistency index ≤85%; (2) a Wechsler Adult Intelligence Scale – Fourth Edition (WAIS-IV) reliable digit span score ≤6; (3) a California Verbal Learning Test – Second Edition (CVLT-II) forced choice score ≤14; or (4) a Brief Visuospatial Memory Test – Revised (BVMT-R) recognition discrimination index score ≤4, recognition hits score ≤4, or percent retained ≤58%.
Models are adjusted for age and education.
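The failure criteria in the table note can be expressed as a simple counting rule. The following Python sketch is illustrative only and is not the study's analysis code: the dictionary keys and function names are hypothetical, it assumes failures are counted once per test (so the maximum is four), and it assumes missing indices are simply skipped.

```python
# Cutoffs taken from the table note; a score at or below the cutoff is a failure.
# Key names are hypothetical labels chosen for this sketch.
PVT_CUTOFFS = {
    "MSVT": {
        "immediate_recognition": 85,  # percent
        "delayed_recognition": 85,    # percent
        "consistency": 85,            # percent
    },
    "WAIS-IV": {"reliable_digit_span": 6},
    "CVLT-II": {"forced_choice": 14},
    "BVMT-R": {
        "recognition_discrimination": 4,
        "recognition_hits": 4,
        "percent_retained": 58,       # percent
    },
}


def count_failed_tests(scores):
    """Count how many of the four tests have at least one failed index.

    Assumption: a test counts as one failure no matter how many of its
    indices fall at or below cutoff; indices absent from `scores` are skipped.
    """
    failed = 0
    for test, cutoffs in PVT_CUTOFFS.items():
        test_scores = scores.get(test, {})
        if any(
            name in test_scores and test_scores[name] <= cutoff
            for name, cutoff in cutoffs.items()
        ):
            failed += 1
    return failed


def pvt_group(scores):
    """Map a participant's scores to the 0 / 1 / 2+ failure groups."""
    n = count_failed_tests(scores)
    if n == 0:
        return "0"
    return "1" if n == 1 else "2+"
```

For example, a participant with a reliable digit span of 6 and passing scores elsewhere would fall in the one-failure group under these assumptions.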
Past literature showed that those who failed the Validity-10 were more likely to also fail PVTs (Jurick et al., 2016). Aase et al. (2021) examined the concordance of the Validity-10 (pass or fail) with embedded PVTs (pass or fail), including the CVLT-II forced choice and total trials 1-5, BVMT-R recognition discrimination score, and CPT-II Commissions score, and found associations at the ≥13 and ≥19 cutoff scores (moderate effect sizes), but not at ≥23. In the current study, failure of 2+ PVTs was associated with failure on the Validity-10 of the NSI at the ≥19 and ≥23 cutoffs, denoting possible and probable exaggeration, respectively. Approximately 38% of veterans who failed 2+ PVTs also failed the Validity-10 at the “possible” exaggeration level. However, 2+ PVT failures were not associated with the ≥28 cutoff score, which indicates highly probable exaggeration; this may be attributable to the small number of participants (n = 7) who met the ≥28 threshold.
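The Validity-10 cutoffs discussed here can be read as ordered thresholds. The short Python sketch below is an illustration rather than the study's scoring procedure; the function name and the "below cutoff" label are hypothetical, and only the three cutoffs with labels given in the text (≥19, ≥23, ≥28) are used.

```python
# Thresholds and labels as described in the text, checked from highest to lowest.
VALIDITY10_CUTOFFS = [
    (28, "highly probable"),
    (23, "probable"),
    (19, "possible"),
]


def validity10_level(score):
    """Return the highest exaggeration level whose cutoff the score meets.

    "below cutoff" is a label introduced for this sketch only.
    """
    for cutoff, label in VALIDITY10_CUTOFFS:
        if score >= cutoff:
            return label
    return "below cutoff"
```

Checking the highest cutoff first means a score of 28 or more is reported only at the most conservative level, although it also exceeds the two lower cutoffs.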
The Validity-10 had moderate correspondence in predicting those who failed 2+ PVTs, but low correspondence in predicting those who failed at least one PVT. The latter finding was consistent with Bomyea et al. (2020), which demonstrated that the Validity-10 is a poor predictor of those failing at least one PVT (e.g., TOMM or CVLT). Several studies have demonstrated that SVTs and PVTs measure separate but related constructs (Boe & Evald, 2022; Ord et al., 2021). Therefore, both SVTs and PVTs are essential components in neuropsychological testing and the inclusion of both approaches should be considered.
Failure of SVTs may reflect high clinical distress, a cry for help (Berry et al., 1996; Miskey et al., 2020), and/or psychiatric symptomatology. Specifically, elevated NSI Validity-10 scores were strongly linked to increased PTS (Aase et al., 2021) and depression symptoms, but not to TBI (Bomyea et al., 2020). Although it may be utilized as a screening tool, the Validity-10 has limitations, as it has low sensitivity and is not as robust as standalone SVTs (Boone, 2021; Vanderploeg et al., 2014). If it is failed, clinicians are recommended to follow up with other well-validated SVTs (Lange et al., 2015).
One study showed that having a PTS diagnosis, greater symptom severity, and poorer distress tolerance were associated with failure on the Structured Inventory of Malingered Symptomatology (SIMS), a self-report standalone symptom validity measure (Miskey et al., 2020). The authors further found that veterans with PTS and depression (which was prevalent in our sample) may have difficulty dealing with strong and negative emotions, leading to symptom exaggeration. Depression may also contribute to symptom exaggeration, as negative cognitive biases may exacerbate symptom report (Agnoli et al., 2023; Armistead-Jehle, 2010; McCormick et al., 2013). Our findings highlight the need for clinical assessments, including the CAPS-4/5 and SCID-4/5, to also include separate validity measures, as overreporting can bias findings. Although some studies utilize the SIMS and the MMPI, which has specific validity indicators (e.g., fake bad scale; Frueh et al., 2000; Miskey et al., 2020), including symptom validity measures with clinical assessments is not the current standard of care in psychological or psychiatric assessment.
Limitations
The use of the NSI Validity-10 scale as the only SVT is a relative weakness of the study. Future studies should include standalone, well-validated SVT measures, as they are a robust method for determining response biases. Also, the percent retention from the BVMT-R was used as one of the embedded PVTs included in the analyses; the percent retained does not have a fixed range and can widen based on the amount of information encoded on previous learning trials, resulting in a highly variable range of scores. Additionally, findings from veteran research settings, where secondary gain is reduced, may not be generalizable to common clinical settings.
Conclusions
Failure of 2+ PVTs may best indicate invalid neuropsychological profiles in a sample of post-9/11 veterans who were informed that their research evaluation would not impact establishing or increasing disability benefits. Failure of PVTs is associated with greater clinical psychiatric diagnoses rather than TBI history. Additionally, PVT failures predicted SVT failure and vice versa. Validity measures are crucial for both neuropsychological testing and psychiatric assessments as general practice. Converging data from PVTs and SVTs may be helpful in determining the credibility of both neuropsychological evaluations and subjective reports, leading to accurate interpretations and the most appropriate treatments.
Acknowledgments
This work was supported by the Translational Research Center for TBI and Stress Disorders (TRACTS), a VA Rehabilitation Research and Development Traumatic Brain Injury National Research Center (B3001-C).
Competing interests
None.