“Facts are stubborn things; and whatever may be our wishes, our inclinations, or the dictates of our passion, they cannot alter the state of facts and evidence.”
– John Adams
Although we have many important areas of agreement with Sackett and colleagues [Footnote 1], we must address two issues that form the backbone of the focal article. First, we explain why range restriction corrections in concurrent validation are appropriate, describing the conceptual basis for range restriction corrections and highlighting some pertinent technical issues that should elicit skepticism about the focal article’s assertions. Second, we disagree with the assertion that the operational validity of cognitive ability is much lower than previously reported. We conclude with some implications for applied practice.
Conceptual basis for range restriction corrections
Range restriction results in underestimation of criterion-related validities (Carretta & Ree, 2022). The formulae for range restriction corrections are well known and uncontroversial (Schmidt et al., 1976; Sackett & Yang, 2000). The focal article, following the logic offered in Sackett et al. (2022), purported that most range restriction corrections in previous meta-analyses of predictor validities were inappropriate. In particular, Sackett et al. challenged the use of artifact distributions for corrections in validity generalization by asserting that the range restriction data used are not typically representative. Consequently, they reasoned that corrections based on what they viewed as unrepresentative distributions overestimated actual validities. This is an important challenge that requires empirical evidence.
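As a reference point for the discussion that follows, the classical correction for direct range restriction on the predictor (often called Thorndike's Case II) can be sketched in a few lines of Python. This is an illustrative sketch rather than code from any of the cited sources; the function name `correct_direct_rr` and the convention u = restricted SD / unrestricted SD (so that U = 1/u) are assumptions made here, and the input values are hypothetical.

```python
import math


def correct_direct_rr(r_restricted: float, u: float) -> float:
    """Thorndike Case II correction for direct range restriction.

    r_restricted: validity observed in the restricted (e.g., incumbent) sample.
    u: restricted SD / unrestricted SD of the predictor (assumed convention).
    """
    if not 0.0 < u <= 1.0:
        raise ValueError("u must be in (0, 1] under this convention")
    big_u = 1.0 / u  # U = unrestricted SD / restricted SD
    return (r_restricted * big_u) / math.sqrt(
        1.0 + r_restricted ** 2 * (big_u ** 2 - 1.0)
    )


# Hypothetical inputs, chosen only to illustrate the direction of the effect.
observed_r = 0.25
for u in (1.0, 0.8, 0.6):
    print(f"u = {u:.1f} -> corrected r = {correct_direct_rr(observed_r, u):.3f}")
```

With u = 1 the function returns the observed value unchanged; as u decreases, the corrected estimate rises, which is why the choice of u carries so much weight in this debate.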
What evidence was offered? Fundamentally, their reasoning was conceptual. Sackett et al. stated, “studies containing the needed information to compute a U ratio [to correct for range restriction] come solely from predictive studies,” and they thus argued that the use of those U ratios would overcorrect concurrent validities because they believed that range restriction is not a concern in concurrent studies. Their revisions to meta-analytic validities of predictors were based on the proportion of predictive and concurrent studies in each meta-analysis, where range restriction was assumed not to affect concurrent studies. Sackett et al. did not use real-world data to probe the degree of range restriction in concurrent studies of each predictor they included in their review. Rather, they assumed, without consulting the existing empirical evidence, that direct and indirect range restriction could not be influential in concurrent studies. As we show below, a fanciful and limited conception of how concurrent studies are conducted led to erroneous conclusions. In the employee selection literature, there are large numbers of concurrent studies that show empirical evidence of range restriction. In practice, employers periodically validate predictors that are or have been in use. [Footnote 2]
Concurrent studies are affected by range restriction. Sackett et al. suggested that when validation reports do not explicitly contain descriptions of operational range restriction mechanisms, it is inappropriate to correct for the effects of this artifact. Absence of evidence in narrative study descriptions is not evidence of absence. Indeed, what matters is the empirical story that predictor distributions tell: for example, are standard deviations lower than those found in less restricted populations, such as applicants or the labor force at large? If such comparisons indicate reduced variability, regardless of the mechanism that produced such homogeneity (e.g., organizational selection, placement decisions, gravitation to positions, occupational turnover forces), range restriction corrections are essential to uncover operational validities. The focal article’s conclusions and recommendations are based on flawed conceptual reasoning and empirically untested hypotheses, and are applied across the board, without nuance, to dozens of predictor meta-analyses. Accordingly, its conclusions about employee selection predictors are speculations.
Range restriction corrections are needed and should be applied to the vast majority of concurrent validity estimates. Many concurrent validity study reports contain information or standard deviations necessary for appropriate range restriction corrections. Using reported concurrent study standard deviations in range restriction corrections in individual studies and range restriction distributions in psychometric meta-analyses is wholly appropriate.
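The comparison described above, a study's predictor standard deviation set against that of a less restricted reference group, amounts to computing a u ratio per study. A minimal sketch follows; the function names and all numbers are hypothetical and serve only to show the arithmetic, and the simple unweighted mean is a simplification of how artifact distributions are built in psychometric meta-analysis.

```python
from statistics import mean


def u_ratio(sd_sample: float, sd_reference: float) -> float:
    """Ratio of a study's predictor SD to a less restricted reference SD."""
    if sd_sample <= 0 or sd_reference <= 0:
        raise ValueError("standard deviations must be positive")
    return sd_sample / sd_reference


def summarize_u(sample_sds: list[float], sd_reference: float) -> dict:
    """Per-study u ratios plus their unweighted mean (a simplification)."""
    ratios = [u_ratio(sd, sd_reference) for sd in sample_sds]
    return {
        "u_ratios": ratios,
        "mean_u": mean(ratios),
        # Count of studies whose SD is below the reference SD, i.e. u < 1.
        "n_restricted": sum(1 for u in ratios if u < 1.0),
    }


# Hypothetical concurrent-study SDs against a hypothetical reference SD of 20.
summary = summarize_u([13.0, 15.5, 18.0, 20.5], sd_reference=20.0)
print(summary)
```

A study whose u ratio falls clearly below 1 shows reduced variability whatever its narrative description says about selection mechanisms.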
Technical issues in range restriction corrections
Correcting for range restriction in validation studies can be fraught with many technical issues given the multiple forms of range restriction that inevitably occur in applied field settings (see Sackett & Yang, 2000, for an informative and insightful summary of this complex literature). It is not just direct range restriction on the predictor or indirect range restriction due to selection on a third variable that affects validity (Carretta & Ree, 2022). Range restriction on the predictor is not limited to truncation on one end of its score continuum either: there can be restriction at both the low and high ends of score distributions. For example, individuals being let go during or after probationary or training periods, employees being promoted due to excellent performance, or top choice candidates rejecting offers can also artificially reduce the predictor’s range, depressing validity coefficients (Murphy, 1986).
Given the reality of multiple forms of indirect range restriction that inevitably operate in all validation studies and the inability of classical correction formulae to address them, in a foundational article published in the Journal of Applied Psychology, Hunter et al. (2006) presented a multi-step process model and formulae to make unbiased range restriction corrections. [Footnote 3] Other excellent subsequent papers on range restriction also highlighted the importance of addressing both direct and indirect range restriction in employee selection as well as in organizational science research (e.g., Dahlke & Wiernik, 2020; Le et al., 2016). More directly relevant to the debate at hand, Oh, Le, and Roth (in press) addressed some of Sackett et al.’s errors pertaining to range restriction and questioned their assumptions and reasoning. Based on the conceptual, technical, and empirical grounds noted above and the referenced articles, we remain skeptical of the focal article’s conclusions and recommendations.
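The multi-step logic of an indirect range restriction correction can be sketched as follows. This is a simplified illustration of the general approach, written from the classical test theory assumption that error variance is the same in restricted and unrestricted groups; it is not a reproduction of the procedure published by Hunter et al. (2006), and the function name, argument names, and numeric inputs are all hypothetical.

```python
import math


def correct_indirect_rr(r_xy_i: float, u_x: float, rxx_a: float, ryy_i: float) -> float:
    """Sketch of a multi-step correction for indirect range restriction.

    r_xy_i: observed validity in the restricted sample.
    u_x:    restricted SD / unrestricted SD of observed predictor scores.
    rxx_a:  predictor reliability in the unrestricted population.
    ryy_i:  criterion reliability in the restricted sample.
    Returns an operational validity estimate (predictor unreliability retained).
    """
    # Step 1: range restriction ratio on predictor true scores.
    u_t_sq = (u_x ** 2 - (1.0 - rxx_a)) / rxx_a
    if u_t_sq <= 0.0:
        raise ValueError("inputs imply a non-positive true-score variance ratio")
    # Step 2: predictor reliability implied for the restricted sample.
    rxx_i = 1.0 - (1.0 - rxx_a) / u_x ** 2
    # Step 3: remove measurement error in both variables (restricted sample).
    r_tp_i = r_xy_i / math.sqrt(rxx_i * ryy_i)
    # Step 4: apply the Case II formula using the true-score ratio.
    big_u_t_sq = 1.0 / u_t_sq
    rho_tp_a = (r_tp_i * math.sqrt(big_u_t_sq)) / math.sqrt(
        1.0 + r_tp_i ** 2 * (big_u_t_sq - 1.0)
    )
    # Step 5: reintroduce predictor unreliability for operational validity.
    return rho_tp_a * math.sqrt(rxx_a)


# Hypothetical inputs for illustration only.
print(round(correct_indirect_rr(0.25, u_x=0.7, rxx_a=0.8, ryy_i=0.52), 3))
```

The key design point is that restriction is applied at the true-score level: because measurement error is not restricted, the true-score ratio in step 1 is smaller than the observed-score ratio `u_x`, so the correction is larger than a direct correction using `u_x` alone would suggest.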
We urge practitioners to systematically consider how range restriction affects variability of both their predictors and criteria, rather than relying on a uniform, unsubstantiated assumption that range restriction is not a problem in concurrent validities. An important point is that multiple forms of range restriction affect both concurrent and predictive studies. Any single form of range restriction correction is likely to be dwarfed by the multiple types of range restriction that we fail to correct in empirical estimations of operational validity. This calls into question Sackett and colleagues’ conclusion that widely used selection predictors have significantly lower operational validity for overall performance. In particular, as we demonstrate below, their inference that “cognitive ability is no longer the stand-out predictor that it was in the prior work” (p. 5) is untenable.
Operational validity of cognitive ability tests
Assumed differential range restriction in concurrent and predictive studies is essential to Sackett et al.’s reasoning that concurrent validities should not be corrected for range restriction. Yet, there is ample empirical evidence that for ability tests, concurrent and predictive validities are similar (e.g., Bemis, 1968; Jensen, 1980; Hartigan & Wigdor, 1989), a fact noted by Schmidt et al. (1985): “Contrary to general belief, predictive and concurrent studies suffer from range restriction to about the same degree” (p. 750). Nevertheless, Sackett and colleagues, without real-world data, single out concurrent studies, argue that there are no range restriction effects in such studies, and consequently recommend and apply no range restriction corrections.
We describe here how this distorts their operational validity estimate for cognitive ability tests. The bulk of their cognitive ability validity re-estimation relied on validities from the United States Employment Service’s (USES) General Aptitude Test Battery (GATB) datasets. Sackett et al. (2022) deemed Hunter’s estimation of GATB range restriction U values “implausible” and “not trustworthy” (p. 2054) because they asserted that such levels of range restriction could not possibly exist in concurrent studies, which they assumed would be only minimally affected by indirect range restriction. However, the GATB technical manual reports all means and standard deviations for each specific sample, alongside observed validities (U.S. Department of Labor, 1979). Hunter’s range restriction corrections used these data, forming an artifact distribution based on exactly what was reported for each study. [Footnote 4] To empirically determine the degree of range restriction in each sample, Hunter compared validation sample standard deviations to the unrestricted working population standard deviation. Sackett et al. deemed these data dubious, arguing that the magnitude of actual range restriction was greater than what they thought it should be, based on their idiosyncratic description of GATB’s concurrent studies. Curiously, they made corrections for ability test range restriction for neither concurrent nor predictive studies. [Footnote 5]
Sackett and colleagues did not analyze any GATB predictor standard deviations but instead hypothesized an effect, assumed it to be correct, and proceeded with no corrections for range restriction, without verifying their hypothesis with actual data. When empirical data from validation studies show empirical evidence of restriction, such as reported in the GATB manual, applying no range restriction corrections disregards real-world data. This is not sound practice for research or applications.
The distinction between concurrent and predictive validation is more apparent than real. Regarding GATB studies (which make up 88% of the studies on which Sackett et al.’s re-estimates of cognitive ability validity are based), the National Academy of Sciences’ report on the GATB stated that “the predictive/concurrent distinction is too crude to be of real value” (Hartigan & Wigdor, 1989, p. 154). Hartigan and Wigdor indicated that “applicants are screened using either the GATB itself or some other predictor, and thus range restriction is likely in both predictive and concurrent studies” (p. 154).
For range restriction corrections, we note that Hartigan and Wigdor (1989) suggested that working population standard deviations may be larger than those of job-specific applicant pools, inflating the estimated degree of range restriction. However, Sackett and Ostgaard (1994) presented excellent empirical evidence that applicant standard deviations are on average only 10% smaller than population norms, assuaging inflationary concerns in range restriction corrections (see also Lang et al., 2010, for virtually identical findings using independent data from Germany). Therefore, we posit that even if unrestricted norm group standard deviations that are 10% lower were used in range restriction corrections, the resulting corrected operational validities for cognitive ability would be more accurate than those presented by Sackett et al., which imposed an arbitrary and implausible constraint of no range restriction for all ability test validity studies. [Footnote 6]
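The sensitivity argument above can be checked with simple arithmetic: shrink the unrestricted reference SD by 10% and see how much the corrected validity moves. The sketch below uses the classical direct range restriction (Case II) formula purely for illustration, not the corrections Hunter applied; the observed validity and SDs are hypothetical values chosen for the example.

```python
import math


def case2_correct(r: float, sd_restricted: float, sd_unrestricted: float) -> float:
    """Case II correction expressed directly in terms of the two SDs."""
    big_u = sd_unrestricted / sd_restricted
    return (r * big_u) / math.sqrt(1.0 + r ** 2 * (big_u ** 2 - 1.0))


def sensitivity(r: float, sd_restricted: float, sd_norm: float, shrink: float = 0.10) -> dict:
    """Compare corrections using the norm SD and a norm SD reduced by `shrink`."""
    return {
        "uncorrected": r,
        "norm_sd": case2_correct(r, sd_restricted, sd_norm),
        "reduced_sd": case2_correct(r, sd_restricted, sd_norm * (1.0 - shrink)),
    }


# Hypothetical values: observed r = .25, incumbent SD = 13, norm SD = 20.
result = sensitivity(0.25, sd_restricted=13.0, sd_norm=20.0)
for label, value in result.items():
    print(f"{label}: {value:.3f}")
```

Under these made-up inputs, the estimate based on the reduced reference SD falls between the uncorrected value and the norm-based estimate, which mirrors the direction of the argument in the text.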
Operational validity of cognitive ability tests since 2000
Sackett et al. argued that Schmidt and Hunter’s (1998) estimate of general cognitive ability’s validity (ρ = .51) relied on data more than 40 years old. They referenced an unpublished conference poster that examined cognitive ability validities since 2000, based on 114 studies (Griebie et al., 2022). That poster indicated a meta-analytic validity of .24, but only unreliability in the criterion was corrected for. No range restriction corrections were applied to the 82 concurrent validity studies in their database. We cannot provide an in-depth analysis of this meta-analysis given that we only have access to the poster that was presented at the SIOP conference and the brief summary offered by Sackett and colleagues. However, it appears that the authors’ disbelief that range restriction can affect concurrent validities has produced a severe underestimation of operational validity for cognitive ability tests. The large true standard deviation accompanying the mean estimate (SD ρ = .15) is telling and suggests an inflation in variability due to unaddressed range restriction.
We also have concerns about the studies that may constitute the Griebie et al. database. By 2000, there was near scientific consensus about cognitive ability for employee selection (Reeve & Hakel, 2002; Viswesvaran & Ones, 2002); numerous meta-analyses had established excellent validity for cognitive ability tests (Dilchert, 2018). Could it be that (a) studies that show contrary findings and (b) studies that show incremental validity for novel predictors over cognitive ability dominated the literature and hence biased the Griebie et al. database, which was limited to the past 21 years? Sackett et al.’s summary of findings from a potentially distorted database cannot be the basis for scientific revisions to the relevance of cognitive ability for job performance or for sweeping practice recommendations to pivot away from cognitive ability assessments in employee selection. Voluminous supporting data and intense scientific scrutiny are required.
Using the potentially distorted database from Griebie et al. (2022), the focal article offers two post hoc explanations for the lower cognitive ability validity they reported. First, the authors posit that there were fewer manufacturing jobs in their meta-analytic database, reflecting a reduced role of manufacturing jobs in the economy. [Footnote 7] Second, Sackett et al. suggest that a “greater emphasis on less cognitively loaded interpersonal aspects of work” could have resulted in the lower validities. However, cognitive ability has greater importance in contemporary economic systems. Increasing complexity of jobs and workplaces (e.g., technological, economic, culturally diverse) as well as increasing knowledge and speed requirements of jobs suggest an increased importance of cognitive abilities. Information processing and learning ability are essential, more than ever, in the information age. Inconsistent with their rationale for the lower validity of cognitive ability they reported, Sackett et al. asserted that job knowledge tests and work sample tests are among the top predictors based on criterion-related validity. Cognitive ability is a primary causal determinant of both and is highly correlated with acquiring domain-specific knowledge (Kuncel et al., 2010; McCloy et al., 1994; Schmidt et al., 1986). Surprisingly, Sackett et al. highlighted “study and practice” as important determinants of knowledge acquisition and skill development (“Measures such as work samples and knowledge tests lend themselves more readily to skill development via study and practice,” Sackett et al.). Yet, “meta-analyses demonstrate that deliberate practice fails to account for all, nearly all, or even most of the variance in expert performance, and often even explains only a surprisingly small proportion of the total variance” (Ullén et al., 2015, p. 435). In contrast, the influence of cognitive ability in knowledge and skill acquisition and expertise development is strong and well established (e.g., see Ullén et al., section “Expert performance and cognitive ability”).
Takeaways for practice
It is indeed true that typical meta-analyses of predictors contain a mixture of predictive and concurrent studies. Neither Sackett et al. (2022) nor the focal article examined actual, real-world data contributing to meta-analyses to determine and appropriately address the degree of range restriction in concurrent studies. They assumed that concurrent studies would not be subject to range restriction. More perilously, they ignored USES data and Hunter’s analyses of those data, which presented a standard deviation for each concurrent and predictive sample included in the GATB database and provided a reasonable empirical basis for range restriction corrections. Researchers should not knowingly discount practitioner data and knowingly report an underestimate of operational validity. Such underestimation can affect decisions about job applicants and shift staffing to rely on predictors with sparse supporting evidence (e.g., emotional intelligence, games, and gamified assessments) or no empirical support at all (e.g., physiognomic analysis), degrading evidence-based practice.
In the final analysis, we recommend organizations and practitioners use cognitive ability validity estimates reported by Hunter et al. (2006). Operational validities are summarized in Table 1, separately for different job complexity levels. (These data are for 425 jobs that used overall job performance as the criterion. For the summary they offered for 515 jobs from the GATB database, Sackett et al. included these and an additional 90 studies where the criterion was training performance.) Unlike Sackett et al., Hunter and colleagues carefully and appropriately addressed cumulative effects of range restriction. These operational validities and their associated standard deviations correctly summarize cognitive ability validities from USES’ GATB data for overall job performance. They refute Sackett et al.’s conclusions about cognitive ability tests.
Table 1 note. Analyses are for the US Employment Service’s General Aptitude Test Battery’s (GATB) measure of general mental ability. Findings are summarized from Hunter (1983) and Hunter et al. (2006).
k = number of studies; mean r = sample-size-weighted mean r; SDr = observed standard deviation of validities; SD ρ = true standard deviation (i.e., standard deviation corrected for statistical artifacts, as indicated in the respective column heading); mean ρ = operational validity, where mean rs are corrected for statistical artifacts, as indicated in the respective column heading. a = job complexity family 2; b = job complexity family 1 (most complex); c = job complexity family 3; d = job complexity family 4; e = job complexity family 5 (least complex); f = equally weighted data combined from professional and managerial, complex setting up, and technician and skilled jobs; g = also corrected for criterion unreliability; h = Beatty et al. (2014) indicated their preference for these indirect range restriction corrections: “We conclude that Hunter et al.’s correction should generally be preferred when compared to its common alternative, Thorndike’s Case II correction for direct range restriction” (p. 587).
Sackett et al. maintain that conservative estimates of operational validity are prudent. In this they may not be alone. However, we restate what we have previously written about conservative estimates in another context: “Many researchers maintain that being conservative is good science, but conservative estimates are by definition biased estimates. We believe it is more appropriate to aim for unbiased estimates because the research goal is to maximize the accuracy of the final estimates” (Viswesvaran et al., 1996, p. 567). Science and evidence-based practice both require accuracy. Research and applied practice should aim for accuracy by considering the entirety of empirical evidence surrounding each predictor and appropriately correcting for range restriction. The reports of our best predictors’ demise are greatly exaggerated.