Proxy failure and poor measurement practices in psychological science

Wendy C. Higgins; Alexander J. Gillett; David M. Kaplan; Robert M. Ross

doi:10.1017/S0140525X2300287X

Published online by Cambridge University Press: 13 May 2024

Wendy C. Higgins*: Affiliation:
School of Psychological Sciences, Macquarie University, Macquarie Park, NSW, Australia. https://researchers.mq.edu.au/en/persons/wendy-higgins
Alexander J. Gillett: Affiliation:
Department of Philosophy, Macquarie University, Macquarie Park, NSW, Australia. alexander.gillett@mq.edu.au https://researchers.mq.edu.au/en/persons/alexander-gillett
David M. Kaplan: Affiliation:
School of Psychological Sciences, Macquarie University, Macquarie Park, NSW, Australia. david.kaplan@mq.edu.au https://researchers.mq.edu.au/en/persons/david-michael-kaplan
Robert M. Ross: Affiliation:
Department of Philosophy, Macquarie University, Macquarie Park, NSW, Australia. robert.ross@mq.edu.au https://researchers.mq.edu.au/en/persons/robert-ross
*: Corresponding author: Wendy C. Higgins; Email: wendy.higgins@mq.edu.au

Abstract

We argue that proxy failure contributes to poor measurement practices in psychological science and that a tradeoff exists between the legibility and fidelity of proxies whereby increasing legibility can result in decreased fidelity.

Type: Open Peer Commentary
Information: Behavioral and Brain Sciences, Volume 47, 2024, e75

DOI: https://doi.org/10.1017/S0140525X2300287X
Copyright: Copyright © The Author(s), 2024. Published by Cambridge University Press

John et al. offer insights about proxy failure across a range of disciplines (see their Table 1). We argue that this proxy failure lens can also be fruitfully applied in psychological science, where construct validity serves as a proxy for the goal of measuring unobservable psychological phenomena. Validated measurements (i.e., scores on self-report questionnaires and tests of ability) then serve as proxies for higher-order goals, such as improving clinical outcomes.

The Standards for Educational and Psychological Testing state: “A sound validity argument integrates various strands of evidence into a coherent account of the degree to which existing evidence and theory support the intended interpretation of test scores for specific uses” (American Educational Research Association et al., 2014, p. 21). Further, because the validity of test scores can vary depending on the properties of the sample being examined, “[b]est practice is to estimate both reliability and validity, when possible, within the researcher's sample or samples” (Appelbaum et al., 2018, p. 9). Unfortunately, there is a growing body of evidence demonstrating that studies across psychological science routinely accept measurements as valid without sufficient validity evidence (Alexandrova & Haybron, 2016; Flake, Pek, & Hehman, 2017; Higgins, Ross, Polito, & Kaplan, 2023b; Hussey & Hughes, 2020; Shaw, Cloos, Luong, Elbaz, & Flake, 2020; Slaney, 2017), including measurements used for important clinical applications such as diagnosing and treating depression (Fried, Flake, & Robinaugh, 2022).

Building on the work of John et al., we propose that the proxy failure framework highlights a key cause of inadequate construct validity evidence in psychological science: When test scores become targets, focus is diverted away from the relationship between test scores and the underlying psychological constructs they are intended to measure. This, we suggest, can result in divergence between test scores and psychological phenomena. For instance, tests are sometimes used in different populations without considering whether the relationship between test scores and psychological constructs holds across populations.

Consider the Reading the Mind in the Eyes Test (Eyes Test; Baron-Cohen, Wheelwright, Hill, Raste, & Plumb, 2001), which is widely used as a measure of social cognitive ability in samples drawn from many clinical and nonclinical populations and countries (Pavlova & Sokolov, 2022). Despite the near universal practice of calculating a single sum score for the 36-item Eyes Test, there are two key pieces of evidence that the structural properties of Eyes Test scores vary across samples and that the interpretation of sum scores is not always supported. First, factor analysis studies spanning multiple language versions of the Eyes Test have reported poor unidimensional model fit (e.g., Dordevic et al., 2017; Higgins, Ross, Langdon, & Polito, 2023a; Olderbak et al., 2015; Redondo & Herrero-Fernández, 2018; Topić & Kovačević, 2019), and it has even been found that the factor structure of Eyes Test scores for different ethnic and linguistic groups within the same country can vary (Van Staden & Callaghan, 2021). Second, a recent meta-analysis identified substantial variation in the internal consistency estimates of Eyes Test scores across samples, with half falling below the level conventionally taken to be acceptable (Kittel, Olderbak, & Wilhelm, 2022). Yet, Eyes Test sum scores are frequently compared between populations, with inferences drawn about relative levels of social cognitive ability.
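To make the internal consistency point concrete, the following is a minimal sketch, in Python, of estimating Cronbach's alpha separately within each sample rather than assuming that a published value carries over. The function name, the two tiny data sets, and the 0.70 cutoff are illustrative assumptions: the data are invented and are not Eyes Test responses, and 0.70 is only one commonly cited convention.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Estimate Cronbach's alpha for one sample.

    responses: list of respondents, each a list of item scores
    (all respondents must answer the same number of items).
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(r) != n_items for r in responses):
        raise ValueError("every respondent needs the same number of items")

    # Variance of each item across respondents (columns of the data).
    item_variances = [variance(item) for item in zip(*responses)]
    # Variance of the sum score across respondents.
    total_variance = variance([sum(r) for r in responses])
    if total_variance == 0:
        raise ValueError("sum scores do not vary; alpha is undefined")

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Invented binary (correct/incorrect) responses to a 4-item test.
sample_a = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
sample_b = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 1, 1, 1], [0, 0, 0, 1], [1, 1, 1, 1]]

ASSUMED_CUTOFF = 0.70  # illustrative convention, not a value from the commentary

alpha_a = cronbach_alpha(sample_a)  # about 0.80
alpha_b = cronbach_alpha(sample_b)  # about 0.10

for name, alpha in (("sample A", alpha_a), ("sample B", alpha_b)):
    verdict = "meets" if alpha >= ASSUMED_CUTOFF else "falls below"
    print(f"{name}: alpha = {alpha:.2f} ({verdict} the assumed cutoff)")
```

In this toy case the same four items yield sum scores that look internally consistent in one sample and not in the other, which is why comparing sum scores between the two samples would be hard to justify without sample-specific evidence.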

An outstanding question when the proxy failure framework is applied to psychological science is why studies that fail to meet existing construct validity evidence reporting standards are routinely published (Flake & Fried, 2020; Slaney, 2017). As John et al. note, a proxy must be simple enough for agents and regulators to identify and understand (i.e., must be legible), so that it can “feasibly be observed, rewarded, and pursued” (target article, sect. 3.2, para. 2). However, legibility can come at a cost to fidelity: “There is a natural human tendency to try to simplify problems by focusing on the most easily measurable elements. But what is most easily measured is rarely what is most important” (Muller, 2018, p. 23). Although psychological research standards state that the construct validity proxy should be based on a variety of sources of validity evidence (American Educational Research Association et al., 2014; Clark & Watson, 2019), some sources of evidence are more legible than others. We contend that the failure to enforce best practices in construct validation can be explained in part by the prioritisation of legibility over fidelity. This results in less legible sources of validity evidence being overlooked in a phenomenon we refer to as “proxy pruning.” Unfortunately, proxy pruning can result in test scores being accepted as valid that might be deemed invalid if other sources of validity evidence were examined.

A key example of proxy pruning in psychological science is ignoring the importance of providing construct validity evidence that is derived from a psychological theory (Alexandrova & Haybron, 2016; Bringmann, Elmer, & Eronen, 2022; Eronen & Bringmann, 2021; Feest, 2020). In particular, researchers often over-rely on the psychometric properties of test scores when establishing construct validity and avoid challenging theoretical questions about how to define psychological constructs (Alexandrova & Haybron, 2016; Clark & Watson, 2019). In addition to being important in their own right, these theoretical questions can be critical to interpreting psychometric properties. Consider the use of convergent validity evidence in the well-being literature where “it can seem as if nearly everything correlates substantially with nearly everything else” (Alexandrova & Haybron, 2016, p. 1104). Some researchers have deemed that “better” measures of well-being are those that correlate more strongly with measures of life circumstances, including income, governance, and freedom. However, without having done the hard conceptual work of determining what precisely is meant by “well-being” (e.g., “happiness,” “life satisfaction,” “flourishing,” “preference satisfaction,” “quality of life”; Alexandrova & Singh, 2022) it remains unclear why correlating more strongly with these particular variables is indicative of a more valid measure of well-being.

In sum, John et al.'s proxy failure framework offers insights into poor measurement practices in psychological science. However, we argue that the problem with proxies is not only that they become targets, but that they are pruned down to be legible targets, which decreases their fidelity, leaving them more susceptible to proxy failure.

Financial support

This work was supported by an Australian Government Research Training Program (RTP) Scholarship (W. C. H.), a Macquarie University Research Excellence Scholarship (W. C. H.), and the John Templeton Foundation (R. M. R., grant number 62631; A. J. G., grant number 61924).

Competing interest

None.

References

Alexandrova, A., & Haybron, D. M. (2016). Is construct validation valid? Philosophy of Science, 83(5), 1098–1109. https://doi.org/10.1086/687941

Alexandrova, A., & Singh, R. (2022). When well-being becomes a number. In Newfield, C., Alexandrova, A., & John, S. (Eds.), Limits of the numerical: The abuses and uses of quantification (pp. 181–199). University of Chicago Press.

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (Eds.). (2014). Standards for educational and psychological testing. American Educational Research Association.

Appelbaum, M., Cooper, H., Kline, R. B., Mayo-Wilson, E., Nezu, A. M., & Rao, S. M. (2018). Journal article reporting standards for quantitative research in psychology: The APA Publications and Communications Board task force report. American Psychologist, 73(1), 3–25. https://doi.org/10.1037/amp0000191

Baron-Cohen, S., Wheelwright, S., Hill, J., Raste, Y., & Plumb, I. (2001). The “Reading the Mind in the Eyes” test revised version: A study with normal adults, and adults with Asperger syndrome or high-functioning autism. Journal of Child Psychology and Psychiatry, 42(2), 241–251. https://doi.org/10.1017/S0021963001006643

Bringmann, L. F., Elmer, T., & Eronen, M. I. (2022). Back to basics: The importance of conceptual clarification in psychological science. Current Directions in Psychological Science: A Journal of the American Psychological Society, 31(4), 340–346. https://doi.org/10.1177/09637214221096485

Clark, L. A., & Watson, D. (2019). Constructing validity: New developments in creating objective measuring instruments. Psychological Assessment, 31(12), 1412–1427. https://doi.org/10.1037/pas0000626

Dordevic, J., Zivanovic, M., Pavlovic, A., Mihajlovic, G., Karlicic, I. S., & Pavlovic, D. (2017). Psychometric evaluation and validation of the Serbian version of “Reading the Mind in the Eyes” test. Psihologija, 50(4), 483–502. https://doi.org/10.2298/PSI170504010D

Eronen, M. I., & Bringmann, L. F. (2021). The theory crisis in psychology: How to move forward. Perspectives on Psychological Science, 16(4), 779–788. https://doi.org/10.1177/1745691620970586

Feest, U. (2020). Construct validity in psychological tests – The case of implicit social cognition. European Journal for Philosophy of Science, 10(1), 4. https://doi.org/10.1007/s13194-019-0270-8

Flake, J. K., & Fried, E. I. (2020). Measurement schmeasurement: Questionable measurement practices and how to avoid them. Advances in Methods and Practices in Psychological Science, 3(4), 456–465. https://doi.org/10.1177/2515245920952393

Flake, J. K., Pek, J., & Hehman, E. (2017). Construct validation in social and personality research: Current practice and recommendations. Social Psychological & Personality Science, 8(4), 370–378. https://doi.org/10.1177/1948550617693063

Fried, E. I., Flake, J. K., & Robinaugh, D. J. (2022). Revisiting the theoretical and methodological foundations of depression measurement. Nature Reviews Psychology, 1(6), 358–368. https://doi.org/10.1038/s44159-022-00050-2

Higgins, W. C., Ross, R. M., Langdon, R., & Polito, V. (2023a). The “Reading the Mind in the Eyes” test shows poor psychometric properties in a large, demographically representative U.S. sample. Assessment, 30(6), 1777–1789. https://doi.org/10.1177/10731911221124342

Higgins, W. C., Ross, R. M., Polito, V., & Kaplan, D. M. (2023b). Three threats to the validity of the Reading the Mind in the Eyes Test: A commentary on Pavlova and Sokolov (2022). Neuroscience and Biobehavioral Reviews, 147, 105088. https://doi.org/10.1016/j.neubiorev.2023.105088

Hussey, I., & Hughes, S. (2020). Hidden invalidity among 15 commonly used measures in social and personality psychology. Advances in Methods and Practices in Psychological Science, 3(2), 166–184. https://doi.org/10.1177/2515245919882903

Kittel, A. F. D., Olderbak, S., & Wilhelm, O. (2022). Sty in the mind's eye: A meta-analytic investigation of the nomological network and internal consistency of the “Reading the Mind in the Eyes” test. Assessment, 29(5), 872–895. https://doi.org/10.1177/1073191121996469

Muller, J. Z. (2018). The tyranny of metrics. Princeton University Press.

Olderbak, S., Wilhelm, O., Olaru, G., Geiger, M., Brenneman, M. W., & Roberts, R. D. (2015). A psychometric analysis of the reading the mind in the eyes test: Toward a brief form for research and applied settings. Frontiers in Psychology, 6, 1503. https://doi.org/10.3389/fpsyg.2015.01503

Pavlova, M. A., & Sokolov, A. A. (2022). Reading language of the eyes. Neuroscience and Biobehavioral Reviews, 140, 104755. https://doi.org/10.1016/j.neubiorev.2022.104755

Redondo, I., & Herrero-Fernández, D. (2018). Validation of the Reading the Mind in the Eyes Test in a healthy Spanish sample and women with anorexia nervosa. Cognitive Neuropsychiatry, 23(4), 201–217. https://doi.org/10.1080/13546805.2018.1461618

Shaw, M., Cloos, L. J. R., Luong, R., Elbaz, S., & Flake, J. K. (2020). Measurement practices in large-scale replications: Insights from many labs 2. Canadian Psychology/Psychologie Canadienne, 61(4), 289–298. https://doi.org/10.1037/cap0000220

Slaney, K. (2017). Validating psychological constructs: Historical, philosophical, and practical dimensions. Palgrave Macmillan.

Topić, M. K., & Kovačević, M. P. (2019). Croatian adaptation of the revised Reading the Mind in the Eyes Test (RMET). Psihologijske Teme, 28(2), 377–395. https://doi.org/10.31820/pt.28.2.8

Van Staden, J. G., & Callaghan, C. W. (2021). An evaluation of the reading the mind in the eyes test's psychometric properties and scores in South Africa-cultural implications. Psychological Research, 86(7), 2289–2300. https://doi.org/10.1007/s00426-021-01539-w