Proxy failure and poor measurement practices in psychological science

Published online by Cambridge University Press:  13 May 2024

Wendy C. Higgins*
Affiliation:
School of Psychological Sciences, Macquarie University, Macquarie Park, NSW, Australia wendy.higgins@mq.edu.au https://researchers.mq.edu.au/en/persons/wendy-higgins
Alexander J. Gillett
Affiliation:
Department of Philosophy, Macquarie University, Macquarie Park, NSW, Australia alexander.gillett@mq.edu.au https://researchers.mq.edu.au/en/persons/alexander-gillett
David M. Kaplan
Affiliation:
School of Psychological Sciences, Macquarie University, Macquarie Park, NSW, Australia david.kaplan@mq.edu.au https://researchers.mq.edu.au/en/persons/david-michael-kaplan
Robert M. Ross
Affiliation:
Department of Philosophy, Macquarie University, Macquarie Park, NSW, Australia robert.ross@mq.edu.au https://researchers.mq.edu.au/en/persons/robert-ross
*Corresponding author: Wendy C. Higgins; Email: wendy.higgins@mq.edu.au

Abstract

We argue that proxy failure contributes to poor measurement practices in psychological science and that a tradeoff exists between the legibility and fidelity of proxies whereby increasing legibility can result in decreased fidelity.

Type
Open Peer Commentary
Copyright
Copyright © The Author(s), 2024. Published by Cambridge University Press

John et al. offer insights about proxy failure across a range of disciplines (see their Table 1). We argue that this proxy failure lens can also be fruitfully applied in psychological science, where construct validity serves as a proxy for the goal of measuring unobservable psychological phenomena. Validated measurements (i.e., scores on self-report questionnaires and tests of ability) then serve as proxies for higher-order goals, such as improving clinical outcomes.

The American Psychological Association guidelines state: “A sound validity argument integrates various strands of evidence into a coherent account of the degree to which existing evidence and theory support the intended interpretation of test scores for specific uses” (American Educational Research Association et al., 2014, p. 21). Further, because the validity of test scores can vary depending on the properties of the sample being examined, “[b]est practice is to estimate both reliability and validity, when possible, within the researcher's sample or samples” (Appelbaum et al., 2018, p. 9). Unfortunately, there is a growing body of evidence demonstrating that studies across psychological science routinely accept measurements as valid without sufficient validity evidence (Alexandrova & Haybron, 2016; Flake, Pek, & Hehman, 2017; Higgins, Ross, Polito, & Kaplan, 2023b; Hussey & Hughes, 2020; Shaw, Cloos, Luong, Elbaz, & Flake, 2020; Slaney, 2017), including measurements used for important clinical applications such as diagnosing and treating depression (Fried, Flake, & Robinaugh, 2022).

Building on the work of John et al., we propose that the proxy failure framework highlights a key cause of inadequate construct validity evidence in psychological science: When test scores become targets, focus is diverted away from the relationship between test scores and the underlying psychological constructs they are intended to measure. This, we suggest, can result in divergence between test scores and psychological phenomena. For instance, tests are sometimes used in different populations without considering whether the relationship between test scores and psychological constructs holds across populations.

Consider the Reading the Mind in the Eyes Test (Eyes Test; Baron-Cohen, Wheelwright, Hill, Raste, & Plumb, 2001), which is widely used as a measure of social cognitive ability in samples drawn from many clinical and nonclinical populations and countries (Pavlova & Sokolov, 2022). Despite the near universal practice of calculating a single sum score for the 36-item Eyes Test, there are two key pieces of evidence that the structural properties of Eyes Test scores vary across samples and that the interpretation of sum scores is not always supported. First, factor analysis studies spanning multiple language versions of the Eyes Test have reported poor unidimensional model fit (e.g., Dordevic et al., 2017; Higgins, Ross, Langdon, & Polito, 2023a; Olderbak et al., 2015; Redondo & Herrero-Fernández, 2018; Topić & Kovačević, 2019), and it has even been found that the factor structure of Eyes Test scores for different ethnic and linguistic groups within the same country can vary (Van Staden & Callaghan, 2021). Second, a recent meta-analysis identified substantial variation in the internal consistency estimates of Eyes Test scores across samples, with half falling below the level conventionally taken to be acceptable (Kittel, Olderbak, & Wilhelm, 2022). Yet, Eyes Test sum scores are frequently compared between populations, with inferences drawn about relative levels of social cognitive ability.
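To make the internal consistency point concrete, the following is a minimal sketch of how Cronbach's alpha, one common internal consistency estimate, could be computed separately within each sample before sum scores are compared. The function name, the two tiny three-item data sets, and the 0.70 cutoff are illustrative assumptions of ours; they are not taken from the Eyes Test literature, and the cited meta-analysis may rely on other estimators or thresholds.

```python
from statistics import pvariance

# Assumed cutoff: 0.70 is a widely used convention, not a value from the source.
ACCEPTABLE_ALPHA = 0.70


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("alpha requires at least two items")
    # Variance of each item across respondents.
    item_variances = [
        pvariance([person[i] for person in responses]) for i in range(n_items)
    ]
    # Variance of the sum score across respondents.
    sum_score_variance = pvariance([sum(person) for person in responses])
    if sum_score_variance == 0:
        raise ValueError("sum scores have zero variance")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / sum_score_variance)


# Hypothetical item-level data (1 = correct, 0 = incorrect) for two samples.
sample_a = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
sample_b = [[1, 1, 0], [0, 0, 1], [1, 0, 1], [0, 1, 0], [1, 1, 1], [0, 0, 0]]

for name, sample in (("A", sample_a), ("B", sample_b)):
    alpha = cronbach_alpha(sample)
    verdict = "acceptable" if alpha >= ACCEPTABLE_ALPHA else "below cutoff"
    print(f"Sample {name}: alpha = {alpha:.2f} ({verdict})")
```

In this toy case the same items yield very different estimates in the two samples, which is the kind of pattern that would caution against comparing their sum scores directly. Alpha is only one strand of evidence and does not by itself establish unidimensionality or construct validity.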

An outstanding question when the proxy failure framework is applied to psychological science is why studies that fail to meet existing construct validity evidence reporting standards are routinely published (Flake & Fried, 2020; Slaney, 2017). As John et al. note, a proxy must be simple enough for agents and regulators to identify and understand (i.e., must be legible), so that it can “feasibly be observed, rewarded, and pursued” (target article, sect. 3.2, para. 2). However, legibility can come at a cost to fidelity: “There is a natural human tendency to try to simplify problems by focusing on the most easily measurable elements. But what is most easily measured is rarely what is most important” (Muller, 2018, p. 23). Although psychological research standards state that the construct validity proxy should be based on a variety of sources of validity evidence (American Educational Research Association et al., 2014; Clark & Watson, 2019), some sources of evidence are more legible than others. We contend that the failure to enforce best practices in construct validation can be explained in part by the prioritisation of legibility over fidelity. This results in less legible sources of validity evidence being overlooked, a phenomenon we refer to as “proxy pruning.” Unfortunately, proxy pruning can result in test scores being accepted as valid that might be deemed invalid if other sources of validity evidence were examined.

A key example of proxy pruning in psychological science is ignoring the importance of providing construct validity evidence that is derived from a psychological theory (Alexandrova & Haybron, 2016; Bringmann, Elmer, & Eronen, 2022; Eronen & Bringmann, 2021; Feest, 2020). In particular, researchers often over-rely on the psychometric properties of test scores when establishing construct validity and avoid challenging theoretical questions about how to define psychological constructs (Alexandrova & Haybron, 2016; Clark & Watson, 2019). In addition to being important in their own right, these theoretical questions can be critical to interpreting psychometric properties. Consider the use of convergent validity evidence in the well-being literature where “it can seem as if nearly everything correlates substantially with nearly everything else” (Alexandrova & Haybron, 2016, p. 1104). Some researchers have deemed that “better” measures of well-being are those that correlate more strongly with measures of life circumstances, including income, governance, and freedom. However, without having done the hard conceptual work of determining what precisely is meant by “well-being” (e.g., “happiness,” “life satisfaction,” “flourishing,” “preference satisfaction,” “quality of life”; Alexandrova & Singh, 2022) it remains unclear why correlating more strongly with these particular variables is indicative of a more valid measure of well-being.

In sum, John et al.'s proxy failure framework offers insights into poor measurement practices in psychological science. However, we argue that the problem with proxies is not only that they become targets, but that they are pruned down to be legible targets, which decreases their fidelity, leaving them more susceptible to proxy failure.

Financial support

This work was supported by an Australian Government Research Training Program (RTP) Scholarship (W. C. H.), a Macquarie University Research Excellence Scholarship (W. C. H.), and the John Templeton Foundation (R. M. R., grant number 62631; A. J. G., grant number 61924).

Competing interest

None.

References

Alexandrova, A., & Haybron, D. M. (2016). Is construct validation valid? Philosophy of Science, 83(5), 1098–1109. https://doi.org/10.1086/687941
Alexandrova, A., & Singh, R. (2022). When well-being becomes a number. In Newfield, C., Alexandrova, A., & John, S. (Eds.), Limits of the numerical: The abuses and uses of quantification (pp. 181–199). University of Chicago Press.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (Eds.). (2014). Standards for educational and psychological testing. American Educational Research Association.
Appelbaum, M., Cooper, H., Kline, R. B., Mayo-Wilson, E., Nezu, A. M., & Rao, S. M. (2018). Journal article reporting standards for quantitative research in psychology: The APA Publications and Communications Board task force report. American Psychologist, 73(1), 3–25. https://doi.org/10.1037/amp0000191
Baron-Cohen, S., Wheelwright, S., Hill, J., Raste, Y., & Plumb, I. (2001). The “Reading the Mind in the Eyes” test revised version: A study with normal adults, and adults with Asperger syndrome or high-functioning autism. Journal of Child Psychology and Psychiatry, 42(2), 241–251. https://doi.org/10.1017/S0021963001006643
Bringmann, L. F., Elmer, T., & Eronen, M. I. (2022). Back to basics: The importance of conceptual clarification in psychological science. Current Directions in Psychological Science: A Journal of the American Psychological Society, 31(4), 340–346. https://doi.org/10.1177/09637214221096485
Clark, L. A., & Watson, D. (2019). Constructing validity: New developments in creating objective measuring instruments. Psychological Assessment, 31(12), 1412–1427. https://doi.org/10.1037/pas0000626
Dordevic, J., Zivanovic, M., Pavlovic, A., Mihajlovic, G., Karlicic, I. S., & Pavlovic, D. (2017). Psychometric evaluation and validation of the Serbian version of “Reading the Mind in the Eyes” test. Psihologija, 50(4), 483–502. https://doi.org/10.2298/PSI170504010D
Eronen, M. I., & Bringmann, L. F. (2021). The theory crisis in psychology: How to move forward. Perspectives on Psychological Science, 16(4), 779–788. https://doi.org/10.1177/1745691620970586
Feest, U. (2020). Construct validity in psychological tests – The case of implicit social cognition. European Journal for Philosophy of Science, 10(1), 4. https://doi.org/10.1007/s13194-019-0270-8
Flake, J. K., & Fried, E. I. (2020). Measurement schmeasurement: Questionable measurement practices and how to avoid them. Advances in Methods and Practices in Psychological Science, 3(4), 456–465. https://doi.org/10.1177/2515245920952393
Flake, J. K., Pek, J., & Hehman, E. (2017). Construct validation in social and personality research: Current practice and recommendations. Social Psychological & Personality Science, 8(4), 370–378. https://doi.org/10.1177/1948550617693063
Fried, E. I., Flake, J. K., & Robinaugh, D. J. (2022). Revisiting the theoretical and methodological foundations of depression measurement. Nature Reviews Psychology, 1(6), 358–368. https://doi.org/10.1038/s44159-022-00050-2
Higgins, W. C., Ross, R. M., Langdon, R., & Polito, V. (2023a). The “Reading the Mind in the Eyes” test shows poor psychometric properties in a large, demographically representative U.S. sample. Assessment, 30(6), 1777–1789. https://doi.org/10.1177/10731911221124342
Higgins, W. C., Ross, R. M., Polito, V., & Kaplan, D. M. (2023b). Three threats to the validity of the Reading the Mind in the Eyes Test: A commentary on Pavlova and Sokolov (2022). Neuroscience and Biobehavioral Reviews, 147, 105088. https://doi.org/10.1016/j.neubiorev.2023.105088
Hussey, I., & Hughes, S. (2020). Hidden invalidity among 15 commonly used measures in social and personality psychology. Advances in Methods and Practices in Psychological Science, 3(2), 166–184. https://doi.org/10.1177/2515245919882903
Kittel, A. F. D., Olderbak, S., & Wilhelm, O. (2022). Sty in the mind's eye: A meta-analytic investigation of the nomological network and internal consistency of the “Reading the Mind in the Eyes” test. Assessment, 29(5), 872–895. https://doi.org/10.1177/1073191121996469
Muller, J. Z. (2018). The tyranny of metrics. Princeton University Press.
Olderbak, S., Wilhelm, O., Olaru, G., Geiger, M., Brenneman, M. W., & Roberts, R. D. (2015). A psychometric analysis of the reading the mind in the eyes test: Toward a brief form for research and applied settings. Frontiers in Psychology, 6, 1503. https://doi.org/10.3389/fpsyg.2015.01503
Pavlova, M. A., & Sokolov, A. A. (2022). Reading language of the eyes. Neuroscience and Biobehavioral Reviews, 140, 104755. https://doi.org/10.1016/j.neubiorev.2022.104755
Redondo, I., & Herrero-Fernández, D. (2018). Validation of the Reading the Mind in the Eyes Test in a healthy Spanish sample and women with anorexia nervosa. Cognitive Neuropsychiatry, 23(4), 201–217. https://doi.org/10.1080/13546805.2018.1461618
Shaw, M., Cloos, L. J. R., Luong, R., Elbaz, S., & Flake, J. K. (2020). Measurement practices in large-scale replications: Insights from many labs 2. Canadian Psychology/Psychologie Canadienne, 61(4), 289–298. https://doi.org/10.1037/cap0000220
Slaney, K. (2017). Validating psychological constructs: Historical, philosophical, and practical dimensions. Palgrave Macmillan.
Topić, M. K., & Kovačević, M. P. (2019). Croatian adaptation of the revised Reading the Mind in the Eyes Test (RMET). Psihologijske Teme, 28(2), 377–395. https://doi.org/10.31820/pt.28.2.8
Van Staden, J. G., & Callaghan, C. W. (2021). An evaluation of the reading the mind in the eyes test's psychometric properties and scores in South Africa-cultural implications. Psychological Research, 86(7), 2289–2300. https://doi.org/10.1007/s00426-021-01539-w