Stereotypes about a group are often measured by asking participants to rate the group on a scale in which one end indicates that most group members possess a positive trait such as “hardworking” and the other end indicates that most group members possess the opposite trait, such as “lazy.” These ratings can then be used as a measure of participant stereotypes about the group, in isolation (e.g., Newman et al. 2021: 1146, Filindra et al. 2022: 967, O’Connell 2025: 225) or relative to other groups (e.g., DeSante and Smith 2020: 974, Jardina and Ollerenshaw 2022: 581, Yadon and Piston 2019: 801). Stereotype ratings have been used to measure phenomena such as prejudice (e.g., Hopkins 2021: 672) and ethnocentrism (e.g., Kinder and Kam 2010: 45, Thompson 2022: 36).
However, if a participant is asked to rate multiple groups, the participant’s stereotype ratings might be affected by the order in which the groups are asked about. For example, if a participant rates the first group asked about at the positive end of the stereotype scale, the participant cannot rate a later group more positively, even if the participant’s stereotype about the later group is more positive than their stereotype about the first group. The present study tested for such an ordering effect in data from an experiment that randomized the order in which participants were asked to rate four groups on a stereotype scale.
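The range restriction in this example can be sketched in a few lines of Python. This is only an illustration, assuming a 7-point scale on which 1 is the positive end (as in the items analyzed below); the function name `more_positive_options` is hypothetical and not part of the study's replication code.

```python
# Illustrative sketch: which scale points remain for rating a later group
# MORE positively than the first group? Assumes 1 = most positive, 7 = most negative.
SCALE_MIN, SCALE_MAX = 1, 7


def more_positive_options(first_rating: int) -> list[int]:
    """Return the scale points strictly more positive (lower) than first_rating."""
    if not SCALE_MIN <= first_rating <= SCALE_MAX:
        raise ValueError("rating must be on the 1-7 scale")
    return list(range(SCALE_MIN, first_rating))


for rating in (1, 2, 4):
    # A first rating of 1 leaves an empty list: no later group can be rated higher.
    print(rating, more_positive_options(rating))
```

A participant who places the first group at 1 has no remaining option to express a more positive stereotype about a later group, whereas a participant who starts at 4 still has three such options.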
Research design
I used data from the American National Election Studies (ANES) 2022 Pilot Study (American National Election Studies 2022), which YouGov fielded on the internet from 14 through 22 November 2022, with an opt-in sample of 1,585 U.S. citizens aged 18 or older. The survey had two sets of stereotype items. The first set asked participants to rate, in random order, “Whites,” “Blacks,” “Hispanic-Americans,” and “Asian-Americans” on a scale in which 1 was hard-working and 7 was lazy. Participants were then asked to rate the same groups in the same order on a scale in which 1 was intelligent and 7 was unintelligent. The four groups were presented in a static matrix on large-screen devices and in a dynamic matrix on small-screen devices, but the data did not indicate device type, so I do not report results by matrix type. The data also did not permit comparison of results in which multiple stereotypes were measured on the same webpage (as in the ANES 2022 Pilot Study) to results in which multiple stereotypes were measured on different webpages. I conducted the analysis in Stata 15 (StataCorp 2017) and produced the figure in R (R Core Team 2024) using the tidyverse package (Wickham et al. 2019). Sampling weights were provided for 1,500 participants and were applied to main text estimates to better match the population of adult U.S. citizens. See the supplement for item text and more information.
Results
Figure 1 illustrates the ordering effect on the frequency of stereotyping for each pair of groups. For example, estimates in the top row in the left panel indicate that, when Asian-Americans were the first group asked about, 50% rated Asian-Americans as more hard-working than Blacks, but that, when Blacks were the first group asked about, only 41% rated Asian-Americans as more hard-working than Blacks, for an ordering effect of 9 percentage points. Across all pairs and stereotypes, the ordering effect ranged from 0 to 18 percentage points, with a median effect of 8 percentage points.
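The pairwise ordering effect described above (for example, 50% minus 41%, or 9 percentage points) can be sketched as follows. This is a minimal, unweighted illustration and not the author's Stata code: the record layout, the function names, and the eight toy respondents are all assumptions made up for the example, and the main-text estimates in the article apply sampling weights that this sketch omits.

```python
# Illustrative, unweighted sketch of a pairwise ordering effect.
# Lower ratings are more positive (1 = hard-working, 7 = lazy).

def pct_a_over_b(records, a, b, first_group):
    """Percent rating group a more positively than group b,
    among respondents for whom first_group was asked about first."""
    subset = [r for r in records if r["first"] == first_group]
    if not subset:
        raise ValueError(f"no respondents saw {first_group!r} first")
    hits = sum(r["ratings"][a] < r["ratings"][b] for r in subset)
    return 100.0 * hits / len(subset)


def ordering_effect(records, a, b):
    """Percentage-point gap between the a-first and b-first conditions."""
    return pct_a_over_b(records, a, b, a) - pct_a_over_b(records, a, b, b)


# Toy data invented for illustration only; these are NOT ANES responses.
toy = [
    {"first": "Asian", "ratings": {"Asian": 2, "Black": 3}},
    {"first": "Asian", "ratings": {"Asian": 1, "Black": 1}},
    {"first": "Asian", "ratings": {"Asian": 1, "Black": 2}},
    {"first": "Asian", "ratings": {"Asian": 4, "Black": 4}},
    {"first": "Black", "ratings": {"Asian": 1, "Black": 1}},
    {"first": "Black", "ratings": {"Asian": 1, "Black": 1}},
    {"first": "Black", "ratings": {"Asian": 2, "Black": 3}},
    {"first": "Black", "ratings": {"Asian": 3, "Black": 3}},
]

print(pct_a_over_b(toy, "Asian", "Black", "Asian"))  # share when Asian asked first
print(pct_a_over_b(toy, "Asian", "Black", "Black"))  # share when Black asked first
print(ordering_effect(toy, "Asian", "Black"))        # gap in percentage points
```

In the toy data, respondents who rated the first group at 1 and then gave the later group the same rating are counted as ties rather than as rating one group more positively, which is the pattern the range restriction would be expected to produce.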

Figure 1. Stereotype ordering effects.
Note: Points indicate the percentage that rated the first group in the comparison (listed before the “>”) more positively than the second group in the comparison, either when the first group in the comparison was the first of the four groups asked about (black dots) or when the second group in the comparison was the first of the four groups asked about (white dots). Error bars indicate 83.4% confidence intervals (Payton et al. 2003).
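The choice of 83.4% intervals can be motivated with a short calculation. The sketch below is an illustration under simplifying assumptions (two independent estimates with roughly equal standard errors and a normal approximation); the function names and the sample size of 400 are hypothetical, and the simple Wald interval shown here is not necessarily how the article's weighted intervals were computed.

```python
# Illustrative sketch: why non-overlap of two 83.4% intervals approximates
# a two-sided test of the difference at about the 0.05 level.
from math import sqrt
from statistics import NormalDist


def z_for_level(level: float) -> float:
    """Two-sided normal critical value for a confidence level such as 0.834."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)


def proportion_ci(p: float, n: int, level: float = 0.834):
    """Simple unweighted Wald interval for a proportion (an assumption here)."""
    z = z_for_level(level)
    se = sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


z_834 = z_for_level(0.834)
# If two intervals with equal standard error SE just touch, the estimates
# differ by 2 * z_834 * SE, while the SE of the difference is sqrt(2) * SE.
implied_z_for_difference = 2 * z_834 / sqrt(2)

print(round(z_834, 3))
print(round(implied_z_for_difference, 3))
print(proportion_ci(0.50, 400))  # hypothetical n, for illustration only
```

Under these assumptions the implied test statistic for the difference is close to 1.96, the conventional two-sided critical value at the 0.05 level, so error bars that do not overlap suggest a difference that would be statistically significant at about that level.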
Discussion
The ordering effect detected in this analysis is plausibly caused by a restriction of range in which a participant who rates the first group asked about at the end of a stereotype scale has no options remaining on the scale to rate a later group even more extremely. For stereotype batteries that do not randomize the order of all groups, this ordering effect can plausibly cause analyses to mismeasure stereotyping. This mismeasurement of stereotyping can then cause misestimation of the effect of stereotyping when stereotypes are used to predict outcomes such as policy preferences and vote choice.
Randomizing the order in which groups are presented in a stereotyping battery would have the benefit of evenly spreading the ordering effect across groups. But other designs can reduce the ordering effect by reducing the percentage of participants who select an end of the stereotype scale. This might be accomplished by increasing the number of scale response options (see Chyung et al. 2020) from the traditional seven to eleven or more. This might also be accomplished by changing scale labels. For example, in the ANES 2022 Pilot Study, the positive ends of the stereotype scales (“hard-working” and “intelligent”) were more commonly selected than the negative ends of the stereotype scales (“lazy” and “unintelligent”), and some participants might reasonably interpret the midpoints as indicating a negative stereotype about a group, such as being less than intelligent. However, scale ends could be labeled “very unintelligent” and “very intelligent” so that the midpoint might be reasonably interpreted or explicitly labeled as indicating that most group members have average intelligence. Another method is to ask participants to directly compare groups, such as asking participants to indicate whether, compared to Asians, Whites are on average more, less, or as intelligent. For a given stereotype, this would require three items to compare each pair in three groups and six items to compare each pair in four groups, but it could permit clearer inferences about stereotypes due to the directness of the items.
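The item counts for the direct-comparison design follow from the number of unordered pairs among n groups, n(n − 1)/2. A brief sketch, using the four group labels from the ANES 2022 Pilot Study and a hypothetical helper named `comparison_items`:

```python
# Illustrative sketch: one direct-comparison item per unordered pair of groups.
from itertools import combinations
from math import comb

groups = ["Whites", "Blacks", "Hispanic-Americans", "Asian-Americans"]


def comparison_items(group_names):
    """List every unordered pair of groups for a single stereotype."""
    return list(combinations(group_names, 2))


three_group_items = comparison_items(groups[:3])
four_group_items = comparison_items(groups)

print(len(three_group_items), comb(3, 2))  # items needed for three groups
print(len(four_group_items), comb(4, 2))   # items needed for four groups
for first, second in four_group_items:
    print(f"Compared to {second}, are {first} on average more, less, or as intelligent?")
```

The count grows quadratically with the number of groups, and it applies per stereotype, so a battery with several traits and many groups would lengthen quickly under this design.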
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/XPS.2025.10026.
Data availability
Data and code to replicate all analyses in this article are available at the Journal of Experimental Political Science Dataverse (Zigerell 2025) within the Harvard Dataverse Network at: https://doi.org/10.7910/DVN/FYGUWO.
Acknowledgements
The author thanks the peer reviewers for their comments and for ideas such as methods to reduce the ordering effects in stereotypes and how to better visually present results.
Competing interests
The ANES 2022 Pilot Study was funded by National Science Foundation grant SES-2209438 to the University of Michigan. The author did not receive a specific grant for this research and reports that there are no competing interests to declare.
Ethics statement
The author’s Institutional Review Board does not require review or approval of research that reports on deidentified data.