Performance on the Cognitive Reflection Test is stable across time

Michael N. Stagnaro; Gordon Pennycook; David G. Rand

doi:10.1017/S1930297500007695

Published online by Cambridge University Press: 01 January 2023

Michael N. Stagnaro*, Department of Psychology, Yale University
Gordon Pennycook, Hill/Levene Schools of Business, University of Regina
David G. Rand, Department of Psychology, Yale University
* Email: michael.stagnaro@yale.edu

Abstract

A widely used measure of individual propensity to utilize analytic processing is the Cognitive Reflection Test (CRT), a set of math problems with intuitively compelling but incorrect answers. Here, we ask whether scores on this measure are temporally stable. We aggregate data from 11 studies run on Amazon Mechanical Turk in which the CRT was administered and identify N = 3,302 unique individuals who completed the CRT two or more times. We find a strong correlation between an individual’s first and last CRT performance, r = .806. This remains true even when constraining to data points separated by over 2 years, r = .755. Furthermore, we find that CRT scores from one timepoint correlate negatively with belief in God and social conservatism from the other timepoint (and to a similar extent as scores gathered at the same timepoint). These results show that CRT scores are stable over time and, given the stable relationship between CRT and religious belief and ideology, provide some evidence for the stability of analytic cognitive style more generally.

Keywords

Cognitive Reflection Test; religion; politics; stability

Type: Research Article
Journal: Judgment and Decision Making, Volume 13, Issue 3, May 2018, pp. 260–267

DOI: https://doi.org/10.1017/S1930297500007695
Creative Commons: The authors license this article under the terms of the Creative Commons Attribution 3.0 License.
Copyright: Copyright © The Authors [2018] This is an Open Access article, distributed under the terms of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.

1 Introduction

According to dual-process theory, decision making involves two different types of cognitive processes: one that relies on intuition (Type 1) and one that relies on deliberation (Type 2) (Evans, 2008; Evans & Frankish, 2009; Kahneman, 2003, 2013; Sloman, 1996; Stanovich, 2013). Over the past several years, a substantial body of evidence has indicated that people vary in the extent to which they utilize Type 2 processing in decision making (Stanovich & West, 2000; Stanovich, 2012), and this proclivity has been attributed to differences in “analytic cognitive style” (ACS; Pennycook, Cheyne, Seli, Koehler & Fugelsang, 2012) or, more broadly, “thinking disposition” (Stanovich & West, 2000). Dual-process models have been applied to a variety of topics of interest in psychology (Pennycook, Fugelsang & Koehler, 2015; Stanovich, West & Toplak, 2016).

One popular and widely used measure of ACS is the Cognitive Reflection Test (CRT; Frederick, 2005). This three-item measure asks subjects “trick questions”. Upon reading each question, many people report an intuitive answer that comes immediately to mind. This answer, however, is incorrect, and with some reflection an individual may recognize the error and generate the correct response. Consider the following example:

A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?

For many people, the intuitive response that leaps to mind is 10 cents, yet this is incorrect: if the ball costs 10 cents, the bat costs $1.10, and together they cost $1.20. Although there are any number of possible wrong answers (these questions are often posed as free-response items, so any number can be given), most subjects who fail to give the correct answer of $0.05 give the intuitive incorrect response of $0.10 (Campitelli & Gerrans, 2014; Frederick, 2005; Pennycook, Cheyne, Koehler & Fugelsang, 2015). Further, many who give the correct answer report being aware of the intuitive response (Mata, Ferreira & Sherman, 2013). This is further evidence that individuals who give the correct rather than the intuitive answer are more likely to stop, reflect, and correct their initial intuitions. In contrast, intuitive responders likely fail to scrutinize their answer and simply go with their initial response (Pennycook, Ross, Koehler & Fugelsang, 2016).
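The arithmetic behind the correct answer can be made explicit. Writing the ball’s price as b, the two constraints give b + (b + 1.00) = 1.10, so 2b = 0.10 and b = 0.05. The minimal Python sketch below checks this, working in whole cents to avoid floating-point rounding; the function names are illustrative only.

```python
# Bat-and-ball item, worked in whole cents to avoid floating-point rounding.
TOTAL = 110        # bat + ball = $1.10
DIFFERENCE = 100   # bat - ball = $1.00

def ball_price(total: int, difference: int) -> int:
    """Solve ball + (ball + difference) = total for the ball's price."""
    twice_ball = total - difference
    assert twice_ball % 2 == 0, "no whole-cent solution"
    return twice_ball // 2

def check(ball: int) -> tuple[int, int]:
    """Return (bat price, combined price) if the bat costs DIFFERENCE more than the ball."""
    bat = ball + DIFFERENCE
    return bat, bat + ball

correct = ball_price(TOTAL, DIFFERENCE)
print(correct)         # 5 -> $0.05
print(check(correct))  # (105, 110): sums to $1.10, as required
print(check(10))       # (110, 120): the intuitive 10-cent answer sums to $1.20
```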

CRT scores have been found to correlate with a number of psychological factors (Pennycook et al., 2015; Noori, 2016), including religious and paranormal belief (Shenhav, Rand & Greene, 2012; Pennycook, Ross, Koehler & Fugelsang, 2016; Bahçekapili & Yilmaz, 2017), moral judgments (Greene, Morelli, Lowenberg, Nystrom & Cohen, 2008; Pennycook, Cheyne, Barr, Koehler & Fugelsang, 2014), risk taking (Gerrard, Gibbons, Houlihan, Stock & Pomery, 2008), social-cognitive development (Klaczynski, 2004; Albert & Steinberg, 2011), altruism (Arechar, Kraft-Todd & Rand, 2017), prejudice (Yilmaz, Karadöller & Sofuoglu, 2016; Franks & Scherr, 2017), political conservatism (especially social conservatism; Deppe et al., 2015; Iyer, Koleva, Graham, Ditto & Haidt, 2012; but see Kahan, 2013), and the detection of pseudo-profound bullshit (Pennycook, Cheyne, Barr, Koehler & Fugelsang, 2015) and fake news (Bronstein, Pennycook, Bear, Rand & Cannon, 2018; Pennycook & Rand, 2018).

Because the CRT is so widely used, it is important to ask whether performance on it is temporally stable. One reason to expect stability comes from work using the Need for Cognition (NFC) scale (Cacioppo & Petty, 1982). This measure assesses people’s self-reported desire for complex and challenging thinking and includes items like “I would prefer complex to simple problems” and “I really enjoy a task that involves coming up with new solutions to a problem” (Cacioppo, Petty & Feng Kao, 1984). Past work suggests that NFC is stable over time and context (Sadowski & Gulgoz, 1992). Thus, to the extent that individuals can accurately self-report their ACS, CRT performance might show a similar pattern. However, some subjects who are relatively intuitive (based on behavioral measures like the CRT) report being relatively analytic on the NFC (Pennycook, Ross, Koehler & Fugelsang, 2017). Given that many factors outside actual, real-time decision making may influence a subject’s desire to declare that they are complex or analytic thinkers, the test-retest reliability of the NFC scale may be driven by factors that do not operate in the behavioral context of the CRT.

Other work, more specific to the CRT itself, suggests that multiple exposures to the CRT result in improved performance (Haigh, 2016; Stieger & Reips, 2016). That is, the more often subjects see the CRT items (in past studies, in the media, in classrooms, etc.), the more likely they are to give correct answers. This could call into question the reliability of the measure and the evidence it has produced. Despite these concerns, recent work has shown that, although people who self-report previous exposure to the CRT do score higher, this exposure does not affect the relationship between CRT scores and other variables of interest (Bialek & Pennycook, 2017).

In the present paper, we assemble a panel dataset to assess how an individual’s CRT performance changes over time. This dataset also allows us to investigate whether previously observed relationships between CRT and other measures are maintained over time. In particular, we examine the correlation between CRT and religious belief as well as conservative political ideology.

Turning first to belief in God, much work has shown a relationship between ACS and religious belief: subjects who report higher levels of religious belief give more intuitive (rather than correct) answers on the CRT and similar measures (Shenhav, Rand & Greene, 2012; Pennycook, Cheyne, Seli, Koehler & Fugelsang, 2012; Gervais & Norenzayan, 2012; Pennycook, Ross, Koehler & Fugelsang, 2016). This also holds when using subjects’ religious identity (e.g., personal God vs. agnostic vs. atheist) rather than overall belief in God (Pennycook et al., 2012). Further, the relationship extends beyond formal religious belief to other kinds of superstitious and paranormal beliefs (Pennycook et al., 2012). Thus, ample evidence supports the claim that religious belief is correlated with ACS (see Pennycook et al., 2016, for a meta-analysis). There is also evidence, albeit weaker, for a causal relationship whereby ACS leads to weaker religious belief. First, ACS has been shown to correlate with how much one’s belief in God has changed since childhood, but not with childhood religiosity (both measured retrospectively; Shenhav et al., 2012). Second, experimentally manipulating one’s level of analytic thinking has been found to affect reported religiosity (Gervais & Norenzayan, 2012; Shenhav et al., 2012; Yilmaz, Karadöller & Sofuoglu, 2016), although some of these effects have not been successfully replicated (Yonker, Edman, Cresswell & Barrett, 201; Sanchez, Sundermeier, Gray & Calin-Jageman, 2017).

Turning to political affiliation, evidence regarding its relationship with CRT performance is mixed. Although conservatives in the U.S. have been shown to rely more on intuition relative to deliberation across a number of different measures (e.g., Jost, Glaser, Kruglanski & Sulloway, 2003; Sargent, 2004; Van Hiel & Mervielde, 2004; Thorisdottir, Jost, Liviatan & Shrout, 2007), including the CRT (Deppe et al., 2015; Pennycook et al., 2012; Pennycook & Rand, 2018; Yilmaz & Saribay, 2016), other work using the CRT reports no significant relationship between political conservatism and intuitive thinking (Piazza & Sousa, 2014; Yilmaz & Saribay, 2017; Kahan, 2013). Thus, further clarity is needed on political differences in ACS as measured by the CRT.

2 Methods

To assess the stability of CRT scores over time, we aggregated the results of eleven experiments conducted by our lab on Amazon Mechanical Turk (AMT; Horton, Rand & Zeckhauser, 2011) between 2012 and 2017 (ten published or available online: Stagnaro, Arechar & Rand, 2017; Dreber, Ellingsen, Johannesson & Rand, 2013; Epstein, Peysakhovich & Rand, 2016; Rand, Greene & Nowak, 2012; Peysakhovich & Rand, 2015; Arechar, Kraft-Todd & Rand, 2017; Pennycook, Cannon & Rand, in press; Pennycook & Rand, 2017; Pennycook & Rand, 2018; Pennycook & Rand, in press). Each of these studies included the same three-item Cognitive Reflection Test.¹ Using AMT WorkerIDs (unique identifiers that AMT assigns to worker accounts, which allow specific individuals to be tracked over time), we identified N = 3,302 individuals who participated in two or more of these experiments. We then compared the first and last observation of each subject in the dataset (hereafter the time 1 and time 2 observations). The average time between time 1 and time 2 was M = 408.18 days (SD = 426.33 days, Median = 221 days).
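The panel construction described above (match records by WorkerID, keep workers observed at least twice, and take the earliest and latest observations as time 1 and time 2) can be sketched as follows. This is a minimal illustration, not the authors’ pipeline: the record layout `(worker_id, study_date, crt_correct)`, the function `build_panel`, and the data are all hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, median

# Hypothetical records: (worker_id, study_date, crt_correct). Illustrative only.
records = [
    ("W1", date(2013, 5, 1), 1),
    ("W1", date(2014, 2, 10), 2),
    ("W1", date(2016, 7, 3), 2),
    ("W2", date(2015, 1, 20), 0),   # seen once: cannot form a test-retest pair
    ("W3", date(2012, 9, 9), 3),
    ("W3", date(2013, 3, 9), 3),
]

def build_panel(rows):
    """Map each worker seen at least twice to (first_record, last_record)."""
    by_worker = defaultdict(list)
    for row in rows:
        by_worker[row[0]].append(row)
    panel = {}
    for worker, obs in by_worker.items():
        if len(obs) < 2:
            continue
        obs.sort(key=lambda rec: rec[1])    # chronological order
        panel[worker] = (obs[0], obs[-1])   # time 1 = first, time 2 = last
    return panel

panel = build_panel(records)
gaps = [(t2[1] - t1[1]).days for t1, t2 in panel.values()]
print(sorted(panel))             # ['W1', 'W3']
print(mean(gaps), median(gaps))  # mean and median days between time 1 and time 2
```

Intermediate observations (such as W1’s 2014 record) are dropped here, mirroring the first-versus-last comparison in the text.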

The CRT was scored both as the number of correct (reflective) answers given (CRTr) and as the number of intuitive incorrect answers given (CRTi). Along with CRT scores, a number of studies collected the following demographics: age, gender, education, income, political ideology (social and economic), and belief in God. Belief in God was measured using the single item “How strongly do you believe in the existence of God?”² on a continuous response scale ranging from 1 to 10.³ Subjects’ political ideology was measured with two items: “Politically, how conservative are you in terms of social issues?” and “Politically, how conservative are you in terms of fiscal issues?”⁴ The response options always ran from Very Liberal to Very Conservative, but the number of response options varied across studies, so we normalized political scores to range from 0 (Liberal) to 1 (Conservative). We also included individuals’ self-reported experience participating on AMT. In a number of studies, subjects answered a single item about the number of studies they had completed (“About how many surveys/studies have you participated in on MTurk before?”) by entering any number into a free-response box. Because the distribution of values was strongly right-skewed, we log-transformed these values for analysis.
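A sketch of the scoring and rescaling steps follows, under stated assumptions. The answer key uses Frederick’s (2005) original three items (5 vs. 10 cents, 5 vs. 100 minutes, 47 vs. 24 days), with responses already parsed to numbers; study-specific wordings (see footnote 1) would need their own keys. The ideology rescaling is a simple min-max mapping of a low-to-high scale onto 0 to 1. The log transform is taken as log(1 + n) so that zero is defined, because the text does not say how zeros were handled. All names are hypothetical.

```python
import math

# Assumed answer key: Frederick's (2005) original items, with responses parsed
# to numbers (cents, minutes, days). Not necessarily the exact key used here.
ANSWER_KEY = {
    "bat_ball": {"correct": 5, "intuitive": 10},
    "widgets": {"correct": 5, "intuitive": 100},
    "lily_pads": {"correct": 47, "intuitive": 24},
}

def score_crt(responses: dict) -> tuple[int, int]:
    """Return (CRTr, CRTi): counts of correct and of intuitive-incorrect answers."""
    crt_r = sum(responses.get(item) == key["correct"] for item, key in ANSWER_KEY.items())
    crt_i = sum(responses.get(item) == key["intuitive"] for item, key in ANSWER_KEY.items())
    return crt_r, crt_i

def normalize_scale(value: float, low: float, high: float) -> float:
    """Map a response on a low..high scale to 0 (liberal) .. 1 (conservative)."""
    return (value - low) / (high - low)

def log_experience(n_studies: float) -> float:
    """Log-transform self-reported MTurk experience; log(1 + n) is an assumed choice."""
    return math.log1p(n_studies)

print(score_crt({"bat_ball": 10, "widgets": 5, "lily_pads": 30}))  # (1, 1)
print(normalize_scale(4, 1, 7))  # 0.5: the midpoint of a 7-point scale
print(normalize_scale(3, 1, 5))  # 0.5: the midpoint of a 5-point scale
```

Rescaling puts 5-point and 7-point ideology scales on a common 0-to-1 range, which is what allows pooling across studies.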

3 Results

3.1 Evidence for stability of CRT

We find that CRTr scores from time 1 (CRTr1) and time 2 (CRTr2) are highly correlated, r = .806, p < .001, as are the two CRTi scores, r = .753, p < .001, indicating a substantial degree of stability. Examining the full distribution of scores (Table 1), we see that 64.3% of people received the same CRTr score both times they took the CRT, and 93.3% of people’s CRTr scores differed by no more than one point. Similarly, 59.4% of people received the same CRTi score both times, and 92.6% of people’s CRTi scores differed by no more than one point.

Table 1: CRT scores at time 1 compared to time 2.

To further test stability, we ask how the agreement between CRT scores at time 1 and time 2 varies with the number of days separating the two observations. To do so, we use the absolute difference between time 1 and time 2 scores as a measure of prediction error and find a significant positive correlation between prediction error and time between observations (CRTr: r = .136, p < .001; CRTi: r = .144, p < .001).⁵ Thus, the correlation between CRT scores decreases somewhat over time.

This decrease, however, is small. We continue to observe a strong correlation between CRTr1 and CRTr2 even when restricting our analysis to the longer half of separations (M = 725.7 days, N = 1,644, r = .763), the longest 25% of separations (M = 1,046.3 days, N = 828, r = .755), the longest 10% of separations (M = 1,188.7 days, N = 341, r = .739), and the longest 5% of separations (M = 1,385.9 days, N = 170, r = .69). A similar pattern of results was also observed for CRTi, all rs > .62 (all ps < .001).

To the extent that there is some decrease in correlation over time, what is its basis? Examining the raw difference in CRT scores (time 2 minus time 1), we find that CRTr scores increase with days between measures, r = .035, p = .042, and CRTi scores decrease with days between measures, r = −.046, p = .008. Thus, subjects seem to improve somewhat over time. To gain further insight into this improvement, we conducted a second analysis including all observations for each individual (not just the first and last) in order to estimate the effect of multiple exposures over time on performance. This gives a total sample of 23,226 CRT scores collected from 18,852 unique MTurk workers. We examine learning using a linear regression predicting CRT score from the subject’s number of previous exposures to the CRT in our dataset. To account for selection effects (e.g., individual differences in the extent to which people choose to participate in studies like these), we also include the total number of times the subject appears in the dataset (coded as a categorical variable using dummies). Consistent with the improvement observed above, we find that CRTr scores increase with the number of prior exposures and CRTi scores decrease (Table 2). The learning effect is small, however: each exposure increases correct performance by only 0.14 points on average.
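The learning regression (CRT score on the number of prior exposures, plus dummy variables for the total number of times a subject appears) can be illustrated with a small ordinary least squares fit. The sketch below is self-contained pure Python that solves the normal equations by Gauss-Jordan elimination. The data are constructed so that each prior exposure adds exactly 0.25 points, and all names are hypothetical; a statistics package would normally be used instead.

```python
# Hypothetical long-format rows: (total_appearances, prior_exposures, crt_score).
# Built so that every prior exposure adds exactly 0.25 points within a group.
rows = [
    (1, 0, 1.0), (1, 0, 1.0),                # two workers seen once
    (2, 0, 1.5), (2, 1, 1.75),               # a worker seen twice
    (2, 0, 1.5), (2, 1, 1.75),               # another worker seen twice
    (3, 0, 2.0), (3, 1, 2.25), (3, 2, 2.5),  # a worker seen three times
]

def design_row(prior, total, categories):
    """Intercept, prior-exposure count, and 0/1 dummies for total appearances.

    The first category is the omitted reference level.
    """
    return [1.0, float(prior)] + [1.0 if total == c else 0.0 for c in categories[1:]]

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y via Gauss-Jordan."""
    p = len(X[0])
    a = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[pivot] = a[pivot], a[col]
        if abs(a[col][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]

categories = sorted({total for total, _, _ in rows})
X = [design_row(prior, total, categories) for total, prior, _ in rows]
y = [score for _, _, score in rows]
intercept, per_exposure, *total_dummies = ols(X, y)
print(round(per_exposure, 6))  # 0.25
```

The appearance dummies absorb level differences between people who take part more or less often, so the exposure coefficient reflects change with repeated exposure within those groups.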

Table 2: Average CRT scores by the total number of times subjects appear in the data and the order in which they appear.

Finally, we note that the correlation between CRT scores is consistent across demographic subsets. We find a strong correlation (all ps < .001) between CRTr1 and CRTr2 when separately considering men (r = .817) and women (r = .789), those below (r = .805) and above (r = .811) the median on social conservatism, those below (r = .808) and above (r = .82) the median on economic conservatism, and those below (r = .804) and above (r = .799) the median on belief in God. All of the above results also held for CRTi, all rs > .7, all ps < .001.

In sum, we find evidence of substantial test-retest reliability of the CRT.

3.2 Evidence for stability of cognitive style

Section 3.1 focused on the stability of the CRT as a measure, showing substantial stability across time and repeated exposure. We now turn to the construct theorized to underlie CRT performance: analytic cognitive style (ACS). To investigate whether ACS, as measured by the CRT, is stable, we assess the stability of the relationship between the CRT and constructs previously associated with ACS, namely religious belief and political ideology.

3.2.1 CRT and religion

There is a well-documented negative relationship between CRT performance and religious belief (Pennycook et al., 2016). However, the temporal stability of this relationship has not been systematically investigated. Here we examine whether the correlation is maintained across sessions. To do so, we restrict our analysis to the subset of the data in which belief in God was measured in both experiments (N = 836). The average time between time 1 and time 2 in this subset was M = 287.83 days (SD = 397.06, Median = 86.12 days).⁶

Assessing the relationship between belief in God and CRT scores collected at the same time point, we replicate past findings for both reflective scores (CRTr1 and God1, r = −.211; CRTr2 and God2, r = −.217) and intuitive scores (CRTi1 and God1, r = .176; CRTi2 and God2, r = .215), with p < .001 for all correlations. Importantly, we continue to observe correlations across time points: CRTr1 and God2, r = −.201; CRTi1 and God2, r = .17; CRTr2 and God1, r = −.201; CRTi2 and God1, r = .211 (see Figure 1).

Figure 1: Mean self-reported belief in God at time 2 (Y axis) as a function of correct-answer (reflective) CRT score at time 1 (X axis). Error bars represent 95% confidence intervals.

The above results provide evidence that the relationship between CRT and belief in God is stable over time. To assess this relationship further, we focus on the ability of CRT at time 1 to predict belief in God at time 2 and restrict our analysis to the longer 50% of durations (M = 554.9 days, N = 416). We continue to observe the correlation between CRTr1 and God2, r = −.232, as we do when restricting to the longest 25% of durations (M = 871.6 days, N = 209), r = −.262, and the longest 10% of durations (M = 1,079.9 days, N = 83), r = −.201 (not quite significant given the smaller sample, but similar in effect size). For intuitive responses, restricting the analysis to the longer 50% of durations continues to show a correlation between CRTi1 and God2, r = .197, as does the longest 25% of durations, r = .226, although the correlation diminishes somewhat for the longest 10% of durations, r = .139 (again not significant, but similar in magnitude). Furthermore, we find significant correlations (all ps < .03) between CRTr1 and God2 when separately considering men (r = −.218) and women (r = −.186), as well as between CRTi1 and God2 (men, r = .149; women, r = .185). When demographic covariates are included,⁷ the relationship remains significant for CRTr (β = −.121, p = .011) and is marginal for CRTi (β = .087, p = .063). Thus, even when measures are separated by over a year and demographic covariates are included in the model, the CRT has predictive validity in the context of religious belief.

Figure 2: Mean self-reported social (left) and economic (right) conservatism at time 2 (Y axis) as a function of correct-answer (reflective) CRT score at time 1 (X axis). Error bars represent 95% confidence intervals.

3.2.2 CRT and political ideology

We now turn to the relationship between CRT and political ideology (Pennycook et al., 2012; Piazza & Sousa, 2014; Kahan, 2013), examining social conservatism reported at time 1 (SC1) and time 2 (SC2). For this analysis there were enough observations of social conservatism to return to the original full dataset (N = 3,302).⁸ We find significant correlations of similar magnitude for all temporal combinations: CRTr1 and SC1, r = −.135; CRTr2 and SC2, r = −.118; CRTr1 and SC2, r = −.125; CRTr2 and SC1, r = −.131, p < .001; CRTi1 and SC1, r = .124; CRTi2 and SC2, r = .114; CRTi1 and SC2, r = .124; CRTi2 and SC1, r = .128. Focusing on the ability of CRT at time 1 to predict social conservatism at time 2, the relationship holds when separately considering men (CRTr, r = −.08; CRTi, r = .08) and women (CRTr, r = −.183; CRTi, r = .171), with p < .01 for all. The relationship between CRTr1 and SC2 also holds when demographic covariates are included in the model, β = −.088, p = .008,⁹ and the same is true for CRTi, β = .087, p = .007.

Finally, we examine the relationship between CRTr and economic conservatism (EC) and find only small relationships, none statistically significant: CRTr1 and EC1, r = −.026; CRTr2 and EC2, r = −.018; CRTr1 and EC2, r = −.029; CRTr2 and EC1, r = −.032. The relationships between CRTi and EC were also small and either non-significant or barely significant (by planned tests): CRTi1 and EC1, r = .037; CRTi2 and EC2, r = .026; CRTi1 and EC2, r = .038 (p = .046); and CRTi2 and EC1, r = .046 (p = .029). Thus, although there is some relationship between CRT and social conservatism, there is little support for such a relationship with economic conservatism.

4 Discussion

The Cognitive Reflection Test has been associated with a large set of psychological phenomena (Pennycook, Fugelsang & Koehler, 2015). However, the stability of this measure has been in doubt. Here, we have presented evidence on this question, using a panel dataset to show that an individual’s CRT scores are fairly stable over time. This was true even when comparing time points separated by years and when including covariates in the model.

We also show that this continuity over time extends to the relationship between CRT and constructs associated with analytic cognitive style. Specifically, we show correlations between CRT scores at one timepoint and belief in God and social conservatism at the other timepoint, even when the two are separated by over a year. Thus, we provide some evidence for the stability of analytic cognitive style more broadly (rather than just the stability of the CRT as a measure).

Our findings have practical implications for experimenters who worry about using subjects who have prior experience with the CRT. If accuracy improves with repeated attempts, then mixing subjects with different numbers of such attempts will add extraneous variance and weaken correlations with other measures. Apparently, such effects, if they exist, are quite small. For one thing, the CRTr2 measures all came from subjects who differed in the number of previous attempts (although never fewer than 1), and the CRTr2 correlations with God (the mean of God1 and God2) and SC (social conservatism, the mean of SC1 and SC2) were no lower than the CRTr1 correlations: CRTr1/God −.213, CRTr2/God −.218, CRTr1/SC −.145, CRTr2/SC −.145. To examine the effect of mixing subjects who had no previous attempts (as far as we knew) with subjects who had one or more, we created 100 random mixtures of CRTr1 and CRTr2 (using all subjects available for both) and computed the correlation of each mixture with God and with SC. The observed correlations fell roughly in the middle of the corresponding sets of 100 correlations; their percentile ranks (low to high) ranged from 43 to 59. Thus, mixing the extremes of experience appears to have no noticeable effect on correlations with other variables.

One limitation of the current work is that we did not include a measure of numeracy. Such a measure would help to separate the influence of cognitive style from that of cognitive ability, including both general abilities and specific knowledge of arithmetic and elementary algebra. Although past work has shown that these constructs have unique predictive power (Pennycook, Fugelsang & Koehler, 2015; Shenhav et al., 2012), they overlap considerably, r ≈ .4 depending on the measure used (Frederick, 2005), and the reliability of cognitive ability measures is comparable to what we observe here for the CRT, r ≈ .7 (Mackintosh, 2011).

Another limitation is that we used only one measure of ACS (the CRT). Future work should examine additional measures of ACS (e.g., base-rate questions, belief-bias syllogisms) to investigate the stability of ACS over time. Lastly, in addition to including other measures of cognitive ability and ACS, future work should take into account how these measures are administered, such as how economic incentives or timing variations can affect performance over time.

Overall, our findings provide strong evidence that performance on CRT is stable over time, and also support the argument that cognitive style is an enduring, pervasive trait. This work further clarifies and supports the role of the disposition to think analytically not just in judgment and decision-making, but also in broader aspects of psychological phenomena such as belief and identity.

Footnotes

Results reported here are supported by those reported in this issue by Meyer, Zhou & Frederick (2018).

¹ There was some minor variation in the wording of a subset of the questions; however, the arithmetic underlying the questions did not change, e.g., “If it takes 10 second for 10 printers to print out 10 documents, how many seconds will it take 50 printers to print out 50 documents?” vs. “If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets?”

² One study used the slightly different wording: “How strongly do you believe in the existence of God or gods?”

³ Some response scales were anchored with 1 (not at all) / 10 (very confident); others used 1 (very little) / 10 (very much).

⁴ One of the studies used the alternative wording: “Which US political party do you identify with more strongly?” (1. Strong Republican – 4. Neutral – 7. Strong Democrat)

⁵ Plotting average prediction error as a function of days between observations shows a reasonably linear relationship.

⁶ Note that all key effects reported between CRT1 and CRT2 in the previous section maintain within this subset of the data.

⁷ Covariates included: time duration, age, gender, education, income, and political ideology (social and economic). Ethnicity was not included due to an insufficient number of overlapping observations in this dataset (n = 186).

⁸ Note that these data were obtained by aggregating past work conducted by our lab, including Pennycook & Rand (in press), which also included an analysis of CRT and political ideology. Specifically, 78% (N = 2,347) of subjects in our dataset had one of their two time points of data taken from Pennycook & Rand (in press). Our main contribution here is to assess the stability over time of the relationship between CRT and political ideology, rather than to add substantial new data regarding the existence or nature of such a relationship.

⁹ Covariates included: time duration, age, gender, ethnicity, income, education.

References

Albert, D., & Steinberg, L. (2011). Judgment and decision making in adolescence. Journal of Research on Adolescence, 21(1), 211–224.CrossRef Google Scholar

Arechar, A. A., Kraft-Todd, G. T., & Rand, D. G. (2017). Turking overtime: how participant characteristics and behavior vary over time and day on Amazon Mechanical Turk. Journal of the Economic Science Association, 3(1), 1–11.CrossRef Google Scholar PubMed

Bahçekapili, H. G., & Yilmaz, O. (2017). The relation between different types of religiosity and analytic cognitive style. Personality and Individual Differences, 117, 267–272.CrossRef Google Scholar

Bialek, M., & Pennycook, G. (2017). The Cognitive Reflection Test is robust to mulxtiple exposures. Behavior Research Methods, 1–7. https://doi.org/10.3758/s13428-017-0963-x.Google Scholar

Bronstein, M., Pennycook, G., Bear, A. R, and , D. G., & Cannon, T. (2018) Reduced analytic and actively open-minded thinking help to explain the link between belief in fake news and delusionality, dogmatism, and religious fundamentalism. Available at https://ssrn.com/abstract=3172140.CrossRef Google Scholar

Cacioppo, J. T., & Petty, R. E. (1982). The need for cognition. Journal of Personality and Social Psychology, 42(1), 116–131.

Cacioppo, J. T., Petty, R. E., & Feng Kao, C. (1984). The efficient assessment of need for cognition. Journal of Personality Assessment, 48(3), 306–307.

Campitelli, G., & Gerrans, P. (2014). Does the cognitive reflection test measure cognitive reflection? A mathematical modeling approach. Memory & Cognition, 42(3), 434–447.

Deppe, K. D., Gonzalez, F. J., Neiman, J. L., Jacobs, C., Pahlke, J., Smith, K. B., & Hibbing, J. R. (2015). Reflective liberals and intuitive conservatives: A look at the Cognitive Reflection Test and ideology. Judgment and Decision Making, 10(4), 314–331.

Dreber, A., Ellingsen, T., Johannesson, M., & Rand, D. G. (2013). Do people care about social context? Framing effects in dictator games. Experimental Economics, 16(3), 349–371.

Epstein, Z., Peysakhovich, A., & Rand, D. G. (2016). The good, the bad, and the unflinchingly selfish: Cooperative decision-making can be predicted with high accuracy when using only three behavioral types. Proceedings of the 2016 ACM Conference on Economics and Computation (pp. 547–559). ACM.

Evans, J. St. B. T. (2008). Dual-processing accounts of reasoning, judgment, and social cognition. Annual Review of Psychology, 59, 255–278.

Evans, J. St. B. T., & Frankish, K. E. (2009). In two minds: Dual processes and beyond. Oxford University Press.

Evans, J. St. B. T., & Stanovich, K. E. (2013). Dual-process theories of higher cognition: Advancing the debate. Perspectives on Psychological Science, 8(3), 223–241.

Franks, A. S., & Scherr, K. C. (2017). Analytic thinking reduces anti-atheist bias in voting intentions. The International Journal for the Psychology of Religion, 1–12.

Frederick, S. (2005). Cognitive reflection and decision making. Journal of Economic Perspectives, 19(4), 25–42.

Gerrard, M., Gibbons, F. X., Houlihan, A. E., Stock, M. L., & Pomery, E. A. (2008). A dual-process approach to health risk decision making: The prototype willingness model. Developmental Review, 28(1), 29–61.

Gervais, W. M., & Norenzayan, A. (2012). Analytic thinking promotes religious disbelief. Science, 336(6080), 493–496.

Greene, J. D., Morelli, S. A., Lowenberg, K., Nystrom, L. E., & Cohen, J. D. (2008). Cognitive load selectively interferes with utilitarian moral judgment. Cognition, 107(3), 1144–1154.

Haigh, M. (2016). Has the standard cognitive reflection test become a victim of its own success? Advances in Cognitive Psychology, 12(3), 145–149.

Horton, J. J., Rand, D. G., & Zeckhauser, R. J. (2011). The online laboratory: Conducting experiments in a real labor market. Experimental Economics, 14(3), 399–425.

Iyer, R., Koleva, S., Graham, J., Ditto, P., & Haidt, J. (2012). Understanding libertarian morality: The psychological dispositions of self-identified libertarians. PloS One, 7(8), e42366.

Jost, J. T., Glaser, J., Kruglanski, A. W., & Sulloway, F. J. (2003). Political conservatism as motivated social cognition. Psychological Bulletin, 129(3), 339–375.

Kahan, D. M. (2013). Ideology, motivated reasoning, and cognitive reflection. Judgment and Decision Making, 8(4), 407–424.

Klaczynski, P. A. (2004). A dual-process model of adolescent development: Implications for decision making, reasoning, and identity. Advances in Child Development and Behavior, 32, 73–123.

Mackintosh, N. J. (2011). IQ and human intelligence. Oxford University Press.

Mata, A., Ferreira, M. B., & Sherman, S. J. (2013). The metacognitive advantage of deliberative thinkers: A dual-process perspective on overconfidence. Journal of Personality and Social Psychology, 105(3), 353–375.

Meyer, A., Zhou, E., & Frederick, S. (2018). The non-effects of repeated exposure to the Cognitive Reflection Test. Judgment and Decision Making, 13(3), 246–259.

Noori, M. (2016). Cognitive reflection as a predictor of susceptibility to behavioral anomalies. Judgment and Decision Making, 11(1), 114–120.

Pennycook, G., Cannon, T. D., & Rand, D. G. (in press). Prior exposure increases perceived accuracy of fake news. Journal of Experimental Psychology: General.

Pennycook, G., Cheyne, J. A., Seli, P., Koehler, D. J., & Fugelsang, J. A. (2012). Analytic cognitive style predicts religious and paranormal belief. Cognition, 123(3), 335–346.

Pennycook, G., Cheyne, J. A., Barr, N., Koehler, D. J., & Fugelsang, J. A. (2014). The role of analytic thinking in moral judgements and values. Thinking & Reasoning, 20(2), 188–214.

Pennycook, G., Cheyne, J. A., Barr, N., Koehler, D. J., & Fugelsang, J. A. (2015). On the reception and detection of pseudo-profound bullshit. Judgment and Decision Making, 10(6), 549–563.

Pennycook, G., Fugelsang, J. A., & Koehler, D. J. (2015). Everyday consequences of analytic thinking. Current Directions in Psychological Science, 24(6), 425–432.

Pennycook, G., Cheyne, J. A., Koehler, D. J., & Fugelsang, J. A. (2016). Is the cognitive reflection test a measure of both reflection and intuition? Behavior Research Methods, 48(1), 341–348.

Pennycook, G., Ross, R. M., Koehler, D. J., & Fugelsang, J. A. (2016). Atheists and agnostics are more reflective than religious believers: Four empirical studies and a meta-analysis. PloS One, 11(4), e0153039.

Pennycook, G., Ross, R. M., Koehler, D. J., & Fugelsang, J. A. (2017). Dunning–Kruger effects in reasoning: Theoretical implications of the failure to recognize incompetence. Psychonomic Bulletin & Review, 24(6), 1774–1784.

Pennycook, G., & Rand, D. G. (2017). The implied truth effect: Attaching warnings to a subset of fake news stories increases perceived accuracy of stories without warnings. Available at SSRN: https://ssrn.com/abstract=3035384

Pennycook, G., & Rand, D. G. (2018). Susceptibility to partisan fake news is explained more by a lack of deliberation than by willful ignorance. Available at SSRN: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3165567

Pennycook, G., & Rand, D. G. (in press). Cognitive reflection and the 2016 U.S. Presidential Election. Personality and Social Psychology Bulletin.

Peysakhovich, A., & Rand, D. G. (2016). Habits of virtue: Creating norms of cooperation and defection in the laboratory. Management Science, 62(3), 631–647.

Piazza, J., & Sousa, P. (2014). Religiosity, political orientation, and consequentialist moral thinking. Social Psychological and Personality Science, 5(3), 334–342.

Rand, D. G., Greene, J. D., & Nowak, M. A. (2012). Spontaneous giving and calculated greed. Nature, 489(7416), 427–430.

Sadowski, C. J., & Gulgoz, S. (1992). Internal consistency and test-retest reliability of the Need for Cognition Scale. Perceptual and Motor Skills, 74(2), 610.

Sanchez, C., Sundermeier, B., Gray, K., & Calin-Jageman, R. J. (2017). Direct replication of Gervais & Norenzayan (2012): No evidence that analytic thinking decreases religious belief. PloS One, 12(2), e0172636.

Sargent, M. J. (2004). Less thought, more punishment: Need for cognition predicts support for punitive responses to crime. Personality and Social Psychology Bulletin, 30(11), 1485–1493.

Shenhav, A., Rand, D. G., & Greene, J. D. (2012). Divine intuition: Cognitive style influences belief in God. Journal of Experimental Psychology: General, 141(3), 423.

Sloman, S. A. (1996). The empirical case for two systems of reasoning. Psychological Bulletin, 119(1), 3–22.

Stagnaro, M. N., Arechar, A. A., & Rand, D. G. (2017). From good institutions to generous citizens: Top-down incentives to cooperate promote subsequent prosociality but not norm enforcement. Cognition, 167, 212–254.

Stanovich, K. E., & West, R. F. (2000). Individual differences in reasoning: Implications for the rationality debate? Behavioral and Brain Sciences, 23(5), 645–665.

Stanovich, K. E. (2012). On the Distinction Between Rationality and Intelligence: Implications for Understanding Individual Differences in Reasoning. In K. J. Holyoak & R. Morrison (Eds.), The Oxford Handbook of Thinking and Reasoning (pp. 433–455). New York, NY: Oxford University Press.

Stanovich, K. E., West, R. F., & Toplak, M. E. (2016). The rationality quotient: Toward a test of rational thinking. MIT Press.

Stieger, S., & Reips, U. D. (2016). A limitation of the Cognitive Reflection Test: familiarity. PeerJ, 4, e2395.

Thorisdottir, H., Jost, J. T., Liviatan, I., & Shrout, P. E. (2007). Psychological needs and values underlying left-right political orientation: Cross-national evidence from Eastern and Western Europe. Public Opinion Quarterly, 71(2), 175–203.

Van Hiel, A., & Mervielde, I. (2004). Openness to experience and boundaries in the mind: Relationships with cultural and economic conservative beliefs. Journal of Personality, 72(4), 659–686.

Yilmaz, O., & Saribay, S. A. (2016). An attempt to clarify the link between cognitive style and political ideology: A non-western replication and extension. Judgment and Decision Making, 11(3), 287–300.

Yilmaz, O., Karadöller, D. Z., & Sofuoglu, G. (2016). Analytic thinking, religion, and prejudice: An experimental test of the dual-process model of mind. The International Journal for the Psychology of Religion, 26(4), 360–369.

Yilmaz, O., & Saribay, S. A. (2017). The relationship between cognitive style and political orientation depends on the measures used. Judgment and Decision Making, 12(2), 140–147.

Yonker, J. E., Edman, L. R., Cresswell, J., & Barrett, J. L. (2016). Primed analytic thought and religiosity: The importance of individual characteristics. Psychology of Religion and Spirituality, 8(4), 298–308.

Table 1: CRT scores at time 1 compared to time 2.

Table 2: Comparing average CRT scores, accounting for the overall number of times subjects appear in the data and the order in which they appear.

Stagnaro et al. supplementary material

Data and variable names


