1 Introduction
The distinction between reflective and intuitive thinking guides a wide range of research questions in modern behavioral sciences. The dual-process model of the mind provides the leading theoretical framework for these questions by positing that cognition is based on two fundamentally distinct types of processes (Evans & Stanovich, 2013; Morewedge & Kahneman, 2010). Type 1 processes include the automatic, effortless, and intuitive thinking that we share with our evolutionary ancestors, whereas Type 2 processes include the controlled, effortful, and reflective thinking specific to humans (Kahneman, 2011). Although the assumption of the dual-process model that the two cognitive processes are independent has recently come under scrutiny (Baron, Scott, Fincher & Metz, 2015; Białek & De Neys, 2016; Klein, 2011; Pennycook, Fugelsang & Koehler, 2015; Thompson, Evans & Frankish, 2009; Trémolière & Bonnefon, 2014), it is well established that the relative extent of reflection vs. intuition constituting a decision-making process can nevertheless strongly influence beliefs and behaviors (e.g., ideological, religious, and conspiratorial beliefs, and economic, moral, and health behaviors; Gervais et al., 2018; Pennycook, Cheyne, Barr, Koehler & Fugelsang, 2013; Pennycook, Cheyne, Seli, Koehler & Fugelsang, 2012; Rand, 2016; Swami, Voracek, Stieger, Tran & Furnham, 2014; Yilmaz & Isler, 2019; Yilmaz & Saribay, 2017a, 2017b).
Surprisingly, the relative effectiveness of reflection and intuition manipulations used in behavioral research remains largely unknown (Horstmann, Hausmann & Ryf, 2009; Myrseth & Wollbrant, 2017). We are aware of only one (unpublished) experimental comparison of intuition manipulations in cognitive performance (Deck, Jahedi & Sheremeta, 2017), and of no previous experimental study that has systematically compared alternative reflection manipulations. The presumed effectiveness of reflection manipulations used in the literature can be questioned, since baseline cognitive functions tend to be intuitive, and motivating people to pursue an effortful activity such as reflection can be difficult (e.g., Kahneman, 2011). Here, we provide possibly the first systematic methodological comparison of regularly used and promising reflection manipulations.
Another reason for the missing methodological evidence is the frequent lack of control conditions, which stems from a reliance on experimental comparisons of intuition and reflection manipulations as the basis for hypothesis testing. Without these controls, the question of whether experimental results are due to activation of intuitive or reflective processes cannot be answered (e.g., Isler, Maule & Starmer, 2018; Rand, 2016). Similarly, studies that rely on the two-response paradigm, where an initial (relatively more intuitive) response is elicited before a second (relatively less intuitive and more reflected) response, often lack a control condition (e.g., Bago & De Neys, 2017). As a recent exception, Lawson, Larrick, and Soll (2020) employ slow and fast thinking prompts (without time-limits) and find that slow thinking has a limited positive effect on cognitive performance compared to a control condition. Given the importance of such controls, we also employ control conditions in the current study.
Studies using intuition and reflection manipulations often do not directly test whether cognitive processes were activated in the intended directions. While some have checked the direct effects of their manipulations on cognitive performance (e.g., Deppe et al., 2015; Lawson et al., 2020; Yilmaz & Saribay, 2016), subjective self-report questions and behavioral measures such as response times are frequently relied on as alternative manipulation checks (Rand, Greene & Nowak, 2012; Yilmaz & Isler, 2019). The lack of performance measures would be misleading if, rather than thinking reflectively about the problem at hand, participants were to rely on their own lay theories about reflection (Saribay, Yilmaz & Körpe, 2020) or if they were to respond in socially desirable ways (Grimm, 2010). Consistent with the existence of such methodological problems, Saribay et al. (2020) found intuition and reflection primes to affect self-reported thinking style but not actual performance in the commonly used Cognitive Reflection Test (CRT; Frederick, 2005). Even the regularly used objective performance measures, such as differences in response times used to check whether time-limit manipulations have impacted behavior (e.g., Isler et al., 2018; Rand et al., 2012), may not always provide direct and convincing evidence about whether and how cognitive processes have been manipulated (Krajbich, Bartling, Hare & Fehr, 2015).
Therefore, the effect of reflection manipulations should be observed on well-established measures of cognitive performance, such as the CRT (Frederick, 2005) and the CRT-2 (Thomson & Oppenheimer, 2016). Providing evidence of their ability to predict the domain-general features of reflection, test scores on these two tasks have been shown to correlate with a wide range of cognitive performance measures in the lab (e.g., syllogistic reasoning and heuristics-and-biases problems) and in the field (e.g., standardized academic test scores and university course grades) (Lawson et al., 2020; Meyer, Zhou & Shane, 2018; Thomson & Oppenheimer, 2016; Toplak, West, & Stanovich, 2011). Numerous other widely used reasoning problems, such as the conjunction fallacy (Tversky & Kahneman, 1983), probability matching (Stanovich & West, 2008) and base rate neglect (Kahneman & Tversky, 1973), can also be used to measure the effects of manipulations on cognitive performance (e.g., Lawson et al., 2020). Among these alternatives, we chose CRT-2 as our performance measure because participants are less likely to be familiar with it, thereby minimizing problems such as ceiling effects, and because it relies less than the CRT on numeracy skills, which can confound the interpretation of scores (see discussion in Thomson & Oppenheimer, 2016). Despite these advantages, the CRT-2 arguably captures only some of the specific features of cognitive reflection directly, such as attention to detail and careful reading. Hence, the immediate effects of the reflection manipulations found in our study can be limited to these features of reflection, as we further detail in the Discussion.
The increased reliance on online experiments provides another reason to study the effectiveness of reflection manipulations, namely, to test their robustness in this novel research environment. Online labor markets such as Amazon Mechanical Turk as well as professionally maintained research participant pools such as Prolific have been shown to provide internally valid experimental tests in settings less artificial and more anonymous than the laboratory (Horton, Rand & Zeckhauser, 2011; Palan & Schitter, 2018; Peer, Brandimarte, Samat & Acquisti, 2017), but online experiments can also suffer from idiosyncratic drawbacks such as noncompliance with treatments and asymmetry in dropout rates (Arechar, Gächter & Molleman, 2018; Isler et al., 2018). These problems may be more acute for cognitively demanding tasks such as the reflection manipulations that we study here, especially in online decision environments that can be distracting to participants (Dandurand, Shultz & Onishi, 2008). These problems can nevertheless be mitigated. For example, providing participants with monetary incentives has been shown to result in high rates of compliance with time-limits (Isler et al., 2018) and reflective thinking (Lawson et al., 2020) in online experiments. With these considerations in mind, we compare five tasks that are simple and fast enough to be used in online experiments, and we use monetary incentives to motivate compliance with the task instructions.
Numerous experimental tasks for promoting reflective thinking are currently in use. Some of these tasks, introduced in once-acceptable small-sample studies, are now known to be unreliable. For example, the perceptual disfluency method (e.g., the use of hard-to-read fonts to promote reflection), the scrambled sentence task that primes participants with words such as “reason” and “rational”, and the task that aims to prime reflection by showing participants a picture of Rodin’s The Thinker (Gervais & Norenzayan, 2012; Song & Schwarz, 2008) all failed to manipulate reflective thinking in recent large-sample replication attempts (Bakhti, 2018; Deppe et al., 2015; Meyer et al., 2015; Sanchez, Sundermeier, Gray & Calin-Jageman, 2017; Sirota, Theodoropoulou & Juanchich, 2020). In addition, researchers sometimes attempt to activate reflective thinking by having participants complete tasks (e.g., the CRT) that were originally designed to measure thinking style, but the effects of such unestablished approaches tend to be unreliable too (Yonker, Edman, Cresswell & Barrett, 2016). Instead, to make the most use of our experimental resources, we here focus on methods that are specifically designed to manipulate reflection and that are not known to be unreliable.
One of the most frequently used reflection manipulations is to put time-limits on decision-making processes (Horstmann, Ahlgrimm & Glöckner, 2009; Maule, Hockey & Bdzola, 2000; Spiliopoulos & Ortmann, 2018). In this method, participants in a time pressure condition, prompted to decide within a time-limit (e.g., 10 seconds), are compared to those in a time delay condition, who are either asked to think or forced to wait for a certain duration (e.g., 20 seconds) before submitting decisions (Capraro, Schulz & Rand, 2019; Rand, 2016; Suter & Hertwig, 2011). Although the time delay condition is assumed to induce reflective answers relative to the time pressure condition, the usual lack of a control condition without time-limits prohibits the identification of whether it is time pressure or time delay that affects decision-making. Only a few studies have used control conditions to isolate the influence of time delay (e.g., Everett, Ingbretsen, Cushman & Cikara, 2017). Nevertheless, the exact effect of time delay arguably remains unclear even with a control condition, as it may be difficult to distinguish between increased reliance on reflective processes and dilution of emotional responses (Neo, Yu, Weber & Gonzalez, 2013; Wang et al., 2011). Given its prominence as one of the most frequently used cognitive process manipulations, we here use time delay as one of our experimental conditions, and we also explore the role of emotional responses.
Another frequently used technique for activating reflection is memory recall (Cappelen, Sørensen & Tungodden, 2013; Forstmann & Burgmer, 2015; Ma, Liu, Rand, Heatherton & Han, 2015; Rand et al., 2012; Shenhav, Rand & Greene, 2012). In this method, participants are usually asked to write a paragraph describing a personal experience where reliance on careful reasoning led to a good outcome, with the expectation that the explicit priming of these memories would motivate reflection. Although a recent high-powered study failed to find an effect of this priming method on a cognitive performance measure (Saribay et al., 2020), this null result may have been due to the low rates of compliance with the task instructions (see Shenhav et al., 2012). Similar difficulties in achieving high rates of compliance have been observed when using time-limits to activate reflection (Tinghog et al., 2013), and monetary incentives have successfully been implemented to resolve this problem (Isler et al., 2018; Kocher & Sutter, 2006). Building on these findings, we adapt this task to the online context and, as with other tasks tested in the study, use monetary incentives to motivate compliance.
In the third reflection manipulation that we test here, we simply ask participants to justify their answers by writing an explanation of their reasoning. Across multiple studies employing the classic Asian disease problem (Miller & Fagley, 1991; Sieck & Yates, 1997; Takemura, 1994), the decision justification task has been found to reduce framing effects effectively. Asking for justification or elaboration was found to be even more effective than monetary incentives (Vieider, 2011), and its effectiveness has been validated across multiple decision-making contexts, including health (Almashat, Ayotte, Edelstein & Margrett, 2008) and consumer choice (Cheng, Wu & Lin, 2014). Justification prompts can motivate reflection by generating feelings of higher levels of responsibility for one’s decisions as well as expectations of their scrutiny by others. However, the effectiveness of the justification task has been questioned (Belardinelli, Bellé, Sicilia & Steccolini, 2018; Leboeuf & Shafir, 2003). Additional findings have suggested that the effectiveness of decision justification is task-dependent (Leisti, Radun, Virtanen, Nyman, & Häkkinen, 2014) and that it may even harm decisions (Igou & Bless, 2007), especially in specific contexts prone to motivated reasoning (Christensen, 2018; Sieck, Quinn & Schooler, 1999). Given the promising but mixed findings on the effectiveness of the justification task, we use this simple technique as an alternative reflection manipulation.
For the fourth reflection task tested here, we develop a novel training procedure for the online context consistent with well-established debiasing principles (Lewandowsky, Ecker, Seifert, Schwarz & Cook, 2012). We modify a debiasing training task that was previously tested in the laboratory with promising results (Yilmaz & Saribay, 2017a, 2017b). The lab version of the task provides participants with a 10-minute training on noticing and correcting cognitive biases: it first elicits answers to the Cognitive Reflection Test (Frederick, 2005) and various base-rate problems (De Neys & Glumicic, 2008) and then provides feedback on the correct answers and their explanations (also see Morewedge et al., 2015; Stephens, Dunn, Hayes & Kalish, 2020). While previous studies using debiasing training have been successful (Sellier, Scopelliti & Morewedge, 2019), its lengthy and complicated exercises have so far precluded its systematic use in online experiments.
In short, alternative reflection manipulations have not yet been experimentally compared using an actual performance measure and behavioral research methods lack reliable reflection manipulations that can be used in online experiments. Here, we use CRT-2 scores as the cognitive performance measure and compare the effects of five promising manipulations on reflective thinking in a high-powered between-subjects experiment. The five reflection manipulations include the time delay condition (R1), the memory recall task (R2), the decision justification task (R3), and the debiasing training (R4) described above as well as a combined task that includes both the debiasing training and the decision justification tasks (R5). We compare these five reflection conditions with two control groups: the passive control condition (C1) where participants received no treatment prior to taking part in CRT-2, and the active control condition (C2) where participants were assigned neutral reading and writing tasks to provide comparability with the reflection conditions.
Using this experimental setup, we test three preregistered hypotheses on the effect of manipulations on reflective thinking as measured by the CRT-2 scores. First, we predicted that the CRT-2 scores in the five reflection conditions (R1 to R5) would be higher than in the two control conditions (C1 and C2). Second, we predicted that the CRT-2 scores in conditions with debiasing training (R4 and R5) would be higher than in the reflection conditions without debiasing training (R1, R2 and R3) because they are based on proven debiasing techniques, including repeated explanations of cognitive biases and warnings against potential future mistakes (Lewandowsky et al., 2012). Third, we expected that the combination of debiasing training and decision justification manipulations could motivate even higher reflection by prompting participants to apply debiasing techniques when providing justifications for their decisions on the CRT-2 items. Accordingly, we predicted that the CRT-2 scores in the debiasing training condition with justification (R5) would be higher than in the debiasing training condition without justification (R4).
In addition to testing these hypotheses, we report various exploratory analyses. We investigate response times and study the role of task compliance in driving the treatment effects. We then contrast CRT-2 scores with self-report measures of reflection. We conjectured that a discrepancy between these two measures, where self-reported reflection is not supported by actual performance, could indicate socially desirable responding. There is limited but suggestive evidence that reflection manipulations such as time limits can influence affect (Isler et al., 2018; Maule et al., 2000). Therefore, we also explore whether the effects of treatments on cognitive performance align with differences in effects on emotional responses.
2 Method
Using a between-subjects design, we experimentally compared five reflection manipulations and two control conditions. Participants were blind to the experimental conditions, and each participant was randomly assigned to one of seven conditions (see Table 1). The experiment was preregistered at the Open Science Framework (OSF) (https://osf.io/6axuz). The experimental materials, the dataset, and the analysis code are available at the OSF study site (https://osf.io/k495r/).
2.1 Participants
Participants were recruited online via Prolific (http://www.prolific.co/, Palan & Schitter, 2018) and recruitment was restricted to fluent English-speaking UK residents who were 18 or older. As preregistered, participants with incomplete data were excluded from the dataset prior to analysis (n = 107). None of the excluded participants had completed the CRT-2. Hence, their inclusion in the analysis does not change the results. We analyze data from 1,748 unique participants with complete submissions (mean age = 33.58, SD = 11.50; 71.1% female). In addition to a participation fee of £0.40, participants were paid £0.20 for compliance with task instructions.
2.2 Planned sample size
We planned for a powerful test (1-β = 0.90) to identify small effects of manipulations (f = 0.10) in a one-way ANOVA model with seven conditions and standard Type I error rate (α = 0.05). Using G*Power 3.1.9.2 (Faul, Erdfelder, Buchner & Lang, 2009), we estimated our target sample size to include at least 1,750 complete submissions.
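The quantities entering this power calculation can be written out explicitly. The block below is a sketch that assumes the standard fixed-effects one-way ANOVA setup, in which the noncentrality parameter is the squared effect size times the total sample size; the symbols λ, k, N, df₁ and df₂ are our notation and do not appear in the text above.

```latex
% Assumed notation: k = 7 conditions, N = 1750 planned participants,
% f = 0.10 (Cohen's f), alpha = 0.05.
\begin{align*}
  \lambda &= f^{2} N = (0.10)^{2} \times 1750 = 17.5, \\
  df_{1}  &= k - 1 = 6, \qquad df_{2} = N - k = 1743, \\
  1 - \beta &= P\!\left( F'_{df_{1},\, df_{2}}(\lambda) > F_{1-\alpha;\, df_{1},\, df_{2}} \right),
\end{align*}
% where F' denotes the noncentral F distribution and F_{1-\alpha} the
% critical value of the central F distribution.
```

The target sample size is then the smallest N for which the last expression reaches the planned power of 0.90.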
2.3 Procedure
To increase compliance with the experimental tasks, participants were informed that they would earn an additional £0.20 if they closely followed the task instructions. Five of the seven conditions were designed to activate cognitive reflection (R1 to R5), whereas the other two conditions were designed as controls (C1 and C2). In all conditions, participants completed the Cognitive Reflection Test-2 (CRT-2; Thomson & Oppenheimer, 2016), which provides a less familiar and less numerical alternative to the original CRT (Frederick, 2005). CRT-2 includes four questions that are designed to trigger a spontaneous but incorrect response, and reliance on cognitive reflection is operationalized as resistance to this initial response (e.g., “If you’re running a race and you pass the person in second place, what place are you in?”). Hence, individual CRT-2 scores range from 0 to 4. Cronbach’s α for the four CRT-2 items was .54, in line with the original CRT (Baron et al., 2015). As we next describe in detail, the reflection manipulations were implemented during the CRT-2 for R1 and R3 and before the CRT-2 for R2 and R4, whereas participants in R5 were exposed to reflection manipulations both before and during the CRT-2.
In the first reflection manipulation (R1), the time delay condition, participants were asked to think for at least 20 seconds before answering each CRT-2 question. Each question screen displayed a reflection prompt (“Carefully consider your answer”) and a timer counting up from zero seconds. Consistent with its regular use (Bouwmeester et al., 2017; Isler et al., 2018; Rand, 2016; Rand et al., 2012), it was technically possible to submit answers within 20 seconds, which allows checking that time delay instructions motivate behavior change (Horstmann, Hausmann, et al., 2009). The average rate of compliance with time-limits across the four questions was 67%.
The second reflection condition (R2), the memory recall task, was based on Shenhav et al. (2012). Participants were told to write a paragraph describing an episode when carefully reasoning through a situation led them in the right direction and resulted in a good outcome. Adapting this task to the online setting, we asked participants to write four sentences rather than eight-to-ten sentences as in the original task. Despite this modification, whereas at least 95% of the initially recruited participants completed the study in other conditions (i.e., answered all questions, including the survey), this figure was only 79% for R2. Among those who completed R2, the compliance rate (i.e., the prevalence of participants who wrote four or more sentences) was 88.6%. Because exclusion of non-compliant participants can jeopardize internal validity by annulling randomization (Bouwmeester et al., 2017; Tinghog et al., 2013), we include them in our analyses consistent with our preregistered intention-to-treat analysis plan.
The third reflection condition (R3) included the justification task, which elicited justifications from participants similar to Miller and Fagley (1991). Specifically, on each of the four screens where answers to the CRT-2 questions were elicited, participants were asked to justify their answers in a separate cell by providing an explanation of their reasoning in one sentence or more. For each question, the answer to the CRT-2 question and its justification were submitted simultaneously.
As the fourth reflection condition (R4), we developed a novel training task for the online context. The task was designed to improve vigilance against three commonly observed cognitive biases. Participants were asked to answer three questions. The first question was intended to illustrate a semantic illusion: “How many of each animal did Moses take on the ark?” The second question involved a test of the base rate fallacy: “In a study, 1000 people were tested. Among the participants, there were 5 engineers and 995 lawyers. Jack is a randomly chosen participant in this study. Jack is 36 years old. He is not married and is somewhat introverted. He likes to spend his free time reading science fiction and writing computer programs. What is most likely?” (Jack is a lawyer or engineer). The third question was designed to exhibit availability bias: “Which cause more human deaths?” (sharks or horses). After each question, the screen displayed the correct answer, along with an explanation of the bias (see materials at the OSF study site). Finally, participants were asked to write four sentences summarizing what they have learned in training, and they were instructed to rely on reflection during the next task (i.e., the CRT-2).
We devised a fifth reflection condition (R5) that combined decision justification (R3) with debiasing training (R4). Participants first participated in the debiasing training and then they were asked to justify their responses to the CRT-2 questions, as described above. Hence, R5 promoted learning-by-doing (Bruce & Bloch, 2012), that is, the application of the lessons received during debiasing training to the CRT-2 questions.
Two control conditions were designed to allow insightful comparisons to the five reflection conditions. The passive control condition (C1), where participants completed CRT-2 without any additional tasks, measures baseline CRT-2 scores in the participant pool. In the active control condition (C2), participants were first asked to describe an object of their choosing in four sentences before answering the CRT-2 questions. This neutral writing task in C2 controls for any direct effect that the act of writing itself in R2, R4 and R5 may have on reflection. Similarly, to achieve comparability between reflection manipulations, participants in R1 and R3 were asked to complete the same neutral writing task as in C2 prior to beginning CRT-2.
After the CRT-2, participants answered two questions on a 7-point Likert scale (1 = “not at all”, 7 = “a great deal”): 1) “To what extent did you rely on your feelings or intuitions when making your decisions?”, and 2) “To what extent did you rely on reason when making your decisions?” The score on the first question was reversed and the average of the scores on the two questions constituted the self-reported composite index of reflection.
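The construction of the composite index can be sketched as follows. This is an illustration under stated assumptions rather than the study's analysis code: the function name `reflection_index` is hypothetical, and reverse-scoring is assumed to follow the usual convention for a 1-to-7 scale (reversed = 8 minus the rating).

```python
def reflection_index(intuition_rating, reason_rating, scale_min=1, scale_max=7):
    """Composite self-reported reflection index (hypothetical helper).

    intuition_rating: answer to the "feelings or intuitions" question.
    reason_rating:    answer to the "reason" question.
    Both are assumed to lie on the same scale_min..scale_max Likert scale.
    """
    for rating in (intuition_rating, reason_rating):
        if not scale_min <= rating <= scale_max:
            raise ValueError(f"rating {rating} is outside the {scale_min}-{scale_max} scale")
    # Reverse-score the intuition item so that higher values mean more reflection.
    reversed_intuition = scale_min + scale_max - intuition_rating
    # The index is the mean of the reversed intuition item and the reason item.
    return (reversed_intuition + reason_rating) / 2


if __name__ == "__main__":
    # A participant reporting little reliance on intuition (2) and much on reason (6).
    print(reflection_index(2, 6))
```

Under this convention the index stays on the original 1-to-7 range, with higher values indicating more self-reported reflection.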
Finally, participants completed a survey, including the 20-item Positive and Negative Affect Schedule (PANAS; Watson, Clark & Tellegen, 1988) and a brief demographic questionnaire. The PANAS consisted of two 10-item scales measuring positive and negative affect. Participants were asked to indicate the extent to which they experienced each emotion item during the previous task (i.e., CRT-2) on a Likert scale ranging from 1 (“very slightly or not at all”) to 5 (“extremely”). Both positive and negative affect scales showed sufficient internal consistency (both Cronbach’s αs = .89).
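The internal-consistency coefficients reported in this section (for the four CRT-2 items and for the two PANAS scales) follow the usual definition of Cronbach's α. The sketch below shows that definition on toy data; the function name and the data are illustrative assumptions, not the study's materials, and sample variances (n − 1 denominator) are assumed.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a participants-by-items score table.

    item_scores: list of rows, one per participant, each row holding that
    participant's score on every item (e.g., 0/1 for the four CRT-2 items).
    """
    n_items = len(item_scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across participants.
    columns = list(zip(*item_scores))
    item_variances = [variance(col) for col in columns]
    # Variance of the total (sum) score across participants.
    total_variance = variance([sum(row) for row in item_scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Toy data: four participants, four binary items (1 = correct).
    toy = [
        [1, 1, 1, 1],
        [0, 0, 0, 0],
        [1, 1, 0, 0],
        [1, 0, 1, 0],
    ]
    print(round(cronbach_alpha(toy), 2))
```

With only four binary items, as in the CRT-2, α is bounded by the small number of items, which is one reason moderate values are common for short reasoning tests.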
3 Results
3.1 Confirmatory tests
Overall, the debiasing training, the justification task, and their combination significantly improved performance on the CRT-2, whereas time delay and memory recall were not helpful. The CRT-2 scores across the control and experimental conditions are presented in Figure 1. A one-way ANOVA model revealed significant differences in CRT-2 scores across the conditions (F(6, 1741) = 15.75, p < .001, ηp² = .051). As post-hoc analysis, we conducted pairwise comparisons using two-tailed t-tests, which indicated partial support for our initial hypothesis that reflection manipulations increase performance on the CRT-2. As predicted, CRT-2 scores in the justification and debiasing training conditions (i.e., R3, R4 and R5) were significantly higher than both of the control conditions, C1 (Cohen’s d = 0.47, 0.52 and 0.54 respectively, ps < .001) and C2 (d = 0.40, 0.45 and 0.47, ps < .001). In contrast, neither time delay (R1) nor memory recall (R2) showed a significant difference from C1 (vs. R1: p = .537, d = 0.05; vs. R2: p = .610, d = 0.05) or C2 (vs. R1: p = .721, d = 0.03; vs. R2: p = .682, d = 0.04). We also found partial support for our second hypothesis that debiasing training is more effective than the other reflection manipulations: CRT-2 scores in the conditions with debiasing training (R4 and R5) were significantly higher than time delay (R1 vs. R4: d = 0.47; R1 vs. R5: d = 0.49; ps < .001) and memory recall conditions (R2 vs. R4: d = 0.48; R2 vs. R5: d = 0.50, ps < .001) but not the justification condition (R3 vs. R4: p = .704, d = 0.03; R3 vs. R5: p = .448, d = 0.07). Failing to find confirmatory evidence for our final hypothesis, CRT-2 scores in the two conditions with debiasing training did not significantly differ (R4 vs. R5: p = .681, d = 0.04). In other words, the combination of debiasing training with justification provided no clear added benefits.
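The effect sizes in this paragraph can be related to the test statistics with two short formulas. The sketch below is illustrative: the helper names are ours, and the pooled-standard-deviation form of Cohen's d is an assumption, since the text does not state which variant was computed. In a one-way design, partial η² can be recovered from the F statistic and its degrees of freedom, which reproduces the reported value of .051.

```python
from math import sqrt
from statistics import mean, variance


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


if __name__ == "__main__":
    # Omnibus test reported above: F(6, 1741) = 15.75.
    print(round(partial_eta_squared(15.75, 6, 1741), 3))
    # Toy CRT-2 scores for two hypothetical conditions (not study data).
    print(round(cohens_d([3, 4, 4, 2, 3], [2, 3, 2, 1, 3]), 2))
```

The same conversion applies to any of the omnibus F tests in the Results, given their degrees of freedom.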
3.2 Exploratory analyses
Here, we first report the remaining (i.e., non-confirmatory) pairwise comparisons of experimental conditions, and then explore differences in response times (RTs), task noncompliance, self-reported reflection, and self-reported emotions across the conditions. No differences in CRT-2 scores were identified when comparing the two control conditions (p = .324) or when comparing time delay with memory recall (p = .944). The CRT-2 scores were higher in the decision justification condition than in the memory recall condition (p < .001). Finally, CRT-2 scores in the decision justification condition were significantly higher than in the time delay condition (p < .001).
To help explore response times (RTs), Table 2 indicates the position of the reflection manipulations and the active controls in the study procedure as well as the mean RTs across the seven conditions. We use log-transformed RTs (base 10) to account for data skewness in all exploratory analyses that involve study duration measures. RTs in both the CRT-2 and the overall study significantly differed across conditions (CRT-2: F(6, 1741) = 274.84, p < .001, η2p = .486; overall: F(6, 1741) = 161.26, p < .001, η2p = .357). As expected, pairwise comparisons with two-tailed t-tests indicated that eliciting justifications during the CRT-2 (i.e., R3 and R5) increased CRT-2 RTs compared to all other conditions (ps < .001) and that the lack of reflection manipulations or active controls (i.e., C1) decreased the remaining study duration (i.e., excluding CRT-2 RTs) compared to all other conditions (ps ≤ .001). While there was no difference between the total study durations of R3 and R4 (p = .889), R1 was the fastest, R2 was the second fastest, and R5 was the slowest reflection condition (ps ≤ .001). Since careful reflection requires time, the variation in CRT-2 scores across the conditions could in part be driven by these RT asymmetries. Consistent with this conjecture, in a linear regression of the CRT-2 scores on the two variables that together constitute the total study duration, both coefficients were positive and statistically significant (log of total RT on CRT-2: β = 0.189, p < .031, η2p = .003; log of remaining time spent on the study: β = 0.260, p < .034, η2p = .003).
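The base-10 log transformation of RTs mentioned above can be illustrated with a minimal sketch. The response times below are hypothetical values chosen only to show how a long right tail is compressed. They are not data from this study, and the moment-based skewness helper is an assumption introduced for illustration.

```python
import math
from statistics import mean


def skewness(values):
    """Moment-based skewness: third central moment divided by variance ** 1.5."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical response times in seconds (illustrative only, not study data).
rts_seconds = [12.0, 15.0, 18.0, 20.0, 25.0, 30.0, 45.0, 90.0, 240.0]

# Base-10 log transform, as described in the text.
log_rts = [math.log10(rt) for rt in rts_seconds]

raw_skew = skewness(rts_seconds)
log_skew = skewness(log_rts)
print(f"skewness raw: {raw_skew:.2f}, log10: {log_skew:.2f}")
```

With right-skewed durations such as these, the transformed values are noticeably less skewed, which is the usual motivation for analyzing log RTs rather than raw RTs.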
One reason why the time delay condition failed to significantly activate reflection may be noncompliance with the time limits. In R1, 44.7% of participants failed to comply with the 20-second time limit in one or more of the four CRT-2 questions. Similarly, 21% of participants in the memory recall condition (R2) failed to complete the study and 11.4% of participants in R2 who completed the study failed to write at least four sentences in the memory recall task. In principle, task noncompliance could have weakened these reflection manipulations, since CRT-2 scores were higher among compliant than among noncompliant participants in both R1 (2.70 vs. 2.02, t(260) = 5.08, p < .001, d = 0.63) and R2 (2.50 vs. 1.64, t(208) = 3.85, p < .001, d = 0.78). However, these differences may also be due to participants’ thinking styles, as those who tend to be reflective (i.e., those with higher baseline CRT-2 scores) are likely to read the task instructions more carefully. Hence, exclusion of noncompliant participants from the analysis can bias results by annulling random assignment (Bouwmeester et al., 2017; Tinghog et al., 2013), and the appropriate solution would be to increase compliance in future studies, for example by using forced delay in R1 and stronger monetary incentives in R2.
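As a rough consistency check on the R1 comparison above, Cohen's d can be approximated from an independent-samples t statistic and the two group sizes as d = t × sqrt(1/n1 + 1/n2). The group sizes used below are assumptions, not reported values: t(260) implies N = 262 under a pooled-variance test, and the reported 44.7% noncompliance rate then gives roughly 117 noncompliant and 145 compliant participants. The text does not state how d was actually computed, so this is only a sketch.

```python
import math


def cohens_d_from_t(t_stat: float, n1: int, n2: int) -> float:
    """Approximate Cohen's d from a pooled-variance independent-samples t statistic."""
    return t_stat * math.sqrt(1 / n1 + 1 / n2)


# ASSUMPTION: group sizes are inferred, not reported. df = 260 implies N = 262
# for a pooled two-sample t-test, and 44.7% of 262 is about 117 participants.
n_total = 260 + 2
n_noncompliant = round(0.447 * n_total)
n_compliant = n_total - n_noncompliant

d_r1 = cohens_d_from_t(5.08, n_compliant, n_noncompliant)
print(round(d_r1, 2))  # 0.63, in line with the value reported for R1
```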
Next, we explore the influence of experimental manipulations on self-reported reflection (Figure 2) and affect (Figure 3). A one-way ANOVA showed that the self-reported composite index of reflection significantly differed between the conditions (F(6, 1741) = 3.08, p = .005, η2p = .011). Pairwise comparisons using two-tailed t-tests revealed that participants in conditions with debiasing training (R4 and R5), consistent with differences in CRT-2 performance, reported relying more on reason as compared to those in the passive control (R4 vs. C1: p = .029, d = 0.19; R5 vs. C1: p = .027, d = 0.20) and the memory recall conditions (R4 vs. R2: d = 0.32; R5 vs. R2: d = 0.32; all ps < .001). As a further indication of the failure of the memory recall condition (R2) in activating reflection, self-reported reflection was significantly lower in R2 as compared to the active control and the time delay conditions (R2 vs. C2: p = .022, d = 0.21; R2 vs. R1: p < .001, d = 0.26). No other significant difference in self-reported reflection was identified between the experimental conditions.
One-way ANOVA models of PANAS scores showed a significant effect of condition on positive affect (F(6, 1741) = 5.25, p < .001, η2p = .018) but no significant effect on negative affect (F(6, 1741) = 2.05, p = .057, η2p = .007). In particular, pairwise comparisons using two-tailed t-tests indicated that debiasing training with decision justification (R5) significantly increased positive affect as compared to the two controls (R5 vs. C1: p = .001, d = 0.29; R5 vs. C2: p < .001, d = 0.44) as well as the time delay (R5 vs. R1: p = .047, d = 0.18), the memory recall (R5 vs. R2: p = .002, d = 0.29), and the decision justification conditions (R5 vs. R3: p < .001, d = 0.36). Time delay (R1) and debiasing training (R4) conditions also increased positive affect compared to the active control (R1 vs. C2: p = .004, d = 0.26; R4 vs. C2: p = .002, d = 0.27) and the decision justification conditions (R1 vs. R3: p = .040, d = 0.18; R4 vs. R3: p = .027, d = 0.20). All other pairwise comparisons failed to reach statistical significance.
4 Discussion
In this study, we aimed to identify experimental manipulations that can effectively activate reflective thinking. Comparing five reflection manipulations and two control conditions, we found that justifying answers to the CRT-2 (R3), receiving a brief debiasing training prior to it (R4), and the combination of the two methods (R5) significantly increased reflective thinking. Against our expectations, no difference in cognitive performance was found across these three reflection manipulations. The online versions of the two manipulations commonly used in the literature — time delay (R1) and memory recall (R2) — were not found to be effective in increasing reliance on reflection, which may have been due to high noncompliance in R1 and high dropout rates in R2. On a positive note, reflection manipulations were not found to increase negative affect, and no socially desirable responding was found in these ineffective manipulations, since the self-reported reflection scores in these conditions were not higher than the controls. Overall, our study isolated two underutilized treatments (R3 and R4) as effective reflection manipulations appropriate for the online context and indicated that the two regularly used reflection methods (R1 and R2) may not be effective with the configurations used in this study.
Are any of the successful reflection manipulations preferable to the others? Our study revealed that R3, R4 and R5 increased reliance on reflection to a similar extent, resulting in moderate effect sizes that did not significantly differ from each other. As compared to conditions with debiasing training (R4 and R5), the condition with only the decision justification task (R3) has the advantage of involving a simple prompt that is easy to administer without the need to teach explicit rules for reflection. On the other hand, compared to the conditions that use decision justification (R3 and R5), the condition with only the debiasing training (R4) achieved not only high scores but also fast responses in the CRT-2 that was subsequently elicited. Therefore, the debiasing training shows promise in inducing continued activation of reflection, but the longevity of this manipulation, as well as alternative ways to strengthen it, should be further explored. Likewise, R5 (and to a lesser extent R4) resulted in higher levels of self-reported positive affect as compared with the controls, suggesting that debiasing training and the application of its lessons during decision making can increase positive affect. Whether positive affect in turn aids reflection is an open question that needs further examination. Overall, we advise that the best reflection manipulation is the one that is most appropriate for the experimental task at hand. For example, asking for justifications of decisions in tasks that measure prosocial intentions can motivate socially desirable responding. For such tasks, debiasing training can be preferable. In other research settings, decision justification can provide a fast and effective reflection manipulation.
The present study suffers from various limitations. Most importantly, our results are limited by their reliance on the CRT-2 as the sole cognitive performance measure. While it is well-established that the CRT-2 scores show significant positive correlations with other cognitive reflection measures such as the CRT (Thomson & Oppenheimer, 2016; Yilmaz & Saribay, 2017c) or standard heuristics-and-biases questions (e.g., Lawson et al., 2020), it is currently unclear exactly what aspects of cognitive reflection are directly captured by the CRT-2. The CRT-2 items differ from the standard CRT items by design, relying more on careful reading than on numeracy (Thomson & Oppenheimer, 2016). In this sense, the CRT-2 items can be likened to the so-called “stumpers” (Bar-Hillel, Noah & Frederick, 2018; Bar-Hillel, Noah & Shane, 2019). On the other hand, while stumpers are difficult riddles that “do not evoke a compelling, but wrong, intuitive answer” (Bar-Hillel et al., 2018), the intuitive answers on the CRT-2 are systematically wrong and can be used to distinguish between intuitive and reflective thinking. For example, more than a third of the answers to the first CRT-2 question (“If you’re running a race and you pass the person in second place, what place are you in?”) in the original study by Thomson and Oppenheimer (2016) were “first” and not “second”. These systematic mistakes are probably due in part to careless reading, but they also arise because a correct response on this item requires the logical inference that passing the second person in a race implies the existence of another runner who is ahead of them both. Nevertheless, more research is needed to distinguish between various cognitive performance tasks in their ability to measure different aspects of reflection (e.g., Erceg, Galić & Ružojčić, 2020).
Secondly, our results are not conclusive about the potential of time delay and memory recall tasks in increasing reflection. Our setup, where the memory recall task was shortened for the online context and where the time delay condition was not forced, may have weakened the manipulations. Low task compliance in the time delay condition and high dropout rates in the memory recall condition could have contributed to this failure. Hence, improved methods are needed to test the superiority of the decision justification and the debiasing training tasks over time delay and memory recall. For such tests, the standard version of the memory recall task, which requires writing eight sentences, can be coupled with higher monetary incentives to motivate task compliance, and the alternative version of the time delay condition that forces participants to wait for a set period can be used.
Thirdly, we cannot rule out the possibility that the direct effects of our successful reflection manipulations on cognitive performance may have been limited. For example, rather than activating reflection directly, the debiasing training condition may have indirectly improved reflection performance by increasing test-taking ability through exposure to questions that are similar to the CRT-2 or by increasing understanding of the CRT-2 items through more careful reading. Likewise, the decision justification task may be open to experimenter demand effects in some contexts. One reason why we did not find evidence for socially desirable responding may be the fact that all participants were exposed to the CRT-2 prior to reporting how much they reflected. Exposure to CRT-2 may have created a sense of reliance on reflection in the control conditions. Future studies specifically designed to study the role of socially desirable responding in reflection manipulations are needed.
Overall, this study fills an important gap in the literature by highlighting two effective manipulations (and their combination) for activating reflective thinking. These methods can be easily implemented in future research on dual-process models, including experiments conducted online. Some of the commonly used reflection manipulations have recently been shown to be ineffective (e.g., Deppe et al., 2015; Meyer et al., 2015), and earlier findings based on these manipulations often fail to replicate (e.g., Sanchez et al., 2017). Hence, previous results based on unreliable reflection manipulations should be tested using improved methods. Our findings indicate that, rather than just reminding people of the benefits of reflection (as in memory recall) or giving them time to think (as in time delay), providing guidance about how to reflect specifically (as in debiasing training and decision justification) can improve cognitive performance. The methods advanced in this study (decision justification, debiasing training, and their combined use) can serve this purpose well.