1 Introduction
The Social Heuristics Hypothesis (SHH) stipulates that intuitive decisions drive cooperative behavior and that reflective control overrides a cooperative ‘default’ behavior to produce selfish decisions (Bear and Rand 2016; Rand et al. 2014). According to the SHH, intuitive decisions tend to rely on experience from games encountered in everyday life, where interactions typically are repeated and involve opportunities for sanctions; deliberation adjusts behavior to the optimal self-interested response in the situation at hand.
The SHH, however, conflicts with suggestions elsewhere in the literature that deliberative processing supports pro-social decision making (e.g., Achtziger et al. 2015; Martinsson et al. 2012; Stevens and Hauser 2004). Moreover, several studies have failed to find a relationship between pro-social behavior and canonical manipulations of cognitive processes (e.g., Hauge et al. 2016; Tinghög et al. 2013, 2016; Verkoeijen and Bouwmeester 2014). This includes a recent registered replication report by Bouwmeester et al. (2017), which sought to replicate the keystone time-pressure study in Rand et al. (2012) but did not find an effect of time pressure on cooperation. Yet, recent meta-analyses present results consistent with an overall positive effect of intuitive decision processes on cooperation (Rand 2016, 2017a, b). In sum, the literature on intuitive cooperation has grown sharply since the publication of the original time-pressure study by Rand et al. (2012)—but without reaching a resolution.
This paper presents an updated meta-analysis to add clarity to the literature. While we obtain an overall meta-analytic effect of the intuition manipulations on cooperation, we can attribute this effect to a specific class of induction manipulations. These manipulations ask participants to rely on emotion over reason in determining their resource allocation (Gärtner et al. 2018; Levine et al. 2018). Thus, we identify a single source of variation in the effect size that may account for inconsistent conclusions in the literature; when we exclude the six experiments that feature this specific manipulation—comprising just 7% of our total data set—we obtain no effect of intuition on cooperation, and the exclusion also yields a substantial reduction in systematic between-study variation. These results are problematic for the SHH, as emotion-induction manipulations are vulnerable to alternative interpretations—and the SHH gives no reason for favoring this class of manipulations over others. Moreover, the dramatic dissipation of systematic heterogeneity following removal of emotion-induction manipulations runs counter to the idea that the intuitive cooperation effect, if present, is highly heterogeneous (Rand 2016). We also note that our results cannot be explained by between-study variation in participant compliance rates; we find no evidence that studies with higher compliance rates yield systematically higher effect sizes, speaking against the claim in Rand (2017a, 2019) that non-compliance explains why many studies find no effect of intuition manipulations on cooperation.
Our paper proceeds as follows. First, we present our data set and methods, then the analysis, after which we offer concluding remarks on the cognitive foundations of cooperation and the state of the literature.
2 Data and methods
Our inclusion criteria largely follow those in Rand (2016), who presented a meta-analysis of the effect of intuitive decision making on cooperation. The inclusion criteria define relevant experimental games and intuition manipulations. To be included in our meta-analysis, a study has to feature a controlled experiment—with monetary incentives and no deception—that used time pressure, cognitive load, ego depletion, or induction to manipulate cooperation.[1] The required intuition manipulations follow Rand (2016) exactly.[2]
As for relevant experimental games, we depart slightly from Rand (2016) by focusing on games that capture cooperation in strategic interactions not contaminated by past or future choices, to ensure clear interpretation of the dependent variable. Therefore, we include only one-shot, simultaneous-move public goods games and prisoner’s dilemmas. This differs from Rand (2016), who, in addition to simultaneous-move public goods games and prisoner’s dilemmas, also included second-player moves in sequential trust games and decisions from the last round of finitely repeated games. Nevertheless, to gauge how inclusion criteria affect our results, we perform robustness checks that also include sequential game decisions. Our final data set comprises 44 of the 51 experiments included in the prior meta-analysis by Rand (2016), as most of his studies fit our inclusion criteria. In addition, we include 36 new experiments featuring 13,189 participants, an increase of 56.9% in the number of studies and an increase of 83.5% in the number of participants.[3] Table A.2 in Supplemental Online Material (SOM) A provides a full overview of the experiments comprising our data set, including the number of participants and details about game type and manipulation used.[4]
Our inclusion decisions depart from Rand (2016) in two additional respects. First, our main analysis includes studies that informed participants about the time pressure in the experimental instructions. Rand (2016) argues that this introduces a potential comprehension confound; however, such challenges are inherent to these kinds of experiments regardless of when one introduces information about time pressure. Moreover, most of the data using this variation of the time-pressure manipulation originate from Tinghög et al. (2013), who successfully solved the compliance issue plaguing other studies (e.g., Bouwmeester et al. 2017; Rand et al. 2012). For these reasons, we do not see adequate justification for excluding studies that inform participants about time pressure in the experimental instructions.
Second, all of our analyses include participants who did not comply with the experimental treatment, as excluding them would lead to selection bias. The meaning of ‘compliance’ depends on the specific manipulation type, and the compliance rate varies by type. Compliance is mostly an issue for the time-pressure manipulation (where non-compliance means not responding within the time constraint) and induction manipulations (where non-compliance means failing to follow instructions to write a response in an open text field). Table A.5 in SOM A displays compliance rates by manipulation type.
In his discussion of time-pressure experiments, Rand (2017a) argues that excluding non-compliers provides an improved picture of the effect and that such exclusion is justifiable because observable factors are uncorrelated with compliance with the time constraint. However, a re-analysis of Rand et al. (2014), reported in Table A.1 in SOM A, shows that compliant participants are a selected subgroup—consistent with the argument that compliant-only analyses suffer from selection bias (Bouwmeester et al. 2017; Tinghög et al. 2013). Moreover, regardless of the outcome of balance tests, participants could self-select based on factors unobservable to the researcher. For this reason, we include non-compliers, and all results must therefore be interpreted as ‘intention-to-treat’ effects. Still, the number of studies and participants featured in our meta-analysis allows for high statistical power to detect very small hypothesized population effect sizes (see SOM B for a detailed power analysis).[5]
We subject our data set to a random-effects meta-analysis, which allows for systematic variation between studies by assuming that each true effect is drawn from a normal population distribution with a common mean and a between-study variance (Higgins et al. 2009).[6] This modeling assumption seems reasonable a priori, as several papers argue that the effect is heterogeneous (Mischkowski and Glöckner 2016; Rand 2018; Rand et al. 2014; Strømland et al. 2016). In line with Rand (2016), we use as our dependent variable the percentage of the total endowment contributed, ensuring that our results are directly comparable to those in the previous meta-analysis. For decision problems with binary choice, such as the conventional prisoner’s dilemma, the dependent variable takes the value 100 if the participant cooperates, and 0 otherwise.
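Formally, the random-effects model described above treats the observed effect y_k in study k as an estimate of a study-specific true effect θ_k with known sampling variance σ_k², where the θ_k share a common mean μ and between-study variance τ² (a standard formulation; see Higgins et al. 2009):

```latex
y_k = \theta_k + \varepsilon_k, \qquad \varepsilon_k \sim N(0, \sigma_k^2), \qquad
\theta_k \sim N(\mu, \tau^2), \qquad k = 1, \ldots, K.
```

The meta-analytic summary effect is then the estimate of μ, with each study weighted by 1/(σ_k² + τ̂²), so that τ² = 0 recovers the fixed-effect model as a special case.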
Analytically, our study differs from Rand (Reference Rand2016) in that we pay particular attention to sources of heterogeneity—systematic inconsistency across experiments. When there is large systematic inconsistency across experiments, it is hard to interpret the weighted summary effect produced by a meta-analysis.
In our meta-analysis, each effect size is computed as the percentage point difference between the treatment group (intuition condition) and the control group (deliberation condition). Because the outcome itself is measured on a 0–100 scale, the effect-size measure is bounded between −100 and 100. For studies retrieved from Rand (2016), we use the reported effect sizes and standard errors directly. For Bouwmeester et al. (2017), we follow the same procedure and retrieve the standard errors from the reported data. For other studies not included in either of the aforementioned data sets, we retrieved the estimates from regression tables where the percentage point difference between the treatment group and the control group was reported, normalizing the effect size to the 0–100 outcome scale. For studies where this was not possible (e.g., if the main analysis conditioned on participants’ compliance status and the intention-to-treat effect was not reported), we downloaded the data and ran linear regressions of the normalized contribution rate on a dummy indicator for the intuition condition, using the estimated coefficient as a measure of the treatment effect (this estimator is equivalent to a simple mean difference between the intuition condition and the deliberation condition). We use robust standard errors in the regression and construct 95% confidence intervals (effect size ± 1.96·SE, where SE is the standard error of the regression coefficient).
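As an illustration of this last step, the following Python sketch (with hypothetical file and column names; not the original analysis code) computes the intention-to-treat effect and its 95% confidence interval from one study's raw data:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical raw data: one row per participant, with the contribution
# normalized to a 0-100 scale and a 0/1 dummy for the intuition condition.
df = pd.read_csv("study_data.csv")  # assumed columns: contribution, intuition

# OLS of the normalized contribution on the treatment dummy; the coefficient
# equals the simple mean difference between intuition and deliberation groups.
model = smf.ols("contribution ~ intuition", data=df).fit(cov_type="HC1")

effect = model.params["intuition"]  # percentage-point treatment effect
se = model.bse["intuition"]         # heteroskedasticity-robust standard error
ci_low, ci_high = effect - 1.96 * se, effect + 1.96 * se
print(f"effect = {effect:.2f} pp, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```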
3 Results
We start by considering all experiments that meet our inclusion criteria. Figure 1 displays a forest plot of all experiments, including the overall effect with a corresponding 95% confidence interval. To the right of each estimate, we provide design details for the associated experiment.
As Fig. 1 shows, the magnitude of the overall effect of intuition manipulations on cooperation is 2.19 percentage points, and this effect is statistically significant (p = 0.005, Z test). However, the magnitude of the overall effect is only 35.7% of the main effect reported in the prior meta-analysis that excludes non-compliers (Rand 2016) and only 52.1% of the size of the intention-to-treat effect reported in that meta-analysis. This reduction in effect size may reflect the addition of individual lab estimates from the large registered replication study by Bouwmeester et al. (2017), which finds no effect of time constraints on cooperation. This pattern, in turn, is consistent with the ‘decline effect’ (e.g., Fanelli et al. 2017), whereby the influence of publication bias in an initial study on the meta-analytic estimate dissipates as null-result replications are added.
The overall effect may nevertheless not capture a psychologically relevant parameter; we can attribute 62% of the variation in the above forest plot to systematic differences between experiments (I² = 61.9%, χ²(81) = 212.75, p < 0.001).[7] Moreover, the estimated between-study variance is large (τ² = 27.08). As an illustration, note that the effect size varies from −9 to 32 percentage points. In summary, the analysis suggests an overall positive effect, but the experiments included exhibit very large variation in effect sizes, and that variation may, to a large degree, be attributed to factors other than chance.[8] As an overall effect size provided by a random-effects analysis is insufficient to summarize a heterogeneous set of studies (Raudenbush and Bryk 1985), our summary estimate should be interpreted with caution.
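For reference, the heterogeneity statistics reported throughout follow the standard definitions (here assuming the common DerSimonian–Laird estimator of τ²). With inverse-variance weights w_k = 1/σ_k² and fixed-effect mean μ̂, Cochran's Q, the I² statistic, and the τ² estimate are:

```latex
Q = \sum_{k=1}^{K} w_k (y_k - \hat{\mu})^2, \qquad
I^2 = \max\!\left(0,\ \frac{Q - (K-1)}{Q}\right), \qquad
\hat{\tau}^2 = \max\!\left(0,\ \frac{Q - (K-1)}{\sum_k w_k - \sum_k w_k^2 / \sum_k w_k}\right).
```

With Q = 212.75 and K − 1 = 81 degrees of freedom, the I² expression reproduces the 61.9% reported above: (212.75 − 81)/212.75 ≈ 0.619.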
When a meta-analysis suggests large between-study variation, it is common practice to search for the sources of that variation (Higgins et al. 2003a, b). In our case, the observed heterogeneity may have several explanations. One possibility is that the intuitive cooperation effect is contingent on various background factors, as suggested in several papers (Capraro and Cococcioni 2015; Mischkowski and Glöckner 2016; Rand et al. 2014; Strømland et al. 2016), including Rand’s (2016) meta-analysis. Another possibility is that the various manipulations, here grouped together as ‘intuition manipulations’, may work in different ways or even capture distinct psychological processes. That is, one may ask whether the observed inconsistency across studies is attributable to genuine and perhaps unpredictable variation in the underlying effect across study populations, or whether it is a by-product of the inclusion criteria. To distinguish between these possibilities, we turn to an analysis that separates manipulation types.
3.1 Comparing manipulations: meta-regressions
We use meta-regressions (see, e.g., Thompson and Higgins 2002) to compare the intuitive cooperation effect across manipulation types. We take experiments with time pressure as the baseline, since time pressure is the manipulation type most frequently applied to induce intuition. In SOM A (see Figures A.1–A.7), we provide meta-analyses specific to each manipulation type. In all individual meta-analyses but one, there is substantially less systematic between-study variation than in the overall analysis. The exception is the analysis for induction manipulations (see Figure A.4), where the estimated heterogeneity is 83.1%—a very high level (Higgins et al. 2003a, b)—indicating that nearly all observed variation is attributable to genuine differences in the underlying effect across studies of this type. For this reason, we split induction manipulations into the following subcategories: (i) ‘emotion-induction’ manipulations instructing participants to rely on emotion over reason when making their choices, (ii) ‘recall induction’, and (iii) ‘other induction’ manipulations. The meta-regression results are displayed in Table 1. It is important to note that these regressions capture correlations, as we only have within-study randomization and no exogenous between-study variation.
Table 1 Meta-regressions on manipulation type

| | (1) Effect size | (2) Effect size | (3) Effect size | (4) Effect size |
|---|---|---|---|---|
| Depletion | −0.177 (2.567) | −15.06*** (3.145) | | |
| Cognitive load | 0.695 (3.851) | −14.19*** (4.258) | | |
| Recall induction | 1.142 (2.043) | −13.74*** (2.734) | | |
| Emotion induction | 14.88*** (2.123) | | 14.81*** (2.080) | |
| Other induction | 2.565 (3.148) | −12.32*** (3.635) | | |
| Time pressure | | −14.88*** (2.123) | | −14.81*** (2.080) |
| Pooled | | | 0.978 (1.461) | −13.83*** (2.300) |
| Constant | 0.619 (0.777) | 15.50*** (1.976) | 0.630 (0.765) | 15.44*** (1.934) |
| Observations | 82 | 82 | 82 | 82 |
Standard errors in parentheses. (1) Meta-regression on manipulation type (baseline: time pressure); (2) meta-regression on manipulation type (baseline: emotion induction); (3) meta-regression on manipulation type, with all manipulations that are not emotion induction or time pressure pooled together (baseline: time pressure); and (4) meta-regression on manipulation type, with all manipulations that are not emotion induction or time pressure pooled together (baseline: emotion induction). ‘Pooled’ is a dummy for all manipulations that are not time pressure or emotion induction.
*p < 0.10, ** p < 0.05, ***p < 0.01
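To make the estimation concrete, here is a minimal Python sketch of the kind of weighted regression underlying Column (1). File and column names are hypothetical, and the sketch simplifies by plugging in the overall τ² estimate rather than re-estimating it within the meta-regression:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical study-level data: one row per experiment, with its effect size
# (percentage points), standard error, and manipulation-type label.
studies = pd.read_csv("effect_sizes.csv")  # columns: effect, se, manipulation

tau2 = 27.08  # between-study variance from the overall random-effects analysis
weights = 1.0 / (studies["se"] ** 2 + tau2)  # random-effects inverse-variance weights

# WLS of effect size on manipulation-type dummies, with time pressure as the
# omitted baseline: the constant estimates the time-pressure effect and each
# dummy the difference relative to it, as in Column (1) of Table 1.
model = smf.wls("effect ~ C(manipulation, Treatment(reference='time pressure'))",
                data=studies, weights=weights).fit()
print(model.summary())
```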
The meta-regressions yield several noteworthy results. First, Column (1) shows that only experiments using emotion-induction manipulations are significantly more effective in promoting cooperation than time-pressure studies (coefficient = 14.88 percentage points, t(76) = 7.01, p < 0.001); the other manipulations do not differ significantly from the small and non-significant effect estimated for the time-pressure studies (constant = 0.619, t(76) = 0.80). It is also noteworthy that ‘other induction’ manipulations yield an estimated effect very close to that of time-pressure studies, a difference of a mere 2.57 percentage points (t(76) = 0.81). Column (2) takes emotion-induction manipulations as the baseline and shows that all other manipulations are significantly less effective in promoting cooperation. Consistent with this, in Column (4), both time-pressure (t(79) = −7.12, p < 0.001) and ‘pooled’ manipulations (t(79) = −6.01, p < 0.001) are estimated to reduce the effect size by about 14 percentage points relative to emotion-induction manipulations. Together, these results justify our subdivision of the wider class of induction manipulations.
A funnel plot of all studies in the main analysis (see SOM A, Fig. A.11) illustrates the relative effectiveness of manipulation types; five out of six experiments using the emotion-induction manipulations appear as outliers, to the right of the 95% confidence bounds.
While Rand (2016) suggests that time-pressure manipulations are less effective than other manipulations, our results indicate that only emotion-induction manipulations differ in their effect from the other manipulations. We therefore proceed to test whether our overall meta-analytic effect depends specifically on the emotion-induction manipulations; we conduct an alternative meta-analysis that includes all studies other than the six experiments using emotion-induction manipulations. This meta-analysis (see Fig. A.10) reveals no discernible overall effect on cooperation; the estimated meta-analytic effect is 1 percentage point (p = 0.076, Z test), and, judged by conventional classifications (Higgins et al. 2003a, b), heterogeneity is also quite low (I² = 19.8%, χ²(75) = 93.50, p = 0.073, τ² = 4.43). Because time-pressure studies have been called into question, both for the size of their effect (Rand 2016) and their validity (Myrseth and Wollbrant 2017), we also run a meta-analysis that excludes all emotion-induction and time-pressure manipulations, evaluating all other manipulations in the same test (Fig. A.9). In this meta-analysis, the estimated effect of the intuition manipulations is 1.62 percentage points—only 26.4% of the main effect reported in Rand (2016) and only 38.6% of that study’s intention-to-treat estimate—and not significantly different from zero (p = 0.177, Z test).
To ensure that our conclusions are not sensitive to inclusion criteria, we undertake additional robustness checks, using various combinations of Rand’s (2016) inclusion criteria while excluding the emotion-induction studies. In all tests, we follow Rand and include data on second movers and last-round moves in finitely repeated games. We also undertake robustness tests where we include data on trust game decisions, and tests where we include second-mover decisions only where the first mover contributed the maximum amount possible (as in Rand 2016). We carry out these robustness checks both for the specification excluding emotion-induction and time-pressure studies (Fig. A.9) and for the specification excluding only the emotion-induction studies (Fig. A.10). None of these robustness checks reveals a statistically significant overall effect; the estimated effect is consistently very small and insensitive to the inclusion criteria (see Table A.3 for details). Finally, it is worth noting that a separate meta-analysis of pre-registered studies only (Bouwmeester et al. 2017; Camerer et al. 2018; Everett et al. 2017) leads to a similar conclusion; the effect size in this meta-analysis is just 0.79 percentage points and not statistically significant, and the estimated heterogeneity is low (see Fig. A.12).
A possible interpretation of our null result is that the ‘true’ effect size is very small, and that our result, when excluding emotion-induction manipulations, is a false negative. However, this interpretation would prove equally challenging to existing studies that report evidence for intuitive cooperation. Suppose that our upper bound on the effect size—1.8 percentage points in these eight specifications—represents the true effect size. Then, for a single study to have 80% power to detect the underlying effect, one would need a sample size of at least 15,486 participants (assuming a common standard deviation of 40 across treatment groups). Should the effect size instead be 1 percentage point, as in Fig. A.10—which also corresponds closely to the effect size obtained using only pre-registered studies (see Fig. A.12)—one would need a sample size of at least 50,176 participants for a single study to achieve 80% power. Thus, even if our main finding were a false negative, the mean effect size in this literature is so small that to meaningfully study it one would need sample sizes an order of magnitude larger than those typically used in experimental studies. Any statistically ‘positive’ finding in this literature, obtained in typical sample sizes, would therefore likely represent a major overestimate (Gelman and Carlin 2014).
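These sample sizes follow from the standard two-sample power formula; the short sketch below (assuming a two-sided α = 0.05 with z ≈ 1.96, 80% power with z ≈ 0.84, and σ = 40, as stated above) reproduces the figures in the text:

```python
def total_sample_size(delta, sigma=40.0, z_alpha=1.96, z_beta=0.84):
    """Total N for a two-group comparison of means, using
    n per group = (z_alpha + z_beta)^2 * 2 * sigma^2 / delta^2."""
    n_per_group = (z_alpha + z_beta) ** 2 * 2 * sigma ** 2 / delta ** 2
    return 2 * n_per_group

print(round(total_sample_size(1.8)))  # 15,486 for a 1.8-point effect
print(round(total_sample_size(1.0)))  # 50,176 for a 1-point effect
```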
3.2 Alternative explanations
Rand (2019) responds to a pre-print version of our analysis by undertaking his own updated meta-analysis, using a combination of the data from Rand (2016) and those from our paper. His main argument is that our choice to exclude sequential games from the main analysis is responsible for the null effect obtained when we exclude emotion-induction manipulations. However, this cannot be the reason for the discrepancies between his new findings and ours—Table A.3 in our supplementary materials shows that our results are insensitive to the differences in inclusion criteria between Rand (2016) and our study.
Rand (2019) argues further that poor experimental designs may account for the many null findings in the literature. He suggests that future studies should move towards experimental designs that increase the compliance rate and comprehension of the game, and he expects these design features to be associated with substantially larger treatment effects. Regarding the latter point, we note that the registered replication report by Bouwmeester et al. (2017) undertook a high-powered test of the hypothesis that comprehension moderates the time-pressure effect; they found no time-pressure effect in the comprehending subgroup. As for the hypothesis that greater compliance is associated with greater effect size, we are not aware of prior tests in the literature, so we undertake one here. Because compliance varies between manipulation types, we also undertake a separate test for studies using time-pressure manipulations. Figure 2 presents a scatter plot of compliance rate and effect size for all studies included in our meta-analysis.
As Fig. 2 shows, there is no obvious relationship between a study’s compliance rate and its observed effect size, neither for all manipulations in general nor for the time-pressure manipulations specifically. In a meta-regression, the estimated correlation is small and positive but not statistically significant, both for the full sample and for the sample of time-pressure studies (regression results in Table A.4). Based on the available evidence, therefore, it seems unlikely that a movement towards studies with higher compliance rates would have a major impact on effect sizes in this literature.
An alternative way of addressing the role of study compliance is to run the meta-analysis for compliant participants only (so that the effect size is computed, for each study, only for participants who complied with the time allotted), as in the main analysis of Rand (2016). We report such an analysis in Fig. A.13, where we run the meta-analysis for all studies, including compliant participants only. This analysis yields a positive and statistically significant association between the intuition manipulations and cooperation, even when excluding emotion-induction manipulations. However, conditioning on compliance status amounts to a ‘bad control’ problem, as a treatment effect conditional on potentially endogenous variables warrants causal interpretation only under quite restrictive assumptions (Montgomery et al. 2018). Specifically, the analysis assumes that compliance, which happens after randomization, does not systematically affect the relative distribution of participants in the treatment versus control groups. This assumption is unmerited, however, as conditioning on compliance may plausibly change the composition of the treatment and control groups differentially, such that these groups are no longer directly comparable. And, as seen in Table A.1, there is empirical evidence for the selection-bias argument—data sets in this literature indicate self-selection into who complies or fails to comply with the treatment assigned. Finally, we note that absence of imbalance would not in itself amount to evidence against the selection-bias argument, as balance tests do not have 100% statistical power—and not all factors imbalanced between treatments are measured. In choosing to include non-compliant participants in our main analysis, we also follow recent meta-analyses in this literature (Fromell et al. 2018; Köbis et al. 2019; Rand 2019).
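A stylized simulation (entirely hypothetical numbers, not a model of any particular study) illustrates the bad-control problem: even when the treatment has no true effect, dropping non-compliers can manufacture a positive complier-only estimate if less cooperative participants are less likely to comply under time pressure.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

cooperativeness = rng.uniform(0, 100, n)  # latent type, drives contributions
treated = rng.integers(0, 2, n)           # randomized time-pressure assignment
contribution = cooperativeness            # true treatment effect is zero

# Selection on a post-randomization variable: in the treated group, selfish
# types are less likely to comply; in the control group, everyone complies.
comply = np.where(treated == 1, rng.random(n) < cooperativeness / 100, True)

itt = contribution[treated == 1].mean() - contribution[treated == 0].mean()
per_protocol = (contribution[(treated == 1) & comply].mean()
                - contribution[(treated == 0) & comply].mean())
print(f"ITT effect: {itt:.2f} pp")              # ~0, as it should be
print(f"Complier-only effect: {per_protocol:.2f} pp")  # spuriously ~+17 pp
```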
4 Conclusion
We present an updated meta-analysis of experiments that attempt to manipulate intuitive decision-making processes in games of cooperation. Our analysis tests the Social Heuristics Hypothesis (SHH), which stipulates that intuitive decision-making processes facilitate cooperative behavior. In examining both the overall meta-analytic effect and the origin of the between-study heterogeneity, we fail to obtain robust evidence for the SHH. Although we find evidence in favor of an overall positive effect of intuitive decision processes on cooperation, we can attribute this effect to a particular class of emotion-induction manipulations—those asking participants to rely on emotion over reason when determining their allocation. Other manipulation types fail to yield a statistically discernible effect on cooperation. When we exclude the six studies with this manipulation type and conduct a meta-analysis on the remaining 76 studies, which comprise 93% of the observations in our full data set, we find that intuition manipulations have no effect on cooperation.
The consistency in findings across all manipulation types save the emotion-induction manipulations suggests that the latter produce a distinct effect. One possibility is that the transparency of the researchers’ intention in this setting—asking people to rely on emotion over reason—is understood as a request that participants cooperate, akin to an experimenter demand effect. A request to use your ‘heart’ could be seen as encouragement to be ‘nice’, whereas a request to use your ‘brain’ may indicate that you should try to calculate personal consequences (and not be gullible). The demand effect is less likely to apply to the other intuition manipulations (e.g., time pressure), as the link in those cases between the researcher’s hypothesis of interest and the treatment is less transparent. While a laboratory participant asked to decide within 10 s might suspect that the study is about the relationship between cooperation and making decisions fast or slow, the direction of the research hypothesis is not evident. Notably, direct requests that strongly signal potential underlying research objectives have been shown to strengthen experimenter demand effects (de Quidt et al. 2018).
An alternative, but perhaps less plausible, possibility is that emotion induction is the only class of manipulations that successfully influences intuitive decision making. However, even if this alternative interpretation were true, it is worth noting that the SHH (Rand et al. 2014; Bear and Rand 2016) gave no a priori reason why this manipulation should work whereas the others should not. Relatedly, one might wonder whether failure to comply with experimental instructions could account for our results, as compliance varies with study type. However, we do not find evidence for the hypothesis, put forward by Rand (2019), that studies with higher compliance exhibit higher effect sizes.
We also fail to find support for the idea that the underlying effect is highly heterogeneous (Rand 2016), as the removal of emotion-induction experiments from the meta-analysis reduces estimated between-study heterogeneity dramatically. This finding is consistent with the low between-study variation observed in the meta-analysis by Fromell et al. (2018), who study the effect of intuition manipulations on dictator game giving. We cannot rule out the possibility that we are underpowered to detect study-level heterogeneity, but it does appear that the meta-study by Rand (2016) overstates the importance of study-level heterogeneity for the effect of intuition manipulations. Nevertheless, tests for heterogeneity between studies will not necessarily pick up genuine individual-level heterogeneity if the relevant individual characteristics tend to be similar across study populations, and some studies argue that such individual-level heterogeneity is important for the link between intuition and cooperation (e.g., Alós-Ferrer and Garagnani 2018). One recent study on time-pressure effects in the dictator game tests more directly for such individual-level heterogeneity (across a large set of potentially relevant variables) and finds little evidence for it (Strømland and Torsvik 2019).
As our study focuses on cooperation, we cannot rule out that intuition influences other forms of pro-social behavior. According to Rand et al. (2016), the SHH also predicts intuitive altruism in women, but not in men. While their meta-analysis finds support for this prediction, a more recent meta-analysis by Fromell et al. (2018) finds a negative effect of intuitive decision processes on altruism for men and no effect for women.
At a more general level, our findings also speak to the current discussion on heterogeneity in effect sizes in psychology and economics (DellaVigna and Pope 2018; Klein et al. 2014; McShane and Böckenholt 2014; van Aert et al. 2016). Meta-analyses in psychology typically suggest substantial systematic heterogeneity in effect size (Stanley et al. 2018), but the recent ‘Many Labs’ projects find relatively low systematic variation in effect size across various contexts and cultures (Klein et al. 2014, 2018). Consistent with this, studies by DellaVigna and Pope (2018) indicate that effect sizes tend to be more stable across settings than predicted by expert forecasts. Our meta-analysis is consistent with these findings, and it shows that estimated treatment-effect heterogeneity in meta-analyses can be surprisingly sensitive to inclusion criteria; when we include the emotion-induction manipulations, heterogeneity is high—but when we exclude them, heterogeneity is low. Our evidence thus highlights the possibility that some of the heterogeneity reported in meta-analyses arises from researchers’ inclusion decisions—as opposed to genuine variation in the effects under scrutiny.