1 Introduction
The literature on unconscious processing is vast, and there is evidence that such processes can influence judgments, memory, and behavior (e.g., Bargh, 1990; Jacoby, 1991; Nisbett & Wilson, 1977; Shiffrin & Schneider, 1977; Zajonc, 1980). An intriguing new theory, Unconscious Thought Theory (UTT; Dijksterhuis & Nordgren, 2006), holds that the unconscious is a highly sophisticated, rational system that can make better decisions in complex situations than conscious thought (Dijksterhuis, 2004; Dijksterhuis, Bos, Nordgren, & van Baaren, 2006; Dijksterhuis & van Olden, 2006). Furthermore, according to a recent publication, experts who think unconsciously can make better use of diagnostic information and arrive at better predictions than non-experts or experts who think consciously (Dijksterhuis, Bos, van der Leij, & van Baaren, 2009). In the present article, we evaluate this claim and conclude that the hypothesis of superior performance by unconscious thinkers in a predictive judgment task is not conclusively substantiated statistically or theoretically. (For a more general and detailed critique of Unconscious Thought Theory, see González-Vallejo, Lassiter, Bellezza, & Lindberg, 2008.)
2 Summary of Dijksterhuis et al.’s (2009) methodology
In two studies (Dijksterhuis et al., 2009), participants predicted the results of upcoming soccer matches (n = 352 and n = 116 in Experiments 1 and 2, respectively). The experimental methodology was very similar for the two studies. First, the researchers assessed participants’ expertise using a 1 to 9 self-rating scale. Next, they presented participants with four upcoming soccer matches from the highest Dutch league (“Eredivisie”) and asked them to predict the result of each one (home-team win, away-team win, or draw). In the Immediate condition, participants were presented with the team names and were asked to make a prediction in 20 seconds. In the Conscious and Unconscious conditions, participants were shown the teams for 20 seconds and then were told that they would be making predictions later on. Conscious thought participants were then given an additional 2 minutes to think about the matches, while Unconscious thought participants were told they would do something else and performed a 2-minute “two-back” task designed to occupy conscious processing. The procedure for Experiment 2 was essentially the same as that of Experiment 1, with two differences. First, participants predicted five soccer matches from the World Cup; second, participants were asked to estimate the rank of each country in the World Ranking List (WRL) after they completed the other procedures. Dijksterhuis et al. (2009) claimed that participants who were distracted prior to providing their predictions (the unconscious group) and who scored higher in a self-assessed measure of soccer expertise outperformed participants who provided their predictions either immediately or after being asked to think carefully about each prediction. In this critique we perform alternative statistical analyses and derive different conclusions.
3 Statistical issues (Footnote 1)
The primary test carried out by Dijksterhuis et al. (2009) was an ANOVA with Condition (Immediate, Conscious, and Unconscious) and Expertise (Low versus High) as between-subjects factors on accuracy, measured as the proportion of correct predictions. The Expertise factor was constructed from a median-split of the self-assessments of expertise. The main result from the two studies is a Condition by Expertise interaction indicating that, for unconscious participants, higher expertise is associated with higher accuracy.
Our statistical reanalysis begins at the descriptive level, because it provides a clear view of the distributional characteristics of accuracy as a function of the independent variables in question. In addition, we challenge the authors’ use of an ANOVA based on a median-split of self-rated expertise. The perils of median-splits have been extensively documented by several prominent methodologists (Maxwell & Delaney, 1993; Vargha, Rudas, Delaney, & Maxwell, 1996; MacCallum, Zhang, Preacher, & Rucker, 2002). Irwin and McClelland (2001) and Fitzsimons (2008) have made a direct call to researchers to stop dichotomizing variables because of the potential for unwarranted conclusions. Thus, we present alternative analyses that do not dichotomize self-rated expertise.
* This cell has only one observation.
** No observations at self-expertise level of 8.
Table 1 contains the means and quartiles of proportion correct as a function of Self-rated expertise and Condition (Conscious and Unconscious groups) in Experiments 1 and 2. For ease of presentation the Immediate group is omitted, but its distribution is very similar to that of the other two groups. Clearly, the middle fifty percent of the distributions for the groups overlap at all levels of expertise, and no greater increase in mean accuracy is observed for the unconscious group as a function of expertise. In addition, the number of times that the unconscious participants produce higher means than the conscious group is not greater than what would be predicted by chance alone (binomial test, p > .05). (Footnote 2)
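The binomial comparison just described can be sketched as an exact sign test. The snippet below is a minimal illustration, not the analysis code used for this article; the function name `binomial_upper_tail` and the counts (6 of 9 expertise levels) are hypothetical and chosen only to show the arithmetic.

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """Exact P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical counts, for illustration only: suppose the Unconscious mean
# exceeded the Conscious mean at 6 of 9 expertise levels.
p_value = binomial_upper_tail(6, 9)
print(round(p_value, 3))  # 0.254
```

Under the chance hypothesis (p = .5), six or more "wins" out of nine would occur about a quarter of the time, so a count of that size would not be evidence of an advantage.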
Using the dichotomization of Dijksterhuis et al. (2009), we replicated the significant ANOVA interaction between thought Condition and Self-rated expertise. The means and standard errors for each experiment and condition are found in Table 2.
As seen in Table 2, conditional on expertise level, the 95 percent confidence intervals around the means overlap across the Immediate, Conscious, and Unconscious conditions in both experiments. We note that the medians of Self-rated expertise are rather low (3 and 4 in Experiments 1 and 2, respectively). The values used by Dijksterhuis et al. to split the groups differ from these values. As stated in Dijksterhuis et al.’s footnotes, they departed from the medians in order to have more even groups of participants (for example, 52 and 64 individuals in the low and high self-rated expertise groups in Experiment 2, respectively). We remark, however, that splitting the groups at the median of 4 yields exactly 58 participants in each group in Experiment 2, and the number of participants at each level of Condition is more even than with the split the authors used. In addition, the Self-rated expertise by Condition interaction in Experiment 2 occurs when Self-rated expertise is dichotomized at the value 3, and this result disappears when the dichotomization occurs at the actual median of 4. Another important aspect of these data is that the effect sizes found are quite small (partial eta squared < .03).
The descriptive statistics found in Tables 1 and 2 tell two different stories. Without dichotomization, accuracy does not increase more sharply as a function of expertise for the Unconscious group; but with dichotomization, the mean differences (ignoring the confidence intervals) are greater between low and high Self-rated expertise for the Unconscious group. Thus, in order to test the generality of the interaction found with the ANOVA that used the median-splits, we performed several splits of the expertise ratings (Footnote 2), five in each experiment for a total of ten tests, and found that no split criterion other than the one used by Dijksterhuis et al. (2009) replicated their ANOVA interaction results.
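To illustrate how the choice of cut point alone can create or reverse a dichotomized interaction, the following sketch computes the difference-of-differences in cell means at several cut points. It is a toy example under stated assumptions: the function `interaction_at_cut` and the data in `toy_rows` are hypothetical (a flat Conscious group and an Unconscious group with a single mid-scale peak), not the data analyzed here.

```python
from statistics import mean


def interaction_at_cut(rows, cut):
    """Difference of differences in mean accuracy for a split at `cut`.

    rows: iterable of (condition, expertise, accuracy) tuples, where
    condition is "U" (Unconscious) or "C" (Conscious).
    Ratings <= cut are coded "low"; ratings > cut are coded "high".
    Returns (high - low gain for U) minus (high - low gain for C),
    or None if any of the four cells is empty.
    """
    cells = {}
    for cond, expertise, acc in rows:
        level = "high" if expertise > cut else "low"
        cells.setdefault((cond, level), []).append(acc)
    needed = [(c, lvl) for c in ("U", "C") for lvl in ("low", "high")]
    if any(key not in cells for key in needed):
        return None
    gain = {
        c: mean(cells[(c, "high")]) - mean(cells[(c, "low")])
        for c in ("U", "C")
    }
    return gain["U"] - gain["C"]


# Hypothetical toy data: Conscious accuracy is flat at .25; Unconscious
# accuracy is .25 everywhere except a peak of .75 at expertise level 4.
toy_rows = [("C", e, 0.25) for e in range(1, 7)]
toy_rows += [("U", e, 0.75 if e == 4 else 0.25) for e in range(1, 7)]

for cut in range(1, 6):
    # The contrast is positive for cuts 1-3 and negative for cuts 4-5.
    print(cut, round(interaction_at_cut(toy_rows, cut), 3))
```

In this toy case neither group shows a monotonic expertise effect, yet the sign of the "interaction" depends entirely on whether the peak falls in the low or the high group.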
Dichotomization has the problem that some splits result in more uneven sample sizes for the different groups, so it is desirable to test the interaction hypothesis in another manner. As stated earlier, the prediction of UTT is that higher mean accuracy should be evident for experts in the unconscious condition relative to the experts in the other groups and the non-experts. This implies two things: 1) that accuracy increases with expertise and 2) that the increase is more pronounced for the unconscious group. Using the general linear model approach advocated by many researchers (e.g., Fitzsimons, 2008), we can test this interaction in a regression framework that does not dichotomize Self-rated expertise. The results showed no significant Condition by Self-rated expertise interactions in the two studies: F(2, 346) = 1.98, p = .14, Experiment 1; and F(2, 110) = 1.56, p = .215, Experiment 2. Because Experiment 2 also had measures of objective expertise (that is, knowledge of the world soccer rankings of the teams, WRL), we performed the same test using WRL as the independent variable. The Condition by WRL (objective expertise) interaction was not significant either, F(2, 110) = 2.17, p = .12. Thus, we do not find support for the hypothesis that accuracy is differentially affected by thought condition and levels of expertise (either objective or self-rated) when using the general linear model approach. We thus conclude that the results observed with the median-split analysis are spurious; Maxwell and Delaney (1993), Vargha et al. (1996), and MacCallum et al. (2002) have demonstrated that spurious significant interactions can appear in analyses that dichotomize the independent variables, in part due to non-linearity between the independent and dependent variables.
As seen in Table 1, accuracy does not follow a clear monotonic trajectory from low to high expertise, and the trend of the means shows a small peak in the middle of the scale for the unconscious participants.
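The regression test of the Condition by expertise interaction can be sketched as a nested-model F test: a full model with a separate expertise slope in each condition against a reduced model with separate intercepts but one common slope. The code below is a minimal sketch assuming ordinary least squares and dummy-coded conditions; the helper names and the `toy` data are hypothetical, and it is not the software used for the reported results. With three conditions the full model has six parameters, which is consistent with the reported error degrees of freedom (352 − 6 = 346 and 116 − 6 = 110).

```python
def centered_sums(xs, ys):
    """Return (Sxx, Sxy, Syy): centered sums of squares and cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return sxx, sxy, syy


def interaction_f(groups):
    """F test of equal expertise slopes across conditions.

    groups: dict mapping condition name -> (expertise list, accuracy list).
    Full model: separate intercept and slope per condition.
    Reduced model: separate intercepts, one common (pooled) slope.
    Returns (F, df1, df2).
    """
    parts = [centered_sums(xs, ys) for xs, ys in groups.values()]
    n_total = sum(len(xs) for xs, _ in groups.values())
    g = len(groups)
    # Residual sum of squares when each condition gets its own line.
    sse_full = sum(syy - sxy**2 / sxx for sxx, sxy, syy in parts)
    # Residual sum of squares with a single pooled within-condition slope.
    w_sxx = sum(p[0] for p in parts)
    w_sxy = sum(p[1] for p in parts)
    w_syy = sum(p[2] for p in parts)
    sse_reduced = w_syy - w_sxy**2 / w_sxx
    df1, df2 = g - 1, n_total - 2 * g
    f_value = ((sse_reduced - sse_full) / df1) / (sse_full / df2)
    return f_value, df1, df2


# Hypothetical toy data (six participants per condition), illustration only.
toy = {
    "Immediate": ([1, 2, 3, 4, 5, 6], [0.25, 0.50, 0.25, 0.50, 0.50, 0.75]),
    "Conscious": ([1, 2, 3, 4, 5, 6], [0.25, 0.25, 0.50, 0.50, 0.75, 0.50]),
    "Unconscious": ([1, 2, 3, 4, 5, 6], [0.25, 0.25, 0.50, 0.75, 0.75, 1.00]),
}
f_stat, df1, df2 = interaction_f(toy)
print(df1, df2)  # 2 12
```

The sketch stops at the F statistic; a p-value would additionally require the F distribution, which is not in the Python standard library (for example, `scipy.stats.f.sf(f_stat, df1, df2)` if SciPy is available).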
Next, we performed a more direct test of the mean differences between the two key groups, Unconscious and Conscious participants, on a contrast that captured the expected accuracy increases when going from low to high expertise. Again, the Unconscious and Conscious groups were not significantly different on this linear contrast (t(349) = –.2, p = .42, in Experiment 1, and t(113) = –.19, p = .42, in Experiment 2; one-tailed tests). That is, the changes in accuracy as a function of expertise were not different for the Unconscious and Conscious participants.
Finally, we explored additive models and checked more generally whether the variability in accuracy is better explained by adding Condition as a variable once we control for Self-rated expertise. The R² values remained unchanged up to two decimal places when Condition was added to the model. In Experiment 1, the full and reduced models yield R² = .14. In Experiment 2, R² = .01. In each experiment, the linear model containing only Self-rated expertise is significant at the .05 level, but the relation is small (R² < .14). Using WRL (objective expertise) as a predictor in Experiment 2 (with or without Condition) yields R² = .09. Objective expertise is significant (at the .05 level) and, not surprisingly, a stronger predictor of accuracy.
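The additive-model comparison can be sketched in the same spirit: compute R² for a model with expertise alone, then for a model that adds dummy-coded Condition (separate intercepts, common slope), and compare the two. The function names and the `toy` data below are hypothetical and serve only to show the computation, assuming ordinary least squares.

```python
def centered_sums(xs, ys):
    """Return (Sxx, Sxy, Syy): centered sums of squares and cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return sxx, sxy, syy


def r2_without_and_with_condition(groups):
    """Return (R2 for expertise only, R2 for expertise + Condition).

    groups: dict mapping condition name -> (expertise list, accuracy list).
    """
    all_x = [x for xs, _ in groups.values() for x in xs]
    all_y = [y for _, ys in groups.values() for y in ys]
    sxx_t, sxy_t, syy_t = centered_sums(all_x, all_y)
    # Expertise-only model: squared correlation between expertise and accuracy.
    r2_reduced = sxy_t**2 / (sxx_t * syy_t)
    # Additive model: condition intercepts plus one pooled within-condition slope.
    within = [centered_sums(xs, ys) for xs, ys in groups.values()]
    w_sxx = sum(w[0] for w in within)
    w_sxy = sum(w[1] for w in within)
    w_syy = sum(w[2] for w in within)
    sse_full = w_syy - w_sxy**2 / w_sxx
    r2_full = 1 - sse_full / syy_t
    return r2_reduced, r2_full


# Hypothetical toy data, illustration only.
toy = {
    "Conscious": ([1, 2, 3, 4, 5, 6], [0.25, 0.25, 0.50, 0.50, 0.75, 0.50]),
    "Unconscious": ([1, 2, 3, 4, 5, 6], [0.25, 0.50, 0.50, 0.50, 0.50, 0.75]),
}
r2_reduced, r2_full = r2_without_and_with_condition(toy)
print(round(r2_full - r2_reduced, 2))
```

Because the models are nested, R² cannot decrease when Condition is added; an increment that vanishes at two decimal places indicates that Condition adds essentially nothing beyond expertise.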
4 Conclusions
The notion that experts can make better predictions when thinking unconsciously rests in part on the assumption that unconscious thought weights the importance of attributes appropriately, whereas conscious thought disturbs the natural process and produces suboptimal weighting of cues (Dijksterhuis et al., 2009). This is the weighting principle of UTT (Dijksterhuis & Nordgren, 2006). An earlier study (Dijksterhuis, 2004, Experiment 3) attempted to find evidence for this principle by correlating people’s importance judgments of the dimensions that defined the stimuli with the participants’ overall preferences for the stimuli. As stated by the authors, no significant differences were found on this correlation measure between the groups that thought consciously and those that thought unconsciously (see page 2, Dijksterhuis et al., 2009; page 100, Dijksterhuis & Nordgren, 2006).
From another perspective, Dijksterhuis et al. (2009) refer to the work by Halberstadt and Levine (1999) to emphasize the shortcomings of conscious thinking in a predictive judgment task. In that study, participants predicted basketball games either after thinking about and listing the reasons for their choices (at least three reasons), or without doing so (control group). Participants also provided self-rated expertise judgments. The study found that those who were asked to list reasons had worse accuracy scores (measured with three dependent variables) than those who did not, replicating and expanding the work of Wilson and Schooler (1991) on the effects of listing reasons. With regard to self-rated expertise, the results showed only a marginal (thus non-significant) negative correlation between self-rated expertise and one of the three accuracy measures used in the study. Hence, we believe that the Halberstadt and Levine study cannot be linked directly to the hypothesis that unconscious thinking should aid experts (or more precisely, self-rated experts) when making predictions. We also believe that this research does not directly relate to the conscious condition employed by Dijksterhuis et al. and therefore has little to say about the possible lower performance of individuals who are asked to think consciously about their predictions. The differences in procedures (i.e., between listing reasons and just thinking about a problem) could be significant. For example, a good technique for reducing the overconfidence bias (i.e., confidence judgments that are higher than warranted by accuracy) is to list con reasons for a chosen response rather than pro reasons (Koriat, Lichtenstein, & Fischhoff, 1980).
That is, performance can vary even across different types of conscious directives.
From yet another angle, the superiority of unconscious experts is linked to Fuzzy-trace theory (Reyna & Brainerd, 1991, 1995a, 1995b). Dijksterhuis et al. (2009) state that experts will benefit more from unconscious thinking than non-experts because experts rely on “gist” instead of “verbatim” memory to form judgments, and that gist memory is unconscious. But Fuzzy-trace theory acknowledges that consciousness is multidimensional, and there is nothing in Fuzzy-trace theory that would prevent gist from being used when a person is prompted to think carefully. Some of Fuzzy-trace theory’s key principles are: 1) cognitive flexibility results from encoding both gist and verbatim representations, 2) reasoning operates at the least precise level of gist as expertise increases, and 3) qualitative processing becomes the default mode of reasoning and is not a result of computational complexity. The first principle assumes parallel processing for both gist and verbatim information, and the second and third principles assume a greater reliance on gist as expertise increases, with reasoning being more qualitative than quantitative. Taking these principles together, the only expectation with regard to making predictive judgments is that experts will be more likely to use their gist memory than non-experts. It is unclear how unconscious experts would derive further benefits from distraction.
A final point concerning the weighting principle is that unconscious thought is expected to weight information optimally and yet is held to be unable to use numerical information (page 2, Dijksterhuis et al., 2009). Payne, Samper, Bettman, and Luce (2008) showed that in a gambling task conscious thinkers were better at weighting than unconscious thinkers (i.e., a contradiction of UTT), but these results were dismissed by Dijksterhuis et al. under the premise that the unconscious does not use numbers. Thus we are left with a conundrum: the unconscious can make better judgments and decisions in complex environments, but it cannot process numerical information. A thought experiment quickly reveals that much complexity in the world is found in numerical form (e.g., comparing insurance policies, making retirement decisions, making travel plans with differing costs and schedules), and therefore the non-numerical aspect of unconscious thinking seems at odds with its ability to excel in complex problems.
From a broad theoretical perspective, we (researchers in judgment and decision making) are surprised that a vast literature on predictive and diagnostic judgments was largely ignored by a paper that attempts to advise experts on how best to make predictions. For example, there is an extensive literature on clinical and probability judgment that has focused on describing the shortcomings of expertise and the robustness of linear models in many domains (e.g., Dawes, 1979, 2005; Dawes, Faust, & Meehl, 1989; Meehl, 1954). Studies have also looked at the factors that influence beliefs in expertise (the illusion of validity; Einhorn & Hogarth, 1978) and the conditions under which experts differ from one another in the way they weight and combine information (Einhorn, 1974). In a different realm, the calibration literature has demonstrated, among other things, that accuracy is a complex concept and that different measures address different psychological processes (e.g., discrimination versus calibration; see Yates, 1990, for a comprehensive review of calibration and for performance differences between experts and lay people in many domains). In addition, experts vary in their levels of accuracy as a function of tasks (Yates, 1990); for example, weather forecasters made accurate probabilistic forecasts of rain (Murphy & Winkler, 1977), but physicians diagnosing pneumonia did not perform well (Christensen-Szalanski & Bushyhead, 1981).
Furthermore, researchers in the cue probability learning and lens model traditions have studied predictive judgments extensively and proposed mechanisms of how individuals combine and weight cues and how feedback and task properties can affect these processes as well as performance (Hammond, Summers, & Deane, 1973; Hogarth, Gibbs, McKenzie, & Marquis, 1991; Klayman, 1988; Stewart & Lusk, 1994). The list of references we present is by no means exhaustive, but it sheds light on the richness of studies and methods that researchers have employed to understand the judgments of novices and experts. We believe that a theory like UTT would benefit from making the relevant theoretical connections to this research when attempting to explain and predict how judges make forecasts. In particular, Hammond’s Cognitive Continuum Theory (1996) is a clear candidate for analyzing the conditions in which different modes of thought may lead to different judgment strategies and outcomes across the deliberation-intuition continuum.
In sum, because the mechanisms underlying UTT have yet to be clearly defined, and because several researchers have not been able to replicate the basic finding of superior performance by unconscious thinkers (Acker, 2008; Calvillo & Penaloza, 2009; Newell, Yao Wong, Cheung, & Rakow, 2009; Waroquier, Marchiori, & Cleeremans, 2009), we conclude that it is premature to recommend that individuals “let their unconscious do the work” for important decisions. We also warn against the recommendation that experts should think unconsciously when making forecasts.