1 Introduction
Methodology is one of the vital pillars of all science. Indeed, the question of how we go about our scientific quests—rather than what exactly we are investigating—has stimulated numerous debates and controversies over the past centuries. Mostly, this debate has served the common purpose of establishing certain standards which serve as a road map for scientists. Though disciplines and subfields vary greatly in their specific methodological standards, all share some degree of concern for such matters.
The field of psychology certainly is no exception. On the contrary, “[o]ne of the hallmarks of modern academic psychology is its methodological sophistication” (Rozin, 2009, p. 436). Methodological issues play a prominent role in the ongoing exchange, and a growing number of contributions have recently addressed potential methodological problems inherent in the behavioral sciences (e.g., see the recent special issues in Perspectives on Psychological Science by De Houwer, Fiedler, & Moors, 2011; and Kruschke, 2011). Doubts have been raised concerning the subjects on which findings are typically based (Henrich, Heine, & Norenzayan, 2010), the approaches taken in theory development and testing (Gigerenzer, 1998; Henderson, 1991; Trafimow, 2003, 2009; Wallach & Wallach, 1994), the nature of the behavior assessed (Baumeister, Vohs, & Funder, 2007), or specific practices of data collection and questionable standards in data analysis (Dienes, 2011; Simmons, Nelson, & Simonsohn, in press; Wagenmakers, 2007; Wagenmakers, Wetzels, Borsboom, & van der Maas, 2011; Wetzels et al., 2011), to name but a few examples. [Footnote 1]
Also, there are several projects in development that attempt to coordinate collective action for solving fundamental methodological problems. The Filedrawer Project (www.psychfiledrawer.org) provides an online archive of replication attempts to address the problem that “[m]ost journals […] are rarely willing to publish even carefully conducted non-replications that question the validity of findings that they have published” (www.psychfiledrawer.org/about.php); this problem, in turn, can lead to publication biases (see Renkewitz, Fuchs, & Fiedler, 2011). In a similar vein, the Reproducibility Project (http://openscienceframework.org) aims to estimate the reproducibility of findings published in top psychological journals by conducting a collective, distributed attempt to replicate findings from a large sample of recently published papers. Needless to say, still other papers and projects highlight methodological challenges and provide potential solutions. In a nutshell, all of them reflect the continuous struggle for increasingly conclusive, robust, and general knowledge.
In our view, this struggle also goes on in the field of Judgment and Decision Making (JDM) research. Often enough, important advances in this area are motivated by methodological criticism. For example, of the many reactions to the recently proposed priority heuristic for risky choice (Brandstätter, Gigerenzer, & Hertwig, 2006), a substantial number raise methodological concerns pertaining to research strategy in general, the diagnosticity of tasks used, or the data analyses applied (e.g., Andersen, Harrison, Lau, & Rutström, 2010; Birnbaum, 2008a, 2008b; Birnbaum & LaCroix, 2008; Fiedler, 2010; Glöckner & Betsch, 2008; Hilbig, 2008; Regenwetter, Dana, & Davis-Stober, 2011; Regenwetter, Ho, & Tsetlin, 2007). Other theoretical controversies have similarly stimulated debate that largely centers around methodological issues (e.g., Brighton & Gigerenzer, 2011; Camilleri & Newell, 2011; Hilbig, 2010; Hilbig & Richter, 2011; Marewski, Schooler, & Gigerenzer, 2010; Pachur, 2011).
These examples and others demonstrate a need for an explicit and focused exchange of methodological arguments in JDM and potentially some room for improving common practices in this field. This assertion provided the main motivation for setting up a call for papers on methodology in JDM research. Aiming to keep our own agendas out of the early stages of development, we kept the initial call for papers deliberately broad. The gratifying upshot was an unexpectedly large number of interesting and important submissions.
Despite the breadth of the initial call, however, an early observation was that relatively few (if any) contributions dealt with issues in the philosophy of science or concerned methodological issues of theory formation and revision. Instead, the vast majority of manuscripts addressed issues of design and data analysis. This unequal distribution will become obvious in what follows: In this introduction to the special issue, we briefly discuss all contributions ordered by the stages of scientific discovery to which they (mostly) refer (see Figure 1).
2 Overview of papers
In this overview, we commence with issues of theory construction, before we then turn to experimental design and measurement. Next, we discuss the papers pertaining to those steps that follow data collection, namely data analysis, and cumulative development of knowledge. Note, however, that several of the papers speak to more than one of these matters. As such, ordering and grouping the contributions in the current way should not be taken to imply that each paper relates to only one of the phases of scientific progress.
2.1 Theory construction
Two papers in this special issue discuss theory construction and theory development in the field of JDM (Glöckner & Betsch, 2011; Katsikopoulos & Lan, 2011). Following Popper's approach of critical rationalism, Glöckner and Betsch argue that scientific progress requires theories to be formulated with high empirical content while remaining falsifiable. The authors point out some common drawbacks in corresponding theory formulation in JDM, especially a tendency towards formulation of weak theories. Also, for certain classes of JDM models, some remedies are suggested. More generally, observable shortcomings are partially attributed to a social dilemma structure (i.e., strictly maximizing personal interests would harm the collective interest in achieving scientific progress). It is suggested that the scientific community should agree upon a change in publication policies to overcome this dilemma structure.
Katsikopoulos and Lan take a historical perspective and discuss general developments in the field of JDM by investigating Herbert Simon’s influence on current work. In a review of recent articles in the field, the authors demonstrate the strong influence that Simon’s ideas have had on today’s thinking in JDM. Katsikopoulos and Lan also critically assess the way in which these ideas are treated in current work. In particular, the authors argue that integrative approaches for research on descriptive and prescriptive models are sought too seldom.
2.2 Design
Many of the contributions in this special issue focus on the steps between theory construction and data collection. That is, they concern the design stage, including the use of measurement methods, as well as the selection of appropriate tasks and stimuli.
2.2.1 Measurement methods
Schulte-Mecklenbeck, Kühberger, and Ranyard (2011) discuss classic and more recently developed process tracing methods and present examples of how these techniques can strongly aid development and testing of JDM process models. In a similar vein, Franco-Watkins and Johnson (2011) suggest applying an eye-moving window technique (i.e., an information board in which information is revealed once it is looked at). They argue that this information board variant allows for combining the advantages of classic Mouselab techniques and eye-tracking; specifically, this method should allow for fast and effortless information acquisition, while ensuring that the researcher gains full insight into which information was looked up, for how long, and when.
A third paper proposing a new method to gain insight into cognitive processes was contributed by Koop and Johnson (2011). They suggest applying a measure of response dynamics which is based on analyzing different aspects of mouse-trajectories between a starting position and the option chosen. The underlying idea is that the attraction exerted by the non-chosen option will manifest itself in these trajectories (e.g., Spivey & Dale, 2006) and thus provides insight concerning the on-line formation of preferences. Overall, these contributions jointly indicate that the application and combination of classic and new methods will provide important insights concerning processes underlying judgment and decision making.
2.2.2 Diagnostic task selection
Another issue of research design discussed in several papers is the selection of tasks that allow for actually discriminating between theories or hypotheses. Doyle, Chen and Savani (2011) provide a method (using Excel-Solver) for selecting tasks that differentiate optimally between theoretical models of temporal discounting. They show how to construct tasks that make the rate parameters of prominent theories orthogonal or even inversely related.
In a rather different domain, Murphy, Ackermann and Handgraaf (2011) provide a method to measure social value orientation (Van Lange, 1999) by using a few highly diagnostic tasks in which participants distribute money between themselves and others. The innovative method is based on a slider format which, combined with diagnostic tasks, makes data collection very efficient. Indeed, both approaches by Doyle et al. and Murphy et al. also seem promising in that they can probably be extended to other concepts relevant in JDM such as loss aversion, risk aversion, etc.
Another contribution addresses the issue of diagnostic task selection from a somewhat different angle. Jekel, Fiedler and Glöckner (2011) provide a standard method for diagnostic task selection in probabilistic inference tasks. The suggested Euclidian Diagnostic Task Selection method increases the efficiency in research design and reduces the degree of subjectivity in task selection. Jekel et al. also provide a ready-made tool programmed in R that makes it easy to use the method in future research (see also Jekel et al., 2010). Overall, there is agreement that diagnostic task selection is crucial for model comparison and model testing.
2.3 Data analysis
The majority of papers in the special issue are concerned with core issues of data analysis, including contributions suggesting improved methods for model comparisons, demonstrating the advantages of Bayesian methods, or pointing to the advantages of mixed-model approaches.
2.3.1 Model comparisons
Several papers focus on methods for model comparisons. Davis-Stober and Brown (2011) describe how to apply a normalized maximum likelihood (NML) approach to strategy classification in probabilistic inference and risky choice. One of the crucial advantages is that NML takes into account models’ overall flexibility instead of correcting for the number of free parameters only. The paper also illustrates how to test models assuming that decision makers do not stick to single strategies, but rather use a mixture of these.
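The point that NML penalizes a model's overall flexibility, and not merely its parameter count, can be illustrated with a deliberately simple toy case. The sketch below is not the strategy classification procedure of Davis-Stober and Brown; it assumes a single binomial choice proportion and compares a model with a free choice probability against a model that fixes it at 0.5. All function names are hypothetical and chosen for illustration only.

```python
from math import comb


def binomial_pmf(k, n, theta):
    """Probability of k successes in n trials given success probability theta."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def nml_free_theta(k, n):
    """NML probability of k under a binomial model with a free theta.

    The maximized likelihood of the observed data is divided by the sum of
    maximized likelihoods over all data sets the model could have produced.
    That sum grows with the model's flexibility and acts as the penalty.
    """
    def max_lik(j):
        # The maximum-likelihood estimate of theta for j successes is j / n.
        return binomial_pmf(j, n, j / n)

    normalizer = sum(max_lik(j) for j in range(n + 1))
    return max_lik(k) / normalizer


def nml_fixed_theta(k, n, theta=0.5):
    """NML probability of k under a model without free parameters.

    With nothing to fit, the maximized likelihood is the likelihood itself,
    and the normalizer equals 1 (assumption: theta is fixed in advance).
    """
    return binomial_pmf(k, n, theta)


if __name__ == "__main__":
    n = 20
    for k in (10, 18):
        print(k, nml_fixed_theta(k, n), nml_free_theta(k, n))
```

In this toy setting, data close to the fixed model's prediction (10 of 20) favor the inflexible model, whereas data far from it (18 of 20) favor the flexible one despite its penalty.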
Moshagen and Hilbig (2011) connect to the ideas discussed in Glöckner and Betsch, though focusing more on the importance of falsification. They show that comparing the fit of competing models can easily lead to entirely false conclusions once the true data-generating model is not actually among those considered (Bröder & Schiffer, 2003). As a remedy, they suggest including a test of absolute model fit which provides a chance for refuting false models.
Broomell, Budescu and Por (2011) show that the problem of overlapping model predictions (see also the contribution by Jekel et al.) can lead to biased conclusions in model comparisons and model competitions (see Erev et al., 2010). The reason for this is that global measures of fit can hide the level of agreement between the predictions of various models. Broomell et al. propose the use of more informative pair-wise model comparisons and demonstrate the advantages of such an approach. Also, the contribution by Jekel et al. discussed in the previous section adds insight on this matter by suggesting certain improvements in model comparisons. The same holds true for the hierarchical Bayesian approach put forward by Lee and Newell (2011) that is discussed in the next section.
2.3.2 Bayesian approaches
Another prominent issue concerns the application of Bayesian approaches and replacing classic methods of hypothesis testing by corresponding methods. Lee and Newell (2011) demonstrate the advantages of using hierarchical Bayesian methods for modeling search and stopping rules of decision strategies at the level of individuals. One of the core advantages over the strategy classification methods discussed above is that the hierarchical structure uses what has been learned about one subject for assisting inference for another one (“shrinkage”). Lee and Newell further show that their method will provide new insight on the nature of individual differences (e.g., in information search) which might also help to solve the debate between multi-strategy and uni-models for decision making (e.g., Newell, 2005).
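The notion of shrinkage can be made concrete with a minimal sketch. This is not Lee and Newell's hierarchical model of search and stopping rules; it assumes a much simpler normal-normal setting with known variances, in which each subject's estimate is a precision-weighted average of that subject's own mean and the group mean. The function name, the parameter names, and the use of the unweighted mean of subject means as the group-level estimate are illustrative assumptions.

```python
from statistics import mean


def shrink_estimates(subject_means, n_trials, within_var, between_var):
    """Partially pool subject means toward the group mean.

    Assumptions (illustrative only): normally distributed data, known
    within-subject variance and between-subject variance, and the plain
    average of subject means as the estimate of the group-level mean.
    """
    group_mean = mean(subject_means)
    shrunk = []
    for subject_mean, n in zip(subject_means, n_trials):
        # Weight on the subject's own data: high when the subject contributes
        # many trials, low when the subject's mean is noisy.
        weight = between_var / (between_var + within_var / n)
        shrunk.append(weight * subject_mean + (1 - weight) * group_mean)
    return shrunk


if __name__ == "__main__":
    estimates = shrink_estimates(
        subject_means=[0.2, 0.5, 0.8],
        n_trials=[5, 50, 5],
        within_var=0.25,
        between_var=0.04,
    )
    print(estimates)
```

Subjects with few observations are pulled strongly toward the group mean, while subjects with many observations keep estimates close to their own data, which is the sense in which information about some subjects assists inference about others.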
In another paper on Bayesian methods, Matthews (2011) discusses potential advantages of replacing classic Fisherian and Neyman-Pearson hypothesis testing. He shows by example that reanalyzing previous studies with Bayesian t-tests (Rouder, Speckman, Sun, Morey, & Iverson, 2009) in place of classic t-tests can lead to strikingly different conclusions. The Bayesian approach allows for comparing mutually exclusive hypotheses on the same footing, thus avoiding the problems of p-values and allowing for evidence for the null hypothesis. In a long-term perspective, the Bayesian approach would also aid knowledge accumulation by considering the sum of previous research findings when setting the priors for later analyses. We hope that the paper inspires further constructive discussion concerning the clear advantages but also the remaining drawbacks of Bayesian statistics.
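How a Bayes factor can express evidence for the null hypothesis may be seen in a small sketch. To keep it self-contained, it does not implement the Bayesian t-test of Rouder et al.; it substitutes a simpler binomial test, comparing a point null (choice probability fixed at 0.5) with an alternative that places a uniform prior on the choice probability. The function name and the choice of prior are illustrative assumptions.

```python
from math import comb


def bf01_binomial(k, n, theta0=0.5):
    """Bayes factor for H0: theta = theta0 against H1: theta ~ Uniform(0, 1).

    Values above 1 indicate that the data are more probable under the null,
    values below 1 that they are more probable under the alternative.
    """
    likelihood_h0 = comb(n, k) * theta0**k * (1 - theta0) ** (n - k)
    # Under a uniform prior, the marginal probability of any count k out of n
    # is exactly 1 / (n + 1).
    marginal_h1 = 1.0 / (n + 1)
    return likelihood_h0 / marginal_h1


if __name__ == "__main__":
    print(bf01_binomial(50, 100))  # data in line with the null
    print(bf01_binomial(70, 100))  # data at odds with the null
```

Unlike a non-significant p-value, a Bayes factor clearly above 1 quantifies support for the null relative to the specified alternative; the result does depend on the prior chosen for the alternative.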
2.3.3 Mixed-model approaches
Budescu and Johnson (2011) suggest a model-based approach to improve the analysis of the calibration of probability judgments. In calibration research, judgments must be compared against event probabilities. However, event probabilities are often unknown. The authors show that aggregating over observations can lead to wrong conclusions and suggest using a model-based approach instead. Specifically, they put forward a mixed-model regression approach (simultaneously taking into account effects between and within subjects) to estimate event probabilities which are then compared against probability judgments to determine calibration. Similar to the hierarchical approach by Lee and Newell, one crucial advantage of this mixed-model approach is that estimates for within- and between-subjects effects are more stable because they draw on a larger underlying data base.
2.4 Cumulative development of knowledge
There are two contributions in the special issue that, besides touching on questions of data analysis, also speak to the matter of cumulative development of knowledge. One is the above-mentioned paper by Matthews (2011) on using Bayesian approaches. As mentioned above, replacing (or complementing) classic hypothesis testing by the Bayesian approach aids knowledge accumulation. In a second contribution, Renkewitz, Fuchs, and Fiedler (2011) address the important issue of publication biases. By re-analyzing two JDM-specific meta-analyses as examples, they demonstrate that publication biases are also present in JDM research. Such biases, in turn, will hinder appropriate cumulative development of knowledge. Indeed, severely distorted overall estimates of effect size, or even premature acceptance of the existence and stability of effects, can be the consequences. The authors discuss specific methods to identify publication biases (in meta-analyses) and provide recommendations on how changes in the overall standards and publication practices might counteract the problem identified.
3 Summary and conclusions
We are pleased to say that the 15 papers contained in this special issue offer many important insights into JDM methodology and provide helpful tools and suggestions which, in our view, will further improve the confidence we may have in our findings. Although these 15 contributions are motivated by some methodological weaknesses in the field of JDM, it is also important to highlight that many of the problems tackled speak for the methodological sophistication of JDM research that is already in place. Of course, some of the points raised in this special issue are more controversial than others. Indeed, our experience in handling these papers throughout the review process showed that some papers have more potential for debate than others. Nonetheless, the constructive way in which all contributions describe ways to overcome methodological weaknesses makes us optimistic that this issue might inspire further positive developments.
It seems as if the techniques and policies for improving our methodological standards are available. One of the foremost aims of the special issue was to inspire a more intense debate concerning these issues in order to improve the degree to which standards are shared within the community, which is the basic requirement for their comprehensive enforcement. This is necessary for achieving scientific progress and overcoming social dilemma structures inherent in joint scientific discovery.