1 Introduction
Many everyday judgments and decisions must be made from memory. Correspondingly, a growing number of models in judgment and decision making explicitly consider the role of memory (Weber & Johnson, 2009). One such theory that has attracted substantial attention in recent research is the recognition heuristic (RH; Goldstein & Gigerenzer, 2002). The RH is a simple, and yet often surprisingly accurate, strategy proposed for comparative judgments from memory. It presumes that recognized options are chosen over unrecognized ones in a one-cue non-compensatory fashion, implying that no further information can overrule the binary all-or-none recognition cue (for the RH’s preconditions, see Gigerenzer & Goldstein, 2011; Pachur, Bröder, & Marewski, 2008). The RH has been studied widely in the realm of probabilistic inferences (for reviews, see Gigerenzer & Goldstein, 2011; Pohl, 2011). Although some controversy remains concerning the claim of one-cue non-compensatory decision making (Brighton & Gigerenzer, 2011; Hilbig & Richter, 2011), there is consensus that the RH does account for the behavior of some individuals under certain circumstances (Hilbig, 2010; Pachur et al., 2008), especially whenever the task induces the motivation to reduce cognitive effort (Hilbig, Erdfelder, & Pohl, 2012; Pohl, Erdfelder, Hilbig, Liebke, & Stahlberg, 2013).
Extending the RH beyond its original domain of probabilistic inferences, recent studies have investigated the role of recognition in preferential choice, in this case consumer decision making (Oeusoonthornwattana & Shanks, 2010; Thoma & Williams, 2013). In both studies, participants were asked to choose between pairs of products. In critical pairs, a recognized brand was paired with an unknown one, such that the RH would predict choice of the former. In addition, the authors manipulated whether additional information (i) was in line with the recognition cue (suggesting the same choice), (ii) was neutral, or (iii) contradicted the recognition cue (suggesting choice of the option with the unknown brand name). In line with previous research on probabilistic inferences (e.g., Bröder & Eichler, 2006; Newell & Fernandez, 2006; Newell & Shanks, 2004; Richter & Späth, 2006), the authors found that the probability of choosing recognized options, although substantial and well above chance level, was not independent of the additional information, thus contradicting the RH’s claim of one-cue non-compensatory decision making.
However, the finding that aggregate choice probabilities are not perfectly aligned with the RH does not imply that another model must be preferred, let alone specify which one. Likewise, the individual analyses reported by Thoma and Williams (2013) hint only that the RH may have been used by some individuals but probably not by others, while leaving open how these others may have been making choices. Correspondingly, Thoma and Williams (2013) conclude that “recognition is not used as the sole cue” (p. 42), but also note that their results do not support any one specific alternative (compensatory) model (“the compensatory effect observed in this study is arguably not fully consistent with a simple cue integration model either”, p. 42). Based on similar findings, some researchers have suggested that the RH be retained as long as no fully specified alternative model is shown to account for the data more successfully (Brighton & Gigerenzer, 2011; Marewski, Gaissmaier, Schooler, Goldstein, & Gigerenzer, 2010; Pachur, 2011). However, even if a falsified model is retained simply because no better alternative is available, its refutation should undoubtedly trigger attempts to seek out and comparatively test potentially superior alternative models.
2 Methods
Although comparative model tests are the exception in research on the RH (for a quintessential example, see Glöckner & Bröder, 2011) and have not yet been reported for preferential choice tasks, Thoma and Williams’ data actually allow for such a test of competing models, because the degree to which additional information confirms or contradicts the RH was varied. In such a setup, the choice-based maximum-likelihood strategy classification method suggested by Bröder and colleagues (Bröder, 2010; Bröder & Schiffer, 2003) can be used. In essence, if models predict distinct choice vectors across two or more sets of trials (i.e., conditions), the models can be compared with respect to how well they fit the data. In Thoma and Williams’ (2013) study, there are three such sets of trials (conditions), or item types, as summarized in Table 1. Note that the strategy predictions, and thus the model comparison, hinge on the vital assumption that each individual decision maker uses the same strategy across all trials. Given that the nature of the information remains constant across trials (only its content varies), this is arguably a plausible assumption and one that is commonplace (Bröder & Schiffer, 2006a, 2006b; Glöckner, 2009; Glöckner & Bröder, 2011; Platzer & Bröder, 2012; see Footnote 1).
As outlined above, in critical trials with exactly one option recognized, the additional information (quality ratings, in Thoma and Williams’ 2013 study) was either aligned with the recognition cue (item type 1), neutral (item type 2), or contradicted the recognition cue (item type 3). Participants made 10 choices in each item type and thus 30 in total. Across the three item types, several models make distinct choice predictions. The RH predicts choice of the recognized option in every item type, since any additional information should be ignored. An equal weights model (EQW; choose the option with the higher sum of positive cue values) predicts choice of the recognized option in item types 1 and 2, but has to guess in item type 3, because the two cues contradict each other. A weighted additive model (WADD; choose the option with the higher sum of weighted cue values) makes different predictions depending on whether the recognition cue or the additional information receives more weight. If the recognition cue is given more weight (WADD1), the model is equivalent to the RH and thus makes the same choice predictions. If, in turn, the additional information is given more weight (WADD2), the model predicts choice of the recognized option in item types 1 and 2 and choice of the unrecognized option in item type 3. Finally, a guessing model (GUESS) serves as a baseline that predicts random choice (probability .50) in every item type.
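The choice predictions, together with the guessing baseline (GUESS) used in the classification, can be summarized compactly. Writing p_t for the probability of choosing the recognized option in item type t and e for the strategy execution error (the notation of Appendix A), the models imply the following. This is a restatement for illustration, not a reproduction of Table 1:

```latex
\begin{array}{lccc}
\text{Model} & t = 1 \text{ (aligned)} & t = 2 \text{ (neutral)} & t = 3 \text{ (contradicting)} \\
\hline
\text{RH/WADD1} & 1 - e & 1 - e & 1 - e \\
\text{EQW}      & 1 - e & 1 - e & .50   \\
\text{WADD2}    & 1 - e & 1 - e & e     \\
\text{GUESS}    & .50   & .50   & .50
\end{array}
```

Because RH and WADD1 produce identical rows, choices alone cannot distinguish them, which is why they are treated as a single model (RH/WADD1) in the classification.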
To determine the “best” model for each participant, the vital assumption is made that the probability of choosing in line with a model’s predictions is constant across item types (except when guessing is predicted, in which case the choice probability is set to .50). Thus, models are allowed only unsystematic strategy execution errors, whereas systematic errors lead to misfit (Bröder & Schiffer, 2003; Moshagen & Hilbig, 2011). To determine model fit, individual choice frequencies of recognized options conditional upon item types were used. As Thoma and Williams (2013) thankfully published the raw data of their study, these could be calculated in a straightforward manner. Based on the choice frequencies, and resorting to the multinomial processing tree framework (Batchelder & Riefer, 1999; Erdfelder et al., 2009), the Bayesian Information Criterion (BIC; Wasserman, 2000) was determined for each of the models per individual data set using the freeware multiTree (Moshagen, 2010). Reliance on BIC was indicated because the models differ in complexity (RH/WADD1, EQW, and WADD2 have one free parameter each, namely the execution error e, whereas GUESS has none). Model equations can be found in Appendix A. The model that produced the smallest BIC was retained as the best description of an individual participant’s data (for a similar approach in multinomial modeling, see Hilbig, Erdfelder, & Pohl, 2011; see also Footnote 2). To avoid falsely classifying data sets that were most likely generated by some other, unknown process outside the set of models considered, only models that fit the data (p > .05) were retained for classification (Moshagen & Hilbig, 2011).
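The classification logic can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the multiTree implementation: it assumes 10 trials per item type, encodes each model’s predictions as described above, estimates the execution error e by maximum likelihood as the proportion of choices against the prediction in non-guessing item types, computes BIC from the binomial log-likelihood (omitting constants that are identical across models), and checks absolute fit with a G² statistic against a chi-square distribution with 3 − q degrees of freedom. The names `MODELS`, `fit_model`, and `classify` are hypothetical, and the default `max_error=0.40` mirrors the lenient criterion.

```python
import math

# Hypothetical encoding of the predictions for item types 1-3 (aligned,
# neutral, contradicting): +1 = choose the recognized option, -1 = choose the
# unrecognized option, 0 = guess. The second entry is the number of free
# parameters q (the execution error e, or none for GUESS).
MODELS = {
    "RH/WADD1": ((1, 1, 1), 1),
    "EQW": ((1, 1, 0), 1),
    "WADD2": ((1, 1, -1), 1),
    "GUESS": ((0, 0, 0), 0),
}


def xlogy(x, y):
    """Return x * ln(y), treating 0 * ln(0) as 0."""
    return 0.0 if x == 0 else x * math.log(y)


def g2_term(obs, exp):
    """One G^2 summand obs * ln(obs / exp), treating obs == 0 as 0."""
    return 0.0 if obs == 0 else obs * math.log(obs / exp)


def chi2_sf(x, df):
    """Upper-tail chi-square probability for an integer df (closed form)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    q, k = (math.exp(-half), 2) if df % 2 == 0 else (math.erfc(math.sqrt(half)), 1)
    while k < df:
        # Recurrence: Q(k + 2, x) = Q(k, x) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q


def fit_model(k, n, pred, q):
    """ML error rate, BIC, and G^2 fit p-value of one model for one participant.

    k: recognized-option choices per item type; n: trials per item type.
    """
    # Choices against the prediction, counted only where the model does not guess.
    errors = sum(nt - kt if d == 1 else kt for kt, nt, d in zip(k, n, pred) if d != 0)
    trials = sum(nt for nt, d in zip(n, pred) if d != 0)
    e = errors / trials if q else 0.0
    p = [(1 - e) if d == 1 else (e if d == -1 else 0.5) for d in pred]
    # Binomial log-likelihood without the binomial coefficients (same for all models).
    loglik = sum(xlogy(kt, pt) + xlogy(nt - kt, 1 - pt) for kt, nt, pt in zip(k, n, p))
    bic = -2 * loglik + q * math.log(sum(n))
    g2 = 2 * sum(g2_term(kt, nt * pt) + g2_term(nt - kt, nt * (1 - pt))
                 for kt, nt, pt in zip(k, n, p))
    return {"e": e, "bic": bic, "p_fit": chi2_sf(g2, len(k) - q)}


def classify(k, n=(10, 10, 10), max_error=0.40, alpha=0.05):
    """Return the name of the model with the smallest BIC, or None if none qualifies."""
    candidates = []
    for name, (pred, q) in MODELS.items():
        res = fit_model(k, n, pred, q)
        # Keep models that fit in absolute terms and respect the error criterion.
        if res["p_fit"] > alpha and res["e"] <= max_error:
            candidates.append((res["bic"], name))
    return min(candidates)[1] if candidates else None
```

For example, under this sketch a participant who chooses the recognized option in 10, 10, and 5 of the trials of item types 1 to 3 is classified as EQW, whereas one who never chooses it remains unclassified, because every error-based model exceeds the error criterion and GUESS misfits.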
3 Results
The classification results are summarized in Table 2 for each of two different maximum levels of execution error (Bröder & Schiffer, 2003; Glöckner, 2009): .20 (strict) and .40 (lenient; see Appendix B). The strict criterion requires that models predict choices very well and substantially better than chance level. The lenient criterion, in turn, requires only that models predict choices slightly better than chance. As can be seen, EQW provided the best account for the largest number of datasets, regardless of the level of execution error allowed. Only a few datasets were best accounted for by RH/WADD1, especially under the stricter maximum level of execution error. Guessing also accounted for some datasets, whereas WADD2 accounted for essentially none.
* Datasets remain unclassified if all models are excluded, either due to absolute misfit (p < .05) or due to an observed error above the error criterion specified. The rationale is that these datasets were most likely generated by a decision strategy outside the set of those considered (Reference Moshagen and HilbigMoshagen & Hilbig, 2011).
As these findings reveal, EQW accounted best for the choices of the largest group of participants. Note, however, that where EQW predicted guessing (i.e., in the negative condition), there were systematic item-level differences for participants classified as EQW users. More specifically, whereas the aggregate probability of choosing the recognized brand was .52 and thus very close to chance, predicting a choice proportion of .50 (across participants) for each single brand led to misfit (χ²(26) = 52.5, p = .002). That is, the strict interpretation of EQW, according to which guessing should occur for each item in the negative condition, was rejected. Closer inspection revealed that this was primarily driven by two familiar brands that were chosen extremely often despite the additional negative cue, namely Apple (.82) and Toshiba (1.00). Once these two were excluded from the analysis, the strict EQW hypothesis (choice proportion of .50 for each single brand across participants) fit the data well (χ²(24) = 31.8, p = .13).
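The item-level check can be illustrated as a likelihood-ratio (G²) test of a fixed choice proportion of .50 for every brand, with one degree of freedom per brand because no parameters are estimated. The text does not state which fit statistic was reported here, so G² is an assumption, and `item_level_test` is a hypothetical helper; only the two reported test statistics are taken from the text. The closed-form chi-square tail is valid for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability for an integer df (closed form)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    q, k = (math.exp(-half), 2) if df % 2 == 0 else (math.erfc(math.sqrt(half)), 1)
    while k < df:
        # Recurrence: Q(k + 2, x) = Q(k, x) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q


def item_level_test(chosen, totals, p0=0.5):
    """G^2 test that every brand has choice probability p0 (no free parameters).

    chosen[i]: how often recognized brand i was chosen in the negative condition;
    totals[i]: how often brand i was presented in that condition.
    """
    def term(obs, exp):
        return 0.0 if obs == 0 else obs * math.log(obs / exp)

    g2 = 2 * sum(term(c, t * p0) + term(t - c, t * (1 - p0))
                 for c, t in zip(chosen, totals))
    df = len(chosen)
    return g2, df, chi2_sf(g2, df)


# Reported statistics: all 26 brands, then 24 brands without Apple and Toshiba.
p_all = chi2_sf(52.5, 26)
p_reduced = chi2_sf(31.8, 24)
```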
To further assess the appropriateness of the classified models, participants’ response times across the three item types were analyzed (for a similar approach, see Bröder & Gaissmaier, 2007). The RH predicts that response times should be equivalent across the three item types: As shown in Table 1, the recognition cue always discriminates, and since choices should be based on this cue alone, response times should be constant across item types (Glöckner & Bröder, 2011). EQW predicts that all pieces of information are acquired and added in each of the item types, implying the same processing time (see Footnote 3). However, in item type 3 an additional step is required, namely guessing (because the sum of cue values does not discriminate between options). Thus, according to EQW, response times should be equivalent in item types 1 and 2 but longer in item type 3. Finally, GUESS predicts constant response latencies across all item types. WADD2 was not considered in this analysis, since too few data sets conformed to this model in the choice-based classification.
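These predictions can be restated as constraints on the expected response time \bar{T}_t in item type t (the analyses below test them on log-transformed times). The EQW decomposition into \tau_{\text{sum}}, for acquiring and adding all cues, and \tau_{\text{guess}}, for the extra guessing step, is illustrative notation introduced here rather than a model fitted in the study:

```latex
\begin{aligned}
\text{RH:}    \quad & \bar{T}_1 = \bar{T}_2 = \bar{T}_3 \\
\text{EQW:}   \quad & \bar{T}_1 = \bar{T}_2 = \tau_{\text{sum}}, \qquad
                      \bar{T}_3 = \tau_{\text{sum}} + \tau_{\text{guess}}, \quad \tau_{\text{guess}} > 0 \\
\text{GUESS:} \quad & \bar{T}_1 = \bar{T}_2 = \bar{T}_3
\end{aligned}
```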
For the analyses, the mean log-transformed response time was computed for each participant and item type. These means, along with the classified model per individual data set (separately for the lenient and strict classification criterion), can be found in the online supplementary material. To test the above predictions, the mean response times were compared for each pair of item types (1 vs. 2, 2 vs. 3, and 1 vs. 3) using paired t-tests across all individuals classified into the same model category. Given that several of the predictions are null hypotheses (i.e., equivalent response times), for which conventional t-tests cannot provide supporting evidence, the obtained t-values were transformed into JZS Bayes factors using the approach of Rouder, Speckman, Sun, Morey, and Iverson (2009). Thereby, the odds in favor of the null hypothesis (assuming uniform priors) were approximated. The resulting mean differences and JZS Bayes factors are summarized in Table 3. The full analyses can be found in Appendix B.
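The two computational steps can be sketched as follows: a paired t statistic on per-participant mean log response times, and its conversion into a JZS Bayes factor in favor of the null by numerically evaluating the integral given by Rouder et al. (2009) for one-sample (and hence paired) designs. The Cauchy prior scale `r = 1` is an assumption (it corresponds to the original JZS prior; the online calculator cited in Appendix B may use a different default), and Simpson-rule integration over ln g is a numerical choice made here, not the calculator’s method. The function names are hypothetical.

```python
import math
import statistics


def paired_t(x, y):
    """Paired t statistic and number of pairs for two lists of per-participant means."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n)), n


def jzs_bf01(t, n, r=1.0, steps=4000):
    """JZS Bayes factor B01 (odds for H0) for a one-sample or paired t test.

    Integral form of Rouder et al. (2009): the effect size has a Cauchy(0, r)
    prior, written as a normal prior with variance g and an
    inverse-gamma(1/2, r^2/2) prior on g.
    """
    nu = n - 1
    null_lik = (1 + t * t / nu) ** (-(nu + 1) / 2)

    def integrand(u):
        g = math.exp(u)  # substitute g = e^u, so dg = g du
        prior_g = r / math.sqrt(2 * math.pi) * g ** -1.5 * math.exp(-r * r / (2 * g))
        lik = (1 + n * g) ** -0.5 * (1 + t * t / ((1 + n * g) * nu)) ** (-(nu + 1) / 2)
        return lik * prior_g * g

    # Composite Simpson's rule on u in [-20, 20]; the integrand is negligible outside.
    lo, hi = -20.0, 20.0
    h = (hi - lo) / steps
    total = integrand(lo) + integrand(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lo + i * h)
    alt_lik = total * h / 3
    return null_lik / alt_lik
```

For example, `jzs_bf01(*paired_t(rt_type1, rt_type2))` would give the odds for equal response times in item types 1 and 2, where `rt_type1` and `rt_type2` are hypothetical lists of per-participant mean log response times. Values above 1 favor the null hypothesis, matching the convention of Table 3.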
Note: * p < .05, ** p < .01 (paired t-tests). Positive differences indicate that the former item type was responded to faster. JZS Bayes factors are odds in favor of the null hypothesis (no difference); thus, values below 1 indicate evidence for the alternative hypothesis.
As can be seen, the predictions of the RH were only partially corroborated. For individuals classified as RH-consistent, response times did not differ between item types 1 and 2 (as predicted). However, there were clear differences between item types 2 and 3 as well as between item types 1 and 3, contrary to the RH predictions. When analyses were based on the strict classification criterion (Appendix B), the evidence no longer implied robust differences between item types (although the mean difference for item types 1 vs. 3 was comparable in size); however, in this analysis, only 10 data sets were classified as RH-consistent in the first place (see Table 2).
For datasets classified as EQW, the response time differences were in line with the predictions, independent of the classification criterion (see Appendix B): There was no difference between item types 1 and 2, whereas response times were longer in item type 3 than in both item types 1 and 2. Finally, as predicted, most tests favored the null hypothesis of no response time difference for data sets classified as GUESS (the only exception being the comparison of item types 1 vs. 3 under the strict classification criterion; see Appendix B). Overall, response time patterns were very well aligned with the strategy classification in the case of EQW and satisfactorily so in the case of GUESS.
4 Discussion
Based on these findings, the conclusions of Thoma and Williams (2013) on the role of recognition in consumer choice (see also Oeusoonthornwattana & Shanks, 2010) can be further specified. Indeed, the RH does not account for the data particularly well: Although it once again turned out to be the superior model for at least a minority of participants in terms of accounting for choice data (for similar conclusions in probabilistic inferences, see Pachur et al., 2008), response time patterns contradicted RH predictions even for these individuals (Glöckner & Bröder, 2011; Hilbig & Pohl, 2009). Only under the strict strategy classification criterion did response time patterns conform to the RH, and in that case very few data sets (16%) were actually classified as RH-consistent. Either way, the evidence for the RH is limited at best. On the other hand, the very low rate of classifications for a model assuming that additional information is weighted more strongly than recognition (WADD2) suggests that recognition is also rarely overruled.
Instead, the current data suggest that recognition and further information are treated equivalently by the largest group of decision makers. Specifically, the results show that a fully specified alternative model, namely an equal weights strategy, provides a superior account of choice patterns for the largest number of data sets. Additionally, response time patterns were aligned with model predictions for these data sets, thus lending further support to the strategy classification. So, extending the conclusions of Thoma and Williams (2013), their data do favor a specific compensatory model over the RH, namely one in which recognition is weighted neither more nor less than the additional information (for similar conclusions, see Richter & Späth, 2006).
Nonetheless, item-level analyses also show that the strategy classification method used herein is influenced by the specific item material. The more brands like Apple and Toshiba (which appear to be chosen irrespective of contradictory information) are included, the fewer participants will be classified as consistent with EQW (and the more with the RH). More importantly from a substantive point of view, the item-level results replicate the finding that the RH’s notion of all-or-none recognition is inappropriate (Erdfelder, Küpper-Tetzel, & Mattern, 2011; Newell & Fernandez, 2006): Familiar items come with certain knowledge that is integrated into the choice situation. By contrast, a binary all-or-none understanding of recognition cannot explain why, in the face of contradictory information, brands like Apple and Toshiba were consistently chosen, whereas brands like Olympus and Shure were hardly ever chosen.
Furthermore, it should be noted that consumer choice differs in noteworthy ways from the probabilistic inferences for which the RH was originally proposed. In particular, with respect to the preconditions for the RH proposed by Pachur et al. (2008), the current domain involves induced (rather than natural) cue knowledge, menu-based (rather than memory-based) information acquisition, unknown recognition validity, and additional information about unrecognized options, all of which Pachur et al. (2008) consider non-optimal preconditions for the RH. Nevertheless, the current findings do provide insight into the role of recognition in consumer decision making, especially by providing an alternative account of how recognition information may be integrated in these decisions.
Overall, the novel findings from the current model comparison answer the call for specifying an alternative model that provides a better account of the data than the RH (Brighton & Gigerenzer, 2011; Marewski et al., 2010; Pachur, 2011), in this case in consumer choice. Importantly, the current findings should not be over-interpreted as evidence that decision makers used EQW. The current methodology is not primarily designed to critically test any one specific process model, but rather to provide relative insight into which of several models provides a better account of the data. All models are necessarily abstractions, and even a model that accounts for the data well need not represent the true underlying process (Roberts & Pashler, 2000). However, Thoma and Williams’ (2013) data and the current analyses demonstrate that an alternative to the RH is superior in accounting for the data. In simple terms, this finding is not necessarily a strong argument for EQW (and indeed, a strict interpretation of EQW did not hold for each and every item), but it is stronger evidence against the RH than the original conclusion implied.
Appendix A
Model equations for use in the multiTree freeware tool (Moshagen, 2010) for each of the models considered. The first column refers to the item type (see first row of Table 1), the second column specifies the category number (i.e., which option is chosen, with 1, 3, and 5 referring to option A and 2, 4, and 6 to option B; see second row of Table 1), and the third specifies the choice probabilities. Note that the probability of choosing in line with a model’s predictions is 1 − e, where e denotes the strategy execution error.
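Written out, the category probabilities of each model define a binomial likelihood per item type. The following is a sketch of the quantities compared in the classification, assuming k_t recognized-option choices out of n_t = 10 trials in item type t, model-implied probabilities p_t(e) of choosing the recognized option, and q free parameters (q = 1 for RH/WADD1, EQW, and WADD2; q = 0 for GUESS). N is assumed here to be the participant’s total number of critical choices. multiTree’s exact reporting conventions (e.g., constant terms in the log-likelihood) may differ, but constants that are identical across models do not change which model has the smallest BIC.

```latex
\begin{aligned}
\ln L(e) &= \sum_{t=1}^{3} \Bigl[ k_t \ln p_t(e) + (n_t - k_t) \ln\bigl(1 - p_t(e)\bigr) \Bigr] + \text{const}, \\
\mathrm{BIC} &= -2 \ln L(\hat{e}) + q \ln N, \qquad N = \sum_{t=1}^{3} n_t = 30, \\
G^2 &= 2 \sum_{t=1}^{3} \Bigl[ k_t \ln \frac{k_t}{n_t\, p_t(\hat{e})} + (n_t - k_t) \ln \frac{n_t - k_t}{n_t \bigl(1 - p_t(\hat{e})\bigr)} \Bigr], \qquad df = 3 - q,
\end{aligned}
```

Here \hat{e} is the maximum-likelihood estimate of e, and any summand whose observed count is 0 is set to 0.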
Appendix B
Results of response time comparisons for all pairs of item types, conditional upon strategy classification (separately for the lenient and strict classification criterion).
Lenient refers to a maximum strategy execution error of .40, strict refers to .20 (see Table 2). Class is the classified model. S.d. and s.e. are those of the mean difference. The Bayes factor is approximated using http://pcl.missouri.edu/bf-one-sample.