1 Introduction
Early studies of learning in repeated choice tasks highlight the value of simple models that quantify Thorndike's (1898) law of effect. The law of effect states that positive reinforcements increase the propensity of selecting the reinforced actions. The simplest quantifications of this law assume a sequential adjustment process to experienced reinforcements (e.g., see the "noisy-adjuster" model described below in Section 3). Such sequential adjustment models have five important and attractive features. First, they can capture a wide set of behavioral phenomena; for example, Erev and Roth (1998) demonstrate how a 3-parameter sequential adjustment model provides useful predictions of behavior in simple games. Second, they entail a highly efficient process: the decision maker needs to remember only one value per option, the updated subjective value. Third, in static settings these simple models can approximate optimal choice (Sutton & Barto, 1998). Fourth, the computations these models denote are correlated with well-documented brain activity (Schultz et al., 1997). Finally, several studies have shown that the estimated parameters of models of this type can capture interesting individual differences (e.g., Yechiam et al., 2005).
Given the evidence in support of simple "sequential adjustment" models, the results of a series of choice prediction competitions (Erev et al., 2010a, 2010b; Erev et al., 2017; Plonsky et al., 2019) come as a surprise: While these choice prediction competitions were originally designed to compare alternative sequential adjustment models, these models did not perform well. Instead, the best performing models in these competitions relied on the assumption that people remember many past experiences (see a related idea in Gonzalez et al., 2003), but base each choice on a small sample of these memories.
The apparent inconsistency between the evidence in favor of sequential adjustment models and the superiority of sampling models in the competitions has previously been explained in two different ways. The first explanation notes that the competitions' focus on predicting aggregate choice rates can misrepresent the underlying processes that produce those rates (Birnbaum, 2011; Regenwetter & Robinson, 2017; Spektor & Wulff, 2021; Wulff & van den Bos, 2018; Chen et al., 2021). Thus, it is possible that individuals actually rely on an efficient sequential adjustment process with an individual-specific adjustment speed, but that on the aggregate this process is obscured. That is, while aggregate measures may be best captured by models that assume costly memory storage and sampling-based valuation, such models misrepresent the underlying processes. The feasibility of this explanation was recently demonstrated by Spektor and Wulff's (2021, hereafter SW) reanalysis of the data collected by Yakobi, Cohen, Naveh and Erev (2020, hereafter YCNE).
The second explanation assumes that the apparent inconsistency reflects reliance on different working assumptions that led to different comparisons (Erev, 2020). In accordance with this explanation, the clearest evidence in favor of sequential adjustment models comes from studies that did not systematically compare the assumptions that distinguish these models from sampling models. For example, Erev and Barron (2005) considered a simple sampling model, and then showed how the data can be captured with a more complex sequential adjustment model. The current paper examines this explanation by building on SW's analysis of YCNE's data.
YCNE's original analysis focused on aggregate choice rates. Their results highlight the predictive value of models that assume sampling-based decisions, and imply that the main driver of deviations from maximization (of expected payoff) is a tendency to rely on small samples. Conversely, SW's analysis suggests that a simple model that assumes sequential adjustment toward the payoffs' weighted average can capture the data better than the models considered by YCNE. In support, SW demonstrate that their model predicts the aggregate choice rates as well as the random-sampling-from-experience models used by YCNE, and they highlight an interesting pattern of individual differences that implies a new interpretation of YCNE's results. Under this interpretation, extreme myopia (by 32% of the participants), rather than reliance on small samples, is the main driver of the deviations from maximization documented by YCNE.
The current paper extends SW's analysis by considering two differences between models that assume sampling-based decisions (as in YCNE) and the sequential adjustment model considered by SW. The first difference lies in the assumptions dictating how each option's subjective valuation is carried out (i.e., by relying on random sampling or on a weighted average). The second difference lies in the assumptions dictating how a choice is derived from those valuations (i.e., the choice rule). While YCNE limited their analysis to models that assume a deterministic choice rule (i.e., choice of the option with the higher sampled mean), SW's model uses a stochastic (noisy) choice rule. Our analysis clarifies the importance of the valuation assumptions: we compare the descriptive value of the random sampling and the weighted average assumptions while using the same stochastic choice rule as SW (i.e., keeping the choice rule fixed).
Our results confirm the large individual differences suggested by SW, but favor a different interpretation of these differences. Our analysis shows that the random sampling assumption provides better predictions (both qualitatively and quantitatively) than the weighted average assumption, even when predicting individual decisions.
2 The Data
YCNE’s analysis starts with the observation that the reliance on small samples hypothesis can be used to shed light on the conditions under which high taxation, designed to reduce reckless behavior, is likely to backfire. In certain settings, it predicts a backfiring effect even when the tax is carefully designed to ensure that the desired behavior (i.e., safer decisions) maximizes expected return.
To test this prediction, each of the 246 participants (MTurk workers) in YCNE's studies was assigned to one of the three groups described in Figure 1 (one group in Study 1, and two in Study 2), and faced either two or three tasks (in a within-subject design). Each task included 100 trials, and in each trial the participant was asked to choose among three keys marked A, B, or C. The participants did not receive a description of the incentive structure and had to base their decisions on feedback provided after each choice. As demonstrated in Figure 1, the feedback described both the obtained and the forgone payoffs. The participants' final compensation was determined by the payoffs they accumulated during the experiment.
The middle panel of Figure 1 shows that all the tasks involved a choice between a safe option, a moderately risky option, and a counterproductive (low expected return) risky option. The groups differed with respect to the payoff from the safe option (as reflected by the groups' names). The tasks faced by each group differed with respect to the magnitude of the variable "Tax" that reduces the payoff from the moderately risky option. This Tax variable simulates the adoption of a policy that tries to reduce accidents (abstracted as a loss of 20 points) by imposing a cost on the most attractive reckless behavior. The results (bottom panel of Figure 1) show that high taxation moved many participants to choose the counterproductive risky option. As a result, accident rates significantly increased.
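Figure 1's exact gain values are not reproduced in the text, so the Python sketch below uses explicitly hypothetical stand-ins for the gains; the 20-point accident loss is taken from the text, and the .03 and .06 accident probabilities follow from the accident-rate formula in the note under Table 1 below. The safe payoff differs across groups (and, as the name "3or0" suggests, may itself be binary); a fixed value is used here for simplicity.

```python
import numpy as np

SAFE_PAY = 1.35       # hypothetical: group-specific safe payoff
MODERATE_GAIN = 2.0   # hypothetical gain of the moderately risky option
HIGH_GAIN = 2.0       # hypothetical gain of the counterproductive option
ACCIDENT_LOSS = 20    # loss of 20 points on an accident (from the text)

def trial_payoffs(tax, rng):
    """One trial's payoffs for (safe, moderate risk, high risk); the tax
    is subtracted from the moderately risky option's payoff."""
    moderate = MODERATE_GAIN - tax - (ACCIDENT_LOSS if rng.random() < .03 else 0)
    high = HIGH_GAIN - (ACCIDENT_LOSS if rng.random() < .06 else 0)
    return np.array([SAFE_PAY, moderate, high])
```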
3 Comparison of the Weighted Average and the Random Sampling Assumptions
As noted above, SW show that YCNE's main results can be captured with a simple sequential adjustment model that does not include an explicit "reliance on small samples" hypothesis. Their model assumes that the subjective value of Option j for Agent i in trial t + 1, after observing the payoff R_{t,j,i} (from Option j in trial t), is:

Q_{t+1,j,i} = (1 − α_i) Q_{t,j,i} + α_i R_{t,j,i}    (1)

The initial subjective value is assumed to equal Q_{1,j,i} = 0, and α_i is a parameter that captures Agent i's learning rate. Thus, the subjective value is a weighted average of the observed payoffs, with recent observations receiving more weight than older ones. In addition, the model assumes a noisy ε-greedy response rule. The model, referred to here as the "noisy-adjuster," chooses an option at random with probability ε_i (Agent i's error rate parameter), and the option with the highest Q_{t,j,i} value otherwise.
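For concreteness, the following minimal Python sketch implements the two components of the noisy-adjuster. The function names are ours, and two details not fixed by the text are assumptions: ties in the greedy choice are broken at random, and the full feedback in YCNE's design (obtained and forgone payoffs) is assumed to update all three subjective values in every trial.

```python
import numpy as np

def noisy_adjuster_choose(Q, epsilon, rng):
    """Epsilon-greedy rule: with probability epsilon choose at random;
    otherwise choose the option with the highest subjective value
    (ties broken at random, an assumption not stated by SW)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(Q)))
    return int(rng.choice(np.flatnonzero(Q == Q.max())))

def noisy_adjuster_update(Q, payoffs, alpha):
    """Equation 1 applied to every option; with full feedback the agent
    can update the subjective values of obtained and forgone options."""
    return (1 - alpha) * Q + alpha * np.asarray(payoffs)
```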
In addition, SW note that YCNE's analysis ignores the existence of large between-individual differences. SW highlighted the significance of these differences by estimating the parameters of their model for each individual. Their analysis, relying on maximum likelihood estimation (MLE) and shown in Figure 2a, suggests a bimodal distribution: about 32% of the decision makers appear to be "myopic" (their estimated α_i is in the range [.85, 1], suggesting an extremely strong positive recency bias), and the rest appear to be "emmetropic" (their estimated α_i is positive and close to 0, suggesting a weak positive recency bias).
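SW's exact optimizer is not reproduced here. The sketch below illustrates one transparent way to obtain such per-individual estimates: a grid-search MLE over (α_i, ε_i), using the choice probabilities implied by the ε-greedy rule. The grid resolution is arbitrary, and the full-feedback updating of all options is our assumption.

```python
import numpy as np

def egreedy_probs(Q, epsilon):
    """Choice probabilities under the epsilon-greedy rule: every option
    receives epsilon/K, and the highest-valued option(s) share the
    remaining 1 - epsilon."""
    p = np.full(len(Q), epsilon / len(Q))
    best = np.flatnonzero(Q == Q.max())
    p[best] += (1 - epsilon) / len(best)
    return p

def fit_noisy_adjuster(choices, payoffs, n_grid=51):
    """Grid-search MLE of (alpha, epsilon) for one participant.
    `choices` holds the chosen option per trial; `payoffs` has shape
    (T, 3) and holds the obtained and forgone payoffs per trial."""
    best_params, best_ll = None, -np.inf
    for alpha in np.linspace(0, 1, n_grid):
        for eps in np.linspace(0.01, 1, n_grid):
            Q, ll = np.zeros(payoffs.shape[1]), 0.0
            for c, r in zip(choices, payoffs):
                ll += np.log(egreedy_probs(Q, eps)[c])
                Q = (1 - alpha) * Q + alpha * r  # Equation 1, after choice
            if ll > best_ll:
                best_params, best_ll = (alpha, eps), ll
    return best_params, best_ll
```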
SW's analysis demonstrates that a simple sequential adjustment model can provide an elegant and insightful explanation of YCNE's results. Yet, their analysis does not imply that their noisy-adjuster model outperforms sampling-based models. To facilitate a clear comparison of the weighted average and the random sampling assumptions, we chose to compare them while keeping the ε-greedy response rule assumed by SW. Specifically, we compare the predictive value of the noisy-adjuster model with a variant that changes only the computation of the subjective values (Equation 1). The new "noisy-sampler" model assumes that the subjective values in trial t > 1 are determined by the average payoff of each option in a sample of κ_i previous trials selected at random with replacement (where κ_i is a parameter that captures the sample size taken by Agent i). Figure 2b presents the estimated parameters (obtained with the MLE procedure used above) for the noisy-sampler model. It shows that the change to the computation of the subjective value did not eliminate the variability in the estimated parameters. Yet, the distribution under the noisy-sampler model is more uniform.
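A minimal sketch of the noisy-sampler's valuation step follows, under two assumptions of ours: `history` stores the full-feedback payoffs of all options in the trials seen so far, and the same κ_i sampled trials are used to evaluate every option (the text does not pin this detail down). Choice then proceeds with the same ε-greedy rule sketched above.

```python
import numpy as np

def noisy_sampler_values(history, kappa, rng):
    """Subjective values: each option's mean payoff across kappa past
    trials drawn uniformly at random, with replacement."""
    h = np.asarray(history)                  # shape (t, 3): full feedback
    idx = rng.integers(len(h), size=kappa)   # kappa trial indices
    return h[idx].mean(axis=0)               # mean payoff per option
```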
To evaluate the predictive value of the two models, we build on the fact that each participant in YCNE's study faced at least two conditions (i.e., Tax levels). Our analysis focuses on predicting each of the 100 choices made by each participant in each condition, based on parameters estimated from the same participant's decisions in the other condition (or conditions) they faced. For example, the predictions for Condition Tax = 0.8 in Group 3or0 were derived with parameters estimated from the participant's 200 decisions in Conditions Tax = 0 and Tax = 0.4.
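Schematically, this cross-condition procedure can be written as follows; `fit` and `loglik` are model-specific callables (e.g., wrappers around the sketches above), and the dictionary keys are the Tax levels a participant faced.

```python
def cross_condition_scores(conditions, fit, loglik):
    """Out-of-sample scoring: for each condition a participant faced,
    estimate parameters on the remaining condition(s) and score the
    held-out condition's 100 choices."""
    scores = {}
    for held_out in conditions:
        train = {k: v for k, v in conditions.items() if k != held_out}
        params = fit(train)                        # MLE on the other data
        scores[held_out] = loglik(params, conditions[held_out])
    return scores
```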
The accuracy of the predictions was evaluated using a log likelihood criterion. The results, summarized in Table 1, reveal a clear advantage of the random sampling assumption. The random sampling assumption fits the data better (higher log-likelihood score) and, more importantly, provides better predictions. The significance of this advantage is reflected by the fact that the noisy-sampler model provided a better prediction of the impact of higher taxation for 157 (64%) of the 246 participants (p < .001 in a sign test against the noisy-adjuster model).
Note. The accident rate is estimated as 0.03(Moderate risk rate) + 0.06(High risk rate). MSD is mean squared deviation.
Figure 3 presents the log likelihood prediction scores of each participant (averaged over the two or three conditions the participant faced) under the two models. Each dot in this figure represents one of the 246 participants. The results show large individual differences, and also show that most dots fall near the 45-degree line.
The lower rows of Table 1 show that both models capture the most interesting pattern documented by YCNE's study: the observation that taxation designed to increase the relative attractiveness of a promoted safe behavior can backfire (i.e., increasing the tax from 0.4 to 0.8 increases the accident rate). These results show that the noisy-sampler model also provides a better prediction on this measure of aggregate choice rates.
4 From Predictions to Understanding
The advantage of the random sampling assumption over the weighted average assumption does not, of course, imply that the noisy-sampler model provides an accurate description of the underlying processes. It suggests only that the random sampling assumption provides a better approximation of the data than the weighted average assumption. To clarify the advantage of the random sampling assumption, we compared the two models on how well they predict the sequential dependencies observed in YCNE's data. We focus on Study 2 of YCNE (Groups 1.35 and 0.6, 161 participants), which showed the clearest individual differences in SW's study. Table 2 and Figure 4 summarize the results of a sequential dependency analysis on trials 2 to 100, for each of the four conditions. Table 2 reveals that both assumptions (and the implied models) under-predict the participants' tendency to repeat their previous choice (i.e., the rate of inertia). The main difference between the two models involves the predicted recency effect (estimated by the difference between an option's choice rates after trials in which it did or did not lead to the best payoff; see the middle rows of Table 2). The median of the 9 recency scores in the data is .16. This value is similar to the median recency score under the noisy-sampler model, and much lower than the median under the noisy-adjuster model (.41).
Note. Inertia rate is the rate at which the choice of an option in trial t−1 is repeated in trial t. Recency score is the difference between the choice rate of each option after trials in which that option led to the best payoff and its choice rate after all other trials. Missing values appear for conditions in which one of the two risky options could not lead to the best payoff. Median recency scores are highlighted in bold.
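Both measures are straightforward to compute from a participant's (or a simulated agent's) choice and payoff sequences. In the sketch below, ties for the best payoff are resolved by argmax (an implementation choice of ours), and conditions in which an option is never, or always, best yield the missing (nan) values noted above.

```python
import numpy as np

def inertia_rate(choices):
    """Rate at which the choice of trial t-1 is repeated in trial t."""
    c = np.asarray(choices)
    return float(np.mean(c[1:] == c[:-1]))

def recency_score(choices, payoffs, option):
    """Choice rate of `option` after trials in which it gave the best
    payoff, minus its choice rate after all other trials."""
    c = np.asarray(choices)
    was_best = payoffs[:-1].argmax(axis=1) == option  # best at trial t-1?
    after_best = np.mean(c[1:][was_best] == option)
    after_rest = np.mean(c[1:][~was_best] == option)
    return float(after_best - after_rest)
```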
The top panel in Figure 4 presents the observed recency score as a function of the observed inertia rate for each of the 161 participants. The lower panels in Figure 4 present the predicted rates for each participant, based on the best fitting parameters over the two tax levels. In agreement with SW’s analysis, the human data plot (top panel) reveals large individual differences. However, the results do not show the bimodal recency pattern predicted by the noisy-adjuster model (middle plot).
The results summarized in Table 2 and Figure 4 highlight two contributors to the advantage of the random sampling assumption. First, this assumption can capture the detrimental effects of high taxation (which imply underweighting of rare events) without over-predicting the magnitude of the recency effect. Second, compared to the weighted average assumption, the random sampling assumption's predictions are less sensitive to the fact that the model ignores the tendency to repeat the last choice (inertia). To illustrate this point, consider the simulated example presented in Table 3. It focuses on the behavior of virtual agents that face 100 repeated choices between "0 for sure" and an attractive risky prospect that provides "+5, .3; −1" (i.e., gain +5 with probability .3, lose 1 otherwise).
Note. The generated (observed) risk-rates row was computed separately for each model. The median estimated parameters rows report parameters estimated from each model's generated risk rates. The reproduced risk rates rows report the risk rates reproduced with each model's estimated parameters.
The top row of Table 3 focuses on virtual agents that choose in accordance with the noisy-adjuster model (with the parameters α_i = .99 and ε_i = .01) in some of the trials, and repeat their last choice in the other trials. The probability of repeating the last choice was 0 in the first two trials and P_rep thereafter (with P_rep = 0, .5, or .9). The bottom row presents virtual agents that behave in accordance with the noisy-sampler model (with κ_i = 1 and ε_i = .01) under the same repetition conditions. The results reveal that ignoring the rate of inertia had a limited effect on the estimated parameters of the noisy-sampler model (bottom row): the estimation of the true parameters is robust to the level of inertia. Conversely, changes in inertia rates have a much larger effect on the estimated parameters of the noisy-adjuster model: the difference between the generating and estimated parameters increases with inertia, leading to a bias in the reproduced risk rates. Increasing the level of inertia raises the estimated error parameter (ε_i) of the noisy-adjuster model from 0.01 to 0.70. This overestimation suggests the noisy-adjuster model cannot reproduce the generated choice rates in the presence of inertia, let alone provide useful predictions based on its estimated parameters.
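The sketch below shows one way to generate such virtual agents. Details beyond those stated in the text, such as tie breaking and the exact placement of the inertia step in the decision sequence, are our assumptions. Feeding choices generated this way back into the grid-search MLE sketched above reproduces the recovery exercise that Table 3 reports.

```python
import numpy as np

def simulate_risk_rate(p_rep, model, T=100, eps=0.01, alpha=0.99,
                       kappa=1, rng=None):
    """One virtual agent choosing between '0 for sure' (option 0) and
    '+5, .3; -1' (option 1), repeating its last choice with probability
    p_rep from trial 3 on; returns the rate of risky choices."""
    rng = rng or np.random.default_rng()
    Q = np.zeros(2)                                 # adjuster's values
    history, choices = [], []
    for t in range(T):
        if t >= 2 and rng.random() < p_rep:
            c = choices[-1]                         # inertia: repeat
        elif rng.random() < eps:
            c = int(rng.integers(2))                # error: random choice
        else:
            if model == "sampler" and history:      # sample kappa trials
                idx = rng.integers(len(history), size=kappa)
                v = np.asarray(history)[idx].mean(axis=0)
            else:
                v = Q
            c = int(rng.choice(np.flatnonzero(v == v.max())))
        payoffs = np.array([0.0, 5.0 if rng.random() < 0.3 else -1.0])
        history.append(payoffs)                     # store full feedback
        if model == "adjuster":
            Q = (1 - alpha) * Q + alpha * payoffs   # Equation 1 update
        choices.append(c)
    return float(np.mean(choices))
```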
5 Summary
Previous studies of decisions from experience highlight an apparent inconsistency between the results of choice prediction competitions that focus on aggregate choice rates, and the results of studies that focus on individual decisions. While the competitions favor models assuming reliance on small samples of randomly selected past experiences, many analyses of individual decisions favor models that assume sequential adjustment of choice propensities. The difference between these two classes of models is important, as they imply very different cognitive processes. While the reliance-on-small-samples models assume the storage and use of many past experiences, the sequential adjustment models assume efficient processes that require storing only one value (a weighted payoff average) per option.
The present research clarifies this debate by highlighting the importance of the distinction between elegant explanations and prediction-based model comparisons. Specifically, we propose that sequential adjustment models provide more elegant explanations of specific experimental results, but random sampling models tend to perform better in prediction tasks. For example, the weighted average assumption used by SW implies a simple and cognitively efficient process that fits the data we analyzed, but prediction-based model comparison highlights a clear (both qualitative and quantitative) advantage of the random sampling assumption. Our analysis shows that this advantage of the random sampling assumption is not limited to predictions of aggregate choice rates. We find that the random sampling assumption outperforms the sequential adjustment (weighted average) assumption even when the analysis focuses on individual decisions and sequential dependencies.
To clearly understand the implications of the current results, recall that there are important boundaries to the descriptive value of the noisy-sampler model supported here. The clearest boundary, in the context of pure decisions from experience, involves environments with easy-to-detect dynamic structures, as illustrated by the thought experiment described in Table 4 (following Plonsky et al., 2015). While the noisy-sampler model (with the parameters estimated above based on YCNE's data) predicts a Top-rate (at trial 100) of only 29%, it is natural to assume that most human subjects will quickly learn to select Top after a sequence of four losses (see a related observation in Cohen & Teodorescu, 2021). We believe that the observation that the noisy-sampler model provides a useful description of YCNE's results, but fails to describe the likely behavior in Table 4's thought experiment, can shed light on the underlying processes. Under one explanation of this pattern, people always try to rely on a small sample of their most similar past experiences. When it is easy to discover the most similar past experiences (in terms of the expected payoff), as in Table 4's thought experiment, choice behavior is likely to deviate from the predictions of the noisy-sampler model. Yet, when it is difficult, or impossible, to detect the most similar past experiences (as in the current static setting), the effort to rely on them leads to behavior that can be approximated with the current noisy-sampler model.
The current similarity-based explanation suggests that the reliance on small random samples assumption can be used to shed light on natural environments in which the payoff distributions are relatively stable. While this set of situations has clear boundaries, it contains many important members. Examples include settings in which safety devices increase accidents (Cohen & Erev, 2018), taxation backfires (YCNE), people over- and under-commit to a course of action (Cohen & Erev, 2021), experience reduces the tendency to trust well-calibrated experts (Erev et al., 2022), and it is necessary to enforce rules (Plonsky et al., 2021). Yet, more insight into how people respond to different dimensions of similarity, and how these similarities interact, is necessary to predict behavior when dynamic regularities are easily detectable.