1 Introduction
Throughout life, people continually make decisions about what to do or have immediately, and what to put off until later. The behavior of a person choosing an immediate benefit at the cost of foregoing a larger delayed benefit (e.g., by purchasing a new car rather than saving towards one’s pension) is an example of temporal discounting (Samuelson, 1937). Similarly, choosing to avoid an immediate loss in favor of a larger, later loss (e.g., by postponing a credit-card payment) is another example of discounting. Laboratory measures of discounting predict many important real-world behaviors that involve tradeoffs between immediate and delayed consequences, including credit-card debt, smoking, exercise, and marital infidelity (Chabris, Laibson, Morris, Schuldt, & Taubinsky, 2008; Daly, Harmon, & Delaney, 2009; Meier & Sprenger, 2010; Reimers, Maylor, Stewart, & Chater, 2009). At the same time, numerous studies have established that time preferences are also determined by a number of contextual factors (for an overview, see Frederick, 2003).
Despite the growing popularity of research on temporal discounting (Figure 1), there is relatively little consensus or empirical research on which methods are best for measuring discounting. Most of the theoretical and empirical efforts have been directed at testing rival exponential versus hyperbolic discounting models. The continuously compounded exponential discount rate (Samuelson, 1937) is calculated as V = Ae^(−kD), where V is the present value, A is the future amount, e is the base of the natural logarithm, D is the delay in years, and k is the discount rate. This is a normative model of discounting, which specifies how rational decision makers ought to evaluate future events, but it has often been employed as a descriptive model as well. The hyperbolic model (Mazur, 1987) is a descriptive model, calculated as V = A / (1 + kD), where V is the present value, A is the future amount, D is the delay (Footnote 1), and k is the discount rate. Although the hyperbolic and exponential models are often highly correlated (Footnote 2), the hyperbolic model often fits the data somewhat better (e.g., Kirby, 1997; Kirby & Marakovic, 1995; Myerson & Green, 1995; Rachlin, Raineri, & Cross, 1991). Because most recent psychological studies of discounting have employed the hyperbolic model, our further analyses in this paper will focus on this model.
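As a concrete illustration, a minimal Python sketch of the two models follows (the specific amount, delay, and discount rate are arbitrary and not drawn from our data):

```python
import math

def present_value_exponential(future_amount, delay_years, k):
    """Continuously compounded exponential model: V = A * exp(-k * D)."""
    return future_amount * math.exp(-k * delay_years)

def present_value_hyperbolic(future_amount, delay_years, k):
    """Hyperbolic model: V = A / (1 + k * D)."""
    return future_amount / (1 + k * delay_years)

# With k = 0.5 per year, $400 due in 10 years is worth about $2.70 today under
# the exponential model but about $66.67 under the hyperbolic model; the two
# models diverge most strongly at long delays.
print(present_value_exponential(400, 10, 0.5))  # ~2.70
print(present_value_hyperbolic(400, 10, 0.5))   # ~66.67
```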
In addition to these two popular discounting metrics, others have been proposed and tested (for a recent review, see Doyle, 2013). In contrast, investigations of different experimental procedures for eliciting discount rates are rare. A comprehensive review paper on discounting (Frederick, Loewenstein, & O’Donoghue, 2002) noted the huge variability in discount rates among studies, and hypothesized that heterogeneity in elicitation methods might be a major cause. Fifty-two percent of studies reviewed used choice-based measures, 31% used matching, and 17% used another method.
1.1 Measuring discount rates: Choice versus matching
Choice-based methods often present participants with a series of binary comparisons and use these to infer an indifference point, which is then converted into a discount rate. For example, suppose a participant, presented with a choice between receiving $10 immediately or $11 in one year, chooses the immediate option, and subsequently, presented with a choice between $10 or $12 in one year, chooses the future option. This pattern of choices implies that the participant would be indifferent between $10 today and some amount between $11 and $12 in one year. For analytic convenience, we assign their indifference point as the average of the upper and lower bound, which would be $11.50 in this case. This indifference point can then be converted into a discount rate using one of the discounting models discussed above. For example, using the continuously compounded exponential model, this would yield a discount rate of 14%. The matching method, in contrast, asks for the exact indifference point directly. For example, it might ask the participant what amount “X” would make her indifferent between $10 immediately and $X in one year.
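For reference, the 14% figure follows from inverting the exponential formula at this inferred indifference point:

\[ V = Ae^{-kD} \;\Rightarrow\; k = \frac{\ln(A/V)}{D} = \frac{\ln(\$11.50/\$10.00)}{1\ \text{year}} \approx 0.14 = 14\%. \]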
How do discount rates from these two elicitation methods compare? Several studies have concluded that matching yields lower discount rates than choice (Ahlbrecht & Weber, 1997; Manzini, Mariotti, & Mittone, 2008; Read & Roelofsma, 2003). What are the reasons or mechanisms for this difference? One hypothesis is that, in choice, people are motivated to take the earlier reward and pay relatively more attention to the delay (rather than the greater magnitude) of the later reward, whereas the matching methods, which typically ask for a match on the dollar dimension, focus them on the magnitude of the two rewards and thus produce a better attentional balance between the magnitude and delay attributes (Tversky, Sattath, & Slovic, 1988). This attentional hypothesis predicts order effects (specifically, that the first method will bias attention throughout the task), but unfortunately these studies did not investigate task order (Footnote 3), so it is difficult to know whether participants’ experience with one method influenced their answers on the other method. Frederick (2003) compared seven different elicitation methods (choice, matching, rating, “total”, sequence, “equity”, and “context”) for saving lives now or in the future. He also found that matching produced lower discount rates than choice, but again, order effects were not explored. He speculated that the choice task creates demand characteristics: offering the choice between different amounts of immediate and future lives implies that one ought to discount them to some extent—“otherwise, why would the experimenter be asking the question” (Frederick, 2003, p. 42). In contrast, the matching method makes no suggestions as to which amounts are appropriate.
Further evidence that characteristics of offered choice options can bias discount rates comes from a pair of studies that compared two variations on a choice-based measure. One version presented repeated choices that kept the larger-later reward constant, with the amounts of the smaller-sooner reward presented in ascending order, while the other version employed the same choice pairs, but with the smaller-sooner amounts presented in descending order. The order of presentation influenced discount rates, such that participants were more patient (i.e., exhibited lower discount rates) when answering the questions in descending order of sooner reward (Robles & Vargas, 2008; Robles, Vargas, & Bejarano, 2009). This suggests that observed discount rates are at least partly a function of constructed preference (Stewart, Chater, & Brown, 2006) and “coherent arbitrariness” (Ariely, Loewenstein, & Prelec, 2003), rather than a stable individual preference. One explanation offered by Robles and colleagues is the magnitude effect (i.e., the finding that people are relatively more patient for large magnitude gains than small magnitude gains; Thaler, 1981): in the descending condition, participants are first exposed to the largest immediate outcomes, which may predispose them to choose the future option more readily. In the ascending condition, participants see the smallest magnitude outcomes first, which may predispose them to impatience. In other words, the theory is that participants construct their time preference during the first question or two, and then carry this preference forward into the rest of the task, consistent with theories of order effects in constructed choice such as Query Theory (Weber & Johnson, 2009; Weber et al., 2007).
The question of how choice versus matching influences people’s answers has also been explored in the context of utility measurement and public policy (Baron, 1997), with conflicting results. Some studies suggest that choice methods are more sensitive to quantities (Fischhoff, Quadrel, Kamlet, Loewenstein, Dawes, Fischbeck, Klepper, Leland, & Stroh, 1993) and valued attributes (Tversky et al., 1988; Zakay, 1990), whereas other studies suggest matching is equally or more sensitive to quantities (Baron & Greene, 1996; McFadden, 1994). Throughout, a key component seems to be joint versus separate evaluation (Baron & Greene, 1996; Hsee, Loewenstein, Blount, & Bazerman, 1999): people give more weight to difficult-to-evaluate attributes in joint evaluation. For example, suppose people find it easier to evaluate $10 than to evaluate the lives of 2,000 birds. In single evaluation (“Would you pay $10 to save 2,000 birds?”) people will put more weight on the financial cost, whereas in joint evaluation (“Would you choose Program A, that costs $10 and saves 2,000 birds, or Program B, that costs $20 and saves 4,000 birds?”) people will put more weight on the lives of the birds. Single versus joint evaluation is often confounded with matching versus choice, which may explain the conflicting conclusions in the literature.
While these studies show that elicitation method affects responses, they rarely specify which measure researchers ought to use to measure discount rates (or which methods to use when). One perspective on this normative question would argue that because preferences are constructed, the results from different measures are equally valid expressions of people’s preferences, and it is therefore impossible to recommend a best measure. However, when researchers are interested in predicting and explaining behaviors in real-world contexts, such behavior provides a metric by which to make a judgment. While several studies have shown such real-world behavior links for choice-based techniques, we are not aware of any published studies examining how well matching-based discount rates predict consequential decisions and whether they do so better or worse than discount rates inferred from choice-based methods (Footnote 4). Another criterion for selecting the “best” measure is to compare the psychometric properties of each. The ideal measure would be reliable, with low variance, low demand characteristics, a good ability to detect inattentive (or dishonest) participants, a quick completion time, and a straightforward analysis.
Another important question is how well these different elicitation techniques perform across a broad range of time delays and outcome dimensions. Most discounting studies have focused only on financial gains with delays in the range of a few weeks to a few years, but many consequential real-world intertemporal choices, such as retirement savings, smoking, or environmental decisions, involve future losses and much longer time delays. Studying the discounting of complex outcome sets on long timescales can be logistically difficult in the lab, if the goal is to make choices consequential: tracking down past participants in order to send them their “future” payouts is hard enough one year after a study, but doing so in 50 years may well be impossible. Truly consequential designs are even trickier when studying losses, since they require researchers to demand long-since-endowed money from participants who may not even remember having participated in the study. Fortunately, hypothetical delay-discounting questions presented in a laboratory setting do appear to correlate with real-world measures of impulsivity such as smoking, overeating, and debt repayment (Chabris et al., 2008; Meier & Sprenger, 2012; Reimers et al., 2009), suggesting that even hypothetical outcomes are worth studying.
1.2 Study 1
In Study 1, we compared matching with choice-based methods of eliciting discount rates for hypothetical financial outcomes (Footnote 5), in a mixed design. Half the participants completed matching first followed by choice, while the other half did the two tasks in the opposite order. This allowed us to analyze the data both within and between subjects. Within each measurement technique, delays of the larger-later option varied from 1 year to 50 years. Outcome sign was manipulated between subjects, such that half the participants considered current versus future gains, while the other half considered current versus future losses.
Within the choice-based condition, we compared two different techniques: fixed-sequence titration and a dynamic multiple-staircase method. The fixed-sequence titration method presented participants with a pre-set list of choices between a smaller, sooner amount and a larger, later amount, with all choices appearing on one page. The multiple-staircase method was developed in psychophysics (Cornsweet, 1962) and attempts to improve on fixed-sequence titration in several ways. First, choice pairs are selected dynamically, which should reduce the number of questions participants need to answer (relative to a fixed sequence), yield more precise estimates, or both. Second, the staircase method approaches indifference points from above and below, thus reducing anchoring, and the interleaving of multiple staircases (eliciting several different indifference points at once, e.g., for different time delays) should attenuate false consistency, which should reduce demand characteristics and coherent arbitrariness. Third, we built consistency-check questions into the multiple-staircase method, which should enable it to better detect inattention or confusion.
At the end of the survey, we presented participants with a consequential choice between $100 today or $200 next year, and randomly paid out two participants for real money. We also asked participants whether they smoked or not, to get data on a consequential life choice.
We compared the three elicitation methods on four different criteria: ability to detect inattentive participants, differences in central tendency and variability across respondents, model fit, and ability to predict consequential intertemporal choices. We predicted that the multiple-staircase method would be best at detecting inattentive participants, because we designed it partly with this purpose in mind. We predicted that the choice-based methods would show higher discount rates than the matching method, due to demand characteristics (as discussed above, previous research suggests that the choice options presented to participants implicitly suggest discounting). We also predicted that the choice-based methods would be easier for participants to understand and use, based on anecdotal evidence from our own previous research indicating that participants often have a hard time understanding the concept of indifference, and have a hard time picking a number “out of the air”, without any reference or anchor. Finally, we predicted that the choice-based methods would be better at predicting the consequential choices, because there is a natural congruence in using choice to predict choice, and because previous studies have shown the efficacy of choice-based methods as a consequential-choice predictor, but none have done so for matching.
1.3 Methods
Five hundred sixteen participants (68% female, mean age=38, SD=13) were recruited from the virtual lab participant pool of the Center for Decision Sciences (Columbia University) for a study on decision making and randomly assigned to an experimental condition. Participants in the gain condition were given the following hypothetical scenario:
Imagine the city you live in has a budget surplus that it is planning to pay out as rebates of $300 for each citizen. The city is also considering investing the surplus in endowment funds that will mature at different possible times in the future. The funds would allow the city to offer rebates of a different amount, to be paid at different possible times in the future. For the purposes of answering these questions, please assume that you will not move away from your current city, even if that is unlikely to be true in reality.
The full text of all the scenarios can be found in the Supplement [A]. After reading the scenario, participants indicated their intertemporal preferences in one of three different ways. In the matching condition, participants filled in a blank with an amount that would make them indifferent between $300 immediately and another amount in the future (see the Supplement [B] for examples of the questions using each measurement method). Participants answered questions about three different delays: one year, ten years, and 50 years. Although some participants might expect to be dead in 50 years, the scenario described future gains that would benefit everyone in their city, so it was hoped that those future gains would still have meaning to participants. In the titration condition, participants made a series of choices between immediate and future amounts, at each delay. Because the same set of choice options was presented for each delay (see Supplement [B] for the list of options), the choice set offered a wide range of values, to simultaneously ensure that time delay and choice options were not confounded and allow for high discount rates at long delays. The order of the future amounts was balanced between participants, such that half answered lists with the amounts of the larger-later option going from low to high (as in the Supplement [B]), and others were presented with amounts going from high to low.
In the multiple-staircase condition, participants also made a series of choices between immediate and future amounts. Unlike the simple titration method, these amounts were selected dynamically, funneling in on the participant’s indifference point. Choices were presented one at a time (unlike titration, which presented all choices on one screen). Also unlike titration, the questions from the three delays were interleaved in a random sequence. The complete multiple-staircase method is described in detail in the Supplement [C].
In all conditions progress in the different tasks of the questionnaire was indicated with a progress bar, and participants could refer back to the scenario as they answered the questions. After completing the intertemporal choice task, participants were asked “What things did you think about as you answered the previous questions? Please give a brief summary of your thoughts.” This allowed us to collect some qualitative data on the processes participants recalled using while responding to the questions.
Next, participants answered the same intertemporal choice scenario using a different measurement method. Those who initially were given a choice-based measure (titration or multiple-staircase) subsequently completed a matching measure, while those who began with matching then completed a choice-based measure. In other words, all participants completed a matching measure, either before or after completing one of the two choice-based measures. Subsequently, participants were given an attention check, very similar to the Instructional Manipulation Check (Oppenheimer, Meyvis, & Davidenko, 2009), that ascertained whether participants were reading instructions.
After that, participants read an environmental discounting scenario (order of financial vs. environmental scenario was not counterbalanced), the full text of which can be found in the Supplement [A]. We do not discuss the results from this scenario because of possible conceptual confusion in the questions themselves, and other problems.
Next, participants provided demographic information, including a question about whether they smoked. Finally, participants completed a consequential measure of intertemporal choice, in which they chose between receiving $100 immediately or $200 in one year (note that participants in the loss condition still chose between two gains in this case, because it would have been difficult to execute losses for real money). Participants were informed that two people would be randomly selected and have their choices paid out for real money, and this indeed happened.
1.4 Results and discussion
1.4.1 Detecting inattentive participants
In most psychology research, and especially in online research, a portion of respondents does not pay much attention or does not respond carefully. It is helpful, therefore, if measurement methods can detect these participants. The multiple-staircase method had two built-in check questions (described in the Supplement [C]) to detect such participants. The titration method can also detect inattention in some cases, by looking for instances of switching back and forth, or switching perversely. For example, if a participant preferred $475 in one year over $300 today, but preferred $300 today over $900 in one year, this inconsistency would be a sign of inattention. Titration cannot, however, differentiate between “good” participants and those who learn what a “good” pattern of choices looks like and reproduce such a pattern for later questions, without carefully considering each subsequent question individually. It is nearly impossible for a single matching measure to detect inattention, but with multiple measures presented at different time points, matching may identify those participants who show a non-monotonic effect of time. For example, if the one-year indifference point (with respect to $300 immediately) is $5,000, the ten-year indifference point is $600, and the 50-year indifference point is $50,000, this inconsistency might be evidence of inattention.
As described above, each participant also completed another attention check, very similar to the Instructional Manipulation Check (IMC; Oppenheimer et al., 2009). As this measure has been empirically shown to be effective for detecting inattentive participants, we compared the ability of each measurement method to predict IMC status.
Correlations between the IMC and each measure’s test of attention revealed that while neither matching, r (253) = .06, p > .1, nor titration, r (124) = .04, p > .1 were able to detect inattentive participants, the multiple-staircase method had modest success, r (132) = .21, p < .05. Overall, then, no method was particularly effective at detecting inattentive participants, but the multiple-staircase method was apparently superior to the other two. It was not surprising that multiple-staircase performed better in this regard, given that it was designed partly with this purpose in mind, but it was surprising that titration did not outperform matching in this regard, given that titration gave participants ten times as many questions, and as such provided more opportunities to detect inattention.
For all of the following analyses, we compared only those participants who were paying attention and reading instructions, because the indifference points of participants who did not read the scenario are of questionable validity. Also, it is fairly common in research on discounting (and online research in particular) to screen out inattentive participants (see Ahlbrecht & Weber, 1997; Benzion, Rapoport, & Yagil, 1989; Shelley, 1993). Therefore, we excluded those participants who failed the IMC, leaving 316 participants (61% of the original sample) for further analysis. The rate of inattentive participants did not vary as a function of measurement condition, χ2 (2, N=516)=0.8, p=.67. (For analyses with all participants included, see the Supplement [D]. Overall, the results are quite similar, but variance and outliers are increased.)
1.4.2 Differences in central tendency and spread
Choice indifference points were determined for each scenario, participant, and time delay as follows: in matching, the number given by participants was used directly. For titration, the average of the values around the switch point was used. For example, if a participant preferred $300 immediately over $475 in ten years, but preferred $900 in ten years over $300 immediately, the participant was judged to be indifferent between $300 immediately and $687.50 in ten years. For multiple-staircase, the average of the established upper bound and lower bound was used, in a similar manner to titration. These choice indifference points were then converted to discount rates, using the hyperbolic model described in the introduction.
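Using the titration example above, the conversion of an indifference point to a hyperbolic discount rate is:

\[ V = \frac{A}{1+kD} \;\Rightarrow\; k = \frac{A/V - 1}{D} = \frac{687.50/300 - 1}{10\ \text{years}} \approx 0.13\ \text{per year}. \]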
Because order effects were observed (which we describe below), the majority of the analyses to follow will focus on the first measurement method that participants completed. This leaves n=154 in the matching condition, n=82 in the titration condition, and n=80 in the multiple-staircase condition (Footnote 6). Discount rates for financial outcomes in each condition are summarized in Table 1. Because skew and outliers were sometimes pronounced, this table lists median and interquartile range in addition to mean and standard deviation.
As shown in Table 1 and consistent with prior results (Ahlbrecht & Weber, 1997; Read & Roelofsma, 2003), financial discount rates measured with the choice-based methods (titration and multiple-staircase) were generally higher than discount rates measured with matching, and this was particularly true for gains. A Kruskal-Wallis non-parametric ANOVA of the financial gain data, χ2 (2, n = 148) = 26.8, p < .001, and of the loss data, χ2 (2, n = 168) = 6.8, p = .03, confirmed a significant effect of elicitation method on discount rates.
We hypothesized that these differences in discount rates were partly a function of anchoring or demand characteristics. In other words, the extreme options sometimes presented to participants (such as a choice between $300 today and $85,000 in one year) may have suggested that these were reasonable choices, and so encouraged higher discount rates. Consistent with this explanation, an earlier study from our lab (Hardisty & Weber, 2009, Study 1)—with a different sample recruited from the same participant population and using titration for financial outcomes—presented participants with a much smaller range of options ($250 today vs. $230 to $410 in one year) and yielded much lower median discount rates: 0.28 for gains, and 0.04 for losses, compared with medians of 1.59 for gains and 0.14 for losses in the present study. It seems, then, that the range of options presented to participants affected their discount rates by suggesting reasonable options as well as by restricting what participants could or could not actually express (for example, an extremely impatient person might prefer $250 today over $1,000 next year, but if the maximum choice pair is $250 today vs. $410 in one year, the experimenter would never know). We also tested for the influence of the options presented to participants by comparing the two orderings, descending and ascending. As summarized in Figure 2, this ordering manipulation did indeed affect responses. Consistent with previous studies on order effects in titration (Robles & Vargas, 2008; Robles et al., 2009), participants exhibited lower discount rates in the descending order condition, possibly as a result of the magnitude effect. Losses show the opposite order effect (by “ascending order” for losses, we mean ascending in absolute value), which is consistent with the reverse magnitude effect recently found with losses (Hardisty, Appelt, & Weber, 2012): people considering larger losses are more likely to want to postpone losses.
A Mann-Whitney U test comparing the descending and ascending orderings for losses was significant, W = 363, n = 43, p < .01, showing greater discounting in the descending order. A similar test comparing the two orderings for gains was not significant, W = 141, n = 38, p = .28 (the sample size was somewhat small here, with only 17 participants in the ascending condition and 21 in the descending), but was in the predicted direction, with higher discount rates in the ascending order condition.
While discount rates were generally much higher when using the choice-based methods, we believe that this difference was due to the large range of options that we presented to participants, and it would be possible to obtain the opposite pattern of results if a smaller range were used. For example, if the maximally different choice pair were $300 today vs. $400 in the future (rather than $85,000, as we used), this presentation would result in lower discount rates, both through demand characteristics (suggesting that lower discount rates are reasonable) and by restricting the range of possible answers. Further evidence for this explanation comes from a within-subject analysis comparing the different methods: although all participants completed a matching measure, some did so before a choice method, and some did so after a choice method. Comparing these participants reveals a significant effect of order, as seen in Figure 3, such that discount rates assessed by matching were larger when they followed a choice-based elicitation method.
A Mann-Whitney U test confirmed that participants gave different answers to the matching questions depending on whether they completed a choice based measure first or second, both for gains, W = 4373, n = 147, p < .001, and losses, W = 4211, n = 166, p = .02. Furthermore, participants’ answers to the matching and choice-based questions were strongly correlated, Spearman’s r(269) = .52, p < .001.
Just as the mean and median discount rates yielded by the choice-based methods were higher than those from the matching method, so too was the spread of the distributions from the choice-based methods larger. The interquartile range (IQR) from the matching method for gains was 1.1, compared with 2.0 from multiple-staircase and 3.0 from titration. Similarly, the IQR for matching losses was only 0.41, compared with 0.54 from multiple staircase and 0.52 from titration. It is likely that the same factors that led to the higher medians in the choice-based methods also produced the greater IQR.
Overall, then, we have three pieces of evidence suggesting that the options presented to participants in the choice-based methods affected discount rates: (1) discount rates were higher when using the choice-based methods, (2) ordering of choice pairs (ascending or descending) in the titration condition affected discount rates, and (3) discount rates first elicited with choice methods went on to influence later responses elicited with matching. While matching has the advantage of not providing any anchors or suggestions to participants, it is nonetheless still quite susceptible to influence from other sources. This is not a particularly novel finding, as theories and findings of constructed preference (Johnson, Haubl, & Keinan, 2007; Stewart et al., 2006; Weber et al., 2007) and coherent arbitrariness (Ariely et al., 2003) are plentiful. However, it has not received as much attention in intertemporal choice as in other areas (particularly risk preference). Many differences in discount rates between studies may be explained by differences in the amount and order of options that experimenters presented to participants.
1.4.3 Differences in model fit
Considering that the hyperbolic model is currently the dominant descriptive model of discounting, a measurement method may be more desirable if it conforms to this model more closely. Therefore, we fit k values (using least squares) separately for each participant, and calculated the average r² of each method, as summarized in Table 2. Matching produced the best fit for both gains and losses, suggesting that the results from matching are the most consistent with the hyperbolic model. One possible reason for the superior fit of matching is that it provides exact indifference points, whereas the choice methods merely provide boundaries on indifference points.
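A minimal sketch of one such per-participant fit is given below (Python; the closed-form loss function and the example indifference points are illustrative assumptions, not our actual analysis code):

```python
import numpy as np

def fit_hyperbolic_k(delays_years, matched_amounts, immediate=300.0):
    """Least-squares fit of a single hyperbolic k for one participant.

    With the present value fixed at `immediate`, V = A / (1 + k*D) predicts an
    indifference amount of A = V * (1 + k*D) at each delay; we fit k to the
    observed amounts and report r^2 for that fit.
    """
    D = np.asarray(delays_years, dtype=float)
    A = np.asarray(matched_amounts, dtype=float)
    # Closed-form least-squares solution for k in A = V + V*k*D
    k = np.sum(D * (A - immediate)) / (immediate * np.sum(D ** 2))
    predicted = immediate * (1 + k * D)
    ss_res = np.sum((A - predicted) ** 2)
    ss_tot = np.sum((A - A.mean()) ** 2)
    return k, 1 - ss_res / ss_tot

# Hypothetical participant: indifference points at 1, 10, and 50 years
k, r2 = fit_hyperbolic_k([1, 10, 50], [450, 1200, 5500])
```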
1.4.4 Predicting consequential choices
Researchers are often interested in discount rates because they would like to better understand the real, consequential choices that people make. We therefore compared the ability of discount rates elicited in different ways for the hypothetical scenarios to predict two consequential choices. First, we used the 1-year hyperbolic discount rate (Footnote 7) to predict whether participants chose to receive $100 today or $200 in one year in the final consequential choice they made as part of our study. (Overall, 25% of participants chose the immediate $100.) As seen in Table 3, the correlations were always positive, meaning that participants with higher assessed discount rates were more likely to choose the immediate $100. The correlations from the choice-based measures were generally higher, but the only significant differences between the correlations were that titration outperformed both matching (p<.001) and multiple-staircase (p=.04). The generally stronger performance of the choice-based methods is consistent with the method compatibility principle (Weber & Johnson, 2008): predicting a choice will be more accurate with a choice-based measure than with a fill-in-the-blank measure.
Second, we looked at the ability of these 1-year discount rates to predict a real-life choice: whether each participant was a smoker or not. The prediction is that people who discount future financial outcomes may also discount their future health, and so be more likely to smoke (Bickel, Odum, & Madden, 1999). As seen in Table 3, the choice-based methods were sometimes able to predict this (with higher discount rates correlating with smoking), while the matching method was not. The multiple-staircase method was better at predicting smoking status from discount rates for gains (p=.02 and p=.08), while titration was directionally better at predicting from discount rates for losses (p=.14 and p=.11). We don’t have a good explanation for this difference, other than random fluctuations. However, the overall trend was that choice methods yielded greater predictive power than matching. This may stem from the fact that participants found it easier to understand and respond to the choice-based measures of discounting and thus had less error in responding.
2 Study 2
These results suggest that researchers interested in predicting consequential intertemporal choices should employ choice-based methods. However, the results are a bit thin and inconsistent. Therefore, in Study 2, we included measures of eighteen real-world behaviors (in addition to the $100 vs. $200 consequential choice) that have previously been found to correlate with discount rates (Chabris et al., 2008; Reimers et al., 2009), allowing us to compare the measures more rigorously in this regard. A second shortcoming of Study 1 is that a large proportion of participants gave inattentive or irrational answers and had to be excluded from the data set. We addressed this in Study 2 by recruiting a more conscientious group of participants and building checks into each measure that prevent participants from giving non-monotonic or perverse answers. A third weakness of Study 1 is that our complex multiple-staircase method did not perform well, possibly because participants found it difficult to use. Therefore, in Study 2, we tested a simpler dynamic choice method.
2.1 Methods
316 U.S. residents with at least a 97% prior approval rate were recruited from Amazon Mechanical Turk for a study on decision making and paid a flat rate of $1. Participants (59% female, mean age=34, SD=11.9) were randomly assigned to one of three conditions: matching, titration, or single-staircase (described below). Participants answered questions about immediate versus future gains and losses (in counterbalanced order) at delays of 6 months, 1 year, and 10 years (Footnote 8). In the matching condition, participants saw this instruction:
Imagine you could choose between receiving [paying] $300 immediately, or another amount 6 months [1 year | 10 years] from now. How much would the future amount need to be to make it as attractive [unattractive] as receiving [paying] $300 immediately?
Please fill in the dollar amount that would make the following options equally attractive [unattractive]:
A. Receive [Lose] $300 immediately.
B. Receive [Lose] $____
After participants entered a value for each delay, an automated script checked whether the amounts increased or decreased in a monotonic fashion. If not, the participant was given the instruction, “Your answers were inconsistent over time. Please try this scale again, and be more careful as you answer”, and was forced to go back and enter new values. Note that participants were allowed to show negative discounting (for example, indicating that losing $300 today would be equivalent to losing $290 in six months, $280 in one year, and $120 in ten years). Such a pattern of responding has previously been observed in studies of discounting (Hardisty et al., 2012; Hardisty & Weber, 2009) and can be considered a rational way to avoid dread (Harris, 2010).
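A minimal sketch of such a monotonicity check follows (Python; the function and interface are illustrative, not the script we actually used):

```python
def is_monotonic(amounts):
    """True if the matched amounts move in one direction across delays
    (never decreasing or never increasing).  `amounts` are the fill-in
    responses ordered by delay, e.g., [6-month, 1-year, 10-year]."""
    pairs = list(zip(amounts, amounts[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)

# The negative-discounting loss example above ($290, $280, $120) passes the check.
assert is_monotonic([290, 280, 120])
assert not is_monotonic([5000, 600, 50000])  # non-monotonic, flagged
```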
In the titration condition, participants saw this instruction: “Imagine you could choose between receiving [paying] $300 immediately, or another amount 6 months [1 year | 10 years] from now. Please indicate which option you would choose in each case:” Participants then made a series of 10 choices at each delay (the complete list can be found in the Supplement [E]), such as “Receive $300 immediately OR Receive $350 in 6 months”. The future amounts ranged from $250 to $10,000. As in Study 1, all questions for a given sign were presented on one page. In other words, one page had 30 questions about immediate versus future gains, and another page (in counterbalanced order) had 30 questions about immediate versus future losses. Participants’ answers were automatically checked for nonmonotonicity or perverse switching (such as choosing to receive $350 in one year over $300 immediately, and then choosing $300 immediately over $400 in one year), and participants were forced to go back and redo their answers if they violated either of these principles.
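The corresponding perverse-switching check for the titration gains list can be sketched similarly (again an illustrative reading, not our actual code):

```python
def is_consistent(chose_future):
    """`chose_future` lists the choices for the fixed future gain amounts in
    ascending order (True = chose the future option).  A consistent pattern
    switches from the immediate to the future option at most once and never
    switches back."""
    switches = sum(a != b for a, b in zip(chose_future, chose_future[1:]))
    if switches == 0:
        return True
    return switches == 1 and not chose_future[0] and chose_future[-1]

# Perverse switching from the example above: took $350 in one year over $300 now,
# then took $300 now over $400 in one year.
assert not is_consistent([True, False])
```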
In the single-staircase condition, participants answered one question per page, and each choice option was dynamically generated, using bisection. The lower and upper ends of each staircase were set to $200 and $15,000 (the minimum and maximum possible implied indifference points using the titration scale). The first question cut the possible range in half, and thus was always “Receive $300 immediately OR Receive $7,600 in 6 months.” The next question then cut the range in half again, in the direction indicated by the prior choice. For example, if the participant chose the future option, the second question would be “Receive $300 immediately OR Receive $3,900 in 6 months.” Ten questions were asked in this way at each time delay.
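A minimal sketch of this bisection procedure follows (Python; the callback stands in for a live participant, and the function name is ours):

```python
def run_staircase(choose_immediate, lower=200.0, upper=15_000.0,
                  immediate=300.0, n_questions=10):
    """Dynamic single-staircase via bisection, following the bounds and
    question count described above.  `choose_immediate(now, later)` returns
    True if the participant takes the immediate amount."""
    for _ in range(n_questions):
        future = (lower + upper) / 2       # first offer: (200 + 15000) / 2 = 7600
        if choose_immediate(immediate, future):
            lower = future                 # indifference point lies above this offer
        else:
            upper = future                 # willing to wait, so it lies below
    return (lower + upper) / 2             # estimated indifference point

# Simulated participant whose true 6-month indifference point is $1,000:
estimate = run_staircase(lambda now, later: later < 1000)
```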
Subsequently, all participants answered a number of demographic questions (mostly drawn from Chabris et al., 2008; Reimers et al., 2009): gender, age, education, relevant college courses, ethnicity, political affiliation, height, weight, exercise, dieting, healthy eating, dental checkups, flossing, prescription following, tobacco use, alcohol consumption, cannabis use, other illegal drug use, age of first sexual intercourse, recent relationship infidelity, annual household income, number of credit cards, credit-card late fees, carrying a credit card balance, savings, gambling, wealth relative to friends, wealth relative to family, and available financial resources. Finally, participants made a consequential choice between a $100 Amazon gift certificate today or a $200 gift certificate in one year, and one participant was randomly selected and paid out for real money. The full text of all Study 2 materials can be found in the Supplement [E] (Footnote 9).
2.2 Results and Discussion
Data were excluded from 5 participants with duplicate IP addresses, 13 participants who did not finish the study, and 22 participants who failed an attention check (similar to the attention check used by Oppenheimer et al., 2009), leaving 276 for further analysis. Although the rate of attention check failure did not vary by condition, χ2 (2, N=311)=2.5, p=.29, the number of participants who failed to complete the study did: 10% of participants in the staircase condition dropped out, compared with 3% in the titration condition and 0% in the matching condition, χ2 (2, N=311)=13.01, p<.01. This difference in completion rates may have been caused by the fact that participants in the staircase condition were more likely to fail the monotonicity checks (and be forced to go back and complete the measure again). When considering immediate versus future gains, 0% of participants in the matching condition initially gave answers that were non-monotonic over time, compared with 2% in the titration condition (with the titration condition including both nonmonotonic answers and perverse switching), and 13% in the staircase condition, a significant difference with a Chi-square test, χ2 (2, N=276) = 18.3, p < .01. Follow-up pairwise comparisons showed that while the staircase method showed more nonmonotonicities than each of the other two methods, matching and titration did not differ from each other. Similarly, when comparing immediate versus future losses, 2% of participants in the matching condition initially gave answers that were non-monotonic over time, compared with 4% in the titration condition, and 10% in the staircase condition, a significant difference, χ2 (2, N=276) = 6.3, p < .05. In follow-up pairwise comparisons, a greater proportion of participants showed nonmonotonicities when using staircase than when using matching, p = .05, but no other comparisons were significant. Across conditions, only 76% of participants who failed a monotonicity check finished the study, compared with 99% of those who passed all monotonicity checks, a significant difference in proportions, z = 3.2, p < .01.
Another possible reason dropout rates were higher in the staircase condition is that the study was significantly longer: participants spent a median of 10.6 minutes completing the survey in the staircase condition, compared with 7.8 in the titration condition and 6.3 in the matching condition, a significant difference with a Kruskal-Wallis test, χ2 (2, N=276) = 70.2, p < .01. Pairwise tests were all significant, ps < .01.
2.2.1 Differences in central tendency and spread
Hyperbolic discount rates were calculated using the same method as for Study 1. The distributions of discount rates were generally quite skewed, so we will focus our analysis on non-parametric measures and tests. As can be seen in Table 4, the matching method produced larger, more variable discount rates than the two choice-based methods. Kruskal-Wallis tests for gains, χ2 (2, n = 276) = 31.0, p < .001, and losses, χ2 (2, n = 276) = 9.1, p = .01, confirmed that median discount rates varied as a function of measurement method. Follow-up pairwise comparisons confirmed that while matching was higher than each of the choice-based methods, the choice-based methods did not differ significantly from each other.
This pattern of discount rates is the opposite of that observed in Study 1 (where discount rates from matching were lowest, and the spread from matching was smallest). The difference between the Study 1 and 2 results can probably be attributed to the fact that the range of choice options in the titration and staircase conditions was much lower in Study 2 (maximum future payout = $10,000) than in Study 1 (maximum future payout = $85,000). This reinforces the importance of not only the method that researchers use (i.e., choice vs. matching), but also the specific details of that method (i.e., the range and order of the choice options).
2.2.2 Differences in model fit
As in Study 1, we computed how well the results of each measure fit the hyperbolic model. As seen in Table 5, matching showed the best fit, replicating the results of Study 1.
2.2.3 Predicting consequential choices
Finally, and perhaps most importantly, we compared the correlations of each measure with real-world outcomes. We reverse scored several items (indicated in Table 6) so that positive values indicate a correlation in the predicted direction. As can be seen in Table 6, the choice-based measures generally outperformed matching, with correlations around .18 (compared with .10 on average for matching). Replicating Study 1, the choice-based measures significantly predicted tobacco use, while matching did not. Unlike Study 1, both matching and the choice-based measures significantly predicted the consequential choice between $100 now and $200 in one year (Footnote 10). This improvement might be attributed to the fact that in Study 2 we used a subject population (MTurk workers) that may be more experienced with survey questions and more careful than our previous subject population. It is also notable that credit-card behavior and savings behavior were generally well predicted by discount rates, which makes sense because these are clean examples of real-life choices between gaining or losing money now or in the future.
Notes for Table 6: § this item was reverse scored; ** p<.01; * p<.05; † p<.10.
3 Conclusions
Choice-based measures of discounting are a double-edged sword, to be used carefully. On the one hand, they generally outperform matching at predicting consequential intertemporal choices. On the other hand, the options (and order of options) that researchers use will influence participants’ answers, so experimental design and interpretation must be done with care. Matching introduces less experimenter bias, is faster to implement, and produces a better fit with the hyperbolic model of discounting. Differences in discount rates observed between studies may be partly attributed to differences in elicitation technique, consistent with long-established research on risky choice that has come to the same conclusion (Lichtenstein & Slovic, 1971; Tversky et al., 1988). Across all methods, we found strong evidence for the sign effect (gains being discounted more than losses), replicating previous research (Frederick et al., 2002; Hardisty & Weber, 2009; Thaler, 1981). In other words, participants’ desire to have gains immediately was stronger than their desire to postpone losses.
In Study 1, the choice-based measures presented participants with a large range of outcomes, and therefore yielded higher discount rates than the matching method, whereas in Study 2 the range of outcomes was more restricted and thus the choice-based measures yielded lower discount rates. We agree with Frederick (2003) that the range of options participants see implicitly suggests appropriate discount rates. Consistent with this, when doing within-subject analyses, we found strong order effects; participants gave very different responses to the matching questions depending on whether they completed them before or after a choice-based method. Therefore, future research on methods should be careful to counterbalance and investigate order effects.
Another disadvantage of choice-based methods is that they take longer for participants to complete (and longer for the experimenter to analyze). In our studies, the matching method was about 1.5 minutes faster than titration (and 4.3 minutes faster than the staircase method). This may seem trivial, but if participants are financially compensated for their time, matching is cheaper to run, and if the research budget is limited, matching therefore allows for larger sample sizes or more studies.
In comparison with the standard, fixed-sequence titration method, we did not find compelling advantages for the complex multiple-staircase method we developed in Study 1, nor for the simple dynamic staircase method we tested in Study 2. This is consistent with another recent study on dynamic versus fixed-sequence choice, which also found no mean differences (Rodzon, Berry, & Odum, 2011). In some ways, it is disappointing that our attempts to improve measurement were unsuccessful. However, the good news is that the simple titration measure, which is much more convenient to implement, remains a useful method.
While we focused on choice and matching elicitation methods because these have been most commonly used in the literature, it should be noted that many other techniques have recently been tested and compared, including intertemporal allocation, evaluations of sequences, intertemporal auctions, and evaluation of amounts versus interest rates (Frederick & Loewenstein, 2008; Guyse & Simon, 2011; Manzini et al., 2008; Olivola & Wang, 2011; Read, Frederick, & Scholten, 2012). All of these investigations have found differences in discount rates based on the elicitation methods. Taken together with our results, this strongly suggests that intertemporal preferences are partly constructed, based on the manner in which they are elicited. At the same time, the correlations between lab-measured discount rates and real-world intertemporal choices such as smoking establish that intertemporal preferences are also partly a stable individual difference that is manifested across diverse contexts.
In terms of best practices for studying temporal discounting, our recommendation depends on the goal of the research project. If the goal is to predict real-world behavior and outcomes, choice-based methods should be used, whereas if the goal is to minimize experimental demand effects, secure a good model fit, or quickly obtain an exact indifference point, matching should be used.