1 Introduction
Many choice situations involve numeric values. Numbers indicate quantities, prices, rankings, and they serve as arbitrary labels or identification codes. A recent literature related to the Chinese culture shows that tastes and distastes for particular numbers can influence decisions and affect market prices. Vehicle license plates with the lucky number eight are auctioned at relatively high prices, and vehicle plates with the unlucky number four are auctioned at relatively low prices (Woo & Kwok, 1994; Woo, Horowitz, Luk & Lai, 2008; Chong & Du, 2008; Ng, Chong & Du, 2010). In housing markets, houses with a number ending in eight are traded at a premium, whereas houses with a number ending in four are traded at a discount (Bourassa & Peng, 1999; Chau, Ma & Ho, 2001; Agarwal, He, Liu, Png, Sing & Wong, 2014; Fortin, Hill & Huang, 2014; Shum, Sun & Ye, 2014). In financial markets, culture-inspired number preferences cause particular limit-order and transaction prices to be more frequent than other ones (Brown, Chua & Mitchell, 2002; He & Wu, 2006; Cai, Cai & Keasey, 2007; Brown & Mitchell, 2008; Bhattacharya, Kuo, Lin & Zhao, 2016). Moreover, the shares of newly listed firms with lucky listing codes seem to be overvalued and underperform those with unlucky listing codes (Hirshleifer, Jian & Zhang, 2014).
Tradition or cultural background is just one possible determinant of tastes and distastes for particular numbers. In the present paper, we map a variety of other determinants in the context of two different lottery games. The first is the Dutch Lotto, a nationwide six-number lottery. For 175 consecutive draws that span a two-and-a-half-year period, we have five million choices of combinations of six different numbers between 1 and 45. The second is a lottery that was organized as a promotional event by a large casino company in the Netherlands in 2013 and 2014. We have the complete collection of entries for each of the two years, for an aggregate of more than five hundred thousand choices of combinations of four numbers between 0 and 36.
The question whether people in these lottery games exhibit a systematic preference for particular numbers is interesting from multiple perspectives. First, our data provide a real-life test-bed for various behavioral regularities. The orientation of the games towards chance and prediction, the use of particular choice forms, the fact that people choose numbers in combinations, and the availability of specific numbers in the decision context, allow for the testing of a variety of psychological phenomena. Second, the preferences that we document here may also play a role in areas outside that of lotteries. Numerical labels and indicators abound in the environments of, for example, consumers, investors, entrepreneurs, and experimental subjects. If people have number preferences, these labels and indicators could influence their choices. The studies cited in the first paragraph above illustrate that the economic impact of such preferences is potentially significant. Last, understanding how people behave in lotteries is interesting in its own right. Many countries have one or more large lotteries in which people can choose the numbers they play with. Worldwide, households spend a significant portion of their income on lotteries, with total expenditures amounting to hundreds of billions of dollars (Kearney, Tufano, Guryan & Hurst, 2011; Beckert & Lutter, 2013).
Our results are surprisingly similar across the two games. Players have a tendency to play with the personally meaningful numbers in their birthdate, age, and postal code. They also more frequently choose numbers that are situationally available: there is a preference for numbers (i) in the current date, (ii) in the date of the draw, (iii) forming the jackpot size, (iv) representing the remaining time until the draw shown on the screen, and (v) on a voucher that players need in order to participate.
We also find evidence that the spatial position of numbers matters. The two lottery games employ a different range of numbers and tabulate these numbers in a different way. In both lotteries, players are attracted towards numbers in the center of the choice form and avoid numbers at the edges. Our final result for individual numbers is that frequent players avoid the winning numbers from recent draws, whereas infrequent players chase these.
For combinations of numbers we find that players care about aesthetics. With only a few exceptions, the most popular combinations all represent numeric sequences or spatial patterns. These combinations are selected extremely often in comparison with what would be expected if people choose randomly. Furthermore, players spread their numbers relatively evenly across the range of possible numbers.
Our study is not the first to investigate number preferences in lottery games, but it is distinct in terms of data and scope. Many earlier studies rely on indirect or aggregated data, analyzing the number of winners given particular draw results (Chernoff, 1981; Cook & Clotfelter, 1993; Terrell, 1994; Finkelstein, 1995; Scoggins, 1995; Haigh, 1997; Cox, Daniell & Nicole, 1998; Papachristou & Karamanis, 1998; Farrell, Hartley, Lanot & Walker, 2000; Roger & Broihanne, 2007) or the overall popularity of individual numbers or combinations (Joe, 1987; Halpern & Devereaux, 1989; Stern & Cover, 1989; Clotfelter & Cook, 1993; Henze, 1997; Simon, 1999; Ding, 2011; Lien, Yuan & Zheng, 2015; Lien & Yuan, 2015). To the best of our knowledge, only Suetens and Tyran (2012) and Suetens, Galbo-Jørgensen & Tyran (2015) use detailed individual-level data on lottery players and number choices. All these studies focus on a subset of the behavioral regularities that we consider in the present paper.
2 Games and data
2.1 Lotto game
Generating €144 million in revenues in 2014, the Dutch Lotto is one of the largest nationwide lotteries in the Netherlands (annual report De Lotto, 2014). Draws take place every Saturday at 6pm CET. On the last Saturday of every month (“Super Saturday”) there are two draws. Players choose six numbers from the range of 1 to 45, and additionally one color from six. Bets cost €2 each, and prizes are awarded for matching at least two of the numbers drawn. The more numbers a player matches, the bigger the prize. During our sample period, the progressive jackpot had a minimum value of €7.5 million and increased by half a million each time it was not awarded. A player wins the jackpot if she matches all six numbers and the jackpot color. If there is more than one winner, the jackpot is shared. The chance of winning the jackpot or a share of it is roughly one-in-49-million. Table S1 in the Supplement displays the probabilities for the smaller prizes.
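The one-in-49-million figure can be checked with a short calculation. The sketch below assumes that the six numbers are drawn without replacement from 1 to 45 and that the jackpot color is drawn uniformly and independently from six colors; the constant names are illustrative and do not come from the paper.

```python
from math import comb

NUMBERS_IN_RANGE = 45   # numbers 1..45 on the Lotto form
NUMBERS_PICKED = 6      # numbers per combination
JACKPOT_COLORS = 6      # assumed equally likely and independent of the numbers

# Unordered selections of six distinct numbers.
number_combinations = comb(NUMBERS_IN_RANGE, NUMBERS_PICKED)

# Each number combination can be paired with any of the six colors.
jackpot_outcomes = number_combinations * JACKPOT_COLORS

print(number_combinations)  # 8145060
print(jackpot_outcomes)     # 48870360
```

Under these assumptions a single bet has a jackpot chance of one in 48,870,360, which is consistent with the rounded figure quoted in the text.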
Our data consists solely of online transactions. When making an online transaction, a player is first asked how many combinations she wishes to bet on. Next, she chooses the numbers and color of each combination, and decides how many draws she wants to participate in (maximum of twelve). Our analyses ignore the number of chosen repetitions, because there is only one decision process underlying a string of automatically repeated bets. Figure S1 in the Supplement shows the online Lotto form.
By default, the computer system generates a random combination for each bet. A player can choose whether to play with this combination, to generate another random combination, to adjust one or more numbers manually, or to choose a combination from scratch. Unfortunately, we do not know when default combinations were used.
In our standard approach we weight each chosen combination equally, regardless of how many other combinations the same player bets on. As a robustness check, we also conduct analyses in which we weight observations by the reciprocal of the total number of combinations chosen by the player in our sample period.
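The reciprocal weighting used in the robustness check can be sketched as follows. This is a minimal illustration, assuming each combination is tagged with a player identifier; the function name and the toy identifiers are hypothetical.

```python
from collections import Counter

def reciprocal_weights(player_ids):
    """Return one weight per combination: 1 / (combinations entered by that player)."""
    counts = Counter(player_ids)
    return [1.0 / counts[player] for player in player_ids]

# Toy data: player "a" entered three combinations, player "b" entered one.
player_ids = ["a", "a", "a", "b"]
weights = reciprocal_weights(player_ids)
print(weights)
```

With these weights every player contributes a total weight of one, so frequent players no longer dominate the averages.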
Our anonymized data set consists of 2,590,919 online transactions for the Dutch Lotto between April 19, 2010 and December 31, 2012. A total of 175 draws took place in this time period. For the 5,108,343 chosen combinations in our data set we know the date of the transaction and the date of the draw. For the 131,407 (anonymous) players we know their gender, birthdate, and the four digits of their postal code.Footnote 1 A majority of 73% of the players are male and 84% of the combinations are entered by males.
2.2 Casino game
Our data for the casino game derive from two identical promotional events organized by Holland Casino in 2013 and 2014. Anyone who visited a casino of this Dutch state-owned company between May 2 and June 9, 2013 or May 6 and June 9, 2014 received a voucher with a login code. Via a terminal inside the casino and via the Internet this code granted access to a lottery where players had to predict the outcomes of four consecutive spins of a roulette wheel with pockets numbered from 0 to 36. Participants were competing for a guaranteed prize of €100,000, to be shared by those who predicted the correct numbers in the correct order. If nobody won according to this criterion, the prize would be shared by all players who predicted the correct numbers irrespective of order. If nobody won on the basis of all four numbers, the prize would be awarded on the basis of the first three numbers alone. Unlike in Lotto, players were not offered the possibility to use randomly generated numbers.
Our anonymized data consist of all 323,896 combinations of four numbers entered in 2013 and all 245,091 entered in 2014. For each combination we know the voucher code, the date of play, the player’s gender, and the player’s birthdate. The data set from 2014 also contains a unique number for each of the 112,473 players. For 2013 such a unique number is not available. The percentages of combinations entered by male players in 2013 and 2014 are 54.9 and 58.6, respectively.
If we analyze the two years separately, the results are strikingly similar. For example, as illustrated in Figure S2 in the Supplement, the correlation between the individual number frequencies is equal to 0.98 and the differences are small. In the subsequent sections we therefore present the results for the pooled data.
3 Number frequencies
If players in the Lotto game pick their numbers randomly, each number is expected to be chosen 13.3% of the time (6/45). Figure 1 depicts the actual frequencies. The most popular number in the Lotto data is 11, picked in 16.5% of the combinations. The number 7 follows closely (16.3%). The least popular numbers are 37 and 38 (10.3% and 10.5%, respectively). Overall, we observe that players have a tendency to pick small numbers. Figure 2 presents the frequencies in a heat map, where the numbers are displayed in a matrix as they appear on the Lotto website.
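To give a sense of how far these frequencies lie from the 13.3% benchmark, the sketch below computes a naive one-sample z statistic for the most and least popular numbers, using the shares and the number of combinations reported in the text. It treats all combinations as independent draws, which is a simplifying assumption made here for illustration and not necessarily the exact test specification behind the paper's results; the function name is hypothetical.

```python
from math import sqrt

def proportion_z(observed_share, null_share, n):
    """z statistic for an observed share against a null share, given n trials."""
    standard_error = sqrt(null_share * (1.0 - null_share) / n)
    return (observed_share - null_share) / standard_error

N_COMBINATIONS = 5_108_343   # Lotto combinations in the sample
NULL_SHARE = 6 / 45          # chance that a given number is in a random combination

z_eleven = proportion_z(0.165, NULL_SHARE, N_COMBINATIONS)        # number 11
z_thirty_seven = proportion_z(0.103, NULL_SHARE, N_COMBINATIONS)  # number 37

print(round(100 * NULL_SHARE, 1))  # 13.3
print(z_eleven, z_thirty_seven)
```

Because the sample is so large, even deviations of a fraction of a percentage point produce very large z statistics under this independence assumption.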
Similar results emerge in the casino game. Figure 3 shows the selection frequencies for the 37 numbers. Under random number selection each number would be chosen 2.70% of the time (1/37). Again, we observe a preference for small numbers. The most popular number is 7, chosen 4.19% of the time, closely followed by 8 (4.05%). The most frequently picked number in the Lotto data, 11, is the fourth most popular number in the casino data (3.46%). The least popular numbers are 34 (1.43%) and 35 (1.64%). Figure 4 presents the frequencies in a heat map, with the numbers displayed as they appear on the roulette table. This presentation was also used on the vouchers and on the screen when players entered their predictions.
These results are in line with past research. Other lottery studies have similarly found that players have a preference for small numbers (Stern & Cover, 1989; Finkelstein, 1995; Cox et al., 1998; Papachristou & Karamanis, 1998; Farrell et al., 2000; Roger & Broihanne, 2007; Oyeleke & Otekunrin, 2014; Suetens et al., 2015). A possible explanation is that smaller numbers are more present in everyday life and easier to recall, and thus more likely to be personally relevant and prominently available in memory (Milikowski, 1995). The popularity of 7 seems to be a general phenomenon. Without exception, lottery studies find that 7 is among the most popular numbers. Experimental studies similarly document a preference for this number (Simon, 1971; Simon & Primavera, 1972; Heywood, 1972; Kubovy & Psotka, 1976; Teigen, 1983; Silver et al., 1988). Footnote 2
Studies that looked at color preferences find that blue is the most frequently chosen color (Simon, 1971; Simon & Primavera, 1972; Trueman, 1979; Silver et al., 1988).Footnote 3, Footnote 4 Among our Lotto players, the most popular jackpot color is blue as well (22.2%), followed by red (18.9%), green (17.6%), yellow (14.6%), purple (13.4%), and orange (13.3%). In the game of roulette, half the numbers 1–36 are black and the other half are red (0 is green), and when the casino game players entered their predictions the numbers were displayed in these colors. The average selection frequency of red numbers is 2.75%, which is significantly higher than the average for black numbers of 2.68% (z-test; p < 0.001).
In both games, odd numbers are more popular than even numbers (Lotto: 13.5% vs. 13.1%; Casino: 2.77% vs. 2.63%); among the odd numbers, prime numbers are more popular than non-prime numbers (Lotto: 14.0% vs. 13.0%; Casino: 3.14% vs. 2.32%) and among the even numbers, non-round numbers are more popular than the “round” multiples of ten (Lotto: 13.2% vs. 12.7%; Casino: 2.68% vs. 2.48%). All these pairs of averages are significantly different (z-tests; all p < 0.001).
In other contexts, people tend to use round numbers more often than non-round numbers (Plug, 1977; Klesges, Debon & Ray, 1995; Bopp & Faeh, 2008; Pope & Simonsohn, 2011). One possible explanation for the difference is that lottery players may look for combinations that “look random”, and that non-round numbers appear more random than round numbers. Similarly, odd and prime numbers may appear more random than even and non-prime numbers, respectively.
4 Personally meaningful and situationally available numbers
People generally hold a favorable view towards the self (Greenwald & Banaji, 1995). This favorable view tends to spill over to things associated with the self (Beggan, 1992; Morewedge, Shu, Gilbert & Wilson, 2009; Nuttin, 1985, 1987). The resulting tendency of people to gravitate towards people, places, and things that resemble the self has been termed implicit egotism (Pelham, Carvallo & Jones, 2005). One example is the preference for the numbers in one’s own birthday (Kitayama & Karasawa, 1997; Jones, Pelham, Mirenberg & Hetts, 2002). In line with this, virtually all past Lotto studies show that the numbers in the range of 1–31 (days), and in particular 1–12 (days and months) are more popular than other numbers.
With our individual-level Lotto data we can directly investigate whether players have a preference for playing with the numbers of their day, month, and year of birth. We can also test whether they favor two other kinds of personally meaningful numbers, namely the number corresponding to their age and the numbers in their postal code.
For year of birth we consider the last two digits. Players need to be born between 1901 and 1945 to be able to use their birth year, which was true for 7.9% of the 5.1 million combinations. Selecting age as a number is only possible for people under the age of 46, which was true for 42.3% of the combinations. Dutch postal codes are alphanumeric, consisting of a number between 1000 and 9999 and two letters. We consider the first two digits and the last two digits. Players could select these numbers in 60.7 and 59.2% of the cases, respectively.
Table 1, Panel A shows how frequently the personally meaningful numbers are chosen, conditional on the player being able to do so. Under the null hypothesis of random choice, numbers will be picked 13.3% of the time (6/45). This proportion is exceeded for all personally meaningful numbers (z-tests; all p < 0.001). Day of birth is the most popular one, followed by the year and month of birth, age, and the postal code numbers.
Notes: The number of combinations reflects how often players were able to choose the particular number. For 29,442 (18,758) combinations we have no birthdate (postal code) information. All frequencies are significantly higher than 13.33% at the 0.1% level.
Personally meaningful numbers may also be popular due to the mere fact that people are frequently exposed to them. Even a short exposure to a number can make that number more available in short-term memory and affect subsequent responses (Reference KubovyKubovy, 1977). In the context of the Lotto game, numbers that are especially available to players are the current date, the numbers in the date of the upcoming draw, and the numbers prominently displayed on the website. Also, when making an online transaction, Lotto displays both the current jackpot size and the remaining time before the next scheduled draw.
For the jackpot size (expressed in millions of Euros), we consider the popularity of both the integer and the decimal number, where the latter could only take a value of zero or five during our sample period. The time until the draw is shown in days and hours (before the final 24 hours) or in hours and minutes (during the final 24 hours), and we examine whether a number is chosen more frequently when it appears on the screen as one of these elements. Selecting the numbers in the current date or draw date was always possible, as was selecting the integer of the jackpot size (range: 7–36). The decimal number, and the first and second element of the remaining time could be chosen in 46.8, 96.3, and 86.3% of the cases, respectively.
Table 1, Panel B shows the raw frequencies for these available numbers. All percentages exceed 13.3% by approximately one or two percentage points, and all differences are significant (z-tests; all p < 0.001).
The raw percentages are, however, biased by a general preference for small numbers that may result from a preference for other (unobserved) meaningful or available numbers, or from other mechanisms. To control for differences in base rates and to also disentangle the effects of the different meaningful and available numbers we perform a logit regression. The dependent variable is the player’s decision to choose (1) or not choose (0) a given number. Hence, each chosen combination generates 45 observations, one for each number (1–45) that could be selected. As explanatory variables we use dummy variables that take the value of 1 for the number that corresponds to the personally meaningful or situationally available number (and 0 otherwise).Footnote 5 To allow for differences in base rates we include number fixed effects. We follow the common approach of reporting average marginal effects, and correct the standard errors for clustering at the player-number level and the combination level (Cameron, Gelbach & Miller, 2011; Thompson, 2011).
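The data layout described above, with 45 binary observations per combination, can be sketched as follows. This is only an illustration of the reshaping step, not of the estimation itself; the function name, the column names, and the use of None for a meaningful number that falls outside 1–45 are assumptions made here.

```python
def expand_combination(chosen, meaningful, n_numbers=45):
    """Turn one chosen combination into one row per selectable number.

    chosen:     set of the numbers in the combination
    meaningful: dict mapping a label (e.g. "birth_day") to the matching
                number, or None if that number cannot be selected
    """
    rows = []
    for number in range(1, n_numbers + 1):
        # Dependent variable: 1 if this number is in the combination, else 0.
        row = {"number": number, "chosen": int(number in chosen)}
        # One dummy per meaningful or available number.
        for label, value in meaningful.items():
            row[label] = int(value == number)
        rows.append(row)
    return rows

# Hypothetical player born on the 17th of March, aged 52 (age not selectable).
rows = expand_combination(
    chosen={3, 11, 17, 23, 31, 40},
    meaningful={"birth_day": 17, "birth_month": 3, "age": None},
)
print(len(rows), sum(row["chosen"] for row in rows))  # 45 6
```

The "number" column would supply the number fixed effects, and the dummy columns would enter the logit model as explanatory variables.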
Table 2, Model 1 displays the average marginal effects (in percentage points). All personally meaningful numbers are significantly more likely to be selected. The marginal effect sizes of the day and year of birth are roughly equal: players are approximately 7 percentage points more likely to pick these numbers. The effects for month of birth and age are about half as strong. Postal code numbers are considerably less important, with marginal effect sizes of 0.30 and 0.24 percentage points for the first and last two digits, respectively. The effects of the current date, draw date, and jackpot size are also significant and comparable in size to those of the postal code. The second element of the remaining time has a small but significant effect, whereas the first element is insignificant.
Notes: ** p < .001; * p < .01; † p < .05.
Table 2, Model 2 shows the logit regression results when observations are weighted by the reciprocal of a player’s total number of combinations. When all chosen combinations are weighted equally, as we have done so far, the results may be more representative for frequent players than for the cross-section of players. After weighting, the effects of birthdate, age, current date, draw date, and jackpot numbers are stronger. This implies that infrequent players make more use of these personally meaningful and situationally available numbers, possibly because they use the random number generator less frequently.
Similar patterns emerge in the casino data. The last two digits of the year of birth can be selected by people born between 1900 and 1936. Players in this category entered 4.2% of the combinations. Age can only be chosen by people under 37. This condition is met for 33.1% of all entries. Table 3, Panel A shows how often players pick these personal numbers. All frequencies significantly exceed 2.70% (z-tests; all p < 0.001).Footnote 6 The results are especially pronounced for the day of birth; players select this number approximately three times as often as expected under random choice.
Notes: The number of observations reflects how often players were able to choose the particular number. All frequencies are significantly higher than 2.70% at the 0.1% level.
The situationally available numbers that we consider here are the day and month of play, and the numeric values that appear in a player’s voucher code. In 2013, the voucher code was composed of three sets of three symbols that could be either letters or numbers. We extract all numbers between 0 and 36 from each set. For example, from XVH-M51-36Z we extract 5, 1, 3, 6, and 36. On average there are 2.05 such numbers in a voucher code. In 2014, the voucher code was composed of letters alone. Table 3, Panel B shows that whenever players are able to pick a number from the date of play or from the voucher code, they do this significantly more often than 2.70% of the time (z-tests; all p < 0.001).
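The extraction rule for voucher codes can be sketched in a few lines. The sketch assumes that every single digit and every pair of adjacent digits within a three-symbol set counts as a candidate, and that pairs with a leading zero are skipped; the second point is an assumption made here, since the text does not say how such pairs are treated. The function name is hypothetical.

```python
DIGITS = "0123456789"

def numbers_in_voucher(code, low=0, high=36):
    """Extract all numbers between low and high from each set of a voucher code."""
    found = []
    for part in code.split("-"):
        # Every single digit is a candidate number.
        singles = [int(ch) for ch in part if ch in DIGITS]
        # Every pair of adjacent digits is a candidate two-digit number.
        pairs = []
        for i in range(len(part) - 1):
            first, second = part[i], part[i + 1]
            if first in DIGITS and second in DIGITS and first != "0":
                pairs.append(int(first + second))
        found.extend(n for n in singles + pairs if low <= n <= high)
    return found

print(numbers_in_voucher("XVH-M51-36Z"))  # [5, 1, 3, 6, 36]
```

For the example code in the text, the pair 51 is discarded because it exceeds 36, which reproduces the five numbers listed there.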
We perform regression analyses similar to those for the Lotto data. We correct standard errors for clustering at the player-number level and the level of individual predictions.Footnote 7
Table 4, Model 1 displays the average marginal effects (in percentage points). The effects of the personally meaningful and situationally available numbers are all significant. Players are 4.7 and 3.3 percentage points more likely to pick their day and year of birth, respectively. Month of birth and age are somewhat less important, with effect sizes of 1.5 and 1.2 percentage points. The average marginal effects for the numbers from the current date are 0.15 percentage points, corresponding to roughly 5.6% of the probability under random selection. The numbers in the voucher codes also play a statistically significant role, but the effect size there is only 0.07 percentage points.
Notes: ** p < .001.
Table 4, Model 2 shows the logit regression results when observations are weighted by the reciprocal of a player’s total number of entries. The effect sizes for birthdate numbers and age are stronger after weighting, suggesting that personally meaningful numbers are more popular among infrequent players. The effect sizes for current date and voucher code are hardly affected.
5 Spatial position
Players in the Lotto game select their numbers from a given 5 by 9 matrix (Figure 2). In the casino game the numbers are presented as on a roulette table, with the numbers 1 through 36 depicted in a 12 by 3 matrix and 0 on top (Figure 4). Multiple studies have shown that people have a tendency to select choice options presented in the middle of a display and avoid the edges. This behavior has been observed with laboratory and field data, for both individual choice and strategic interaction (Christenfeld, 1995; Rubinstein, Tversky & Heller, 1997; Shaw, Bergen, Brown & Gallagher, 2000; Attali & Bar-Hillel, 2003; Raghubir & Valenzuela, 2006; Chandon, Hutchinson, Bradlow & Young, 2009; Atalay, Bodur & Rasolofoarison, 2012; Valenzuela, Raghubir & Mitakakis, 2013; Bar-Hillel, 2015). Closely related to our analyses for lottery games, Bar-Hillel and Zultan (2012) examine the distribution of gamblers’ bets on a roulette table and observe that numbers in the center are more popular.
There are several ways to define the central part of the Lotto form. Figure 5 compares the raw frequencies for numbers in and out of the center for eight definitions. Under each definition, the difference is positive and statistically significant. In relative terms, numbers in the center are 5–13% more likely to be selected than numbers out of the center. The difference is largest if the center region is confined to the number 23 alone. This number in the exact center does not determine the effect in full, as positive and significant differences remain when we exclude it (z-tests; all p < 0.001).
Notes: Center definitions are indicated with bold rectangles. Differences are expressed in percentage points. Results after excluding number 23 (highlighted in grey) are within parentheses.
Figure 6 shows that the difference is also positive for all six possible definitions of the center region of the casino game (z-tests; all p < 0.001). In relative terms, numbers in the center are 22–40% more likely to be selected than numbers out of the center. As with Lotto, the center effect is strongest when the center is confined to the most centrally located number (17), but it is not solely driven by this single number.
Notes: Center definitions are indicated with bold rectangles. Differences are expressed in percentage points. Results after excluding number 17 (highlighted in grey) are within parentheses.
Weighting observations by the reciprocal of a player’s total number of entries amplifies the center effects in the Lotto game (Figure S3 in the Supplement). In the casino game, however, the results hardly change (Figure S4 in the Supplement). A possible explanation for this difference is that frequent Lotto players are more likely to use the random number generator than infrequent Lotto players. In the casino game there is no such number generator available.Footnote 8
6 Recent draws
Various lottery studies find that players tend to avoid numbers that were recently drawn (Clotfelter & Cook, 1993; Terrell, 1994; Ding, 2011; Suetens & Tyran, 2012).Footnote 9 Suetens et al. (2015) document a similar response to the previous draw, but they also find that a number is popular if it appears in multiple recent draws.
The Lotto data comprises 175 draws. Figure 7A compares the average selection frequency of numbers that appeared in the previous draw with that of numbers that did not appear in the previous draw. This simple comparison shows that recent winning numbers are chosen less often than other numbers. Figure 7B displays the average selection frequency of a number conditional on whether it was drawn 0, 1, 2, 3, or 4 times in the preceding six draws. This figure suggests that numbers drawn only once over the past six draws are being avoided, while numbers drawn three or four times are relatively popular. The regression results in Table 2, Model 1 confirm these patterns. Note that the effect sizes are relatively small. This is not surprising because the numbers from previous draws are not readily available to players; players have to make a conscious effort to keep track of those numbers.
Notes: (A) displays the average selection frequency of a number that appeared in the previous draw and that of a number that did not appear in the previous draw. (B) displays the average selection frequency of a number conditional on whether it was drawn 0, 1, 2, 3, or 4 times in the preceding six draws. (C) and (D) display the results of similar analyses for the six jackpot colors.
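The conditioning variable behind panel (B), the number of times a number was drawn in the preceding six draws, can be computed as in the sketch below. The draw results shown are made up for illustration, and the function name is hypothetical.

```python
def times_drawn_recently(draws, index, window=6):
    """Count how often each number appeared in the `window` draws before draw `index`.

    draws: list of draw results in chronological order, each a set of numbers
    """
    counts = {}
    for past_draw in draws[max(0, index - window):index]:
        for number in past_draw:
            counts[number] = counts.get(number, 0) + 1
    return counts

# Made-up history of seven draws; we look at the six draws before draw 6.
draws = [
    {1, 7, 12, 23, 30, 41},
    {2, 7, 15, 23, 33, 44},
    {5, 9, 12, 27, 30, 45},
    {3, 7, 18, 22, 36, 40},
    {4, 11, 12, 29, 31, 42},
    {6, 7, 13, 24, 35, 43},
    {8, 10, 14, 25, 37, 39},
]
recent = times_drawn_recently(draws, index=6)
print(recent.get(7, 0), recent.get(12, 0), recent.get(8, 0))  # 4 3 0
```

Numbers absent from the returned dictionary were not drawn at all in the window and fall in the zero category.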
Weighting observations by the reciprocal of a player’s total number of combinations changes the effect of the past draw from negative to positive, and amplifies the effects of frequently drawn numbers (Table 2, Model 2). These changes suggest that frequent and infrequent players respond differently to prior draw results. To investigate this in more detail, we perform separate regressions for players who participated only ten or fewer times throughout our sample period (Table 2, Model 3) and for players who participated a thousand times or more (Table 2, Model 4). The results show that infrequent players have a preference for “hot” numbers, whereas frequent players tend to avoid these.
These results can be related to a large literature showing that people have difficulties understanding randomness. In their early work, Tversky and Kahneman (1971) speak of a “belief in the law of small numbers” to describe the misconception that a short sequence of events generated by a random process will have characteristics that closely resemble those of the data generating process (DGP). This false belief leads to the gambler’s fallacy when people know the DGP and to the hot-hand fallacy when people do not know it (Kahneman & Tversky, 1972; Tversky & Kahneman, 1974; Rabin, 2002). When people are asked to produce random sequences for a given DGP, they typically predict too many reversals (O’Neill, 1987; Rapoport & Budescu, 1992, 1997; Bar-Hillel & Wagenaar, 1991). When a random sequence is given for an unknown DGP, people tend to exaggerate the degree to which the DGP will resemble the given sequence of signals, leading to a belief in non-existent variation over time (Gilovich, Vallone & Tversky, 1985; Camerer, 1989; Tversky & Gilovich, 1989). The different behavior of frequent and infrequent Lotto players is in line with the different theoretical underpinnings of the two biases, assuming that frequent players are more familiar with the game and the underlying DGP than infrequent players.
Surprisingly, the results for the jackpot colors are different. Color choices are consistent with the gambler’s fallacy only. Figure 7C shows that the winning color in the previous draw is chosen less often than other colors. Figure 7D shows that the more frequently a color has been drawn in the last six draws, the less frequently players bet on that color.
7 Combinations
In the Lotto game there are 8,145,060 possible combinations of numbers that players can choose. Table 5 lists the thirty most frequently selected combinations, ranked by the number of players who selected them. If players were picking their 5,108,343 combinations at random, the likelihood of one or more combinations appearing more than ten times in our data would be 0.1%. The fact that many combinations appear hundreds of times can thus be seen as an extreme deviation from random choice.
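As a rough check on these figures, the number of possible combinations and the chance that any combination shows up more than ten times under uniform random choice can be approximated as follows. This is only a sketch: it treats the count of each combination as an independent Poisson variable with mean equal to picks divided by combinations, which is our simplifying assumption and not necessarily the calculation behind the figure quoted above. The function name and its parameters are illustrative.

```python
from math import comb, exp, expm1

def prob_any_combo_exceeds(n_picks, n_combos, threshold=10, terms=60):
    """Approximate P(some combination is chosen more than `threshold` times)
    when n_picks combinations are drawn uniformly from n_combos.
    Assumption: each count is an independent Poisson(n_picks / n_combos)."""
    lam = n_picks / n_combos
    term = exp(-lam)                      # P(X = 0)
    for k in range(1, threshold + 1):
        term *= lam / k                   # after the loop: P(X = threshold)
    tail = 0.0
    for k in range(threshold + 1, threshold + 1 + terms):
        term *= lam / k                   # P(X = k)
        tail += term                      # accumulates P(X > threshold)
    # 1 - exp(-n_combos * tail): chance that at least one cell exceeds the threshold
    return -expm1(-n_combos * tail)

lotto_combos = comb(45, 6)                # six different numbers out of 45
p_lotto = prob_any_combo_exceeds(5_108_343, lotto_combos)
```

Under this approximation `lotto_combos` equals 8,145,060 and `p_lotto` is a small fraction of one percent, of the same order as the likelihood reported in the text.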
Many of the thirty most popular combinations form a numeric sequence or spatial pattern. The majority are composed of a vertical or diagonal line of five numbers, plus a sixth number that connects with one of the endpoints or is located at one of the corners of the form (Figure S5 in the Supplement). Overall, 0.9% of the combinations in our sample can be classified as a diagonal or vertical pattern, which is a significantly greater portion than the 0.009% expected under randomness.
In the casino game, players can choose a number more than once, and the order of the chosen numbers matters. The total number of unique combinations thus equals 37^4 = 1,874,161. Table 6 shows the thirty most popular ones, ranked by the total number of times they appear in the data. If our total of 568,987 combinations were picked completely at random, the likelihood of one or more combinations occurring more than ten times would be virtually zero. In sharp contrast, we observe that many combinations appear hundreds of times. Again, most of the popular combinations form a numeric sequence or spatial pattern. The exceptions in the top thirty represent neighboring numbers on the roulette wheel. Note that the numbers in all thirty combinations are in ascending order. This turns out to reflect a general phenomenon: 33.6% (33.1%) of all combinations are entered in ascending (strictly ascending) order, while only 4.88% (3.52%) would be expected to have that property under randomness.
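The shares of ascending entries expected under randomness follow from a simple counting argument, sketched below. Strictly ascending entries correspond to four-element subsets of the 37 numbers, and weakly ascending entries correspond to multisets of size four. The variable names are illustrative.

```python
from math import comb

n_values = 37            # numbers 0..36
k = 4                    # numbers per entry; order matters, repeats allowed
total = n_values ** k    # all ordered entries

# Strictly ascending: one ordering for each 4-element subset.
strict = comb(n_values, k)
# Ascending with ties allowed: one ordering for each multiset (stars and bars).
weak = comb(n_values + k - 1, k)

share_strict = strict / total
share_weak = weak / total
```

Rounded to two decimals, `share_weak` and `share_strict` reproduce the 4.88% and 3.52% benchmarks.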
Henze (1997) similarly reports that many of the most popular Lotto combinations represent a numeric sequence. In line with the many occurrences of spatial patterns that we observe, Falk, Falk and Ayton (2009) find that aesthetics play an important role in the choices of laboratory subjects.
8 Spacing
Boland and Pawitan (1999) find that the students in their classroom experiment tended to spread out their selections when asked to randomly generate a Lotto draw. Lien and Yuan (2015) find similar results in data from a Chinese six-number lottery. These results may reflect a form of representativeness bias (Tversky & Kahneman, 1971): if people believe that six draws from a uniform distribution should closely resemble the uniform distribution, they will expect the six numbers to be evenly spread across the possible range and deem clusters unlikely.
To investigate the degree to which Lotto players spread their numbers across the possible range, we compute the five spaces between the six (ordered) numbers for each combination. Next, we compare the empirical distribution of these spaces with the distribution that can be expected under random number choice (see Footnote 10). If people indeed have a tendency to evenly spread their numbers, small and large spaces will be underrepresented.
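One way to obtain the benchmark distribution of spaces under random choice is sketched below. It assumes that a space is the difference between two adjacent numbers after sorting, and it uses the counting result that, among all sets of k distinct numbers from 1..n, exactly C(n - d, k - 1) have a given adjacent pair at distance d. Whether this matches the exact procedure used for Figure 8 is an assumption on our part. The function names are illustrative, and the brute-force version serves only as a check on small cases.

```python
from collections import Counter
from itertools import combinations
from math import comb

def gap_pmf(n=45, k=6):
    """P(space = d) between adjacent sorted numbers when k distinct numbers
    are drawn uniformly from 1..n; each of the k-1 spaces shares this law."""
    total = comb(n, k)
    return {d: comb(n - d, k - 1) / total for d in range(1, n - k + 2)}

def gap_pmf_bruteforce(n, k):
    """Same distribution by full enumeration (feasible for small n only)."""
    counts = Counter()
    for combo in combinations(range(1, n + 1), k):
        for a, b in zip(combo, combo[1:]):
            counts[b - a] += 1
    total = sum(counts.values())
    return {d: counts[d] / total for d in sorted(counts)}

lotto_gaps = gap_pmf()   # theoretical frequencies for spaces 1..40
```

Subtracting these theoretical frequencies from the observed frequency of each space gives differences of the kind plotted in Figure 8A.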
The bars in Figure 8A reflect the absolute differences between the empirical and theoretical frequencies. In line with a tendency to spread numbers evenly, we observe more medium-sized spaces and fewer small and large spaces than expected by chance. Figure 8B displays the differences as a percentage of the theoretical frequencies (with the vertical axis truncated at 70%). These relative differences follow a similar pattern but are more pronounced for larger spaces due to their smaller theoretical likelihood. Extremely large spaces are highly unlikely in theory, but relatively popular among the players in our sample.
Henze’s (1997) analyses of the most popular combinations in a German number lottery also point out that spacing patterns are not in accordance with randomness, but he cites this as evidence for the popularity of numeric sequences. Indeed, the abnormal spacing patterns that we find in our data could result from a preference for specific numeric sequences or spatial patterns. To rule out that the patterns are caused by specific, popular combinations, we redo the analysis after excluding combinations that occur more than once in our data. The lines in Figure 8 reflect the absolute and relative differences between the empirical and theoretical distribution for the unique combinations only. Albeit somewhat weaker, the resulting patterns have a similar shape.
In the casino game, the three distances between the four numbers can be positive, negative, and zero. Because of the tendency to pick numbers in ascending order, positive spaces are strongly overrepresented (Figure S8 in the Supplement). To analyze spacing effects in isolation from ordering effects, we therefore measure the three distances in each combination after sorting the numbers in ascending order (see Footnote 11).
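For the casino game, a benchmark for the sorted distances can be obtained by enumerating every possible entry, as in the sketch below. It assumes that all 37^4 ordered entries are equally likely under random choice and that a distance is the difference between adjacent numbers after sorting, so zero distances arise from repeated numbers. The function name is illustrative, and the full enumeration over 37 numbers takes a few seconds in plain Python.

```python
from collections import Counter
from itertools import product

def sorted_gap_pmf(n_values=37, k=4):
    """Distribution of distances between adjacent numbers after sorting,
    pooled over the k-1 positions, when k numbers are drawn independently
    and uniformly from 0..n_values-1 (repeats allowed)."""
    counts = Counter()
    for entry in product(range(n_values), repeat=k):
        ordered = sorted(entry)
        for a, b in zip(ordered, ordered[1:]):
            counts[b - a] += 1
    total = sum(counts.values())
    return {d: counts[d] / total for d in sorted(counts)}

# Tiny example: two numbers from {0, 1} give distances 0, 1, 1, 0.
toy = sorted_gap_pmf(n_values=2, k=2)
```

Calling `sorted_gap_pmf()` with the defaults yields the theoretical frequencies for distances 0 through 36, against which observed frequencies can be compared.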
Figure 9 shows the absolute and relative differences between the empirical and theoretical frequencies after sorting. In line with a tendency to spread numbers evenly, and similar to what we found for Lotto, medium-sized spaces are overrepresented. Similar patterns emerge when we reduce the samples to unique combinations only, indicating that the abnormal spacing patterns do not result from specific, popular combinations alone.
Weighting observations by the reciprocal of a player’s total number of entries amplifies the spacing effects in the Lotto game (Figure S10 in the Supplement), but leaves the casino results virtually unaffected (Figure S11 in the Supplement). This again suggests that frequent Lotto players are more likely to use the random number generator than occasional players.
9 Summary and concluding remarks
We have documented a variety of empirical patterns in number choices in lottery games, using data sets that together comprise a total of approximately 33 million selected numbers. The patterns in the two different lottery games are qualitatively very similar. In a quantitative sense the effects are somewhat more pronounced in the casino game than in the Lotto game. This difference can probably be ascribed to the availability of default, computer-generated sets of numbers in the Lotto game, as there is strong evidence that people tend to stick with defaults (Camerer et al., 2003).
In line with earlier findings in the literature, the number 7 is highly popular in both games. Other numbers that consistently rank among the favorites include 3, 5, 8, and 11. More generally, numbers from the lower end of the possible ranges are more popular than numbers from the higher end. Also, in both games players prefer odd numbers over even numbers, prime numbers over non-prime numbers, and non-round numbers over round numbers.
Reinforcing earlier findings in different contexts, players are attracted towards numbers in the center of the choice form. Within each game, the relative location of the numbers on the entry screen is fixed, but between the two games the ordering is different. Regardless of the exact definitions of the center, numbers in the middle are more popular than numbers on the edges.
Using the data we have about individual players’ birthdates and postal codes, we find that players like to pick numbers that have a special meaning to them. Similarly, our analyses with data on dates of play, dates of draw, numbers on entry screens, and numbers in entry codes confirm that players more frequently choose numbers that are situationally available.
Our analyses of the combinations of numbers yield evidence that players care about aesthetics. Combinations that form a numeric sequence or spatial pattern are extremely popular, despite the fact that the parimutuel aspect of both lottery games creates an incentive to strategically attempt to select unique combinations. This suggests that many players do not see or understand the strategic aspect, or that the joy of playing with aesthetically pleasing combinations more than offsets the negative effect on expected payoff (Goodman & Irwin, 2006).
Last, we find that frequent players avoid numbers that appeared in the latest draws, that infrequent players chase these numbers, and that both spread their numbers relatively evenly across the possible range. These results may reflect that players misjudge the likelihood of winning with these numbers or combinations, and fit into a large body of literature that shows that people have difficulties understanding randomness. Moreover, the different responses of frequent and infrequent players to prior draw results accord with a literature arguing that knowing the data generating process leads to a gambler’s-fallacy type of behavior and not knowing it leads to a hot-hand type of behavior.