1 Introduction
Game-theoretic models are typically motivated by the idea that players reason about the behavior of others and choose their strategies accordingly. This reasoning can be informed directly by observing the payoff structure of the game or indirectly by observing and learning from the actions of other players. If information about another player's payoffs plays a pivotal role in affecting an individual's action choices, then varying the availability of this information may result in insightful differences in play. This paper addresses the following question: how does the common knowledge of all players’ payoffs, relative to only knowing one's private payoffs (and common knowledge thereof), impact play in strategic interactions? For brevity, we henceforth refer to this as the effect of mutual payoff information.
On the one hand, mutual payoff information may greatly impact individuals’ action choices. The type of introspective reasoning supported by directly observing others’ payoffs is often embedded in models of strategic decision making, such as higher-level reasoning in level-k models (Stahl and Wilson, 1994; Nagel, 1995), which assume that players know the whole payoff matrix. Additionally, experiments using eye tracking have found that subjects devote a sizable amount of attention to the payoffs of other players (Knoepfle et al., 2009; Polonio and Coricelli, 2019), and it has been documented that subjects engage in higher-level reasoning when other players’ payoffs can be observed (e.g., Kneeland, 2015). Thus, varying the availability of mutual payoff information may result in vastly different action choices among players.
On the other hand, the absence of mutual payoff information may have no impact on individuals’ choices. In providing an interpretation for his seminal equilibrium concept, Nash (1950) makes it explicit that “it is unnecessary to assume that the participants have full knowledge of the total structure of the game, or the ability and inclination to go through any complex reasoning processes.” Similarly, theoretical models of learning explore how equilibria can be reached and selected through processes of learning, adaptation, and/or imitation rather than introspection (Fudenberg and Levine, 2009), and uncoupled learning models (e.g., Hart and Mas-Colell, 2006; Foster and Young, 2006; Young, 2009; Babichenko, 2010) describe how equilibria can be reached in the absence of information about other players’ incentives or even their existence. Thus, the degree of payoff information available to subjects may cause no change in play.
We present the first experiment designed to study how mutual payoff information affects play in canonical two-by-two games. Subjects play one-shot stage games repeatedly with randomly re-matched opponents each round.Footnote 1 In our partial-information treatment, subjects observe their own payoffs and the action of their opponent after each round, but never observe the other's payoffs. Comparing this partial-information version to the full-information baseline treatment in which subjects observe the whole payoff matrix (in addition to actions) allows us to detect differences in play that arise due to the presence of mutual payoff information.
We explore play using the Prisoner's Dilemma (PD) and the Stag Hunt (SH). Mutual payoff information can reveal opportunities to coordinate on socially optimal outcomes; however, being aware of an opportunity to cooperate can increase the tension of a game if the cooperative outcomes are associated with actions that are dominated for at least one player. The appeal of contrasting the PD with the SH in our experiment is grounded in our conjecture that mutual payoff information affects behavior differently in these two games. The SH exhibits a tension between a mutually desirable outcome and avoiding personal risk. Knowledge of the other's payoffs arguably reduces the tension of the game by revealing a mutually beneficial outcome. The PD, on the other hand, exhibits a tension between a socially optimal outcome and personal gain. There is little reason not to choose the dominant action in the absence of mutual payoff information; however, introducing this information arguably increases the tension by making players aware that a socially optimal outcome can be reached at personal expense.
To our knowledge, this is the first experiment that applies these information treatments and this matching protocol to the PD and the SH. In Feltovich and Oda (2014), subjects play partial-information versions of the SH and PD as well as four other games, but no full-information treatments are run for comparison. The latter are essential for studying the effect of mutual payoff information. A detailed review of related experimental studies can be found in Appendix A.
We present three novel insights. First, the fraction of subjects who initially cooperate in the PD or who coordinate on the payoff-dominant equilibrium in the SH is substantially higher under full-information than under partial-information.Footnote 2 Second, to our knowledge we present the first evidence that mutual payoff information can affect equilibrium selection in the SH throughout all rounds: The vast majority of subjects choose the action consistent with the payoff-dominant equilibrium of the SH in the full-information treatment, while choosing the risk-dominant action under partial-information.Footnote 3 Third, we find that play in the PD converges toward the unique NE of the game under both information treatments. Even in the absence of mutual payoff information, most subjects eventually choose actions that correspond to Nash equilibria in both games. Taken together, the effect of mutual payoff information on play is strong in both games.
To investigate whether the information treatment effect operates through initial play, learning, or both, we estimate a special case of an experience-weighted attraction (EWA) model (Camerer and Ho, 1999). We find significant differences not only in the estimates of the initial attractions for each action, but also in the parameters pertaining to the ongoing learning process. Simulations based on these estimates suggest that the treatment effects in both games are driven not only by how subjects perceive the game initially but also by ongoing learning.
2 Experimental design
Overview. Subjects played one-shot stage games of the SH and PD repeatedly in randomly re-matched pairs with two information treatments per game. In the “Full” information treatment, subjects were shown the complete payoff matrix, while in the “Partial” information treatment they were shown only their own payoffs. Players made simultaneous choices and were notified of their opponent's action and their own resulting payoff at the end of each round.
Games, information treatments, and matching protocol. Fig. 1 shows payoff matrices and the available payoff information for each game and treatment. The SH has two pure-strategy equilibria; one is payoff dominant (X, X) and one is risk dominant (Y, Y). The PD has one strictly dominant action, Y, and a unique equilibrium, (Y, Y). Treatments have the same payoffs (though they are partially hidden in Partial), thus keeping equilibria and best-response correspondences constant.Footnote 4 Appendix Figures D1 and D2 show screenshots of the interface.
Subjects were randomly and anonymously re-matched with other subjects each round.Footnote 5 The information treatment was common knowledge and the same for all subjects within a session. That is, in the Full treatment, it was common knowledge that subjects were being re-matched with other subjects who could also observe the whole payoff matrix. Similarly, in the Partial treatment, it was common knowledge that subjects were being re-matched with other subjects who could only observe their own payoffs.
We employed a two-population matching mechanism to ensure that subjects in the Partial treatment could not infer the full symmetric payoff structure. At the beginning of a treatment, subjects were randomly assigned to one of two groups (labeled A and B) and all subjects within a group had the same payoff structure. Throughout the treatment, subjects were exclusively matched with subjects of the opposite group. This procedure was announced at the start of each session, so while subjects could not infer their opponents’ payoffs by observing their own, they were aware that their opponents would always have the same payoff structure. This two-population matching mechanism was used in both information treatments for consistency.
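The two-population matching described above can be sketched as follows. This is a minimal illustration, not the z-Tree code used in the sessions: the function names, the fixed seed, and the 16-subject session size are assumptions chosen for the example.

```python
import random


def assign_groups(subject_ids, rng):
    """Randomly split subjects into two equal-sized groups, A and B."""
    ids = list(subject_ids)
    if len(ids) % 2 != 0:
        raise ValueError("an even number of subjects is needed")
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


def match_round(group_a, group_b, rng):
    """Pair every A subject with a randomly drawn B subject for one round."""
    partners = list(group_b)
    rng.shuffle(partners)
    return list(zip(group_a, partners))


# Hypothetical session: 16 subjects, 40 rounds, seed fixed for reproducibility.
rng = random.Random(2018)
group_a, group_b = assign_groups(range(16), rng)
schedule = [match_round(group_a, group_b, rng) for _ in range(40)]
```

Because groups are fixed for the whole block and pairs are redrawn every round, each subject only ever meets opponents from the other group, which is the property the design relies on.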
Each experimental session consisted of two blocks of 40 rounds each, one with a Full treatment and the other with a Partial treatment, for a total of 80 rounds of play. Having multiple rounds allowed subjects to learn about the game. Table 1 provides an overview of how treatments were allocated across sessions. Subjects never played a Full before a Partial treatment of the same game to avoid inference that the payoffs in the second game were the same as in the first game. Table 2 describes the between- and within-subjects analyses, which we use to test for order effects.
Implementation. Instructions and comprehension questions are provided in the Appendices. Instructions were handed out and read aloud before each 40-round block. Subjects had to correctly answer a comprehension quiz before participating.Footnote 6 We programmed the interface using z-Tree (Fischbacher, 2007), conducted sessions in April and September 2018 at the Experimental and Behavioral Economics Laboratory (EBEL) at UCSB, and recruited 194 subjects through ORSEE (Greiner, 2015). Subjects had a median age of 20, and 16% of them indicated Economics as their major or intended major. Sessions lasted 45–55 min. Subjects received payoffs from a randomly selected round, plus a $7.00 show-up fee. The average total payment was $13.22 (min. $8.00, max. $20.00).
3 Results
We pool both the within- and between-subjects data to investigate the main results. Results in Appendix Tables D1 and D2 are qualitatively similar using alternative samples. Additionally, we rule out large differences in play due to order effects in Appendix Figure D3 and Tables D3 and D4.
Our main interest is in examining the impact of the information treatment on choosing action X, which is associated with the socially optimal outcome in both games. Panels (a) and (b) of Fig. 2 illustrate the average rate of choosing X for each game and information treatment. In the SH, there is a significant difference in play between treatments throughout all rounds. In the PD, there is initially a substantial difference in the rate of choosing action X, which diminishes towards the final rounds. To investigate the treatment effect more formally, we estimate regressions of the following form, separately for the SH and the PD:
$$X_{ist} = \alpha + \beta\,\mathrm{Partial}_{ist} + \mathbf{S}_s'\boldsymbol{\gamma} + \varepsilon_{ist}, \tag{1}$$

where $X_{ist}$ is a binary indicator for subject i choosing action X in round t of session s. The vector $\mathbf{S}_s$ is a set of dummy variables that flexibly controls for session size.Footnote 7 The variable Partial equals one if the action choice is made under the Partial treatment and zero otherwise. Thus, the estimated coefficient $\hat{\beta}$ can be interpreted as the percentage point difference in the probability of choosing X under the Partial treatment compared to the Full treatment. Table 3 presents the results of estimating Eq. 1 using ordinary least squares and standard errors clustered at the subject-session level.Footnote 8
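To make the estimation concrete, the following sketch fits a linear probability model of choosing X on an intercept and a Partial dummy, with standard errors clustered by subject. It is a simplified illustration under stated assumptions: it omits the session-size controls, applies no small-sample correction, and the function name and the eight toy observations are hypothetical rather than taken from the experiment.

```python
from collections import defaultdict


def lpm_cluster_se(y, partial, cluster):
    """OLS of y on an intercept and a 0/1 Partial dummy.

    Returns (beta, se) for the Partial coefficient, where se is the
    cluster-robust (sandwich) standard error without any correction.
    """
    n = len(y)
    n1 = sum(partial)
    n0 = n - n1
    mean1 = sum(yi for yi, p in zip(y, partial) if p) / n1
    mean0 = sum(yi for yi, p in zip(y, partial) if not p) / n0
    alpha, beta = mean0, mean1 - mean0

    # Per-cluster score sums s_g = sum_i x_i * u_i, with x_i = (1, partial_i).
    scores = defaultdict(lambda: [0.0, 0.0])
    for yi, p, g in zip(y, partial, cluster):
        u = yi - alpha - beta * p
        scores[g][0] += u
        scores[g][1] += u * p

    # Second row of the inverse of X'X = [[n, n1], [n1, n1]].
    det = n * n1 - n1 * n1
    inv_row = (-n1 / det, n / det)

    # Sandwich variance of beta: sum over clusters of (inv_row . s_g)^2.
    var_beta = 0.0
    for s0, s1 in scores.values():
        proj = inv_row[0] * s0 + inv_row[1] * s1
        var_beta += proj * proj
    return beta, var_beta ** 0.5


# Toy data: four Full and four Partial choices from four hypothetical subjects.
y = [1, 1, 0, 1, 0, 0, 1, 0]
partial = [0, 0, 0, 0, 1, 1, 1, 1]
cluster = ["a", "a", "b", "b", "c", "c", "d", "d"]
beta, se = lpm_cluster_se(y, partial, cluster)
```

With only a binary regressor, the coefficient equals the difference in the mean rate of choosing X between Partial and Full; clustering changes the standard error, not the point estimate.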
Initial play. Mutual payoff information has a large effect on initial play in both games. In SH-Full, 86.5% choose X in the first round, compared to 32.0% in SH-Partial. For the PD, the corresponding rates are 64.3% and 17.0%. The same qualitative pattern emerges in the early rounds of the games when estimating regression results. Column (2) of Table 3 indicates that in SH-Full, subjects choose action X about 66.9 percentage points (pp) (75.9%) more often than in SH-Partial, and 30.4pp (70.2%) more often in PD-Full than in PD-Partial.
Result 1
In both games, a substantially higher proportion of subjects initially choose X (the action supporting socially optimal outcomes) in the Full than in the Partial treatment.
Equilibrium selection and convergence. Next, we analyze how play evolves across the 40 rounds of a game. In the SH, the initial effect is remarkably persistent: across all rounds, subjects are 67.6pp more likely to choose action X in SH-Full than in SH-Partial, as column (1) in panel (a) of Table 3 indicates.
These results directly impact equilibrium selection and efficiency in the SH. Panels (c) and (d) of Fig. 2 show that subjects tend to reach the risk-dominant equilibrium in SH-Partial and the payoff-dominant equilibrium in SH-Full.Footnote 9 We estimate Eq. 1 using a binary indicator for reaching a pure-strategy Nash equilibrium as the outcome, which for the SH is (X, X) or (Y, Y), and report the results in Table 4.Footnote 10 Column (1) of panel (a) indicates that a pure Nash equilibrium is reached about 80% of the time in SH-Full, and 10pp less often in SH-Partial. Consequently, outcomes are more efficient with mutual payoff information in the SH, as the payoff-dominant equilibrium is more efficient than the risk-dominant one. Appendix Table D7 presents estimates of Eq. 1 when using the efficiency ratio as the outcome.Footnote 11 In SH-Partial, the efficiency ratio is on average 0.40pp lower than in SH-Full (see column (1) of panel (a)). In sum, while equilibria tend to be achieved under both treatments, mutual payoff information crucially affects which equilibrium arises. This novel insight contributes to the literature on equilibrium selection in the SH (see Appendix A).
Result 2
Throughout all rounds of play, most subjects select action X (corresponding to the payoff-dominant equilibrium) in SH-Full and action Y (corresponding to the risk-dominant equilibrium) in SH-Partial.
In the PD, on the other hand, play converges toward the unique Nash equilibrium of the game under both information treatments, as panel (b) of Fig. 2 shows. We define convergence as the first round of play from which, on average across all sessions, at least 80% of subjects consistently choose the defecting action Y for the remaining rounds of the game. This occurs at round 3 in PD-Partial and at round 24 in PD-Full. Panel (b) of Table 3 shows that the treatment effect diminishes greatly across rounds of play. Since more subjects tend to reach the defecting equilibrium in both treatments (see panel (b) of Table 4), the gap in efficiency ratios between treatments also becomes smaller over time (see panel (b) of Appendix Table D7).
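The convergence criterion can be read as a simple scan over the round-by-round share of subjects choosing Y. The sketch below is one way to implement that reading; the function name and the six-round toy series are hypothetical, and the series is not experimental data.

```python
def convergence_round(share_y_by_round, threshold=0.80):
    """Return the first round (1-indexed) from which the share choosing Y
    stays at or above `threshold` through the final round, or None if the
    final round itself is below the threshold."""
    first = None
    # Walk backwards from the last round and stop at the first violation.
    for rnd in range(len(share_y_by_round), 0, -1):
        if share_y_by_round[rnd - 1] >= threshold:
            first = rnd
        else:
            break
    return first


# Toy series: the share dips below 80% in round 3, so convergence is round 4.
toy_shares = [0.50, 0.85, 0.70, 0.80, 0.90, 0.95]
toy_round = convergence_round(toy_shares)
```

Scanning from the end matters: round 2 in the toy series clears the threshold, but it does not count because the share falls below 80% again in round 3.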
Result 3
In the PD, play in both treatments converges toward the unique Nash equilibrium of the game.
4 Initial play versus learning: model and simulations
To understand whether the effect of mutual payoff information operates through initial play, learning, or both, we estimate a weighted fictitious play model of belief learning, a special case of the EWA learning model (Camerer and Ho, 1999). We discuss our model choice in Appendix B.1.
During each round, players choose X or Y based on their attractions (expected payoffs conditional on beliefs), which depend on past observations and a prior attraction. We assume all subjects use the same learning and decision-making mechanism, but may choose different actions due to different observed histories of play. Player i's probability of choosing action $j \in \{X, Y\}$ in the next round is

$$P_i^j(t+1) = \frac{\exp\left(\lambda A_i^j(t)\right)}{\exp\left(\lambda A_i^X(t)\right) + \exp\left(\lambda A_i^Y(t)\right)}, \tag{2}$$

where the sensitivity parameter $\lambda$ ranges from 0 (a uniform random choice) to $\infty$ (always choosing the action with the highest attraction). The attraction of action j at the end of round t, $A_i^j(t)$, is built from $s_i(t)$ and $s_{-i}(t)$, player i's and their opponent's chosen actions in t, together with player i's realized payoff $\pi_i(s_i(t), s_{-i}(t))$ and their hypothetical payoff (had they chosen j in t) $\pi_i(j, s_{-i}(t))$. Beliefs are weighted averages of the observed history of play and initial attractions, that is, subjects’ expected payoffs from either action before the first round, given their beliefs about their opponent's action, conditional on their own actions.Footnote 12 Beliefs are defined over two states: opponent playing X given oneself playing X, and opponent playing X given oneself playing Y.Footnote 13 The weighting decay parameter $\phi$ captures how much weight is put on observations of previous rounds, relative to the most recent round: at 0 only the most recent observation is weighted, and at its upper bound only initial attractions are weighted.Footnote 14
Using maximum-likelihood techniques, we estimate four parameters for each game and information treatment: $\lambda$, $\phi$, $A^X(0)$, and $A^Y(0)$. Table 5 and Appendix Tables D8 and D9 present the results, and demonstrate significant treatment effects on the value of initial attractions $A^X(0)$ and $A^Y(0)$, and on parameter $\lambda$.Footnote 15 As $\lambda$ affects both initial and ongoing play, these treatment effects are consistent with the hypothesis that both initial play and ongoing learning are affected by the presence of mutual payoff information.
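As a consistency check on the estimates, the sketch below computes the first-round probability of choosing X implied by a logit choice rule and compares it with the learning-model probabilities reported in Table 5. It rests on two assumptions that should be read as such: that the choice rule is a two-action logit in attractions, and that the first, third, and fourth numeric rows of Table 5 are the sensitivity parameter and the initial attractions for X and Y, respectively. The dictionary keys are hypothetical names.

```python
import math

# Values copied from Table 5; the row-to-parameter mapping is an assumption.
TABLE5 = {
    "SH-Full": {"lam": 1.5831, "a_x": 8.5765, "a_y": 6.4400, "reported": 0.9672},
    "SH-Partial": {"lam": 0.6516, "a_x": 5.9218, "a_y": 7.0199, "reported": 0.3284},
    "PD-Full": {"lam": 0.4297, "a_x": 9.4480, "a_y": 6.4995, "reported": 0.7802},
    "PD-Partial": {"lam": 0.7426, "a_x": 7.1258, "a_y": 8.5930, "reported": 0.2517},
}


def prob_x(lam, a_x, a_y):
    """Two-action logit: probability of X given attractions and sensitivity."""
    return 1.0 / (1.0 + math.exp(-lam * (a_x - a_y)))


implied = {
    name: prob_x(p["lam"], p["a_x"], p["a_y"]) for name, p in TABLE5.items()
}
gaps = {name: abs(implied[name] - TABLE5[name]["reported"]) for name in TABLE5}
```

Under these assumptions the implied probabilities agree with the reported ones up to the rounding of the published estimates, which supports reading the first-round fit of the learning model as a logit in the initial attractions.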
To examine if the model fits the data well, we conduct simulations based on parameter estimates. Appendix Figure D4 and Appendix Table D10 indicate very similar simulated and observed mean action rates. See Appendix B.2 for details on estimation and simulation techniques.
Initial play is captured by the initial attractions $A^X(0)$ and $A^Y(0)$ and by $\lambda$, while learning is captured by $\lambda$, $\phi$, as well as history-dependent attractions $A^X(t)$ and $A^Y(t)$. To investigate if the treatment effect operates through initial play or learning, we swap parameter values between treatments of the same game. For example, to study if the treatment affects initial play in SH-Full, we simulate behavior using the SH-Full parameters, except we use the initial attraction parameters from SH-Partial. This helps assess the economic significance of each parameter, beyond inferring the direction of changes from the parameter estimates.
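The swap described above can be illustrated for the first round only, where no learning has yet taken place. The sketch keeps the SH-Full sensitivity estimate from Table 5 but substitutes the SH-Partial initial attractions. The logit choice rule, the mapping of Table 5 rows to parameters, and all names are assumptions of this example; later rounds would additionally require the attraction updating rule and the payoffs in Fig. 1, which are not reproduced here.

```python
import math

# Table 5 values for the Stag Hunt; key names are hypothetical.
ESTIMATES = {
    "SH-Full": {"sensitivity": 1.5831, "init_x": 8.5765, "init_y": 6.4400},
    "SH-Partial": {"sensitivity": 0.6516, "init_x": 5.9218, "init_y": 7.0199},
}


def swap_parameters(own, other, keys):
    """Copy `own` and overwrite the listed keys with values from `other`."""
    swapped = dict(own)
    for key in keys:
        swapped[key] = other[key]
    return swapped


def first_round_prob_x(params):
    """Assumed two-action logit evaluated at the initial attractions."""
    gap = params["init_x"] - params["init_y"]
    return 1.0 / (1.0 + math.exp(-params["sensitivity"] * gap))


counterfactual = swap_parameters(
    ESTIMATES["SH-Full"], ESTIMATES["SH-Partial"], ["init_x", "init_y"]
)
baseline_p = first_round_prob_x(ESTIMATES["SH-Full"])
swapped_p = first_round_prob_x(counterfactual)
```

In this first-round illustration, replacing only the initial attractions moves the implied probability of X from well above one half to well below it, in line with the large role the simulations assign to initial attractions in the SH.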
Appendix Figure D5 compares model simulations with estimated parameters (solid lines) and swapped parameters (dotted lines). As shown in panels (a) and (b), swapping initial attraction estimates dramatically affects simulated data in the SH, while differences in the PD disappear by round 15. The fraction playing X changes in opposite directions across information treatments, consistent with the estimates of the initial attractions in Appendix Table D8. Swapping $\lambda$ results in a lower fraction of playing X in both games, as shown in panels (c) and (d) of Appendix Figure D5. In PD-Full, the attraction is initially higher for X but then shifts to Y, explaining why the simulation with swapped estimates initially lies above the original but then falls below it, as shown in panel (f). Swapping $\phi$ has little effect except in SH-Full, as is seen in panel (e); here small changes greatly affect coordination on the payoff-dominant Nash equilibrium. Lower $\phi$ values cause rapid depreciation of weights on earlier round observations, increasing sensitivity to behavior volatility. Taken together, swapping parameter estimates highlights their importance in explaining the treatment effect, except for $\phi$ in the PD.
Result 4
Learning model parameter estimates and simulations suggest that the information treatment effect can be attributed to differences in both initial play and learning in both games.
5 Discussion
We experimentally vary subjects’ access to opponents’ payoffs to investigate the effect of mutual payoff information on strategic play and find multiple statistically and economically significant results. We see a variety of opportunities for further research. Our results highlight the effects for a very limited set of environments, and it is not clear whether similar effects would be seen in different games or with different payoff structures. Note that equilibrium selection in games similar to SH-Full has been found to depend on the payoffs chosen (e.g., Battalio et al., 2001). Additionally, there is room for investigating the underlying causes of our treatment effect.Footnote 16 For example, while common knowledge of the payoff structure allows players to consider opponents’ payoffs when engaging in introspective reasoning, it also enables players to include consideration of opponents’ payoffs as part of their own preferences. Our experiment was not designed to isolate a specific cause for the effects of mutual payoff information but rather to document the overall trends that exist in our games. Future studies could tackle both of these limitations by examining whether such effects persist across other environments and by detailing additional nuances and drivers of the effects of mutual payoff information on players’ decision making.
Fig. 2 Subplots (a) and (b) show the share of subjects playing action X by game and treatment. Faded lines represent the mean rate by round for each session separately. Action X is associated with the socially optimal outcome in both games (payoff-dominant equilibrium in SH and cooperation in PD). Subplots (c) and (d) show the proportion of subject pairs that played an equilibrium for each round in the SH. Not all subject pairs played a Nash equilibrium, so the blue (dotted red) lines sum to less than one. Averaged across all rounds, subject pairs failed to reach equilibria in 19% (25%) of plays in the Full (Partial) treatment.
Table 1

| Sessions | Part 1 | Part 2 | # Subjects per session |
|---|---|---|---|
| 1–3 | SH - Partial | SH - Full | 16, 16, 20 |
| 4–6 | PD - Partial | PD - Full | 16, 16, 18 |
| 7–9 | SH - Full | PD - Partial | 16, 14, 14 |
| 10–12 | PD - Full | SH - Partial | 16, 18, 14 |

Note: In each of the two parts, 40 rounds of a game were played.
Table 2

| Analysis | Game | Data |
|---|---|---|
| Between subjects | SH | First part sessions 1–3, first part sessions 7–9 |
| Between subjects | PD | First part sessions 4–6, first part sessions 10–12 |
| Within subjects | SH | Sessions 1–3 (first and second part) |
| Within subjects | PD | Sessions 4–6 (first and second part) |
Table 3

| | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|
| | Overall | 1–10 | 11–20 | 21–30 | 31–40 |
| (a) Stag Hunt | | | | | |
| Partial-information | -0.676 | -0.669 | -0.683 | -0.677 | -0.673 |
| | (0.034) | (0.030) | (0.041) | (0.047) | (0.040) |
| Cluster p-value | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| Full-information mean | 0.844 | 0.881 | 0.836 | 0.822 | 0.838 |
| Number of clusters | 144 | 144 | 144 | 144 | 144 |
| N | 7840 | 1960 | 1960 | 1960 | 1960 |
| (b) Prisoner's Dilemma | | | | | |
| Partial-information | -0.173 | -0.304 | -0.147 | -0.156 | -0.086 |
| | (0.022) | (0.037) | (0.028) | (0.031) | (0.026) |
| Cluster p-value | 0.000 | 0.000 | 0.000 | 0.000 | 0.001 |
| Full-information mean | 0.232 | 0.433 | 0.205 | 0.176 | 0.113 |
| Number of clusters | 142 | 142 | 142 | 142 | 142 |
| N | 7680 | 1920 | 1920 | 1920 | 1920 |

Note: The sample uses the pooled data. Columns (2) to (5) restrict the sample to the indicated rounds. Action X is associated with the socially optimal outcome in both games. The regressions include controls for session size. Standard errors presented in parentheses are calculated using the cluster-robust method allowing for correlation between observations within a cluster. Clustering is at the session-subject level. Cluster p-value indicates the p-value from a two-sided t test of the null hypothesis that the treatment effect is zero using the cluster-robust standard error.
Table 4

| | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|
| | Overall | 1–10 | 11–20 | 21–30 | 31–40 |
| (a) Stag Hunt | | | | | |
| Partial-information | -0.100 | -0.174 | -0.050 | -0.076 | -0.099 |
| | (0.028) | (0.036) | (0.036) | (0.033) | (0.033) |
| Cluster p-value | 0.001 | 0.000 | 0.167 | 0.023 | 0.004 |
| Full-information mean | 0.811 | 0.808 | 0.798 | 0.831 | 0.808 |
| Number of clusters | 144 | 144 | 144 | 144 | 144 |
| N | 7840 | 1960 | 1960 | 1960 | 1960 |
| (b) Prisoner's Dilemma | | | | | |
| Partial-information | 0.266 | 0.405 | 0.241 | 0.258 | 0.158 |
| | (0.019) | (0.032) | (0.030) | (0.033) | (0.027) |
| Cluster p-value | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| Full-information mean | 0.618 | 0.347 | 0.643 | 0.696 | 0.788 |
| Number of clusters | 142 | 142 | 142 | 142 | 142 |
| N | 7680 | 1920 | 1920 | 1920 | 1920 |

Note: The sample uses the pooled data. Columns (2) to (5) restrict the sample to the indicated rounds. The regressions include controls for session size. Standard errors presented in parentheses are calculated using the cluster-robust method allowing for correlation between observations within a cluster. Clustering is at the session-subject level. Cluster p-value indicates the p-value from a two-sided t test of the null hypothesis that the treatment effect is zero using the cluster-robust standard error.
Table 5

| Parameter | SH-Full | SH-Partial | PD-Full | PD-Partial |
|---|---|---|---|---|
| $\lambda$ | 1.5831 | 0.6516 | 0.4297 | 0.7426 |
| $\phi$ | 0.9649 | 0.8290 | 0.8011 | 0.8769 |
| $A^X(0)$ | 8.5765 | 5.9218 | 9.4480 | 7.1258 |
| $A^Y(0)$ | 6.4400 | 7.0199 | 6.4995 | 8.5930 |
| n | 3840 | 4000 | 3920 | 3760 |
| $P^X(1)$ (learning model) | 0.9672 | 0.3284 | 0.7802 | 0.2517 |
| $P^X(1)$ (binomial model) | 0.8646 | 0.3200 | 0.6429 | 0.1702 |

Note: Results of tests of significance of the information treatment effect on parameter estimates for $A^X(0)$ and $A^Y(0)$ in SH and PD are reported in Appendix Tables D8 and D9, respectively. Differences in values for $\lambda$ are significant for both SH and PD, while differences in values of $\phi$ are not significant for PD, and are only weakly significant for SH. For estimates of $P^X(1)$, the probability of choosing X in the first round, we employ Agresti–Coull binomial confidence intervals (Agresti and Coull, 1998; Brown et al., 2001).
Funding
University of California, Santa Barbara.