
Large-Scale Evidence for the Effectiveness of Partisan GOTV Robo Calls

Published online by Cambridge University Press:  18 August 2022

Daniel T. Kling*
Affiliation:
Department of Business, Belmont Abbey College, Belmont, NC, USA
Thomas Stratmann
Affiliation:
Department of Economics, George Mason University, Fairfax, VA, USA
*Correspondence: Email: dan.kling@gmail.com

Abstract

We document the effectiveness of automated (robo) calls for increasing voter participation, in contrast to most published research, which finds little or no effect from automated calls. We establish this finding in a large field experiment that mimics campaign behavior with a targeted, partisan get-out-the-vote campaign. Across all treatments, automated calls led to three additional votes for every thousand subjects called during the 2014 midterm general election. Additionally, our experimental design allows us to test how the number of calls in a treatment, that is, the dosage, affects voter turnout. Here, results show that a three-call treatment increases the effect to seven additional votes per thousand subjects called, but that too many additional calls reduce the effect to statistical insignificance in a six-call treatment.

Type
Research Article
Copyright
© The Author(s), 2022. Published by Cambridge University Press on behalf of The Experimental Research Section of the American Political Science Association

Introduction

We conducted a large-scale experiment to assess the effectiveness of partisan robo calls.Footnote 1 We find a positive, significant treatment effect on voter turnout. Our experiment has several key features that stand out in the get-out-the-vote (GOTV) literature. First, it varies the intensity of the treatment dosage by using a different number of calls (one, three, or six) for different treatment groups. We found the largest treatment effects for subjects scheduled to receive three calls. Second, a Republican-leaning media firm administered the experiment and targeted likely Republican voters. In contrast, many similar published experiments are nonpartisan, Democratic, or otherwise left-leaning. Third, the experiment targeted potentially marginal voters, specifically people who had voted in two of the previous four elections. These three features may explain why this experiment shows that robo calls have a positive, significant treatment effect on voter participation, while many other robo call experiments do not. In this section, we discuss key results from the robo call GOTV literature and highlight how the features of our experiment relate to previous GOTV studies.

Previous research shows that the impact of robo calls on voter turnout is small and not statistically significant. This is summarized in a meta-study by Green, McGrath, and Aronow (2013). They report a statistically insignificant increase in voter turnout of 0.156 percentage points attributable to prerecorded phone messages. Another meta-analysis reports an “average intent-to-treat effect of 0.113 percentage point, with a 95% confidence interval ranging from −0.336 to 0.563” (Green and Gerber 2015, p. 196), also implying a statistically insignificant effect of robo calls on voter participation.Footnote 2

One study that did find a significant positive treatment effect of robo calls is Gerber et al. (2010), which used a placebo design to increase the precision of its treatment-effect estimate.Footnote 3 Studying the 2008 primary election, Gerber et al. (2010) find an approximately two percentage point increase in voter participation for subjects who were successfully contacted, relative to successfully contacted subjects in a placebo group. The authors attribute the large size of their reported effect to call content, or “social pressure” messaging, which reminded subjects that they had voted in the past two general elections but not in the past primary, and informed subjects that their voting records are publicly available. However, when attempting to replicate their results for the November 2008 general election in a follow-up study, the authors find “a weakly positive effect in the November 2008 general election, in keeping with the usual pattern of weaker turnout effects in high-salience elections” (Green and Gerber 2019, p. 89). Ali and Lin (2013) develop a model of voter turnout based on similar social dynamics. DellaVigna et al. (2017) present evidence from a field experiment that “individuals vote because they expect to be asked” about it and find lying costly. That research suggests that understanding the dynamics of social pressure messaging amidst an evolving campaign landscape is a fruitful research agenda.

Our experiment does not use social pressure messaging but does find a statistically significant positive effect for robo calls placed before the November 2014 general election. Subjects in the combined treatment group were 0.318 percentage points more likely to vote than subjects in the control group – or a rate of 1 mobilized voter for every 314 subjects called. The treatment with three calls was the most effective. Subjects in the three-call treatment were 0.695 percentage points more likely to vote than subjects in the control group – or a rate of 1 mobilized voter for every 144 subjects called.

We posit that robo calls are most effective when they reach voters who are on the margin of whether or not to vote. That is, a GOTV treatment will be less effective if it primarily targets individuals who are habitual voters or individuals who never vote. Most individuals with a high propensity to vote will vote regardless of the treatment. Similarly, individuals with a very low propensity to vote are unlikely to be mobilized by a robo call. However, an individual’s propensity to vote is context-dependent and can change with the salience of a given election to that potential voter (Arceneaux and Nickerson 2009). Moreover, that an individual’s propensity to vote is context-dependent is consistent with an array of rational choice models of voting (Ali and Lin 2013; Coate and Conlin 2004; Degan and Merlo 2011; Dhillon and Peralta 2002; Feddersen and Sandroni 2006). Thus, along several dimensions, such as the choice of states without multiple high-profile statewide races and the subject selection criteria, the experiment was designed to include a higher proportion of subjects who were likely to be marginal voters.

We built a sample of potentially marginal voters by focusing on registered voters who had voted exactly twice across the 2010 and 2012 primary and general elections. Thus, these voters missed two of the four elections. Other GOTV studies use alternative criteria for subject selection. For example, Ramirez (2005, p. 70) selects subjects from a list of “registered Latino voters in low-propensity precincts.” Other studies select subjects expected to be receptive to the treatment message, in a similar spirit to our experimental design. For example, Shaw et al. (2012, p. 236), whose calls include an endorsement by Texas Governor Rick Perry, direct their calls to subjects who are “both likely primary voters and strong Perry supporters.” Gerber et al. (2010) select only subjects who voted in the past two general elections but did not vote in the most recent primary.

As noted above, our experiment targets Republican voters, while many other robo call and other GOTV experiments target Democratic or nonpartisan voters. Examples include Nickerson and Rogers (2010), Barton, Castillo, and Petrie (2016), and Rogers, Green, Ternovski, and Young (2017), which are all targeted to Democrats (with messages of varying degrees of partisanship), while Ramirez (2005), Nickerson (2008), Gerber, Green, Kaplan, and Kern (2010), and Panagopoulos (2011) had nonpartisan targeting. Among studies of robo calls, Shaw, Green, Gimpel, and Gerber (2012) is an exception; it targeted voters in a Republican primary.

Differences between partisan groups or differences in messages could cause particular GOTV methods, such as robo calls, to be more effective for one voter group than another, and this effect could be mediated by voters’ age or other characteristics. We do not have prior expectations that robo calls are more (or less) effective when targeting Republican voters as opposed to Democratic or nonpartisan voters, especially after controlling for individual characteristics. However, we believe it is important to close the empirical gap in studies of robo call effectiveness with experiments that have Republicans in general elections as their main subject pool. This approach will give insights into whether findings from experiments targeting Democratic and nonpartisan voters generalize to Republican voters.

One important feature of this experiment is the use of treatment groups that vary in the number of GOTV robo calls received. With a few exceptions, published GOTV experiments use a single robo call, while it is common for political candidates and advocacy groups to deploy multiple iterations of multiple types of GOTV communication. Ramirez (2005) included a treatment with two robo calls among a wide assortment of GOTV methods. Recently, Zelizer (2020) studied the effect of repeating robo calls, dividing approximately 40,000 registered voters into eight treatments that received zero to seven calls in the week leading up to a primary election. Zelizer found that a treatment with five calls was the most effective, but the differences between treatments were not statistically significant. We are not aware of any other studies that directly measure the effect of robo call dosage.

Similarly, Green and Zelizer (2017) conducted an experiment using multiple treatments with GOTV mail and found that efficacy declined after five mailers. Other experimental studies have raised the issue of GOTV over-saturation (Gerber and Green 2000) or analyzed GOTV synergies via multiple treatments (Green and Gerber 2019, p. 184), but did not find an effect of such multiple treatments. In a manual for political operatives, Lofy (2005, p. 153) recommends two to nine contacts before an election to increase the likelihood that habitual voters cast a ballot on the day of the election.

Procedures

The field experiment occurred in the days leading up to the November 2014 general election. The experiment was funded and conducted by a political consulting firm seeking to evaluate the effectiveness of its GOTV services.Footnote 4 The firm identified registered voters in six states as subjects. The potential subject pool consisted of likely Republican voters who had registered to vote in Georgia, Nebraska, New Mexico, Ohio, Pennsylvania, or Virginia prior to January 1, 2010.Footnote 5 The pool was narrowed to people who had voted in exactly two of the 2010 and 2012 primary and general elections. People who voted early or cast an absentee ballot in the 2010 or 2012 general election were also excluded.
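In effect, the selection rule is a simple filter over a voter file. A minimal sketch in Python, where the DataFrame `voters` and all column names are our own assumptions about how such a file might be organized, not the actual data layout:

```python
import pandas as pd

# voters: pd.DataFrame loaded from a state voter file (hypothetical columns).
elections = ["primary_2010", "general_2010", "primary_2012", "general_2012"]

pool = voters[
    (pd.to_datetime(voters["registration_date"]) < "2010-01-01")
    & (voters[elections].sum(axis=1) == 2)           # voted in exactly two of four
    & ~voters["early_or_absentee_general_2010"]      # exclude early/absentee voters
    & ~voters["early_or_absentee_general_2012"]
]
```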

From this group, 42,000 subjects with unique landline phone numbers were randomly selected in each state. Many of these landline numbers are associated with households that have more than one registered voter. Because we cannot identify who answers a robo call, we included every registered voter in each selected household in the analysis, even though not all of these household members match the original selection criteria in terms of voting history or even party affiliation. The 42,000 households in each state yielded subject pools ranging from 86,714 to 95,557 registered voters. In total, the study includes 539,567 subjects.

Within each state, households were randomly assigned to one of four groups: a control group and three treatment groups. Subjects in the three treatment groups, T1, T3, and T6, received one, three, or six treatment calls over 1, 3, or 6 days, respectively. The robo calls to subjects in T6 started 6 days before the election, the calls to subjects in T3 started 3 days before the election, and all subjects in the three treatment groups received a call on the morning of Election Day. On average, each household has 2.16 registered voters. Detailed information, including summary statistics of subject characteristics in each treatment group, is in Appendix A. The summary statistics show some imbalances across treatments on some control variables, particularly 2010 and 2012 voting rates. To address this issue, we include these covariates in our regression models. Further, we cluster the standard errors by household in all household-level specifications.
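A minimal sketch of such a within-state household randomization, assuming equal group shares and column names of our own choosing (the paper does not publish the firm's assignment mechanics):

```python
import numpy as np
import pandas as pd

def assign_groups(households: pd.DataFrame, seed: int = 2014) -> pd.DataFrame:
    """Assign each household to Control, T1, T3, or T6 within its state.

    One row per household; the 'state' column and the equal group shares
    are illustrative assumptions for this sketch.
    """
    rng = np.random.default_rng(seed)
    groups = np.array(["Control", "T1", "T3", "T6"])
    out = households.copy()
    out["group"] = ""
    for _, idx in out.groupby("state").groups.items():
        labels = np.resize(groups, len(idx))       # near-equal shares per state
        out.loc[idx, "group"] = rng.permutation(labels)
    return out
```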

Treatment call messages were slightly different on each call date. Each message included a reminder about the date of the election and a short partisan message encouraging the subject to vote. Most of the messages invoke negative partisanship rather than making specific promises or describing specific plans.Footnote 6 The duration of the messages was between 35 and 45 seconds. Appendix B lists the script for each message. Table 1 shows the schedule of calls and schedule of scripts, which were identical for every state.

Table 1. Treatment Call Schedule

Note: Households in the treatment groups were called in 2014, on the dates listed in the first row of this table. The general election was on November 4, 2014. Appendix B includes the text for each of the six scripts.

We define “live answerers” as subjects residing in households where at least one call resulted in a live answer, meaning someone in that household answered the phone for any length of time. We classify all other subjects as “treatment non-answerers,” regardless of whether the call reached an answering machine, an operator, or a fax machine, went unanswered, hit a busy signal, or was otherwise uncompleted.Footnote 7 However, calls that reached an answering machine or had a live answer were considered successful treatment for the purposes of the local average treatment effect (LATE) analysis described in Appendix E, because subjects may have listened to and been influenced by the message left on their answering machine. Our outcome variable – whether a subject voted in the election – is based on verified voting records for the November 2014 election.
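The LATE calculation in Appendix E rests, at its core, on the familiar Wald estimator: the intent-to-treat effect scaled by the first-stage difference in contact rates. A stylized sketch, where the column names are assumptions and 'contacted' covers live answers and answering machines, per the definition above:

```python
import pandas as pd

def wald_late(df: pd.DataFrame) -> float:
    """LATE = ITT effect on voting / first-stage effect on contact.

    Columns 'assigned' (0/1 treatment assignment), 'voted' (0/1), and
    'contacted' (0/1 live answer or answering machine) are assumed names.
    """
    treated = df[df["assigned"] == 1]
    control = df[df["assigned"] == 0]
    itt = treated["voted"].mean() - control["voted"].mean()
    first_stage = treated["contacted"].mean() - control["contacted"].mean()
    return itt / first_stage
```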

Results

Table 2 shows that the percentages of subjects with at least one live answer in T1, T3, and T6 were 39%, 60%, and 67%, respectively.Footnote 8 The remaining subjects in each group did not answer any of the treatment calls. Table 3 shows the mean number of calls with live answers for subjects in each treatment group.Footnote 9 This pattern of more live answers in treatments with more calls is consistent across states.

Table 2. Treatment Call Outcomes by Treatment Group

Note: The table shows the percentage of subjects in each treatment group that answered at least one call live, had no live answer, had at least one answering machine answer, and so on. We refer to these “responses” as “call outcomes.” Treatment call outcome categories, which are not all mutually exclusive, are listed in the rows of this table. Percentages are relative to the total number of subjects in each treatment group. “AM” denotes answering machine. The raw number of subjects in each cell is given in parentheses.

Table 3. Mean Number of Live Answers by Treatment Group

Note: The top panel of this table shows the mean number of live answers for all subjects in each treatment group. The bottom panel shows the mean number of live answers only among subjects with at least one live answer. Standard errors for each mean are listed in parentheses.

Table 4 reports the voting rates for each of the treatment groups. These results show that treatment was associated with modestly higher voting rates, particularly the three-call treatment: voting rates among all treated subjects and among subjects in T3 were approximately 0.3 and 0.6 percentage points higher, respectively.

Table 4. Voting Rates by Treatment Group

Table 5 reports intent-to-treat estimates from multivariate regressions comparing voting rates of the three treatment groups to those of the control group while controlling for subject-specific voting history, demographics, and socioeconomic characteristics.Footnote 10 Specifically, our control variables include whether or not a subject voted in the 2010 and/or 2012 general elections and subject age, gender, educational attainment, estimated income, state of residence, and the number of registered voters in the household.Footnote 11 The coefficient on each treatment variable shows the difference in voting rates, measured in percentage points, between the corresponding treatment group and the control group.Footnote 12 Standard errors are clustered by household and listed below each regression coefficient.

Table 5. Intent-to-Treat Effects for All Subjects and Single-Voter Households (SVH)

Note: The table shows OLS estimates for the difference in voting rates between subjects in the treatment group and subjects in the control group. The subset of subjects used to estimate each specification is shown with the “Population” label. Subject controls include whether or not a subject voted in the 2010 or 2012 general elections or both and subject age, gender, education, income, number of subjects in the household, and state of residence. Standard errors are clustered by household and displayed in parentheses.

*** p < 0.01,

** p < 0.05,

* p < 0.1.
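For readers working from the replication data, the Table 5 specification is a standard OLS with household-clustered standard errors. A sketch using statsmodels, in which the file name and every variable name are our guesses rather than the actual Dataverse columns:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Subject-level data with 0/1 treatment indicators (hypothetical names).
df = pd.read_csv("replication_data.csv")   # placeholder file name

model = smf.ols(
    "voted ~ T1 + T3 + T6 + voted_2010 + voted_2012 + age"
    " + C(gender) + C(education) + C(income_bracket) + C(state)"
    " + household_size",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["household_id"]})

# Coefficients on T1, T3, T6 are ITT effects in voting-rate units.
print(model.params[["T1", "T3", "T6"]])
print(model.bse[["T1", "T3", "T6"]])       # household-clustered standard errors
```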

In Table 5, Column 1, the coefficient on the All Treatments variable is 0.00318, which indicates that voter participation for subjects in the three treatment groups pooled is 0.318 percentage points higher than for subjects in the control group. This point estimate is statistically significant at the 5% level. The magnitude of this estimate indicates a yield of three additional voters for every 1,000 subjects who were called, or 314 subjects called per additional voter mobilized.Footnote 13 This effect is larger than the average robo call effect reported in Green and Gerber (2015, p. 196). This result implies a cost of under $9 to move a subject from not participating in the election to casting a ballot.Footnote 14

Table 5, Column 2 reports estimation results by treatment group. The findings indicate that the most effective treatment is T3, the treatment in which subjects receive an automated call on each of 3 days. T3’s treatment effect is 0.695 percentage points, which is statistically significant at the one percent level.Footnote 15 This estimate implies the turnout of one additional voter for every 144 subjects assigned to T3.Footnote 16 The treatment effects for groups T1 and T6 are positive, but smaller and statistically insignificant. The implied cost to induce an additional vote for subjects in T3 alone is under $4.

Our findings show that calling on 3 days was more effective than calling on 1 day or over 6 days.Footnote 17 The Wald statistic testing the equality of the coefficients for T1 and T3 is 9.04, which is significant at the one percent level. The Wald statistic testing the equality of the coefficients for T3 and T6 is 8.21, also significant at the one percent level. Thus, for both tests, we reject the hypothesis of equal dosage effects. There is no statistically significant difference between the coefficients for T1 and T6.
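Given a fitted model like the sketch above, these pairwise dosage comparisons are single-restriction Wald tests on the treatment coefficients; a sketch of how one might run them in statsmodels (variable names follow the hypothetical specification above):

```python
# Pairwise tests of equal dosage effects, using the fitted model above.
print(model.wald_test("T1 = T3"))   # paper reports a statistic of 9.04
print(model.wald_test("T3 = T6"))   # paper reports a statistic of 8.21
print(model.wald_test("T1 = T6"))   # not statistically significant
```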

In households with multiple registered voters, we do not know which subject or subjects answered the treatment call or calls. Because the treatment effect may be diluted in multi-voter households, we also analyze subjects in single-voter households, which account for approximately one tenth of all subjects.Footnote 18 Columns 3 and 4 of Table 5 provide those estimation results. Our data show a larger treatment effect for subjects in single-voter households. In Column 3, the average treatment effect for single-voter households is 0.388 percentage points, but it is not statistically significant at the ten percent level.

Table 5, Column 4 separates the results by treatment group and shows the estimated effect among single-voter households. The treatment effects for T1 and T3 are both positive, although only T3 is statistically significant at the ten percent level. The estimate for T6 is negative, but not statistically significant.

As noted in the Procedures section above, the treatment messages differed from day to day. Within T3 and T6, subjects differed in terms of which days had live answers, and thus which messages were heard. However, we were not able to identify any meaningful differences in turnout attributable to hearing specific message scripts.Footnote 19 Similarly, although previous researchers find that GOTV calls are less effective the earlier they are placed before an election (Green and Gerber 2019, p. 186; Murray and Matland 2014; Nickerson 2007; Panagopoulos 2011), we did not observe a timing effect within the groups of voters who received three or six calls. This suggests that within the time frame of our study, the effectiveness of the treatment did not measurably decay.

The pattern of treatment effectiveness varied across the states in our sample. Figure 1 charts each group’s treatment effect for the six states in the experiment. While there are differences in the balance of subject characteristics across states, we do not observe a correlation between differences in some of the balance tests and the estimated treatment effects across the states. The state-by-state subject characteristic summary statistics are included in Appendix A, as are the regression tables for the results graphed in Figure 1.

Figure 1. Intent-to-treat effects by state and treatment group.

The chart shows the state-by-state OLS estimates for the difference in voting rates between subjects in each treatment group and subjects in the control group in that state. Each circle represents the estimated treatment effect, and the vertical lines indicate the 95% confidence interval for each estimate. The regression used to calculate these treatment effects used subject controls, including whether or not a subject voted in the 2010 or 2012 general elections or both and subject age, gender, education, income, and number of subjects in the household. Standard errors are clustered by household.

In Appendix D, we discuss an alternative method of estimating the causal treatment effect using data from a call placed after the election. In Appendix E, we report the conventional method of estimating the treatment on treated effect by calculating local average treatment effects. In Appendix F, we test whether our results are significant after correcting for multiple hypothesis testing.

Conclusion

This experiment employed multiple-call dosages and mimicked the behavior of real political GOTV campaigns. We show that targeted automated calls have a positive effect on voter turnout even without a social pressure message. Across all treatments, the intent-to-treat effect is larger than most previous measures of robo call effectiveness. Our results suggest that it is not irrational for campaigns to deploy robo calls as a cheap additional tool to increase voter turnout. We find that the treatment increases voter participation by between three and six additional voters for every 1,000 subjects called, corresponding to a cost of $4 to $9 to induce a subject to vote.

We also find that dosage matters. The intent-to-treat effect for dosage T3 is two to three times larger than for dosage T1, while dosage T6 typically had the smallest effect. These results suggest that additional calls can increase effectiveness, but that too many calls may be counterproductive. Our data are not rich enough to determine why the T6 treatment had the smallest treatment effect. One explanation might be that receiving so many calls annoyed the subjects and that they started ignoring the calls closer to the election, even if they were picking up the phone. While T6 reached a higher percentage of subjects than T3 – 67% versus 60% – the increase from three to six calls nearly doubled the number of live answers per live answerer, from 2.1 in T3 to 3.8 in T6. Annoyance with this stark increase in live answers may explain the decreased effectiveness of T6. This topic may warrant further experiments, using placebos, to determine how receiving marginal additional calls affects voting behavior.

Overall, the pattern of dosage results observed in our experiment is consistent with the broader GOTV literature. The 0.12 percentage point effect of T1 is consistent with meta-analyses (Green and Gerber 2015, p. 196; Green, McGrath, and Aronow 2013),Footnote 20 while the larger 0.65 percentage point treatment effect of T3, together with more recent work (such as Zelizer 2020), provides evidence that robo call treatments are more effective with some, but not too much, repetition.

Supplementary material

To view supplementary material for this article, please visit https://doi.org/10.1017/XPS.2022.16

Data availability

The data, code, and any additional materials required to replicate all analyses in this article are available at the Journal of Experimental Political Science Dataverse within the Harvard Dataverse Network, at: https://doi.org/10.7910/DVN/DMJ7EA.

Conflicts of interest

The authors are not aware of any conflicts of interest related to this paper.

Ethics statement

The research presented here was approved by the George Mason University IRB under Project Title: [854034-1] Robo call GOTV data analysis.

To the best of our knowledge, this research adheres to APSA’s Principles and Guidance for Human Subjects Research.

Details of the experimental procedure are described in the manuscript above. Further description pertinent to the APSA standards is in online Appendix G.

Footnotes

This article has earned badges for transparent research practices: Open Data and Open Materials. For details see the Data Availability Statement.

1 The experiment itself was conducted by a political consulting firm while the authors of this paper performed the analysis of the data. In the description of the experiment and its execution, we intend for “we” and “our” to include the members of that firm.

2 A more recent meta-analysis (Green and Gerber 2019, p. 222) finds an average treatment effect of 0.234 percentage points (confidence interval from 0.039 to 0.430 percentage points). However, that newer meta-analysis does not provide the best comparison because it includes results from the experiment described in this paper, as reported in a related article.

3 See also Nickerson (2008) for an early example of a GOTV placebo design.

4 This experiment was not pre-registered. The authors received approval to perform this research from their institution’s Institutional Review Board. Initially, our involvement with the experiment was limited to providing information about design features for clean causal inference, identifying the voters likely to be marginal, and determining the sample size necessary to detect an expected treatment effect. We were subsequently asked to assist with analyzing and interpreting the data. We received permission to use the data for academic research and publication a year after the experiment.

5 In Nebraska, New Mexico, and Pennsylvania, voters register with a party affiliation when they register to vote. Georgia, Ohio, and Virginia do not have party registration. For these three states, the likely party affiliation and eligibility for the subject pool were determined by a third-party data company. We expect that the results from this experiment would generalize to voters from either of the two parties.

6 Negative messages are often more effective for motivating voters (Barton, Castillo, and Petrie 2016).

7 If a call resulted in an operator, no answer, a busy signal, a fax machine, or was otherwise uncompleted on the first attempt, the phone number was called again 30 minutes later. If the second attempt was also unsuccessful, a final attempt was made after another 30 minutes. Thus, each single treatment “call” could include up to three attempts. Once an attempt on a scheduled call day resulted in a live answer or an answering machine, the message was played and no further attempts were made that day. In all cases, the outcome of the first successful or the final unsuccessful attempt was recorded as the outcome of that call.
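To make the protocol concrete, here is a small sketch of the per-day attempt logic it describes; the `dial` callback and the outcome labels are stand-ins of our own, not the vendor's actual dialer interface:

```python
import time
from typing import Callable

def place_scheduled_call(dial: Callable[[], str], wait_minutes: int = 30) -> str:
    """One scheduled treatment call: up to three attempts, 30 minutes apart.

    `dial` performs a single attempt and returns an outcome label such as
    'live', 'answering_machine', 'no_answer', 'busy', 'operator', or 'fax'.
    The message plays on the first live or answering-machine result; the
    recorded outcome is the first successful or final unsuccessful attempt.
    """
    outcome = "no_answer"
    for attempt in range(3):
        outcome = dial()
        if outcome in ("live", "answering_machine"):
            break                                # message delivered today
        if attempt < 2:
            time.sleep(wait_minutes * 60)        # wait before redialing
    return outcome
```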

8 We report detailed results of the treatment call outcomes in Appendix C. That appendix includes call outcomes grouped by state, call number, and treatment group.

9 These tables show that the percent of live answerers for T3 was 21 percentage points higher than for T1, and live answerers in T3 answered one more call on average. However, the percent of live answerers for T6 was only 7 percentage points higher than for T3, while live answerers in T6 answered almost two more calls than live answerers in T3.

10 The corresponding table without subject controls is available in Appendix E. These estimates are similar to the estimates reported in Table 5.

11 We truncate the values for age and the number of subjects in a household to reduce the probability that outliers will lead to spurious results. The low and high age categories are 20 or younger and 90 or older, respectively, while the number of subjects in a household is top-coded at 6 or higher. Data for control variables were not available for a small number of subjects. For these subjects, we imputed values based on the means of the rest of the sample. When a value was missing for an indicator variable, we added a “data unavailable” indicator and coded the missing value as 99.
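A minimal pandas sketch of the truncation and imputation this footnote describes; all column names are assumptions for illustration:

```python
import pandas as pd

def clean_controls(df: pd.DataFrame) -> pd.DataFrame:
    """Truncate outliers and impute missing controls, per footnote 11."""
    out = df.copy()
    # Bottom/top-code age at 20 and 90; top-code household size at 6.
    out["age"] = out["age"].clip(lower=20, upper=90)
    out["household_size"] = out["household_size"].clip(upper=6)
    # Continuous controls: replace missing values with the sample mean.
    for col in ("age", "income"):
        out[col] = out[col].fillna(out[col].mean())
    # Indicator controls: flag missingness and code the missing value as 99.
    for col in ("gender", "education"):
        out[f"{col}_unavailable"] = out[col].isna().astype(int)
        out[col] = out[col].fillna(99)
    return out
```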

12 Although we use OLS to estimate treatment effects, our results are robust to nonlinear estimation techniques such as probit and logit regression models.

13 Given that each landline phone number was associated, on average, with approximately 2.5 household members, this estimate implies that robo call treatments for 400 households, with an average of 3.5 calls per treatment, lead to three additional voters.

14 We estimate the total cost of generating one additional voter through automated calls by multiplying the number of calls that resulted in a live answer or an answering machine by $250 per 10,000 calls. We estimate the number of additional voters by multiplying the treatment coefficient, 0.00318, by the number of subjects in the treatment group, 404,236. We estimate the cost per additional voter by dividing the total cost by the number of additional voters.
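The arithmetic can be made explicit. In the sketch below, the per-call price and treatment-group size come from this footnote, while the count of successful calls is a hypothetical placeholder chosen only to show the mechanics:

```python
cost_per_call = 250 / 10_000          # $250 per 10,000 calls = $0.025 per call
n_treated_subjects = 404_236          # subjects across T1, T3, and T6
itt_effect = 0.00318                  # pooled treatment coefficient

additional_voters = itt_effect * n_treated_subjects   # about 1,285 voters
n_successful_calls = 440_000          # HYPOTHETICAL placeholder count
total_cost = n_successful_calls * cost_per_call       # about $11,000
print(total_cost / additional_voters)                 # cost per additional voter
```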

15 Following the methodology described in List et al. (2019), we evaluate the effect of correcting for multiple hypothesis testing. With this correction, the point estimate for T3 remains statistically significant at the one percent level. See Appendix F for details.

16 Based on the average of 2.5 household members per landline, this corresponds to approximately 62 treated households, or 186 robo calls placed.

17 This finding corroborates a conventional GOTV strategy of making “two or three contacts in the final weekend” (Lofy 2005, p. 153).

18 In an experiment to measure how making specific plans with subjects can increase the effectiveness of GOTV contacts, Nickerson and Rogers (2010) find much larger treatment effects for single-voter households.

19 For example, by restricting the sample to subjects with exactly one live answer we might be able to detect differences across messages or days, but we did not find anything conclusive to suggest such differences.

20 As noted above, the newer meta-analysis with a 0.234 percentage point treatment effect in Green and Gerber (2019, p. 222) includes experiments with multiple calls: Zelizer (2020) and the experiment described here.

References

Ali, S. Nageeb and Lin, Charles. 2013. Why People Vote: Ethical Motives and Social Incentives. American Economic Journal: Microeconomics 5(2): 73–98.
Angrist, Joshua D. and Pischke, Jörn-Steffen. 2014. Mastering ’Metrics: The Path from Cause to Effect. Princeton: Princeton University Press.
Arceneaux, Kevin and Nickerson, David W. 2009. Who Is Mobilized to Vote? A Re-Analysis of 11 Field Experiments. American Journal of Political Science 53(1): 1–16.
Barton, Jared, Castillo, Marco, and Petrie, Ragan. 2016. Negative Campaigning, Fundraising, and Voter Turnout: A Field Experiment. Journal of Economic Behavior & Organization 121: 99–113.
Coate, Stephen and Conlin, Michael. 2004. A Group Rule-Utilitarian Approach to Voter Turnout: Theory and Evidence. The American Economic Review 94(5): 1476–504.
Degan, Arianna and Merlo, Antonio. 2011. A Structural Model of Turnout and Voting in Multiple Elections. Journal of the European Economic Association 9(2): 209–45.
DellaVigna, Stefano, List, John A., Malmendier, Ulrike, and Rao, Gautam. 2017. Voting to Tell Others. The Review of Economic Studies 84(1): 143–81.
Dhillon, Amrita and Peralta, Susana. 2002. Economic Theories of Voter Turnout. The Economic Journal 112(480): F332–52.
Feddersen, Timothy J. and Sandroni, Alvaro. 2006. A Theory of Participation in Elections. The American Economic Review 96(4): 1271–82.
Gerber, Alan S. and Green, Donald P. 2000. The Effects of Canvassing, Telephone Calls, and Direct Mail on Voter Turnout: A Field Experiment. American Political Science Review 94(3): 653–63.
Gerber, Alan S., Green, Donald P., Kaplan, Edward H., and Kern, Holger L. 2010. Baseline, Placebo, and Treatment: Efficient Estimation for Three-Group Experiments. Political Analysis 18(3): 297–315.
Gerber, Alan S., Huber, Gregory A., Doherty, David, Dowling, Conor M., Raso, Connor, and Ha, Shang E. 2011. Personality Traits and Participation in Political Processes. The Journal of Politics 73(3): 692–706.
Green, Donald P. and Gerber, Alan S. 2015. Get Out the Vote: How to Increase Voter Turnout (3rd ed.). Washington, DC: Brookings Institution Press.
Green, Donald P. and Gerber, Alan S. 2019. Get Out the Vote: How to Increase Voter Turnout (4th ed.). Washington, DC: Brookings Institution Press.
Green, Donald P., McGrath, Mary C., and Aronow, Peter M. 2013. Field Experiments and the Study of Voter Turnout. Journal of Elections, Public Opinion & Parties 23(1): 27–48.
Green, Donald P. and Zelizer, Adam. 2017. How Much GOTV Mail is Too Much? Results from a Large-Scale Field Experiment. Journal of Experimental Political Science 4(2): 107–18.
Kling, Daniel T. and Stratmann, Thomas. 2022. Replication Data for: Large-Scale Evidence for the Effectiveness of Partisan GOTV Robo Calls. Harvard Dataverse, V1. https://doi.org/10.7910/DVN/DMJ7EA.
List, John A., Shaikh, Azeem M., and Xu, Yang. 2019. Multiple Hypothesis Testing in Experimental Economics. Experimental Economics 22(4): 773–93.
Lofy, Bill. 2005. Politics the Wellstone Way: How to Elect Progressive Candidates and Win on Issues. Minneapolis, MN: University of Minnesota Press.
Murray, G. R. and Matland, R. E. 2014. Mobilization Effects Using Mail: Social Pressure, Descriptive Norms, and Timing. Political Research Quarterly 67(2): 304–19.
Nickerson, David W. 2007. Quality is Job One: Professional and Volunteer Voter Mobilization Calls. American Journal of Political Science 51(2): 269–82.
Nickerson, David W. 2008. Is Voting Contagious? Evidence from Two Field Experiments. American Political Science Review 102(1): 49–57.
Nickerson, David W. and Rogers, Todd. 2010. Do You Have a Voting Plan? Implementation Intentions, Voter Turnout, and Organic Plan Making. Psychological Science 21(2): 194–9.
Panagopoulos, Costas. 2011. Timing Is Everything? Primacy and Recency Effects in Voter Mobilization Campaigns. Political Behavior 33(1): 79–93.
Ramirez, Ricardo. 2005. Giving Voice to Latino Voters: A Field Experiment on the Effectiveness of a National Nonpartisan Mobilization Effort. The Annals of the American Academy of Political and Social Science 601(1): 66–84.
Rogers, Todd, Green, Donald P., Ternovski, John, and Young, Carolina Ferrerosa. 2017. Social Pressure and Voting: A Field Experiment Conducted in a High-Salience Election. Electoral Studies 46: 87–100.
Romano, Joseph P. and Wolf, Michael. 2010. Balanced Control of Generalized Error Rates. The Annals of Statistics 38(1): 598–633.
Sagarin, Brad J., West, Stephen G., Ratnikov, Alexander, Homan, William K., Ritchie, Timothy D., and Hansen, Edward J. 2014. Treatment Noncompliance in Randomized Experiments: Statistical Approaches and Design Issues. Psychological Methods 19(3): 317–33.
Shaw, Daron R., Green, Donald P., Gimpel, James G., and Gerber, Alan S. 2012. Do Robotic Calls from Credible Sources Influence Voter Turnout or Vote Choice? Evidence from a Randomized Field Experiment. Journal of Political Marketing 11(4): 231–45.
Zelizer, Adam. 2020. How Many Robocalls are too Many? Results from a Large-Scale Field Experiment. Journal of Political Marketing 19(4): 405–13.