The 2024 presidential election has many unique characteristics. Although at first it appeared to be the first rematch for the presidency since Dwight Eisenhower defeated Adlai Stevenson for a second time in 1956, it will instead be a race between a former president and the first woman of color to lead a major party’s presidential ticket. Not since 1940 and Herbert Hoover’s ill-fated attempt to claim the Republican Party’s nomination for president has a former occupant of the White House attempted to reclaim the nation’s highest office. The current matchup between Donald Trump and Kamala Harris results from the sitting president, and winner of his party’s primary process, stepping aside in response to immense pressure from within and outside his own political party. We also saw an attempt on the former president’s life that, to many, already seems like a distant memory. Together, these events create a complex landscape that challenges election forecasters in unique ways. Despite these challenges, elections tend to follow predictable patterns, and we rely on this knowledge to predict the 2024 election. To that end, we constructed a model that is responsive to the everyday fluctuations that exist in a campaign environment while also taking into account states’ electoral history.
HOW OUR MODEL WORKS
Our model has two variables: the previous election results of a given state, as measured by the margin of the winning candidate, and the average of current polls in that state. Consequently, it provides a balance of contemporary, present-day horse-race polling paired with an accounting of the partisanship and vote history of a state in the most recent presidential election. Notably, our model’s coefficients change over time. As the election approaches, we recalculate our coefficients, and the model tends to put more weight on polls and less on the previous election result to account for polling becoming a more accurate predictor as the election cycle goes on. In this section, we discuss how we calibrate our model, provide a brief description of these two variables, and explore how we collect and measure them to construct our forecasting model.
Nuts and Bolts: Calibration
To collect data for the model, we use data sources that documented both general election polling by state and previous election results. To gather previous polling averages, we leverage the data collected by two data sources: Real Clear Politics (RCP) and FiveThirtyEight (538). Simple polling averages from 2004 through 2012 are constructed using polls documented in the RCP databases, and the 2016 and 2020 polling averages are constructed from polling in 538’s (more comprehensive) database. To gather election results, we collect data from the Federal Election Commission (FEC) reports from the years 2000 through 2020.
We create simple averages based on the polls in those databases that were available at the date to which the model is calibrated; in this case, September 1. We selected 19 states in March that we viewed as having the potential to be the most competitive.Footnote 1 We analyze how the polling, and the errors associated with that polling, changed over time across five election cycles and reestimate the model at six dates between April 15 and Election Day (see table 1). This approach emphasizes correctly estimating the outcomes and win probabilities of the states most likely to decide the election and doing so using only data that were available at similar points in the election cycle.
* p < .01.
Our reasoning supporting this approach is that polling averages become more accurate as the election nears. For example, on April 15, the earliest calibration of our current model, the model puts more weight on past results than on polling. But, as we near Election Day, we find that the model will typically continue to put more weight on the polls, providing us with something akin to a weighted average between the polls and the prior results that changes as the election approaches. Thus, the weight of each factor changes as we near Election Day. Our model is estimated using an ordinary least squares (OLS) regression and without an intercept, for reasons we discuss later.
Polling Average
To create a polling average for each state, we only include polls from pollsters that are ranked in the Top 100 as determined by the reputable outlet, FiveThirtyEight, whenever such polls are available. The Top 100 includes about one-third of tabulated pollsters and such polling giants as the New York Times/Siena College, Monmouth’s University Polling Institute, YouGov, Selzer, Marist College, and Quinnipiac. Furthermore, whenever possible, we only use those pollsters with a documented history of polling presidential general or primary elections or other statewide general election races in the past and only if the poll in question does not have a partisan sponsor.Footnote 2 These selection criteria were chosen to avoid the inclusion of low-quality pollsters that, despite much media attention, may skew the averages with questionable data. When no such polls are available, we give preference to higher-ranked (Top 150) polls. We base our averages on the margin between the Democratic and Republican candidates. When possible, we use likely voter samples but do include registered voter samples when the former are unavailable. Additionally, given that there are no third-party candidates currently in the race consistently polling above 1%, we include head-to-head polls whenever possible.
Only recent polls (within a month) are included in the model.Footnote 3 We follow this process to maintain the averages: when new polls are added to each state’s average, old polls are deleted if (1) they are more than one month old and (2) deleting the poll would not leave fewer than three polls in the state’s average. To compute our averages, we weight by sample size and the age of the polls relative to others in the average.Footnote 4
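As an illustration only, the maintenance rule and the weighting described above can be sketched in Python. The exact weighting formula is given in a footnote that is not reproduced here, so the linear sample-size weight, the exponential age decay, the 14-day half-life, the 30-day definition of "one month," and all names (Poll, prune, weighted_average) are assumptions made for this sketch rather than the authors’ specification.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Poll:
    end_date: date
    sample_size: int
    margin: float  # Democratic minus Republican, in points (assumed sign convention)


def prune(polls, today, max_age_days=30, min_polls=3):
    """Drop stale polls, oldest first, but never leave fewer than min_polls."""
    kept = sorted(polls, key=lambda p: p.end_date, reverse=True)  # newest first
    cutoff = today - timedelta(days=max_age_days)
    while len(kept) > min_polls and kept[-1].end_date < cutoff:
        kept.pop()  # remove the oldest remaining poll
    return kept


def weighted_average(polls, today, half_life_days=14.0):
    """Average the margins, weighting by sample size and recency.

    Hypothetical weighting: sample size times an exponential decay in age.
    """
    weights = [
        p.sample_size * 0.5 ** ((today - p.end_date).days / half_life_days)
        for p in polls
    ]
    return sum(w * p.margin for w, p in zip(weights, polls)) / sum(weights)
```

Because the weights are normalized by their sum, only the age of a poll relative to the others in the average matters, which is the property the text describes.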
Previous Election Result
We include in our model the result from the previous presidential election. We view this as an important component of the model for two reasons. First, it represents real, actual data that reflect how voters cast their ballots in the past: in some ways, it is the ultimate horse-race poll. Second, this variable allows us to account for both state-level partisanship and our model’s elastic nature, by which we mean how much states tend to fluctuate between presidential elections. Through this process, the model may capture factors that polls may not, which helps account partially for some potential biases within polling that arise from methodological challenges, such as partisan nonresponse bias. We measure this variable by taking the margin between Joe Biden and Donald Trump from the previous presidential election. For example, Donald Trump garnered 57.02% of the vote in Indiana in the 2020 presidential election compared to Joe Biden’s 41.96%. Thus, the margin is rounded and entered into the model as 15.1 percentage points.
Other Modeling Considerations and Strategies
Our model is notably “simple,” which we argue is a strength: parsimony, after all, is desirable in political science. And given that we are relying on past electoral performance and the best-performing polls in recent cycles, we are confident in the integrity of the model. However, many other variables have been used, with varying degrees of success, to model presidential election outcomes, such as incumbency (Abramowitz 2016), varying measures of economic growth or sentiment (Erikson and Wlezien 2021; Lewis-Beck and Tien 2021; Lockerbie 2021), the number of war deaths (Hibbs 2000), presidential approval (Abramowitz 2016), primary support (Norpoth 2021), perceived competency and leadership capability (Graefe 2021), the candidate’s home state (DeSart 2021), and the number of consecutive terms served by the incumbent party (Abramowitz 1988; DeSart 2021).
We take seriously the possibility that, in our pursuit of an accessible and parsimonious model, we have left out variables that may matter. We therefore put forward a three-pronged justification of our model. First, we are skeptical that including additional variables will further refine the model and not create a problem of overfitting. For instance, overfitting can lead to overconfidence in the output of our models, accentuating some “idiosyncratic features” that may be powerful in explaining some presidential elections but not others (Dowding and Miller 2019).
Second, many models incorporate a state-level approach with a minimum number of variables to great success. For example, Campbell and Wink (1990) built a model that comprised the Gallup Poll trial-heat question and real GNP growth, borrowing that variable from Lewis-Beck and Rice (1984). Some models also leverage economic variables within the state context, using unemployment change, presidential job approval, and local partisan domination (Jerôme and Jerôme-Speziari 2012). Still others incorporate limited variables in the statewide context, such as asking respondents whom they believe will win in the upcoming election (Murr and Lewis-Beck 2020). Consequently, we view ourselves on sure ground for positing a model with a small number of variables at the state level.
Finally, we believe that horse-race polling may account for many of these potentially confounding variables. For example, someone who is dissatisfied with the current economic condition of the United States is likely incorporating that into an evaluation of whom to vote for, which is then reflected in the preference provided to a pollster. Furthermore, the unit of analysis for many of these variables is this election cycle at the state level. Thus, our observations are drawn from just five presidential elections, and we want to ensure that cycle-level factors do not lead us to incorrectly estimate the coefficients by placing undue certainty on our results because of temporal or idiosyncratic factors.
Notably, we are not the first to incorporate polling and past election results in a model forecasting presidential elections at the state level, although our approach differs from DeSart’s 2020 model. DeSart (2021) attempted to forecast the election using a variable aggregating several past election results and polling from a year prior to the election, in addition to the candidate’s home state and the number of terms the incumbent party had been in power. We, instead, use just the previous election results and calibrate our model to changes in polling over time. Rather than predicting the election from a single date, our model changes over time, becoming less error prone.
DATA COLLECTION AND BUILDING THE MODEL
The coefficients of the independent variables change over time because the model is recalibrated as Election Day approaches. Later in the election cycle, we find that the model will generally put more weight on polling and less on previous election results, given that polling becomes more predictive of election outcomes as we near the actual date of the election.Footnote 5 How much weight the model places on each variable at various points in the cycle can be seen in table 1. We expect the final update to come the night before the election. The model can be viewed throughout the election cycle at the link in footnote 6, which we provide so that readers can see how the model continued to evolve after the time of this writing and to increase the transparency of how we reach our forecast.Footnote 6
We estimate outcomes using an OLS regression model. We do so without an intercept because an estimated intercept would likely reflect aggregate cycle-level errors, a concern that we noted earlier. It is unclear whether the intercept provides any new information beyond which party has benefited from cycle-level polling error in the past five elections. Notably, polling “bias tends to shift unpredictably from election to election” (Silver 2018). Therefore, we believe that estimating the model without the intercept is the best choice. We assume, instead, that if polls and previous election results both indicate a tie, the election will come close to being a tie. We then create a point estimate for each observation based on the model and estimate the model’s expected error from its residuals as the square root of the mean of squared errors.
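A minimal sketch of this estimation step follows, assuming each training observation is a state in a past cycle with a polling-average margin, a previous-election margin, and the actual margin. It solves the two-variable normal equations directly instead of calling a statistics library; the function names are hypothetical, and the sketch carries no coefficient values from table 1.

```python
import math


def fit_no_intercept(polls, prev, actual):
    """OLS of the actual margin on poll average and previous margin, no intercept.

    Solves the 2x2 normal equations (X'X) b = X'y in closed form.
    """
    s_pp = sum(p * p for p in polls)
    s_pr = sum(p * r for p, r in zip(polls, prev))
    s_rr = sum(r * r for r in prev)
    s_py = sum(p * y for p, y in zip(polls, actual))
    s_ry = sum(r * y for r, y in zip(prev, actual))
    det = s_pp * s_rr - s_pr ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    b_poll = (s_py * s_rr - s_pr * s_ry) / det
    b_prev = (s_pp * s_ry - s_pr * s_py) / det
    return b_poll, b_prev


def predict(b_poll, b_prev, poll_avg, prev_margin):
    """Point estimate of the margin; a tie in both inputs predicts a tie."""
    return b_poll * poll_avg + b_prev * prev_margin


def expected_error(b_poll, b_prev, polls, prev, actual):
    """Square root of the mean of squared residuals."""
    squared = [
        (y - predict(b_poll, b_prev, p, r)) ** 2
        for p, r, y in zip(polls, prev, actual)
    ]
    return math.sqrt(sum(squared) / len(squared))
```

Omitting the intercept is what forces the prediction to zero when both inputs are zero, which matches the tie assumption stated above.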
To create uncertainty estimates, we compute a z-score for each predicted margin using the expected error and convert it to a probability, assuming a normal distribution. For the overall outcome, we take the percentage chance a given candidate has of winning in the tipping-point state as their percentage chance of winning the race. In doing so, we are assuming uniform errors in the model’s estimates. Although this is not the case in reality, we do know that polling error is correlated between states and that the assumption of uniform error is better than the assumption of independent error (Silver 2016). The model additionally provides an estimated popular vote outcome by taking an average of states’ predicted outcomes, weighted by the total number of votes in the previous election. It also gives us a mean electoral vote outcome based on the sum of the candidate’s expected electoral votes from each state.
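The calculations in this paragraph can be sketched as follows. The sketch assumes margins are expressed from one candidate’s perspective (positive means that candidate leads), that the tipping-point state is the one that delivers the decisive electoral vote when states are ordered from most to least favorable, and that each state awards all of its electoral votes to one candidate. The dictionary keys and function names are hypothetical.

```python
import math


def win_probability(predicted_margin, expected_error):
    """P(margin > 0) under a normal distribution centered on the prediction."""
    z = predicted_margin / expected_error
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tipping_point_state(states, needed=270):
    """Walk states from most to least favorable; return the one reaching `needed`."""
    total = 0
    for state in sorted(states, key=lambda s: s["margin"], reverse=True):
        total += state["ev"]
        if total >= needed:
            return state
    return None  # not enough electoral votes in the list


def popular_vote_margin(states):
    """Average of predicted margins, weighted by votes cast in the previous election."""
    votes = sum(s["votes"] for s in states)
    return sum(s["margin"] * s["votes"] for s in states) / votes


def expected_electoral_votes(states, expected_error):
    """Sum over states of electoral votes times the state win probability."""
    return sum(
        s["ev"] * win_probability(s["margin"], expected_error) for s in states
    )
```

Under the uniform-error assumption, the overall win probability is then win_probability applied to the predicted margin of the state returned by tipping_point_state.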
Our forecast relies on three simple values: (1) we predict an election result based only on information that was available at the same time point in previous election cycles, (2) we predict state election outcomes because they are determinative of the outcome of the election, and (3) we do not make assumptions about the direction of error and avoid modeling choices that may lead us to do so unduly.
THE MODEL
Table 1 shows the model calibrated at six different points from April 15 through Election Day. It indicates that the amount of weight that the model puts on each variable shifts over time. We use the following equation to predict the outcome of the election on September 1:
However, on Election Day, the model puts more weight on polls and less on the previous election margin, although both remain significant at p < .001. The November equation is
Notably, each successive calibration has a higher R², indicating that the model explains a higher portion of the variance as Election Day approaches (as expected). In addition, the expected error of the model’s prediction changes over time. The expected error of the April 15 model is 6.9 points, whereas the Election Day model has an expected error of just 3.4 points. As the election grows closer, the model’s predictions become more precise.
To test how the model would have performed in previous election cycles, we test the Election Day model error against the error of polling alone. We test this using both coefficients from the full sample and the coefficients in which the election being tested is taken out of the sample, thereby providing an out-of-sample test. The error of the model excluding the election being predicted from the sample was within 0.2 points on average, between 2016 and 2020, of the error of the model when using the full sample (table 2). We find that regardless of model specification, our model beat the error of Election Day polling in both 2020 and 2016. Additionally, we find that the errors of the out-of-sample test are fairly similar to the full-sample errors.
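The out-of-sample comparison can be illustrated with a leave-one-cycle-out loop. This is a sketch under stated assumptions: the text does not say which error metric is used, so mean absolute error is assumed here, and the data layout (a mapping from election year to rows of poll average, previous margin, and actual margin) and all function names are hypothetical.

```python
def fit(rows):
    """No-intercept OLS on rows of (poll_avg, prev_margin, actual_margin)."""
    s_pp = sum(p * p for p, _, _ in rows)
    s_pr = sum(p * r for p, r, _ in rows)
    s_rr = sum(r * r for _, r, _ in rows)
    s_py = sum(p * y for p, _, y in rows)
    s_ry = sum(r * y for _, r, y in rows)
    det = s_pp * s_rr - s_pr ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    return (s_py * s_rr - s_pr * s_ry) / det, (s_pp * s_ry - s_pr * s_py) / det


def leave_one_cycle_out(data):
    """Map each year to (model error, polls-only error) on the held-out cycle.

    Coefficients are estimated from every cycle except the one being tested.
    """
    results = {}
    for year, held_out in data.items():
        train = [row for y, rows in data.items() if y != year for row in rows]
        b_poll, b_prev = fit(train)
        n = len(held_out)
        model_err = sum(abs(y - (b_poll * p + b_prev * r)) for p, r, y in held_out) / n
        polls_err = sum(abs(y - p) for p, _, y in held_out) / n
        results[year] = (model_err, polls_err)
    return results
```

Comparing the two numbers in each tuple shows whether the model would have beaten polling alone in a cycle it was not fitted on.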
Table 3 shows our model’s outputs from key states as of September 1, 2024, and figure 1 shows the model’s predicted winner in each state and in Washington, DC. We find that Kamala Harris is currently a narrow favorite and, assuming uniform errors, that she has a 57% chance of winning the presidency. The average outcome is Harris winning the electoral college 289–249. The median outcome is Harris winning 292 electoral votes to Trump’s 246. Our model also currently estimates that Harris will win the popular vote by 3.8 percentage points. The model currently gives Harris a 95% chance of receiving between 162 and 447 electoral votes.
Note: The full predictions for all 50 states on September 1 are available in the Supplemental Appendix.
Figures 2 and 3 show the Democrats’ probability of winning the election over time. Currently, Kamala Harris is favored and has a 57% chance of winning the presidency. On July 15, shortly before Biden dropped out of the race, his odds of winning were just 36%. This indicates that Democrats’ chances of winning the presidency are 21 percentage points higher now than they were on July 15. In a separate analysis available in the appendix, we find that the change in candidates led to a 3.3-point increase in the Democrats’ standing in national polling. Given these data, it appears that Biden’s choice to withdraw from the race greatly improved the Democrats’ chances of holding the White House.
It is important to note that there is significant uncertainty associated with predicting an election at this stage, and even though uncertainty does decrease as the election approaches, forecasters are never fully free from it. Our goal is not to downplay that uncertainty but rather to embrace it and do the best we can with the data available to us. As the election gets closer, we will have a better idea of what will happen.
In this spirit of transparency, it is also important to acknowledge two limitations of this model. First, it does not have a way to recognize discrete campaign events that may have a fleeting effect on the polls. If, for example, the Democratic National Convention had an effect on Harris’s polling that was only temporary, rather than durable, as the effects of conventions tend to be (Erikson and Wlezien 2012), our model may currently be underestimating Trump’s chances of winning. Second, in places where there is less polling available or where the polling is less current, the model’s estimates may be less accurate. Thankfully, states that are more likely to decide the election are polled more often, and the volume of polling tends to increase as the election approaches: therefore, the estimates are likely to become more accurate as Election Day approaches and are likely more accurate in decisive states.
CONCLUSION
Our model is parsimonious, balances two important yet simple criteria, and is accessible to the general public. We have also attempted to incorporate Victor’s (2021) recommendations for election forecasters in our model development. We place a strong emphasis on the precise parameters (in this case, horse-race polling and previous election results) that are predicting the outcome. We have also endeavored to be transparent, providing access to our data and recognizing the uncertainty associated with our predictions by discussing the amount of confidence we have in the results.
Currently, Harris is a narrow favorite to win the election, but Trump remains in a strong position for an upset. Given the stark policy differences between these two candidates, the stakes are tremendously high. A Harris presidency is likely to bring new pressure for abortion rights protections at the federal level, attempts to enshrine protections on voting rights into federal law, continued pressure on Russia to withdraw from Ukraine, and an expansion of economic regulations and social spending. A Trump presidency, in contrast, would likely try to remake the federal bureaucracy in Trump’s image, have a harsher immigration policy, and embolden authoritarian leaders across the world. Our model shows that a Harris presidency is slightly more likely, but the latter remains a strong possibility. Only on November 5, when more than 125 million voters decide the fate of the nation, will we begin to understand what is in store for the future of the United States.
SUPPLEMENTARY MATERIAL
To view supplementary material for this article, please visit http://doi.org/10.1017/S1049096524000878.
ACKNOWLEDGMENTS
We would like to thank Jeffery Harden (University of Notre Dame) for his informal advice during the review process.
DATA AVAILABILITY STATEMENT
Research documentation and data that support the findings of this study are openly available at the PS: Political Science & Politics Harvard Dataverse at https://doi.org/10.7910/DVN/RLEUOF.
CONFLICTS OF INTEREST
The authors declare that there are no ethical issues or conflicts of interest in this research.