In the past 20 years, economists have conducted empirical exercises that rely in part on a published work that reports the population of every country in the world starting in the year 1 CE or even earlier. The existence of such data surprises those familiar with research on population history; we have only a rough idea of the population of most parts of the globe before 1500. For many countries, the statistical lacuna extends closer to the present. Until the advent of modern censuses, which in most countries started during the nineteenth century, reckonings of the total population for even the best-studied cases remain subject to considerable error.
These exercises typically rely on McEvedy and Jones’s Atlas of World Population History (hereafter MJ). Published in 1978, this work reports a population total for the countries of the world at intervals of a century or half-century. MJ did not disguise the rough nature of their data, as the epigraph notes, and we should distinguish what they report from the way others used their work. Several economists point to a U.S. Census Bureau summary that appears to endorse MJ’s estimates. The Bureau simply notes that MJ’s estimates for world population are not too different from the other, earlier results.Footnote 1 As MJ state (pp. 353–4), however, that agreement is largely by construction.
The drawbacks of using such data are numerous. As MJ themselves suggested at the time of writing, their estimates in many cases lacked any firm foundation. Often, the estimates appear to reflect a judgment about the nature of the economy in question, rendering their use as economic proxies partially tautological. The MJ estimates are out-of-date for some countries; researchers have provided better figures in the past 40 years. Economists tend to dismiss measurement error issues by appealing to the implications of “classical” measurement error. MJ’s clearly stated rounding rules mean the measurement error is not classical. Non-classical measurement error creates several opportunities for bias in regression models. Economists have compounded these weaknesses with unwise disaggregation practices.
Many economics articles, including several highly cited contributions in the leading journals, rely on MJ for econometric exercises. This research has appeared in the leading general-interest economics journals, in development and growth-oriented journals, and in the main field journals for economic history. Several of these papers have been cited many times.Footnote 2 The present paper raises serious questions about the results of any econometric exercise that relies on MJ.
If the correct population data were available, we could re-estimate specific models that appear in published papers and assess the consequences of the measurement-error problems discussed here. This is obviously not possible because we lack the correct data. What I do instead is study the way MJ assemble and round their estimates. This permits us to draw on econometric literature to understand the difference between a model estimated using MJ and a model estimated using corrected population data. I then discuss more specifically the way some economists have used this population data. A brief replication exercise using Nunn and Qian (2011) shows that some published results are not robust to careful consideration of the problems in the MJ data.
THE SOURCE
MJ report a series of graphs of total population in a country (or region), with labels at centuries or half-centuries. Figure 1 reports the data for Germany in a format similar to the figures MJ use to present most of their estimates.Footnote 3 For the twentieth century and, in some cases, the nineteenth, MJ reproduce official census counts as discussed by earlier scholars, sometimes adjusted for changes in national boundaries.Footnote 4 Modern censuses did not start anywhere until the late eighteenth century and were not widespread until the nineteenth century.
One would think from reading the economics literature that MJ report precise numbers based on their analysis of earlier works. Graphs such as Figure 1, along with MJ’s descriptions, suggest a different picture. “There are almost no data on which to base a population estimate for Germany until we reach the late Middle Ages” (McEvedy and Jones 1978, p. 70). “Estimates of Poland’s population before the 14th century are based on nothing more than general ideas about likely [population – T.G.] densities” (p. 76). For the Maghreb, “There is really nothing on which to base any calculations before the 19th century” (p. 220). These comments are admirably frank, but MJ do, in fact, report population totals for Germany, Poland, and the Maghreb, and economists have used those observations to test hypotheses we view as important. MJ include a bibliography for each group of population estimates, but they typically do not explain how they used the references they list. For Burma, they note that the quantitative record consists of a single publication based on a count of houses in 1783 as well as colonial censuses that began in 1871. Yet MJ report population sizes for that country as far back as 400 BCE (pp. 190–92). This is not an isolated example. In discussing the western hemisphere, they refer to debates current at the time they wrote, but those debates suggested large ranges of estimates and pertain to the decades just prior to European contact. Yet MJ provide estimates for countries in this region going back many centuries. Most African entries have the same flavor; the only evidence MJ cite refers to the seventeenth century at the earliest, yet they report estimates for two full millennia.
Caldwell and Schindlmayr (2002, p. 200) emphasize the difficulty of producing useful population estimates for most of the world, and even for Europe before 1800. Population figures for the large areas of the globe that once fell under European colonial domination may be the hardest part of the problem. The essays on the Americas collected in Denevan (1992) document debates that continue. Carlos, Feir, and Redish (2022, p. 522), for example, note that in the early twentieth century, estimates of the pre-contact population of North America north of Mexico City ranged from 1.2 to 18 million people. Later efforts narrowed that range to 1.2 to 6.1 million people. The demographic consequence of colonial contact is one measure of imperialism’s impact on indigenous peoples, so these population measures carry considerable interpretative weight. Continuing differences of opinion do not reflect a lack of research.Footnote 5
Systematic discussion of MJ has been limited, but specialists tend not to be impressed. As Austin (2008, p. 1002) puts it, “If you look up McEvedy and Jones expecting a treatise, detailing the original evidence and the reasoning behind the judgements by which it was converted into useable data, you will be disappointed.” In discussing one particular study that relies heavily on MJ, Austin (2008, p. 1002) says that “there is simply no epistemological basis for Nunn’s use of the word ‘data’ – literally, ‘things that are given’ or granted – to refer to the guesses that have been made about the population of future African countries in 1400.”Footnote 6
MJ’s effort reflects a long-standing interest in the world’s population in distant times. MJ draw on earlier efforts, which include Clark (1968) and Durand (1974). (Online Appendix Table A.1 summarizes the leading examples.) Caldwell and Schindlmayr (2002) discuss the intellectual history of these research projects, stressing their skepticism about the apparent consensus in the figures. MJ’s effort differs from those of their predecessors in one important respect: the earlier estimates pertain to large regions or continents. MJ usually report populations for the areas that correspond to modern nation-states.
How did MJ derive population estimates from before, as they say, there was anything on which to base such estimates? Reading their descriptions and examining the figures suggests four overlapping approaches. In some cases, they state explicitly their reliance on one of these approaches, but more often, their method only reveals itself in the estimates. First, they start with the earliest official census and work backward. What Clark (1968, p. 61) calls “jobbing back” can yield good population estimates given the right raw materials and technique. The population of a country in 1500 equals its 1600 population plus deaths and net emigrants, minus births, in the period 1500–1600. Wrigley and Schofield (1981, chapter 7) offer an example of this approach. They start with the reliable census of England and Wales for 1841 and work back in time using estimates of births and deaths, along with more speculative estimates of net migration, to produce annual populations back to 1541. The challenge for earlier periods is that we rarely have anything like good counts of births and deaths, much less migrants, and the effort demands attention to complex sources. Creating the vital events series was the heart of Wrigley and Schofield’s project.Footnote 7
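The accounting behind "jobbing back" can be sketched in a few lines. This is a minimal illustration, not MJ's or Wrigley and Schofield's procedure: the function names and all of the figures are hypothetical, and the only thing assumed is the standard identity that an end-of-period population equals the starting population plus births, minus deaths, minus net emigration.

```python
def job_back(end_population, births, deaths, net_emigration):
    """Start-of-period population implied by an end-of-period count.

    Forward identity: end = start + births - deaths - net_emigration.
    Solving for start: start = end - births + deaths + net_emigration.
    """
    return end_population - births + deaths + net_emigration


def job_forward(start_population, births, deaths, net_emigration):
    """Forward accounting identity, used here only as a consistency check."""
    return start_population + births - deaths - net_emigration


# Hypothetical figures (in thousands) for one century; not taken from any source.
pop_1600 = 4_000
births, deaths, net_emigration = 12_000, 11_200, 100

pop_1500 = job_back(pop_1600, births, deaths, net_emigration)
print(pop_1500)  # 3300
```

The result is only as good as the vital-event totals fed into it, which is the point made in the text: without credible counts of births, deaths, and migrants, the identity has nothing to work with.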
Austin (2008, p. 1002) stresses that momentous historical events such as the rise of the Atlantic slave trade greatly complicate such efforts. Few areas of the globe have been entirely spared these destabilizing episodes. Sometimes we even lack the equivalent of a reliable end-period enumeration, such as the 1841 census for England and Wales. Recent efforts to improve historical African population counts provide better-reasoned figures than MJ’s for that continent, but run into a source problem. The twentieth-century colonial censuses that form their end-period figure are themselves not terribly reliable. In addition, for Africa, we lack the sources that would allow us to estimate the earlier population increases needed for useful “jobbing back.”Footnote 8
A hint comes from the suspiciously round progression of population figures for single countries.Footnote 9 Table 1 shows the overall patterns; in many cases, MJ apparently devised a population estimate after deciding on a round figure for percentage growth. The many commonalities across countries are implausible. Individual country histories drive home the problem. In MJ’s reckoning, England’s population grew by 750,000 between 1600 and 1650, and by another 750,000 in the next half-century (McEvedy and Jones 1978, p. 43). Austria added 250,000 people every 50 years between 1650 and 1800 (pp. 88–92). Thailand added 250,000 people in both the sixteenth and seventeenth centuries (p. 193). Burma’s population growth during the same period was 500,000 per century.
Notes: “Round” means the inter-period percentage change is evenly divisible by 5. I computed the percentage changes and then rounded them to the nearest integer value (e.g., 33.2 becomes 33 percent). This procedure has no effect until the last period.
The table shows that, for example, of the 79 countries for which MJ reported data in 1500 and 1600, 30 percent have an implied percentage change in population that is a round figure. The modal change for that period was 25 percent; MJ think 16 percent of countries had that modal growth rate. The second most common figure is 0 percent and the third is 50 percent.
Source: Computed from the MJ database.
Second, MJ apparently wanted their estimates to reflect their view that until the late medieval period, population grew at a constant rate. In disagreeing with an earlier author on the right total world population for the year 1000, MJ note that “our figure for AD 1, being 100m below the agreed figure for AD 1000, fits better on the sort of exponentially rising curve that everyone agrees best describes mankind’s population growth” (p. 354). As the quotation implies, MJ also worried about consistency between their estimates and earlier ones. Caldwell and Schindlmayr (2002, p. 199) call this “an example of a dangerous circularity,” while Biraben dismisses the MJ data after noting this fact.Footnote 10
Third, in the face of ignorance, MJ felt comfortable assigning identical growth rates to places they thought were similar. This practice doubtless underlies much of what we see in Table 1. For 35 percent of countries, MJ assign the same figure to population growth between the years 1 and 1000.
Finally, especially before 1500, MJ tended to reason from the nature of an economy and the population they thought it could support. They are rarely explicit about this tactic, but it shows through remarks such as “likely population densities” in the passage about Poland quoted earlier. To the extent that they estimate population in this way, MJ’s figures reflect not the population of a particular country at a point in time, but their views about the population density consistent with the kind of economy MJ thought the country had. Since they do not claim any serious knowledge of the economy or of the number of people it can support, the basis for this reasoning is unclear.Footnote 11
Maddison
Several of the articles discussed later rely in part on estimates reported by the late Angus Maddison. Maddison famously constructed, updated, and used a database that offered estimates of population and GDP per capita for most of the world’s countries, again, in some versions, going as far back as the year 1 CE.Footnote 12 For the last major revision of his estimates, Maddison says of his population data: “The following detailed estimates for 1500 onwards rely heavily on monographic country studies for the major countries. To fill holes in my dataset I draw on McEvedy and Jones (1978). For the preceding millennium and a half, I use their work extensively” (Maddison 2001, p. 230). Maddison adds that he relies on MJ rather than earlier accounts because MJ are “the most detailed and best documented” (p. 230).
Thus, for many places before 1500, Maddison’s database just reproduces MJ’s figures. This is not always the case, however; Maddison was able to incorporate the fruits of research published between 1978 and his own publication. This led to some substantive revisions, but those revisions reflect the research literature’s emphasis. He updated 23 percent of MJ’s observations for the year 1000, for example, and 40 percent of the observations for 1500. The majority of Maddison’s changes for the year 1000 were in non-European countries (eight of nine countries that changed were outside Europe). For 1500, this pattern changes; 10 of 16 changes are for European countries, and in 1700, 8 of 12 are for Europe. These changes reflect contributions from the research literature.
Some individual changes are much larger, however. Maddison added 50 percent to Mexico’s population for the year 1000, and he doubled Peru’s population in that same year (Maddison 2001, table B-5, p. 235). He increased the population of the territory that would become the United States by 125 percent for the year 1500. For later periods, especially in the twentieth century, Maddison revises the MJ estimates more comprehensively. In 1850, 84 percent of Maddison’s 51 observations have values different from MJ’s, although the average absolute difference (3 percent) is smaller than for earlier years.Footnote 13
MEASUREMENT ERROR AND ROUNDING
Relative to “perfect” data for every country in the world, how far wrong will MJ take us? It is worth reviewing some general consequences of measurement error for the kinds of linear models that most researchers use.Footnote 14 Denote the true population of country $i$ in year $t$ as $\dot{P}_{it}$. The MJ estimate is $P_{it}$. The difference between MJ’s estimate and the true population is the measurement error $\varepsilon_{it}$, such that $P_{it} = \dot{P}_{it} + \varepsilon_{it}$. Classical measurement error is the special case where $\varepsilon_{it}$ is additive and uncorrelated with $\dot{P}_{it}$. We have two general implications. First, classical measurement error in the dependent variable alone does not bias estimates. The $\varepsilon_{it}$ are swept into the regression error term, and the only consequence is some efficiency loss. Second, measurement error in any regressor implies bias in all of the estimates.
Consider the following regression:
where $P_{it}$ is the mis-measured variable. While I write Equation (1) for a panel framework, that is not necessary for what follows. Classical measurement error in $P_{it}$ implies that the estimate for $\gamma$ will be smaller in absolute value than it would be if we could use $\dot{P}_{it}$ instead. The estimate is attenuated. The estimate for $\beta$ will also be biased in ways we cannot ordinarily sign. The problem arises from the correlation between the measurement error $\varepsilon_{it}$ and the regression error term $\mu_{it}$, which is why some researchers employ instrumental-variable techniques in using the MJ data as a regressor. The fixed-effects estimator does not necessarily yield unbiased estimates in the presence of even classical measurement error. Fixed effects only “deals with” measurement error if the errors in $P_{it}$ are, for each country $i$, the same for all years. In that case, the measurement error becomes part of the estimated country fixed effects (Deaton 1997, pp. 108–110).Footnote 15
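The attenuation argument can be sketched with a generic linear model. The symbols $y_{it}$ (an outcome) and $X_{it}$ (the other regressors) are assumed names introduced only for illustration; the display is not a reconstruction of the paper's Equation (1). The closed form in the last line is the textbook result for the special case of a single mis-measured regressor with no other covariates.

```latex
\begin{align*}
  \text{model in the true population:}\quad
    & y_{it} = X_{it}'\beta + \gamma \dot{P}_{it} + \mu_{it}, \\
  \text{MJ's figure:}\quad
    & P_{it} = \dot{P}_{it} + \varepsilon_{it}, \\
  \text{model as estimated:}\quad
    & y_{it} = X_{it}'\beta + \gamma P_{it}
      + \underbrace{\left(\mu_{it} - \gamma\,\varepsilon_{it}\right)}_{\text{composite error}}, \\
  \text{single regressor, classical error:}\quad
    & \operatorname{plim}\hat{\gamma}
      = \gamma\,\frac{\sigma^{2}_{\dot{P}}}{\sigma^{2}_{\dot{P}} + \sigma^{2}_{\varepsilon}} .
\end{align*}
```

Because $P_{it}$ contains $\varepsilon_{it}$, the regressor is correlated with the composite error, and the ratio in the last line is below one whenever $\sigma^{2}_{\varepsilon} > 0$.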
Classical measurement error in the dependent variable ordinarily does not bias regression estimates because the measurement error is added to the regression disturbance term. This result requires that the measurement error be additive: $P_{it} = \dot{P}_{it} + \varepsilon_{it}$. One common case of non-additive measurement error appears when the dependent variable is the ratio of two variables and the denominator is measured with error. Consider a common example: an urbanization figure is formed as the number of people living in cities divided by MJ’s population estimate. Rewriting Equation (1),
where $C_{it}$ is the urban population. Using MJ’s population estimate implies that the denominator is the true population plus measurement error, $P_{it} = \dot{P}_{it} + \varepsilon_{it}$. Substituting and re-arranging, we have:
The ratio in the original dependent variable makes the measurement error multiplicative and causes bias in estimates of β. More generally, if measurement error is not classical, then we need to model the error. Hyslop and Imbens (2001) discuss several cases, including one where measurement error in a regressor leads to overestimates of that coefficient instead of the attenuation we expect with classical measurement error.
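The multiplicative form follows from simple algebra on the urbanization ratio. The lines below are a sketch using only the symbols already defined in the text ($C_{it}$, $P_{it}$, $\dot{P}_{it}$, $\varepsilon_{it}$); they are not a reconstruction of the paper's numbered equations.

```latex
\begin{align*}
  \frac{C_{it}}{P_{it}}
    &= \frac{C_{it}}{\dot{P}_{it} + \varepsilon_{it}}
     = \frac{C_{it}}{\dot{P}_{it}} \cdot
       \frac{1}{1 + \varepsilon_{it}/\dot{P}_{it}}, \\
  \ln\frac{C_{it}}{P_{it}}
    &= \ln\frac{C_{it}}{\dot{P}_{it}}
       - \ln\!\left(1 + \frac{\varepsilon_{it}}{\dot{P}_{it}}\right).
\end{align*}
```

The observed ratio is the true urbanization rate scaled by a factor that depends on the error relative to the true population, so the error does not simply add to the regression disturbance.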
What does this mean for econometric studies that used mis-measured population estimates? If we maintain the assumption that the measurement error is classical, we can say two things. When population is the dependent variable, the estimates may be less efficient, but there should be no bias due to measurement error alone. If population is a regressor, on the other hand, then the estimate for population will be attenuated. Additionally, the other estimates in this case will be biased and inconsistent. Thus, using population as a “control” can lead to bias even for variables not thought to suffer from measurement error. If the error is not classical, on the other hand, then we cannot say much without modeling the measurement error.
While economists tend to assume that measurement error is always classical, we know that the assumption fails here. MJ state that they have rounded their estimates in ways that make the measurement error depend on the true value. This rounding applies to every country and every period, but the rounding rule, and hence the measurement error, depends on the population size:
All figures are rounded on the following system: below one million to the nearest .1 million, between one and 10 millions to the nearest .25 million, between 10 and 20 million to the nearest .5 million and between 20 and 100 millions to the nearest million. Above 100 million the rounding is to the nearest 5 million, above a billion… to the nearest 25 million. (McEvedy and Jones 1978, p. 9)
Thus, MJ tell us that they create measurement error that is larger for larger populations. We cannot know precisely the implications of MJ’s rounding rules. We can, however, simulate the “true” populations to get a feel for how much trouble the rounding can cause. I use a Monte Carlo exercise to simulate the rounded-off portion of each population estimate. Adding that rounded portion to MJ’s reported numbers yields a simulated “true” population. We can then ask whether that simulated “true” population is correlated with the error caused by rounding. This exercise can only address the measurement error caused by rounding; the other flaws remain. Table 2 shows the result: the rounding induces a high degree of correlation between the measurement error and the population. This result holds for four different assumed functional forms for the rounding error, including two that are asymmetric in different ways. The correlation stems from MJ’s different rules for different size categories.
Notes: The upper panel reports the results of 1000 Monte Carlo draws that assume the stated distributional form for the population value MJ rounded-off. Each experiment computes the correlation between the simulated “true” value and the simulated error induced by rounding. The p-values are for the null hypothesis that the correlation is zero. Small p-values for the correlations indicate violation of the classical measurement-error assumption. See text and Online Appendix Section 2 for details on computation. The beta distribution assumes parameters 1 and .5. The beta distribution is asymmetric; “1- beta” places the thicker part of the density on the left-hand side instead of the right-hand side. Panel B reports the number of countries affected by MJ’s different rounding rules. No countries in MJ have populations larger than 1 billion in these years, so their last rounding category does not appear in the table.
Sources: MJ data and own calculations.
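A simulation of this kind can be sketched as follows. This is an illustration under stated assumptions, not the computation behind Table 2: the list `reported` is hypothetical, the rounded-off portion is assumed to be a draw on the unit interval rescaled to one rounding step centered on the reported figure, and the helper names (`mj_step`, `mj_round`, `simulate_correlations`) are invented here. Only the step sizes come from MJ's stated rule.

```python
import random


def mj_step(pop_millions):
    """Rounding step (in millions) under MJ's stated rule for this population size."""
    if pop_millions < 1:
        return 0.1
    if pop_millions < 10:
        return 0.25
    if pop_millions < 20:
        return 0.5
    if pop_millions < 100:
        return 1.0
    if pop_millions < 1000:
        return 5.0
    return 25.0


def mj_round(pop_millions):
    """Round a population (in millions) to the nearest multiple of its MJ step."""
    step = mj_step(pop_millions)
    return round(pop_millions / step) * step


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_correlations(reported, draw_fraction, n_draws=1000, seed=0):
    """Correlation between simulated 'true' values and rounding errors, per draw."""
    rng = random.Random(seed)
    correlations = []
    for _ in range(n_draws):
        true_values, errors = [], []
        for value in reported:
            # Assumption: the rounded-off part is a fraction of one step,
            # shifted so that a fraction of 0.5 means no rounding error.
            offset = (draw_fraction(rng) - 0.5) * mj_step(value)
            true_values.append(value + offset)
            errors.append(-offset)  # reported figure minus simulated true value
        correlations.append(pearson(true_values, errors))
    return correlations


# Hypothetical reported populations (millions), one per rounding category or so.
reported = [0.4, 0.9, 1.25, 3.5, 7.0, 12.5, 18.0, 45.0, 80.0, 120.0]

uniform = simulate_correlations(reported, lambda rng: rng.random())
skewed = simulate_correlations(reported, lambda rng: rng.betavariate(1, 0.5))
print(sum(uniform) / len(uniform), sum(skewed) / len(skewed))
```

How the rounded-off portion is centered matters for the sign and size of the correlation, so results from this sketch should not be read as a replication of the reported table.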
Online Appendix Section 2 reports details of this simulation along with two additional assessments of the importance of this rounding. The first uses the populations of the 50 United States for the period 1900–1970. The second uses the populations of countries around the world for the period 1960–2020. In both cases, I apply MJ’s rounding rules and examine the correlation between the true values and the errors created by rounding. Rounding for the U.S. states does not consistently imply correlation, while for the countries of the world, the correlation between the true value and the measurement error is considerable. The countries dataset is the closer analogy to MJ because the countries span the entire range of their rounding rules.
MJ’s rounding procedure creates a distinct problem when a country’s population crosses one of the thresholds implied by their rounding rule. Portugal, they report, had a population of 900,000 in 1400 and 1.25 million in 1500 (p. 103). These figures imply that Portugal’s population increased by 350,000 people, or 39 percent, in those 100 years. Taking the rounding into account, however, implies upper and lower bounds for the population estimate in both 1400 and 1500: between 850,000 and 950,000 in 1400, and between 1.125 and 1.375 million in 1500. The true increase could be as small as 18 percent or as great as 62 percent.
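The bounds on growth implied by rounding can be computed directly. The sketch below assumes that each reported figure is the midpoint of an interval one rounding step wide (rounding to the nearest step, as MJ describe); the function names are hypothetical, and the step sizes are those of MJ's stated rule.

```python
def mj_step(pop_millions):
    """Rounding step (in millions) under MJ's stated rule for this population size."""
    if pop_millions < 1:
        return 0.1
    if pop_millions < 10:
        return 0.25
    if pop_millions < 20:
        return 0.5
    if pop_millions < 100:
        return 1.0
    if pop_millions < 1000:
        return 5.0
    return 25.0


def growth_bounds(reported_start, reported_end):
    """Smallest and largest growth rates consistent with two rounded figures."""
    half_start = mj_step(reported_start) / 2
    half_end = mj_step(reported_end) / 2
    # Smallest growth: lowest possible end value over highest possible start value.
    lowest = (reported_end - half_end) / (reported_start + half_start) - 1
    # Largest growth: highest possible end value over lowest possible start value.
    highest = (reported_end + half_end) / (reported_start - half_start) - 1
    return lowest, highest


# Portugal as reported by MJ: 0.9 million in 1400, 1.25 million in 1500.
low, high = growth_bounds(0.9, 1.25)
print(f"{low:.1%} to {high:.1%}")
```

The interval is wide here because the two years fall under different rounding steps, which is the threshold problem described in the text.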
The non-classical nature of the measurement error in MJ poses a serious problem for any estimates that rely on it. We can evaluate earlier, published work under the assumption of classical measurement error, and that is not a bad place to start. But MJ’s rounding applies to every country and period in their data, which means that none of the standard intuitions based on classical measurement error really apply.
CIRCULARITY
Many economists who use MJ’s figures think of population (or a derivative such as population density) as a proxy for an economic aggregate such as output. Critics such as Caldwell and Schindlmayr (2002) and Austin (2008) note that MJ often use ideas about the economy to derive an estimate of population size, thus making the population estimates a poor proxy for an economic aggregate. This is especially true in places and times for which the population data are thin. As noted, MJ defend an estimate for medieval Poland by referring to “likely population densities.” In a more explicit example, MJ discuss agricultural conditions in a region that comprises the modern states of Colombia, Venezuela, and the Guyanas to defend their assumption that until 1500, Colombia always accounted for 2/3 of the region’s population (McEvedy and Jones 1978, p. 302).
Austin stresses that this approach makes their estimates hostage to ideas about an economy and economic change. It is a particular problem for Africa because we know relatively little about that continent’s economic history. Maddison (2001, p. 238), for example, adopts MJ’s estimates for Africa in preference to earlier alternatives because MJ “assumed a more dynamic growth process.” That is, Maddison preferred MJ’s population estimates because he agreed with their assessment of the African economy. Neither Maddison nor MJ offer independent evidence about the African economy. To the extent MJ assigned population estimates based on their perceptions of economic performance, a regression using population as a proxy for growth tells us more about MJ than about economic growth.
SOFT CLONES
Researchers who use MJ’s data treat them as if they imply independent observations; put differently, if there are N countries listed for a given year, this reflects N pieces of information. This is not always true, for two distinct reasons that I will call “soft” and “hard” clones. MJ themselves create the soft clones. Frankly admitting that they lack meaningful data, they assign to some countries the population dynamics of countries they think are similar. Sometimes they make this approach explicit. After concluding that Afghanistan has no useful population data before the twentieth century, MJ say that “Perhaps the best approach is to compare Afghanistan with Iran” (McEvedy and Jones 1978, p. 156). What they did, in fact, was to assume that Afghanistan had half the population of Iran in every year before 1900. The measurement error for Afghanistan thus has two sources. The Afghan numbers share any measurement error in the figures for Iran, and they also suffer from the error implied by any deviation of Afghanistan’s true population dynamics from Iran’s.
MJ include many soft clones. Kenya and Uganda, for example, had identical populations through 1800, although the text does not say why. In some cases, they appeal to the idea that neighboring countries should have similar population growth rates: “… the fact that population doubled in most European countries between A.D. 1000 and 1300 can be taken as strong evidence for it doing so in other European countries for which direct evidence is lacking” (p. 11). Thus, in their reckoning, Poland, Hungary, and Czechoslovakia each grew 20 percent between 1000 and 1100. In the fifteenth century, European Russia and China each grew by one-third. As late as 1600–1700, Romania and Austria each grew by 11.11 percent. Soft clones probably underlie the patterns we see in Table 1.
HARD CLONES
A final problem reflects both MJ’s estimates and the way some economists have used them. MJ report many populations for regions rather than modern countries. Some economists create country-level populations out of the regions by allocating the regional population among the constituent modern nation-states. I will call the resulting countries “hard clones.” In the cross-section, these clones differ in size within the region. By construction, however, in the time series, all members of a clone group share the population growth rate MJ assigned to the region. Hard clones account for an especially large portion of the African country-level observations, but they appear in other parts of the world, as well. In Nunn and Qian (2011), hard clones account for 76 percent of the observations in Africa, 36 percent in Europe, and 41 percent in Asia. In Ashraf and Galor (2011), the clones are similar for these continents; three-quarters of their Western Hemisphere countries are clones.Footnote 16
The literature includes two different ways to create countries out of regions. Nunn (2008, p. 170) assumes that the relative sizes of the populations within each region are the same as reported for 1950. Nunn and Qian (2011) do not say explicitly how they disaggregated the regions, but for most countries, their population figures are similar to Nunn’s, so the approach is probably similar.Footnote 17 Ashraf and Galor (2011, 2013) disaggregate the regions by assuming that each country within a region has the same population density in each year. In general, the resulting “country” populations created by the two methods differ in the cross-section; Nunn and Qian’s Nigeria in 1500 is not the same size as Ashraf and Galor’s. Yet both cloning methods imply that Nigeria has the same growth rate between any two years. This growth rate is simply the rate implicit in the region from which Nigeria is cloned.
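The two disaggregation rules can be sketched on a made-up region. Everything numeric below is hypothetical, the function names are invented for illustration, and the code is a stylized reading of the two rules as described in the text (fixed benchmark-year shares; equal density within a region, which amounts to allocating by land area), not the authors' actual procedures.

```python
def clone_by_shares(region_totals, benchmark_pops):
    """Allocate regional totals using fixed benchmark-year population shares."""
    total = sum(benchmark_pops.values())
    return {
        country: {year: regional * pop / total
                  for year, regional in region_totals.items()}
        for country, pop in benchmark_pops.items()
    }


def clone_by_density(region_totals, areas):
    """Allocate regional totals assuming one common density, i.e. by area share."""
    total_area = sum(areas.values())
    return {
        country: {year: regional * area / total_area
                  for year, regional in region_totals.items()}
        for country, area in areas.items()
    }


def growth(series, start, end):
    """Proportional change in a country's series between two years."""
    return series[end] / series[start] - 1


# Hypothetical region with two constituent countries (populations in millions).
region_totals = {1500: 8.0, 1600: 10.0}
benchmark_pops = {"A": 30.0, "B": 10.0}   # benchmark-year populations
areas = {"A": 200.0, "B": 600.0}          # land areas, arbitrary units

by_shares = clone_by_shares(region_totals, benchmark_pops)
by_density = clone_by_density(region_totals, areas)

for country in benchmark_pops:
    print(country,
          by_shares[country][1500], by_density[country][1500],
          growth(by_shares[country], 1500, 1600),
          growth(by_density[country], 1500, 1600))
```

In this toy case the two rules give country A very different levels in 1500, yet every clone under either rule grows at the regional rate.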
Hard cloning adds further error to MJ’s guesses; how much is something we cannot say precisely because we do not know the true populations of the clones in those years. We can, however, study the implications of these two methods in contexts where we have the equivalent of valid country-level numbers. For the years 1900–1970, I constructed a panel from the population of the 50 United States as reported in the decennial census. I then aggregate the state populations into four standard regions. The state populations (which we know) are analogous to the unknown country populations that hard cloning attempts to recover. The U.S. regions are like the regions in the MJ book. I apply both the Nunn-Qian and Ashraf-Galor methods to estimating the population of each state in the period 1900–1960, as if all I knew was the population of each state in 1970 (for Nunn-Qian) and the state area and regional population in each year 1900–1970 (Ashraf-Galor). Table 3 summarizes the errors these methods produce. In most years, the Nunn-Qian approach produces smaller errors than Ashraf-Galor, although those errors are still large. Only in 1960 did the median error from the Nunn-Qian approach fall below 5 percent of the actual state population. The Ashraf-Galor approach produces a smaller median error in the early years, but the variance of the errors using this method is large. The two types of error are not highly correlated in the cross-section.Footnote 18
Notes: All figures are the error as a percentage of the actual state population in that year. The error is defined as the actual population minus the population implied by the method in question. There are 50 states in each year (the tables consider territories that became states later as states). “NQ” (Nunn-Qian) assumes that the relative population sizes within a region in 1970 were true in all previous years. For this method, 1970 is accurate by construction. “AG” (Ashraf-Galor) assumes that every state within a region has the same population density each year. The calculations assume four regions: Northeast, Midwest, South, and West. See text and the Online Appendix for details.
Source: Computed from U.S. decennial census data.
What does this exercise tell us about disaggregating MJ’s regions? The Nunn-Qian method assumes that population growth rates within each region are similar over time. The method goes wrong for regions with a state like California, which experienced especially rapid growth in the twentieth century. In 1900, California’s population accounted for 34 percent of the “West” region; in 1970, it was 57 percent. This is probably why the Nunn-Qian approach improves monotonically better over time; the 1970s weights better approximate the population distribution in 1960 than in 1900. It also illustrates the danger of their research, which starts with MJ’s estimates for 1000. Between the years 1000 and 1950, there was plenty of opportunity for the countries within a region to grow at different rates, producing a version of the problem noted for California.Footnote 19 Ashraf and Galor’s approach, on the other hand, requires that the population densities for countries within a region be identical. This approach fares poorly in the U.S. case because of the unequal population densities within some U.S. regions; the “Midwest” region, for example, includes states like Ohio (204 persons per square mile in 1900) as well as states like North Dakota (nine persons per square mile). The assumption is unlikely to hold at any point in time, and the discrepancy between assumption and reality could change over time with the introduction of new crops or other changes that lead to uneven economic development within the region. The U.S. experience may not provide a strict analogy to the regions these two methods attempt to disaggregate, but this exercise highlights how far wrong things can go if strong assumptions do not hold. The U.S. case also highlights the questions that we would need to ask before disaggregating data in this way. Do we really know enough about the sub-regional patterns in the Sahel in 1000, for example, to divide up a regional population?
The hard clones play an especially important role in Africa. MJ report only 12 regions for Africa. Nunn (2008)’s Africa has 52 countries, while Nunn and Qian (2011)’s has 47.Footnote 20 Three-quarters of the African observations are thus clones. Given the difference in methods, we expect Nunn-Qian and Ashraf-Galor to assign different populations to the same country, but the differences can be huge. Online Appendix Figure 1 reports the distribution of the ratio of Ashraf-Galor’s clones to Nunn-Qian’s for the Old World in 1500. This figure illustrates the great range in values for a given place and time that result from cloning the MJ regions. In Africa, this ratio ranges from .192 (Malawi) and .61 (Nigeria), through 1.003 (South Africa), to 2.208 (Congo) and 2.887 (Côte d’Ivoire).Footnote 21
The disaggregation problems account for only one of two different sources of measurement error for hard clones. The first comes from MJ itself; MJ’s regional estimates are themselves noisy and rounded. Cloning assigns that noise to each of the country-level figures and adds additional error because we do not really know what the right allocations within a region should be. This additional disaggregation error is, by definition, negatively correlated for countries within a given MJ region and year. The cloned population estimates cannot be “correct.” The implications for change over time, however, are the same: every clone from a given region must grow at the same rate, the growth rate MJ assigned to the region. Breaking these regions up into observations does not create more information. It just creates clones.
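Both properties of hard clones described above can be checked with a small numerical sketch. The regional totals, the allocation shares, and the "true" country values are all invented, and the regional total is taken as given so that only the allocation error is in view.

```python
# Hypothetical regional totals and fixed allocation shares (all values invented).
region = {1400: 8.0, 1500: 10.0}
weights = {"X": 0.5, "Y": 0.3, "Z": 0.2}
# Invented "true" country populations in 1500 that sum to the regional total.
true_1500 = {"X": 4.0, "Y": 4.5, "Z": 1.5}

# A clone is just a fixed share of the regional total in every year.
clones = {yr: {c: w * tot for c, w in weights.items()} for yr, tot in region.items()}

# Growth factor of each clone between the two years.
growth = {c: clones[1500][c] / clones[1400][c] for c in weights}

# Allocation error for each country, holding the regional total fixed.
alloc_error = {c: true_1500[c] - clones[1500][c] for c in weights}

print(growth)                     # every clone shares the regional growth factor
print(sum(alloc_error.values()))  # the errors offset one another within the region
```

Because the allocation errors must sum to zero within a region and year, an overstatement for one country implies an understatement elsewhere in the same region. This is the sense in which the errors are negatively correlated.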
Econometric Implications of Hard Clones
We cannot re-estimate earlier models using correct population data because that is obviously not available. The best check on the implications of cloning would be to dispense with the disaggregation and re-estimate the models using the units MJ reported. Since that check would require redefinition of all the other variables as well, it lies beyond the scope of this paper. We can, however, show that relying on clones significantly affects the results of published research. The following discussion only considers two cases and focuses on this issue alone. Table 4 considers the baseline results from Nunn and Qian (2011), which studies the old question of whether the potato’s introduction in the Old World caused population growth. The regressions use the population of all countries in the Old World at century intervals between 1000 and 1900, along with the years 1750 and 1850. The dependent variable is always population. Although they report and discuss other specifications, Nunn and Qian focus on models in which the regressor of interest is the interaction between an index of the fraction of a country’s land that is suitable for potato cultivation and a dummy for the years 1750 and later. They regard this interaction as a proxy for the effect of the potato’s actual introduction.Footnote 22 Every specification includes year and country fixed effects. Some models have no additional controls; we focus on models that include the “baseline” controls.
Notes: The dependent variable is population. The regressions have fixed effects for time period and country. The standard errors are clustered by country. The models in (2) and (6) drop all clones on the continent of Africa. All specifications include the “baseline” controls. See Nunn and Qian (2011, table IV). In Columns (1)–(4), the hard clones are computed using the Nunn-Qian approach; in (5)–(7), using the Ashraf-Galor approach. “AME” is the average marginal effect for the potato/post-1750 interaction.
Source: Computed from the replication files for Nunn and Qian (2011).
Table 4, Column (1), reproduces the result from Nunn and Qian (2011, table IV, column (1)). As they stress, the interaction implies that the potato’s introduction increased population sizes in the years 1750 and later. When we drop the African clones (Column (2)), however, neither the point estimate nor the average marginal effect (AME) is significantly different from zero. Dropping all clones (Column (3)) does not produce this effect; the problem appears to be the African clones. On the other hand, if we drop all of Africa (Column (4)), the point estimate and the AME become even smaller. Given that about three-quarters of the African observations are clones, it is difficult to know whether Africa in general does not fit the story or if there is something particular to African clones.
The next three specifications repeat (2)–(4) using Nunn and Qian’s data but use the Ashraf-Galor definition of clones. The results differ somewhat, but the overall message is the same. In this model, it does not matter how we construct the clones; dropping Africa’s clones, all clones, or all of Africa has the same effect as with Nunn and Qian’s definition. Table 4 holds two lessons. First, the Nunn-Qian result depends critically on the inclusion of Africa or its clones. This may not hold for all of Nunn and Qian’s specifications, although Online Appendix Section 4 demonstrates the same problem in their fully flexible approach. Second, and more generally, in this example, it does not much matter how we construct the hard clones. This follows from including country-level fixed effects. Since identification comes from within-country change and since the clones, however constructed, all have the same growth rates as the regions from which they are disaggregated, in this type of model, the error created by cloning does not depend on how the clones are constructed.Footnote 23
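The second lesson can be illustrated with a short sketch. It assumes, for the purpose of the example only, that population enters the model in logs. Under that assumption a clone's series is the log of a fixed share plus the log of the regional total, and the country fixed effect absorbs the share. The regional totals and the two shares are invented.

```python
import math

# Hypothetical regional totals for four periods (invented values).
region_totals = [2.0, 2.5, 4.0, 7.0]


def within_log(share, totals):
    """Log of a clone's series, demeaned over time.

    The demeaned series is what remains after a country fixed effect
    is removed from log population.
    """
    logs = [math.log(share * t) for t in totals]
    mean = sum(logs) / len(logs)
    return [x - mean for x in logs]


a = within_log(0.10, region_totals)  # the country's share under one allocation rule
b = within_log(0.45, region_totals)  # the same country's share under another rule

# Largest difference between the two within-country series.
max_gap = max(abs(x - y) for x, y in zip(a, b))
print(max_gap)
```

The two demeaned series coincide up to floating-point error, so the choice of allocation rule leaves the within-country variation unchanged. In levels rather than logs, the share would rescale that variation instead of dropping out.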
Fixed-effects models cannot cure measurement error in general, as I stressed earlier. Even the countries that are not clones in the Nunn-Qian or Ashraf-Galor datasets have rounding error plus the measurement error inherent in MJ’s guesses. This replication exercise makes a narrow and specific point. I have shown that first, Nunn and Qian’s results depend on including African observations that are really clones, and second, there is no important difference between the additional measurement error created by two different ways of creating hard clones.Footnote 24 Surely those who will rely on MJ in the future should at least dispense with the clones and use as their units of analysis the regions that appear in the population data.
HOW ECONOMISTS USE MJ
To obtain a more specific idea of how economists use these data, I examined every paper that cites MJ that was published in one of the “Top 5” economics journals through 2020.Footnote 25 I set aside many of these papers for the rest of this discussion. This list includes a few articles that cite MJ but do not use the data in econometric exercises. Shiue and Keller (2007, p. 1194), for example, cite MJ and other authorities as implying that their two regions, China and Europe, had similar populations at the end of the eighteenth century. Rogers (1994, p. 467) cites MJ to defend the assumption that long-term population growth rates were nearly zero until relatively recently. This usage seems consistent with the spirit in which MJ offer their estimates. I also set aside papers that only use MJ’s estimates for 1900 and later. By that date, the information MJ report comes almost entirely from reasonable census reports (although they round even these figures). This includes articles such as Acemoğlu, Johnson, and Robinson (2001).Footnote 26
A first question pertains to dates; the MJ data are more suspect in earlier periods. The year 1500 does not form a magical dividing line, but it is the earliest year for which we have anything like reliable estimates for populations of even most European countries, which tend to have the best-founded estimates. Several articles depend in a serious way on MJ’s population estimates from before 1500. Ashraf and Galor (2011) report econometric results that depend critically on population data from the years 1, 1000, and 1500. Population is the variable of interest in Nunn and Qian (2011), which starts with the year 1000. Nunn (2008) uses the 1400 estimates alone.Footnote 27 Several other papers also rely on data from 1500–1800.
A second issue is whether MJ’s population figures form the dependent variable or a regressor. Many articles use population as the dependent variable, where it does least harm under the assumption of classical measurement error. These include Ashraf and Galor (2011, 2013) and Nunn and Qian (2011).Footnote 28 In others, the MJ data scale the dependent variable. As noted previously, this means the measurement error in the dependent variable is not classical, and the estimates are biased in unpredictable ways. Acemoğlu, Johnson, and Robinson (2005)’s urbanization regressions are an example.
Some articles, however, create regressors from MJ’s estimates. This list includes Iyigun (2008) as well as Gennaioli and Voth (2015). Iyigun (2008) studies whether military pressure from the Ottoman Empire helped reduce conflict among European states in the early-modern period. The econometric models rely on annual observations for the period 1450–1700. The dependent variables measure intra-European conflict. The controls include measures of Ottoman military pressure as well as the population of Europe and, in some specifications, that of the Ottoman Empire. Iyigun (2008, p. 1476) describes the population data as a proxy for economic “size and strength.” The coefficient on European population size is imprecisely estimated in most specifications, while the Ottoman population variable is more precisely estimated but switches signs, depending on the dependent variable. The point estimates for both population variables must be attenuated if this is classical measurement error, so we cannot really say whether Europe became more peaceful simply because of economic growth, nor can we assess the implications of Ottoman economic conditions for European conflict. Moreover, the estimates for his main variable of interest, the extent of Ottoman military incursions into Europe, may be biased because of the measurement error in population.
Gennaioli and Voth (2015, table 3) address a related question, and their population figures cause similar trouble. They study the determinants of battle success in early-modern European conflicts. The authors set this up as a horse race between fiscal strength on the one hand and population size on the other. Greater fiscal strength allows a state to pay more mercenaries and support more allies. Population size could matter in early-modern war because larger populations make it easier to field larger armies. In most specifications, the fiscal variable has a positive and significant effect on battlefield success, while the relative populations of the two combatants have almost none. They conclude that “Differences in population size do not have a systematic effect on the chance of battlefield success” (Gennaioli and Voth 2015, p. 1430). This result could reflect nothing more than the measurement error in MJ’s estimates.
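The attenuation argument in these two cases rests on a textbook property of classical measurement error. A small simulation illustrates it for a single regressor. All parameter values are invented, the error is classical by construction (independent of the true values), and the sketch therefore says nothing about the non-classical error that MJ's rounding introduces.

```python
import random

random.seed(0)
n = 20000
beta = 2.0  # true slope in the simulated model


def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# True regressor and outcome, each with unit-variance noise.
x_true = [random.gauss(0, 1) for _ in range(n)]
y_true = [beta * x + random.gauss(0, 1) for x in x_true]

# Classical measurement error with variance 1, added to one variable at a time.
x_noisy = [x + random.gauss(0, 1) for x in x_true]
y_noisy = [y + random.gauss(0, 1) for y in y_true]

# Error in the regressor: slope shrinks toward beta * var(x) / (var(x) + var(u)).
slope_noisy_regressor = ols_slope(x_noisy, y_true)
# Error in the outcome: slope stays near beta, only the precision falls.
slope_noisy_outcome = ols_slope(x_true, y_noisy)

print(round(slope_noisy_regressor, 2), round(slope_noisy_outcome, 2))
```

With equal variances for the true regressor and its error, the slope on the mismeasured regressor is roughly half the true value. A weak or "insignificant" population coefficient is therefore what one would expect even if population mattered a great deal.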
A third issue pertains to how the authors confront the possibility of measurement error in the population data. Acemoğlu, Johnson, and Robinson (2002) and Acemoğlu et al. (2008) explicitly discuss measurement error and use IV methods to contend with measurement error in regressors. Others take a different approach. Ashraf and Galor (2011, p. 2011) claim:
The most comprehensive worldwide cross-country historical estimates of population and income per capita since the year 1 CE have been assembled by Colin McEvedy and Richard Jones (1978) and Angus Maddison (2003), respectively. Indeed, despite inherent problems of measurement associated with historical data, these sources remain unparalleled in providing comparable estimates across countries in the last 2,000 years and have, therefore, widely been regarded as standard sources for such data in the long-run growth literature.
They do not argue that MJ’s data meet any particular standard. Rather, they know of nothing better (it is “unparalleled”) and everyone else uses it (it is the “standard source in the long-run growth literature”).
Nunn and Qian (2011, p. 616) address measurement error more explicitly, but their discussion consists of general statements that are not relevant to the MJ data:
Accuracy is an obvious concern for historical data that span such a long time horizon and broad cross-section. However, classical measurement error in our outcome variables will not bias our regression estimates. Similarly, any systematic measurement error that varies by time-period or by country is captured by the country and year fixed effects, which are included in all specifications.
Population is their dependent variable, so they are correct that if the measurement error is classical, it does not bias their results. They provide no reason to think this is true, and, as noted, MJ say it is not true. The second statement about fixed effects is equally true but irrelevant to the case. Neither of these extreme assumptions is likely. Nor can they be true simultaneously.
CONCLUSIONS
We know the population of the United States in 2020 to a high degree of accuracy. For historical episodes, we might not always do as well, but we can definitely do better than MJ. Palma, Reis, and Zhang (2020), for example, provide improved estimates for Portugal (1527–1850) by combining two sets of historical estimates with some judicious reasoning. Similar approaches may be possible for other times and places. Federico and Tena-Junguito (2022) provide considerable improvement over MJ for the period since 1800 by making better use of published data. Earlier periods may require different approaches. Refining the estimates for Poland in 1400, for example, may not just require consulting more published works, but also original research using, for example, essentially archeological techniques. But it can be done. It would not be useful to assert that because we cannot know the population of Poland in 1400 with the same accuracy as we can in 2020, there is no point in using historical population counts. The opposite extreme is more common and pernicious: many economists take the view that the accuracy of historical data does not matter because it cannot be as precise as modern reports.
We can do, and have done, better. Consider one example. MJ’s estimates imply that the population of England and Wales grew at an average annual rate of .32 percent in the period 1600–1650. Wrigley and Schofield’s figures put that rate at .50 percent. For the period 1650–1700, the estimated growth rates are .26 percent in MJ and –.07 percent in Wrigley and Schofield; for 1700–1750, they are .10 and .26 percent. The differences are substantial. MJ missed the population stagnation of the second half of the seventeenth century and significantly understated the population growth of the first half of the eighteenth century. Population figures directly underlie any statement about per-capita GDP or its growth rate and are thus central to understanding the Industrial Revolution. We can do much better than MJ’s guesses for many countries, especially in the period since 1500 (e.g., Vos 2014, pp. 366–69).
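To see what these differences in annual rates mean over a half-century, one can convert them into cumulative growth factors. The sketch below assumes annual compounding, which is an assumption of this illustration and not necessarily the convention behind the figures quoted above. The function names are hypothetical.

```python
def annual_rate(p_start, p_end, years):
    """Average annual compound growth rate, in percent, between two population totals."""
    return 100.0 * ((p_end / p_start) ** (1.0 / years) - 1.0)


def cumulative_factor(rate_pct, years):
    """Ratio of end to start population implied by a constant annual rate (in percent)."""
    return (1.0 + rate_pct / 100.0) ** years


# England and Wales, 1600-1650, using the two annual rates quoted in the text.
mj_factor = cumulative_factor(0.32, 50)  # roughly 1.17
ws_factor = cumulative_factor(0.50, 50)  # roughly 1.28

print(round(mj_factor, 3), round(ws_factor, 3))
```

Under this assumption, the two sources imply growth over the half-century of roughly 17 percent and 28 percent, respectively. A gap of that size carries over directly into any per-capita calculation that divides by population.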
Econometric estimates that rely on MJ form a particular literature that takes a “cross-country regression” approach to economic growth, political economy, and related questions. Economists differ on the usefulness of the general research strategy, and those who favor such studies may insist that some data are better than none. Even those who take this view, however, should be aware of the pitfalls of the source and the way some use it. As I have noted, Acemoğlu and his co-authors tend to use MJ as carefully as one can. Others have compounded MJ’s weaknesses by trying to create information that is not in the source. Some of the research discussed here appeals to the idea that classical measurement error does not cause bias in linear models when the measurement error affects only the dependent variable. This observation is mathematically true but not relevant to the MJ data.
This paper documents a series of problems in a published source that underpins many articles published in the leading general-interest economics journals. Publication in these outlets has strong professional rewards and conveys signals. One signal is that if everyone does something inappropriate, then it is fine. A second signal discourages the original work necessary to improve the basis of our knowledge. The researchers who did the groundwork on which MJ is based understood themselves as contributing to a broader literature in the social sciences. Their contributions were rewarded within their own niches. The same applies to all of the effort that went into constructing the considerable information on historical economies that Maddison summarizes. To the extent the profession signals a lack of interest in such work, it is unlikely we will ever learn more about, for example, the population of Poland in 1400.
This discussion holds a simpler lesson. Many economists today download a dataset and merge it into other datasets without consulting the original sources. Examining MJ’s book is instructive. The introduction explains the problem of non-classical measurement error. A look at the graphs (such as my Figure 1) would lead most to treat the data with considerable caution. Anyone looking at MJ’s maps for Africa should wonder why their Africa has so many countries in the data.