1. Introduction
Reconstructing the climates of Earth’s deep past is epistemically challenging, yet of the utmost importance in a contemporary scientific context where climate scientists are increasingly turning to paleoclimatology to help understand the Earth’s climate and apply that understanding to our current climate crisis. Footnote 1 Paleoclimatologists rely on “paleoclimate proxies” to measure past climatic variables such as temperature. Paleoclimate proxies include ice cores, tree rings, sediment cores, coral growth rings, fossilized pollen, fossilized leaves, and more. Their use involves measuring some attribute of modern-day traces of the past and reconstructing the most likely environmental conditions under which that trace was formed. For example, the elemental composition of the calcareous shells of marine microorganisms depends on the temperatures at which the relevant chemical reactions occurred. Knowledge of this relationship allows paleoclimatologists to reconstruct past temperatures, within some degree of uncertainty.
In this paper, I provide an analysis of some of the key data and measurement practices scientists deploy in relation to paleoclimate proxies. I argue that the development and use of paleoclimate proxies to reconstruct past climates demonstrate some of the benefits of disunity or lack of standardization in data and measurement practices, especially practices which indicate that paleoclimatologists are not looking to produce one, definitive record of Earth’s past climate but are satisfied with several (possibly conflicting) records. The claim that disunified data and measurement practices are beneficial contrasts with the intuitive view that data are necessarily more useful if they are reusable or interoperable, or suggestions that measurement procedures need to be standardized to be successful. Importantly, I do not claim that paleoclimatologists are intentionally deploying disunified practices as a strategy, nor that they are even aware of the benefits of those practices, nor that the benefits of these practices necessarily outweigh the risks of disunity (which I think remains to be seen in this case). The claim made in this paper is just that the practices paleoclimatologists currently utilize do, in fact, produce some surprising benefits, specifically related to error and uncertainty management.
My argument builds on recent work in the philosophy of the Earth sciences and medical sciences which has demonstrated that similar intuitions concerning the benefits of coherence and coordination, including in measurement and data practices, do not always hold. In particular, Teru Miyake (2011, 2017a, b) has analyzed various scientific debates in the history of the geosciences, especially geophysics and seismology, and has shown how conflicting models or theories about the unobservable processes beneath the Earth’s surface ultimately helped these sciences to progress. Alisa Bokulich (2020) has also argued in the context of measurements of geologic time that disagreement between different radiometric dating methods has been an important source of understanding about sources of error in these different measurements. Miguel Ohnesorge (2021, 2022) has argued that disagreeing measurements of the ellipticity of the Earth by nineteenth-century geodesists can nonetheless be seen as epistemically successful. Likewise, I demonstrate in the context of paleoclimate proxies that there are reasons that count in favor of data and measurement practices scientists implement that do not produce a single, unified record of Earth’s past climates—even if the scientists themselves are unaware of these reasons or are not using these reasons to guide their practice. Finally, in the history and philosophy of medicine, Rebecca Jackson (2021) has argued that in the case of “drop” measurements in anesthesiology, reasons in favor of non-standardized measurement units (e.g., patient experience and safety) outweighed reasons in favor of standardized units (e.g., consistency and ease of communication). Although, unlike Jackson, I will not claim to weigh the reasons in favor of or against standardization—I am dealing with a live, contemporary case rather than a historical one, so I don’t have the benefits of hindsight that Jackson has—I will, like her, call attention to some surprising benefits of disunity and lack of standardization.
To do so, I focus on two paleoclimatological practices through which proxy measurements or data are kept disunified. The first relates to the calibration of paleoclimate proxy measurements (section 2). Here I show that paleoclimatologists tend not to intercalibrate their proxy measurements, nor otherwise statistically combine them, and I discuss several reasons why refraining from doing so might be beneficial, including preserving the independence of these multiple lines of evidence and preventing the production of unquantifiable or unidentifiable sources of error and uncertainty.
Second, I address data infrastructure, including especially how paleoclimate proxy data are stored and norms or requirements surrounding metadata (section 3). Both data storage and metadata practices are remarkably non-standardized, with authority over these practices spread very thinly across multiple individuals and organizations. Rather than see these consequences of the relevant data infrastructure decisions as a failure, I argue that a structure which, at least temporarily, minimizes data travel and reuse actually has the important benefit of preventing inadvertent compounding of sources of error and uncertainty.
In conclusion, I reiterate that disunity in both practices is helpful in mitigating or managing error and uncertainty (section 4). In a non-ideal epistemic situation such as that found in nearly any study of the deep past, dealing with error and uncertainty appropriately is paramount. Furthermore, in a science that potentially has massive implications for action—as, in this historical moment, all of the Earth and environmental sciences do—scientists are prone to be cautious and take seriously considerations of inductive risk (Oreskes 2015). Other scientific areas that are similarly non-ideal and risky may also benefit from managing error and uncertainty by preserving disunity, rather than assuming that standardization, coherence, and statistical integration are necessarily superior epistemic or pragmatic strategies.
2. Disunity in proxy calibration
In this section, I argue that there are benefits to disunified measurement practices in paleoclimatology, namely, the decision not to intercalibrate different proxies with one another. First, I use existing philosophical literature on calibration to explain how paleoclimate proxies are calibrated to the instrumental record. I then suggest some potentially beneficial consequences of not intercalibrating or otherwise statistically combining proxy records, having to do with error and uncertainty management and maintaining multiple, independent lines of evidence.
In order to use paleoclimate proxies to reconstruct past climates, these measurements need to be appropriately calibrated; in other words, a model of the measurement process needs to be developed and refined. In general, calibration of any measurement involves settling on a way to convert a measurement indication into a measurement outcome. A measurement indication is the reading of a measurement instrument after a measurement procedure has been performed, whereas a measurement outcome is a value actually attributed to the relevant measurand. Often, measurement indications and measurement outcomes are not the same; some conversion process may be required to translate from one to the other. Footnote 2 Sometimes, this conversion process is embedded within a measurement instrument itself. In any case, determining how to convert between a measurement indication and a measurement outcome is referred to as calibration. Calibration of paleoclimate proxies involves connecting the attributes of present-day traces of the past with the relevant characteristics of the past climate under which those traces were formed.
In his influential discussion of the “problem of nomic measurement”—how do we know that our measurement tools are adequately representing the quantities being measured when those measurement tools are the only way of accessing the very same quantities?—Hasok Chang (2004) introduces the idea of “metrological extension.” Metrological extension involves applying a measurement technique beyond the context in which it was developed (in his case, to higher or lower temperatures). Metrological extension gives us one way of understanding proxy calibration; in the case of paleoclimatology, the measurements are being extended to further time periods. In other words, paleoclimatologists have a way to ground their measurements of temperatures of the past that was not available to early thermometrists attempting to develop temperature measurements of the present for the first time: proxy-based measures of temperature can be compared to instrumental (thermometer-based) temperature records as a means of initial calibration. For example, attributes of tree rings or organismal remains or layers of ice can be initially correlated with thermometer-based temperature measurements in a laboratory or fieldwork setting. Once proxy-based temperature measurements are adequately calibrated to instrumental readings, the proxy-based measurements can be extended further back in time than the instrumental readings go.
As proxies are used to extend measures of the climate back in time, additional sources of error and uncertainty are introduced and need to be accounted for in order to isolate the climatic signal. For example, changes in abundance of different types of pollen or the chemical composition of microorganisms in sediment layers might be signals of changes in climate, but, over long enough time scales, might instead be signals of something else, such as changes in range or adaptation (both of which can be indicative of climatic change but also of other ecological changes). Extrapolating climate measurements back in time requires an understanding of the ways in which many processes may affect the proxy-based signal.
The practice of calibrating paleoclimate proxies to the instrumental record accords with Eran Tal’s (2017) influential account of calibration. The instrumental measurements (which might come from laboratory settings, the historical record, or in situ measurements) are taken as “fixed,” and serve epistemically as a sort of measurement standard to which the other, proxy-based measurements must be compared. Then, a proxy-based record that temporally coincides with the instrumental record is compared to the instrumental record, in order to develop a correlation between the proxy-based measurement and the instrumental measurement. This correlation is used to develop a calibration function that takes as inputs features of the proxy (e.g., stable isotope ratios in the sediment or ice core layer) and outputs climatic features (e.g., temperature).
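To make this calibration step concrete, here is a minimal sketch (in Python) of fitting a calibration function from a hypothetical period of overlap between a proxy record and instrumental temperatures. The linear form, the numbers, and the variable names are illustrative assumptions, not any published calibration.

```python
# Minimal sketch of calibrating a proxy to the instrumental record.
# All values and the linear functional form are illustrative assumptions.
import numpy as np

# Hypothetical overlap period: proxy indications (e.g., an isotope ratio)
# paired with instrumental temperatures from the same times and places.
proxy_indications = np.array([1.8, 2.1, 2.4, 2.0, 2.6, 2.3])        # arbitrary units
instrumental_temp = np.array([14.2, 13.5, 12.9, 13.8, 12.4, 13.1])  # degrees C

# Fit a linear calibration function: temperature = a * indication + b.
a, b = np.polyfit(proxy_indications, instrumental_temp, deg=1)

def calibration_function(indication):
    """Convert a measurement indication into a measurement outcome (deg C)."""
    return a * indication + b

# Metrological extension: apply the same function to indications from
# layers older than the instrumental record reaches.
print(calibration_function(2.9))
```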
Other than these explicit comparisons between proxy-based measurements and instrumental measurements, paleoclimatologists also use what are called “proxy system models” (PSMs) in order to constrain the relationship between the proxy-based measurement and a measurement of the climate (e.g., Dolman and Laepple 2018; Lawman et al. 2020). PSMs are also known as “forward models”: they take as inputs features of the object to be measured, i.e., the climate, and output the expected measurement indication, i.e., features of the proxy system, that would obtain in those circumstances. In the context of paleoclimatology, PSMs are simulations of how the trace left by the past climate will be affected over time before it is collected and analyzed by us. These simulations are extremely important for helping paleoclimatologists understand and vicariously control for confounding factors that might influence the proxy signal. Footnote 3
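The following sketch illustrates the forward-model idea behind PSMs under deliberately simplified assumptions: a hypothetical proxy records a linear imprint of temperature, archival processes smear that signal across layers, and analytical noise is added when the trace is measured today. None of the specific processes or parameters are drawn from an actual published PSM.

```python
# Illustrative sketch of a proxy system model (forward model): given a
# hypothetical climate history, simulate the proxy signal we would expect
# to recover today. Processes and parameters are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)

def proxy_system_model(true_temperature, sensor_slope=0.5, sensor_intercept=1.0,
                       smoothing_window=5, noise_sd=0.05):
    """Map a climate input (a temperature series) to expected proxy indications."""
    # 1. "Sensor": the environment imprints on the proxy material.
    signal = sensor_slope * true_temperature + sensor_intercept
    # 2. "Archive": mixing and diffusion smear the signal across layers.
    kernel = np.ones(smoothing_window) / smoothing_window
    archived = np.convolve(signal, kernel, mode="same")
    # 3. "Observation": analytical noise when the trace is measured today.
    return archived + rng.normal(0.0, noise_sd, size=archived.shape)

# Hypothetical temperature history (deg C) and the simulated proxy record.
temps = 14 + np.sin(np.linspace(0, 6 * np.pi, 200))
simulated_proxy = proxy_system_model(temps)
```

Running such a model under different assumed non-climatic processes is, roughly, how confounding factors can be simulated and then accounted for when the proxy is calibrated.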
PSMs are a crucial part of proxy calibration. Recall that I’ve argued that proxy-based measurements of the past climate should be seen as a case of metrological extension of climate measurements into new contexts—further back in time. Extending climate measurements further back in time introduces new sources of error and uncertainty. PSMs allow researchers to simulate the effect of these processes, and consequently to account for them in the calibration of the proxy measurement. Of course, our understanding of these processes and how they have changed over time is itself imperfect and subject to revision, but the important thing is that the use of a PSM allows for development of a calibration function that does not mindlessly extrapolate back in time a calibration function developed on present-day correlations between measurement indications and measurement outcomes. Developing a calibration function that is not time invariant is more felicitous for the research context.
Let’s take stock. So far, I have argued that the development of paleoclimate proxies is best analyzed as a case of metrological extension of climate measurements into new temporal contexts. Successful metrological extension in this case requires calibrating paleoclimate proxies to instrumental data and extrapolating the relevant calibration function back in time. However, extrapolation back in time is non-trivial. Metrological extension into new temporal domains thus highlights how different sources of error and uncertainty in proxy-based measurement contexts might change over time, since the relationships between the measurement indication and the measurement outcome are not time invariant. Paleoclimatologists have thus come to rely on PSMs, a type of forward model that allows for confounding factors to be simulated explicitly and for the ultimate proxy calibration to take into account processes other than climatic ones which may change the proxy-based signal over time. The result is a growing list of paleoclimate proxy measurement procedures, each calibrated to varying levels of accuracy and precision and all used for different periods in Earth’s history.
The thesis of this paper, though, is that disunity or lack of standardization within various paleoclimatological data and measurement practices has some beneficial consequences (whether or not paleoclimatologists themselves are aware of these benefits, and whether or not these beneficial effects are outweighed by detrimental ones). I suggest that the benefits of disunity are evidenced by a calibration strategy that paleoclimatologists do not deploy: intercalibration between proxies.
By and large, different proxies are not intercalibrated. For example, in principle one could constrain tree ring–based climate reconstructions using sediment-based climate reconstructions, or vice versa (one could do this for any pair of paleoclimate proxies that overlap in temporal and geographic range). This strategy has the potential to yield a more precise calibration function for any given proxy; the more times the proxy is intercalibrated with other proxies, the more tightly constrained its calibration function becomes, as it would need to cohere with all of the other proxy records. However, paleoclimatologists generally do not use intercalibration to constrain their proxy calibration functions; proxies are calibrated to the instrumental record, and that’s it. I suggest that there are two possible benefits of this practice (again, benefits of which paleoclimatologists may not be aware).
First, intercalibrating paleoclimate proxy measurements would cause them to become dependent on one another, in the sense that a change to one measurement process would result in a corresponding change to the other measurement process. If two measurement procedures (or other sources of evidence) are dependent on one another, it is not surprising if and when the evidence they produce agrees in its support of a claim. For example, two dependent paleoclimate proxies would be expected to agree in support of any given claim about climate trends in particular times and places. Philosophers of the historical sciences have often argued that maintaining independent lines of evidence is a useful strategy for historical scientists, exactly because agreement between multiple, independent lines of evidence is prima facie surprising (e.g., Wylie 1989, 2002, 2011; Cleland 2011, 2013; Forber and Griffith 2011; Currie 2018; Bokulich 2020). Footnote 4 Likewise, Martin Vezér (2015, 2017) says it is unlikely that multiple different proxy-based records of past climates would agree about claims if those claims were false; so, he argues, when multiple proxy-based climate reconstructions agree on a particular claim, that claim is more likely to be true than if the claim were only supported by one proxy-based reconstruction. Footnote 5
These kinds of arguments that use multiple, independent lines of evidence to support a claim are called consilience arguments, and philosophers of the historical sciences broadly agree that lines of evidence need to be independent (in the right sort of way, whatever that is) to be used in such arguments. More contentious is the idea that multiple, independent lines of evidence can also be used to support robustness arguments. Rather than using multiple lines of evidence to support a claim, robustness arguments involve using multiple lines of evidence to demonstrate whether a claim is (in)sensitive to various background assumptions. According to Vezér (2015), convergence of multiple paleoclimate proxies shows that the claims different proxies make about the past are insensitive to the details of a particular proxy-based measurement process; this is a robustness argument. However, two philosophical debates about robustness arguments complicate analysis of the paleoclimate case. First, there is debate about whether using multiple, independent lines of evidence really does result in the sensitivity test that robustness arguments purport to provide (e.g., Levins 1966; Orzack and Sober 1993; Staley 2004). Footnote 6 Second, there is debate about whether independence of multiple lines of evidence really is necessary to make robustness arguments—for discussion and argument that multiple lines of evidence don’t need to be independent in a strict, formal sense, see Schupbach (2018).
Regrettably, this leaves us with limited or unsatisfactory answers to several philosophical questions about independent lines of evidence. These include: What kind of independence, exactly, is required to use multiple lines of evidence in consilience or robustness arguments? Do consilience and robustness arguments really require independent lines of evidence, at all? And are consilience or robustness arguments even possible, perhaps given some of the difficulties answering the other two questions? Answering all of these is outside of the scope of this paper. However, I am able to say that if consilience and/or robustness arguments are possible and require a certain kind of independence (e.g., the kind that would make it unlikely for different lines of evidence to agree if what they agreed upon wasn’t true), then the current practice of not intercalibrating paleoclimate proxies enables consilience and/or robustness arguments, because not intercalibrating preserves the different measurements’ independence. This conditional at least indicates a possible benefit to not intercalibrating different paleoclimate proxies.
A second benefit of not intercalibrating multiple paleoclimate proxies, I argue, is that it allows paleoclimatologists to avoid inadvertently compounding various sources of error and uncertainty. As described above, proxy-based reconstructions of past climates have to contend with many sources of error and uncertainty, a problem which is exacerbated as these records get extended further back in time. The main way of estimating the extent of the error is to use PSMs or other model-based strategies. However, when these techniques aren’t available, or when they are underdeveloped, paleoclimatologists have few means of estimating the magnitude of error and uncertainty, and, especially, of teasing out different sources of error and uncertainty. Footnote 7 More philosophical work needs to be done on error and uncertainty in proxy-based measurement contexts. For our purposes, suffice it to say that the difficulties paleoclimatologists have in quantifying their error and uncertainty for each proxy would be compounded if these proxies were intercalibrated. I therefore think that, by refusing to intercalibrate, paleoclimatologists are able to prevent further aggravation of the effects of the uncertainties inherent in their measurement processes.
One legitimate counterexample to this trend of not intercalibrating—and an example that gives a little more insight into paleoclimatologists’ explicit understanding of the risks of intercalibration—is found in van Dam and Utescher (2016). Footnote 8 They compare plant- and mammal-based reconstructions of precipitation in present-day Europe during the Neogene (about 23–2.6 million years ago). As the authors note, proxies for precipitation have been some of the least successful, especially when compared with temperature proxies. They carefully selected fossil sites and samples in order to compare the two proxies for paleoprecipitation, and they then suggest the possibility of intercalibrating the two in order to more tightly constrain the calibration curve for each. Importantly, intercalibration is only suggested because there are no clear modern analogue species for some of the mammals used, making calibration with the instrumental record especially difficult. This counterexample indicates not only that the practices of paleoclimatologists are varied and difficult to make sweeping generalizations about, but also the conditions which have to be met before paleoclimatologists consider intercalibration a plausible way to go. In this case, intercalibration is presented as a last resort; constraining the paleoprecipitation record by other means has proven unsuccessful, so paleoclimatologists are willing to give intercalibration a try. In other words, this study is evidence for paleoclimatologists’ hesitancy about intercalibrating multiple paleoclimate proxies.
In addition to not intercalibrating paleoclimate proxies, paleoclimatologists also tend not to combine reconstructions based on multiple proxies. There has long been recognition of the benefits of so-called “multiproxy” studies. These benefits include the fact that multiple proxies in combination can cover a broader geographic range. For example, Michael Mann (2002) suggests that tree rings, corals, and ice cores are best used in combination, in part because these different proxies cover different geographic regions (terrestrial temperate zone, marine tropics, and polar regions, respectively). In addition to serving as complementary records in a geographic sense, different proxies may have different strengths and weaknesses, which can balance each other out.
However, despite this acknowledgment that multiproxy methods are superior to any method which relies too heavily on a single proxy, there is no consensus on how to best combine information from different proxy records. One obvious approach would be to average multiple reconstructions of the same climatic variable, e.g., temperature. However, doing so risks illicitly combining substantially different measurands as though they were one; for example, polar and tropical temperatures cannot be very meaningfully averaged. Relatedly, averages can be easily skewed by uneven sampling, a problem which plagues paleoclimatology as well as historical sciences and climate science more broadly (e.g., Raja et al. 2022; Brönnimann and Wintzer 2019). As a result of these well-founded hesitations to average different paleoclimate reconstructions together, “the interpretations of multi-proxy datasets [often] rely on visually matching several proxy records” (Schroeter et al. 2020, 1). In other words, multiple proxy-based reconstructions are simply placed on one graph of, for instance, temperature over time, and agreement or disagreement between these proxies is just indicated by visual agreement between the resulting lines.
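As a rough illustration of this visual-matching practice, the following sketch simply draws multiple records on shared axes rather than averaging them into one curve; the two records are invented placeholders, not real reconstructions.

```python
# Sketch of "visual matching": multiple proxy-based reconstructions are drawn
# on one set of axes rather than statistically combined. Data are invented.
import numpy as np
import matplotlib.pyplot as plt

age_ka = np.linspace(0, 100, 200)                    # age in thousands of years
recon_a = 14 + 2 * np.sin(age_ka / 10)               # placeholder: e.g., a sediment-based record
recon_b = 14 + 2 * np.sin(age_ka / 10 + 0.3) + 0.5   # placeholder: e.g., an ice-core-based record

fig, ax = plt.subplots()
ax.plot(age_ka, recon_a, label="Proxy reconstruction A")
ax.plot(age_ka, recon_b, label="Proxy reconstruction B")
ax.set_xlabel("Age (ka)")
ax.set_ylabel("Reconstructed temperature (deg C)")
ax.legend()
# Agreement or disagreement is assessed by eye, not by computing an average.
plt.show()
```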
Take, for example, the 60-million-year global surface temperature reconstruction reproduced in the second chapter of the most recent IPCC report (Gulev et al. 2021). Specifically, for the period ranging from 60 million to 1 million years ago, the IPCC used two proxy-based temperature reconstructions, one from Hansen et al. (2013) and the other from Westerhold et al. (2020); the data in Hansen et al. (2013) are in turn based on data presented in Zachos et al. (2008). Zachos et al. (2008) and Westerhold et al. (2020) both purport to give a global picture of temperature, using data from several ocean sediment cores collected during successive drilling programs: the Deep Sea Drilling Project (1968–1983), the Ocean Drilling Program (1985–2004), and the Integrated Ocean Drilling Program (2004–present). These studies both use benthic foraminifera tests (shells), calculating a proxy called $\delta^{18}$O (which looks at the ratio of heavy to light oxygen isotopes) for these and converting that to temperature (correcting for different foram genera). In Hansen et al. (2013) and Westerhold et al. (2020), these “raw” data were then smoothed over different temporal resolutions. Both Hansen et al. (2013) and Westerhold et al. (2020) have thus performed statistical processes which integrate different data sources in order to reconstruct temperatures since the Paleocene. However, more interesting for our purposes is the fact that when the IPCC used these two studies, it chose to portray them as two different reconstructions of the past. The two temperature profiles are plotted on the same graph, in different colors, and the viewer can see that they broadly agree on Earth’s climate history (see figure 1). Yet, even though these two studies used cores obtained by the same drilling programs and used the same proxy for temperature ($\delta^{18}$O), the two records were kept separate. Footnote 9
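To give a sense of the kind of processing involved (though not the actual procedures of Hansen et al. or Westerhold et al.), here is a sketch in which one invented benthic $\delta^{18}$O series is converted to temperature with a placeholder linear relation and then smoothed at two different temporal resolutions, yielding two curves that would be plotted as separate lines rather than merged.

```python
# Illustrative sketch: convert an invented benthic d18O series to temperature
# with a placeholder relation, then smooth at two temporal resolutions.
# Neither the conversion nor the window lengths reflect the cited studies.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
age_myr = np.linspace(1, 60, 6000)                              # age, millions of years
d18o = 2.0 + 0.02 * age_myr + rng.normal(0, 0.3, age_myr.size)  # invented values (per mil)

temp = 16.0 - 4.0 * (d18o - 1.0)                                # placeholder linear conversion

record = pd.Series(temp, index=age_myr)
coarse = record.rolling(window=500, center=True).mean()         # coarser temporal resolution
fine = record.rolling(window=50, center=True).mean()            # finer temporal resolution
# The differently smoothed records would be shown as separate curves on one graph.
```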
Why keep the records separate? I speculate that the benefits of keeping multiple proxy records visually separate are analogous to the benefits of not intercalibrating different proxies offered above. First, keeping the proxy-based reconstructions separate preserves, at least to some extent, their independence. Independent lines of evidence might be important for consilience and robustness arguments, i.e., arguments which are intended to enhance our confidence in or the security of particular hypotheses. Visually distinguishing multiple proxy-based reconstructions makes it clearer to the viewer that they are intended to play these roles. Second, refusal to statistically combine different reconstructions—or due caution about when to do so—helps paleoclimatologists to manage error and uncertainty, by preventing them from unintentionally compounding these when combining proxies with different sources and magnitudes of error and uncertainty.
In summary, I have shown that paleoclimatologists at least have a tendency to both (a) not intercalibrate multiple proxies with one another and also (b) not otherwise statistically combine multiple proxy-based records, such as by averaging these reconstructions together. I do not wish to suggest that these practices are never pursued, nor that they never should be. But, I have attempted to explain that there are benefits of keeping different proxies and the reconstructions based on them separate, rather than integrating them with one another—even if the paleoclimatologists themselves are not aware of these benefits, and even if, all things considered, these benefits are outweighed by various risks. Keeping the separate proxies disunified can help to cope with various sources of error and uncertainty, an important task for a measurement technique that is so nascent. Additionally, preserving the independence of these multiple lines of evidence may be crucial if paleoclimatologists want to use them in consilience or robustness arguments. Preserving the disunity of proxies in these cases can, then, have important benefits.
3. Disunity in data infrastructure
Using paleoclimate proxies requires an elaborate data infrastructure, a term I use to encompass practices pertaining to data collection, storage, and travel. I argue that there are important components of the paleoclimate proxy data infrastructure that are disunified, in ways that have the surprising potential to benefit paleoclimatological practice. I focus on how data storage practices are disunified, especially how the use of metadata is not standardized, even within particular archives or databases. I argue that this lack of standardization (predictably) prevents data reuse and repurposing, but that, perhaps counterintuitively, this is actually a feature and not a bug. I will focus on the Paleoclimatology Database, operated by the National Oceanic and Atmospheric Administration (NOAA) through the National Centers for Environmental Information (NCEI).
The Paleoclimatology Database contains proxy data that can be filtered based on proxy type (e.g., ice core, pollen, coral, tree ring), investigator (i.e., who published the dataset), location, time period, and more. Individual datasets matching the filtering criteria can then be downloaded and analyzed. The Paleoclimatology Database has been immensely useful for proxy-based paleoclimate research in recent years. For example, one relatively high-profile project has been the Past Global Changes (PAGES) 2k Consortium, a collaborative effort to compile multiple proxy records from the Common Era. The PAGES2k Consortium (2017) has utilized the NCEI database to make its records usable by other researchers.
In order to contribute to the Paleoclimatology Database, contributors must follow a template (they then submit the template-compliant dataset via email, after which it is reviewed and posted online). Footnote 10 The requirements provided by the template are as follows. First, researchers must use terminology defined explicitly in the Paleoenvironmental Standard Terms (PaST) Thesaurus. Footnote 11 The PaST Thesaurus standardizes the terms used for different variables, such as “material” (the material on which the measurements are made) or “data type” (the category in which the dataset will be housed; this variable has only a set list of possible options, e.g., Borehole, Ice Cores, Pollen). Second, there is a required set of metadata that must be provided with the dataset. These include: details about the publication (if any) associated with the dataset, the date of the contribution, funding information, the latitude and longitude coordinates of the study location, the dates over which the analysis was performed, the length of the core (if relevant), the species involved in biogenic proxy measurements (if relevant), a section on “chronology” (i.e., the dates obtained from the material studied, and, perhaps, information about how those dates were obtained—although this is not required), and labels for the columns of the dataset itself (using the PaST Thesaurus).
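As a rough illustration (not the actual NOAA template file format), the kinds of fields the template requires might be represented as follows; every value below is a hypothetical placeholder.

```python
# Hypothetical sketch of the metadata fields the contribution template requires.
# This is not the NOAA template's actual file format; all values are placeholders.
dataset_metadata = {
    "publication": "Hypothetical Author et al., hypothetical journal",
    "contribution_date": "2023-01-15",
    "funding": "Hypothetical funding agency and grant number",
    "site_location": {"latitude": -1.25, "longitude": 119.5},
    "analysis_dates": "2019-2021",
    "core_length_m": 12.4,                       # if relevant
    "species": "Globigerinoides ruber",          # if relevant (biogenic proxies)
    "chronology": "dates obtained from the material (method details optional)",
    "data_columns": ["depth", "age", "Mg/Ca", "sea surface temperature"],
    # Column labels would be drawn from the PaST Thesaurus in a real contribution.
    # Note: nothing above records WHICH calibration function converted the
    # proxy indication to a climate variable; that omission is discussed next.
}
```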
Ostensibly, the fact that the Paleoclimatology Database requires use of a template with metadata requirements at all may be seen to indicate paleoclimatologists’ attempt at standardization of digital data storage. However, there are some key pieces of metadata that are not required by the Paleoclimatology Database template, and these indicate that paleoclimatologists may be reluctant to require that their data be interoperable and reusable in particular ways. Most importantly, the database template does not require metadata about which calibration function was used and where that calibration function came from. To be fair, some datasets do include this information; for example, data that convert the ratio between magnesium and calcium (Mg/Ca) in samples of Globigerinoides ruber (a species of planktic foraminifera) to sea surface temperature (SST) reconstructions are almost always explicit that they used the calibration curve established in Anand et al. (2003) to perform the calibration. However, this is not true of many proxy-based reconstructions, especially those for which there is less of a consensus about which calibration function to use. For example, the Paleoclimatology Database has 17 datasets that use benthic foraminifera to produce climate reconstruction data (usually, bottom water temperature [BWT]) based on Mg/Ca. Of these 17, only six datasets listed in their metadata the calibration function they used to convert Mg/Ca into temperature (some of these list the equation itself, others just refer the user to another study to find the calibration curve). Footnote 12 Likewise, of 23 datasets that use the ratio of strontium to calcium (Sr/Ca) in corals as a proxy for SST, only eight include calibration-related metadata. Footnote 13 Examples of metadata with and without calibration information are reproduced in figure 2.
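For readers unfamiliar with this kind of calibration curve, the following sketch shows how such a conversion works, using the exponential form commonly attributed to Anand et al. (2003), Mg/Ca = B * exp(A * SST), with the widely cited constants A = 0.09 and B = 0.38; these values are included for illustration only and should be checked against the original paper before any real use.

```python
# Sketch of converting Mg/Ca indications to sea surface temperature (SST)
# using an exponential calibration of the form Mg/Ca = B * exp(A * SST),
# as commonly attributed to Anand et al. (2003). The constants below are
# the widely cited values but are included here for illustration only.
import numpy as np

A, B = 0.09, 0.38

def mgca_to_sst(mg_ca):
    """Invert Mg/Ca = B * exp(A * SST) to recover SST in degrees C."""
    return np.log(np.asarray(mg_ca, dtype=float) / B) / A

print(mgca_to_sst([3.2, 4.1, 4.8]))  # hypothetical Mg/Ca values (mmol/mol)
```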
Metadata about the calibration function used is necessary for these proxy data to be interoperable (i.e., compared to one another or otherwise integrated) or reusable in new research contexts. In order to be interoperable, for instance, reconstructions that are based on the same material should probably be calibrated using the same calibration function (or there should be some justification for using a different function; for instance, maybe some calibration functions are better suited to certain locations over others). If data are not appropriately calibrated for intercomparison, or if it is not possible to tell whether they are or not, then it is not clear what hypotheses about the agreement or disagreement of data from different datasets are appropriate. For example, if identical elemental ratio data (e.g., Mg/Ca or Sr/Ca) are calibrated to temperature differently, then we wouldn’t expect the temperature reconstructions to agree, whereas if we are testing for consilience or robustness between different samples we would expect the reconstructions to agree. Likewise, reuse of data is hindered by lack of calibration metadata. For example, calibration metadata can be used to “reverse engineer” the original proxy-based measurements if these are otherwise unavailable, which could then be recalibrated if new and improved calibration functions are developed in the future.
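A small sketch of the “reverse engineering” use of calibration metadata mentioned above: if a dataset reports only calibrated temperatures together with the (here, linear) calibration used, the original indications can be recovered and then recalibrated with an improved function. All functions and numbers are hypothetical.

```python
# Hypothetical sketch: recover original proxy indications from published,
# calibrated values using the calibration reported in the metadata, then
# recalibrate with a newer function. All numbers are invented.
import numpy as np

published_temps = np.array([25.1, 26.3, 24.8])     # calibrated outcomes from an archive

# Old calibration reported in the metadata: T = a_old * indication + b_old.
a_old, b_old = 4.0, -10.0
recovered_indications = (published_temps - b_old) / a_old

# A hypothetical improved calibration developed later: T = a_new * x + b_new.
a_new, b_new = 3.8, -9.0
recalibrated_temps = a_new * recovered_indications + b_new
print(recalibrated_temps)
```

Without metadata supplying the old calibration (here, a_old and b_old), this recovery is simply not possible, which is precisely the hindrance to reuse described above.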
At first blush, then, the lack of standardization about calibration-related metadata appears to be a failure on the part of the database organizers, or at least individual contributors. However, I argue that the decision not to include or require this metadata might actually have some (perhaps unintended) benefits, exactly because it prevents interoperability and reuse. In particular, I propose that preventing interoperability and reuse contributes beneficially to management of error and uncertainty.
Recall that, in the context of proxy calibration, I argued that not intercalibrating multiple proxies with one another prevents compounding of sources of error and uncertainty in ways that are difficult to detect or quantify. Appropriate interoperability and reuse of data also requires a level of understanding of sources of error and uncertainty that is presently not available for most proxy-based records of Earth’s past climates. Data reuse, for instance, requires by definition that data can be exported from one research context to another. Doing so involves some attempt to show that data that were adequate for research purposes in the first context are also adequate for research purposes in the second context (Bokulich and Parker 2021). And, in many cases, whether data are adequate or not for a given research purpose will depend on their associated degrees of error and uncertainty; if estimates of these are unavailable (or themselves highly uncertain), it is often difficult to tell whether data can satisfactorily travel between research contexts. Likewise, interoperability requires data to be able to be integrated with other data. Absent reliable indications of error and uncertainty, it is difficult to know whether and when two datasets can be appropriately integrated. Furthermore, integration or intercomparison of different datasets may, like intercalibration, compound the relevant errors and uncertainties. Compounding sources of error and uncertainty is ill-advised unless researchers have systematic ways of doing so, which paleoclimatologists (as of yet) do not. Of course, there are also risks associated with lack of standardization, and further work needs to be done to weigh the risks and benefits in this case.
These considerations indicate that many proxy-based datasets are not ready for reuse or interoperability, but that this isn’t necessarily a bad thing, and instead contributes beneficially to paleoclimatologists’ ability to manage error and uncertainty. Proscriptions of interoperability and reuse violate the so-called “FAIR” data principles; FAIR stands for “findability,” “accessibility,” “interoperability,” and “reusability.” Proponents of the FAIR data principles make claims such as that “FAIRness is a prerequisite for proper data management and data stewardship” (Wilkinson et al. 2016, 6). Cases where there are benefits to preventing interoperability and reusability of data (such as, I suggest, the case of paleoclimate proxy data) put pressure on these claims in the data ethics and data management literature. Further philosophical work is needed in the nascent field of data ethics to better articulate the scope of standards such as the FAIR principles, or to replace them altogether. Footnote 14
Some other features of the Paleoclimatology Database also indicate that paleoclimate data are not interoperable and reusable on a broad scale. First, the user interface of the database does not make it possible to perform any comparison of multiple datasets at once. For instance, one might want to generate a single graph that tracks SST over time based on Sr/Ca in corals. This functionality is not possible within the database itself, so a user wanting to generate this kind of graph would have to do so by manually downloading all of the relevant datasets and compiling them together. (This process itself might highlight to the user the lack of information about calibration or other ways in which data from different datasets are dissimilar, such as their temporal resolution; this is roughly what happened to me!) By contrast, another widely used database of historical data, The Paleobiology Database (https://paleobiodb.org/), has greater functionality along these lines, integrating all of the data into one interface, as well as “benchmarking” tools to prevent misuse of the data. Second, the Paleoclimatology Database does not purport whatsoever to have data that are “cleaned,” for example by removing duplicates or checking for typographical errors, nor does it provide much transparency on how or whether individual contributors have cleaned their own data. This, again, is left up to individual researchers to figure out. Overall, the fact that the Paleoclimatology Database’s structure makes these practices of reuse and interoperability so labor-intensive disincentivizes researchers from pursuing these projects (whether or not this was the intention of the database managers). And, as I have argued in the context of intercalibration (in section 2) and calibration metadata, the fact that lack of standardization in this case prevents reuse and interoperability actually has beneficial consequences, namely for error and uncertainty management.
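To illustrate how labor-intensive the manual workflow described above is, here is a sketch of compiling individually downloaded datasets by hand; the folder, file names, and column structure are all hypothetical.

```python
# Sketch of the manual compilation workflow: download each relevant dataset
# file, then combine them by hand. Folder and column names are hypothetical.
import glob
import pandas as pd

frames = []
for path in glob.glob("downloaded_coral_srca/*.csv"):   # hypothetical local folder
    df = pd.read_csv(path)
    df["source_file"] = path                            # keep track of provenance
    frames.append(df)

# Combining is only straightforward if columns, units, temporal resolutions,
# and calibrations actually match, which the metadata often leaves unclear.
combined = pd.concat(frames, ignore_index=True)
```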
In summary, I have suggested that disunity or lack of standardization in paleoclimatologists’ data infrastructure, especially paleoclimatologists’ decisions with respect to database design, has some surprising benefits. In particular, I have argued that the lack of required metadata about proxy calibration—as well as, perhaps, the lack of an integrated user interface or transparency about dataset “cleaning” practices—serves to prevent multiple proxy datasets from being reused or intercompared. Contrary to claims such as those by proponents of the FAIR data principles that interoperability and reuse are necessarily positive qualities of datasets, I suggest that data infrastructures which may discourage or at least not require data to be interoperable or reusable may actually help researchers mitigate rather than compound the various sources of error and uncertainty that affect their data. Similar to my argument concerning the lack of proxy intercalibration offered in section 2, then, I propose that disunification in paleoclimatologists’ data practices can be beneficial, even in ways of which paleoclimatologists may be unaware and, admittedly, in ways that might, in this case, be outweighed by the risks of non-standardization. Both philosophers and scientists should think more about strategic database design, including required metadata, in terms of what effect this design will have on scientific practice. For example, perhaps it is the case that even if the metadata that would be needed to permit reuse and interoperability shouldn’t be required, other metadata (concerning sources of error and uncertainty) could be required in order to help facilitate eventual reuse of these datasets.
4. Conclusion
I have argued that there are data and measurement practices that paleoclimatologists perform that demonstrate the benefits of disunity or lack of standardization. First, I explained how paleoclimate proxies are calibrated to the instrumental record, and tend not to be intercalibrated with one another. Not intercalibrating multiple proxies preserves the independence of the climate reconstructions based on these different measurements, and also prevents inadvertent compounding of multiple sources of error and uncertainty that affect different proxies differently. Second, I showed how even a shared Paleoclimatology Database does not enable reuse or interoperability of proxy data, because there are key pieces of metadata (e.g., about calibration methods) that are not required by the database. Contrary to claims that suggest detailed metadata is required for the data to be useful, I argued that at least temporarily preventing reuse or integration of datasets might actually have the beneficial consequence of mitigating error and uncertainty.
Both of these ways in which proxy data and measurements are disunified or non-standardized have in common that they enable paleoclimatologists to handle myriad sources of error and uncertainty. Paleoclimatology is currently still enormously error- and uncertainty-ridden, and so error and uncertainty management is at the forefront of researchers’ concerns. Future work in philosophy of paleoclimatology should investigate whether the benefits described herein outweigh or are outweighed by the risks associated with these same practices. Additionally, future philosophical work on other areas of science where error and uncertainty are central should investigate whether or how these areas might preserve or construct disunity in their data and measurement practices.
Acknowledgments
This paper has benefited greatly from feedback from Alisa Bokulich, Wendy Parker, Michaela McSweeney, Rachell Powell, Wally Fulweiler, Greg Lusk, Meghan Page, Miguel Ohnesorge, Federica Bocchi, Leticia Castillo Brache, and two anonymous referees, as well as audiences at the Cambridge Early-Career Workshop in the Philosophy of Measurement in 2023, The Future of the Past: Philosophical Issues in the Historical Sciences at the Hebrew University of Jerusalem in 2022, and Measurement at the Crossroads in 2022. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE-1840990. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.