INTRODUCTION
The 24th International Radiocarbon (14C) conference in Zurich was the first since the 70th anniversary of the first papers to be published on the technique. The history of these developments and of the early years has been covered in two issues of the journal Radiocarbon (Volume 64, issues 3–4, 2022). A comprehensive overview of the complex topic of radiocarbon calibration would be beyond the scope of a single paper, and so the intention here is to highlight a few key aspects of the associated research and some recurrent themes. The view presented is inevitably only one of many possible perspectives and is largely intended to cover the material presented by the author at the Zurich conference.
EARLY RESEARCH AND INNOVATION
It is always useful to re-read the early papers on any technique to see what the main concerns were at the time, which aspects of the original research turned out to have most significance, and which topics turned out to be less important. This is particularly true for radiocarbon in relation to calibration because the whole topic of radiocarbon distribution in the environment was key to the initial work (Arnold and Libby 1949; Libby et al. 1949; Anderson and Libby 1951). The work presented in these papers shows a dual approach which has continued within the field, of documenting the levels of radiocarbon in the environment, including the ocean, and testing samples of known age to see if there is evidence of variation (primarily at this stage in the atmosphere) over time.
The underlying questions of heterogeneity and stability are ultimately ones of scale: at some level there is always variation, and the research question is really not whether there is such variation, but whether it is significant. What constitutes significant variation in turn depends on instrumental precision. The very earliest measurements presented had quoted uncertainties of 5–10% (Libby et al. 1949) and, over the time period for which they had known-age material, subsequent research would suggest they were right in their conclusions within this margin (see Figure 1).
There are already hints at potential problems in interpretation, however, in that although Libby et al. (1949) give the measurement uncertainty as 5–10%, the quoted counting errors are sometimes tighter than this on single measurements.
Very quickly, measurement precision became much higher, as seen for example in Suess (1955), where measurement precision had improved by about an order of magnitude to ∼0.5%, allowing the dilution of radiocarbon by fossil fuels to be inferred. This result prefigures much of the later use of such known-age measurements in its use both for interpreting radiocarbon as ages and in drawing conclusions about carbon cycle changes.
Crowe (1958) shows even more clearly that instrument precision itself does not imply accuracy. It presents apparently very significant excursions in the radiocarbon values together with the suggestion that these could be due to solar flare events. Most of the datapoints shown in this paper are significantly different to the later calibration curves but it is interesting that mechanisms which later turned out to be important (albeit at a much smaller scale) were being considered so early. With hindsight, given this paper, it is perhaps surprising that early dates, without calibration, were taken at face value, resulting in the subsequent substantial revisions of chronology (Renfrew 1973).
AREAS OF RESEARCH FOCUS
As it became apparent that calibration was going to be required, there was a focus on documenting the changes over time (Damon et al. 1966; Dyck 1967; Libby 1967). Inevitably these efforts were limited by instrumental precision, measurement capacity, archive availability, and accuracy; these limitations have remained important.
Instrumental Precision
In the early decades of the method, and indeed even when AMS became possible, the precision was limited by counting statistics. As seen in Crowe (1958), other aspects of the uncertainty were sometimes underestimated. Ideally for any measurement technique you would be capable of making measurements to a much higher precision than you do on a routine basis. This allows for proper quantification of the sources of uncertainty. In the case of radiocarbon measurements the methods were, and indeed often still are, being used close to the limits of instrumental capability. This is a particular characteristic of the method arising from the fact that counting statistics are usually an important element in the uncertainty of any measurement.
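The link between counting statistics and dating precision can be made concrete with a minimal sketch. It assumes pure Poisson counting (no background, standards, or other laboratory contributions, which in practice add to the uncertainty) and uses the conventional Libby mean life of 8033 years to convert a relative uncertainty into radiocarbon years. The function names and the chosen count totals are illustrative only.

```python
import math

LIBBY_MEAN_LIFE = 8033.0  # years; the conventional value used for radiocarbon ages


def relative_counting_uncertainty(counts: int) -> float:
    """Relative 1-sigma uncertainty for N detected events (Poisson): 1/sqrt(N)."""
    return 1.0 / math.sqrt(counts)


def age_uncertainty(relative_uncertainty: float) -> float:
    """First-order 1-sigma uncertainty in 14C years.

    The conventional age is -8033 * ln(F), so d(age) = 8033 * dF / F.
    """
    return LIBBY_MEAN_LIFE * relative_uncertainty


# Illustrative count totals (assumed values, not taken from any laboratory).
for counts in (400, 10_000, 40_000, 1_000_000):
    rel = relative_counting_uncertainty(counts)
    print(f"{counts:>9} counts: {100 * rel:5.2f}% -> about {age_uncertainty(rel):4.0f} 14C yr")
```

Under these assumptions, halving the uncertainty requires four times as many counts, which is why precision was so closely tied to counting time and measurement capacity.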
Measurement Capacity
Another aspect of radiocarbon dating is the limited measurement capacity of labs capable of making measurements for calibration purposes. Especially with the decay counting methods, where precision takes time, this limitation was a combination of technical capability and financial support. Taking the long view we can see that this situation really only changed substantially when the instrumental precision of AMS became high enough to replace decay counting methods for all parts of the calibration curve. The effect of this is evident in the large expansion of data available for IntCal20 (Reimer et al. 2020).
Archive Availability
The other constraint, evident from the early years, has been availability of suitable archives for calibration purposes. This is heavily reliant on research in other areas, but also constrained by the high bar which is set for samples which can be considered as known-age (Reimer et al. 2013). Tree rings provided the main basis for this where possible (Stuiver et al. 1986) but started to be augmented with marine data for longer timescales (Stuiver and Braziunas 1993). Lake sediments and speleothems were seen as having potential as long-term archives more directly related to the atmosphere, but initial attempts led to inconsistent results (Kitagawa et al. 2000; Beck et al. 2001; van der Plicht et al. 2004), due primarily to the imprecision of varve timescales for lake sediments and the variability inherent in the dead carbon fraction of many speleothems. Through further research, these new sources of information had become useful for IntCal13 (Reimer et al. 2013) and by IntCal20 (Reimer et al. 2020) had become fundamental to calibration beyond the reach of dendrochronology. However, the precision and resolution possible with wood samples cannot be matched and, even when they can only be dated relative to absolutely dated speleothems, they remain the best way to look at high frequency signals (Cooper et al. 2021). In the future it is reasonable to expect that the full calibration curve will be dominated by measurements on wood.
Accuracy
Accuracy and precision are always closely linked and, as discussed above, Crowe (1958) shows very clearly what happens when precision improves very rapidly. Any instrumental measurements should involve an assessment of uncertainty, and those making the measurements normally try their best to provide a reasonable estimate. In the case of radiocarbon, however, the processes involved are very complex. A measurement of radiocarbon is aiming to evaluate the level of radiocarbon in some source reservoir (typically the atmosphere or ocean) at some specific time. In order to estimate uncertainty it is necessary to understand the process by which the sample came from the reservoir to the measurement instrument. This implies understanding the relevant biological processes in the sample formation, the degradation processes of a sample once deposited, the effect of sampling and treatment processes in the laboratory and the measurement process itself. These topics span several scientific disciplines, and it is not surprising that estimating uncertainty properly has required considerable work across the discipline over the last seven decades. In the end it is only possible to estimate what is properly understood, and it is often only possible to understand something after it has been precisely measured. For this reason, it is always difficult to maintain accuracy as precision improves.
Similar issues are of course relevant to other dating techniques and so, in the context of radiocarbon calibration, the accuracy of the calendar dates associated with the measurements is also subject to some of the same potential problems.
ENDURING PROBLEMS
Looking back at the development of the calibration curves we can see two persistent problems. The first is that of precision outstripping accuracy and the second is related to this in terms of how we use the calibration curves to interpret chronologies.
Precision Outstripping Accuracy
If the different calibration curves generated over the years are compared to one another it is clear that shifts from one curve to the next are sometimes beyond the quoted uncertainty on the curves themselves, at least at 1σ (Figure 2). It is important to unpick some of the reasons for this because doing so helps to clarify how calibration curves should be interpreted.
As with measurement uncertainty, when it comes to the calibration curve itself and the quoted uncertainty, it is always necessary to work within the envelope of the existing state of understanding. This means that uncertainties can only reflect “known unknowns,” not “unknown unknowns” (Jackson 2012). The strategy that has been employed to deal with this has normally been to look for congruence between different lines of evidence. The NotCal04 publication (van der Plicht et al. 2004) discusses the reasons for limiting the IntCal04 calibration curve to 26k cal BP when there were older records available. Such a strategy, however, can never deal with all of the potential problems and there are a number of fundamental reasons for this.
The first thing to remember is that, particularly for the older parts of the time scale, there are only a very limited number of datasets available. This is due both to the scarcity of archives and to constraints in the available measurement capacity. As a result, calibration curves are inevitably constructed without being able to control for all possible sources of variation. Although this might not seem to be so much of a problem for the younger part of the timescale, here the resolution required is also greater, and for annual resolution records this still persists as a problem. In the update from IntCal13 to IntCal20 we can see changes (Figure 2) which could not have been predicted from the IntCal13 data because, at their temporal resolution, they were internally consistent.
The second important point is that the overall statistical model we use for calibration has some inbuilt limitations. We use the Gaussian standard uncertainty for both the data and the curve, whereas the true scatter and therefore our true uncertainty is more long-tailed; this means that errors which are appropriate at 1σ are unlikely to be realistic at 2σ, 3σ or higher. Another issue is the correlation between errors on both the radiocarbon and calendar timescales, which arises for reasons that are either not fully understood or not fully quantified.
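The consequence of a long-tailed error distribution can be illustrated numerically. The sketch below is purely illustrative: it assumes, for the sake of example, that the true scatter follows a Student's t distribution with 3 degrees of freedom (an arbitrary choice, not a model taken from the calibration literature), rescales it so that it agrees with the Gaussian at 1σ, and then compares the coverage of the nominal 2σ and 3σ ranges.

```python
import math


def normal_coverage(k: float) -> float:
    """Probability that a Gaussian variable lies within +/- k sigma."""
    return math.erf(k / math.sqrt(2.0))


def t3_coverage(x: float) -> float:
    """P(|T| < x) for Student's t with 3 degrees of freedom (closed form)."""
    u = x / math.sqrt(3.0)
    return (2.0 / math.pi) * (math.atan(u) + u / (1.0 + u * u))


def t3_unit_matching_one_sigma() -> float:
    """Bisection for the half-width at which t3 coverage equals Gaussian 1-sigma coverage."""
    target = normal_coverage(1.0)
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if t3_coverage(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


unit = t3_unit_matching_one_sigma()  # plays the role of "1 sigma" for the long-tailed case
for k in (1, 2, 3):
    print(
        f"{k} sigma: Gaussian coverage {normal_coverage(k):.4f}, "
        f"long-tailed coverage {t3_coverage(k * unit):.4f}"
    )
```

With this assumed distribution the two cases agree by construction at 1σ, while the nominal 2σ and 3σ ranges contain less probability than the Gaussian model implies.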
Of course, it would be possible to overcome these issues by quoting a substantially higher uncertainty in the curve itself, but this approach would have drawbacks. The most fundamental problem comes when the curve is used for statistical modeling where overestimation of uncertainty is just as problematic as underestimation. Even for simple calibrations we would be reducing our resolution everywhere in order to forestall problems in particular (as yet unidentified) periods.
For all these reasons it remains of paramount importance to view any calibration curve as a work in progress, representing the best estimate of the environmental variation in radiocarbon with the data currently available. It should always be expected that such estimates will improve over time and in some cases changes might be significant (van der Plicht et al. 2020).
Premature Interpretation of Chronologies
The most notable enduring theme seen through the history of radiocarbon dating is the set of problems arising from the mismatch between precision and accuracy. Figure 1 shows that, for the set of known-age material available, radiocarbon was accurate without calibration within the precision available to Libby. However, this was very quickly no longer true and Crowe (1958) shows both the problems with accuracy and the awareness of the scientific community that there was temporal variability in the levels of radiocarbon in the atmosphere. Despite this, the value of being able to date things independently using a scientific technique was sufficiently great that the rush to interpret archaeological chronologies, although useful in some instances, also led to conclusions which had to be subsequently revised (Renfrew 1973). The significant changes to chronologies developed in the early decades of the method are largely responsible for calibration being seen as a “bane” of the method.
Since the 1970s the situation has stabilized somewhat, but on a finer scale the same problems have continued to emerge. These can broadly be split into two categories: the use of uncalibrated dates for older time ranges before IntCal09 (Reimer et al. 2009) extended back to 50k cal BP; and very high resolution chronologies (sub-centennial) in the more recent past where more minor changes to the curve have been significant (see for example Taylor 2005 and van der Plicht et al. 2020).
SUCCESSFUL STRATEGIES
Although the need for calibration is certainly an additional complication when compared to purely radiometric dating methods, there have been a number of benefits both in terms of research avenues explored and of dating precision.
Dual Use of Datasets
From the first development of the technique, data collected on radiocarbon in the environment has been used both for chronological purposes and for studying the broader earth system.
Measurements on tree rings, which form the most important element of the calibration curve and are often measured primarily to enable calibration, also provide a high-resolution record which is useful for understanding solar processes and other direct influences on production rate; the importance of this was recognized early (Stuiver 1961) and still continues (Brehm et al. 2021). The regional coverage also allows us to study global processes, in particular the atmospheric and ocean linkages between the two hemispheres (Hogg et al. 2009; Turney et al. 2017).
The marine part of the carbon cycle is much more complex and so most marine paleorecord measurement programs have dual use in mind from the outset. Conceptually there are two main types of study: those which focus on relatively stable low latitude ocean settings (Bard et al. 1990; Hughen et al. 2004) and those which cover more variable higher latitude settings (such as Austin et al. 1995). The former are more useful for global calibration and the latter for understanding regional and temporal variation. Important though marine palaeodata have been for calibration in the older time range, their value for understanding global ocean processes has arguably been larger, starting from early in the development of the method (Broecker et al. 1960; Bard et al. 1990).
In addition to the palaeoarchives of radiocarbon, direct atmospheric measurements have become increasingly important for calibration as we have started to use radiocarbon as a dating method in the post-bomb period (see the latest iteration of the calibration curves in Hua et al. 2022).
Altogether there is a clear synergy with data collected for different primary purposes now providing a wealth of information, both for calibration (Reimer et al. 2020) and for studying solar and geoscientific processes (Heaton et al. 2021).
Adoption of Statistical Methods
One complication facing practitioners of radiocarbon, once the need for calibration had been established, was how the calibration should be done. While the radiocarbon date measured (for a particular reservoir) is a function of the age, the age is not a function of the radiocarbon date (Figure 3), with multiple possible values and inversions. The initial approach adopted used classical statistics to find the range of possible solutions (Stuiver and Reimer 1986). However, it was then realized that this particular problem was amenable to probabilistic approaches (van der Plicht 1993) including, if available, other information using a Bayesian framework (Buck et al. 1991).
The early adoption of Bayesian statistics within the radiocarbon dating community (now ubiquitous in many scientific fields) soon allowed a wide range of different models to be applied by the user community (Bronk Ramsey 2009). Many of these methods are also useful for other types of dating and it is interesting from a historical perspective that it was the need for calibration which stimulated their development for radiocarbon first.
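A minimal sketch of the probabilistic approach is given below. It is not the implementation used by any calibration program: the curve is a hypothetical straight line with a sinusoidal wiggle (not IntCal data), the prior on calendar age is flat, the measurement and curve uncertainties are assumed Gaussian and combined in quadrature, and all names and numerical values are illustrative.

```python
import math


def toy_curve(t: float) -> float:
    """Hypothetical calibration curve: 14C age (BP) as a function of calendar age (cal BP).

    A straight line plus a sinusoidal wiggle, chosen so that the curve contains
    inversions. This is NOT real calibration data.
    """
    return t + 60.0 * math.sin(2.0 * math.pi * t / 200.0)


def calibrate(r_meas, sigma_meas, sigma_curve, t_min, t_max):
    """Return {calendar year: probability} on a 1-year grid, assuming a flat prior."""
    variance = sigma_meas ** 2 + sigma_curve ** 2
    weights = {
        t: math.exp(-0.5 * (r_meas - toy_curve(t)) ** 2 / variance)
        for t in range(t_min, t_max + 1)
    }
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


def highest_density_years(posterior, level=0.95):
    """Smallest set of grid years whose summed probability reaches `level`."""
    chosen, mass = [], 0.0
    for t, p in sorted(posterior.items(), key=lambda item: item[1], reverse=True):
        chosen.append(t)
        mass += p
        if mass >= level:
            break
    return sorted(chosen)


# Assumed measurement: 3100 +/- 20 14C yr, with an assumed curve uncertainty of 10 yr.
posterior = calibrate(r_meas=3100.0, sigma_meas=20.0, sigma_curve=10.0, t_min=2800, t_max=3400)
years = highest_density_years(posterior)
print(f"95% range covers {len(years)} calendar years between {years[0]} and {years[-1]} cal BP")
```

Because the result is a probability distribution over calendar age rather than a single inverted value, it copes naturally with the several calendar ages that are compatible with one measurement on this toy curve.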
Even with good methods for calibration, however, we are left with the issue that a single measurement will often calibrate to a wide age range (typically a century or two) regardless of the precision. The reasons behind this are worth considering from a more fundamental perspective. If we define the time scale as t and the radiocarbon scale as r(t) then, for an uncertainty Δr in the radiocarbon measurement and a resulting uncertainty Δt in the calendar age, we can write:
$$\Delta t \approx \frac{\Delta r}{\left| dr/dt \right|}$$
which means that, to first order, the uncertainty in t should be lower if the gradient (dr/dt) is high and this is indeed the case. However, the steep parts of the curve where this helps for single samples are the exception, with plateaus (with a low gradient) dominating the time scale (Figure 3).
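The first-order relationship described here, in which the calendar uncertainty scales as the radiocarbon uncertainty divided by the magnitude of the gradient dr/dt, can be illustrated with a short sketch. The measurement uncertainty and the three gradient values are assumptions chosen only to contrast a plateau, a typical trend, and a rapid excursion.

```python
def calendar_uncertainty(sigma_r: float, gradient: float) -> float:
    """First-order calendar uncertainty: sigma_r / |dr/dt|."""
    if gradient == 0:
        raise ValueError("the first-order estimate breaks down where the curve is flat")
    return sigma_r / abs(gradient)


SIGMA_R = 20.0  # assumed measurement uncertainty in 14C yr

# Illustrative gradients in 14C yr per calendar yr (assumed values).
for label, gradient in (("plateau", 0.1), ("typical trend", 1.0), ("rapid excursion", 10.0)):
    sigma_t = calendar_uncertainty(SIGMA_R, gradient)
    print(f"{label:>15}: dr/dt = {gradient:4.1f} -> about {sigma_t:5.1f} calendar yr")
```

The estimate is local and breaks down on inversions and flat sections, where the full probabilistic calibration is needed.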
Use of High-Frequency Components of the Radiocarbon Signal
With multiple samples it becomes possible to make much better use of the information in the calibration curve. The most straightforward and indeed most powerful technique is to date samples with known age separation (typically tree rings as in Galimberti et al. 2004), but even less constraining models (Bronk Ramsey 2009) can regain much of the resolution lost in the calibration process. Essentially models like this are able to make use of higher frequency components of the radiocarbon signal and of the higher gradients (dr/dt) associated with them.
The discovery of sporadic spikes in radiocarbon production, most likely of solar origin, showed that the gradient in the curve can locally be higher by an order of magnitude than the usual trend (Miyake et al. 2012, and Figure 4). These very high frequency components can be used to date events to a single year (as shown in principle in Wacker et al. 2014 and in practice in Kuitems et al. 2021).
These uses of the high-frequency components of the radiocarbon signal enable us to measure ages with a 95% probability range an order of magnitude shorter than our raw measurement precision. That is only possible because of the presence of high-gradient sections in the calibration curve; it would not have been possible if radiocarbon had been a purely radiometric technique.
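The gain from dating samples with known age separation can be sketched as follows. This is a toy illustration rather than any published wiggle-matching implementation: the curve is a hypothetical line with a sinusoidal wiggle (not IntCal data), the simulated measurements are noise-free, each is given an assumed independent Gaussian uncertainty of 20 14C yr, and the ring offsets are arbitrary.

```python
import math


def toy_curve(t: float) -> float:
    """Hypothetical calibration curve (14C yr BP against cal yr BP); NOT real data."""
    return t + 60.0 * math.sin(2.0 * math.pi * t / 200.0)


def posterior_for_outer_ring(measurements, offsets, sigma, t_min, t_max):
    """Posterior for the calendar age of the outermost ring on a 1-year grid.

    Each inner ring is older than the outer ring by a known offset, so the
    likelihoods of all measurements are multiplied (log-likelihoods summed).
    """
    log_weights = {
        t: sum(
            -0.5 * ((r - toy_curve(t + offset)) / sigma) ** 2
            for r, offset in zip(measurements, offsets)
        )
        for t in range(t_min, t_max + 1)
    }
    peak = max(log_weights.values())
    weights = {t: math.exp(lw - peak) for t, lw in log_weights.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


def highest_density_years(posterior, level=0.95):
    """Smallest set of grid years whose summed probability reaches `level`."""
    chosen, mass = [], 0.0
    for t, p in sorted(posterior.items(), key=lambda item: item[1], reverse=True):
        chosen.append(t)
        mass += p
        if mass >= level:
            break
    return sorted(chosen)


TRUE_OUTER_RING = 3100          # assumed true age of the outermost ring (cal BP)
OFFSETS = (0, 20, 40, 60, 80)   # assumed known ring-count offsets in years
SIGMA = 20.0                    # assumed uncertainty per measurement (14C yr)
measurements = [toy_curve(TRUE_OUTER_RING + o) for o in OFFSETS]  # noise-free simulation

single = posterior_for_outer_ring(measurements[:1], OFFSETS[:1], SIGMA, 2800, 3400)
sequence = posterior_for_outer_ring(measurements, OFFSETS, SIGMA, 2800, 3400)
single_years = highest_density_years(single)
sequence_years = highest_density_years(sequence)
print(f"single sample: 95% range covers {len(single_years)} calendar years")
print(f"five-ring sequence: 95% range covers {len(sequence_years)} calendar years")
```

In this toy case the sequence samples different parts of the wiggle, so only a narrow range of calendar ages fits all five measurements at once, even though each measurement alone is compatible with a much wider range.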
It is perhaps useful to consider this in relation to the dendrochronological dating method. There the measurement of a single ring (width or stable isotope) gives us no information on age at all and it is the high frequency structure which provides the precision. On the other hand a method like U-Th dating relies purely on the low-frequency decay signal. If we compare all three methods (Figure 5) we can appreciate that, because of the nature of the calibration process, radiocarbon is unusual in making use of a wide frequency spectrum and this helps to ensure robustness within the method.
REMAINING CHALLENGES
Despite the progress that has been made in using calibrated radiocarbon dates for high-precision dating, there are clearly many challenges which remain, which are the topic of other papers presented in these conference proceedings. Many of these difficulties arise because, as the precision of the measurements improves, the temporal and spatial heterogeneity seen in radiocarbon throughout the carbon cycle becomes more significant.
As discussed above, there is an ever-increasing need for high precision and accuracy because these are essential for making maximum use of all of the approaches now available. In terms of measurements for the calibration curve, we remain limited by both the availability of archives and our capacity to measure them.
There are also other challenges which are more fundamental in nature. Dating single samples continues to deliver limited chronological precision unless, like long-lived wood samples, they can be subsampled. Our knowledge of which reservoir or reservoirs the carbon in our samples comes from is a particular problem for the dating of human remains but more generally reservoirs such as rivers and lakes are often poorly understood. Furthermore, the oceans, while studied in greater detail, vary spatially and temporally in ways which are very difficult to quantify fully (Heaton et al. 2020). Even in the atmosphere there is still much work to be done to fully understand variation in radiocarbon levels in the polar and tropical regions, both spatially and throughout the seasonal cycle.
CONCLUSIONS
In some ways the impact of the requirement for radiocarbon calibration has come full circle. Initially it was certainly an unwanted complication and shifts in the inferred chronologies for users of the method caused confusion. But calibration now provides us with methods to use radiocarbon as a dating tool with a potential resolution far higher than if it had not been needed. However, these advantages do not apply to all sample types and there is certainly much more work to do to make maximum use of the information embedded within the radiocarbon calibration records.
What the past research on this topic shows very clearly, however, is the way in which scientific research and data intended for one purpose often have an impact in a much broader range of scientific disciplines. The radiocarbon calibration records are exceptional in their range of applications both for chronology and for understanding of the earth and solar systems. It seems likely that the work needed to meet the remaining challenges in these areas will have equally broad implications, which is why radiocarbon remains such an interesting and exciting field for research more than 70 years since its inception.