Non-technical Summary
The amount of DNA in a plant’s genome—its genome size—is thought to be an important parameter for controlling where plants can grow, how well they compete with each other, and how susceptible they are to going extinct. Genome size also varies enormously across modern plants, but there is considerable uncertainty over how this diversity developed, and how important genome size was for controlling plant evolution and extinction in the geological past. Because we cannot directly measure genome size in plant fossils, we need genome size proxies—other traits we can measure that can stand in for genome size in fossil specimens. These proxies need to have a close relationship with genome size to be used successfully, and in this paper, we test two possible proxies for plant genome size. One of these is sporomorph (pollen and spore) size, and the other is the length of guard cells, which form the borders of stomatal pores in plant leaf surfaces. We work with living plants from a botanical garden, comparing pollen size and guard cell length with genome size, and expand the analysis to sporomorph size from a wider range of plant groups using data taken from the published literature. We show that sporomorph size has a complicated but mostly weak relationship with genome size, suggesting that it is a poor proxy to use in the fossil record. Stomatal guard cell length has a much stronger relationship with genome size, with the potential to provide accurate genome size estimates, although further work is needed before this can be used confidently as a proxy.
Introduction
Genome size (GS), the amount of DNA in a cell nucleus, varies by more than three orders of magnitude across the land plants (Pellicer et al. 2018; Leitch et al. 2019). GS variations within and among species have been linked to differences in life strategy, growth rate, plant and organ size, habitat preference, geographic range, invasiveness, and diversification rate (Knight and Ackerly 2002; Knight et al. 2005; Beaulieu et al. 2008; Hodgson et al. 2010; Manzaneda et al. 2012; Veselý et al. 2013, 2020; Henry et al. 2015; Puttick et al. 2015; Simonin and Roddy 2018; Guignard et al. 2019; Roddy et al. 2020; Wang et al. 2022; Fujiwara et al. 2023).
Although the specific mechanisms behind these correlations are debated, GS has been linked to cell size. This has knock-on effects for key traits such as stomatal size and density, vein density, and cell packing density in general, which in turn control stomatal conductance and transpiration, and therefore photosynthetic rates and carbon gain, susceptibility to water stress, and how well plants can grow and compete in different environments (Knight et al. 2005; Beaulieu et al. 2008; Hodgson et al. 2010; Franks et al. 2012; Brodribb et al. 2013; Lomax et al. 2014; Henry et al. 2015; McElwain et al. 2016; Simonin and Roddy 2018; Guignard et al. 2019; Roddy et al. 2020; Veselý et al. 2020; Théroux-Rancourt et al. 2021). Critically, the smaller genome sizes of many angiosperms relative to gymnosperms and ferns, and the higher maximum rates of photosynthetic carbon gain that this enables, are thought to have been key factors in the rapid diversification and rise to ecological dominance of angiosperms in the face of declining atmospheric pCO₂ during the last ~110 Myr (Leitch and Leitch 2012; Brodribb et al. 2013; Henry et al. 2015; Clark et al. 2016; Dodsworth et al. 2016; McElwain et al. 2016; Pellicer et al. 2018; Simonin and Roddy 2018; Roddy et al. 2020; Théroux-Rancourt et al. 2021). Larger genomes are also more costly to construct and maintain, and the lower surface area to volume ratio of larger cells is thought to limit cellular processes and slow stomatal responses (Drake et al. 2013; Hidalgo et al. 2017). Conversely, increases in DNA content through polyploidization/whole-genome duplication (WGD), with concomitant increases in GS, are hypothesized to have prompted diversification and morphological innovation and to have conferred greater resilience to extinction, although it has been challenging to find consistent support for these expectations across plant clades and datasets (Fawcett et al. 2009; Mayrose et al. 2011; Leitch and Leitch 2012; Tank et al. 2015; Clark and Donoghue 2018; Landis et al. 2018; Clark et al. 2019; Porturas et al. 2019).
Most of this information has been gleaned from analyzing GS in extant taxa and using molecular phylogenies to reconstruct GS and WGD through time (e.g., Soltis et al. 2003; Hidalgo et al. 2010; Henry et al. 2015; Puttick et al. 2015; Tank et al. 2015; Clark et al. 2016; Landis et al. 2018; Pellicer et al. 2018; Wang et al. 2022; Fujiwara et al. 2023). However, the only empirical record of plant evolutionary history, including the ecological and evolutionary dynamics of extinct lineages and the responses of plants to mass extinctions and to long-term climatic and environmental change, comes from the fossil record (Taylor et al. 2009; McElwain and Steinthorsdottir 2017; Jordan et al. 2020). A comprehensive understanding of GS variation across the land plant phylogeny therefore requires GS data from fossils as well as from living plants. This includes the timing and evolutionary consequences of GS increases (i.e., through WGD) and of genome downsizing within lineages (Masterson 1994; Clark and Donoghue 2018; Clark 2023), the role of GS and WGD in controlling survivorship across mass extinctions and success in postextinction radiations (Fawcett et al. 2009), and whether GS helped shape plants’ responses to past climatic and environmental upheavals (Hodgson et al. 2010; Franks et al. 2012; Lomax et al. 2014; Simonin and Roddy 2018).
While GS cannot be measured directly from fossils, it can be estimated using relevant morphological traits. For example, rapid permineralization of Jurassic royal ferns (Osmundaceae) preserved measurable cell nuclei that matched those of extant relatives in size, suggesting relative GS stability in this family through time (Bomfleur et al. 2014). Such preservation is exceptionally rare, however, and GS estimation from fossils has mostly relied on cell size measurements, which, as noted earlier, are expected to scale with GS. In particular, stomatal guard cell length (GCL) correlates strongly with GS and is measurable from fossil leaves and cuticles; GCL measurements have therefore been used to reconstruct changes in (relative) GS through time and across taxa (Masterson 1994; Beaulieu et al. 2008; Franks et al. 2012; Lomax et al. 2014; McElwain and Steinthorsdottir 2017; Simonin and Roddy 2018; Clark et al. 2019). This approach has shown that plant GS has varied considerably over the last ~400 Myr, and supports a scenario of genome downsizing in Cretaceous angiosperms relative to contemporary gymnosperms and ferns (Franks et al. 2012; Lomax et al. 2014; Simonin and Roddy 2018).
Despite these findings, uncertainties remain over the use of GCL as a GS proxy. For example, it is not clear how much the GS–GCL relationship can be modified by evolutionary adaptation to the physical environment (Jordan et al. 2015; McElwain and Steinthorsdottir 2017; Roddy et al. 2020; Veselý et al. 2020), although environmentally driven intraspecific phenotypic plasticity in GCL is thought to be minimal (Lomax et al. 2009; Jordan et al. 2015). Fossil leaves and cuticles are also mostly derived from wetland environments and woody plants, and so provide a biased view of vegetation change through time, with dryland and herbaceous taxa in particular underrepresented (Taylor et al. 2009; Lomax et al. 2014; McElwain and Steinthorsdottir 2017; Veselý et al. 2020).
An additional source of data for studying plant evolution is the fossil sporomorph (pollen and spore) record. Sporomorphs are produced by plants in vast quantities, can be transported over large distances, and preserve well in the fossil record because of their resistant sporopollenin wall (exine) (Traverse 2007). As a result, sporomorphs occur in a greater range of sedimentary environments, and in greater quantities, than other plant remains, and they provide some of the earliest records of major plant clades (Traverse 2007). Sporomorphs therefore represent an abundant and widespread record of past vegetation, and because sporomorph size is straightforward to measure in both extant and fossil specimens, they could provide a rich and independent archive of GS data.
As with plant macrofossils, the fossil sporomorph record has its own intrinsic biases. These include the often-low taxonomic resolution obtainable from sporomorph morphology, with fossil sporomorphs in particular being challenging to split into biologically meaningful taxa; extremely variable sporomorph production among species, with some taxa (e.g., wind-pollinated plants and some spore-producing plants) releasing vast quantities of sporomorphs and others (e.g., insect-pollinated plants) releasing relatively few; and the high transport potential of sporomorphs, meaning that the source areas of assemblages can be large and challenging to constrain (Traverse 2007). However, because the properties and biases of the sporomorph and plant macrofossil records are complementary, together they could provide a broad understanding of GS dynamics through plant evolutionary history, if both can be shown to contain usable GS information.
Previous work, however, has provided mixed evidence for a correlation between GS and sporomorph size. While some studies have reported a positive relationship between the two (Huang et al. 2006; De Storme et al. 2013; Henry et al. 2015; Jan et al. 2015; Barrington et al. 2020), others have found the relationship to be weak or nonexistent, especially once phylogeny is accounted for (Knight et al. 2010; Dyer et al. 2013; Coulleri et al. 2014; Wang et al. 2022). To some extent, this discrepancy can be explained by taxonomic scale: some studies reporting a positive relationship focused on ploidy series within species or on GS variation among closely related species (e.g., Huang et al. 2006; De Storme et al. 2013; Barrington et al. 2020), whereas studies reporting no significant relationship mostly worked across broader taxonomic scales (e.g., Knight et al. 2010; Coulleri et al. 2014; Wang et al. 2022), although this pattern does not hold in all cases (Dyer et al. 2013; Henry et al. 2015). Knight et al. (2010) suggested that pollen size (PS) and GS might be decoupled because of selective pressure for pollen to be small, but this hypothesis has yet to be explicitly tested. It is also not known how far the negative results reported by some studies (e.g., Knight et al. 2010; Coulleri et al. 2014; Wang et al. 2022) stem from compiling data from the literature rather than assembling entirely new datasets, especially given the effects of different processing approaches and storage and slide-mounting media on sporomorph size (Christensen 1946; Andersen 1960; Reitsma 1969; Wei et al. 2023) and the potential for intraspecific size variation in sporomorphs (Ejsmond et al. 2011; Wei et al. 2023).
Here, we address these issues by assessing the potential of sporomorph size and GCL as paleo-GS proxies through three sets of analyses (Table 1).
Table 1. Analyses and datasets used in this study. Numbers of plants and measurements (stomatal guard cell length measurements for leaves, otherwise pollen size) are given only for specimens collected from the University of Münster Botanical Garden; other data are taken from the literature and represent multiple studies and sampling protocols.

Analysis 1
We analyze angiosperm PS, GCL, and GS data, with the PS and GCL data generated from the same plants growing in one location in order to (1) eliminate environmental variation as a control on plant morphological traits and (2) provide a direct comparison of the strength of the GS–GCL and GS–PS relationships. We also fit evolutionary models to these data to directly compare the underlying evolutionary dynamics of GS, GCL, and PS, and to test the hypothesis of Knight et al. (2010) that pollen is specifically selected to be small.
Analysis 2
We analyze sporomorph size data for ferns, Pinaceae, Poaceae, and Asteraceae, using both data from the literature and new measurements, taking into account processing procedures and storage and slide-mounting media for the literature compilation. We do this to better understand the consistency of the relationship between sporomorph size and GS, both at different taxonomic ranks and across clades, in order to reconcile the mixed evidence for the strength of this relationship reported by studies to date.
Analysis 3
We assess the power of GCL to accurately predict angiosperm GS using published GCL data as an independent validation dataset. Because the overall positive relationship between GCL and GS is already established, we do this to take the paleo-GS proxy to the next stage of development: validation with independent data (i.e., out-of-sample prediction) to evaluate how accurate GS estimates are likely to be for new data that are not part of the calibration process.
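The logic of out-of-sample validation can be sketched briefly. This is a minimal illustration only, assuming a simple ordinary least squares calibration of log10(GS) on log10(GCL); the function names and toy numbers are hypothetical and are not the study's code or data. The key point is that the fit uses calibration taxa only, and the error is measured on held-out taxa.

```python
import math
from statistics import mean


def fit_ols(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def out_of_sample_error(calib_gcl, calib_gs, valid_gcl, valid_gs):
    """Calibrate log10(GS) ~ log10(GCL) on one dataset and score it on another.

    Returns the root-mean-square prediction error in log10 units and the
    equivalent typical multiplicative (fold) error on the original GS scale.
    """
    a, b = fit_ols([math.log10(g) for g in calib_gcl],
                   [math.log10(s) for s in calib_gs])
    # Residuals come only from validation taxa that were not used in the fit.
    residuals = [a + b * math.log10(g) - math.log10(s)
                 for g, s in zip(valid_gcl, valid_gs)]
    rmse = math.sqrt(mean(r * r for r in residuals))
    return rmse, 10 ** rmse


# Hypothetical toy data (GCL in um, GS in pg) following GS = 0.01 * GCL**2 exactly.
rmse, fold = out_of_sample_error([10, 20, 40], [1, 4, 16], [30], [9])
print(f"RMSE = {rmse:.3g} log10 units, fold error = {fold:.3f}")
```

Reporting the error as a fold value makes it easy to interpret: a fold error of 2, for instance, means predictions are typically off by a factor of two in either direction.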
Materials and Methods
Analysis 1: Angiosperm GCL and PS Data
Sample Collection
Flowers/anthers and leaves were collected from plants in the University of Münster Botanical Garden, Münster, Germany (51.964°N, 7.610°E), except for elder (Sambucus nigra), hazel (Corylus avellana), and hornbeam (Carpinus betulus), which were sampled from other locations in Münster (Supplementary Table 1). Sampling was mostly carried out between March and September 2021, with some additional sampling in March and April 2022. The sampling strategy aimed to obtain both a broad phylogenetic range of species and a broad range of pollen sizes, using the literature (Beug 2004; Knight et al. 2010) as a guide to PS variation. Sampling was limited to species with published GS information, using the Kew C-values database (Leitch et al. 2019) as a source; the taxonomic authorities given in both the botanical garden species labels and the C-values database were used to ensure taxonomic consistency.
We aimed to sample three individual plants per species, which was achieved for 49 of the 61 species sampled (80.3%); in the remaining cases, only two plants (7/61 = 11.5%) or one plant (5/61 = 8.2%) were available (Supplementary Table 1). For each plant, three leaves were collected, and pollen was sampled from several flowers and/or anthers (depending on flower morphology and the number of flowers per plant). While data were collected for each leaf separately (see “GCL Data Generation”), the pollen collected from each plant was pooled to ensure sufficient material for analysis.
PS Data Generation
Following collection, the flowers, anthers, or pollen were air dried and then acetolyzed to isolate the pollen wall. Although acetolysis has been shown to increase the size of sporomorphs, the effect is thought to be minimal after several minutes of treatment (Christensen 1946; Reitsma 1969). Acetolysis was carried out for 5 minutes in a water bath at 90°C, using a 9:1 mixture of ≥99% acetic anhydride ([CH₃CO]₂O) and 96% sulfuric acid (H₂SO₄), after which the samples were topped up with 100% glacial acetic acid (CH₃COOH), centrifuged for 5 minutes at 2500 rpm, and decanted. The samples were then washed twice in deionized water and twice in isopropyl alcohol ([CH₃]₂CHOH), with centrifugation and decanting after each step (Whitney and Needham 2014). Finally, the residue was mixed with silicone oil, the remaining isopropyl alcohol was left to evaporate, and the samples were mounted on slides. Silicone oil was used instead of glycerol to prevent the pollen grains from swelling between processing and measurement (Andersen 1960; Wei et al. 2023).
We aimed to measure 30 pollen grains per sample; however, in a minority of cases (11 of 166 plants sampled = 6.6%), fewer than 30 grains were available (Supplementary Table 1). To generate the PS data, we measured the longest axis of the pollen grains of each species. For species with prolate pollen (e.g., Acanthus mollis), this was the polar axis, measured in equatorial view; for species with oblate pollen (e.g., C. avellana), it was the equatorial axis, measured in polar view; in both cases, obliquely oriented (i.e., tilted) grains were avoided. For species with more or less spherical pollen (e.g., Cucurbita pepo), the grain diameter was measured regardless of orientation. Measurements were taken from the outer surface of the exine, excluding any surface sculptural elements such as echinae and clavae. Measurement data were generated using a Leica DM LB 2 microscope fitted with a Leica DFC480 camera and Leica QWin Standard v. 3.5.1 software.
GCL Data Generation
To generate GCL data, we made nail varnish peels following the procedure of Porter et al. (2019). First, leaf surface impressions were made using Coltene President The Original light body dental putty on the same day the leaves were collected, to limit changes in cell size linked to leaf dehydration. The dental putty was applied to the leaf surface (abaxial, except for the pond lily Nuphar lutea, which has exclusively adaxial stomata), left for 30 minutes to set, and then peeled off. To make the nail varnish peels, we used Sally Hansen Miracle Gel top coat, which has worked well in previous analyses (Porter et al. 2019). This was applied evenly to the dental putty impression of the leaf surface, left to dry for 30 minutes, and then peeled off and mounted on a slide. Measurements were carried out using the same microscope and software as for the PS data (see “PS Data Generation”). We measured 10 guard cells per leaf, giving 30 GCL measurements per plant and 30 to 90 GCL measurements per taxon (Supplementary Table 1).
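The nested sampling design (10 guard cells per leaf, three leaves per plant, up to three plants per taxon) can be summarized with a short sketch. It assumes that each taxon mean is taken over all pooled guard cell measurements, which is one reasonable reading of the taxon-level means used in the analyses; a mean of plant means would instead weight plants equally. The record layout and function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def taxon_means(records):
    """Pooled mean GCL and measurement count per taxon.

    `records` is an iterable of (taxon, plant_id, leaf_id, gcl_um) tuples
    (a hypothetical layout). All guard cells of a taxon are pooled, so
    three plants x three leaves x 10 cells contribute 90 values.
    """
    pooled = defaultdict(list)
    for taxon, _plant, _leaf, gcl_um in records:
        pooled[taxon].append(gcl_um)
    return {taxon: (mean(vals), len(vals)) for taxon, vals in pooled.items()}


# Hypothetical example: one plant, three leaves, 10 guard cells per leaf.
demo = [("Taxon A", 1, leaf, 20.0 + leaf) for leaf in range(1, 4) for _ in range(10)]
print(taxon_means(demo))  # {'Taxon A': (22.0, 30)}
```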
Seven of the 61 species sampled for leaves had sunken guard cells, so the full length of the guard cell could not be measured (Supplementary Table 2). Rather than clearing and staining the leaves or processing them to isolate the epidermis, both of which may have affected cell size relative to the nail varnish peel-based measurements, we estimated GCL for these taxa from stomatal pore length (PL). Different relationships have previously been used to scale from PL to GCL or vice versa. For example, McElwain and Steinthorsdottir (2017) assumed that GCL = 1.5 × PL, while Franks and Beerling (2009) and Simonin and Roddy (2018) used GCL = 2 × PL. Franks et al. (2014) provided a range of geometric ratios to convert GCL to PL, including different scaling factors for non-Poaceae angiosperms with GCL <30 μm (PL/GCL = 0.3; therefore GCL = 3.33 × PL) and >30 μm (PL/GCL = 0.6; therefore GCL = 1.67 × PL). We checked these scaling factors with our own material, generating both PL and GCL data from the closest relatives of the taxa with sunken stomata in our dataset (Supplementary Table 2), but found a relationship that differed from previous estimates, especially those of Franks et al. (2014), and that was best described by a continuous function rather than a threshold at 30 μm (Supplementary Fig. 1A). We therefore regressed GCL onto PL to derive a new scaling relationship from these data. While a general scaling of GCL = 1.36 × PL provided a good estimate of GCL, the quadratic relationship GCL = 1.71 × PL − 0.01 × PL² provided a better fit (corrected Akaike information criterion [AICc] of 1278 vs. 1394 for the quadratic and linear relationships, respectively; Supplementary Fig. 1B), and we used this to calculate GCL for the seven taxa with sunken stomata in our dataset (Supplementary Fig. 2A).
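The published and newly fitted PL-to-GCL conversions described above can be compared side by side. The sketch below hard-codes the rounded coefficients as reported in the text; the function and method names are illustrative, and the threshold-based ratios of Franks et al. (2014) are omitted because their 30 μm threshold is defined on GCL, the quantity being estimated.

```python
def gcl_from_pl(pl_um, method="quadratic"):
    """Estimate guard cell length (GCL, um) from stomatal pore length (PL, um)."""
    if method == "quadratic":   # this study's best-fitting relationship (lower AICc)
        return 1.71 * pl_um - 0.01 * pl_um ** 2
    if method == "linear":      # this study's general linear scaling
        return 1.36 * pl_um
    if method == "x1.5":        # McElwain and Steinthorsdottir (2017)
        return 1.5 * pl_um
    if method == "x2":          # Franks and Beerling (2009); Simonin and Roddy (2018)
        return 2.0 * pl_um
    raise ValueError(f"unknown method: {method!r}")


# Compare the conversions for a hypothetical pore length of 20 um.
for m in ("quadratic", "linear", "x1.5", "x2"):
    print(m, round(gcl_from_pl(20.0, m), 2))
```

Because the fitted quadratic turns over at PL = 1.71 / (2 × 0.01) = 85.5 μm, it should not be extrapolated far beyond the range of pore lengths used to fit it.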
GS Data, Pollination Syndrome, and Phylogeny Estimation
GS data were harvested from the Kew C-values database (Leitch et al. 2019), using the prime estimate (i.e., the C-value identified in the database as the preferred value for each species) in each case. Four species in our dataset were represented by two ploidy levels in the C-values database, each with an associated GS value (Supplementary Table 3). Because the choice of GS value for these species made no substantial difference to the results, we used the lower GS values in our main analyses and included the results based on the higher GS values in the Supplementary Material (Supplementary Fig. 2). We also compiled pollination syndrome (animal, wind, or selfing) data from the TRY plant trait database (Kattge et al. 2020), supplemented with data from the literature as necessary (Supplementary Table 3). The pollination syndrome data are, however, highly unbalanced: 68/84 (81%) of the species in our dataset are primarily animal pollinated, with just 8 (9.5%) wind pollinated and 8 (9.5%) primarily self-pollinating. We therefore did not include pollination syndrome as an explanatory variable in our PS analyses.
The phylogeny was generated using the V.PhyloMaker2 package (Jin and Qian 2019, 2022) for R (R Core Team 2022) (see “Software and Reproducibility” for full version information), with the GBOTB.extended.LCVP.tre megatree as the backbone phylogeny. This approach identifies taxa in the dataset species list that are already present in the backbone megatree, binds in new species based on genus or family membership, and then prunes the megatree so that only taxa in the dataset species list remain (for full details, see Jin and Qian 2019, 2022). Before generating the phylogeny, we used the lcvplants R package (Freiberg et al. 2020) to harmonize the taxonomy of the PS, GCL, and GS data according to the Leipzig Catalogue of Vascular Plants (LCVP) nomenclature, making it consistent with that of the megatree (Supplementary Table 3).
Data Analysis
We calculated the mean PS and GCL for each taxon for use in subsequent analyses. Because the PS, GCL, and GS data were all highly skewed (Supplementary Fig. 3), we log10-transformed them before analysis. We measured the phylogenetic signal in each variable using two complementary metrics: Pagel’s λ (Pagel 1999) and Blomberg’s K (Blomberg et al. 2003). For both metrics, a value of 0 indicates phylogenetic independence (no phylogenetic signal), and a value of 1 means that the covariation of trait values among species matches that expected under a Brownian motion model of trait evolution (high phylogenetic signal); Blomberg’s K can exceed 1, indicating stronger phylogenetic signal than expected under Brownian motion (Münkemüller et al. 2012).
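To make the meaning of Blomberg’s K concrete, the sketch below computes it directly from its definition: the observed ratio MSE0/MSE (trait variance ignoring versus accounting for the phylogeny) divided by the value of that ratio expected under Brownian motion, given a phylogenetic covariance matrix C. It is a plain-Python illustration on two hypothetical toy trees, not the software used in the study, and it omits significance testing (commonly done by permuting trait values across tips). On a star phylogeny K is 1 whatever the trait values; when sister taxa are identical, K exceeds 1.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def blomberg_k(x, C):
    """Blomberg's K for trait values x given phylogenetic covariance matrix C.

    C[i][j] is the branch length shared by taxa i and j from the root
    (C[i][i] is the root-to-tip length of taxon i).
    """
    n = len(x)
    sum_c_inv = sum(solve(C, [1.0] * n))           # 1' C^-1 1
    a_hat = sum(solve(C, x)) / sum_c_inv           # GLS estimate of the root state
    e = [xi - a_hat for xi in x]
    mse0 = sum(ei * ei for ei in e) / (n - 1)      # ignores the phylogeny
    mse = sum(ei * ci for ei, ci in zip(e, solve(C, e))) / (n - 1)  # phylogeny-corrected
    expected = (sum(C[i][i] for i in range(n)) - n / sum_c_inv) / (n - 1)
    return (mse0 / mse) / expected


star = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pairs = [[2.0, 1.0, 0.0, 0.0], [1.0, 2.0, 0.0, 0.0],
         [0.0, 0.0, 2.0, 1.0], [0.0, 0.0, 1.0, 2.0]]
print(blomberg_k([1.0, 2.0, 4.0], star))        # star tree: K = 1 (up to rounding)
print(blomberg_k([1.0, 1.0, 3.0, 3.0], pairs))  # identical sister taxa: K = 1.8 (up to rounding)
```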
We used linear regression to assess the relationships between variables. Preliminary model fitting with ordinary least squares (OLS) regression revealed significant phylogenetic autocorrelation in the model residuals (assessed using Pagel’s λ and Blomberg’s K), which is expected given the shared evolutionary history of the taxa in the dataset (Grafen 1989; Pagel 1999). We therefore used phylogenetic generalized least squares (P-GLS) regression (Grafen 1989) in addition to OLS regression. In the P-GLS models, we used the corPagel correlation structure from ape (Paradis and Schliep 2019), which estimates the strength of the phylogenetic signal in the residuals using Pagel’s λ and thus allows it to vary with the data. The P-GLS model for the regression of PS onto GS failed to converge with this structure, however, so for that model we used the corBrownian correlation structure, which assumes a Brownian motion model of evolution for the residuals (equivalent to λ = 1). We evaluated the explanatory power of the P-GLS models using the R²pred measure of Ives (2019), and that of the OLS models using standard R².
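The core of a P-GLS fit can be written out in a few lines: Pagel’s λ multiplies the off-diagonal (shared-history) elements of the phylogenetic covariance matrix C, and the coefficients are the generalized least squares solution β = (XᵀV⁻¹X)⁻¹XᵀV⁻¹y with V the λ-scaled matrix. In this sketch λ is supplied by the user rather than estimated by maximum likelihood as the corPagel structure does; λ = 1 gives the Brownian (corBrownian-type) weighting, and λ = 0 on a tree with equal root-to-tip lengths reduces to OLS. The function names and toy data are hypothetical.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def pagel_lambda(C, lam):
    """Multiply off-diagonal covariances by lambda; tip variances are unchanged."""
    n = len(C)
    return [[C[i][j] if i == j else lam * C[i][j] for j in range(n)] for i in range(n)]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def pgls(x, y, C, lam=1.0):
    """GLS estimates (intercept, slope) for y = a + b * x with V = lambda-scaled C."""
    V = pagel_lambda(C, lam)
    ones = [1.0] * len(x)
    v1, vx, vy = solve(V, ones), solve(V, x), solve(V, y)  # V^-1 applied to each column
    # Normal equations (X' V^-1 X) beta = X' V^-1 y for the design matrix X = [1, x].
    s11, s1x, sxx = dot(ones, v1), dot(ones, vx), dot(x, vx)
    s1y, sxy = dot(ones, vy), dot(x, vy)
    det = s11 * sxx - s1x * s1x
    return (sxx * s1y - s1x * sxy) / det, (s11 * sxy - s1x * s1y) / det


pairs = [[2.0, 1.0, 0.0, 0.0], [1.0, 2.0, 0.0, 0.0],
         [0.0, 0.0, 2.0, 1.0], [0.0, 0.0, 1.0, 2.0]]
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
print(pgls(x, y, pairs, lam=0.0))  # OLS fit here: intercept ~0.15, slope ~1.94
print(pgls(x, y, pairs, lam=1.0))  # Brownian-motion weighting
```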
We fit three models of trait evolution to the log-transformed PS, GCL, and GS data: Brownian motion (BM), Ornstein-Uhlenbeck (OU), and early burst (EB), which represent neutral drift, evolution toward selective optima, and rapid early evolution followed by slowing evolutionary rates, respectively (Hansen 1997; Butler and King 2004; Harmon et al. 2010). If PS is selected to be small, as hypothesized by Knight et al. (2010), then we would expect an OU process to be the best-fitting model for PS, with PS pulled toward an adaptive optimum at the lower end of the size spectrum, whereas GCL and GS would be better fit by either BM or EB models. We ranked the models for each variable using AICc, which balances model fit against model complexity and corrects for small sample sizes (Anderson 2008).
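AICc-based ranking of BM, OU, and EB fits can be illustrated as below. The sketch assumes the usual parameter counts for these models (BM: rate and root state; OU and EB: one additional parameter each) and uses made-up log-likelihoods purely for illustration; in practice the log-likelihoods come from fitting each model to the tree and trait data. Akaike weights are included as one common way to express relative support; the function names are hypothetical.

```python
import math


def aicc(log_lik, k, n):
    """Small-sample corrected AIC: -2 lnL + 2k + 2k(k + 1) / (n - k - 1)."""
    return -2.0 * log_lik + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)


def rank_models(fits, n):
    """Sort models by AICc; `fits` maps name -> (log-likelihood, n_params).

    Returns (name, AICc, delta AICc, Akaike weight) tuples, best model first.
    """
    scores = {name: aicc(ll, k, n) for name, (ll, k) in fits.items()}
    best = min(scores.values())
    rel = {name: math.exp(-0.5 * (s - best)) for name, s in scores.items()}
    total = sum(rel.values())
    rows = [(name, s, s - best, rel[name] / total) for name, s in scores.items()]
    return sorted(rows, key=lambda row: row[1])


# Hypothetical log-likelihoods for one trait measured on 60 taxa.
fits = {"BM": (12.0, 2), "OU": (15.5, 3), "EB": (12.0, 3)}
for name, score, delta, weight in rank_models(fits, n=60):
    print(f"{name}: AICc = {score:.2f}, dAICc = {delta:.2f}, weight = {weight:.2f}")
```

With equal log-likelihoods, the extra parameter alone makes EB rank below BM, which is the complexity penalty that AICc is designed to apply.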
Analysis 2: Clade-Level Sporomorph Data
Sporomorph size data were compiled from the literature for three focal groups: Polypodiophyta (ferns), Pinaceae (pine family), and Poaceae (grass family) (Table 1, Supplementary Tables 4–9). These groups were chosen because they are all common in the fossil sporomorph record (Traverse 2007). The ferns and the pines broaden the analyses beyond angiosperms, and grasses have been the focus of past attempts to extract paleoecologically relevant information from PS (e.g., Schüler and Behling 2010; Jan et al. 2015; Wei et al. 2023), although without a broad-scale analysis of GS variation.
The fern spore data were taken from the compilation of Leslie and Bonacorsi (2022) and represent size-range midpoints for 1211 extant fern species, with most data derived from Tryon and Lugardon (1991). Most species in the compilation are homosporous, with the size measurements representing isospores, while 16 species (1.3%) are heterosporous, with both microspore and megaspore size data included. Bisexual isospores and male microspores are not directly comparable in terms of size distributions because of their differing functional biology and selective pressures on resource allocation (Haig and Westoby 1988; Leslie and Bonacorsi 2022). However, because only 16 of the 1211 measurements represented microspores, and this was reduced to 1 of 113 species following taxonomic harmonization and limiting the dataset to species with published GS data (discussed later in this section), we analyzed isospores and microspores as one combined dataset and excluded the megaspore data from the analysis.
Both the Pinaceae and Poaceae data are based on our own literature compilations. Pinaceae pollen is predominantly bisaccate, with a central body (corpus) and two air sacs (sacci). Measurements in the literature comprise a mix of length, depth, and breadth measurements for the corpus, and length measurements for the entire grain including both corpus and sacci (we harmonized the usage of the terms length, depth, and breadth following the measurement scheme of Cain [1940]). All four variables were highly correlated, so we focused our analysis on corpus length, because it had the highest number of measurements in the dataset (Supplementary Fig. 4) and could incorporate non-saccate Pinaceae taxa such as Larix and Pseudotsuga. For Poaceae pollen, measurements in the literature mostly represent the maximum grain diameter (treated as grain length), although in some cases the grain diameter perpendicular to the grain length (treated as grain width) or the mean of these two measurements was given instead. Again, these variables were highly correlated, so we focused our analysis on grain length, which comprises the majority of the measurements (Supplementary Fig. 5).
Because the slide-mounting medium may affect the measurements (Christensen 1946; Andersen 1960; Reitsma 1969; Wei et al. 2023), we also recorded this information, along with the type of microscope (LM vs. SEM) used for measurement. However, one mounting medium dominates in each dataset (lactic acid for fern spores and glycerine for Pinaceae and Poaceae pollen), and any effect of mounting medium on the resultant measurements is relatively minor (Supplementary Fig. 6), so we did not incorporate this variable into the model fitting. The three compilations were manually vetted before further analysis: spelling mistakes were corrected, and any taxa not resolved to species level were removed from the datasets.
A fourth clade, Asteraceae (the daisy family), was targeted for additional sampling from the University of Münster Botanical Garden (Table 1). Asteraceae was selected because it is (1) a hyper-diverse, cosmopolitan family that is a key focus of evolutionary and ecological research (Mandel et al. 2019; Jardine et al. 2022; Palazzesi et al. 2022); (2) common in the Neogene and Quaternary fossil sporomorph record (Traverse 2007; Palazzesi et al. 2022); and (3) well represented in the University of Münster Botanical Garden, especially with regard to the subfamilies Asteroideae (daisies, sunflowers) and Carduoideae (thistles). While seven Asteraceae species are present in the Analysis 1 dataset and represented by both pollen and leaves, this additional sampling focused on pollen only and incorporated a further 23 species (Supplementary Table 1). PS data were generated in the same way as for the Analysis 1 pollen data.
The remainder of the data compilation and analysis workflow was the same as for Analysis 1. For the literature data, following taxonomic harmonization, we calculated the species mean sporomorph size or GS when taxa were synonymized or when multiple measurements were present for each taxon, removed taxa with unresolved names, and limited the sporomorph size datasets to taxa present in the C-values database.
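The averaging and filtering steps can be expressed as a short Python sketch. It assumes that name resolution has already produced a lookup from each recorded name to an accepted species name (or None when unresolved); in the actual R workflow, that role is played by the taxonomic packages listed under Software and Reproducibility. The species names and values below are placeholders.

```python
from collections import defaultdict
from statistics import mean


def species_means(records, accepted_name, cvalue_taxa):
    """Average measurements per accepted species and keep only species with GS data.

    records: iterable of (recorded_name, value) pairs
    accepted_name: dict mapping recorded_name -> accepted species name, or None if unresolved
    cvalue_taxa: set of accepted names present in the C-values database
    """
    grouped = defaultdict(list)
    for name, value in records:
        accepted = accepted_name.get(name)
        if accepted is None:  # unresolved (or unlisted) names are dropped
            continue
        grouped[accepted].append(value)
    return {sp: mean(vals) for sp, vals in grouped.items() if sp in cvalue_taxa}


# Placeholder data: "Species b" is a synonym of "Species a"; "Species x" is unresolved.
records = [("Species a", 50.0), ("Species b", 54.0), ("Species c", 40.0), ("Species x", 99.0)]
lookup = {"Species a": "Species a", "Species b": "Species a",
          "Species c": "Species c", "Species x": None}
print(species_means(records, lookup, cvalue_taxa={"Species a"}))
```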
Analysis 3: Testing the Fidelity of GS Estimates from GCL Data
As an independent validation dataset, we used the GCL data of Lomax et al. (2014), which augmented the angiosperm GCL dataset of Beaulieu et al. (2008) with measurements of Allium species to capture larger genome sizes; we updated the associated GS data using the Kew C-values database (Leitch et al. 2019). Using the botanical garden log10 GCL and GS data, we regressed GS onto GCL in both P-GLS and OLS frameworks to provide predictive models for GS. We then used these models to predict GS from the Lomax et al. (2014) GCL data. We assessed the predictive power of the GS–GCL relationship via the root-mean-square error of prediction (RMSEP) and via the R², intercept, and slope of a linear regression of observed versus predicted log10 GS, with the observed data treated as the response variable and the predicted data as the explanatory variable (Piñeiro et al. 2008). For a successful predictive model, the intercept and slope of this regression should equal 0 and 1, respectively (i.e., the predictions fall on the 1:1 line).
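The validation statistics can be sketched in Python with NumPy, assuming that observed and predicted values are both on the log10 GS scale. As described above, observed values are regressed on predicted values; the t statistics for intercept = 0 and slope = 1 use the usual OLS standard errors, and converting them to p-values (e.g., with a t distribution on n − 2 degrees of freedom) is omitted to keep the sketch dependency-free. The example values are hypothetical.

```python
import numpy as np


def rmsep(observed, predicted) -> float:
    """Root-mean-square error of prediction."""
    o, p = np.asarray(observed, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((o - p) ** 2)))


def observed_vs_predicted(observed, predicted) -> dict:
    """OLS of observed (response) on predicted (explanatory), with tests against 0 and 1."""
    o, p = np.asarray(observed, float), np.asarray(predicted, float)
    n = len(o)
    slope, intercept = np.polyfit(p, o, 1)  # polyfit returns the highest power first
    resid = o - (intercept + slope * p)
    sxx = np.sum((p - p.mean()) ** 2)
    s2 = np.sum(resid ** 2) / (n - 2)  # residual variance
    se_slope = np.sqrt(s2 / sxx)
    se_intercept = np.sqrt(s2 * (1.0 / n + p.mean() ** 2 / sxx))
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((o - o.mean()) ** 2)
    return {
        "intercept": float(intercept),
        "slope": float(slope),
        "r2": float(r2),
        "t_intercept_vs_0": float(intercept / se_intercept),
        "t_slope_vs_1": float((slope - 1.0) / se_slope),
    }


obs = [1.0, 2.0, 3.0, 4.0]   # hypothetical observed log10 GS
pred = [1.1, 1.9, 3.2, 3.8]  # hypothetical predicted log10 GS
print(rmsep(obs, pred), observed_vs_predicted(obs, pred))
```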
Software and Reproducibility
All analyses were carried out in R v. 4.2.2 (R Core Team 2022) using RStudio v. 2023.03.1+446 (R Studio Team 2022), with the packages ape v. 5.6-2 (Paradis and Schliep 2019), geiger v. 2.0.10 (Pennell et al. 2014), lcvplants v. 2.1.0 (Freiberg et al. 2020), MuMIn v. 1.47.1 (Bartoń 2022), nlme v. 3.1-160 (Pinheiro and Bates 2022), phytools v. 1.2-0 (Revell 2012), RColorBrewer v. 1.1-3 (Neuwirth 2022), rr2 v. 1.1.0 (Ives and Li 2018), stringr v. 1.5.0 (Wickham 2022), taxize v. 0.9.100 (Chamberlain and Szöcs 2013), V.Phylomaker2 v. 0.1.0 (Jin and Qian 2019, 2022), and viridis v. 0.6.2 (Garnier et al. 2021). All data files and R code used in this analysis are available for download from figshare (Jardine et al. 2025).
Results
Analysis 1: Angiosperm GCL and PS Data
The GS values range from 764 to 90,180 Mbp (median 4469 Mbp) and differ among clades: monocots typically have larger genomes, and within the eudicots, superrosid species generally have smaller genomes than superasterids (Fig. 1). This pattern of among-clade GS differences is consistent with previous studies (e.g., Soltis et al. 2003; Pellicer et al. 2018), suggesting that the taxon selection used here provides a representative sample of angiosperm GS variation. The species mean PS data range from 14 to 123 μm (median 34 μm), and the GCL data range from 14 to 72 μm (median 30 μm) (Supplementary Fig. 3). Pagel’s λ and Blomberg’s K estimates and their associated significance values indicate a strong phylogenetic signal in all three variables, although the signal is weaker in GCL than in GS and PS (Table 2).
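As an illustration of one of the phylogenetic signal statistics in Table 2, Blomberg's K can be computed from the trait vector and the Brownian-motion covariance matrix C. The Python sketch below follows the standard formulation (as implemented, for example, in phytools' phylosig): the ratio of the raw to the phylogenetically corrected mean squared error, divided by its expectation under Brownian motion. Significance testing by tip randomization and the construction of C for a real tree are not shown; the example trees and trait values are hypothetical.

```python
import numpy as np


def blomberg_k(x, C) -> float:
    """Blomberg's K for trait values x given a BM phylogenetic covariance matrix C."""
    x = np.asarray(x, float)
    C = np.asarray(C, float)
    n = len(x)
    Cinv = np.linalg.inv(C)
    ones = np.ones(n)
    a = (ones @ Cinv @ x) / (ones @ Cinv @ ones)  # phylogenetically weighted mean (root estimate)
    r = x - a
    observed = (r @ r) / (r @ Cinv @ r)  # MSE0 / MSE
    expected = (np.trace(C) - n / Cinv.sum()) / (n - 1)  # expectation of that ratio under BM
    return float(observed / expected)


# Hypothetical four-tip tree ((A:1,B:1):1,(C:1,D:1):1) and a star tree of the same depth.
C_tree = np.array([[2.0, 1.0, 0.0, 0.0],
                   [1.0, 2.0, 0.0, 0.0],
                   [0.0, 0.0, 2.0, 1.0],
                   [0.0, 0.0, 1.0, 2.0]])
C_star = 2.0 * np.eye(4)
trait = [0.0, 0.0, 1.0, 1.0]  # sister species share identical values
print(blomberg_k(trait, C_tree), blomberg_k(trait, C_star))
```

Values of K above 1 indicate that close relatives resemble each other more than expected under BM; on a star phylogeny, K equals 1 for any non-constant trait.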

Figure 1. Phylogeny for the Analysis 1 botanical garden dataset, with maximum-likelihood estimates of log10 genome size (GS) across the phylogeny mapped. GS data for the tips of the phylogeny come from the Kew C-values database (https://cvalues.science.kew.org; Leitch et al. 2019).
Table 2. Phylogenetic signal for the key parameters from the botanical garden dataset (n = 61 in all cases)

In all cases, the P-GLS models received higher support (in terms of AICc) than the OLS models, and the estimation of λ in the model correlation structure indicated substantial phylogenetic autocorrelation in the model residuals, although this effect was less pronounced in the regression of GCL onto GS (Table 3). We therefore concentrate here on the P-GLS model results but provide details of the OLS models for comparison (Fig. 2, Table 3). Regressing GCL onto GS shows a significant positive relationship, with a consistent linear trend across the whole dataset and within both monocots and eudicots (Fig. 2A, Table 3). There is no obvious relationship between PS and GS, however: PS values are broadly spread across most of the GS range, and the two largest pollen measurements come from species with small genomes (Cucurbita pepo, PS = 123 μm, GS = 1058 Mbp; Oenothera biennis, PS = 104 μm, GS = 2254 Mbp) (Fig. 2B, Table 3). The weak positive relationship between GS and PS is largely driven by the lilioid monocots Lilium henryi and Fritillaria imperialis, which have the largest genomes and large pollen (PS of 81 μm and 68 μm, respectively; GS of 83,143 Mbp and 90,180 Mbp, respectively); beyond these two species, there is no clear correlation within the monocots, within the eudicots, or across the angiosperms as a whole.
Table 3. Regression results for the relationship between each parameter and genome size, for the ordinary least squares (OLS) models, which ignore phylogeny, and the phylogenetic generalized least squares (P-GLS) models, which account for phylogenetic non-independence among species. The first two rows relate to the Analysis 1 botanical garden dataset, while the remaining results are for the Analysis 2 clade-level analyses. λ gives the strength of the phylogenetic signal in the regression residuals. AICc, corrected Akaike information criterion


Figure 2. Guard cell length (A) and pollen size (B) for the Analysis 1 botanical garden dataset plotted against genome size. Solid regression lines are from phylogenetic generalized least squares (P-GLS) models, dashed regression lines are from ordinary least squares (OLS) models, and point styles and colors represent main clades (see also Fig. 1).

Figure 3. Sporomorph size vs. genome size for the Analysis 2 clade-level datasets. (A) Fern spore size, with point styles and colors representing subclasses; (B) Pinaceae pollen corpus length, with point styles and colors representing genera (see also Supplementary Fig. 7); (C) Poaceae pollen length, with point styles and colors representing subfamilies; (D) Asteraceae pollen size, with point styles and colors representing subfamilies (see also Supplementary Fig. 8). Solid regression lines are from phylogenetic generalized least squares (P-GLS) models, dashed regression lines are from ordinary least squares (OLS) models.
Fitting models of trait evolution shows that the BM model (i.e., neutral drift) is favored for PS and GS, although the differences in AICc among the three evolutionary models are small (Table 4). The OU model (i.e., evolution toward a selective optimum) received strong support for the GCL data, however, with a large difference in AICc relative to the other models. The estimated α parameter, which describes the evolutionary “pull” of the variable toward the optimum trait value in the OU model, is also an order of magnitude higher for GCL than for PS and GS (Table 4); for PS and GS, α is so low that the best-fitting OU model may behave little differently from the simpler BM model over evolutionary time. In all cases, the change in evolutionary rate through time (r) estimated for the EB model (i.e., rapid evolution followed by a reduction in rates) is essentially zero, again suggesting that the EB model under these parameters would behave much like a BM model of trait evolution.
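Why a very small α (or an r near zero) makes OU (or EB) behave like BM can be seen from the covariances these models imply between two species. The Python sketch below uses standard expressions for an ultrametric tree of depth T, with two tips sharing s units of evolutionary history from the root: BM gives σ²s; OU with a fixed root state gives σ²/(2α) · e^(−2α(T − s)) · (1 − e^(−2αs)); and EB, with rate σ²₀e^(rt), gives σ²₀(e^(rs) − 1)/r. The numerical values are illustrative only and are not the Table 4 estimates.

```python
import math


def bm_cov(sigma2: float, s: float) -> float:
    """Brownian motion: covariance equals rate times shared branch length."""
    return sigma2 * s


def ou_cov(sigma2: float, alpha: float, s: float, T: float) -> float:
    """Ornstein-Uhlenbeck covariance (fixed root state) on an ultrametric tree of depth T."""
    if alpha == 0.0:
        return bm_cov(sigma2, s)
    # -expm1(-x) equals 1 - exp(-x), computed accurately for small x
    return sigma2 / (2.0 * alpha) * math.exp(-2.0 * alpha * (T - s)) * -math.expm1(-2.0 * alpha * s)


def eb_cov(sigma2_0: float, r: float, s: float) -> float:
    """Early burst covariance when the rate changes as sigma2_0 * exp(r * t)."""
    if r == 0.0:
        return bm_cov(sigma2_0, s)
    return sigma2_0 * math.expm1(r * s) / r


def ou_half_life(alpha: float) -> float:
    """Time for the expected distance to the optimum to halve."""
    return math.log(2.0) / alpha


T, s = 100.0, 50.0  # illustrative tree depth and shared history
for alpha in (1e-4, 1e-2, 1e-1):
    print(f"alpha={alpha:g}: OU cov={ou_cov(1.0, alpha, s, T):.4f}, "
          f"BM cov={bm_cov(1.0, s):.1f}, half-life={ou_half_life(alpha):.1f}")
print(f"EB with r=-1e-6: {eb_cov(1.0, -1e-6, s):.4f}")
```

As α shrinks, the OU covariance converges on the BM value and the half-life ln 2/α becomes long relative to the tree depth, which is the sense in which a low-α OU fit is hard to distinguish from BM.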
Table 4. Results from evolutionary models fit to the botanical garden dataset (n = 61 in all cases), comparing Brownian motion (BM), Ornstein-Uhlenbeck (OU), and early burst (EB) models. σ² = rate of trait evolution; α = strength of attraction to the optimum value in the OU model; σ²₀ = initial rate of trait evolution; and r = change in the rate of trait evolution in the EB model. AICc, corrected Akaike information criterion

Analysis 2: Clade-Level Sporomorph Data
As with the Analysis 1 GCL and PS data, P-GLS models received higher support than OLS models for the four sporomorph datasets, with the estimated λ values indicating substantial phylogenetic autocorrelation in the model residuals (Table 3). In all four cases (fern spores, Pinaceae pollen, Poaceae pollen, and Asteraceae pollen), the P-GLS regressions of sporomorph size onto GS reveal positive relationships, although these are statistically significant only for the three literature compilations (Fig. 3, Table 3). The explanatory power of the relationships, quantified via R², is at best moderate, especially when the OLS R² values (which include only GS as the explanatory variable) are compared with the P-GLS R² values (which include both GS and the phylogenetic correlation structure): in all cases the P-GLS R² values are much higher, suggesting that most of the explained variance in these models is accounted for by the correlation structure rather than by GS (Table 3).
For the fern spore data, the positive correlation with GS is largely driven by the Ophioglossidae (adder’s-tongue and whisk ferns), with a possible positive relationship also shown by the Equisetidae (horsetails) (Fig. 3A). The Polypodiidae, which make up the majority of the dataset (and extant fern diversity), incorporate a range of spore sizes at the lower end of the GS gradient, leading to a substantial scatter about the fitted regression lines and no clear relationship between spore size and GS (Fig. 3A). The fitted OLS and P-GLS models for the Pinaceae PS data suggest substantially different relationships with GS: the naïve OLS model extracts the overall negative relationship between GS and PS for the family, while the phylogenetically informed P-GLS model picks out the positive relationships shown by some of the genera (Fig. 3B, Table 3). This is particularly clear for the Pinus data, which comprise the majority of the dataset and demonstrate a strongly positive relationship between PS and GS (Table 3, Supplementary Fig. 7), but it is also indicated by the Larix and Picea data, although the sample sizes are too low in each case to carry out genus-level analyses (Fig. 3B). The Poaceae data show a weak but positive relationship between GS and PS, with substantial scatter about the fitted regression lines, as well as some differences among the subfamilies in both PS and GS, demonstrating phylogenetic structuring of the data (Fig. 3C, Table 3).
Focusing on Asteraceae pollen does not reveal any clear relationship between PS and GS, although there is a phylogenetic split between the Asteroideae (smaller pollen) and the Carduoideae (larger pollen) (Fig. 3D, Table 3, Supplementary Fig. 8). Asteroideae pollen is of a generally uniform size of 17–25 μm, despite GS ranging across an order of magnitude (from 1754 Mbp in Symphyotrichum puniceum to 26,449 Mbp in Artemisia cana). There are only six Carduoideae species in the dataset, but these suggest either a more positive relationship between GS and PS or an increase in PS variance with GS: the three largest GS values belong to species with large pollen (Echinops spinosissimus, 84 μm; Echinops ritro, 59 μm) and with pollen that is relatively small for Carduoideae (Carlina acaulis, 37 μm) (Fig. 3D).
Analysis 3: Testing the Fidelity of GS Estimates from GCL Data
The fitted P-GLS and OLS models of log10 GS regressed onto log10 GCL for the botanical garden data are log10 GS = 0.22 + 2.32 × log10 GCL and log10 GS = −0.84 + 3.03 × log10 GCL, respectively. Using these equations to predict GS from the Lomax et al. (2014) GCL dataset yields a strong positive relationship between observed and predicted GS (Fig. 4). The RMSEP is 0.40 for the P-GLS model predictions and 0.39 for the OLS model predictions. The R² values from the regressions of observed onto predicted values, which give the proportion of variance in the measured GS data explained by the predictions, are 0.68 for both models. For the P-GLS model predictions, the intercept and slope of this regression are significantly different from 0 and 1, respectively (p = 0.0004 for both parameters), with overprediction at low GS values and underprediction at high GS values (Fig. 4A). For the OLS model, the intercept and slope are indistinguishable from 0 and 1 (intercept, p = 0.38; slope, p = 0.59), and the predictions fall close to the 1:1 line between observed and predicted values (Fig. 4B).
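Applying the two fitted equations to a new measurement is simple arithmetic; the Python sketch below back-transforms from log10 units. It assumes, consistent with the value ranges reported above, that GCL is in μm and GS is in Mbp. The coefficients are those reported in the text, and no correction for back-transformation bias is applied.

```python
import math

# Reported coefficients: log10 GS = intercept + slope * log10 GCL
MODELS = {"P-GLS": (0.22, 2.32), "OLS": (-0.84, 3.03)}


def predict_gs_mbp(gcl_um: float, model: str = "OLS") -> float:
    """Predict genome size (assumed Mbp) from guard cell length (assumed um)."""
    intercept, slope = MODELS[model]
    return 10.0 ** (intercept + slope * math.log10(gcl_um))


# GCL at which the two fitted lines cross: 0.22 + 2.32x = -0.84 + 3.03x
crossover_um = 10.0 ** ((0.22 + 0.84) / (3.03 - 2.32))

for gcl in (15.0, 30.0, 70.0):
    print(f"GCL {gcl:>4.0f} um: P-GLS {predict_gs_mbp(gcl, 'P-GLS'):8.0f} Mbp, "
          f"OLS {predict_gs_mbp(gcl, 'OLS'):8.0f} Mbp")
print(f"lines cross near GCL = {crossover_um:.1f} um")
```

Because the P-GLS slope is shallower, its predictions exceed the OLS predictions below roughly 31 μm and fall below them above it, compressing the predicted GS range; this is in line with the overprediction at low GS and underprediction at high GS noted for the P-GLS model.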

Figure 4. Observed vs. predicted genome size (GS) for the Lomax et al. (2014) guard cell length data, with GS estimated using the (A) phylogenetic generalized least squares (P-GLS) and (B) ordinary least squares (OLS) models from the Analysis 1 botanical garden dataset (see also Fig. 2A). The dotted gray line shows the 1:1 relationship between observed and predicted values. The R² values and model coefficients are from regressions of observed onto predicted values, with the models shown as solid black lines and associated 95% confidence intervals shown as dashed lines. RMSEP, root-mean-square error of prediction.
Discussion
Here, we have carried out a series of analyses to test and compare the potential of sporomorph size and GCL as paleo-GS proxies. These analyses provide mixed evidence for an influence of GS on sporomorph size. The majority of analyses recovered positive relationships between these variables, but these relationships are mostly weak and the explanatory power of the regressions is low. Furthermore, unlike for GCL (discussed later in this section), there does not seem to be a consistent overall scaling relationship between GS and sporomorph size; instead, we documented phylogenetic structuring and clade-specific effects (e.g., for different Pinaceae genera [Fig. 3B] and Asteraceae subfamilies [Fig. 3D, Supplementary Fig. 8]). These results are therefore consistent with those of Knight et al. (2010) in suggesting that there is limited scope for using sporomorph size as a generally applicable, cross-taxon GS proxy.
One exception is PS in Pinus, and possibly other Pinaceae genera, which showed a clearer relationship with GS (Fig. 3B, Supplementary Fig. 7). This finding is consistent with other analyses that have found a correlation between sporomorph size and GS within species (i.e., across ploidy series) (De Storme et al. 2013) and within genera (Barrington et al. 2020), but it is not yet clear whether this is a general property of sporomorph size at lower taxonomic ranks or whether it is limited to specific taxa or pollen morphologies. The sacci on pollen types such as Pinus aid buoyancy and are thought to have allowed large, heavy pollen grains to evolve while maintaining high transport potential and pollination efficiency (Faegri and van der Pijl 1971; Schwendemann et al. 2007; Leslie 2010); this relaxation of selective pressures on PS in these taxa may allow a closer relationship between PS and GS. If this is correct, then it should also apply to other bisaccate taxa such as Picea and Abies, and it will be worth targeting these with additional sampling. If the correlation between PS and GS within Pinus demonstrated here is a more general property of sporomorphs at lower taxonomic ranks, then size variation may have potential as an indicator of GS where sporomorphs can be confidently classified to genus, although this requires further validation and careful consideration of the often-low taxonomic resolution of palynological data (Mander and Punyasena 2014).
Contrary to the hypothesis of Knight et al. (2010), we found no evidence that selective pressures toward smaller sizes drive the lack of correlation between PS and GS. If that hypothesis were correct, we would expect an OU model of trait evolution to fit the PS data best, with evolutionary constraints holding PS values around a selective optimum. However, for both PS and GS, a BM model of trait evolution was favored over an OU model (Table 4), while an OU model was supported for GCL. These results indicate limited constraints on PS relative to GCL and similar evolutionary dynamics underpinning the PS and GS data. Furthermore, although the botanical garden PS data are skewed toward smaller pollen grains (Supplementary Fig. 3C), the GCL distribution is similarly right skewed (Supplementary Fig. 3B), and PS is less constrained than GCL at the upper end of the size spectrum. Taken together, these results are consistent with the idea that the weak relationship between PS and GS is not driven by constraints on the distribution of PS values.
One possible explanation for the low correlation between PS and GS is that pollination syndrome exerts a stronger control over PS than GS does. The PS range of wind-pollinated taxa is generally more restricted than that of animal-pollinated taxa, and small or large pollen appears at least in part to be targeted toward specific pollinators or pollination strategies (Whitehead 1983; Konzmann et al. 2019; Hao et al. 2020). It is not clear that variation in pollination mechanism can entirely explain the lack of a consistent correlation between GS and PS, however, because even where the pollination mechanism is consistent across species, such as in the wind-pollinated Poaceae (Fig. 3C) or the predominantly insect-pollinated Asteraceae (Fig. 3D), convincing evidence for GS exerting a strong control on PS is still lacking (Table 3). Post-pollination processes, including pollen germination, tube growth, and fertilization, have also been linked to PS variation, with scaling relationships between PS and style length/stigma depth observed in many taxa, and competition among pollen grains on the stigma possibly selecting for larger pollen grains (Torres 2000; Cruden 2009; Hidalgo et al. 2023). Finally, trade-offs between pollen size and number may also be important, with selective pressures on the number of pollen grains produced possibly driving PS variation as a by-product, at least in taxa with smaller pollen grains (Hao et al. 2020). Similar number–size trade-offs have also been suggested for spore-producing plants, and in the homosporous fern species that dominate the spore dataset (Fig. 3A), selective pressures on resource allocation in bisexual isospores to maximize the reproductive success of gametophytes (Haig and Westoby 1988) possibly override the control of GS on spore size. These findings suggest that fully understanding why sporomorph size varies as it does will require careful consideration of plant resource allocation, flower morphology, pollination mechanism/spore dispersal, and phylogeny, in addition to a possible role for GS.
Overall, these results add to a growing body of research suggesting that sporomorph size, while straightforward to measure in both extant and fossil material and therefore tempting to include in trait-based palynological analyses (e.g., Schüler and Behling 2010; Griener and Warny 2015; Radaeski et al. 2019), is challenging to use meaningfully in paleoecological research (Knight et al. 2010; Jardine and Lomax 2017; Wei et al. 2023). Previous analyses have reported a range of possible controlling factors, such as climate, vegetation type, photosynthetic pathway, and ploidy/GS, but effect sizes are typically weak, and phylogenetically informed analyses such as those presented here and elsewhere (e.g., Knight et al. 2010; Wei et al. 2023) have failed to find consistent relationships that could form the basis for widely applicable proxies. A possible exception is the size-based discrimination of domesticated and wild grass pollen in Holocene pollen records, although even this is not without complications (e.g., Bottema 1992; Tweddle et al. 2005).
While the analysis of sporomorph size has not yielded encouraging results for paleo-proxy development, other paleobotanical remains hold greater potential for trait-based analyses of the fossil record. In the present study, the GCL results suggest considerable scope for estimating GS from fossil specimens. To our knowledge, this is the first assessment of the predictive accuracy of this approach using independent validation data. The results suggest that log10 GCL can be used in angiosperm-wide datasets as a proxy for log10 GS, although estimates from the P-GLS regression model were somewhat biased, with overprediction at low GS values and underprediction at high GS values, and further research is required to improve the accuracy of predictions (Fig. 4). A key challenge to using GCL to estimate GS is the influence of environmental variation, especially stomatal size adaptations to habitat and moisture availability (Jordan et al. 2015; McElwain and Steinthorsdottir 2017; Roddy et al. 2020; Veselý et al. 2020). Roddy et al. (2020) also showed that interspecific GCL variability is greater in taxa with smaller genomes, which is consistent with our botanical garden results (Fig. 2A). Assessing these issues across taxa, and understanding under which conditions or in which taxa GS exerts a weaker control on GCL, is therefore the next step toward a full understanding of this relationship, which will allow greater confidence in using this proxy to reconstruct changes in GS through geological time.
While data from other plant groups are also needed to test the applicability of GCL as a GS proxy across the non-angiosperm tracheophytes, support for a consistent scaling relationship between GS and GCL across angiosperms, gymnosperms, and ferns (Simonin and Roddy 2018) suggests that this proxy could be applied across a wide range of taxa and time periods.
Before deploying this method to directly estimate the GS of fossil plants, it is important to consider whether GCL is the most appropriate guard cell parameter to use as a proxy. Cell volume is expected to scale with GS (Roddy et al. 2020), and changes in stomatal shape at higher genome sizes (Šmarda et al. 2023) may mean that some of the increase in cell volume is not captured by GCL measurements. This may also explain the constraints on GCL evolution indicated by the support for the OU model for the GCL measurements (Table 4). Cell volume is likely to be challenging to measure directly in fossil plant specimens where only the cuticle is preserved, however. Other approaches to relating guard cell size to GS include using guard cell width cubed as a more direct estimate of cell nucleus volume (Franks et al. 2012) and using simplifying assumptions about guard cell shape to estimate cell volume from GCL (Roddy et al. 2020). While these approaches deserve more consideration, neither is appropriate for the dumbbell-shaped guard cells of grasses (Poaceae), which, while not common in the fossil leaf or cuticle record, have been recorded in Cretaceous and Cenozoic deposits (Morley and Richards 1993; Prasad et al. 2011). Furthermore, uncertainty in the scaling assumptions used to estimate volume from stomatal geometry has the potential to add further uncertainty to the predictive models.
Consequently, given the ease of measuring GCL on plant fossils, working with GCL directly seems the most productive way forward, while prioritizing a better understanding of the biasing factors that add variance to its relationship with GS.
An additional challenge to using GCL to estimate GS is the presence of sunken stomata in many extant and fossil taxa, including most acrogymnosperms, some angiosperms, horsetails (Equisetum), and many pteridosperms (seed ferns) (Kerp 1990; Cullen and Rudall 2016; Liang et al. 2022; Šantrůček 2022). When working with extant taxa with sunken stomata, the leaves can be cleared and stained, or their cuticles and epidermises isolated, which allows the full guard cell extent to be measured (Jordan et al. 2015; Porter et al. 2019; Liang et al. 2022). In many dispersed fossil cuticle specimens, however, the entire guard cell is not visible (Liang et al. 2022), which means that the only option will be to estimate GCL from the stomatal PL, assuming that PL itself is recoverable from the cuticle. As noted in the “Materials and Methods,” a number of different scaling factors have previously been used to convert from PL to GCL or vice versa, although none were entirely consistent with the calibration data we generated here (Supplementary Fig. 1). Further research is therefore needed to develop calibration functions, which will be useful not only for GS prediction but also for estimating stomatal conductance (Simonin and Roddy 2018) and pCO2 (Franks et al. 2014) in cases where GCL rather than PL has been measured.
Conclusion
Here, we have analyzed the relationship between sporomorph size, stomatal GCL, and GS. There seems to be limited scope for using fossil sporomorph size as a proxy for GS, although focusing on the sporomorph–GS relationship at lower taxonomic ranks (i.e., within genera) may produce stronger relationships. Contrary to expectations, the weak relationship between PS and GS is not driven by selective pressures for pollen to be small, and pollination biology and post-pollination processes may be more important for controlling PS distributions. Consistent with previous literature, we found a robust relationship between GS and GCL. Testing the predictive power of this relationship with independent validation data suggests that quantitative paleo-GS estimates are possible with GCL measurements, although further work is needed to understand drivers of variation in both GS and guard cell size, so that the accuracy of this proxy can be better constrained.
Acknowledgments
We thank D. S. Bauer for support with sample collection from the University of Münster Botanical Garden. S. Punyasena, G. Jordan, and two anonymous reviewers are thanked for reviews that greatly improved the paper. P.E.J. acknowledges funding from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) project nos. 443701866 and 521558051. B.H.L. acknowledges funding from the Natural Environment Research Council (NE/T000392/1).
Competing Interests
The authors declare no competing interests.
Data Availability Statement
All data and R code, Supplementary Tables 1–9, and Supplementary Figures 1–8 can be accessed at: https://doi.org/10.6084/m9.figshare.25074524.