
Polygenic Selection and Environmental Influence on Adult Body Height: Genetic and Living Standard Contributions Across Diverse Populations

Published online by Cambridge University Press:  06 December 2024

Davide Piffer* and Emil O.W. Kirkegaard
Affiliation: Independent researchers
*Corresponding author: Davide Piffer; Email: pifferdavide@gmail.com

Abstract

We analyzed whole-genome sequencing (WGS) data from 51 populations and combined WGS and array data from 89 populations. Multiple types of polygenic scores (PGS) were employed, derived from multi-ancestry, between-family genome-wide association study (GWAS; MIX-Height), European-ancestry, between-family GWAS (EUR-Height), and European-ancestry siblings GWAS (SIB-Height). Our findings demonstrate that both genetic and environmental factors significantly influence adult body height between populations. Models that included both genetic and environmental predictors best explained population differences in adult body height, with the MIX-Height PGS and environmental factors (Human Development Index [HDI] + per capita caloric intake) achieving an R² of .83. Our findings shed light on Deaton’s ‘African paradox’, which noted the relatively tall stature of African populations despite poor nutrition and childhood health. Contrary to Deaton’s hypotheses, we demonstrate that both genetic differences and environmental factors significantly influence body height in countries with high infant mortality rates. This suggests that the observed tall stature in African populations can be attributed, in part, to a high genetic predisposition for body height. Furthermore, tests of divergent selection based on the QST (i.e., standardized measure of the genetic differentiation of a quantitative trait among populations) and FST (neutral marker loci) measures exceeded neutral expectations, reaching statistical significance (p < .01) with the MIX-Height PGS but not with the SIB-Height PGS. This result indicates potential selective pressures on body height-related genetic variants across populations.

Type
Article
Copyright
© The Author(s), 2024. Published by Cambridge University Press on behalf of International Society for Twin Studies

Human body height (stature) is a complex trait that exemplifies the intricate interplay between genetic predispositions and environmental influences. Twin and family studies have consistently demonstrated the high heritability of body height, with estimates ranging from 80−90% in Western adult populations (Grasgruber et al., 2014; Silventoinen et al., 2003). However, this strong genetic component presents a paradox when juxtaposed against the marked increase in average body height observed globally over the past century, largely attributed to improvements in living conditions.

Heritability, in this context, refers to the proportion of variance in a trait within a population that can be attributed to genetic differences. High heritability means that genetic differences play a substantial role in determining individual differences in body height. However, it is crucial to understand that even a heritability of 100% does not preclude environmental influences on a trait in the more general sense. For instance, a new kind of environmental cause may be introduced to change the phenotype. If the distribution of this cause varies between people, the heritability will be decreased (total variance increases while genetic variance remains constant). Or, it may leave heritability unchanged if the causal factor is evenly distributed and lacks interactive effects with genetic causes (total variance remains the same, but the mean is changed). Because of this, heritability estimates can vary across different environments and populations; they are not strictly speaking a property of the trait itself, but a population statistic that depends on the context. The large increase in human height seen over the last few hundred years, then, is not a paradox because it shows that while genetic causation dominates individual differences within a cohort of a population, environmental factors between cohorts may have a large effect.
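The two scenarios described above can be illustrated with a small arithmetic sketch. The variance values, the mean heights, and the function name `heritability` are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
def heritability(var_genetic: float, var_environment: float) -> float:
    """Share of total phenotypic variance attributable to genetic variance."""
    return var_genetic / (var_genetic + var_environment)


# Hypothetical variances in cm^2, for illustration only.
V_G, V_E = 36.0, 9.0
baseline = heritability(V_G, V_E)  # 36 / 45

# Scenario 1: a new environmental cause that varies between people adds
# environmental variance, so heritability falls while V_G is unchanged.
with_variable_cause = heritability(V_G, V_E + 15.0)  # 36 / 60

# Scenario 2: a cause applied equally to everyone shifts the mean only;
# the variances, and therefore heritability, stay the same.
mean_before, uniform_gain = 165.0, 8.0
mean_after = mean_before + uniform_gain
with_uniform_cause = heritability(V_G, V_E)

print(baseline, with_variable_cause, with_uniform_cause, mean_after)
```

In the second scenario the cohort mean rises by 8 cm while heritability is unchanged, which is the sense in which a large secular increase in height is compatible with high within-cohort heritability.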

While we do not know for certain what the causes of the historical increase in height are, nutrition, healthcare access and overall socioeconomic development have been linked to it (NCD Risk Factor Collaboration, 2020). Specific factors include protein and overall energy (caloric) intake, disease prevalence, infant health, and living conditions such as access to clean water and sanitation (Checkley et al., 2008; Martorell & Zongrone, 2012; Prüss-Üstün et al., 2014; Victora et al., 2008).

These environmental factors influence body height through various biological mechanisms, as must be the case for any variable affecting humans. Adequate nutrition, particularly protein intake, provides the building blocks necessary for bone growth and development. Caloric intake ensures sufficient energy for growth processes. Disease prevalence in childhood can impact body height by diverting energy from growth to immune responses and by interfering with nutrient absorption. Early life health conditions, including maternal health during pregnancy and infant nutrition, set the stage for future growth trajectories. Likewise, access to clean water and sanitation reduces the risk of infections that could impair growth. Together, these factors create an environmental context that either supports or hinders an individual’s potential for body height.

Height’s sensitivity to environmental conditions has made it a valuable proxy for living standards in economic and social science research. Variations in body height across populations and time periods often reflect cumulative impacts of nutrition, healthcare and socioeconomic conditions during growth periods, with taller stature generally associated with better childhood conditions and improved later-life outcomes (Deaton, 2007). Using body height as an indicator of development between populations or across long time spans, however, does rely on a blank slatist assumption that populations do not differ in their genetic potential for body height, and that these do not change over time (no selection or genetic drift). This assumption is in question in the light of persistent height gaps between modern populations, even when these grow up in the same countries, and because studies of both modern and ancient genomes show that height was under selection (Piffer & Kirkegaard, 2024; Stulp et al., 2015; Stulp et al., 2023).

To investigate these dynamics, we employed regression models to assess the impact of height polygenic scores (PGSs) and environmental factors on body height across different countries and ethnic groups. In the field of evolutionary ecology, the potential association between populations’ phenotype and their local environmental conditions is a crucial step in identifying the selective pressures responsible for adaptive phenotypic differentiation (Blanco-Sánchez et al., 2024; Blanquart et al., 2013). We utilized the Human Development Index (HDI) as a general indicator of living standards, infant mortality rates to indicate infant health, and measures of daily per capita protein and total energy intake, along with the prevalence of wasting, as indicators of nutrition.

PGSs are numerical estimates of an individual’s genetic predisposition or potential for a trait, based on the total effect of many genetic variants identified through genome-wide association studies (GWAS). These scores are typically calculated by summing the number of trait-associated alleles an individual carries, weighted by the effect size of each allele. PGSs provide a way to quantify the genetic component of complex traits like body height, allowing for the investigation of genetic influences across populations.
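The weighted sum described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the scoring pipeline used in the study: the function names, the dosage coding (0, 1 or 2 copies of the effect allele), and the population-level shortcut based on allele frequencies are all assumptions made for the example.

```python
import numpy as np


def polygenic_scores(dosages, effects):
    """Per-individual score: sum over variants of dosage_ij * effect_j.

    dosages: (individuals x variants) counts of the effect allele (0, 1, 2).
    effects: one GWAS effect size per variant.
    """
    dosages = np.asarray(dosages, dtype=float)
    effects = np.asarray(effects, dtype=float)
    if dosages.ndim != 2 or dosages.shape[1] != effects.shape[0]:
        raise ValueError("need one effect size per variant column")
    return dosages @ effects


def population_mean_pgs(allele_freqs, effects):
    """Expected score in a population, using 2 * p_j as the mean dosage."""
    allele_freqs = np.asarray(allele_freqs, dtype=float)
    effects = np.asarray(effects, dtype=float)
    return float(2.0 * allele_freqs @ effects)


# Toy data: two individuals, three variants (values are made up).
example_scores = polygenic_scores([[0, 1, 2], [2, 2, 0]], [0.5, -0.2, 0.1])
example_mean = population_mean_pgs([0.5, 0.75, 0.5], [0.5, -0.2, 0.1])
```

The population-level shortcut is useful when only allele frequencies are available for a group, since the mean dosage of a variant equals twice its effect-allele frequency.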

This study sought to assess the validity of Deaton’s (2007) hypothesis, which suggests that the positive correlation between health, nutrition and average height observed in affluent nations does not hold true for developing countries. Furthermore, we investigated the paradoxical phenomenon highlighted by Deaton, wherein African populations exhibit relatively tall statures despite poorer childhood health and nutritional conditions. Our research aimed to determine whether genetic factors could account for this unexpected trend.

Genetic variation across populations often exhibits spatial patterns, with geographically proximate groups showing greater genetic similarity. This phenomenon, known as spatial or genetic autocorrelation, can be quantified using measures such as the fixation index (FST), which assesses population differentiation (genetic distance) based on allele frequencies. By incorporating these metrics, we can account for shared ancestry and migration patterns, enabling a more nuanced analysis of the relationship between genetic factors and body height across diverse populations. This approach also facilitates the identification of signals indicating local adaptation (Berg & Coop, 2014). To further investigate potential selective pressures, we employed two standard population genetics tests for divergent selection: the QST-FST comparison (Spitze, 1993) and the FST-enrichment test (Guo et al., 2018). Significant differences between population genetic differentiation in phenotypic traits (QST) and population differentiation in molecular markers driven only by neutral processes (FST) are interpreted as evidence of selection (Leinonen et al., 2013; Merilä & Crnokrak, 2001; Whitlock, 2008; Whitlock & Guillaume, 2009).

By integrating diverse genetic datasets with detailed environmental data, this study aims to provide a comprehensive understanding of how genetic predispositions and environmental contexts interact to determine body height variations globally. This research seeks to contribute to the broader discourse on human growth and development by elucidating the relative contributions of genetic and environmental factors.

Methods

Genome Samples

Samples were collected from a variety of public databases and were classified by the technology employed to genotype the DNA: microarray chips or whole-genome sequencing (WGS).

The WGS dataset comprised the following: 1000 Genomes (1000 Genomes Project Consortium, 2015), gnomAD (Karczewski et al., 2020), SweGen (Rentoft et al., 2019), Genome of the Netherlands (GoNL, nlgenome.nl), 1000 Polish genomes (Kaja et al., 2022), NARD (Yoo et al., 2019), Turkish Genome Project (Alkan et al., 2014), Taiwan Genomes (Hsu et al., 2023), Mexico City Prospective Study (Ziyatdinov et al., 2023), Korean Variant Archive (Lee et al., 2022), ABraOM (Naslavsky et al., 2022), and DanMAC5 (https://danmac5.cpr.ku.dk/), covering 51 ethnic groups in total.

The array dataset was compiled from a variety of open-access online resources. After quality control (QC), we imputed the datasets using the TOPMed imputation server (Das et al., 2016) with Minimac imputation (Fuchsberger et al., 2014); the dataset covered 89 ethnic groups in total.

GWASs Used for Polygenic Scores

We utilized results from multiple recent GWASs in order to test the robustness of the results. Each model was named after the population it was trained on, or the method used.

  1. MIX-Height. For body height, we used the significant single nucleotide polymorphisms (SNPs), after LD pruning with a threshold of r² < .1, from the largest GWAS to date (Yengo et al., 2022), which comprised a multi-ancestry sample.

  2. EUR-Height. We used the significant SNPs from the European-ancestry subsample of the Okbay et al. (2022) GWAS.

  3. SIB-Height. Summary statistics for sibship (within-family) GWAS of body height were retrieved from the largest meta-analysis of sibship GWAS (Howe et al., 2022). There were 290 SNPs that remained significant after clumping and applying a significance threshold (p < 5 × 10⁻⁸, LD r² < .1). Within-family GWASs, specifically sibship studies, offer unique advantages in genetic research. By comparing siblings who share approximately 50% of their genetic material and a common family environment, these studies can better isolate the effects of specific genetic variants on body height. This approach helps control for confounding factors such as population stratification and shared environmental influences that can bias traditional GWAS results. However, sibship studies also have limitations, including smaller sample sizes and reduced statistical power due to the lower phenotypic and genotypic variance within families compared to between families. Despite these practical constraints, sibship GWAS provide valuable insights into the genetic architecture of complex traits like body height (Howe et al., 2022).

Health, Nutrition and Living Standards

Food and Agriculture Organization (FAO) daily per capita protein supply estimates were obtained from Our World in Data (https://ourworldindata.org/grapher/daily-per-capita-protein-supply), and the average infant mortality rate (deaths between birth and 1 year of age only) by country for the years 1995−2000 was obtained from the United Nations 2022 Revision of World Population Prospects (https://population.un.org/wpp/).

Wasting prevalence (the share of children under 5 years old who fall two standard deviations below the expected weight for their height) was obtained from the estimates provided by the World Bank (https://ourworldindata.org/wasting-definition) for the years 1995−2000.

For HDI, we used the estimates provided by the United Nations Development Programme (2024). We obtained subnational HDI estimates for Italy, Spain, China and the US from www.globaldatalab.org.

Average daily dietary energy consumption per capita for 2018 was obtained from the report published by the FAO (Roser et al., 2013).

Phenotypes

Average measured body height (the phenotype, or trait) was obtained mainly from the largest meta-analysis by the NCD Risk Factor Collaboration (NCD-RisC; 2020). Other sources were used for subregions that were not covered by the latter. The body height estimates for the Chinese provinces were obtained from Piffer and Kirkegaard (2024), who relied on two sources: (1) an analysis of 57,574 samples from the Chinese General Social Survey (CGSS; Lu et al., 2022), and (2) an analysis based on data from 660K users collected from the big data platform ‘Xiangshan Weighing Instrument Group’ (2022年中国居民身高体重健康数据报告 [2022 Height and Weight Health Data Report of Chinese Residents]). For Sudan Nilotic, Chali (1995) was used. For Italy, we obtained regional estimates from Corsini (2009).

Spatial Autocorrelation

Spatial autocorrelation measures how a variable correlates with itself across geographic locations, determining whether nearby observations are more similar (positive autocorrelation) or dissimilar (negative autocorrelation) than expected by chance.

A typical example of positive spatial autocorrelation is housing prices, where high-value properties cluster together, as do lower priced ones, driven by shared factors like neighborhood amenities and socioeconomic conditions. Negative spatial autocorrelation can be seen in the placement of competing retail stores, which tend to spread out to avoid market saturation, placing distance between themselves and direct competitors.

In population genetics, spatial autocorrelation is often assessed by comparing geographic distances with allele frequencies or genetic distance measures like FST (Sokal et al., 1989). Piffer (2015) adapted this methodology, using FST distances as a proxy for spatial similarity, replacing geographic distances, and focusing on the absolute differences in PGSs to assess similarity at loci under selection. This approach allowed autocorrelation to be tested for the first time purely at the genetic level, and by combining it with the partial Mantel test, it quantified how well PGSs predicted phenotypic differences while controlling for neutral genetic variation.

Mantel Test

To assess the relationship between genetic and phenotypic distances across populations, we employed the Mantel test (Hubert et al., 1981; Mantel, 1967; Sokal, 1979). This nonparametric method evaluates the correlation between two distance matrices.

The Mantel test calculates a z statistic, defined as:

$$Z_{AB} = \sum\nolimits_{ij} a_{ij} b_{ij}$$

Where:

$a_{ij}$ and $b_{ij}$ are elements of the two distance matrices A and B, respectively.

This z statistic is then compared to its distribution under the null hypothesis of no correlation between the matrices. The significance of z is assessed through a permutation procedure:

  1. The observed z statistic is calculated for the original matrices.

  2. One matrix is randomly permuted (rows and columns together), and z is recalculated.

  3. This process is repeated numerous times to generate a null distribution.

  4. The p value is determined as the proportion of permuted z statistics that are equal to or more extreme than the observed z statistic (Mantel, 1967; Mielke, 1979).

We used this test to evaluate the correlation between genetic distances (as measured by FST) and differences in body height or PGSs between populations. A significant positive correlation would indicate that genetically distant populations tend to have more divergent body height or PGSs, potentially suggesting a role for population structure or local adaptation in body height variation.
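The permutation procedure can be sketched in NumPy as follows. This is an illustrative implementation under stated assumptions rather than the code used in the study: it assumes symmetric distance matrices, counts each pair once (i < j), uses a one-sided upper-tail p value, and includes the observed statistic in the null distribution (the `+ 1` terms). The function name `mantel_test` and the default of 999 permutations are choices made for this example.

```python
import numpy as np


def mantel_test(a, b, n_perm=999, seed=0):
    """Return (z_observed, one-sided permutation p value) for two distance matrices."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape or a.shape[0] != a.shape[1]:
        raise ValueError("a and b must be square matrices of the same size")
    n = a.shape[0]
    upper = np.triu_indices(n, k=1)  # each pair of populations once

    z_obs = float(np.sum(a[upper] * b[upper]))

    rng = np.random.default_rng(seed)
    n_extreme = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        # Permute rows and columns of b together so it stays a distance matrix.
        b_perm = b[np.ix_(perm, perm)]
        if np.sum(a[upper] * b_perm[upper]) >= z_obs:
            n_extreme += 1

    p_value = (n_extreme + 1) / (n_perm + 1)
    return z_obs, p_value


# Toy example: eight populations on a line; b is an exact multiple of a.
positions = np.arange(8, dtype=float)
dist_a = np.abs(positions[:, None] - positions[None, :])
dist_b = 2.0 * dist_a
z_value, p_value = mantel_test(dist_a, dist_b)
```

In an application like the one described here, `a` would hold pairwise FST values and `b` the absolute pairwise differences in body height or PGS.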

Moran’s I

To assess the spatial autocorrelation of body height and PGSs across populations, we employed Moran’s I statistic (Moran, 1950). Moran’s I is a measure of global spatial autocorrelation that quantifies the degree to which a variable is correlated with itself through space. The statistic ranges from −1 to +1, where:

Values of +1 indicate perfect positive spatial autocorrelation (similar values cluster together).

Values of −1 indicate perfect negative spatial autocorrelation (dissimilar values cluster together).

Values near 0 suggest random spatial distribution.

Moran’s I is calculated as:

$$I = {N \over W} \cdot {{\sum\nolimits_i {\sum\nolimits_j {{w_{ij}}({x_i} - \overline x)({x_j} - \overline x)} } } \over {\sum\nolimits_i {{{({x_i} - \overline x)}^2}} }}$$

Where:

N is the number of spatial units; x is the variable of interest; $\overline{x}$ is the mean of x; $w_{ij}$ is the spatial weight between locations i and j; W is the sum of all spatial weights.

We used a distance-based weight matrix, where weights were inversely proportional to the geographic distance between populations. Statistical significance was assessed through a permutation test with 999 randomizations. This analysis allows us to determine whether the spatial distribution of body height and PGSs exhibits significant clustering or dispersion across the studied populations.
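A minimal NumPy sketch of the statistic and its permutation test is given below. It follows the formula for I stated above; the helper names, the inverse-distance weighting without any cutoff or row standardization, and the one-sided upper-tail p value are assumptions for illustration and may differ from the settings actually used.

```python
import numpy as np


def morans_i(x, w):
    """Moran's I: (N / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()
    return float((x.size / w.sum()) * (z @ w @ z) / (z @ z))


def inverse_distance_weights(dist):
    """Weights 1 / d_ij off the diagonal and 0 on it (assumes distinct locations)."""
    dist = np.asarray(dist, dtype=float)
    w = np.zeros_like(dist)
    off_diagonal = ~np.eye(dist.shape[0], dtype=bool)
    w[off_diagonal] = 1.0 / dist[off_diagonal]
    return w


def morans_i_permutation(x, w, n_perm=999, seed=0):
    """Observed I and the share of shuffled data sets with I at least as large."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    observed = morans_i(x, w)
    simulated = np.array([morans_i(rng.permutation(x), w) for _ in range(n_perm)])
    p_value = (np.sum(simulated >= observed) + 1) / (n_perm + 1)
    return observed, float(p_value)


# Toy check on four units in a chain with binary neighbor weights.
chain = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
i_gradient = morans_i([1, 2, 3, 4], chain)       # smooth gradient
i_alternating = morans_i([1, -1, 1, -1], chain)  # neighbors always differ
```

The smooth gradient gives a positive I and the alternating pattern gives I = −1 on this chain, matching the interpretation of the two ends of the scale.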

Spatial Autoregressive and Spatial Error Model

The spatial autoregressive model (SAR) incorporates spatial dependence directly into the dependent variable. It assumes that the value of the dependent variable in a given location is influenced by the values of the dependent variable in neighboring locations. This spatial dependence is captured through a spatial lag term, which represents the average values of the dependent variable in the neighboring regions, weighted by the spatial proximity of those regions.

The SAR model is formulated as follows:

$$Y = \rho WY + X\beta + \varepsilon$$

Where:

  • Y is the dependent variable.

  • ρ is the spatial autoregressive coefficient, which captures the strength of the spatial dependence.

  • W is the spatial weights matrix, which defines the structure of spatial relationships (e.g., neighbors based on geographic or genetic distance).

  • Xβ represents the explanatory variables and their coefficients.

  • ε is the error term.
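To make the SAR specification concrete, the sketch below solves it for Y, giving the reduced form Y = (I − ρW)⁻¹(Xβ + ε), and computes the direct, indirect (spillover) and total impacts of one explanatory variable in the way such summaries are commonly defined for spatial lag models (average diagonal element, and average row sum, of (I − ρW)⁻¹β). It is an illustration with hypothetical function names and toy inputs, assuming a row-standardized W and |ρ| < 1, and it does not estimate ρ or β from data.

```python
import numpy as np


def row_standardize(w):
    """Scale each row of a weights matrix to sum to 1 (all-zero rows stay zero)."""
    w = np.asarray(w, dtype=float)
    row_sums = w.sum(axis=1, keepdims=True)
    return np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)


def simulate_sar(rho, beta, x, w, eps):
    """Reduced form of Y = rho*W*Y + X*beta + eps, i.e. (I - rho*W)^-1 (X*beta + eps)."""
    n = w.shape[0]
    return np.linalg.solve(np.eye(n) - rho * w, x @ beta + eps)


def sar_impacts(rho, beta_k, w):
    """Average direct, indirect and total impact of one covariate with coefficient beta_k."""
    n = w.shape[0]
    s = np.linalg.inv(np.eye(n) - rho * w) * beta_k
    direct = np.trace(s) / n   # effect of a unit's own covariate
    total = s.sum() / n        # own effect plus spillovers from all units
    return float(direct), float(total - direct), float(total)


# Toy example: four populations in a chain, rho = 0.5, coefficient = 2.
chain = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
w_chain = row_standardize(chain)
direct, indirect, total = sar_impacts(0.5, 2.0, w_chain)
```

With a row-standardized W the total impact equals β/(1 − ρ), so a positive ρ amplifies the coefficient through feedback among neighbors.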

In this model, ρ indicates the degree to which the dependent variable in one location is influenced by the dependent variables in neighboring locations. The model can be used to measure both direct and indirect (spillover) effects of the explanatory variables. The SAR model is useful when there is reason to believe that the outcome at one location is influenced by outcomes in nearby locations.

The spatial error model (SEM) accounts for spatial dependence in the error terms, rather than in the dependent variable itself. This is based on the idea that unobserved factors, which influence the dependent variable, are spatially correlated. In this case, spatial autocorrelation arises from omitted variables or measurement errors that are spatially clustered, rather than the dependent variable being directly influenced by neighboring values.

The SEM is formulated as follows:

$$Y = X\beta + \epsilon $$

With the error term specified as:

$$\epsilon = \lambda W\epsilon + u$$

Where:

  • Y is the dependent variable.

  • Xβ represents the explanatory variables and their coefficients.

  • ε is the error term, which exhibits spatial autocorrelation.

  • λ is the spatial autoregressive parameter for the errors, capturing the extent to which spatially correlated unobserved factors influence the outcome.

  • W is the spatial weights matrix, similar to the SAR model.

  • u is a white noise error term.

The SEM is useful when there is spatial autocorrelation in the residuals, indicating that unobserved spatial processes are influencing the outcome. This model helps correct for this spatial autocorrelation, improving the overall model fit and accuracy.

The following steps were employed to implement the spatial autoregressive and spatial error models:

  1. Defining FST thresholds. The thresholds variable defined different FST values (e.g., 0.03, 0.05, 0.07, 0.09). These thresholds determined which populations were considered neighbors based on genetic distance.

  2. Filtering neighbors by FST. The genetic distance matrix was filtered for each threshold to retain only neighbors with FST values less than the specified threshold. This ensured that only populations with sufficient genetic similarity (i.e., below the threshold) were treated as neighbors.

  3. Neighbor list creation. After filtering the distance matrix, a neighbor list was created for each population, including only those populations that met the FST threshold.

  4. Spatial weights list. The neighbor list was converted into a spatial weights list, representing the influence of each population on its neighbors. This spatial weights list was used as input to the spatial models.

  5. Model and impact calculation. For each FST threshold, a spatial autoregressive model was fitted using the spatial weights list. The direct, indirect, and total impacts of the spatial lag were then calculated, and the results were analyzed.

This process was repeated for each FST threshold to assess the influence of genetic similarity on the spatial relationships between populations.
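Steps 1 to 4 above can be sketched as follows. This is a NumPy illustration, not the code used in the study: the function names, the toy FST matrix, and the choice of equal, row-standardized weights among neighbors are assumptions. Model fitting (step 5) is not shown.

```python
import numpy as np


def fst_neighbors(fst, threshold):
    """Neighbor list: for each population, the others with FST below the threshold."""
    fst = np.asarray(fst, dtype=float)
    n = fst.shape[0]
    return {i: [j for j in range(n) if j != i and fst[i, j] < threshold]
            for i in range(n)}


def weights_from_neighbors(neighbors, n):
    """Row-standardized weights: each neighbor of i gets 1 / (number of neighbors of i)."""
    w = np.zeros((n, n))
    for i, js in neighbors.items():
        if js:  # populations without neighbors keep an all-zero row
            w[i, js] = 1.0 / len(js)
    return w


# Hypothetical pairwise FST values for three populations.
fst_matrix = np.array([[0.00, 0.02, 0.08],
                       [0.02, 0.00, 0.06],
                       [0.08, 0.06, 0.00]])
thresholds = [0.03, 0.05, 0.07, 0.09]

# One neighbor list and one weights matrix per threshold.
neighbor_lists = {t: fst_neighbors(fst_matrix, t) for t in thresholds}
weight_matrices = {t: weights_from_neighbors(neighbor_lists[t], fst_matrix.shape[0])
                   for t in thresholds}
```

Raising the threshold adds more distant populations to each neighbor list, which is why repeating the analysis across thresholds shows how sensitive the spatial results are to the definition of genetic neighborhood. Populations with no neighbor under a strict threshold need explicit handling in whatever model-fitting routine is used.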

Tests of Divergent Selection

QST was computed using the formula QST = σ²B/(σ²B + 2σ²W) (Leinonen et al., 2013). QST is defined as the level of genetically based population differentiation in quantitative traits (Li et al., 2019).

The total (additive) genetic variance is the variance of the PGSs across all individuals in all populations. The genetic variance within populations is the average variance of the PGSs within each population, weighted by the number of individuals in each population.

QST is then calculated as the genetic variance among populations divided by the sum of the genetic variance among populations and twice the genetic variance within populations.
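The calculation can be sketched in NumPy as below. The text defines the total and the within-population variance but does not spell out how the among-population component was obtained; this sketch assumes it is the total variance minus the size-weighted within-population variance (which equals the size-weighted variance of the population means). The function name `qst` and the toy scores are hypothetical.

```python
import numpy as np


def qst(pgs_by_population):
    """QST = var_between / (var_between + 2 * var_within) from per-individual PGSs.

    pgs_by_population: one sequence of individual scores per population.
    """
    groups = [np.asarray(g, dtype=float) for g in pgs_by_population]
    sizes = np.array([g.size for g in groups])

    # Average within-population variance, weighted by population size.
    var_within = np.average([g.var() for g in groups], weights=sizes)

    # Total variance across all individuals in all populations.
    var_total = np.concatenate(groups).var()

    # Assumed among-population component (clipped to guard against rounding).
    var_between = max(var_total - var_within, 0.0)

    return float(var_between / (var_between + 2.0 * var_within))


# Toy example: two populations with clearly separated scores.
qst_separated = qst([[0.0, 2.0], [4.0, 6.0]])
# Two populations with identical score distributions.
qst_identical = qst([[0.0, 2.0], [0.0, 2.0]])
```

In the first toy case the within-population variance is 1 and the among-population variance is 4, giving QST = 4/(4 + 2); in the second there is no among-population variance, so QST is 0.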

Population differences in the mean of a quantitative trait can arise for polygenic traits under divergent selection through positive covariances, that is, (cross-population) linkage disequilibrium (LD), between distant variants. In practice, this happens when alleles with similar effects are driven to similar frequencies within populations across multiple loci (Latta, 1998; Le Corre & Kremer, 2003; Ma et al., 2010). The other component of genetic differentiation in quantitative traits, FST, does not take this covariance of allelic effects into account, and it was shown to be usually very small for highly polygenic traits subject to divergent selection pressures (Berg & Coop, 2014).

Indeed, FST is based on the variances of individual allele frequencies, which are unsigned, meaning that positive and negative effects of alleles on a trait can cancel each other out, leading to an underestimation of the true extent of genetic differentiation in polygenic traits.

Conversely, there may be substantial levels of genetic differentiation (FST > 0.15) without any variations in the population means (QST = 0; Le Corre & Kremer, 2012).

If the QST/FST ratio is greater than 1 and p < .05, the quantitative trait in question is inferred to have been subject to divergent selection.

To produce QST values free of cross-population (long-range) LD, the effect and noneffect GWAS alleles were randomly shuffled with a probability of .5 to produce a null distribution of PGSs, from which random QST values were calculated. The expected value of QST under this null is equal to the FST of GWAS SNPs (FSTQ; Le Corre & Kremer, 2012) because this operation removes the variance due to cross-population linkage disequilibrium. We filtered out variants with a minor allele frequency (MAF) < 0.01 in any of the five 1000 Genomes (1KG) superpopulations. This filtering focuses the analysis on variants that are more reliably measured, better understood, and more consistent across populations, thereby producing more robust and interpretable QST estimates that align with the theoretical expectations set forth by studies like Le Corre and Kremer (2012).
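The shuffling idea can be sketched as follows, under several assumptions that go beyond the text: swapping the effect and noneffect allele of a variant is implemented as flipping the sign of its effect size (the two differ only by a constant added to every score, which leaves variances unchanged); the among-population variance is taken as total minus size-weighted within-population variance; and the p value is the share of null QST values at least as large as the observed one. All function names and the simulated data are hypothetical.

```python
import numpy as np


def qst_from_scores(scores, pop_labels):
    """QST = var_between / (var_between + 2 * var_within) for labeled individual scores."""
    scores = np.asarray(scores, dtype=float)
    pop_labels = np.asarray(pop_labels)
    pops = np.unique(pop_labels)
    sizes = np.array([(pop_labels == p).sum() for p in pops])
    within = np.average([scores[pop_labels == p].var() for p in pops], weights=sizes)
    between = max(scores.var() - within, 0.0)
    return float(between / (between + 2.0 * within))


def null_qst(dosages, effects, pop_labels, n_draws=1000, seed=0):
    """QST values after randomly swapping effect/noneffect alleles with probability .5."""
    rng = np.random.default_rng(seed)
    dosages = np.asarray(dosages, dtype=float)
    effects = np.asarray(effects, dtype=float)
    draws = np.empty(n_draws)
    for k in range(n_draws):
        # A swap is equivalent, up to an additive constant, to a sign flip.
        signs = rng.choice([-1.0, 1.0], size=effects.size)
        draws[k] = qst_from_scores(dosages @ (effects * signs), pop_labels)
    return draws


def empirical_p(observed, null_values):
    """Share of null values at least as large as the observed value (with +1 correction)."""
    null_values = np.asarray(null_values, dtype=float)
    return float((np.sum(null_values >= observed) + 1) / (null_values.size + 1))
```

A typical use would compute `qst_from_scores` on the real PGSs, generate `null_qst` draws from the same genotypes, and compare the two with `empirical_p`. When trait-increasing alleles are consistently more frequent in one population, the observed QST tends to exceed most null draws, because random swapping breaks the alignment of effects across loci.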

Results

Tables 1 and 2 present the following data for each GWAS: the number of independent SNPs after linkage disequilibrium (LD) clumping and thresholding (r² < .1 and p < 5 × 10⁻⁸); the number of matching SNP IDs found in the WGS and WGS + array datasets; and the Cronbach’s alpha of the resulting PGSs.

Table 1. WGS dataset

Note: WGS, whole-genome sequencing; GWAS, genomewide association study; SNP, single-nucleotide polymorphism.

Table 2. WGS + array dataset

Note: WGS, whole-genome sequencing; GWAS, genomewide association study; SNP, single-nucleotide polymorphism.

The PGSs computed using the WGS dataset consistently demonstrated higher match rates and internal consistency compared to those derived from the WGS + array dataset. This difference in performance is likely due to the more comprehensive genomic coverage provided by WGS.

Among the different GWAS sources, the multi-ancestry GWAS (MIX-Height) yielded PGSs with the highest internal consistency, as measured by Cronbach’s alpha. Specifically, the MIX-Height PGS achieved alpha values of .85 and .49 in the WGS and WGS + array datasets respectively. In contrast, the sibship GWAS (SIB-Height) produced PGS with lower internal consistency, with alpha values of .51 and .21 in the WGS and WGS + array datasets respectively.

These results suggest that the choice of GWAS source and genomic data type (WGS vs. WGS + array) significantly impacts the reliability and consistency of the resulting PGSs. The superior performance of the multi-ancestry GWAS-derived scores may indicate its greater generalizability across diverse populations.

Correlation of Polygenic Scores With Phenotypic and Environmental Variables

Our analysis revealed consistent patterns of correlation between average PGSs, environmental variables, and phenotypic body height across both the WGS and WGS + array datasets (Figures 1 and 2).

WGS Dataset

Environmental factors positively associated with body height included daily per capita energy intake, Human Development Index (HDI), and daily per capita protein intake, with correlations of .73, .70 and .62 respectively. Conversely, wasting prevalence and infant mortality were negatively correlated with body height, with correlations of −.61 and −.59 respectively. PGSs also showed positive correlations with average body height, ranging from .26 for SIB-Height to .70 for EUR-Height-2014.

WGS + Array Dataset

Environmental factors positively correlated with body height included daily per capita energy intake, HDI, and daily per capita protein intake, with correlations of .60, .56 and .50 respectively. Wasting prevalence and infant mortality were negatively correlated with body height, with correlations of −.52 and −.53 respectively. PGSs were also positively correlated with average body height, ranging from .34 for EUR-Height to .70 for EUR-Height-2014.

These results highlight consistent patterns across both datasets, with environmental factors and PGSs showing strong correlations with average body height.

Figures 1a and 1b (WGS and WGS + array dataset respectively). Correlation matrix of PGS means and phenotypic means.

Figure 1a. Whole genome sequencing (WGS) dataset: correlation matrix of polygenic score means and phenotypic means.

Note: *p < .05, **p < .01, ***p < .001.

Figure 1b. Whole genome sequencing (WGS) + array dataset: correlation matrix of polygenic score means and phenotypic means.

Note: *p < .05, **p < .01, ***p < .001.

Figures 2a and 2b display scatterplots visualizing the correlations between MIX-Height and SIB-Height with average body height within the WGS and WGS + array datasets respectively.

Figure 2a. Whole genome sequencing (WGS) dataset with scatterplots of MIX-Height, SIB-Height and average measured body height.

Figure 2b. Whole genome sequencing (WGS) and array dataset with scatterplots of MIX-Height, SIB-Height and average measured body height.

Regression Models

Table 3 summarizes the results of 30 models examining the relationship between PGSs, environmental factors, and body height across the two datasets (WGS and WGS + array). Per capita (log) GDP was excluded due to its high correlation (.95−.96) with HDI, which is a superior indicator of overall socioeconomic development.

Table 3. Regression models

Note: WGS, whole-genome sequencing; PGS, polygenic score; HDI, Human Development Index;

***p = .001 **p = .01 *p = .05.

Key findings include:

  1. Models using MIX-Height PGS generally show strong associations with body height, with PGS beta values ranging from 0.48 to 0.60.

  2. Environmental factors like HDI, calories, and proteins also show strong positive associations with body height, with beta values typically above 0.5 and often significant (indicated by ***).

  3. Combined models (e.g., HDI and calories) often show higher model R² values, indicating better explanatory power.

  4. Models using SIB-Height PGS show similar patterns but generally have lower PGS beta values and model R² values compared to MIX-Height PGS models.

  5. The WGS dataset generally yields higher model R² values compared to the WGS + array dataset.

Overall, the MIX-Height PGS and HDI, calories, and proteins as environmental factors are consistently significant predictors of body height, with combined models providing the highest explanatory power.

Effects of Environmental and Genetic Factors Among High Infant Mortality Countries

We subset the WGS + array dataset to only those countries or regions with high infant mortality rates, resulting in a subsample of 27 populations. Our regression analysis incorporated both genetic and environmental predictors, specifically PGSs for body height, infant mortality, and caloric intake.

The results presented in Table 4 demonstrate that both MIX-Height and SIB-Height PGS significantly predict body height, with MIX-Height showing stronger associations overall. Models that included infant mortality as an environmental predictor indicated a significant negative relationship with body height, while those with caloric intake showed a significant negative association as well.

Table 4. Regression models (only high infant mortality countries)

Note: WGS, whole-genome sequencing; PGS, polygenic score.

***p = .001 **p = .01 *p = .05.

Interaction Effects

We coded HDI as a dummy variable, with a cut-off value of 0.8 separating low from high HDI populations. The interaction between MIX-Height and HDI was not significant, but the one between Calories and HDI was significantly negative, indicating, contrary to Deaton’s predictions, that the positive effect of calorie intake on body height is reduced in the higher HDI group (Table 5).

Table 5. Regression models (with interaction term)

Note: WGS, whole-genome sequencing; PGS, polygenic score.

However, in the model with SIB-Height, neither the interaction between SIB-Height and HDI nor the one between Calories and HDI was significant (Table 5).
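The structure of such an interaction model can be sketched as follows. This is a toy illustration, not the authors' code: the data are made up so that the calorie slope is 2.0 in the low-HDI group and 0.5 in the high-HDI group, the helper names `solve` and `ols` are hypothetical, and treating populations exactly at the 0.8 cut-off as "high" is an assumption.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


HDI_CUTOFF = 0.8  # cut-off from the text; ">=" counting as "high" is an assumption

# Made-up data: standardized calorie intake and HDI for ten populations.
calories = [-1.0, -0.5, 0.0, 0.5, 1.0, -1.0, -0.5, 0.0, 0.5, 1.0]
hdi = [0.55, 0.60, 0.65, 0.70, 0.75, 0.82, 0.85, 0.88, 0.90, 0.93]
high = [1.0 if h >= HDI_CUTOFF else 0.0 for h in hdi]

# Toy outcome built with a negative Calories x HDI interaction (-1.5).
height = [1.0 + 2.0 * c + 0.5 * d - 1.5 * c * d for c, d in zip(calories, high)]

# Design matrix columns: intercept, calories, HDI dummy, calories x HDI dummy.
design = [[1.0, c, d, c * d] for c, d in zip(calories, high)]
intercept, b_cal, b_high, b_inter = ols(design, height)
print(round(b_cal, 3), round(b_inter, 3), round(b_cal + b_inter, 3))
```

A negative interaction coefficient means that the calorie slope in the high-HDI group (`b_cal + b_inter`) is smaller than in the low-HDI group (`b_cal`), which is the pattern described for the MIX-Height model.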

Latitude

Latitude showed a tendency towards positive correlations with body height PGS in both the WGS and WGS + array datasets. However, statistical significance was limited. In the WGS dataset, only EUR-Height reached significance. These observations, combined with Bergmann’s rule linking body size to latitude (Bergmann, 1848), suggested the need for further investigation into latitude’s influence on both phenotypic and genotypic body height.

To explore patterns of local adaptation, we developed regression models using body height PGS as the dependent variable and latitude as the independent variable. We enhanced the model by including superpopulation as a categorical variable.

Although human genetic variation can be clustered in many ways, we defined six superpopulations, expanding on the five categories used in the 1000 Genomes Project (1KG): AFR (African); AMR (Amerindian); EAS (East Asian); EUR (European); SAS (South Asian); MENA (Middle Eastern/North African) — added to cover populations in our dataset not represented in 1KG.

Notable characteristics of our categorization:

  • The AMR category encompassed both indigenous populations (e.g., Amazonian, Andean, Mesoamerican) and admixed groups (e.g., Mexican, Puerto Rican).

  • Similar to 1KG, our AFR superpopulation included admixed samples, such as African Caribbean (ACB) and African ancestry in Southwest USA (ASW).

  • Some heavily admixed populations could not be classified under this scheme; for example, Basters or Coloured from South Africa, or Brazilians.

This approach allowed us to account for both geographical and genetic diversity in our analysis of height-related genetic adaptations across different populations.

This allowed us to assess the impact of latitude while accounting for broad ancestry groups.

Controlling for superpopulation, latitude had a significantly positive relationship (beta from 0.31 to 0.52) with body height PGS in 3 out of 4 models. The effect of superpopulation, especially African and East Asian ancestry, suggests that the correlation between latitude and body height PGSs is obscured by ancestry (Table 6).

Table 6. Regression of body height polygenic scores on latitude and superpopulation

Note: PGS, polygenic score; AFR, African; EAS, East Asian; EUR, European; MENA, Middle Eastern/North African; SAS, South Asian; AMR (not shown because used as reference group), Amerindian/Hispanic.

***p = .001 **p = .01 *p = .05.

Spatial-Genetic Autocorrelation Models

The spatial autocorrelation tests were conducted using the combined WGS and array dataset. This dataset was chosen because it allowed for the calculation of FST (Fixation Index) genetic distances across all samples. In contrast, the WGS dataset alone contained numerous samples with frequencies derived from aggregate data, making FST calculations unfeasible for those particular samples.

Typically, spatial autocorrelation in the dependent variable (or residuals) is used to check whether the model adequately captures spatial dependencies in the outcome of interest. However, since we are also interested in the relationship between neutral genetic variation (as captured by FST) and variation at quantitative trait loci (i.e. PGS), we also computed the autocorrelation at the level of the PGS.

FST was computed using PLINK 2.0 (www.cog-genomics.org/plink/2.0/; Chang et al., 2015), with the default (Hudson) method (Hudson et al., 1992). After filtering for missing genotype data (mind > 0.5), 251 individual samples were removed, including all individuals from two populations (Morocco and Algeria). ‘Han’ was removed because it was redundant. Hence, there were 86 populations left.

Mantel Test

The Mantel correlations for MIX-Height and SIB-Height were .291 and .395 (p < .01) respectively. The Mantel correlation for Height was .152 (p = .009).
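For readers unfamiliar with the Mantel test, a minimal sketch of the statistic and its permutation p value is given below. It is not the implementation used in the study: the function names are hypothetical, the data are made up, absolute differences stand in for the distance matrices, and the p value is one-sided, all of which are assumptions.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square matrix into a list."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(a, b, n_perm=999, seed=0):
    """Mantel correlation between distance matrices a and b, with a one-sided permutation p value."""
    rng = random.Random(seed)
    flat_b = upper_triangle(b)
    observed = pearson(upper_triangle(a), flat_b)
    n = len(a)
    hits = 0
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)
        # Permute rows and columns of `a` together, keeping `b` fixed.
        permuted = [[a[i][j] for j in order] for i in order]
        if pearson(upper_triangle(permuted), flat_b) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


def distance_matrix(values):
    """Pairwise absolute differences between population values."""
    return [[abs(u - v) for v in values] for u in values]


# Made-up population means for six populations.
height = [160.0, 165.0, 170.0, 172.0, 178.0, 181.0]
pgs = [-1.2, -0.5, 0.1, 0.0, 0.9, 1.4]
r_obs, p_value = mantel(distance_matrix(pgs), distance_matrix(height))
print(round(r_obs, 3), round(p_value, 3))
```

Permuting rows and columns jointly preserves the dependence among the pairwise distances that involve the same population, which is why the test does not treat the matrix entries as independent observations.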

Partial Mantel Test

A regression model (Piffer, 2015) and the partial Mantel test (Legendre & Legendre, 1998) using the R package ncf (Bjornstad, 2022) were carried out to discern the effects of selection from those of ancestry.

The results of the regression model with Height (distance matrix) as dependent and PGS + FST (distance matrices) as independent variables are shown in Table 7.

Table 7. Regression of phenotypic distances on PGS and FST distances

Note: PGS, polygenic score.

***p = .001 **p = .01 *p = .05.

The partial Mantel correlation coefficient between MIX-Height and Height after controlling for FST was .193 (p = .003). Conversely, the partial correlation between Height and FST after accounting for MIX-Height was not significant and around 0 (.092).

The partial Mantel correlation coefficient between SIB-Height and Height after controlling for FST was .323 (p = .002). Conversely, the partial correlation between Height and FST after controlling for SIB-Height was around zero (.014).
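The partial Mantel statistic itself is a first-order partial correlation computed on the unfolded distance matrices. The sketch below shows only that statistic (the permutation step is omitted); the function names and all numbers, including the `fst` matrix, are made up for illustration and are not taken from the study.

```python
import math


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square matrix into a list."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_mantel_r(a, b, c):
    """Partial Mantel statistic between matrices a and b, controlling for c."""
    return partial_corr(upper_triangle(a), upper_triangle(b), upper_triangle(c))


def distance_matrix(values):
    """Pairwise absolute differences between population values."""
    return [[abs(u - v) for v in values] for u in values]


# Made-up values for five populations.
height = [160.0, 165.0, 170.0, 172.0, 178.0]
pgs = [-1.2, -0.5, 0.1, 0.0, 0.9]
fst = [
    [0.00, 0.02, 0.05, 0.09, 0.11],
    [0.02, 0.00, 0.04, 0.08, 0.10],
    [0.05, 0.04, 0.00, 0.03, 0.07],
    [0.09, 0.08, 0.03, 0.00, 0.06],
    [0.11, 0.10, 0.07, 0.06, 0.00],
]
r_partial = partial_mantel_r(distance_matrix(pgs), distance_matrix(height), fst)
print(round(r_partial, 3))
```

When the control matrix is uncorrelated with both of the other matrices, the partial statistic reduces to the ordinary Mantel correlation.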

Moran’s I with KNN

Global Moran’s I was calculated using the spdep R package (Bivand & Wong, 2018) employing the spatial K-nearest neighbors (KNN) method for K values ranging from 1 to 6. This KNN approach estimates the value for each case by averaging the values from the K nearest populations, as determined by the distance matrix. The results, presented in Figure 3, indicate weak to moderate spatial autocorrelation, with Moran’s I values ranging from 0.178 to 0.286.

Figure 3. Moran’s I for average body height, MIX-Height and SIB-Height for different values of K.
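To make the KNN-based statistic concrete, the sketch below computes global Moran's I from a distance matrix with row-standardized K-nearest-neighbour weights. It is an illustration in plain Python rather than the spdep workflow used in the study; the function names, the tie-breaking rule (lower index first) and the toy data are assumptions.

```python
def knn_weights(dist, k):
    """Row-standardized weights: each population's k nearest neighbours get weight 1/k."""
    n = len(dist)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Sort the other populations by distance; ties are broken by index.
        others = sorted((j for j in range(n) if j != i), key=lambda j: (dist[i][j], j))
        for j in others[:k]:
            weights[i][j] = 1.0 / k
    return weights


def morans_i(values, weights):
    """Global Moran's I for `values` under the given weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    total_weight = sum(sum(row) for row in weights)
    return (n / total_weight) * cross / sum(d * d for d in dev)


# Toy example: six populations whose values increase along one axis.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
dist = [[abs(u - v) for v in values] for u in values]
for k in (1, 2, 3):
    print(k, round(morans_i(values, knn_weights(dist, k)), 3))
```

Because the weights are row-standardized, the weighted term for each population is the mean of its neighbours' deviations, which matches the averaging described in the text.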

Spatial Autoregressive and Spatial Error Model

Spatial autoregressive and spatial error models were employed to investigate the relationship between population body height, PGSs, and spatial genetic structure. After filtering the genetic distance matrix, neighbors for each population were identified based on where the genetic distance fell below specific FST thresholds, resulting in a neighbour list.

For each FST threshold, models were constructed using average body height as the dependent variable. These models incorporated two key predictors:

  1. A non-spatial predictor: either MIX-Height or SIB-Height PGSs;

  2. A spatial component: In the spatial autoregressive model, a spatial lag term (lag Height) was computed using the FST weights.

In the spatial error model, the spatial error term accounted for autocorrelation in the residuals. Both models were run to capture different aspects of spatial dependence in the data. The results, presented in Table 8, showcase the p values for the PGS and the spatial autocorrelation parameters ρ (rho) and λ (lambda) for the spatial autoregressive and the spatial error models respectively. The Likelihood Ratio (LR) test evaluates whether including a spatial dependence parameter (such as ρ in a spatial autoregressive model or λ in a spatial error model) significantly improves the model’s fit compared to a model without spatial dependence. The p value associated with the LR test indicates whether the improvement in model fit (due to including the spatial parameter) is statistically significant.
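The neighbour-list construction and the spatial lag term described above can be sketched as follows. This is only an illustration under stated assumptions: equal (row-standardized) weights among neighbours are assumed, the function names are hypothetical, and the FST values and heights are made up. Fitting the spatial autoregressive or spatial error model itself requires maximum-likelihood estimation and is not shown.

```python
def threshold_neighbours(fst, threshold):
    """For each population, list the other populations whose FST distance is below the threshold."""
    n = len(fst)
    return [[j for j in range(n) if j != i and fst[i][j] < threshold] for i in range(n)]


def spatial_lag(values, neighbours):
    """Mean value among each population's neighbours; None when it has no neighbours."""
    return [sum(values[j] for j in nb) / len(nb) if nb else None for nb in neighbours]


# Made-up FST matrix and mean heights (cm) for three populations.
fst = [
    [0.00, 0.01, 0.08],
    [0.01, 0.00, 0.02],
    [0.08, 0.02, 0.00],
]
height = [170.0, 175.0, 160.0]

neighbours = threshold_neighbours(fst, 0.03)
lag_height = spatial_lag(height, neighbours)
print(neighbours, lag_height)
```

Lower thresholds give shorter neighbour lists, so some populations may end up without neighbours; how such cases were handled in the study is not stated here.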

Table 8. Spatial autoregressive and spatial error models with different FST thresholds

Note: PGS, polygenic score.

***p = .001 **p = .01 *p = .05.

Spatial Autoregressive and Spatial Error Model With HDI and PGS as Predictors

We added an environmental variable (HDI) to the spatial autoregressive and spatial error models to assess the impact of genetic and environmental factors after accounting for spatial autocorrelation. Both PGS and HDI were significant predictors in all models, whereas the autocorrelation parameter was significant only in the models with the most stringent FST threshold (FST < 0.03) (Table 9).

Table 9. Spatial autoregressive and error models with HDI and PGS as predictors

Note: HDI, Human Development Index; PGS, polygenic score. Statistically significant p values are shown in bold type.

Tests of Divergent Selection

We used the QST-FST test to detect divergent selection, calculating QST, random QST (rQST), and FST for 26 populations from the 1000 Genomes Project (Table 10). Following the notation from Le Corre and Kremer (2012), we denote the FST at quantitative trait loci as FSTQ and use FST to represent neutral genetic differentiation.

Table 10. Results of QST-FST test

Note: GWAS, genomewide association study; MC, Monte Carlo (empirical) p value

To assess the cross-population LD component of population differentiation, we compared QST to rQST. The results of this simulation are shown in Figure 4.

Figure 4. Distribution of random QST (rQST) versus QST (dashed line).

To generate rQST, we shuffled effect and noneffect alleles 1000 times using Monte Carlo simulation. This process yielded 1000 QST values. We also report the z score and p value for QST/rQST.
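The logic of this randomization can be sketched as follows. The QST estimator used in the study is not specified in this section, so the sketch assumes a simple additive form, QST = V_between / (V_between + 2 V_within), with the within-population variance computed under Hardy-Weinberg and linkage equilibrium; swapping effect and noneffect alleles is implemented as a random sign flip of each effect size. All names and numbers are hypothetical.

```python
import random
from statistics import mean, stdev, variance


def qst(freqs, betas):
    """Assumed QST for an additive score; freqs[pop][snp] is the effect-allele frequency."""
    pop_means = [sum(2 * b * p for b, p in zip(betas, pop)) for pop in freqs]
    v_between = variance(pop_means)
    v_within = mean([sum(b * b * 2 * p * (1 - p) for b, p in zip(betas, pop)) for pop in freqs])
    return v_between / (v_between + 2 * v_within)


def random_qst(freqs, betas, n_draws=1000, seed=0):
    """QST values after randomly flipping the sign of each SNP effect."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        flipped = [b if rng.random() < 0.5 else -b for b in betas]
        draws.append(qst(freqs, flipped))
    return draws


# Toy data: three populations, eight SNPs whose effect alleles all shift in concert.
freqs = [[0.3] * 8, [0.5] * 8, [0.7] * 8]
betas = [1.0] * 8

observed = qst(freqs, betas)
null = random_qst(freqs, betas)
z_score = (observed - mean(null)) / stdev(null)
p_value = (sum(q >= observed for q in null) + 1) / (len(null) + 1)
print(round(observed, 3), round(z_score, 2), round(p_value, 4))
```

Sign flips leave each SNP's allelic differentiation unchanged but break the alignment of effect directions across SNPs, so the gap between the observed and randomized values isolates the covariance component of differentiation.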

As theory predicts, randomly shuffled QST values closely matched the FST values of GWAS SNPs (.083 vs. .081 and .098 vs. .103). The QST/FST ratio exceeded 1 for both MIX-Height and SIB-Height, but only MIX-Height showed statistical significance (p < .01).

We also carried out the local (pairwise) QST test for MIX-Height. SIB-Height was not used because it failed to reach significance in the global comparison, hence it did not have the power to detect local associations after Bonferroni correction. Figure 5 shows the QST values for the pairwise comparisons and the p value of the QST values that were significant after bootstrapping (n = 1000).

Figure 5. Local (pairwise) QST test for MIX-Height.

For the allelic differentiation component (FST), we compared the FST distribution of randomly matched SNPs to that of GWAS SNPs (Figure 6).

Figure 6. FSTQ versus FST (randomly matched single nucleotide polymorphisms).

Discussion

The results of this study are consistent with a model where both genetic and environmental factors influence population differences in adult body height. The results replicate across the WGS and WGS + array datasets comprising 51 and 89 (partially overlapping) populations.

This is the first study to systematically explore the impact of genetics and environmental factors on body height across multiple ethnic groups, using a comprehensive approach that integrates polygenic scores (PGS) and various environmental indicators. Previous studies have investigated the genetic determinants of body height and the role of environmental factors, but they often focused on specific populations or regions. For instance, Turchin et al. (2012) explored the genetic basis of body height in European populations, while Deaton (2007) examined the relationship between body height and socioeconomic conditions within countries. However, these studies did not combine genetic and environmental factors in a unified model across diverse global populations.

In our analysis, PGSs derived from the MIX-Height GWAS consistently demonstrated strong associations with average measured body height across both datasets. Multiple regression models using MIX-Height PGSs demonstrated significant PGS beta values, ranging from 0.402 to 0.588. These findings highlight the robust predictive power of the MIX-Height PGS in capturing genetic influences on body height.

Environmental factors such as the HDI, daily caloric intake, and protein consumption also showed significant associations with body height. For instance, HDI beta values ranged from 0.539 to 0.968, while calories and proteins had beta values ranging from 0.638 to 0.837. These results underscore the critical role of nutrition and overall living standards in determining adult body height.

The combined models incorporating both genetic and environmental factors generally yielded the highest model R² values, indicating superior explanatory power. For example, the model combining MIX-Height PGS with HDI and caloric intake achieved an R² of .846, highlighting the synergistic effects of genetics and environment on body height.

Models utilizing SIB-Height PGS also exhibited significant associations, though their PGS beta values and model R² values were generally lower compared to MIX-Height PGS models. This may reflect the smaller sample sizes and reduced phenotypic and genotypic variance within families, which limit the power of sibship GWAS to detect statistical associations.

The consistency of these results across both WGS and WGS + array datasets further strengthens our conclusions. While the WGS dataset generally produced higher model R² values, the replication of findings in the WGS + array dataset demonstrates the robustness of our models across different genomic platforms.

This study addresses the African paradox proposed by Deaton (2007), which observed high average stature in African countries despite poor nutrition and childhood health. Our findings suggest that this phenomenon is likely due to a high genetic endowment for body height among African populations, as evidenced by their higher-than-average polygenic scores (Figures 2a and 2b). Additionally, our results challenge Deaton’s hypotheses that (1) genetic differences do not account for substantial variation in body height among low income/high childhood mortality countries and (2) nutrition and childhood health are not significant predictors of body height variations among such countries. When we subset the dataset to only the countries with high childhood mortality, the effects of SIB-Height and MIX-Height, infant mortality and caloric intake were all significant, indicating that both environmental and genetic factors influence body height in these populations. Contrary to Deaton’s proposed balancing effect of infant mortality, we found that the effect of infant mortality on body height was negative (Table 4). When we coded HDI as a dummy variable, with a cut-off value of 0.8 separating low from high HDI populations, the interaction between MIX-Height and HDI was not significant, but the one between Calories and HDI was significantly negative, indicating, contrary to Deaton’s predictions, that the positive effect of calorie intake on body height is reduced in the higher HDI group.

Our study reveals that the Dutch population achieved the highest MIX-Height polygenic score, providing compelling genetic evidence for their status as the world’s tallest nationality. This finding lends molecular support to the theory proposed by Stulp et al. (2023) that natural selection has favored taller individuals in the Netherlands over time.

Interestingly, our analysis of nutritional factors yielded unexpected results. Contrary to popular belief, the Dutch protein and calorie intake levels were found to be average when compared to other developed countries. This observation challenges the widespread notion that the exceptional body height of the Dutch population is primarily attributable to high consumption of dairy products.

Our models incorporated latitude as a climate proxy, drawing on Bergmann’s rule, which posits that colder environments lead to larger body sizes, including increased body height in humans. While the global correlation was weak, our analysis revealed a more pronounced effect of latitude on body height PGSs when accounting for major ancestry groups (‘superpopulations’). This effect reached statistical significance in three out of four models (Table 6). Notably, African ancestry showed a strong positive influence on body height PGS, suggesting that African populations are taller than expected given their relatively equatorial geographical origin. This finding positions African populations as an outlier in the latitude-body height relationship. Conversely, East Asian populations appear to have a genetic predisposition for shorter stature than their latitude would predict, though this effect is generally less pronounced than the African outlier. These unexpected results invite further exploration of potential underlying mechanisms, such as sexual selection pressures or adaptations for disease resistance. It is worth noting that Allen’s rule (Allen, 1877), which proposes that taller, leaner body types can be advantageous for heat dissipation in hot climates, offers predictions that contrast with Bergmann’s rule. Specifically, Allen’s rule would predict that people living in hotter climates have relatively longer limb length, but not necessarily taller stature.

This complexity underscores the need for a more nuanced understanding of the interplay between genetics, climate and human body height variation.

Our study incorporated spatial autocorrelation — more accurately termed ‘genetic autocorrelation’ in this context — using several methodologies: Mantel tests, partial Mantel correlations, Moran’s I statistics, spatial autoregressive and spatial error models. The Mantel tests (Figures 1a and 1b) and Moran’s I analyses (Figure 3) revealed evidence of moderate genetic autocorrelation. However, when we integrated genetic autocorrelation into regression models alongside PGS as predictors, its effect was diminished. Specifically, in Mantel regression and partial correlation analyses, the impact of genetic autocorrelation was minimal (Table 7).

Interestingly, phenotypic height exhibited much stronger spatial autocorrelation than genotypic height (i.e., PGS), as shown by the Moran’s I values for different values of k in Figure 3. This was despite ‘spatial’ autocorrelation being actually measured from genetic distance matrices, thus not directly indicating geographical or cultural proximity.

A possible explanation for this phenomenon is that countries that are genetically similar (as indicated by FST distances) often share similar environments due to geographical proximity, historical migration patterns, or socioeconomic similarities. These shared environmental factors could amplify genetic predispositions, leading to more similar phenotypic heights across populations, even if PGS values for height are not perfectly aligned. There could also be a correlation between genetics and the environment within genetically similar populations. For example, countries with a shared genetic background (as measured by FST) may also have similar cultural or socioeconomic conditions that influence height. This gene-environment correlation can cause genetically similar countries to exhibit more phenotypic similarity than would be predicted by their PGS alone. In this case, the environment reinforces the genetic potential for height, leading to stronger autocorrelation in phenotypic height.

In our regression analysis, we opted to define neighbors based on FST distance rather than using a fixed number of neighbors, as done in the KNN method, which can include ‘neighbors’ with large genetic distances. This choice was supported by the Mantel correlograms (Figures 1a and 1b), which revealed negative autocorrelation at higher FST values (>.05), potentially reducing our ability to detect autocorrelation.

The spatial autoregressive and error models showed a stronger autocorrelation effect when using MIX-Height compared to SIB-Height as the predictor, reflected in rho and lambda coefficients of approximately 0.7 (Table 8). Despite this, PGS remained a significant predictor. The standardized beta for MIX-Height ranged from 0.279 (p = .004) in the spatial autoregressive model with FST < 0.03, to 0.435 (p < .001) in the spatial error model with FST < 0.09.

For SIB-Height, PGS was a significant predictor across all FST thresholds, with beta values ranging from 0.328 in the spatial autoregressive and error models with FST < 0.03, to 0.783 in the spatial error model with FST < 0.07. However, the genetic autocorrelation effect was significant only at the lowest FST threshold of .03.

The degree of autocorrelation decreased after incorporating an environmental predictor (HDI) into the regression models, indicating that phenotypic-level autocorrelation is partially driven by environmental factors. In models with the lowest FST threshold, rho decreased from 0.698 to 0.422 for MIX-Height and from 0.588 to 0.373 for SIB-Height (Table 9). Notably, PGS and HDI remained significant predictors of average height in all models that accounted for spatial autocorrelation across all FST thresholds.

From a theoretical evolutionary genetics perspective, controlling for genetic autocorrelation serves as a method to detect the overdispersion of genetic values among populations relative to neutral expectations, as reflected in the FST distance matrix. Significant levels of overdispersion are interpreted as signals of local adaptation. Conceptually, this approach bears a strong resemblance to QST-FST comparisons, a widely employed test for divergent selection (Merilä & Crnokrak, 2001).

QST values exceeded FSTQ for both MIX-Height and SIB-Height, with QST/FST ratios of 3.41 and 1.67 respectively. A Monte Carlo simulation, involving reshuffling of effect and noneffect alleles, showed significant results for MIX-Height (p = .008) but not for SIB-Height (p = .179) (Figure 5).

This test alone is overly conservative because selection effects encompass both allelic differentiation and the covariance of allelic effects across populations. Allelic differentiation influences the difference between FST at GWAS loci (FSTQ) and FST at neutral loci, while QST randomization reveals the impact of allelic effect covariance across populations, which is the primary component of differentiation in polygenic traits (Berg & Coop, 2014; Le Corre & Kremer, 2012).

To address this, we conducted the FST enrichment (FSTQ-FST) test to examine the allelic differentiation component. The results were significant for MIX-Height but not for SIB-Height (p < .001 and .10, respectively) (Figure 6).

It is crucial to note that the interpretation of PGSs within a causal framework is only valid within this specific theoretical context. This caveat is particularly salient for PGSs that account for only a small proportion of phenotypic variation, such as those derived from sibship GWASs. The underlying reason for this limitation lies in the complex interplay between local adaptation and allele frequencies across populations.

When local adaptation occurs, it tends to increase the frequencies of alleles that have similar effects (either increasing or decreasing) on a given trait across populations. Importantly, this phenomenon is not limited to the alleles identified by GWAS; it extends to other unidentified alleles that contribute to the trait. Consequently, the effect of the genetic variants comprising the polygenic score on the population-level trait is not direct. Instead, it is mediated by a broader set of alleles that, under evolutionary expectations, follow similar patterns of frequency distribution (Berg & Coop, 2014; Piffer, 2013).

This indirect relationship can be further elucidated as follows:

GWAS identifies a subset of alleles associated with a trait, which are used to construct the PGS. These identified alleles represent only a fraction of the total genetic variation influencing the trait.

Local adaptation influences the frequencies of these identified alleles across populations. This process occurs through natural selection acting on the phenotypic effects of these alleles in different environmental contexts.

Simultaneously, local adaptation affects the frequencies of unidentified alleles that also contribute to the trait. These unidentified alleles, while not captured in the PGS, are subject to the same evolutionary pressures as the identified alleles.

The observed effect of the PGS at the population level is thus a proxy for the cumulative effect of both identified and unidentified alleles. This relationship arises because the PGS, based on a limited set of identified alleles, serves as an indicator of broader genetic patterns shaped by local adaptation.

This cumulative effect reflects the broader evolutionary forces shaping genetic variation related to the trait, rather than a direct causal relationship between the specific alleles in the PGS and the trait variation across populations. In essence, the PGS acts as a marker for the overall genetic architecture influenced by local adaptation, encompassing both the measured and unmeasured genetic components.

Understanding this indirect relationship is crucial for accurately interpreting the results of studies employing PGSs, particularly in the context of population genetics and local adaptation. It underscores the need for caution when making causal inferences based on PGSs, especially when these scores explain only a small fraction of phenotypic variation. Researchers must consider the broader evolutionary context and the potential influence of unidentified genetic factors when drawing conclusions from PGS analyses across populations.

In summary, our study provides compelling evidence that both genetic predispositions and environmental conditions significantly contribute to adult body height. The integration of PGSs and environmental factors offers a comprehensive understanding of the determinants of body height, reinforcing the importance of considering both genetic and nongenetic factors in future research on human growth and development. However, our study had several limitations:

Population diversity. Although the study includes a wide range of populations, the sample sizes for certain regions may be insufficient to capture the full genetic and environmental variability within those areas. This could lead to biased estimates of the impact of both genetic and environmental factors on body height.

Environmental measures. The environmental indicators used in this study, such as HDI, daily caloric intake and protein consumption, are proxies and may not fully capture the complex environmental influences on body height. More granular data on specific health and nutritional factors would provide a clearer picture.

Model limitations. The models used in this study, while comprehensive, may still be overly simplistic in capturing the interplay between genetics and environment. Nonlinear relationships and interactions between different environmental factors could be more thoroughly explored in future research.

Genetic data representation. The genetic data, although extensive, may not cover all relevant genetic variants influencing body height. Future studies could benefit from even more comprehensive genomic data, including rare variants and epigenetic factors. Moreover, the use of FST distance matrices, albeit common in population genetics, is not strictly correct because FST does not always satisfy the triangle inequality and thus is not a metric (Arbisser & Rosenberg, 2020).

Replication and validation. While the results are robust across the datasets used, replication in other independent datasets and validation in different demographic contexts would strengthen the generalizability of the findings.

Ethical considerations. The study of genetic differences across populations must be conducted with careful consideration of ethical implications and potential misuse of the findings. This includes ensuring that the research does not reinforce stereotypes or lead to discrimination.

Accuracy of measured height. The accuracy of average measured body height data can vary, and discrepancies in measurement techniques across different studies or populations might affect the results. Consistency in measurement methods is crucial for reliable comparisons.

Missing data. Several ethnic groups had missing data on average body height, which could skew the results. The absence of complete data for all populations limits the study’s ability to fully capture the genetic and environmental determinants of body height across diverse groups.

These limitations should be considered when interpreting the results and in designing future research to build on these findings.
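
The caveat about FST and the triangle inequality noted under 'Genetic data representation' can be illustrated with a minimal sketch. It is not the estimator used in this study. It assumes the textbook two-population definition FST = (HT − HS)/HT for a single biallelic locus with equally weighted populations, and the function names and allele frequencies below are hypothetical, chosen only for illustration.

```python
from itertools import permutations


def fst_biallelic(p1, p2):
    """FST = (HT - HS) / HT for two equally weighted populations
    at one biallelic locus (illustrative definition, an assumption)."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0  # locus is fixed for the same allele in both populations
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


def triangle_violations(freqs, tol=1e-12):
    """Return triples (a, b, c, direct, detour) where the direct distance
    d(a, c) exceeds the detour d(a, b) + d(b, c)."""
    violations = []
    for a, b, c in permutations(freqs, 3):
        if a > c:
            continue  # (a, b, c) and (c, b, a) describe the same check
        direct = fst_biallelic(freqs[a], freqs[c])
        detour = (fst_biallelic(freqs[a], freqs[b])
                  + fst_biallelic(freqs[b], freqs[c]))
        if direct > detour + tol:
            violations.append((a, b, c, direct, detour))
    return violations


# Hypothetical allele frequencies in three populations.
example = {"A": 0.0, "B": 0.5, "C": 1.0}
violations = triangle_violations(example)
for a, b, c, direct, detour in violations:
    print(f"FST({a},{c}) = {direct:.3f} > "
          f"FST({a},{b}) + FST({b},{c}) = {detour:.3f}")
```

With these frequencies, the direct distance between the two fixed populations is 1, while the path through the intermediate population sums to 2/3, so the triangle inequality fails. Analyses that treat FST matrices as metric distances should be read with this in mind.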

References

Alkan, C., Kavak, P., Somel, M., Gokcumen, O., Ugurlu, S., Saygi, C., Dal, E., Bugra, K., Güngör, T., Sahinalp, S. C., Özören, N., & Bekpen, C. (2014). Whole genome sequencing of Turkish genomes reveals functional private alleles and impact of genetic interactions with Europe, Asia and Africa. BMC Genomics, 15, 963. https://doi.org/10.1186/1471-2164-15-963
Allen, J. A. (1877). The influence of physical conditions in the genesis of species. Radical Review, 1, 108–140.
Arbisser, I. M., & Rosenberg, N. A. (2020). FST and the triangle inequality for biallelic markers. Theoretical Population Biology, 133, 117–129. https://doi.org/10.1016/j.tpb.2019.05.003
Berg, J. J., & Coop, G. (2014). A population genetic signal of polygenic adaptation. PLoS Genetics, 10, e1004412. https://doi.org/10.1371/journal.pgen.1004412
Bergmann, C. (1848). Über die Verhältnisse der Wärmeökonomie der Thiere zu ihrer Größe [On the proportions of heat economy of animals to their size]. Vandenhoeck & Ruprecht.
Bivand, R., & Wong, D. (2018). Comparing implementations of global and local indicators of spatial association. TEST, 27, 716–748. https://doi.org/10.1007/s11749-018-0599-x
Bjornstad, O. N. (2022). ncf: Spatial Covariance Functions. R package version 1.3-2, https://github.com/objornstad/ncf
Blanco-Sánchez, M., Ramírez-Valiente, J. A., Ramos-Muñoz, M., Pías, B., Franks, S. J., Escudero, A., & Matesanz, S. (2024). Range-wide intraspecific variation reflects past adaptation to climate in a gypsophile Mediterranean shrub. Journal of Ecology, 112, 1533–1549. https://doi.org/10.1111/1365-2745.14322
Blanquart, F., Kaltz, O., Nuismer, S. L., & Gandon, S. (2013). A practical guide to measuring local adaptation. Ecology Letters, 16, 1195–1205. https://doi.org/10.1111/ele.12150
Chali, D. (1995). Anthropometric measurements of the Nilotic tribes in a refugee camp. Ethiopian Medical Journal, 33, 211–217.
Chang, C. C., Chow, C. C., Tellier, L. C. A. M., Vattikuti, S., Purcell, S. M., & Lee, J. J. (2015). Second-generation PLINK: Rising to the challenge of larger and richer datasets. GigaScience, 4, 7. https://doi.org/10.1186/s13742-015-0047-8
Checkley, W., Buckley, G., Gilman, R. H., Assis, A. M., Guerrant, R. L., Morris, S. S., Mølbak, K., Valentiner-Branth, P., Lanata, C. F., & Black, R. E. (2008). Multi-country analysis of the effects of diarrhoea on childhood stunting. International Journal of Epidemiology, 37, 816–830. https://doi.org/10.1093/ije/dyn099
Corsini, C. A. (2009). Statura, salute e migrazioni: le leve militari italiane. Forum Edizioni.
Das, S., Forer, L., Schönherr, S., Sidore, C., Locke, A. E., Kwong, A., Vrieze, S. I., Chew, E. Y., Levy, S., McGue, M., Schlessinger, D., Stambolian, D., Loh, P.-R., Iacono, W. G., Swaroop, A., Scott, L. J., Cucca, F., Kronenberg, F., Boehnke, M., … Fuchsberger, C. (2016). Next-generation genotype imputation service and methods. Nature Genetics, 48, 1284–1287.
Deaton, A. (2007). Height, health, and development. Proceedings of the National Academy of Sciences of the United States of America, 104, 13232–13237. https://doi.org/10.1073/pnas.0611500104
Fuchsberger, C., Abecasis, G. R., & Hinds, D. A. (2014). minimac2: Faster genotype imputation. Bioinformatics, 31, 782–784. https://doi.org/10.1093/bioinformatics/btu704
Grasgruber, P., Sebera, M., Hrazdíra, E., Cacek, J., & Kalina, T. (2014). Major correlates of male height: A study of 105 countries. Economics & Human Biology, 15, 81–100. https://doi.org/10.1016/j.ehb.2014.07.002
Guo, J., Wu, Y., Zhu, Z., Zheng, Z., Trzaskowski, M., Zeng, J., Robinson, M. R., Visscher, P. M., & Yang, J. (2018). Global genetic differentiation of complex traits shaped by natural selection in humans. Nature Communications, 9, 1865. https://doi.org/10.1038/s41467-018-04191-y
Howe, L. J., Nivard, M. G., Morris, T. T., Hansen, A. F., Rasheed, H., Cho, Y., Chittoor, G., Ahlskog, R., Lind, P. A., Palviainen, T., van der Zee, M. D., Cheesman, R., Mangino, M., Wang, Y., Li, S., Klaric, L., Ratliff, S. M., Bielak, L. F., Nygaard, M., Giannelis, A., … Davies, N. M. (2022). Within-sibship genome-wide association analyses decrease bias in estimates of direct genetic effects. Nature Genetics, 54, 581–592. https://doi.org/10.1038/s41588-022-01062-7
Hsu, J. S., Wu, D. C., Shih, S. H., Liu, J. F., Tsai, Y. C., Lee, T. L., Chen, W. A., Tseng, Y. H., Lo, Y. C., Lin, H. Y., Chen, Y. C., Chen, J. Y., Chou, T. H., Chang, D. T., Su, M. W., Guo, W. H., Mao, H. H., Chen, C. Y., & Chen, P. L. (2023). Complete genomic profiles of 1496 Taiwanese reveal curated medical insights. Journal of Advanced Research. Advance online publication. https://doi.org/10.1016/j.jare.2023.12.018
Hubert, L. J., Golledge, R. G., & Costanzo, C. M. (1981). Generalized procedures for evaluating spatial autocorrelation. Geographical Analysis, 13, 224–233.
Hudson, R. R., Slatkin, M., & Maddison, W. P. (1992). Estimation of levels of gene flow from DNA sequence data. Genetics, 132, 583–589. https://doi.org/10.1093/genetics/132.2.583
Kaja, E., Lejman, A., Sielski, D., Sypniewski, M., Gambin, T., Dawidziuk, M., Suchocki, T., Golik, P., Wojtaszewska, M., Mroczek, M., Stępień, M., Szyda, J., Lisiak-Teodorczyk, K., Wolbach, F., Kołodziejska, D., Ferdyn, K., Dąbrowski, M., Woźna, A., Żytkiewicz, M., Bodora-Troińska, A., … Sztromwasser, P. (2022). The Thousand Polish Genomes – A Database of Polish Variant Allele Frequencies. International Journal of Molecular Sciences, 23, 4532. https://doi.org/10.3390/ijms23094532
Karczewski, K. J., Francioli, L. C., Tiao, G., Cummings, B. B., Alfoldi, J., Wang, Q., Collins, R. L., Laricchia, K. M., Ganna, A., Birnbaum, D. P., Gauthier, L. D., Brand, H., Solomonson, M., Watts, N. A., Rhodes, D., Singer-Berk, M., England, E. M., Seaby, E. G., Kosmicki, J. A., … MacArthur, D. G. (2020). The mutational constraint spectrum quantified from variation in 141,456 humans. Nature, 581, 434–443. https://doi.org/10.1038/s41586-020-2308-7
Latta, R. G. (1998). Differentiation of allelic frequencies at quantitative trait loci affecting locally adaptive traits. The American Naturalist, 151, 283–292.
Le Corre, V., & Kremer, A. (2003). Genetic variability at neutral markers, quantitative trait loci and trait in a subdivided population under selection. Genetics, 164, 1205–1219.
Le Corre, V., & Kremer, A. (2012). The genetic differentiation at quantitative trait loci under local adaptation. Molecular Ecology, 21, 1548–1566. https://doi.org/10.1111/j.1365-294X.2012.05479.x
Lee, J., Lee, J., Jeon, S., Lee, J., Jang, I., Yang, J. O., Park, S., Lee, B., Choi, J., Choi, B. O., Gee, H. Y., Oh, J., Jang, I. J., Lee, S., Baek, D., Koh, Y., Yoon, S. S., Kim, Y. J., Chae, J. H., Park, W. Y., … Choi, M. (2022). A database of 5305 healthy Korean individuals reveals genetic and clinical implications for an East Asian population. Experimental & Molecular Medicine, 54, 1862–1871. https://doi.org/10.1038/s12276-022-00871-4
Legendre, P., & Legendre, L. (1998). Numerical ecology (2nd ed.). Elsevier.
Leinonen, T., McCairns, R. J., O’Hara, R. B., & Merilä, J. (2013). Q(ST)-F(ST) comparisons: Evolutionary and ecological insights from genomic heterogeneity. Nature Reviews Genetics, 14, 179–190. https://doi.org/10.1038/nrg3395
Li, Z., Löytynoja, A., Fraimout, A., & Merilä, J. (2019). Effects of marker type and filtering criteria on QST-FST comparisons. Royal Society Open Science, 6, 190666. https://doi.org/10.1098/rsos.190666
Lu, G., Hu, Y., Yang, Z., Zhang, Y., Lu, S., Gong, S., Li, T., Shen, Y., Zhang, S., & Zhuang, H. (2022). Geographic latitude and human height – Statistical analysis and case studies from China. Arabian Journal of Geosciences, 15, 335. https://doi.org/10.1007/s12517-021-09335-x
Ma, X. F., Hall, D., Onge, K. R., Jansson, S., & Ingvarsson, P. K. (2010). Genetic differentiation, clinal variation and phenotypic associations with growth cessation across the Populus tremula photoperiodic pathway. Genetics, 186, 1033–1044. https://doi.org/10.1534/genetics.110.1208734
Mantel, N. (1967). The detection of disease clustering and a generalized regression approach. Cancer Research, 27, 209–220.
Martorell, R., & Zongrone, A. (2012). Intergenerational influences on child growth and undernutrition. Paediatric and Perinatal Epidemiology, 26, 302–314.
Merilä, J., & Crnokrak, P. (2001). Comparison of genetic differentiation at marker loci and quantitative traits: Natural selection and genetic differentiation. Journal of Evolutionary Biology, 14, 892–903. https://doi.org/10.1046/j.1420-9101.2001.00348.x
Mielke, P. W. (1979). On asymptotic non-normality of null distributions of MRPP statistics. Communications in Statistics—Theory and Methods, A8, 1541–1550.
Moran, P. A. P. (1950). Notes on continuous stochastic phenomena. Biometrika, 37, 17–23. https://doi.org/10.2307/2332142
Naslavsky, M. S., Scliar, M. O., Yamamoto, G. L., Wang, J. Y. T., Zverinova, S., Karp, T., Nunes, K., Ceroni, J. R. M., de Carvalho, D. L., da Silva Simões, C. E., Bozoklian, D., Nonaka, R., Dos Santos Brito Silva, N., da Silva Souza, A., de Souza Andrade, H., Passos, M. R. S., Castro, C. F. B., Mendes-Junior, C. T., Mercuri, R. L. V., … Zatz, M. (2022). Whole-genome sequencing of 1,171 elderly admixed individuals from Brazil. Nature Communications, 13, 1004. https://doi.org/10.1038/s41467-022-28648-3
NCD Risk Factor Collaboration (NCD-RisC). (2020). Height and body-mass index trajectories of school-aged children and adolescents from 1985 to 2019 in 200 countries and territories: A pooled analysis of 2181 population-based studies with 65 million participants. Lancet, 396, 1511–1524. https://doi.org/10.1016/S0140-6736(20)31859-6
Okbay, A., Wu, Y., Wang, N., Jayashankar, H., Bennett, M., Nehzati, S. M., Sidorenko, J., Kweon, H., Goldman, G., Gjorgjieva, T., Jiang, Y., Hicks, B., Tian, C., Hinds, D. A., Ahlskog, R., Magnusson, P. K. E., Oskarsson, S., Hayward, C., Campbell, A., Porteous, D. J., … Young, A. I. (2022). Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals. Nature Genetics, 54, 437–449. https://doi.org/10.1038/s41588-022-01016-z
1000 Genomes Project Consortium. (2015). A global reference for human genetic variation. Nature, 526, 68–74. https://doi.org/10.1038/nature15393
Piffer, D. (2013). Factor analysis of population allele frequencies as a simple, novel method of detecting signals of recent polygenic selection: The example of educational attainment and IQ. Mankind Quarterly, 54, 168–200. https://doi.org/10.46469/mq.2013.54.2.3
Piffer, D. (2015). A review of intelligence GWAS hits: Their relationship to country IQ and the issue of spatial autocorrelation. Intelligence, 53, 43–50. https://doi.org/10.1016/j.intell.2015.08.008
Piffer, D., & Kirkegaard, E. O. W. (2024). Evolutionary trends of polygenic scores in European populations from the Paleolithic to modern times. Twin Research and Human Genetics, 27, 30–49. https://doi.org/10.1017/thg.2024.8
Prüss-Üstün, A., Bos, R., Gore, F., & Bartram, J. (2014). Safer water, better health: Costs, benefits and sustainability of interventions to protect and promote health. World Health Organization.
Rentoft, M., Svensson, D., Sjödin, A., Olason, P. I., Sjöström, O., Nylander, C., Osterman, P., Sjögren, R., Netotea, S., Wibom, C., Cederquist, K., Chabes, A., Trygg, J., Melin, B. S., & Johansson, E. (2019). A geographically matched control population efficiently limits the number of candidate disease-causing variants in an unbiased whole-genome analysis. PLoS One, 14, e0213350. https://doi.org/10.1371/journal.pone.0213350
Roser, M., Ritchie, H., & Rosado, P. (2013). Food supply. https://ourworldindata.org/food-supply
Silventoinen, K., Sammalisto, S., Perola, M., Boomsma, D. I., Cornes, B. K., Davis, C., Dunkel, L., De Lange, M., Harris, J. R., Hjelmborg, J. V., Luciano, M., Martin, N. G., Mortensen, J., Nisticò, L., Pedersen, N. L., Skytthe, A., Spector, T. D., Stazi, M. A., Willemsen, G., & Kaprio, J. (2003). Heritability of adult body height: A comparative study of twin cohorts in eight countries. Twin Research, 6, 399–408. https://doi.org/10.1375/twin.6.5.399
Sokal, R. R. (1979). Testing statistical significance of geographic variation patterns. Systematic Zoology, 28, 227–231.
Sokal, R. R., Jacquez, G. M., & Wooten, M. C. (1989). Spatial autocorrelation analysis of migration and selection. Genetics, 121, 845–855. https://doi.org/10.1093/genetics/121.4.845
Spitze, K. (1993). Population structure in Daphnia obtusa: Quantitative genetic and allozymic variation. Genetics, 135, 367–374. https://doi.org/10.1093/genetics/135.2.367
Stulp, G., Barrett, L., Tropf, F. C., & Mills, M. (2015). Does natural selection favour taller stature among the tallest people on earth? Proceedings of the Royal Society B: Biological Sciences, 282, 20150211. https://doi.org/10.1098/rspb.2015.0211
Stulp, G., Bonnell, T., & Barrett, L. (2023). Simulating the evolution of height in the Netherlands in recent history. The History of the Family, 28, 434–456. https://doi.org/10.1080/1081602X.2023.2192193
Turchin, M. C., Chiang, C. W., Palmer, C. D., Sankararaman, S., Reich, D., Genetic Investigation of ANthropometric Traits (GIANT) Consortium, & Hirschhorn, J. N. (2012). Evidence of widespread selection on standing variation in Europe at height-associated SNPs. Nature Genetics, 44, 1015–1019. https://doi.org/10.1038/ng.2368
United Nations Development Programme. (2024). Human Development Report 2023-24: Breaking the gridlock: Reimagining cooperation in a polarized world.
Victora, C. G., Adair, L., Fall, C., Hallal, P. C., Martorell, R., Richter, L., & Sachdev, H. S. (2008). Maternal and child undernutrition: Consequences for adult health and human capital. The Lancet, 371, 340–357. https://doi.org/10.1016/S0140-6736(07)61692-4
Whitlock, M. C. (2008). Evolutionary inference from QST. Molecular Ecology, 17, 1885–1896. https://doi.org/10.1111/j.1365-294X.2008.03712.x
Whitlock, M. C., & Guillaume, F. (2009). Testing for spatially divergent selection: comparing QST to FST. Genetics, 183, 1055–1063. https://doi.org/10.1534/genetics.108.099812
Yengo, L., Vedantam, S., Marouli, E., Sidorenko, J., Bartell, E., Sakaue, S., Graff, M., Eliasen, A. U., Jiang, Y., Raghavan, S., Miao, J., Arias, J. D., Graham, S. E., Mukamel, R. E., Spracklen, C. N., Yin, X., Chen, S. H., Ferreira, T., Highland, H. H., & Hirschhorn, J. N. (2022). A saturated map of common genetic variants associated with human height. Nature, 610, 704–712. https://doi.org/10.1038/s41586-022-05275-y
Yoo, S. K., Kim, C. U., Kim, H. L., Kim, S., Shin, J. Y., Kim, N., Yang, J. S. W., Lo, K. W., Cho, B., Matsuda, F., Schuster, S. C., Kim, C., Kim, J. I., & Seo, J. S. (2019). NARD: whole-genome reference panel of 1779 Northeast Asians improves imputation accuracy of rare and low-frequency variants. Genome Medicine, 11, 64. https://doi.org/10.1186/s13073-019-0677-z
Ziyatdinov, A., Torres, J., Alegre-Díaz, J., Backman, J., Mbatchou, J., Turner, M., Gaynor, S. M., Joseph, T., Zou, Y., Liu, D., Wade, R., Staples, J., Panea, R., Popov, A., Bai, X., Balasubramanian, S., Habegger, L., Lanche, R., Lopez, A., … Tapia-Conyer, R. (2023). Genotyping, sequencing and analysis of 140,000 adults from Mexico City. Nature, 622, 784–793. https://doi.org/10.1038/s41586-023-06595-3
Table 1. WGS dataset

Table 2. WGS + array dataset

Figure 1a. Whole genome sequencing (WGS) dataset: correlation matrix of polygenic score means and phenotypic means. Note: *p < .05, **p < .01, ***p < .001.

Figure 1b. Whole genome sequencing (WGS) + array dataset: correlation matrix of polygenic score means and phenotypic means. Note: *p < .05, **p < .01, ***p < .001.

Figure 2a. Whole genome sequencing (WGS) dataset with scatterplots of MIX-Height, SIB-Height and average measured body height.

Figure 2b. Whole genome sequencing (WGS) and array dataset with scatterplots of MIX-Height, SIB-Height and average measured body height.

Table 3. Regression models

Table 4. Regression models (only high infant mortality countries)

Table 5. Regression models (with interaction term)

Table 6. Regression of body height polygenic scores on latitude and superpopulation

Table 7. Regression of phenotypic distances on PGS and FST distances

Figure 3. Moran’s I for average body height, MIX-Height and SIB-Height for different values of K.

Table 8. Spatial autoregressive and spatial error models with different FST thresholds

Table 9. Spatial autoregressive and error models with HDI and PGS as predictors

Table 10. Results of QST-FST test

Figure 4. Distribution of random QST (rQST) versus QST (dashed line).

Figure 5. Local (pairwise) QST test for MIX-Height.

Figure 6. FSTQ versus FST (randomly matched single nucleotide polymorphisms).