
Human genomic data have different statistical properties than the data of randomised controlled trials

Published online by Cambridge University Press:  11 September 2023

Mirjam J. Borger, Franz J. Weissing and Eva Boon
Affiliation:
Groningen Institute for Evolutionary Life Sciences, University of Groningen, Groningen, The Netherlands m.j.borger@rug.nl; f.j.weissing@rug.nl; e.boon@rug.nl https://www.marmgroup.eu/

Abstract

Madole & Harden argue that the Mendelian reshuffling of genes and genomes is analogous to randomised controlled trials. We are not convinced by their arguments. First, their recipe for meeting the demands on randomised experiments is inherently inconsistent. Second, disequilibrium across chromosomes conflicts with their assumption of statistical independence. Third, the genome-wide association study (GWAS) method has many pitfalls, including low repeatability.

Type
Open Peer Commentary
Copyright
Copyright © The Author(s), 2023. Published by Cambridge University Press

Madole & Harden (M&H) attempt to unravel the role of heredity in human behaviour by arguing that the methods of causal analysis can be applied to behavioural genetic data, thus establishing causal links between genes and behaviour. Their key argument is that “within-family genetic effects represent the product of a counterfactual comparison in the same way as average treatment effects from randomised controlled trials” (target article, abstract). Based on this argument, they “advance a framework for identifying, interpreting, and applying causal effects of genes on human behavior” (target article, abstract). While we agree with the authors that human behaviour genetics needs a sound foundation, we see at least three reasons why their proposed framework is not suitable for providing such a foundation.

The first reason is the inherent inconsistency of the proposed approach. M&H discuss whether and when behavioural genetic experiments meet four critical demands on randomised experiments. They argue that the first three demands (independence, sample homogeneity, potential exposability) can be met if the analysis is based on sibling studies, where siblings grow up in a common environment. In contrast, the fourth demand, SUTVA (stable unit treatment value assumption), requires that the siblings do not affect each other's behaviour, that is, that they grow up in different environments. The fourth demand (growing up in different environments) therefore contradicts the first three demands (growing up in a common environment), so that at least one of the demands will be violated in any genetic data set. This undermines M&H's argument that within-family genetic effects are comparable to the outcome of randomised controlled trials.

The second reason is an unfounded extrapolation from single-gene to genome-wide causation. The key argument in the target article is that Mendelian inheritance has properties similar to those of the randomisation procedure of controlled trials. Mendel's rules, however, apply to single genes or unlinked pairs of genes, while M&H are mainly interested in the causal analysis of genome-wide association studies (GWASs), where thousands of single-nucleotide polymorphisms (SNPs) are considered simultaneously.

M&H are aware of this problem, in which the physical linkage of genes on a chromosome results in the co-inheritance of alleles at linked loci and subsequent correlations across loci. Consequently, they propose to focus on “a set of alleles that are all in high linkage disequilibrium with each other (but not in linkage disequilibrium with other alleles)” (target article, sect. 3.2.1, para. 6). In this approach, it is crucial to identify such sets of alleles. If the physical linkage of gene loci were the sole (or most important) cause of linkage disequilibrium, the proposed method might be feasible, as the SNPs used in GWASs provide a physical linkage map that allows closely linked chromosomal regions to be identified. However, the term “linkage disequilibrium” is misleading. It suggests that “disequilibrium” (that is, statistical association across loci) is mainly caused by physical linkage. Yet, factors like natural and sexual selection, non-random mating, genetic drift, or gene flow can create considerable disequilibrium at unlinked loci, such as loci on different chromosomes (Hedrick, 2005). Alleles at different loci can, for example, become associated through selection if they produce a high-fitness genotype in combination (but not on their own).
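The build-up of disequilibrium between unlinked loci through selection on allele combinations can be illustrated with a minimal sketch. The deterministic two-locus haploid model below is an illustrative assumption and not an analysis taken from the target article or from the cited literature; the function names, the fitness values, and the number of generations are hypothetical. Disequilibrium is measured as D = x_AB * x_ab - x_Ab * x_aB, and the recombination rate r = 0.5 represents free recombination, as between loci on different chromosomes.

```python
def linkage_disequilibrium(x):
    """Return D = x_AB * x_ab - x_Ab * x_aB for haplotype
    frequencies given in the order (AB, Ab, aB, ab)."""
    x_AB, x_Ab, x_aB, x_ab = x
    return x_AB * x_ab - x_Ab * x_aB


def next_generation(x, fitness, r=0.5):
    """One generation of haploid selection followed by recombination.

    r = 0.5 corresponds to unlinked loci (assumption of this sketch).
    """
    # Selection: weight each haplotype by its relative fitness.
    mean_w = sum(f * w for f, w in zip(x, fitness))
    s = [f * w / mean_w for f, w in zip(x, fitness)]
    # Recombination shifts r * D from coupling (AB, ab) to repulsion
    # (Ab, aB) haplotypes, so D shrinks by the factor (1 - r).
    d = linkage_disequilibrium(s)
    return [s[0] - r * d, s[1] + r * d, s[2] + r * d, s[3] - r * d]


def simulate(fitness, generations=5, r=0.5):
    """Start in linkage equilibrium (all haplotypes at 0.25) and
    return D after the given number of generations."""
    x = [0.25, 0.25, 0.25, 0.25]
    for _ in range(generations):
        x = next_generation(x, fitness, r)
    return linkage_disequilibrium(x)


# Hypothetical fitness values in the order (AB, Ab, aB, ab).
epistatic = [1.2, 1.0, 1.0, 1.0]        # advantage only in combination
multiplicative = [1.21, 1.1, 1.1, 1.0]  # independent effects, no epistasis

print("D with epistatic selection:     ", simulate(epistatic))
print("D with multiplicative selection:", simulate(multiplicative))
```

Under these assumptions, the epistatic fitness scheme maintains a positive D despite free recombination, whereas the multiplicative scheme leaves D at zero up to rounding error. The sketch covers only selection; drift, non-random mating, and gene flow are not modelled.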

Theoretical considerations suggest that such “epistatic effects” (statistical interactions between genotypes at two or more loci) are common. For example, the evolution of female preferences in sexual selection largely relies on the build-up of disequilibrium between sender and receiver genes (Kuijper, Pen, & Weissing, 2012). Regulatory networks (such as gene-regulatory networks, metabolic networks, or the immune network) are another important class of examples, as a large percentage of human genes are involved in such networks (Chatterjee & Ahituv, 2017). Genes underlying a regulatory network are functionally linked (through selection on the operation of the network) in intricate and unpredictable ways (Van Gestel & Weissing, 2016, 2018), and their epistatic interaction will likely result in linkage disequilibrium (even in the absence of physical linkage).

Controlled crossing experiments in animals indeed confirm ample disequilibrium caused by epistatic effects (Flint & Mackay, 2009; Mackay, 2014). Such experiments cannot be conducted on humans, but epistasis is likely to be common in our species too. The problem is that epistasis, and its associated disequilibrium, tends to remain hidden in GWASs (Mackay, 2014). This implies that a major source of statistical dependence remains hidden to the researcher, making it almost impossible to correct for linkage disequilibrium in the way suggested by M&H.
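Why epistasis can remain hidden in a locus-by-locus analysis can be made concrete with a deliberately extreme toy example. The sketch below is an assumption made for illustration only: two binary loci whose four genotype combinations are equally frequent, and a hypothetical phenotype that depends purely on the interaction (it equals 1 when exactly one of the two alleles is present). It is not a model from the cited studies, and real epistasis is rarely this clean.

```python
from itertools import product
from statistics import mean

# All four two-locus genotypes (0 = allele absent, 1 = allele present),
# assumed to be equally frequent in this toy population.
GENOTYPES = list(product((0, 1), repeat=2))


def phenotype(g1, g2):
    """Hypothetical purely epistatic phenotype: 1 if exactly one of the
    two alleles is present, 0 otherwise (exclusive or)."""
    return g1 ^ g2


def marginal_effect(locus):
    """Difference in mean phenotype between carriers and non-carriers
    at one locus, ignoring the other locus (a single-locus view)."""
    carriers = [phenotype(*g) for g in GENOTYPES if g[locus] == 1]
    non_carriers = [phenotype(*g) for g in GENOTYPES if g[locus] == 0]
    return mean(carriers) - mean(non_carriers)


def interaction_contrast():
    """Effect of locus 2 in carriers at locus 1 minus its effect in
    non-carriers at locus 1."""
    return (phenotype(1, 1) - phenotype(1, 0)) - (phenotype(0, 1) - phenotype(0, 0))


print("marginal effect, locus 1:", marginal_effect(0))
print("marginal effect, locus 2:", marginal_effect(1))
print("interaction contrast:    ", interaction_contrast())
```

In this toy case both single-locus effects are exactly zero, although the phenotype is fully determined by the two loci jointly; an analysis that tests one SNP at a time would therefore detect neither locus.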

The third reason is based on previously documented pitfalls of the GWAS method. M&H have high expectations regarding the GWAS method, while this method is heavily criticised in other branches of genetics because of its low repeatability and its tendency to produce false positives (e.g., Marjoram, Zubair, & Nuzhdin, 2014; Zhou et al., 2020; Zuk, Hechter, Sunyaev, & Lander, 2012). Low repeatability is a major problem, as it indicates either that these studies have a limited ability to generalise (i.e., big differences between study populations in how genes cause behaviour) or that most results are actually artefacts of the model (false positives). In the GWAS method it is possible to set the sensitivity of models. Yet, this involves a complicated trade-off, especially when the method is used to find many genes with weak effects. When the sensitivity is low, only genes with strong effects can be found, which might result in a bias, as other, possibly important genes (with weaker effects) cannot be found. Conversely, setting the sensitivity high will result in many false positives, which might also lead to wrong conclusions. Even if the sensitivity is kept constant between studies, low repeatability is found. To increase the repeatability of studies, statistical corrections can be added. However, these corrections are generally limited in their success, as artefacts can still appear (e.g., Mills & Mathieson, 2022).
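The sensitivity trade-off can be sketched numerically. The calculation below rests on several assumptions that are not part of the commentary: sensitivity is equated with the per-SNP significance threshold alpha, each SNP is tested with an independent two-sided z-test, and the number of SNPs, the number of causal SNPs, and the standardised effect size (`ncp`) are hypothetical values chosen for illustration. It uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def power_two_sided(alpha, ncp):
    """Power of a two-sided z-test at level alpha when the test
    statistic has mean ncp and unit variance."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return (1 - STD_NORMAL.cdf(z_crit - ncp)) + STD_NORMAL.cdf(-z_crit - ncp)


def expected_hits(alpha, n_snps=1_000_000, n_causal=100, ncp=3.0):
    """Expected numbers of true and false positives, assuming
    independent tests (all default values are hypothetical)."""
    true_pos = n_causal * power_two_sided(alpha, ncp)
    false_pos = (n_snps - n_causal) * alpha
    return true_pos, false_pos


# A strict and a lenient threshold, both chosen for illustration.
for alpha in (5e-8, 1e-3):
    tp, fp = expected_hits(alpha)
    print(f"alpha={alpha:g}: expected true positives={tp:.1f}, "
          f"expected false positives={fp:.1f}")
```

With these illustrative numbers, the strict threshold is expected to recover almost none of the weak-effect SNPs, while the lenient threshold recovers more of them but at the cost of false positives that far outnumber the true ones. Real GWAS data involve correlated tests and heterogeneous effect sizes, which this sketch ignores.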

In conclusion, we argue that the causal framework proposed by M&H is not suited to understanding the effects of genes on behaviour. While we agree with the authors that human behaviour genetics needs a sound causal foundation, establishing one remains a formidable challenge.

Financial support

MJB is supported by ALW-NWO Grant No. ALWOP.531.

Competing interest

None.

References

Chatterjee, S., & Ahituv, N. (2017). Gene regulatory elements, major drivers of human disease. Annual Review of Genomics and Human Genetics, 18, 45–63.
Flint, J., & Mackay, T. F. C. (2009). Genetic architecture of quantitative traits in mice, flies, and humans. Genome Research, 19, 723–733. doi: 10.1101/gr.086660.108
Hedrick, P. W. (2005). Genetics of populations (3rd ed.). Jones & Bartlett.
Kuijper, B., Pen, I., & Weissing, F. J. (2012). A guide to sexual selection theory. Annual Review of Ecology, Evolution, and Systematics, 43, 287–311.
Mackay, T. F. C. (2014). Epistasis and quantitative traits: Using model organisms to study gene–gene interactions. Nature Reviews Genetics, 15, 22–33. https://doi.org/10.1038/nrg3627
Marjoram, P., Zubair, A., & Nuzhdin, S. V. (2014). Post-GWAS: Where next? More samples, more SNPs or more biology? Heredity, 112, 79–88.
Mills, M. C., & Mathieson, I. (2022). The challenge of detecting recent natural selection in human populations. Proceedings of the National Academy of Sciences, USA, 119, e2203237119.
Van Gestel, J., & Weissing, F. J. (2016). Regulatory mechanisms link phenotypic plasticity to evolvability. Scientific Reports, 6, 24524. doi: 10.1038/srep2452
Van Gestel, J., & Weissing, F. J. (2018). Is plasticity caused by single genes? Nature, 555, E20. doi: 10.1038/nature25495
Zhou, X. S., Pierre, C. L., Gonzales, N. M., Zou, J., Cheng, R., Chitre, A. S., … Palmer, A. A. (2020). Genome-wide association study in two cohorts from a multi-generational mouse advanced intercross line highlights the difficulty of replication due to study-specific heterogeneity. Genes, Genomes, Genetics, 10, 951–965.
Zuk, O., Hechter, E., Sunyaev, S. R., & Lander, E. S. (2012). The mystery of missing heritability: Genetic interactions create phantom heritability. Proceedings of the National Academy of Sciences, USA, 109, 1193–1198.