Behavioural genetic methods are gaining traction in various areas of social science. It has been argued that some of these methods, particularly genome-wide association studies (GWAS) and polygenic scores based on them, will generate new causal insights into individual outcomes such as educational attainment (e.g., Freese, 2018; Harden, 2021; Liu & Guo, 2016). Madole & Harden (M&H) offer a theoretical framework for these arguments by sketching a two-tiered view of causal investigation in behavioural genetics that mirrors the commonly used distinction between difference-making and mechanistic types of inquiry (see Tabery [2014] for a philosophical discussion of this distinction in the context of behavioural genetics). In what follows, I will not evaluate the arguments concerning the first tier (including the proposed analogy between a within-family study of polygenic scores and a randomised controlled trial) but will focus on the second tier, which the authors term “second-generation causal knowledge” and which may also be called mechanistic knowledge or mechanistic understanding.
Let us assume one has successfully identified some single-nucleotide polymorphisms (SNPs) linked with causal genetic variants. In the authors' view, once that is done, a goldmine of second-generation causal discoveries (by which they mean discoveries of higher-level causal mechanisms) awaits the researcher who is keen to explore the processes through which genetic effects manifest. The mechanisms' supposed components include both phenotypic traits and environmental factors that contribute to individual outcomes and could serve as intervention targets. The identification of causal mechanisms is sometimes seen as the only strong rationale for genetically informed inquiry in the policy-oriented social sciences. According to Cesarini and Visscher (2017), “it is only to the extent that genetic information makes it possible to tailor more effective interventions that genetic data may be a useful supplement to systems already in place” (p. 3). At the same time, mechanism elucidation is not necessarily guaranteed or even significantly facilitated by “first-generation” genetic findings, contrary to what M&H seem to suggest. Establishing a causal link between a phenotypic or environmental variable and an outcome presents a significant challenge of its own, requiring a mix of observational and experimental evidence generated by a variety of methods. Top-down approaches such as “phenotypic annotation” (Belsky & Harden, 2019) that are mentioned in the paper can help identify networks of phenotypic and environmental correlates, but disentangling the causal relations within those networks – and pinpointing suitable intervention targets – is a task that goes beyond a simple mapping of associations.
Some techniques for causal inference about phenotypic mediators based on observational genetic data have been described elsewhere (e.g., Briley, Livengood, & Derringer, 2018; Pingault et al., 2018) but face several significant challenges, including widespread pleiotropy. M&H briefly describe “integrating genomic data into longitudinal, experimental research designs” (target article, sect. 3.4, para. 7) as a way of meeting the demands of causal inference at this stage of investigation but do not articulate a well-developed research programme with a clear added value for genetic methods.
That brings us to another question – why should one use genetic information as a starting point to elucidate causally relevant phenotypic and environmental factors for a particular outcome? Genetic data are not the only possible handle on the kind of phenotypic characteristics and processes scientists are interested in (especially given the adage that all traits result from a combination of genetic and environmental factors). Interestingly enough, “first-generation” knowledge about environmental causes can likewise be used (and has been used) as a springboard for mechanistic inquiry, even probing into some of the same causal intermediaries as genetically informed research, including cognitive or psychological characteristics of individuals. For instance, it is known that socioeconomic status explains a large proportion of the variance in educational attainment in different societies (see e.g., Eriksson, Lindvall, Helenius, & Ryve, 2021). Scientists have linked the effects of socioeconomic status on educational attainment with differences in phenotypes such as executive function (Hackman, Gallop, Evans, & Farah, 2015). This demonstrates that once we arrive at a solid understanding that a particular environmental factor is an important difference maker, it can then be used to investigate possible causal pathways; genetic knowledge is not unique in this sense.
In this context, one worry an advocate of genetic methods might have is that the observed effects of environmental factors such as socioeconomic background could also reflect genetic differences between individuals and therefore fall short of the standards for causal inference. However, it is possible to control for these differences in order to arrive at more accurate estimates of environmental influence. For instance, Kendler, Turkheimer, Ohlsson, Sundquist, and Sundquist (2015) have shown in an adoption study of siblings that being adopted into a family with a higher socioeconomic status generated significant advantages in terms of measured IQ after controlling for genetic factors. This suggests a more limited albeit important role for genetic tools, including polygenic scores: as controls in the study of environmental variables. Even though this application is not without its pitfalls (see Akimova, Breen, Brazel, & Mills [2021] on the potential for introducing bias), when applied with care, genetic controls may help address some of the worries that the “first-generation” knowledge of environmental factors does not meet a stringent epistemic standard.
In summary, “second-generation” goals of causal inquiry in the context of human behaviour cannot be achieved by genetic methods alone, nor do genetically informed research designs provide the only possible path towards a mechanistic understanding. Therefore, it would be desirable to clearly situate these designs within the wider disciplinary and methodological terrain, indicating how they relate to the other known ways of generating the epistemic goods that are being sought.
Financial support
This work was supported by funding from Cambridge Commonwealth, European & International Trust and St. John's College, University of Cambridge.
Competing interest
None.