Introduction
The reduction in costs of genotyping technologies in recent years has led to an explosion of genetic and phenotypic information collected on large numbers of people. The primary aim of these studies is to identify genetic polymorphisms associated with a quantitative trait or with an increased risk of disease. These genome-wide association studies (GWAS) have led to an important increase in understanding of the underpinnings of psychiatric and other disorders (Sullivan et al. Reference Sullivan, Daly and O'Donovan2012; Visscher et al. Reference Visscher, Brown, McCarthy and Yang2012; Gratten, Reference Gratten2016). They have provided empirical data demonstrating that common disorders are polygenic and have allowed interrogation of the genetic architecture, in terms of number, frequency, effect size and interactions of genetic risk factors. However, the potential use of GWAS data goes far beyond identification of trait associated loci.
Here, we present an overview of some recently developed methods utilising genome-wide genotype and phenotype data on large numbers of individuals and show how they can be applied to the research of psychiatric disorders. These new methods serve at least one of the following purposes: (i) estimation of the extent of genetic influence on traits (Estimation of proportion of variance attributable to genome-wide single-nucleotide polymorphisms (SNPs), SNP-heritability or h 2SNP), (ii) uncovering of shared genetic control between traits (Estimation of genetic correlation from using genome-wide SNPs), (iii) predictions of genetic risk for individuals (Polygenic risk prediction), (iv) uncovering of causal relationships between traits [Mendelian randomisation (MR)], (v) identifying causal SNPs and genes (Fine-mapping and gene prioritisation), or (vi) Detection of genetic heterogeneity. There is often an overlap between these applications: some methods can be applied for more than one purpose, and some of the available software implements more than one method. In this overview, it is not possible to be exhaustive, and we apologise to authors whose methods have not been included.
Most of the methods presented here require genetic data in either of two formats. One is that of full individual level genotype data and phenotypic measurements on each person, where the genetic data can be represented as a matrix with allele counts for each genetic marker for each person. While this offers the largest range of analytic options, file sizes can be very large, which can become prohibitive as computational burden is usually non-linear with increasing numbers of individuals and markers. Moreover, privacy concerns can prevent this type of data from being shared across research groups. Summary statistics of genome-wide association analysis represent the second data format, for which data sharing has fewer privacy concerns (Pasaniuc & Price, Reference Pasaniuc and Price2016). GWAS summary statistics comprise the association test statistic (including direction of effect for a reference allele), standard error, p value of association and allele frequency of each SNP. While it has been shown that it is possible to infer whether an individual was part of a cohort using summary statistics, the power to do so is limited (Homer et al. Reference Homer, Szelinger, Redman, Duggan, Tembe and Muehling2008; Sankararaman et al. Reference Sankararaman, Obozinski, Jordan and Halperin2009; Visscher & Hill, Reference Visscher and Hill2009), and in any case requires the genome-wide genotype data of the individual to be identified. To guard against any privacy concerns GWAS summary statistics can be provided using allele frequencies estimated in large independent samples of the same ethnicity. Methods which require only summary statistic data benefit from shorter analytical run-times, much reduced computer memory requirements, and applicability to a larger number of traits. These benefits usually come at the cost of lower precision (larger standard errors), when compared with methods that utilise individual level genotype data.
While genetic data in one of these two formats are required by all methods presented here, some methods additionally make use of other information, such as genomic annotation or expression quantitative trait locus (eQTL) data. Genomic annotation can give clues about the functional importance of a region in which a SNP resides, whereas eQTL data are the result of an association test where the phenotype analysed is expression of a gene.
Here we review a range of different polygenic methods and highlight their aims and the input data they require (Fig. 1). For each method, we provide some examples of applications relevant to psychiatric genetics research (Table 1).
Estimation of proportion of variance attributable to genome-wide SNPs
Heritability is the proportion of phenotypic variance that can be attributed to genetic factors. It is a key quantity in genetics research as it summarises the role of causal inherited variation. Trait heritability can be estimated by comparing the phenotypic resemblance among family members to their coefficients of relationships. However, estimating heritability from relatives can result in upwards-biased estimates (Vinkhuyzen et al. Reference Vinkhuyzen, Wray, Yang, Goddard and Visscher2013) if non-genetic factors shared by relatives cannot be disentangled from the shared genetic relationships. This can be circumvented by estimating heritability from genome-wide markers in unrelated (that is, very distantly related) individuals. Genomic-restricted maximum-likelihood analysis (GREML) can be used to estimate the proportion of phenotypic variance, which is captured by genotyped SNPs (SNP-heritability), by using genetic data of unrelated individuals (Yang et al. Reference Yang, Benyamin, McEvoy, Gordon, Henders and Nyholt2010). SNP-heritability estimates are typically lower than those from twin or family-based heritability estimates, because genotyped SNPs account only for a subset of all genetic effects (the remainder includes other types of polymorphisms and SNPs that are not tagged by genotyped SNPs). Hence, the parameter estimated by SNP-heritability analysis depends on the genotype data available in the data set analysed, and can only converge to the traditional parameter being estimated from family data, when the genotypes available are fully representative of the variation in the genome.
Estimation of SNP-heritability has been of particular importance for disease traits, especially those of low lifetime risk (<1% is typical of most common diseases) for which it is difficult to collect the large samples needed to calculate heritability from estimates of increased risk in relatives of those affected. Both traditional-heritability and SNP-heritability estimates are presented on the liability scale (and depend on lifetime risk of disease in the population), and empirical data of the GWAS era (Lee et al. Reference Lee, DeCandia, Ripke, Yang, Sullivan and Goddard2012a; Stahl et al. Reference Stahl, Wegmann, Trynka, Gutierrez-Achury, Do and Voight2012) demonstrates that the polygenic model implied in these estimates is justified. GWAS case–control samples for disease traits are usually heavily oversampled for cases compared with a population sample and so SNP-heritability estimates are made on this binary case–control scale and transformed to the liability scale accounting for this ascertainment (Lee et al. Reference Lee, Wray, Goddard and Visscher2011). The SNP-heritability estimates are relatively robust to choice of lifetime risk for most common diseases (lifetime risk <1%). However, for the very common diseases such as major depressive disorder (MDD) (lifetime risk 15%), SNP-heritability estimates are more sensitive to the choice of lifetime risk estimate, and to screening v. non-screening of controls (Peyrot et al. Reference Peyrot, Boomsma, Penninx and Wray2016).
The GREML method for estimation of SNP-heritability is based on a linear mixed model (Meuwissen et al. Reference Meuwissen, Hayes and Goddard2001), where a central component is the genetic relationship (or similarity) matrix, which captures the genetic relatedness of all pairs of individuals (Hayes et al. Reference Hayes, Visscher and Goddard2009). For application to human data, the algorithm is implemented in GCTA (Yang et al. Reference Yang, Lee, Goddard and Visscher2011a). Several independent studies have confirmed that GREML results in unbiased estimates of SNP heritability when the model assumptions are met, or have investigated how departures from these assumptions can influence the results (Speed et al. Reference Speed, Hemani, Johnson and Balding2012; Zhou & Stephens, Reference Zhou and Stephens2014). Potential biases that might influence heritability estimation are an association between minor allele frequency (MAF) and SNP effect size, which does not fit the assumptions of the model, or an overrepresentation of causal SNPs in regions of high or low linkage disequilibrium (LD) (Lee et al. Reference Lee, Wray, Goddard and Visscher2011; Speed et al. Reference Speed, Hemani, Johnson and Balding2012), both of which can be overcome in GCTA through the use of the GREML–LDMS approach (Yang et al. Reference Yang, Bakshi, Zhu, Hemani, Vinkhuyzen and Lee2015a). The runtime of GCTA–GREML is a function of both the number of markers, M, and the number of individuals, N. Compute time is O(N 3 + MN 2), which includes a component for the construction of the genetic relationship matrix and a component for the actual REML algorithm. The steep increase in runtime with larger N makes it impractical for data sets with very large numbers of individuals. The software BOLT-LMM can estimate variance components through a stochastic approximation algorithm, which circumvents the costly calculation of a genetic relationship matrix and thus reduces the runtime to only O(MN 1.5) (Loh et al. Reference Loh, Tucker, Bulik-Sullivan, Vilhjálmsson, Finucane and Salem2015). Note that both GCTA and BOLT-LMM have other features such as linear mixed model association analysis (Yang et al. Reference Yang, Zaitlen, Goddard, Visscher and Price2014) and genotype × environment (G × E) analysis, which are not the topic of this review.
LD score regression (LDSC) (Bulik-Sullivan et al. Reference Bulik-Sullivan, Loh, Finucane, Ripke, Yang and Patterson2015b) is a method, which requires only summary statistics to estimate SNP-heritability and has therefore shorter runtime than the methods discussed so far. Under polygenic genetic architecture, SNPs which are highly correlated with many other SNPs (have a high LD score) are more likely to tag a causal SNP and are therefore expected, on average, to have a higher association test statistic than SNPs which are not highly correlated with many other SNPs. The regression coefficient of the association test statistics of all SNPs on their LD score is a function of SNP-heritability (Yang et al. Reference Yang, Weedon, Purcell, Lettre, Estrada and Willer2011b; Bulik-Sullivan et al. Reference Bulik-Sullivan, Loh, Finucane, Ripke, Yang and Patterson2015b). Any factor that increases the association statistic of a SNP independently of its LD score (as might be found in population stratification, which induces correlations in test statistics across chromosomes) will increase the intercept term of this regression (Bulik-Sullivan et al. Reference Bulik-Sullivan, Loh, Finucane, Ripke, Yang and Patterson2015b). LDSC estimates SNP-heritability with vastly reduced computational speed compared with GREML, but with higher standard error (s.e.) of estimates. The s.e. of GREML SNP-heritability estimates is accurately approximated as 316/N, where N is the total sample size (Visscher et al. Reference Visscher, Hemani, Vinkhuyzen, Chen, Lee and Wray2014). For LDSC the standard errors of the variance component estimates are typically larger (usually by 50% or more) than those of a GREML analysis for the same sample size (Yang et al. Reference Yang, Bakshi, Zhu, Hemani, Vinkhuyzen and Nolte2015b). However, it is typical that LDSC can be applied to larger data sets (which generate smaller s.e.) since only summary statistics are needed. Comparisons of estimates from GREML and LDSC show that the accuracy of estimates from LDSC are dependent on LD scores calculated from a population representative of the population used to estimate GWAS summary statistics (Yang et al. Reference Yang, Bakshi, Zhu, Hemani, Vinkhuyzen and Lee2015a; Brown et al. Reference Brown, Ye, Price and Zaitlen2016).
GREML, LDSC and similar methods can also be used to estimate the SNP-heritability based on genomic partitioning, for example by chromosome, minor allele frequency, genomic annotations or single loci (Shi et al. Reference Shi, Kichaev and Pasaniuc2016). Analyses partitioning SNP-heritability by cell type-specific genomic annotations have shown enrichment of GWAS discoveries in trait relevant cells (Finucane et al. Reference Finucane, Bulik-Sullivan, Gusev, Trynka, Reshef and Loh2015).
Examples of applications to psychiatric disorders
GREML SNP-heritability estimated for psychiatric disorders usually ranges from 15% to 30%, depending on the disease, and reported estimates are at most half of the estimates derived from family studies (Lee et al. Reference Lee, DeCandia, Ripke, Yang, Sullivan and Goddard2012a, Reference Lee, Ripke, Neale, Faraone, Purcell and Perlis2013; Anttila et al. Reference Anttila, Finucane, Bras, Duncan, Falcone and Gormley2016). SNP-heritability estimates of quantitative mental traits tend to be relatively low (<0.15), for example, a meta-analysis of the Big Five personality traits reported significant SNP-heritability estimates below 0.2 for all five traits (Lo et al. Reference Lo, Hinds, Tung, Franz, Fan and Wang2016). This is in line with another study of up to 300 000 people finding SNP-heritability estimates for subjective well-being, depressive symptoms and neuroticism in the same range (Okbay et al. Reference Okbay, Baselmans, De Neve, Turley, Nivard and Fontana2016). Partitioning heritability by tissue-specific genomic annotations has identified the central nervous system as the most relevant tissue in the aetiology of schizophrenia and bipolar disorder (Finucane et al. Reference Finucane, Bulik-Sullivan, Gusev, Trynka, Reshef and Loh2015).
Estimation of genetic correlation using genome-wide SNPs
Two traits are genetically correlated, if there is a correlation between the true effect sizes of SNPs affecting the two traits, or in other words, when, on-average, SNPs have directionally similar effects on two traits. Genetic correlation can reflect pleiotropy, which is defined as a genetic variant affecting both traits. However, while a non-null genetic correlation can imply pleiotropy is present at many SNPs, a genetic correlation of zero could arise when pleiotropy is common, but the direction of effects are uncorrelated across SNPs. Genetic correlations (r G) are of interest because they suggest a shared aetiology. However, misdiagnosis between two genetically uncorrelated diseases can generate significant estimates of r G (Kendler, Reference Kendler1987; Wray et al. Reference Wray, Lee and Kendler2012). The availability of genetic marker data on disease case–control samples has allowed the interrogation of the genetic relationship between diseases, often for the first time, since traditional methods to estimate genetic correlation based on increased risk of a disease in relatives of those with another disease requires often unattainably large samples (Visscher et al. Reference Visscher, Hemani, Vinkhuyzen, Chen, Lee and Wray2014).
Formally, genetic correlation is defined as genetic covariance between two traits, scaled by the genetic standard deviations of the two contributing traits. Methods used to estimate SNP heritability can be extended into a bivariate form to estimate r G. In order to estimate genetic correlation using GREML, individual-level genetic data and measurements on two phenotypes are required (from the same or from different individuals). The power of bivariate GREML analyses to detect r G departing from 0 or from 1 depends on the population value of r G, the SNP-heritability of both traits, on the sample sizes, on whether the same or different samples are used for the two traits, and for disease traits, on the proportion of cases in the sample (Visscher et al. Reference Visscher, Hemani, Vinkhuyzen, Chen, Lee and Wray2014). For example, for two diseases with lifetime prevalence of 1%, SNP-h 2 of 0.2 and genetic correlation of 0.5, 5000 cases and 5000 controls for each disease are sufficient to have 89% power at type 1 error rate of 5% to detect a genetic correlation greater than 0, corresponding to a standard error of 0.06. The BOLT-LMM software is also capable of calculating r G in a bivariate GREML analysis with shorter runtime (Loh et al. Reference Loh, Tucker, Bulik-Sullivan, Vilhjálmsson, Finucane and Salem2015). In contrast to heritability estimates, r G estimates are scale independent (approximately) and hence scale transformation is not needed (Lee et al. Reference Lee, Yang, Goddard, Visscher and Wray2012b).
If summary statistics on two or more traits are available, LDSC can be used to estimate r G between them, albeit with higher standard errors than GREML. To make the application of LDSC for SNP-heritability and r G estimation even more user-friendly, the LD Hub resource has been developed, which provides access to summary statistics from more than 200 different traits. LD Hub then calculates genetic correlation estimates between each of these traits and any additional traits for which the user provides summary statistics data (Zheng et al. Reference Zheng, Erzurumluoglu, Elsworth, Kemp, Howe and Haycock2017).
Estimation of r G is most commonly applied to uncover genetic relationships between two different traits, but it can also be applied to detect heterogeneity in genetic effects between two different groups, for example a trait might be under different genetic control in men and in women or in old people and young people (i.e. G × E). It has also been applied to two data sets of the same disease, where r G is expected to be one (Lee et al. Reference Lee, Ripke, Neale, Faraone, Purcell and Perlis2013), to infer between-sample heterogeneity or two data sets of the same disease but of different ethnicity (De Candia et al. Reference De Candia, Lee, Yang, Browning, Gejman and Levinson2013). Calculating r G across populations using LDSC is not straightforward, however, due to between population differences in both allele frequencies and LD structure. The program Popcorn addresses this problem and allows one to estimate the transethnic genetic correlation based on summary statistics and LD matrices from two populations (Brown et al. Reference Brown, Ye, Price and Zaitlen2016).
Examples of applications to psychiatric disorders
One of the first applications of the bivariate GREML method to disease traits was to estimate genetic correlations between psychiatric disorders, presenting evidence that most pairs of disorders result in estimates that are significantly different from zero (PGC cross disorder group (Lee et al. Reference Lee, Ripke, Neale, Faraone, Purcell and Perlis2013)). From those initial estimates the high correlations between schizophrenia and bipolar disorder (~0.6) and between bipolar and MDD (0.5) were considered plausible given evidence from family studies; however, the high genetic correlation between schizophrenia and MDD was more surprising for the clinical community. In fact, close study of the literature from family studies (PGC Cross disorder group, 2013) showed that all three genetic correlation estimates were consistent with published increased risk to relatives (RR) of one disorder for individuals diagnosed with another. However, for the same genetic correlation the size of RR involving a very common disorder (~15% lifetime risk) such as MDD is much smaller than when both disorders are less common (<1% lifetime risk for both schizophrenia and bipolar disorder).
Genetic correlation can arise through misdiagnosis between two diseases. For example, those first presenting with clinical features consistent with a diagnosis of bipolar disorder can in the long-term receive a diagnosis of schizophrenia (and vice versa) (Joyce, Reference Joyce1984; Meyer & Meyer, Reference Meyer and Meyer2009). However, it can be shown analytically that very high misclassification rate of 20% would be needed under no shared aetiology to result in r G of ~0.6 estimated between schizophrenia and bipolar disorder (Wray et al. Reference Wray, Lee and Kendler2012). In contrast, a high genetic correlation between disorders would be consistent with some clinical presentations being difficult to classify.
LDSC has been applied to a full battery of GWAS summary statistics and report higher estimates of genetic correlations estimates between pairs of psychiatric disorders than between psychiatric- and non-psychiatric disorders (Bulik-Sullivan et al. Reference Bulik-Sullivan, Finucane, Anttila, Gusev, Day and Loh2015a). Some notable examples include a positive genetic correlation between schizophrenia and anorexia nervosa, and a positive genetic correlation between bipolar disorder and years of education. A study investigating genetic sharing between neurological and psychiatric traits found that among neurological disorders significant genetic correlations are rare, but that there is some overlap in genetic risk between migraine and MDD, and ADHD and Tourette syndrome (Anttila et al. Reference Anttila, Finucane, Bras, Duncan, Falcone and Gormley2016). On the other hand, genetic correlations between psychiatric traits and personality traits are more common. Figure 2 shows the top genetic correlations for psychiatric disorders and traits. Data obtained from LD Hub.
Polygenic risk prediction
Estimates of SNP effects can be used to predict the genetic risk of individuals. Simple risk scores for each individual are calculated as the sum over all per SNP effects, where the per SNP effect is the allele count of the SNP for the individual multiplied by the effect size of the SNP (Wray et al. Reference Wray, Goddard and Visscher2007; Purcell et al. Reference Purcell, Wray, Stone, Visscher, O'Donovan and Sullivan2009). Here the SNP effects come from a typically large and well-powered discovery (sometimes called training) data set. While the ultimate goal of genetic risk prediction is in applications where the phenotype has not yet been observed, in research applications the risk predictor is evaluated for individuals in a target (sometimes called validation or testing) data set where the phenotype has already been recorded, so that the efficacy of the predictor can be evaluated.
Screening of high-risk individuals for early intervention or prevention programs is a potential clinical application of polygenic risk prediction that at present is not widely used because of the low accuracy of genetic risk predictors (Chatterjee et al. Reference Chatterjee, Shi and García-Closas2016). However, there are applications of polygenic risk prediction in research where low prediction accuracy is less limiting; for example, it could be a cost-effective strategy to conduct follow-up studies in samples ascertained to be low or high for polygenic risk. The genetic predictor is evaluated against a measured phenotype in the target sample, which may or may not be the same phenotype from which the predictor was constructed. Generally, if the predicted and the measured trait are genetically correlated, then there should be positive prediction accuracy, given enough power in both data sets (Dudbridge, Reference Dudbridge2013).
In the usual implementation of polygenic risk scores, SNP effect sizes have been estimated from the standard, one SNP at a time GWAS analysis. The construction of polygenic risk scores is based on some decision about the proportion of SNPs to include in the predictor. As the discovery sample p value threshold becomes more lenient, the increased predictive power of including estimated effect sizes from more true positive associated SNPs is balanced by the inclusion of more false positives. The optimum proportion of SNPs to include depends on the (unknown) genetic architecture and size of the discovery sample. In the latest schizophrenia GWAS, the optimum p value threshold was identified as 0.05 (based on variance explained in out-of-sample prediction across many samples), although inclusion of all SNPs did not lower accuracy drastically (Ripke et al. Reference Ripke, Neale, Corvin, Walters, Farh and Holmans2014). Often a range of different p value thresholds are used to determine the best predictor, although this approach is prone to overfitting. Software such as PLINK (Purcell et al. Reference Purcell, Neale, Todd-Brown, Thomas, Ferreira and Bender2007; Chang et al. Reference Chang, Chow, Tellier, Vattikuti, Purcell and Lee2015) implements basic polygenic risk scoring, while PRSice compares polygenic risk predictors using a large number of p value cutoffs to find the optimum threshold given the data (Euesden et al. Reference Euesden, Lewis and O'Reilly2015). In polygenic risk scoring, LD among SNPs is usually accounted for by applying LD-clumping, i.e. pruning of SNPs based on LD but with higher preference to SNPs with lower p values (usually simply termed ‘clumping’). Prediction accuracy can be improved by using methods that provide estimates of SNP effects that are conditional on all other SNPs, thereby directly taking SNP LD correlations into account (de Los Campos et al. Reference de Los Campos, Vazquez, Fernando, Klimentidis and Sorensen2013), such as Genomic Best Linear Unbiased Prediction (GBLUP) (Meuwissen et al. Reference Meuwissen, Hayes and Goddard2001), which is widely used in animal breeding. In GWAS, there are many more SNPs compared with individuals so effects sizes of each SNP are not estimable in a multiple regression model. In GBLUP, it is assumed that SNP effect sizes are drawn from a normal distribution and a shrinkage term, or penalty term, proportional to the trait heritability is introduced in the model. More complex models are BayesR (Moser et al. Reference Moser, Lee, Hayes, Goddard, Wray and Visscher2015) or LASSO (least absolute shrinkage and selection operator) (Abraham et al. Reference Abraham, Kowalczyk, Zobel and Inouye2013). These methods shrink the SNP effect estimates (i.e. the effect attributable to SNPs in LD with each other is shared between the correlated SNPs) and will ensure the predicted phenotypes are more accurate, and in LASSO and BayesR shrink a proportion of SNPs to zero. The gains in prediction accuracy are dependent on underlying genetic architecture (Moser et al. Reference Moser, Lee, Hayes, Goddard, Wray and Visscher2015).
If individual-level genotype data are not available, it is still possible to transform marginal SNP effects (standard GWAS summary statistics) into penalised, conditional SNP effects, by making use of an LD reference data set (Yang et al. Reference Yang, Ferreira, Morris, Medland, Madden and Heath2012; Robinson et al. Reference Robinson, Kleinman, Graff, Vinkhuyzen, Couper and Miller2017). This is implemented in GCTA (Yang et al. Reference Yang, Lee, Goddard and Visscher2011a) and in LDpred (Vilhjálmsson et al. Reference Vilhjálmsson, Yang, Finucane, Gusev, Lindström and Ripke2015), which not only accounts for LD between SNPs, but also uses a Bayesian framework to adjust the SNP effects for traits where an infinitesimal model (all SNPs have some effect) is not the best fit to the data (e.g. autoimmune disorders). This is analogous to the selecting SNPs based on p value in standard polygenic risk prediction and can further improve accuracy.
Finally, GBLUP approaches can be extended to a multi-trait model, which can further improve prediction accuracy when phenotypes are genetically correlated, because measurements on each trait provide information on the genetic values of the other correlated traits. This approach has been implemented in the program MTG2 and has been shown to improve prediction accuracy for schizophrenia and bipolar disorder (Maier et al. Reference Maier, Moser, Chen, Ripke, Coryell and Potash2015).
Examples of applications to psychiatric disorders
Polygenic risk prediction is widely used in psychiatric genetics, not to infer an individual's case control status, but to gain a better understanding of disease aetiology. Polygenic risk prediction in applications relevant to psychiatry has been reviewed previously (Wray et al. Reference Wray, Lee, Mehta, Vinkhuyzen, Dudbridge and Middeldorp2014). Some more recent examples include schizophrenia polygenic risk scores calculated for community samples of individuals, which explain variation in creativity (Power et al. Reference Power, Steinberg, Bjornsdottir, Rietveld, Abdellaoui and Nivard2015) and cannabis use (Power et al. Reference Power, Verweij, Zuhair, Montgomery, Henders and Heath2014). An association between schizophrenia polygenic risk scores and negative symptoms and anxiety disorder in adolescents gives reason to hope that these kind of studies can not only lead to a better understanding of disease aetiology, but may someday also contribute to early intervention programmes (Jones et al. Reference Jones, Stergiakouli, Tansey, Hubbard, Heron and Cannon2016; Kendler, Reference Kendler2016). Polygenic risk prediction additionally provides a novel approach to studying gene – environment interactions (G × E). Traditional G × E studies, which test for an interaction effect between single genetic variants and an environmental exposure on disease risk, often suffered from low power caused by the small amount of variance explained by individual genetic loci. In contrast, interactions between a polygenic risk score and environmental exposure can be detected more easily, because polygenic risk scores explain more of the variance in disease risk than individual loci. This type of G × E study has been applied to investigate a potential interaction between a polygenic risk score for MDD and childhood trauma on the risk for MDD. Two independent studies found that both the polygenic risk score and exposure to childhood trauma increase the risk for MDD, but came to different conclusions about the nature of the interaction between the two: One study found a positive interaction, meaning that those with both exposure to childhood trauma and high polygenic risk scores are at the greatest risk of developing MDD (Peyrot et al. Reference Peyrot, Milaneschi, Abdellaoui, Sullivan, Hottenga and Boomsma2014), while another study found a negative interaction, where people with exposure to childhood trauma and low polygenic risk scores are at the highest risk (Mullins et al. Reference Mullins, Power, Fisher, Hanscombe, Euesden and Iniesta2016).
Mendelian randomisation
MR analysis investigates the causal relationships between traits. It is a specific form of an instrumental variable analysis (Evans & Davey Smith, Reference Evans and Davey Smith2015), where the goal is to test the causal effect of an explanatory variable (exposure to a risk factor) on a dependent outcome variable (such as disease risk). In MR, genetic markers are used as the instrumental variables. The MR poster child is the application to blood lipid levels and myocardial infarction (Voight et al. Reference Voight, Peloso, Orho-Melander, Frikke-Schmidt, Barbalic and Jensen2012). Together with a number of randomised controlled trials (Keene et al. Reference Keene, Price, Shun-Shin and Francis2014), this application of MR has had major impact on drug development by providing evidence against a causal role of HDL and thus helped in the search for effective ways of preventing myocardial infarction, and demonstrates how evidence from an MR analysis could be used to circumvent costly randomised controlled trials.
Despite its great potential, MR is often limited by low power, and by the fact that it is very difficult to show that all the assumptions (see Fig. 3) that are necessary to infer causality are met. However, if MR is applied bidirectionally for trait pairs of approximately comparable power, and evidence for significant causality is detected in only one direction, then this can help to infer causality over pleiotropy. The power in MR studies is a function of the true causal association between exposure and outcome and of the variance explained by the instrumental variables (Brion et al. Reference Brion, Shakhbazov and Visscher2013). Since statistically significantly associated SNPs often only explain a small proportion of the genetic variance, for many pairs of traits, very large sample sizes are needed to achieve sufficient power to detect causal associations [see online calculator (Brion et al. Reference Brion, Shakhbazov and Visscher2013)].
In recent years, many improvements to the MR method have been developed. Of particular importance for polygenic traits is the extension from single-SNP instrumental variables to instrumental variables that comprise multiple SNPs (Evans et al. Reference Evans, Brion, Paternoster, Kemp, McMahon and Munafò2013). This may increase power since in polygenic traits individual SNPs are likely to be weak instruments. Two-sample MR is an extension that makes it possible to combine independent data for genotype – exposure and genotype – outcome in an MR experiment, thus allowing to investigate causal associations among all phenotypes for which well-powered GWAS data are available (Burgess et al. Reference Burgess, Butterworth and Thompson2013). Further developments of MR include better ways to test some of the assumptions, modifications which allow the relaxation of the no-pleiotropy assumption, and improvements which increase the power to detect causal effects (Evans & Davey Smith, Reference Evans and Davey Smith2015). The web-based resource MR BASE has been developed to simplify the application of MR to test causality between a large number of traits and to compare different variations of the method (Hemani et al. Reference Hemani, Zheng, Wade, Laurin, Elsworth and Burgess2016). Summary data for more than a thousand traits have been collected and can be tested for causal associations with data provided by the user.
Examples of applications to psychiatric disorders
Several MR studies have investigated a potential causal influence of variables, which are known to be associated with psychiatric traits and diseases from observational studies. One study which looked at BMI as a potential risk factor concluded that there is no evidence of BMI being a causal influence on schizophrenia and bipolar disorder, but weak evidence of BMI conferring a higher risk of MDD. It was noted, however, that the BMI association suffers from low power caused by small sample sizes for MDD (Hartwig et al. Reference Hartwig, Bowden, Loret de Mola, Tovo-Rodrigues, Davey Smith and Horta2016). C-Reactive protein (CRP) is a potential risk factor for psychiatric disorders with undisputed correlational association, but unclear causality. A study from 2016 surprisingly found a protective role of genetically elevated CRP levels on the risk for schizophrenia, as well as weak (nominally significant) evidence for a risk increasing effect on bipolar disorder as well as a range of other somatic traits (Prins et al. Reference Prins, Abbasi, Wong, Vaez, Nolte and Franceschini2016). This is in contradiction with another 2016 study, which identified elevated CRP levels to confer an increased risk of schizophrenia (Inoshita et al. Reference Inoshita, Numata, Tajima, Kinoshita, Umehara and Nakataki2016). There is much debate on whether cannabis is a risk factor for psychosis or schizophrenia, or whether the association is due to reverse causation or due to a confounding factor (McGrath et al. Reference McGrath, Welham, Scott, Varghese, Degenhardt and Hayatbakhsh2010; Gage et al. Reference Gage, Hickman and Zammit2016). Recently MR studies have provided evidence for a causal role of cannabis in the development of schizophrenia, but also for a reverse causation (Gage et al. Reference Gage, Jones, Burgess, Bowden, Davey Smith and Zammit2017; Vaucher et al. Reference Vaucher, Keating, Lasserre, Gan, Lyall and Ward2017). Several other risk factors for schizophrenia, anxiety and depression have been investigated through MR with negative results (Bjørngaard et al. Reference Bjørngaard, Gunnell, Elvestad, Davey Smith, Skorpen and Krokan2013; Gage et al. Reference Gage, Smith, Zammit, Hickman and Munafò2013; Taylor et al. Reference Taylor, Burgess, Ware, Gage, Richards and Davey Smith2016).
In summary, MR studies investigating risk factors for psychiatric disorders could in a few cases provide evidence for a risk increasing effect. As the power of GWAS with increasing sample sizes, there will be more robust SNP associations which can be used as instrumental variables. This will provide more evidence for whether the many negative results were just caused by low power or by the absence of a true causal association, but interpretation of causality needs to carefully consider confounding factors.
Fine-mapping and gene prioritisation
LD between SNPs is both a blessing and a curse for GWAS. On one hand, it makes it possible to probe only a subset of all genetic variants yet still detect associations for a much larger set, either through tagging of non-genotyped SNPs by genotyped SNPs or through LD-based imputation to sequenced reference samples. On the other hand, it means that a detected association does not necessarily imply a causal role for the associated SNPs. Fine-mapping attempts to identify which out of a number of associated SNPs in a LD region have a causal role, and which are merely associated because they are in LD with causal SNPs (Sekar et al. Reference Sekar, Bialas, de Rivera, Davis, Hammond and Kamitaki2016).
To better understand disease aetiology, it may be of interest to identify causal genes, rather than causal SNPs. In many cases the causal gene may simply be the gene closest to the most strongly associated SNP in a region, but this is not always so (Claussnitzer et al. Reference Claussnitzer, Dankel, Kim, Quon, Meuleman and Haugen2015).
Identifying causal SNPs
A range of different approaches have been developed for the fine-mapping of SNPs. Most of them use information on functional annotation of the genome and LD between SNPs, in addition to SNP association statistics on one or more diseases. An algorithm (PICS) utilising all these kinds of information has recently been applied to 21 autoimmune disorders and identified many putative causal variants by integrating information from different types of functional annotations, including epigenetic marks and gene expression information (Farh et al. Reference Farh, Marson, Zhu, Kleinewietfeld, Housley and Beik2015).
Fine-mapping methods can use LD information to either identify causal SNPs within a region that may not have the strongest association signal, but are located in a functional genomic element like an enhancer, or they can use LD information to identify multiple independently associated SNPs, by calculating the association signal conditionally on the association signal of neighbouring SNPs. Traditionally, this would require full genotype data on the trait of interest. However, it has been demonstrated that it is possible to borrow LD information from a reference genotype data set for a conditional analysis, making it possible to apply this approach to traits for which only summary statistics are available (Yang et al. Reference Yang, Ferreira, Morris, Medland, Madden and Heath2012; GCTA-cojo). For this to work well, the LD structure in the reference genotype population should be a good approximation of the LD structure in the population on which the GWAS has been performed. Cross-ethnic genetic studies can aid fine mapping of disease loci, exploiting differences in allele frequency and LD (Morris, Reference Morris2011), but application to psychiatric disorders has been limited by the dearth of non-European data sets.
Fine-mapping can benefit from data on multiple traits. When two traits share regions of significant genetic associations it can be investigated if they share causal loci at those shared regions, or if different loci drive the regional association in each trait. This has been investigated in a Bayesian framework using only summary statistics and resulted in the identification of 341 loci associated with more than one trait across 42 different phenotypes (Pickrell et al. Reference Pickrell, Berisa, Liu, Ségurel, Tung and Hinds2016). SNPs associated with two traits form the basis of the previously discussed MR methods.
Identifying causal genes
Genome-wide association studies suffer from a massive multiple-testing burden, owing to the large number of association tests between SNPs and phenotype. To minimise the number of false positive results, associations are usually required to be significant at a p value of 5 × 10−8 (Bonferroni correction of a million independent tests). Gene-based tests have a reduced multiple-testing burden (~20 000 independent tests) and give biological meaning to association results. In gene-based tests SNPs are aggregated into larger groups (Neale & Sham, Reference Neale and Sham2004) assuming that SNPs exert their effect through nearby genes, which is not always true. The PrediXcan method refines this aggregation step by including external tissue specific eQTL data to predict gene expression levels based on SNP data (Gamazon et al. Reference Gamazon, Wheeler, Shah, Mozaffari, Aquino-Michaels and Carroll2015). This has several advantages over conventional gene-based tests as it limits the multiple-testing burden by only using SNPs, which are known to affect gene expression, and the direction of effect of a SNP on expression levels is not lost when aggregating multiple SNPs. By using tissue-specific eQTL data, associations can be tested between a phenotype and expression changes in tissues relevant to the phenotype. PrediXcan has been used to identify genes which may play causal roles in amyloid deposition and cognitive changes in Alzheimer's disease (Hohman et al. Reference Hohman, Dumitrescu, Cox and Jefferson2017) and genes associated with Asthma (Ferreira et al. Reference Ferreira, Jansen, Willemsen, Penninx, Bain and Vicente2017). While PrediXcan requires individual-level genotype data, the extension MetaXcan requires only summary statistics and promises similar accuracy, if the right reference population is used for LD estimation (Barbeira et al. Reference Barbeira, Shah, Torres, Wheeler, Torstenson and Edwards2016). Transcriptome-wide association study (TWAS) is a summary statistics based method similar to MetaXcan, which differs in the algorithm used to predict expression (Gusev et al. Reference Gusev, Ko, Shi, Bhatia, Chung and Penninx2016a). More recently, TWAS has been applied to detect pairs of traits with genetic correlations at the level of predicted expression (Mancuso et al. Reference Mancuso, Shi, Goddard, Kichaev, Gusev and Pasaniuc2017). A current limiting factor in this and other expression-based methods is the quality of tissue-specific eQTL data. The previously mentioned methods prioritise genes based on predicted effects of SNPs on expression. This is in contrast to other methods, such as DEPICT, which use gene expression data to predict gene function and prioritise genes at specific loci based on the predicted function (Pers et al. Reference Pers, Karjalainen, Chan, Westra, Wood and Yang2015).
While an association of predicted gene expression and a phenotype is suggestive of a causal role for that gene, pleiotropy is an alternative explanation for this association. That is, the same SNPs could independently lead to expression changes in one gene and via a different route have an effect on the phenotype. The summary statistics-based MR (SMR) method (Zhu et al. Reference Zhu, Zhang, Hu, Bakshi, Robinson and Powell2016) attempts to distinguish between these two scenarios using eQTL SNPs as instrumental variables and gene expression as exposure variable. This method also applies a follow-up test (HEIDI), which identifies and excludes regions where multiple linked SNPs are independently associated with gene expression and phenotype, as a strategy to prioritise regions with evidence of a simple causal mechanism for functional follow-up. The method was applied to several complex human traits and has identified 126 putatively causal genes, of which 77 were not the closest gene to their respective top associated GWAS hit.
MetaXcan, TWAS and SMR use the same type of data to identify genes of interest. However, there are many subtle differences between the methods, which will likely lead to unique results for each method. To date, no systematic comparison of their relative performance has been published.
Examples of applications to psychiatric disorders
PrediXcan has been applied to bipolar disorder, resulting in the identification of two genes, PTPRE and BBX, for which predicted increased expression in whole blood and the anterior cingulate cortex, respectively, was associated with increased risk of bipolar disorder (Shah et al. Reference Shah, Wheeler, Gamazon, Nicolae, Cox and Im2016). SMR has been applied to schizophrenia, highlighting two genes, SNX19 and NMRAL1, with a potentially causal influence (Zhu et al. Reference Zhu, Zhang, Hu, Bakshi, Robinson and Powell2016). The previous two examples have highlighted genes by using eQTL data, but the concept can be extended to account for the fact that chromatin modifications may mediate the association between genetic variants and eQTLs, and that splice-QTLs, rather than eQTLs, may underlie the genetic effect of a SNP. A TWAS study on schizophrenia has incorporated these ideas and has highlighted 157 genes, many of which were identified due to brain specific splice-QTLs (Gusev et al. Reference Gusev, Mancuso, Finucane, Reshef, Song and Safi2016b).
Detection of genetic heterogeneity
Most genetic studies are based on the assumption that individuals who exhibit similar symptoms or who have been diagnosed with the same disease are representatives of the same underlying biology defined by a common genetic architecture. Under a polygenic disease architecture, each individual is likely to have a unique combination of risk loci, but with each combination drawn from the pool of risk loci. Genetic heterogeneity occurs when individuals with the same clinical presentation have risk alleles drawn from independent (or perhaps correlated) sets of risk loci, and the genetic risk profile of one or more subgroups of cases departs from that of the rest. This may arise through misclassification of some cases, or through distinct aetiological pathways leading to the same diagnosis (Flint & Kendler, Reference Flint and Kendler2014; Jeste & Geschwind, Reference Jeste and Geschwind2014). The inherent phenotypic heterogeneity within psychiatry makes detection of genetic heterogeneity an appealing goal and identification of distinct pathways holds the promise of shedding light on disease aetiology. Furthermore, a biological basis for disease stratification could lead to more personalised treatments (Kapur et al. Reference Kapur, Phillips and Insel2012).
Although the concept of identifying genetic sub-groups is intuitively appealing simulations suggest it is very difficult to find meaningful genetic groupings if each sub-group has a genetic architecture of a large number of loci with small effects. The large number of combinations of risk loci, their small effect sizes, the uncertainty about the size or even about the presence of genetically heterogeneous groups, and the challenging disentanglement from population stratification all contribute to the difficulty of this problem. As a result, even data sets comprising hundreds of thousands of individuals may not provide sufficient power for naïve approaches to detecting even the simplest scenarios of genetic heterogeneity (Han et al. Reference Han, Pouget, Slowikowski, Stahl, Lee and Diogo2016; Maier et al. unpublished results).
The BUHMBOX method (Han et al. Reference Han, Pouget, Slowikowski, Stahl, Lee and Diogo2016) frames the question of disease heterogeneity in a different way and sets out to test if two diseases that share a genetic basis (r G > 0) are correlated because the shared genetic risk factors are present in the whole sample (pleiotropy) or are confined to only a subgroup of individuals (identifiable genetic sub-type) (Han et al. Reference Han, Pouget, Slowikowski, Stahl, Lee and Diogo2016). The BUHMBOX method investigates LD-independent risk loci for disease B in individuals diagnosed with disease A. If only a subgroup of individuals has a higher genetic risk for disease B, this will induce a correlation among the disease B risk loci, which would support the presence of genetic heterogeneity and not pleiotropy. This approach can demonstrate presence of a genetic sub-group without identifying which specific individuals diagnosed with disease A are genetically more similar to those with disease B. The method found evidence for heterogeneity among seronegative rheumatoid arthritis cases, suggesting that they may contain a significant proportion of seropositive cases. The power of the BUHMBOX method to detect heterogeneity depends on the number of cases, number of markers, risk allele frequency, odds ratio and proportion of disease A cases that are genetically disease B cases (heterogeneity proportion). For example, with 2000 cases and 2000 controls, a heterogeneity proportion of 0.2 and 50 risk loci, the power to detect heterogeneity at a significance threshold of 0.05 is 92% (Han et al. Reference Han, Pouget, Slowikowski, Stahl, Lee and Diogo2016).
Alternatively, genetic heterogeneity can be studied by first grouping individuals (for example, disease cases) based on non-genetic data and then testing for genetic heterogeneity between these disease subtypes. A recent method follows this approach by jointly modelling the probability for each SNP of whether its frequency differentiates cases and controls and/or differentiates disease subgroups. Applied to type 1 diabetes, this method suggests that cases with and without autoantibodies exhibit a different genetic architecture for type 1 diabetes disease risk (Liley et al. Reference Liley, Todd and Wallace2016).
Examples of applications to psychiatric disorders
The BUHMBOX method was used to investigate the shared genetic basis between MDD and schizophrenia, and found no evidence that suggested that a subset of MDD cases was genetically more similar to schizophrenia cases, implying that the genetic correlation estimated between the disorders reflect pleiotropy (Han et al. Reference Han, Pouget, Slowikowski, Stahl, Lee and Diogo2016). Application of this method will become more interesting as sample sizes increase.
Conclusions
For many psychiatric disorders, genetic factors explain more variation in disease risk in the population than any other known risk factors (Sullivan et al. Reference Sullivan, Daly and O'Donovan2012), but only recently has it become possible to resolve the overall familial genetic risk into individual risk factors at the DNA level. The evidence is now conclusive that psychiatric disorders, like many other common disease and disorders are highly polygenic underpinned by thousands of genetic loci, each of which contributes a small amount to the overall genetic risk. After a period in which many candidate gene studies have reported association results, which failed to replicate (Chanock et al. Reference Chanock, Manolio, Boehnke, Boerwinkle, Hunter and Thomas2007), the hypothesis-free GWAS approach has established itself as the dominating paradigm to find associated genetic loci. With ever growing sample sizes, more and more SNPs surpass the stringent p value threshold for almost all investigated traits. However, it is also becoming clear that the bulk of genetic risk factors remains hidden among those loci that do not achieve genome-wide significance. Many of the methods presented in this review leverage the large amount of information that is harboured by genetic variants, regardless of whether or not they achieve significance. While the focus of some methods is on individual SNPs or genes, other methods aggregate over a potentially large number of loci to answer questions such as ‘What is the combined genetic effect of all measurable SNPs on phenotypic variance?’, ‘Do these traits have a shared genetic aetiology?’ or ‘Do these traits causally influence one another?’. One thing that all of these methods have in common is that their utility crucially depends on the power to detect an association, which in turn depends on sample size. Larger sample sizes lead to a higher computational burden, but for most analytical questions, which have been presented here, there are methods which can utilise summary statistics and thus drastically reduce runtime and memory requirements.
The literature on new methods is ever-growing and while we have tried to present an overview of key methods to help navigation of this complex field, it is difficult to be fully exhaustive. In particular, pathway analyses are important application of GWAS summary statistics, but these methods have been reviewed recently (Wang et al. Reference Wang, Li and Hakonarson2010) and so were not considered here. We have illustrated the methods with some example applications; however, we expect the full potential of the data will only be revealed in coming years when studies with half a million or more people will be widely available. A key issue for the field is to develop cost-effective strategies to capture larger sample sizes with both DNA samples and phenotypic data as these are needed to evaluate the extent to which genetic data can explain phenotypic heterogeneity and to fulfil the potential of more personalised medicine.
Acknowledgements
This research is funded by the Australian National Health and Medical Research Council (N.R.W., grant numbers 1078901, 1087889), (P.M.V., grant number 1078037).
Declaration of Interest
None.
Ethical Standards
Not applicable.