A SNP heritability model is a statistical model that parametrises the variance of SNP effect sizes [Reference Speed, Kaphle and Balding4]. Most SNP heritability analyses to date assume a 1-parameter model that assigns the same variance to every SNP, which is unrealistic and has led to sub-optimal modelling of genetic architecture and inaccurate heritability estimates. Recent work has added parameters that capture properties of SNPs, such as minor allele frequency (MAF), local linkage disequilibrium (LD) patterns and functional annotations, so that the model better reflects how heritability is distributed. This has led to models such as the LDAK model [Reference Speed, Cai, Johnson, Nejentsev and Balding2], the Baseline LD model [Reference Gazal, Finucane, Furlotte, Loh, Palamara, Liu, Schoech, Bulik-Sullivan, Neale, Gusev and Price1] and, more recently, the BLD-LDAK model [Reference Speed, Holmes and Balding3]. The BLD-LDAK heritability model is complex (66 df) and contains highly correlated sets of predictors. My thesis explores and tests existing and new predictors of SNP heritability, combines them to build a parsimonious heritability model, and compares the performance of that model with the BLD-LDAK model. I start by evaluating the BLD-LDAK heritability model on 14 traits from the UK Biobank project and provide updated results of SNP heritability and functional enrichment analyses. Next, I collect a comprehensive set of functional annotations that might be predictive of heritability from public genomic resources such as ENCODE, the Roadmap Epigenomics project, the UCSC Genome Browser and RefSeq genes, and subject them to systematic variable selection to construct a new 10-parameter heritability model, BIC10. I then perform heritability analyses of traits recorded in the UK Biobank project to compare the models.
I also compare heritability models by phenotype prediction accuracy across a diverse range of traits from the UK Biobank, and assess the portability of heritability models across human ancestries. I show that the BIC10 and BLD-LDAK heritability models have equivalent performance, even though the BIC10 model has 56 fewer parameters. The reduction in degrees of freedom improves interpretability and offers computational advantages for heritability analysis without loss of accuracy.
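As a rough illustration of the contrast between the 1-parameter model and the annotation-based models described above, both can be written in a common linear form. The notation here ($h_j^2$, $a_{jk}$, $\tau_k$, $K$) is an assumption introduced for this sketch and does not appear in the abstract.

```latex
% Assumed notation (not from the abstract):
%   h_j^2   heritability contributed by SNP j
%   a_{jk}  value of annotation (predictor) k for SNP j
%   \tau_k  model parameter attached to annotation k
%   K       number of parameters in the heritability model

% 1-parameter model: every SNP has the same expected contribution.
\mathbb{E}\left[ h_j^2 \right] = \tau_1

% Multi-parameter model: the expected contribution depends on SNP properties
% such as MAF, local LD and functional annotations.
\mathbb{E}\left[ h_j^2 \right] = \sum_{k=1}^{K} \tau_k \, a_{jk}
```

Under this reading, setting $K = 1$ with $a_{j1} = 1$ recovers the 1-parameter model, while the BLD-LDAK and BIC10 models would correspond to $K = 66$ and $K = 10$ respectively, which accounts for the difference of 56 parameters.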
The published thesis is available at http://hdl.handle.net/11343/325143.