The role of gene expression in complex trait heritability
Alexis Battle

Motivation and introduction
How can we use gene expression and epigenetics to help us understand complex trait genetics?
Majority of trait-associated variation is non-coding.
Common hypothesis is that most of these function by altering gene expression.

Motivation and introduction
Using expression and epigenetic data to inform missing heritability:
Quantify contribution of this important component of trait heritability?
Explain mechanism?
Increase power to detect trait-associated variants (or build good predictors)?

1. Genetics of gene expression

Genetic variants affect gene expression
eQTL (expression Quantitative Trait Locus) analysis: association between genotype and RNA expression levels
[Figure: expression of gene IKZF1 by SNP genotype (CC, CA, AA) at chr20: 5177528.]

Prevalence of eQTLs
Cis-eQTLs have now been identified for nearly every human gene, with numerous large studies available
[Figure: number of genes with eQTL identified (cis-eQTLs and cis-sQTLs) versus cohort size, 200 to 1000; plotted labels include 10,914, 6,738, 2,851 and 1,158.]
Battle, Genome Research, 2014

Large-scale eQTL analyses
DGN: 922 whole blood RNA-seq
GEUVADIS: 462 LCL RNA-seq
MUTHER: 850, several tissues, microarray and later RNA-seq
Wright et al, 2014: 2,752 twins, whole blood microarray
Westra et al, 2013: meta-analysis of 5,311 whole blood microarray samples

GTEx Project
GTEx Consortium v6p data
449 genotyped donors
7051 gene expression samples
42 post-mortem tissues
31 solid-organ tissues
10 brain subregions
44 tissues
116 individuals (WGS)
449 individuals (RNA-seq + genotype)
The GTEx Consortium, Nature 2017
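The eQTL analysis described earlier tests for association between genotype and RNA expression levels. A minimal sketch of that test for one SNP-gene pair follows, assuming an additive model in which expression is regressed on allele dosage (0, 1, 2). The function name `eqtl_regression` and the toy data are hypothetical and are not taken from any of the studies listed; real pipelines also adjust for covariates and hidden confounders.

```python
import math


def eqtl_regression(dosages, expression):
    """Least-squares fit of expression on genotype dosage (0, 1, 2).

    Returns (slope, intercept, t_statistic) for a single SNP-gene pair.
    Illustrative only: no covariates, no multiple-testing correction.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("genotype is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares and standard error of the slope
    rss = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(dosages, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    t_stat = slope / se if se > 0 else math.inf
    return slope, intercept, t_stat


# Hypothetical toy data: dosage of the A allele (CC=0, CA=1, AA=2)
dosages = [0, 0, 0, 1, 1, 1, 2, 2, 2]
expression = [1.0, 1.2, 0.8, 2.1, 1.9, 2.0, 3.0, 3.2, 2.8]
slope, intercept, t_stat = eqtl_regression(dosages, expression)
print(f"effect per allele: {slope:.3f}, t = {t_stat:.2f}")
```

The slope is the estimated change in expression per copy of the alternate allele; a genome-wide scan repeats this test for every SNP-gene pair and then corrects for multiple testing.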

Genetic effects across human tissues
[Figure: eGene counts per GTEx tissue, shown as # cis eGenes / # trans eGenes with sample size (n) for each tissue; further panels show the proportion of tested or annotated genes and eGene counts against sample size.]
Total unique eQTL genes: cis: 19,725 (FDR 5%); trans: 93 (FDR 10%)
Most cis per tissue: tibial nerve (N=256), 8,087
Most trans per tissue: testis (N=157), 35
The GTEx Consortium, Nature 2017

Characterizing eQTLs across tissues
Cis-eQTL variants fall in tissue-specific regulatory elements (from Roadmap Epigenomics)
[Figure: log10(eVariant:CRE OR) by GTEx discovery tissue; proportion of shared eQTLs given shared CREs.]
The GTEx Consortium, Nature 2017

Trans-eQTLs
[Figure: number of trans-eQTLs (FDR 10%) versus sample size, for GTEx tissues and for large studies: Westra et al (N=5,311, using GWAS variants only), ALSPAC (N=869), MUTHER (N=850), DGN (N=922), Framingham (N=5,257).]
Studies report wildly different numbers of hits (10s to 10,000s)
Replication and validation remain poor
We remain underpowered at current sample sizes

Challenges for trans-eQTL detection
Power
False positives from many sources, e.g. over- and under-correcting confounders (Dahl et al, 2017)
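The trans-eQTL counts above are reported at FDR 10%. As a rough illustration of what such a threshold means in practice, here is a minimal sketch of the Benjamini-Hochberg step-up procedure, one common way to control FDR. This is an assumption for illustration: the studies cited may use other procedures (for example q-values or permutation-based schemes), and the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.10):
    """Return sorted indices of tests declared significant at the given FDR.

    Benjamini-Hochberg step-up: find the largest rank k such that
    p_(k) <= fdr * k / m, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical p-values for five variant-gene pairs
hits = benjamini_hochberg([0.001, 0.04, 0.03, 0.5, 0.9], fdr=0.10)
print(hits)
```

Because trans analyses test vastly more variant-gene pairs than cis analyses, the per-test threshold implied by a fixed FDR is far stricter, which is one reason power is the first challenge listed.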

Mapping error (similar to probe cross-hybridization): filter out cross-mapping pairs
[Figure: a true positive cis-eQTL produces a false positive trans-eQTL through incorrect mapping.]
Slide adapted from Yuan He

Heritability of gene expression
Despite eQTLs being pervasive, estimates for heritability of gene expression are modest
Average over genes ranging from 0.09 to 0.3 (Price et al, 2008/2011, Wright et al 2014, Wheeler et al 2016, MUTHER)
Informs need for greater power to detect trans-eQTLs
Figure from Wright et al NG 2014

Heritability of gene expression
Trans effects contribute much more to gene expression heritability than cis
h2cis/h2 estimates range from 10-40% (Price et al 2011, Wright et al 2014, Grundberg et al 2012)
Varies by tissue, population, power, method
h2cis sparse (Wheeler et al 2016), trans often mediated by cis effects

2. Connecting expression and epigenetics to complex traits

eQTLs and complex disease genetics
Help interpret GWAS variants (especially non-coding): understand mechanism, guide interventions
[Figure: a variant (C) near Gene 1 and Gene 2, with DNA, RNA, protein and drug levels; labels: eQTL, ?]

Most SNPs are eQTLs
But most of these just tag functional variants
Co-localization analysis: need to evaluate whether underlying causal variants are actually shared
Slide adapted from Casey Brown, UPenn

eQTLs and complex disease genetics
~50% of genetic variants associated with human disease co-localize with an eQTL, compared to 92% simply associated at p < 0.05/44 (still enriched over background)
[Figure: proportion of GWAS loci colocalized per trait (HDL, BMI, LDL, IBD, TG, Crohn's, SLE, WHRadjBMI, UC, T2D, PBC, CAD, SCZ, Heart Rate, FG, Celiac, Alzheimers, RA, WHR, BIP, PGCCROSS), split into nearest gene, not nearest gene, and no colocalization.]
The GTEx Consortium, Nature 2017

Deciphering mechanism
[Figure: number of GWAS loci versus number of colocalized genes.]
53% of co-localized GWAS loci have > 1 target gene, ambiguity remains
Slide adapted from Casey Brown

eQTL data informs heritability
GE co-score regression indicates cis-eQTLs explain mean 21% of h2 across a set of complex traits
O'Connor et al. bioRxiv, 2017

Epigenetic data
ENCODE, Roadmap Epigenomics
Regulatory elements: promoters, enhancers
Transcription factor binding sites
CpG sites
ChromHMM
ENCODE Project Consortium. PLoS Biology 2011.

Epigenetic data informs heritability
LD score regression, related approaches partition h2
Large scale epigenetic data (Roadmap, ENCODE) enable analysis, indicate contribution of gene regulation
Figure from Finucane, NG, 2015

Omnigenic model
Most/all expressed genes in disease-relevant cell types affect trait
Highlights potential role of eQTLs, trans effects
Boyle et al., Cell, 2017

3. Complex effects of genetic variation on gene expression

What are we missing?
Most studies are done on steady-state total expression measurements at a single adult or post-mortem time point
Disease-relevant states include different developmental stages, environmental exposures, cell types
Other variant classes and regulatory effects

Context-specificity
Many factors can modulate regulatory effects
Altered transcription factor abundance
Epigenetic changes

GTEx tissue-specificity of cis and trans
Trans-eQTLs appear more highly tissue-specific than cis-eQTLs
The GTEx Consortium, Nature 2017

Tissue specificity and heritability
From Finucane et al, NG, 2018

Detecting context-specific QTLs
Many other contexts beyond tissue: recent work explores QTLs in diverse environments, such as infection response
Fairfax et al, Science 2014
Lee, Science 2014

Methods for identifying allelic response from RNA-seq data
[Figure: allelic balance at NPRL3 by BP meds status (0 or 1), p=2.08e-06.]
BP meds and NPRL3: related to genes involved in homeostasis of fluid volume
Knowles et al, NM, 2017

Diverse variants and readouts
Diverse genetic variant classes, enabled by improved variant calling and methods:
Structural variants
Repeats
Diverse molecular phenotypes important to h2:
Alternative splicing (Li et al, Science 2016)
Translation, protein abundance (Wu et al, 2013 and Battle et al, 2015)
Epigenetic changes including chromatin accessibility, histone modifications, methylation, etc (McVicker 2013, Grubert 2015, Banovich 2014)

4. Further possibilities

Detecting more?
Can expression and epigenetic data help detect more variants or explain more heritability?
New methods integrate diverse data to learn and apply priors to GWAS analysis and prediction scores
Pickrell AJHG 2014 estimates 5% increase in loci detectable
Marigorta NG 2017

Rare variants
Recent work emphasizes importance of rare variation in driving extreme expression levels
Li et al, Nature, 2017

Rare variants
Preprint (Hernandez et al 2017) suggests rare variants explain a large fraction of heritability of gene expression
doi: http://dx.doi.org/10.1101/219238

5. Conclusions

Progress: what we've learned
Genetics of gene expression:
Prevalence of genetic variants affecting gene expression
Large catalogs of cis-QTLs, diverse contexts, variants, mol phenotypes
Connections to complex traits:
Better data and methods provide better estimate of contribution of expression to h2, and interpretation of individual variants (MR, etc)
Current estimates indicate gene expression contributes a sizeable but not majority fraction to trait h2
Contribution of expression, epigenetic data to explaining missing h2?
Modestly improved power for identifying individual GWAS hits through informed priors, potential for better prediction
Improved interpretation and mechanism
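The conclusions above mention Mendelian randomization (MR) as one route to interpreting individual variants. A minimal sketch of the simplest MR estimator, the Wald ratio, is given here under strong assumptions: a single variant that is a valid instrument, affecting the trait only through expression of the gene in question. The function name `wald_ratio` and the numbers are hypothetical; the talk does not specify which MR method is meant.

```python
def wald_ratio(beta_eqtl, beta_gwas, se_gwas):
    """Wald ratio estimate of the effect of expression on a trait.

    beta_eqtl: effect of the variant on gene expression
    beta_gwas: effect of the same variant on the trait
    se_gwas:   standard error of beta_gwas
    Returns (estimate, approximate standard error), using the first-order
    approximation that ignores uncertainty in beta_eqtl.
    """
    if beta_eqtl == 0:
        raise ValueError("variant has no effect on expression; ratio undefined")
    estimate = beta_gwas / beta_eqtl
    se = se_gwas / abs(beta_eqtl)
    return estimate, se


# Hypothetical summary statistics for one variant
estimate, se = wald_ratio(beta_eqtl=0.5, beta_gwas=0.1, se_gwas=0.02)
print(f"estimated effect of expression on trait: {estimate:.3f} (se {se:.3f})")
```

The single-instrument assumption is exactly what the ambiguity caveat undermines: when a variant affects several genes, the ratio for any one gene can be misleading.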

Why delve deeper into expression?
Help determine when and how much to invest in WGS, expression, epigenetic data
To continue understanding implicated:
Genes
Tissue and cell types
Epigenetic and other regulatory mechanisms

Challenges and caveats
Ambiguity: many variants affect multiple genes
Interpretability: missing relevant cell types
Power: trans-eQTLs also require large sample sizes

Ongoing efforts
Scaling up eQTL studies, finding trans:
eQTLGen: meta-analysis of all available whole blood expression data including over 30,000 samples
GTEx v8: 1,000 individuals, WGS, over 50 tissues
Environment and dynamic QTLs
Single cell analysis - Human Cell Atlas, etc
Integrated analysis connecting epigenetic and expression data for improved resolution, disambiguation, power
Methods

Acknowledgements
GTEx Consortium, Casey Brown, Barbara Engelhardt, Stephen Montgomery, Ira Hall
Collaborators: David Knowles, Jonathan Pritchard, Yoav Gilad
Funding sources: NIH, NHGRI, NIMH; R01 HG008150; R01 MH101814; Searle Scholar Fund

Cis-eQTLs remain to be discovered

GTEx trans-eQTLs
Trans-eQTL often coincide with cis-eQTLs
Tissue-specific mechanisms identified
[Figure: log10(eVariant:CRE OR) in promoters and enhancers, and proportion overlapping piRNA, for cis, trans and background variant sets, including thyroid and testis trans-eQTLs.]
The GTEx Consortium, Nature 2017

Mendelian Randomization

Multiple independent SNPs per gene
[Figure: average number of independent cis-eQTLs per eGene (1.1 to 1.3) versus sample size (100 to 300).]

Variants associated with many genes
Cis-eQTL variants have multiple gene targets, particularly once considering multiple tissues

Progress: what we've learned
Genetics of gene expression:
Understand prevalence of cis-eQTLs
Improved eQTL catalogs based on larger studies
Complexity: context-specificity, allelic heterogeneity, multiple gene targets
Coverage of diverse variant classes and molecular phenotypes including alternative splicing
Rare variant effects on gene expression

Progress: what we've learned
Connections to complex traits:
Better epigenetic data and eQTL catalogs provide better estimate of contribution of expression to h2
Improved methods:
Co-localization, fine-mapping
Mendelian randomization approaches
LD-score regression and related approaches tailored for utilizing expression and epigenetic data
Current estimates indicate gene expression contributes a sizeable but not majority fraction to trait h2

Progress: what we've learned
Contribution of expression and epigenetic data to explaining missing h2?
Modestly improved power for identifying individual GWAS hits through informed priors
Potential improvements for prediction
Improved interpretation and mechanism
Identified target genes of individual GWAS hits
Identified relevant tissues and cell types in aggregate

Challenges and caveats
Ambiguity: many variants affect multiple genes in cis, in multiple tissues
When missing the relevant cell types, genes, or environments, current methods are not always interpretable
Trans-eQTLs should be a major component, but they are largely uncharacterized due to power

Key questions?
How much heritability is explained by expression?
How much heritability is explained by epigenetics? And is that all reflected in expression if measured in the right tissue, right time point, right context?
Limitations of current data?
Limitations of current methods?
Can expression/epigenetic data HELP explain missing heritability?
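The heritability questions above are often approached with LD-score regression, listed among the improved methods. A minimal sketch of the basic univariate idea follows, assuming the standard relation E[chi2_j] = N * h2 * l_j / M + intercept, where l_j is the LD score of SNP j, N the GWAS sample size and M the number of SNPs. This is an unweighted least-squares toy with a hypothetical function name `ldsc_h2`; published implementations use weighted regression with block-jackknife standard errors, and the partitioned version fits one coefficient per annotation.

```python
def ldsc_h2(chi2, ld_scores, n_samples, n_snps):
    """Estimate SNP heritability from GWAS chi-square statistics.

    Regresses chi2 on LD score by ordinary least squares and rescales the
    slope: h2 = slope * M / N. Returns (h2, intercept).
    """
    n = len(chi2)
    if n != len(ld_scores) or n < 2:
        raise ValueError("need at least 2 SNPs with chi2 and LD score")
    mean_l = sum(ld_scores) / n
    mean_c = sum(chi2) / n
    sxx = sum((l - mean_l) ** 2 for l in ld_scores)
    if sxx == 0:
        raise ValueError("LD scores do not vary")
    sxy = sum((l - mean_l) * (c - mean_c) for l, c in zip(ld_scores, chi2))
    slope = sxy / sxx
    intercept = mean_c - slope * mean_l
    h2 = slope * n_snps / n_samples
    return h2, intercept


# Hypothetical noiseless data generated with h2 = 0.5, N = 1000, M = 100
ld_scores = [1.0, 2.0, 3.0, 4.0]
chi2 = [1.0 + 5.0 * l for l in ld_scores]
h2, intercept = ldsc_h2(chi2, ld_scores, n_samples=1000, n_snps=100)
print(f"h2 = {h2:.2f}, intercept = {intercept:.2f}")
```

An intercept near 1 is what the model expects in the absence of confounding; restricting the LD scores to SNPs inside a given annotation is what lets related approaches ask how much h2 falls in regulatory elements.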
