Title: Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk.
Publication Type: Journal Article
Year of Publication: 2018
Authors: Zhou, J, Theesfeld, CL, Yao, K, Chen, KM, Wong, AK, Troyanskaya, OG
Journal: Nat Genet
Volume: 50
Issue: 8
Pagination: 1171-1179
Date Published: 2018 08
ISSN: 1546-1718
Keywords: Algorithms; Computer Simulation; Deep Learning; Gene Expression; Genetic Predisposition to Disease; Genome-Wide Association Study; Humans; Models, Genetic; Mutation; Polymorphism, Single Nucleotide; Promoter Regions, Genetic; Quantitative Trait Loci
Abstract

Key challenges for human genetics, precision medicine and evolutionary biology include deciphering the regulatory code of gene expression and understanding the transcriptional effects of genome variation. However, this is extremely difficult because of the enormous scale of the noncoding mutation space. We developed a deep learning-based framework, ExPecto, that can accurately predict, ab initio from a DNA sequence, the tissue-specific transcriptional effects of mutations, including those that are rare or that have not been observed. We prioritized causal variants within disease- or trait-associated loci from all publicly available genome-wide association studies and experimentally validated predictions for four immune-related diseases. By exploiting the scalability of ExPecto, we characterized the regulatory mutation space for human RNA polymerase II-transcribed genes by in silico saturation mutagenesis and profiled >140 million promoter-proximal mutations. This enables probing of evolutionary constraints on gene expression and ab initio prediction of mutation disease effects, making ExPecto an end-to-end computational framework for the in silico prediction of expression and disease risk.
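
The in silico saturation mutagenesis mentioned in the abstract can be illustrated with a minimal sketch: enumerate every possible single-nucleotide substitution in a sequence and compare a score for each mutant against the reference. This is not the ExPecto implementation. The names `saturation_mutants`, `gc_fraction` and `variant_effects` are hypothetical, and the scorer is a stand-in (GC fraction) used only so the example runs; a real analysis would presumably call a trained expression model at that point.

```python
from typing import Callable, Iterator, List, Tuple

ALPHABET = "ACGT"


def saturation_mutants(seq: str) -> Iterator[Tuple[int, str, str, str]]:
    """Yield (position, ref_base, alt_base, mutated_sequence) for every
    possible single-nucleotide substitution in seq (0-based positions)."""
    seq = seq.upper()
    for pos, ref in enumerate(seq):
        for alt in ALPHABET:
            if alt != ref:
                yield pos, ref, alt, seq[:pos] + alt + seq[pos + 1:]


def gc_fraction(seq: str) -> float:
    """Placeholder scorer (assumption, not a real model): fraction of G/C
    bases in a non-empty sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def variant_effects(
    seq: str, score: Callable[[str], float] = gc_fraction
) -> List[Tuple[int, str, str, float]]:
    """Return (position, ref, alt, score(mutant) - score(reference)) for
    every single-nucleotide substitution of a non-empty sequence."""
    baseline = score(seq.upper())
    return [
        (pos, ref, alt, score(mutant) - baseline)
        for pos, ref, alt, mutant in saturation_mutants(seq)
    ]


effects = variant_effects("ACGT")
print(len(effects))  # three substitutions at each of four positions
```

A sequence of length n over A, C, G and T has 3n single-nucleotide substitutions, so the enumeration grows linearly with the length of the region scanned; the cost of a full scan is dominated by how expensive each call to the scorer is.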

DOI: 10.1038/s41588-018-0160-6
Alternate Journal: Nat. Genet.
PubMed ID: 30013180
PubMed Central ID: PMC6094955
Grant List:
HHSN272201000054C / AI / NIAID NIH HHS / United States
U54 HL117798 / HL / NHLBI NIH HHS / United States
R01 GM071966 / GM / NIGMS NIH HHS / United States
U19 AI117873 / AI / NIAID NIH HHS / United States
R01 HG005998 / HG / NHGRI NIH HHS / United States