Title | Accurate genome-wide predictions of spatio-temporal gene expression during embryonic development. |
Publication Type | Journal Article |
Year of Publication | 2019 |
Authors | Zhou J; Schor IE; Yao V; Theesfeld CL; Marco-Ferreres R; Tadych A; Furlong EEM; Troyanskaya OG |
Journal | PLoS Genet |
Volume | 15 |
Issue | 9 |
Pagination | e1008382 |
Date Published | 2019-09 |
ISSN | 1553-7404 |
Keywords | Algorithms; Animals; Computational Biology; Computer Simulation; Drosophila; Embryonic Development; Forecasting; Gene Expression Profiling; Gene Expression Regulation, Developmental; Genes, Developmental; Genome-Wide Association Study; Machine Learning; Spatio-Temporal Analysis; Transcriptome |
Abstract | Comprehensive information on the timing and location of gene expression is fundamental to our understanding of embryonic development and tissue formation. While high-throughput in situ hybridization projects provide invaluable information about developmental gene expression patterns for model organisms like Drosophila, the output of these experiments is primarily qualitative, and a high proportion of protein coding genes and most non-coding genes lack any annotation. Accurate data-centric predictions of spatio-temporal gene expression will therefore complement current in situ hybridization efforts. Here, we applied a machine learning approach by training models on all public gene expression and chromatin data, even from whole-organism experiments, to provide genome-wide, quantitative spatio-temporal predictions for all genes. We developed structured in silico nano-dissection, a computational approach that predicts gene expression in >200 tissue-developmental stages. The algorithm integrates expression signals from a compendium of 6,378 genome-wide expression and chromatin profiling experiments in a cell lineage-aware fashion. We systematically evaluated our performance via cross-validation and experimentally confirmed 22 new predictions for four different embryonic tissues. The model also predicts complex, multi-tissue expression and developmental regulation with high accuracy. We further show the potential of applying these genome-wide predictions to extract tissue specificity signals from non-tissue-dissected experiments, and to prioritize tissues and stages for disease modeling. This resource, together with the exploratory tools, is freely available at our webserver http://find.princeton.edu, which provides a valuable tool for a range of applications, from predicting spatio-temporal expression patterns to recognizing tissue signatures from differential gene expression profiles. |
DOI | 10.1371/journal.pgen.1008382 |
Alternate Journal | PLoS Genet. |
PubMed ID | 31553718 |
PubMed Central ID | PMC6779412 |
Grant List | R01 GM071966 / GM / NIGMS NIH HHS / United States; HHSN272201000054C / AI / NIAID NIH HHS / United States; R01 HG005998 / HG / NHGRI NIH HHS / United States; U54 HL117798 / HL / NHLBI NIH HHS / United States |