
Areas of Research: Human genetic variation, genomic regulation, complex traits, human genetics
Department|Program:
- Computer Science
bee@princeton.edu
Research Lab
609-258-0933
322 Computer Science Building
Website
Research Focus
My long-term research goal is to change the way scientists analyze high-dimensional biomedical data in pursuit of scientific discovery. Technologies for collecting genomic observations from DNA, single cells, and tissue samples advance so quickly that the analytic methods built for those observations rapidly become obsolete. Furthermore, the complexity of the biological phenomena we attempt to quantify and understand overwhelms current methods, which oversimplify that complexity in order to scale to the size of the data. General approaches to data analysis, including principal component analysis and linear regression, are insufficient for the intricacy of modern biomedical data; new approaches using statistical models and machine learning methods that include analysis- and technology-specific structure must be developed for many types of genomic studies.
My group builds and applies structured hierarchical models and approximate methods for the analysis of high-dimensional genomic data. Our work in developing methods for modern genomic technologies and scientific questions requires three types of innovations. First, statistical models need to be adapted to capture the complexity of the data. Second, inference algorithms for these structured models need to scale to the size of the data. Third, software infrastructure must be usable by the biomedical community. Addressing these issues accelerates the pace of discovery and of actionable results from biomedical research, because the analytic solutions from advanced platforms become broadly available and immediately applicable. The development of these frameworks is specific to technology and analytic goals, and is not easily generalized.
To this end, my work has broadly focused on innovations in two types of statistical analyses: structured regression models for hypothesis testing, and hierarchical latent variable models for dimension reduction and exploratory data analysis. Along with the development of these frameworks come adaptations of inference methods, drawing on ideas from machine learning, for robust and tractable posterior inference in these models, and experimental validation of the inferred latent structure and of the hypothesis testing results.
Selected Publications
- GTEx Consortium, Battle A*, Brown CD*, Engelhardt BE*, Montgomery S* (2017). "Genetic effects on gene expression across human tissues" Nature 550:204-213.
http://www.nature.com/articles/nature24277
- Gao C, Zhao S, McDowell IC, Brown CD, Engelhardt BE (2016). "Context-specific differential gene co-expression networks via Bayesian biclustering models" PLOS Computational Biology 12(7):e1004791.
http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.10...
- Zhao S, Gao C, Mukherjee S, Engelhardt BE (2016). "Bayesian group latent factor analysis with structured sparsity" Journal of Machine Learning Research 17(196):1-47.
http://jmlr.org/papers/v17/14-472.html
- Tonner PD, Darnell CD, Engelhardt BE*, Schmid A* (2016). "Detecting differential growth of microbial populations with Gaussian process regression" Genome Research 27(2):320-333.
http://genome.cshlp.org/content/27/2/320.long