With the emergence of large-scale genomic datasets, there is a unique opportunity to leverage machine learning approaches as standard tools for genome-wide association (GWA) studies. Unfortunately, while machine learning methods have been shown to account for nonlinear data structures and exhibit greater predictive power than classic linear models, these same algorithms have also been criticized as "black box" techniques. Here, we present Biologically Annotated Neural Networks (BANNs), a novel probabilistic framework that makes machine learning fully amenable to GWA applications. BANNs are feedforward models with partially connected architectures that are based on biological annotations. This setup yields a fully interpretable neural network where the input layer encodes SNP-level effects and the hidden layer models the aggregated effects among SNP-sets. Part of our key innovation is to treat the weights and connections of the network as random variables with prior distributions that reflect how genetic effects manifest at different genomic scales. The BANNs software uses scalable variational inference to provide fully interpretable posterior summaries, which allow researchers to simultaneously perform (i) fine-mapping with SNPs and (ii) enrichment analyses with SNP-sets on complex traits. Through simulations, we show that our method improves upon state-of-the-art fine-mapping and enrichment approaches across a wide range of genetic architectures. We then further illustrate the benefits of BANNs by analyzing real GWA data assayed in approximately 2,000 heterogeneous stock mice from the Wellcome Trust Centre for Human Genetics and approximately 7,000 individuals from the Framingham Heart Study. Lastly, using a subset of individuals of European ancestry from the UK Biobank, we show that BANNs can replicate, using statistics alone, known associations that required functional validation.
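To make the partially connected architecture concrete, the following is a minimal sketch of an annotation-masked forward pass, not the BANNs software itself. It assumes each SNP is annotated to exactly one SNP-set, and all names (`build_annotation_mask`, `forward`, `theta`, `w`, `bias`) and the default activation are illustrative assumptions. It omits the prior distributions and variational inference described above and shows only how annotations restrict which input-to-hidden connections exist.

```python
import numpy as np


def build_annotation_mask(snp_to_set, n_sets):
    """Return an (n_snps, n_sets) 0/1 matrix.

    Entry (j, g) is 1 when SNP j is annotated to SNP-set g, else 0.
    `snp_to_set` is a hypothetical list giving one set index per SNP.
    """
    mask = np.zeros((len(snp_to_set), n_sets))
    for j, g in enumerate(snp_to_set):
        mask[j, g] = 1.0
    return mask


def forward(X, theta, mask, w, bias, activation=np.tanh):
    """Forward pass of a partially connected, single-hidden-layer network.

    X     : (n_individuals, n_snps) genotype matrix
    theta : (n_snps,) SNP-level input weights
    mask  : (n_snps, n_sets) annotation mask from build_annotation_mask
    w     : (n_sets,) SNP-set-level output weights
    bias  : (n_sets,) hidden-layer biases
    The activation is an assumption made for this sketch.
    """
    # Keep only the connections allowed by the annotations.
    masked_weights = mask * theta[:, None]       # (n_snps, n_sets)
    hidden = activation(X @ masked_weights + bias)  # one hidden unit per SNP-set
    return hidden @ w                            # (n_individuals,)
```

In this sketch, each hidden unit sees only the SNPs annotated to its SNP-set, so `theta` can be read at the SNP level and `w` at the SNP-set level.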
About Dr. Crawford
Lorin Crawford is a Senior Researcher at Microsoft Research New England. He also holds a position as the RGSS Assistant Professor of Biostatistics at Brown University. His scientific research interests involve the development of novel and efficient computational methodologies to address complex problems in statistical genetics, cancer pharmacology, and radiomics (e.g., cancer imaging). Dr. Crawford has an extensive background in modeling massive data sets of high-throughput molecular information as it pertains to functional genomics and cellular-based biological processes. His most recent work has earned him a place on the Forbes 30 Under 30 list and The Root 100 Most Influential African Americans list, as well as recognition as an Alfred P. Sloan Research Fellow and a David & Lucile Packard Foundation Fellow for Science and Engineering.
Before joining Brown, Dr. Crawford received his PhD from the Department of Statistical Science at Duke University and his Bachelor of Science degree in Mathematics from Clark Atlanta University.