The Skinnider lab develops machine-learning approaches to identify known and unknown small molecules that are relevant to human health and disease, using mass spectrometry-based metabolomics as its primary analytical technique.
The human body contains thousands of small molecules and is exposed to thousands more during daily life. This complex chemical ecosystem reflects both the endogenous metabolism of human cells and xenobiotic exposures from our diets, our gut flora, and our natural and built environments. Collectively, these small molecules influence our risk of developing disease, determine how we respond to prescription drugs, and provide molecular biomarkers that are used in the clinic to make diagnoses and select treatments.
At present, however, the vast majority of these small molecules remain unknown. Whereas high-throughput techniques can now reliably measure the DNA, RNA, and protein content of any given biospecimen, enumerating the complete complement of small molecules (the metabolome) has proven much more challenging. Mass spectrometry (MS), the workhorse of metabolomics, is capable of detecting thousands of molecules in routine experiments, but most of these cannot be definitively identified. This profusion of unidentified chemical entities has been dubbed the “dark matter” of the metabolome.
We are interested in illuminating this metabolic dark matter by developing new computational approaches to identify both known and unknown small molecules using mass spectrometry. To achieve this aim, we design and apply cutting-edge AI technologies to translate mass spectrometric information into chemical structures. Although the core focus of the lab is on developing these metabolomic technologies themselves, an ancillary focus is on linking the identified molecules to human disease. The lab is particularly interested in the role of unknown metabolites in cancer, via connections with germline risk factors and the human microbiome. A second application entails working with forensic laboratories to identify new synthetic drugs of abuse with mass spectrometry. Because many of these objectives share the common technical challenge of learning complex models from small datasets, the lab is also interested in techniques for low-data learning in chemistry and biology more generally.