July 26, 2018

In a development with implications for better understanding disease, researchers have created a computational system that predicts the effects that mutations in noncoding DNA (sections that don't encode proteins) have on tissues and cells throughout the human body.

Genes encode the proteins that keep your body functioning and healthy. But the protein-coding portions of genes make up less than 2 percent of your DNA. The rest might appear dormant at first glance, but scientists now appreciate that this noncoding DNA plays a key role in turning genes on and off. Exactly how it does so has been an open question.

Now, researchers from Princeton University and the Flatiron Institute's Center for Computational Biology in New York City have introduced a method that links variations in noncoding DNA to changes in gene activity. Using machine learning, the researchers built a computational method, called ExPecto, that reads DNA sequences and predicts how variations in a given segment will alter the activation and deactivation of genes in tissues throughout the body.

Olga Troyanskaya, the principal investigator, said the system “can examine any genetic variant and predict its effect on gene expression.”

This story was adapted with permission from an article published by the Simons Foundation and posted to the Princeton University homepage.