AI in Action: Combing the genome for the roots of autism


Science  07 Jul 2017:
Vol. 357, Issue 6346, p. 25
DOI: 10.1126/science.357.6346.25

Artificial intelligence tools are helping to reveal the genetic components of autism.

For geneticists, autism is a vexing challenge. Inheritance patterns suggest it has a strong genetic component. But variants in scores of genes known to play some role in autism can explain only about 20% of all cases. Finding other variants that might contribute requires looking for clues in data on the 25,000 other human genes and their surrounding DNA—an overwhelming task for human investigators. So computational biologist Olga Troyanskaya of Princeton University and the Simons Foundation in New York City enlisted the tools of artificial intelligence (AI).

“We can only do so much as biologists to show what underlies diseases like autism,” explains collaborator Robert Darnell, founding director of the New York Genome Center and a physician scientist at The Rockefeller University in New York City. “The power of machines to ask a trillion questions where a scientist can ask just 10 is a game-changer.”

Troyanskaya combined hundreds of data sets on which genes are active in specific human cells, how proteins interact, and where transcription factor binding sites and other key genome features are located. Then her team used machine learning to build a map of gene interactions and compared the interaction patterns of the few well-established autism risk genes with those of thousands of other genes whose role was unknown, looking for similarities. That flagged another 2500 genes likely to be involved in autism, they reported in 2016 in Nature Neuroscience.
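
The idea of ranking unstudied genes by how closely they resemble known risk genes can be illustrated with a toy sketch. This is not the team's actual method, which the article does not detail; it assumes each gene is reduced to a short numeric profile and uses plain cosine similarity as a stand-in for a learned model. The gene names, profile values, and the helper names `cosine` and `rank_candidates` are all hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length numeric profiles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero profile carries no signal
    return dot / (norm_u * norm_v)


def rank_candidates(profiles, known_risk_genes):
    """Rank genes outside the known set by mean similarity to the known set.

    profiles: dict mapping gene name -> list of floats (hypothetical features)
    known_risk_genes: set of gene names treated as established risk genes
    """
    known = [profiles[g] for g in known_risk_genes]
    scores = {}
    for gene, profile in profiles.items():
        if gene in known_risk_genes:
            continue  # only score genes whose role is unknown
        scores[gene] = sum(cosine(profile, k) for k in known) / len(known)
    # Highest-scoring (most similar) candidates first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up profiles: two "known" genes and two candidates
profiles = {
    "RISK_A": [0.9, 0.8, 0.1],
    "RISK_B": [0.8, 0.9, 0.0],
    "GENE_X": [0.85, 0.75, 0.05],  # resembles the known genes
    "GENE_Y": [0.0, 0.1, 0.9],     # does not
}
ranking = rank_candidates(profiles, {"RISK_A", "RISK_B"})
for gene, score in ranking:
    print(f"{gene}\t{score:.3f}")
```

In this toy setup the candidate whose profile resembles the known genes lands at the top of the list. A real analysis would draw on far richer data and a trained classifier rather than a single similarity measure.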

But genes don't act in isolation, as geneticists have recently realized. Their behavior is shaped by the millions of nearby noncoding bases, which interact with DNA-binding proteins and other factors. Identifying which noncoding variants might affect nearby autism genes is an even tougher problem than finding the genes in the first place, and graduate student Jian Zhou in Troyanskaya's Princeton lab is deploying AI to solve it.

Artificial intelligence tools are helping reveal thousands of genes that may contribute to autism.


To train the program—a deep-learning system—Zhou exposed it to data collected by the Encyclopedia of DNA Elements and Roadmap Epigenomics, two projects that cataloged how tens of thousands of noncoding DNA sites affect neighboring genes. The system in effect learned which features to look for as it evaluates unknown stretches of noncoding DNA for potential activity.

When Zhou and Troyanskaya described their program, called DeepSEA, in Nature Methods in October 2015, Xiaohui Xie, a computer scientist at the University of California, Irvine, called it “a milestone in applying deep learning to genomics.” Now, the Princeton team is running the genomes of autism patients through DeepSEA, hoping to rank the impacts of noncoding bases.

Xie is also applying AI to the genome, though with a broader focus than autism. He, too, hopes to classify any mutation by the odds that it is harmful. But he cautions that in genomics, deep learning systems are only as good as the data sets on which they are trained. “Right now I think people are skeptical” that such systems can reliably parse the genome, he says. “But I think down the road more and more people will embrace deep learning.”

