Research Article

Genome-wide de novo risk score implicates promoter variation in autism spectrum disorder


Science  14 Dec 2018:
Vol. 362, Issue 6420, eaat6576
DOI: 10.1126/science.aat6576

Structured Abstract


The DNA of protein-coding genes is transcribed into mRNA, which is translated into proteins. The “coding genome” describes the DNA that contains the information to make these proteins and represents ~1.5% of the human genome. Newly arising de novo mutations (variants observed in a child but not in either parent) in the coding genome contribute to numerous childhood developmental disorders, including autism spectrum disorder (ASD). Discovery of these effects is aided by the triplet code that enables the functional impact of many mutations to be readily deciphered. In contrast, the “noncoding genome” covers the remaining ~98.5% and includes elements that regulate when, where, and to what degree protein-coding genes are transcribed. Understanding this noncoding sequence could provide insights into human disorders and refined control of emerging genetic therapies. Yet little is known about the role of mutations in noncoding regions, including whether they contribute to childhood developmental disorders, which noncoding elements are most vulnerable to disruption, and the manner in which information is encoded in the noncoding genome.


Whole-genome sequencing (WGS) provides the opportunity to identify the majority of genetic variation in each individual. By performing WGS on 1902 quartet families, each comprising a child affected with ASD, one unaffected sibling control, and their parents, we identified ~67 de novo mutations per child genome. To characterize the functional role of these mutations, we integrated multiple datasets relating to gene function, genes implicated in neurodevelopmental disorders, conservation across species, and epigenetic markers, thereby combinatorially defining 55,143 categories. Testing each category for an excess of de novo mutations in cases relative to controls is challenging because there are more categories than families.
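
To make the scale of this per-category testing problem concrete, the following is a minimal sketch and not the authors' pipeline. It assumes a two-sided exact binomial test of case versus control mutation counts (null probability 0.5, because each family contributes one case and one sibling control) and a Bonferroni correction across the 55,143 categories. The abstract does not state which test or correction was used; the function names are hypothetical.

```python
from math import comb

N_CATEGORIES = 55_143  # number of annotation categories stated in the abstract


def binomial_two_sided_p(case_count: int, control_count: int) -> float:
    """Exact two-sided binomial p-value under the null p = 0.5 (an assumed test)."""
    n = case_count + control_count
    if n == 0:
        return 1.0
    # With p = 0.5 the distribution is symmetric, so the two-sided p-value is
    # the total probability of outcomes at least as far from n/2 as observed.
    deviation = abs(case_count - n / 2)
    extreme = sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= deviation)
    return extreme / 2 ** n


def bonferroni_threshold(alpha: float = 0.05, n_tests: int = N_CATEGORIES) -> float:
    """Per-category significance threshold after Bonferroni correction (assumed)."""
    return alpha / n_tests


def is_significant(case_count: int, control_count: int) -> bool:
    """True if a category's case/control imbalance survives the correction."""
    return binomial_two_sided_p(case_count, control_count) < bonferroni_threshold()
```

Under these assumptions the per-category threshold is 0.05 / 55,143, a little below 1e-6, so a category with only a handful of mutations cannot reach significance no matter how skewed its counts are.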


Comparing cases to controls, we observed an excess of de novo mutations in cases in individual categories in the coding genome but not in the noncoding genome. To overcome the challenge of detecting noncoding association, we used machine learning tools to develop a de novo risk score to look for an excess of de novo mutations across multiple categories. This score demonstrated a contribution to ASD risk from coding mutations and a weaker, but significant, contribution from noncoding mutations. This noncoding signal was driven by mutations in the promoter region, defined as the 2000 nucleotides upstream of the transcription start site (TSS) where mRNA synthesis starts. The strongest promoter signals were defined by conservation across species and transcription factor binding sites. Well-defined promoter elements (e.g., TATA-box) are usually observed within 80 nucleotides of the TSS; however, the strongest ASD association was observed distally, 750 to 2000 nucleotides upstream of the TSS.
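
The promoter distances quoted above can be expressed as a small strand-aware classifier. This is an illustrative sketch only: the 2000-nucleotide promoter, the 80-nucleotide window, and the 750 to 2000 nucleotide range come from the text, whereas the zone labels ("core", "intermediate", "distal"), the inclusive boundaries, and the function names are assumptions made for the example.

```python
PROMOTER_LENGTH = 2000  # promoter: 2000 nt upstream of the TSS (from the text)
CORE_LIMIT = 80         # well-defined elements are usually within 80 nt of the TSS
DISTAL_START = 750      # strongest ASD association: 750 to 2000 nt upstream


def upstream_distance(position: int, tss: int, strand: str) -> int:
    """Distance upstream of the TSS in nucleotides; values below 1 are not upstream."""
    if strand == "+":
        return tss - position  # upstream means a lower coordinate on the + strand
    if strand == "-":
        return position - tss  # upstream means a higher coordinate on the - strand
    raise ValueError("strand must be '+' or '-'")


def promoter_zone(position: int, tss: int, strand: str) -> str:
    """Assign a mutation to a hypothetical promoter zone by upstream distance."""
    distance = upstream_distance(position, tss, strand)
    if distance < 1 or distance > PROMOTER_LENGTH:
        return "outside"
    if distance <= CORE_LIMIT:
        return "core"
    if distance >= DISTAL_START:
        return "distal"
    return "intermediate"
```

For example, a mutation 1000 nucleotides upstream of a plus-strand TSS falls in the "distal" zone, the range in which the strongest association was reported.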


We conclude that de novo mutations in the noncoding genome contribute to ASD. The clearest evidence of noncoding ASD association came from mutations at evolutionarily conserved nucleotides in the promoter region. The enrichment for transcription factor binding sites, primarily in the distal promoter, suggests that these mutations may disrupt gene transcription via their interaction with enhancer elements in the promoter region, rather than interfering with transcriptional initiation directly.

Promoter regions in autism.

De novo mutations from 1902 quartet families were assigned to 55,143 annotation categories, each of which was assessed for autism spectrum disorder (ASD) association by comparing mutation counts in cases and sibling controls. A de novo risk score demonstrated a noncoding contribution to ASD driven by promoter mutations, especially at sites conserved across species, located in the distal promoter, or targeted by transcription factors.


Whole-genome sequencing (WGS) has facilitated the first genome-wide evaluations of the contribution of de novo noncoding mutations to complex disorders. Using WGS, we identified 255,106 de novo mutations in 1902 quartet families in which one child, but neither the sibling nor the parents, was affected by autism spectrum disorder (ASD). In contrast to coding mutations, no noncoding functional annotation category, analyzed in isolation, was significantly associated with ASD. Casting noncoding variation in the context of a de novo risk score across multiple annotation categories, however, did demonstrate association with mutations localized to promoter regions. The strongest driver of this promoter signal was evolutionarily conserved transcription factor binding sites distal to the transcription start site. These data suggest that de novo mutations in promoter regions, characterized by evolutionary and functional signatures, contribute to ASD.
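
As a quick consistency check on these figures, the total can be converted to a per-child rate. The sketch assumes that the 255,106 mutations are counted across both children (the affected child and the sibling) in every one of the 1902 families; the function name is hypothetical.

```python
TOTAL_DE_NOVO = 255_106   # de novo mutations reported in the abstract
FAMILIES = 1902           # quartet families reported in the abstract
CHILDREN_PER_FAMILY = 2   # assumption: affected child plus one sibling control


def mean_mutations_per_child(total: int, families: int, children_per_family: int) -> float:
    """Average number of de novo mutations per sequenced child."""
    return total / (families * children_per_family)


rate = mean_mutations_per_child(TOTAL_DE_NOVO, FAMILIES, CHILDREN_PER_FAMILY)
print(f"{rate:.1f} de novo mutations per child")
```

Under that assumption the rate rounds to 67 mutations per child, in line with the ~67 quoted in the structured abstract.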
