Multiplex Targeted Sequencing Identifies Recurrently Mutated Genes in Autism Spectrum Disorders

Science  21 Dec 2012:
Vol. 338, Issue 6114, pp. 1619-1622
DOI: 10.1126/science.1227764


Exome sequencing studies of autism spectrum disorders (ASDs) have identified many de novo mutations but few recurrently disrupted genes. We therefore developed a modified molecular inversion probe method enabling ultra-low-cost candidate gene resequencing in very large cohorts. To demonstrate the power of this approach, we captured and sequenced 44 candidate genes in 2446 ASD probands. We discovered 27 de novo events in 16 genes, 59% of which are predicted to truncate proteins or disrupt splicing. We estimate that recurrent disruptive mutations in six genes—CHD8, DYRK1A, GRIN2B, TBR1, PTEN, and TBL1XR1—may contribute to 1% of sporadic ASDs. Our data support associations between specific genes and reciprocal subphenotypes (CHD8-macrocephaly and DYRK1A-microcephaly) and replicate the importance of a β-catenin–chromatin-remodeling network to ASD etiology.

There is considerable interest in the contribution of rare variants and de novo mutations to the genetic basis of complex phenotypes such as autism spectrum disorders (ASDs). However, because of extreme genetic heterogeneity, the sample sizes required to implicate any single gene in a complex phenotype are extremely large (1). Exome sequencing has identified hundreds of ASD candidate genes on the basis of de novo mutations observed in the affected offspring of unaffected parents (2–6). Yet, only a single mutation was observed in nearly all such genes, and sequencing of over 900 trios was insufficient to establish mutations at any single gene as definitive genetic risk factors (2–6).

To address this, we sought to evaluate candidate genes identified by exome sequencing (2, 3) for de novo mutations in a much larger ASD cohort. We developed a modified molecular inversion probe (MIP) strategy (Fig. 1A) (7–9) with novel algorithms for MIP design; an optimized, automatable work flow with robust performance and minimal DNA input; extensive multiplexing of samples while sequencing; and reagent costs of less than $1 per gene per sample. Extensive validation using several probe sets and sample collections demonstrated 99% sensitivity and 98% positive predictive value for single-nucleotide variants at well-covered positions, i.e., 92 to 98% of targeted bases (figs. S1 to S7 and tables S1 to S9) (10).

Fig. 1

Massively multiplex targeted sequencing identifies recurrently mutated genes in ASD probands. (A) Schematic showing design and general work flow of a modified MIP method enabling ultra-low-cost candidate gene resequencing in very large cohorts (figs. S1 to S7 and tables S1 to S9) (10). (B to E) Protein diagrams of four genes with multiple de novo mutation events. Significant protein domains for the largest protein isoform are shown (colored regions) as defined by SMART (23) with mutation locations indicated. (B) CHD8. (C) GRIN2B. (D) TBR1. (E) DYRK1A. Bold variants are nonsense, frameshifting indels or at splice sites (intron-exon junction is indicated). Domain abbreviations: CHR, chromatin organization modifier; DEX, DEAD-like helicases superfamily; HELC, helicase superfamily C-terminal; BRK, domain in transcription and CHROMO domain helicases; GLU, ligated ion channel l-glutamate– and glycine-binding site; PBP, eukaryotic homologs of bacterial periplasmic substrate binding proteins; TM, transmembrane; STK, serine-threonine kinase catalytic; TBOX, T-box DNA binding.

We applied this method to 2494 ASD probands from the Simons Simplex Collection (SSC) (11) using two probe sets [ASD1 (6 genes) and ASD2 (38 genes)] to target 44 ASD candidate genes (12). Preliminary results using ASD1 on a subset of the SSC implicated GRIN2B as a risk locus (3). The 44 genes were selected from 192 candidates (2, 3) by focusing on genes with disruptive mutations, associations with syndromic autism (13), overlap with known or suspected neurodevelopmental copy number variation (CNV) risk loci (13, 14), structural similarities, and/or neuronal expression (table S3). Although a few of the 44 genes have been reported to be disrupted in individuals with neurodevelopmental or neuropsychiatric disorders (often including concurrent dysmorphologies), their role in so-called idiopathic ASDs has not been rigorously established. Twenty-three of the 44 genes intersect a 49-member β-catenin–chromatin-remodeling protein-protein interaction (PPI) network (2) or an expanded 74-member network (figs. S8 and S9) (3, 4).

We required samples to successfully capture with both probe sets, yielding 2446 ASD probands with MIP data, 2364 of which had only MIP data and for 82 of which we had also sequenced their exomes (2, 3). The high GC content of several candidates required considerable rebalancing to improve capture uniformity (12) (figs. S3A and S10). Nevertheless, the reproducible behavior of most MIPs allowed us to identify copy number variation at targeted genes, including several inherited duplications (figs. S11 and S12 and table S10).

To discover de novo mutations, we first identified candidate sites by filtering against variants observed in other cohorts, including non-ASD exomes and MIP-based resequencing of 762 healthy, non-ASD individuals (12). The remaining candidates were further tested by MIP-based resequencing of the proband’s parents and, if potentially de novo, confirmed by Sanger sequencing of the parent-child trio (10, 12). We discovered 27 de novo mutations that occurred in 16 of the 44 genes (Fig. 1, B to E; Table 1; and table S11). Consistent with an increased sensitivity for MIP-based resequencing, six of these were not reported in exome-sequenced individuals (Table 1, tables S5 and S11, and fig. S13) (3, 4, 6). Notably, the proportion of de novo events that are severely disruptive, i.e., coding indels, nonsense mutations, and splice-site disruptions (17/27 or 0.63), is four times the expected proportion for random de novo mutations (0.16, binomial P = 4.9 × 10^−8) (table S12) (15).
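The enrichment of severe events can be checked with a short calculation. The sketch below assumes a one-sided exact binomial test against the rounded null proportion of 0.16 quoted above; the authors' exact inputs (table S12) are not reproduced here, and the function name is illustrative only.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Counts and null proportion as quoted in the text.
observed_severe = 17      # severe-class de novo events
total_events = 27         # all de novo events
expected_fraction = 0.16  # expected severe proportion (rounded value; an assumption)

p_value = binomial_upper_tail(observed_severe, total_events, expected_fraction)
print(f"observed proportion  = {observed_severe / total_events:.2f}")
print(f"one-sided binomial P = {p_value:.1e}")
```

Under these assumptions the tail probability is on the order of 10^−8, in line with the reported value. Small differences would be expected if the unrounded null proportion were used.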

Table 1

Six genes with recurrent de novo mutations. Assay is the primary assay that identified the variant. Abbreviations: M, male; F, female; Mut, mutation type; Fs, frameshifting indel; Ns, nonsense; Sp, splice site; Aa, single–amino acid deletion; Ms, missense; EX, exome; HGVS, Human Genome Variation Society nomenclature; NVIQ, nonverbal intellectual quotient.

Given their extremely low frequency, accurately establishing expectation for de novo mutations in a locus-specific manner through the sequencing of control trios is impractical. We therefore developed a probabilistic model that incorporates several factors: the overall rate of mutation in coding sequences, estimates of relative locus-specific rates based on human-chimpanzee fixed differences (fig. S14 and table S13), and other factors that may influence the distribution of mutation classes, e.g., codon structure (12). We applied this model to estimate (by simulation) the probability of observing additional de novo mutations during MIP-based resequencing of the SSC cohort. To compare expectation and observation, we treated missense mutations as one class and severe disruptions as a second class. Thus, we could evaluate the probability at a given locus of observing at least x de novo mutations, of which at least y belong to the severe class.
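The logic of the simulation can be illustrated with a deliberately simplified sketch. It is not the authors' model: it assumes that the number of de novo mutations at a locus is Poisson distributed with a single expected count, and that each mutation falls in the severe class independently with a fixed probability. Locus-specific rates from human-chimpanzee divergence and codon structure are not modeled, and all numeric inputs below are hypothetical.

```python
import math
import random


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw from Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_tail_probability(lam, severe_fraction, x, y, n_sim, seed=0):
    """Estimate P(at least x mutations, of which at least y are severe)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        total = sample_poisson(lam, rng)
        if total < x:
            continue
        # Assign each simulated mutation to the severe class independently.
        severe = sum(rng.random() < severe_fraction for _ in range(total))
        if severe >= y:
            hits += 1
    return hits / n_sim


# Hypothetical inputs: expected count at one locus and severe-class fraction.
estimate = simulate_tail_probability(
    lam=0.5, severe_fraction=0.1, x=2, y=1, n_sim=100_000
)
print(f"simulated P(>=2 events, >=1 severe) = {estimate:.4f}")
```

When no simulation reaches the observed counts, the estimate is bounded only by the number of simulations run, which is why very small probabilities are reported as upper bounds.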

We found evidence of mutation burden (a higher rate of de novo mutation than expected) in the overall set of 44 genes (observed n = 27 versus mean expected n = 5.6, simulated P < 2 × 10^−9) (Fig. 2A). The burden was driven by the severe class (observed n = 17 versus mean expected n = 0.58, simulated P < 2 × 10^−9). Most severe class mutations intersected the 74-member PPI network (16 out of 17), although only 23 out of 44 genes are in this network (binomial P = 0.0002) (12). Furthermore, 21 out of 27 mutations occurred in network-associated genes (binomial P = 0.004). Of the six individual genes (CHD8, GRIN2B, DYRK1A, PTEN, TBR1, and TBL1XR1) with evidence of mutation burden [alpha of 0.05 after a Holm-Bonferroni correction for multiple testing (Fig. 2A); TBL1XR1 is borderline significant with a more conservative Bonferroni correction], five fall within the β-catenin–chromatin-remodeling network. In our combined MIP and exome data, ~1% (24 out of 2573) of ASD probands harbor a de novo mutation in one of these six genes, with CHD8 representing 0.35% (9 out of 2573) (Fig. 1B and Table 1).
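The difference between the Holm-Bonferroni and plain Bonferroni corrections mentioned above can be made concrete with a small sketch. The gene labels and P values below are hypothetical placeholders, not values from this study; they are chosen only to show how a result can pass the step-down procedure while missing the stricter single threshold.

```python
def holm_bonferroni(p_values: dict, alpha: float = 0.05) -> dict:
    """Step-down Holm procedure: map each label to True if its null is rejected."""
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ordered)
    rejected = {name: False for name in p_values}
    for rank, (name, p) in enumerate(ordered):
        # The k-th smallest P value (rank = k - 1) is compared with alpha / (m - k + 1).
        if p <= alpha / (m - rank):
            rejected[name] = True
        else:
            break  # once one test fails, all larger P values fail too
    return rejected


def bonferroni(p_values: dict, alpha: float = 0.05) -> dict:
    """Single-step Bonferroni: every P value is compared with alpha / m."""
    m = len(p_values)
    return {name: p <= alpha / m for name, p in p_values.items()}


# Hypothetical P values for four placeholder loci.
example = {"gene_A": 0.001, "gene_B": 0.012, "gene_C": 0.02, "gene_D": 0.30}
print("Holm:      ", holm_bonferroni(example))
print("Bonferroni:", bonferroni(example))
```

In this toy example gene_C is rejected under Holm-Bonferroni (0.02 ≤ 0.05/2) but not under Bonferroni (0.02 > 0.05/4), which is the kind of borderline behavior described for TBL1XR1.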

Fig. 2

Locus-specific mutation probabilities and associated phenotypes. (A) Estimated P values for the observed number of additional de novo mutations identified in the MIP screen of 44 ASD candidate genes. Probabilities shown are for observing x or more events, of which at least y belong to the severe class. The observed numbers of mutations in all 44 genes (“Total”) and CHD8 were not seen in any of 5 × 10^8 simulations. Based on the simulation mean (0.0153), the Poisson probability for seven or more severe class CHD8 mutations is 3.8 × 10^−17. Dashed line: Bonferroni-corrected significance threshold for α = 0.05. *Gene product in the 74-member PPI connected component. (B to D) Standardized head circumference (HC) Z scores for SSC. (B) All probands screened with superimposed normal distribution (dashed). HC Z scores for individuals with de novo truncating and/or splice mutations highlighted for CHD8 (red arrows), DYRK1A (blue arrows), and PTEN (black arrows). (C and D) Box and whisker plots of the HC Z scores for the SSC. Mutation carriers are shown and linked to their respective family members. (C) All family members. (D) Only proband sex–matched family members.
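The Poisson probability quoted in the caption for CHD8 can be recomputed from the stated simulation mean. The sketch below sums the upper tail directly, because computing 1 minus the cumulative probability would lose all precision at this scale in double-precision arithmetic. The function name and the stopping tolerance are illustrative choices.

```python
import math


def poisson_upper_tail(k: int, lam: float, rel_tol: float = 1e-15) -> float:
    """Return P(X >= k) for X ~ Poisson(lam) by summing tail terms directly."""
    term = math.exp(-lam) * lam**k / math.factorial(k)  # P(X = k)
    total = 0.0
    i = k
    while term > total * rel_tol:
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
    return total


simulation_mean = 0.0153  # mean severe-class CHD8 count, as given in the caption
tail = poisson_upper_tail(7, simulation_mean)
print(f"P(X >= 7 | mean = {simulation_mean}) = {tail:.1e}")
```

The result is dominated by the first term, since each additional event is roughly 500-fold less likely at this mean.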

For these analyses, we conservatively used the highest available empirical estimate of the overall mutation rate in coding sequences (3). With the exception of TBL1XR1, these results were robust to doubling the overall mutation rate or to using the upper bound of the 95% confidence interval of the locus-specific rate estimate for each of these genes (10). Moreover, we obtained similar results regardless of whether parameters were estimated from rare, segregating variation or from de novo mutations in unaffected siblings (10), as well as with a sequence composition model based on genome-wide de novo mutation (16). Exome sequencing of non-ASD individuals (unaffected siblings or non-ASD cohorts) further supports these conclusions (table S14) (10).

We also validated 23 inherited, severely disruptive variants in the 44 genes (table S15). Two probands with such variants carry de novo 16p11.2 duplications (table S16). Combining de novo and inherited events, severe class variants were observed at twice the rate in MIP-sequenced probands as compared with MIP-sequenced healthy, non-ASD individuals (Fisher’s exact test, P = 0.083). Severe class variants were not transmitted to 14 out of 20 unaffected siblings (binomial P = 0.058) (table S15). However, larger cohorts than currently exist will be needed to fully evaluate these modest trends.

We analyzed phenotypic data on probands with mutations in the six implicated genes. Each was diagnosed with autism on the basis of current, strict, gold-standard criteria. No obvious dysmorphologies or recurrent comorbidities were present. Probands tended to fall into the intellectual disability range for nonverbal IQ (NVIQ) (mean 58.3) (Table 1). However, for CHD8, probands were found to have NVIQ scores ranging from profoundly impaired to average (mean 62.2, range 19 to 98).

Given the previously observed microcephaly in our index DYRK1A mutation case, macrocephaly in both probands with CHD8 mutations (3), and the association of these traits with other syndromic loci (13, 17), we reexamined head circumference (HC) in the larger set of probands with protein-truncation or splice-site de novo events using age- and sex-normalized HC Z scores (12) (Fig. 2B). For CHD8 (n = 8), we observed significantly larger head sizes relative to individuals screened without CHD8 mutations (two-sample permutation test, two-sided P = 0.0007). De novo CHD8 mutations are present in ~2% of macrocephalic (HC > 2.0) SSC probands (n = 366), which suggests a useful phenotype for patient subclassification. For DYRK1A (n = 3), we observed significantly smaller head sizes relative to individuals screened without DYRK1A mutations (two-sample permutation test, two-sided P = 0.0005). Comparison of head size in the context of the families (Fig. 2, C and D, and table S17) provides further support for this reciprocal trend (10). These findings are also consistent with case reports of patients with structural rearrangements and mouse transgenic models that implicate DYRK1A and CHD8 as regulators of brain growth (18–21). Macrocephaly was also observed in individuals with de novo and inherited PTEN mutations (22).
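A two-sample permutation test of the kind named above can be sketched as follows. The test statistic (difference in group means), the number of permutations, and all Z scores are assumptions made for illustration; the text does not specify the statistic used, and no SSC data are reproduced here.

```python
import random


def permutation_test(group_a, group_b, n_perm=5000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    n_a, n_b = len(group_a), len(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / n_b
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of carriers and non-carriers
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated P value is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical HC Z scores: a few mutation carriers versus other probands.
data_rng = random.Random(42)
carriers = [2.1, 2.8, 1.9, 3.0, 2.4]
others = [data_rng.gauss(0.0, 1.0) for _ in range(200)]

difference, p_two_sided = permutation_test(carriers, others)
print(f"mean difference = {difference:.2f}, two-sided P = {p_two_sided:.4f}")
```

With very small carrier groups, as for DYRK1A (n = 3), the smallest attainable P value is limited by the number of distinct relabelings and by the number of permutations drawn.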

Our data support an important role for de novo mutations in six genes in ~1% of sporadic ASDs. As the SSC was specifically established for simplex families and as its probands generally have higher cognitive functioning than has been reported in other ASD cohorts (11), it is unknown how our findings will translate into other cohorts. Furthermore, whereas our data implicate specific loci in ASDs, they are insufficient to evaluate whether the observed de novo mutations are sufficient to cause ASDs (tables S16 and S18).

Exome sequencing and CNV studies suggest that there are hundreds of relevant genetic loci for ASDs. Technologies and study designs directed at identifying de novo mutations, both for the discovery of ASD candidate genes and for their validation, provide sufficient power to implicate individual genes from a relatively small number of events. The analytical framework described here can be applied to any other disorder, simple or complex, for which de novo coding mutations are suspected to contribute to risk. In addition, the experimental methods presented here are broadly useful for the rapid and economical resequencing of candidate genes in extremely large cohorts, as may be required for the definitive implication of rare variants or de novo mutations in any genetically complex disorder.

Supplementary Materials

Materials and Methods

Supplementary Text

Figs. S1 to S14

Tables S1 to S18

References (24–100)

References and Notes

  1. See supplementary text on Science Online.
  2. Materials and methods are available as supplementary materials on Science Online.
  3. Acknowledgments: We thank the National Heart, Lung, and Blood Institute, NIH Grand Opportunity (GO) Exome Sequencing Project and its ongoing studies, which produced and provided exome variant calls for comparison: the Lung GO Sequencing Project (HL-102923), the Women’s Health Initiative Sequencing Project (HL-102924), the Broad GO Sequencing Project (HL-102925), the Seattle GO Sequencing Project (HL-102926), and the Heart GO Sequencing Project (HL-103010); we also thank B. Vernot, M. Dennis, T. Brown, and other members of the Eichler and Shendure labs for helpful discussions. We are grateful to all of the families at the participating Simons Simplex Collection (SSC) sites, as well as the principal investigators (A. Beaudet, R. Bernier, J. Constantino, E. Cook, E. Fombonne, D. Geschwind, R. Goin-Kochel, E. Hanson, D. Grice, A. Klin, D. Ledbetter, C. Lord, C. Martin, D. Martin, R. Maxim, J. Miles, O. Ousley, K. Pelphrey, B. Peterson, J. Piggot, C. Saulnier, M. State, W. Stone, J. Sutcliffe, C. Walsh, Z. Warren, E. Wijsman). We appreciate obtaining access to phenotypic data on the Simons Foundation Autism Research Initiative (SFARI) Base. Approved researchers can obtain the SSC population dataset described in this study ([ssc_v13]/ui:view) by applying at This work was supported by grants from the Simons Foundation (SFARI 137578, 191889 to E.E.E., J.S., and R.B.), NIH HD065285 (E.E.E. and J.S.), NIH NS069605 (H.C.M.), and R01 NS064077 (D.D.). E.B. is an Alfred P. Sloan Research Fellow. E.E.E. is an Investigator of the Howard Hughes Medical Institute. Scientific advisory boards or consulting affiliations: Ariosa Diagnostics (J.S.), Stratos Genomics (J.S.), Good Start Genetics (J.S.), Adaptive Biotechnologies (J.S.), Pacific Biosciences (E.E.E.), SynapDx (E.E.E.), DNAnexus (E.E.E.), and SFARI GENE (H.C.M.). B.J.O. is an inventor on patent PCT/US2009/30620: Mutations in contactin associated protein 2 are associated with increased risk for idiopathic autism. 
Raw sequencing data available at the National Database for Autism Research, NDARCOL1878.
