A Cas9–guide RNA complex preorganized for target DNA recognition

Science  26 Jun 2015:
Vol. 348, Issue 6242, pp. 1477-1481
DOI: 10.1126/science.aab1452

An RNA seed poised to meet its target

The CRISPR-Cas system in prokaryotes precisely identifies infecting parasitic DNAs and viruses and destroys them. The CRISPR-Cas system has been adapted for facile genome editing, heralding a new age in molecular biology. Jiang et al. show that the Cas9 nuclease adopts a distinct conformation when it binds to the targeting guide RNA. The guide RNA then assumes a preordered shape. This RNA “seed region” is thus poised to initiate recognition of the DNA target sequence.

Science, this issue p. 1477


Bacterial adaptive immunity uses CRISPR (clustered regularly interspaced short palindromic repeats)–associated (Cas) proteins together with CRISPR transcripts for foreign DNA degradation. In type II CRISPR-Cas systems, activation of Cas9 endonuclease for DNA recognition upon guide RNA binding occurs by an unknown mechanism. Crystal structures of Cas9 bound to single-guide RNA reveal a conformation distinct from both the apo and DNA-bound states, in which the 10-nucleotide RNA “seed” sequence required for initial DNA interrogation is preordered in an A-form conformation. This segment of the guide RNA is essential for Cas9 to form a DNA recognition–competent structure that is poised to engage double-stranded DNA target sequences. We construe this as convergent evolution of a “seed” mechanism reminiscent of that used by Argonaute proteins during RNA interference in eukaryotes.

CRISPR-Cas proteins function in complex with mature CRISPR RNAs (crRNAs) to identify and cleave complementary target sequences in foreign nucleic acids (1). In type II CRISPR systems, the Cas9 enzyme cleaves DNA at sites defined by the 20-nucleotide (nt) guide segment within crRNAs, together with a trans-activating crRNA (tracrRNA) (2) that forms a crRNA:tracrRNA hybrid structure capable of Cas9 association (3). Once assembled on target DNA, the Cas9 HNH and RuvC nuclease domains cleave the double-stranded DNA (dsDNA) sequence within the strands that are complementary and noncomplementary to the guide RNA segment, respectively (3, 4) (Fig. 1A). By engineering a synthetic single-guide RNA (sgRNA) that fuses the crRNA and tracrRNA into a single transcript of 80 to 100 nt (Fig. 1B), Cas9:sgRNA has been harnessed as a two-component programmable system for genome engineering in various organisms (5, 6).

Fig. 1 Overall structure of SpyCas9-sgRNA binary complex.

(A) Domain organization of the type II-A Cas9 protein from S. pyogenes (SpyCas9). (B) Secondary structure diagram of sgRNA bearing complementarity to a 20-bp region of λ1 DNA. The seed sequence is highlighted in beige. Bars between nucleotide pairs represent canonical Watson-Crick base pairs; dots indicate noncanonical base-pairing interactions. The base stacking interaction is indicated by a filled square. (C) Tertiary structure of sgRNA in ribbon representation, with a sigma-A weighted composite-annealed omit 2Fobs − Fcalc electron density map contoured at 1.5σ. (D) Ribbon diagram of SpyCas9-sgRNA complex, color-coded as defined in Fig. 1, A and B. (E) Surface representations of the crystal structure of SpyCas9 in complex with sgRNA (depicted in cartoon) showing the same view as in Fig. 1D and a 180°-rotated view.

The utility of Cas9 for both bacterial immunity and genome engineering applications relies on accurate DNA target selection. Target choice relies on base pairing between the DNA and the 20-nt guide RNA sequence, as well as the presence of a 2– to 4–base pair (bp) protospacer adjacent motif (PAM) proximal to the target site (3, 4). The target complementarity of a “seed” sequence within the guide segment of crRNAs is critical for DNA recognition and cleavage (7, 8). In type II CRISPR systems, Cas9 binds to targets by recognizing a PAM and searching the adjacent DNA for complementarity to the 10- to 12-nt “seed” sequence at the 3′ end of the guide RNA segment (Fig. 1B) (3, 9–11). Crystal structures of Cas9 bound to sgRNA and a target DNA strand, with or without a partial PAM-containing nontarget strand, show the entire 20-nt guide RNA segment engaged in an A-form helical interaction with the target DNA strand (12, 13). How the “seed” region within the guide RNA specifies DNA binding has remained unknown.
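The two requirements described above, a PAM next to the target site and complementarity that matters most in the PAM-proximal seed, can be illustrated with a minimal Python sketch. It is not part of the original study: the function name `find_candidate_sites`, the returned field names, and the default 10-nt seed length (the text gives 10 to 12 nt) are assumptions chosen for illustration. Only one DNA strand is scanned, and only the 5′-NGG-3′ PAM is considered.

```python
def find_candidate_sites(nontarget_dna, guide_rna, seed_len=10):
    """Scan one DNA strand (5'->3') for 5'-NGG-3' PAMs next to a protospacer.

    Hypothetical helper for illustration. Returns one dict per PAM-adjacent
    site with the protospacer start index, the PAM, whether the PAM-proximal
    seed matches the guide exactly, and the total number of mismatches.
    """
    dna = nontarget_dna.upper()
    # The guide has the same sequence as the nontarget-strand protospacer,
    # with U in place of T, so compare it in DNA letters.
    guide = guide_rna.upper().replace("U", "T")
    n = len(guide)
    sites = []
    # A site needs n protospacer bases followed by a 3-base PAM.
    for start in range(len(dna) - n - 2):
        protospacer = dna[start:start + n]
        pam = dna[start + n:start + n + 3]
        if pam[1:] != "GG":
            continue  # no NGG PAM immediately 3' of this window
        mismatches = sum(a != b for a, b in zip(protospacer, guide))
        # The seed is the 3' (PAM-proximal) end of the guide segment.
        seed_match = protospacer[-seed_len:] == guide[-seed_len:]
        sites.append({
            "start": start,
            "pam": pam,
            "seed_match": seed_match,
            "mismatches": mismatches,
        })
    return sites
```

Calling `find_candidate_sites(dna, guide)` on a sequence that contains the protospacer followed by `TGG` returns a single site with `seed_match` set to `True`; changing the base next to the PAM leaves the PAM intact but sets `seed_match` to `False`.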

To determine how Cas9 assembles with and positions the guide RNA prior to substrate recognition, we solved the crystal structure of catalytically active Streptococcus pyogenes Cas9 (SpyCas9) in complex with an 85-nt sgRNA at 2.9 Å resolution (Fig. 1 and table S1). The overall structure of the Cas9-sgRNA binary complex, representing the pre–target-bound state of the enzyme, resembles the bilobed architecture of the target DNA–bound state, as observed in electron microscopic studies (14), with the guide segment of the sgRNA positioned in the central channel between the nuclease and helical recognition lobes (Fig. 1, C to E). This structural architecture and guide RNA organization is maintained in the crystal structure of a widely used nuclease-inactive version of Cas9 (D10A/H840A, referred to as dCas9) in complex with sgRNA (fig. S1).

Comparison of SpyCas9 crystal structures representing the protein alone and the RNA-bound and RNA-DNA–bound states of the enzyme reveals the nature of Cas9’s conformational flexibility during sgRNA binding and target DNA recognition (Fig. 2A and figs. S2 and S3). The helical recognition lobe undergoes substantial rearrangements upon sgRNA binding but before DNA association, especially in helical domain 3, which moves as a rigid body by ~65 Å into close proximity with the HNH domain (fig. S2D). Superposition of the Cas9-sgRNA pre–target-bound complex onto the target DNA–bound structures reveals further conformational changes, including a modest shift in helical domains 2 and 3, as well as a concomitant displacement of the HNH domain toward the target strand (Fig. 2A and fig. S2, E and F). Together with limited proteolysis data (Fig. 2B and fig. S4), these results show that sgRNA binding drives the major conformational changes within Cas9 (14), although additional structural rearrangements occur upon substrate DNA binding. Interestingly, a guide-target mismatched DNA duplex yields a proteolytic pattern similar to that observed for sgRNA-bound Cas9 (fig. S4B), indicating that Cas9-sgRNA pretarget conformation is competent for PAM recognition because no further conformational change is required prior to target DNA binding.

Fig. 2 Preordering of seed RNA sequence and PAM-recognition cleft for target DNA recognition.

(A) Structural comparison between Cas9-sgRNA complex (pretarget) and target DNA–bound structure (PDB ID 4UN3) (see also movies S1 and S2). Vector length correlates with the domain motion scale. Black arrows indicate domain movements within Cas9-sgRNA upon target DNA binding. (B) Limited proteolysis to test for large-scale conformational changes of Cas9 upon sgRNA binding and target DNA recognition. (C) Overlay of the Cas9-sgRNA pretarget bound complex with the target DNA–bound structures. For clarity, only the PAM-containing CTD domain is shown. (D) Close-up view of the seed-binding channel in surface representation. (E) Superimposed sgRNAs in the pretarget (beige) and target DNA–bound states (black and orange) with only the guide segments shown for clarity. Helical axis is indicated by dotted line. Dihedral angles (θ) between guide segment nucleobases and those of the A-form RNA-DNA heteroduplex in target DNA–bound structures are shown in parentheses. (F) Schematic showing key interactions of SpyCas9 with the sgRNA seed sequence. The inset highlights the conformational change of Tyr450 upon target binding.

The single-stranded guide RNA binding triggers ordering of the PAM recognition region of Cas9. In the absence of sgRNA, Cas9’s PAM-interacting C-terminal domain (CTD) is largely disordered (fig. S2A) (14). However, in the Cas9-sgRNA pre–target-bound complex and target DNA–bound structures, the PAM-interacting CTD is structured to accommodate the PAM duplex (Fig. 2C). Two critical arginine residues (Arg1333 and Arg1335) involved in 5′-NGG-3′ PAM recognition (13) are pre-positioned in the Cas9-sgRNA structure to recognize the GG dinucleotide on the nontarget DNA strand. This explains biochemical data indicating that the Cas9-sgRNA complex uses PAM recognition as an obligate step to identify potential DNA target sites (9).

In the Cas9-sgRNA structure, the RNA adopts an L-shaped configuration in which the 5′ guide segment lies in close spatial proximity to stem loop 1 of the sgRNA (Fig. 1C and fig. S5). Similar to the DNA-bound Cas9 complexes, Cas9 in the pre–target-bound state makes extensive hydrogen-bonding contacts and aromatic stacking interactions with the crRNA repeat:tracrRNA anti-repeat duplex and stem loop 1 (fig. S6) (12, 15). In contrast to the sgRNA scaffold (nucleotides G21 to U82) for which clear electron density is observed, we observed unambiguous electron density for only 10 of the 20 nucleotides of the guide RNA segment (nucleotides 11 to 20; Fig. 1, B and C), all of which are located in the seed region. Nucleotides 1 to 10 of the guide RNA segment, although present in the crystals (fig. S1), are disordered. The ordered seed nucleotides (G11 to C20, counting from the 5′ end of the sgRNA) are threaded through the narrow nucleic acid–binding channel formed between the two Cas9 lobes, with their bases facing outward (Fig. 2D and fig. S7). Nucleotides G19, C20, and G11 to U13 are exposed to bulk solvent, whereas nucleotides G14 to C18 are shielded from solvent by helical domain 2. The solvent-exposed PAM-proximal seed nucleotides G19 and C20 are therefore positioned to serve as the nucleation site for initiating target binding. This explains how a 2-bp mismatch immediately adjacent to the PAM in the DNA abolishes Cas9 binding and cleavage activity (9).

The single-stranded guide RNA within the seed region maintains a nearly A-form conformation along the ribose-phosphate backbone (Fig. 2E). To maintain this helical configuration, Cas9 makes extensive hydrogen-bonding interactions with phosphates and 2′-hydroxyl groups of the seed nucleotides (Fig. 2F). Such presentation of the seed sequence in a conformation thermodynamically favorable for helical guide:target duplex formation (16) is reminiscent of the guide RNA positioning observed in eukaryotic Argonaute complexes that recognize transcripts by base pairing with a 6-nt RNA seed sequence (fig. S8, A and B) (17–19). This situation is distinct from that observed in the type I CRISPR-Cascade targeting complex, in which the entire crRNA guide region is preordered, rather than just the seed segment (fig. S8C) (20–22).

Another similarity between the Cas9-bound sgRNA guide segment and the Argonaute-bound microRNA guide segment is the synchronized tilting of bases at each half-helical turn of the RNA strand. In the Cas9-sgRNA complex, a kink introduced by insertion of Tyr450 between seed nucleobases A15 and G16 results in coordinated tilting of nucleobases G11 to A15 relative to the same region of the guide RNA in the target-bound state (Fig. 2, E and F, and fig. S8A). Notably, the orientation of Tyr450 shifts by ~120° upon target binding (Fig. 2F). The bases G16 to C20 remain in an untilted orientation that is immediately ready for target DNA base pairing. This nonuniformity in base orientation may account for previous observations showing that the 5-nt sequence of the guide RNA that binds to DNA immediately adjacent to the PAM is the most critical segment for Cas9 binding (23).

Structural and biochemical data suggest that guide RNA binding triggers a large structural rearrangement in Cas9. To test whether the seed segment of the RNA itself contributes to formation of an activated Cas9 conformation, we monitored Cas9-sgRNA assembly with the use of a set of progressively truncated guide RNAs containing 0 to 20 nt of the guide segment (N0 to N20; table S2). Limited proteolysis showed that guide RNA binding confers protection from trypsin digestion only when the guide segment has a length of at least 10 nt of the target recognition sequence (N10) (Fig. 3A and fig. S9). The absence of the guide segment results in moderately decreased Cas9 binding affinity for the RNA (fig. S10). Together, these results indicate that despite forming a stable complex with Cas9 (fig. S11), the crRNA:tracrRNA scaffold region of the sgRNA alone fails to induce the target recognition–competent conformation of Cas9.
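The truncation series can be pictured with a short Python sketch. It assumes that nucleotides are removed from the 5′ (PAM-distal) end of the guide segment so that the 3′ seed is retained; the actual constructs are listed in table S2, which is not reproduced here. The function names and the example guide are hypothetical, and the 10-nt threshold simply restates the proteolysis result reported above.

```python
MIN_GUIDE_FOR_PROTECTION = 10  # nt; threshold reported in the text (N10)


def truncated_guides(guide_rna, lengths=range(0, 21)):
    """Return {'N0': '', ..., 'N20': full guide}, keeping the 3' end.

    Assumption for illustration: truncation removes 5' (PAM-distal)
    nucleotides, so each Nn keeps the n PAM-proximal nucleotides.
    """
    guide = guide_rna.upper()
    series = {}
    for n in lengths:
        if n > len(guide):
            raise ValueError(f"cannot keep {n} nt of a {len(guide)}-nt guide")
        series[f"N{n}"] = guide[len(guide) - n:]
    return series


def protects_from_trypsin(guide_segment):
    """Encode the reported observation: protection only with >= 10 nt of guide."""
    return len(guide_segment) >= MIN_GUIDE_FOR_PROTECTION
```

With a 20-nt example guide, `truncated_guides(guide)["N10"]` is the last 10 nucleotides of the guide, and `protects_from_trypsin` switches from `False` to `True` between N9 and N10.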

Fig. 3 The seed sequence triggers Cas9 to reach a target recognition-competent conformation.

(A) SDS–polyacrylamide gel electrophoresis of limited trypsin digestion of SpyCas9 in the presence of truncated guide RNAs. (B) Analytical size-exclusion chromatograms of SpyCas9-sgRNA in the absence or presence of single-stranded target DNA with the indicated number of complementary nucleotides. The dashed line indicates the peak position of stably bound SpyCas9-sgRNA-ssDNA ternary complex eluting from the gel filtration column. (C) Cas9-mediated endonuclease activity time course assays using plasmid and oligonucleotide DNA (32P-labeled on both strands) containing a 20-bp λ1 DNA target sequence and a 5′-TGG-3′ PAM motif. Cn (n = 0, 10, 12, 14, 17, or 20) represents the number of potential guide-target base pairs counted from the PAM end.

To assess the molecular mechanism of Cas9-mediated RNA-DNA hybridization, we first used size exclusion chromatography to evaluate the effects of DNA length on the formation of Cas9-sgRNA-ssDNA (single-stranded DNA) ternary complexes. This analysis showed that target ssDNA length must be at least 10 nt to form a kinetically stable ternary complex with Cas9-sgRNA (Fig. 3B), in good agreement with the requirement for a 10- to 12-bp RNA-DNA heteroduplex to ensure strand propagation observed in Cas9 single-molecule experiments (9, 24). To further explore the importance of the seed region for Cas9-mediated DNA cleavage, we conducted endonuclease activity assays using both plasmid and oligonucleotide DNA substrates and our truncated guide RNAs. The plasmid cleavage assay revealed that the 12-bp seed:DNA heteroduplex is necessary for Cas9-mediated supercoiled plasmid cleavage, which proceeds by nicking first by the RuvC nuclease domain, then by the HNH nuclease domain (Fig. 3C and table S2). These data are consistent with structural observations indicating that the flexible HNH domain can adopt multiple non–catalytically productive states during sgRNA binding and target DNA recognition. In line with previous studies (25), the oligonucleotide cleavage assay showed that the N17 guide RNA displays an almost comparable cleavage rate but much reduced RuvC 3′-5′ exonuclease-trimming activity (3) relative to the N20 guide RNA (Fig. 3C). This trimming activity is more pronounced with the H840A nickase version of Cas9 relative to the D10A nickase version (fig. S12). This observation may explain why the D10A nickase is more efficient than the H840A nickase version of Cas9 when using a double-nicking strategy to enhance genome editing specificity (26).
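The length thresholds reported above can be restated as a small Python sketch that counts contiguous RNA-DNA base pairs starting from the PAM-proximal end, in the spirit of the Cn notation of Fig. 3. This is an illustrative simplification rather than the authors' analysis: the function names are hypothetical, only Watson-Crick pairs are counted, and the 10-bp and 12-bp cutoffs are taken directly from the text as necessary (not sufficient) conditions.

```python
# Watson-Crick partners for an RNA guide base pairing with a DNA target base.
RNA_DNA_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}


def complementary_target(guide_rna):
    """Target DNA strand (5'->3') fully complementary to the guide (5'->3')."""
    return "".join(RNA_DNA_PAIR[b] for b in reversed(guide_rna.upper()))


def pam_proximal_pairs(guide_rna, target_5to3):
    """Count contiguous base pairs starting at the guide's 3' (PAM-proximal) end.

    The strands are antiparallel, so the guide's 3' end pairs with the
    5' end of the complementary stretch of the target strand.
    """
    count = 0
    for rna_base, dna_base in zip(reversed(guide_rna.upper()), target_5to3.upper()):
        if RNA_DNA_PAIR.get(rna_base) != dna_base:
            break  # the first mismatch ends the contiguous heteroduplex
        count += 1
    return count


def interpret(n_pairs):
    """Map a base-pair count onto the thresholds reported in the text."""
    return {
        "stable_ternary_complex": n_pairs >= 10,          # size-exclusion result
        "meets_plasmid_cleavage_minimum": n_pairs >= 12,  # plasmid cleavage assay
    }
```

For a target that pairs with only the first 10 PAM-proximal guide nucleotides, `interpret` reports a stable ternary complex but not the minimum needed for plasmid cleavage.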

We propose that the preordered PAM recognition region of the Cas9-sgRNA complex initiates DNA interrogation, followed by base pairing between a short PAM-proximal segment of DNA (1 or 2 bp) and the 3′ end of the seed sequence in the sgRNA (Fig. 4). Conformational changes of Cas9 upon initial DNA binding then accommodate guide RNA strand invasion into and beyond the seed region, triggering additional structural changes necessary for Cas9 to reach a cleavage-competent state. Recent crystal structures of human Argonaute2 bound to a microRNA guide and short RNA target sequences underscore the importance of seed region base pairing for accuracy of target selection (27).

Fig. 4 Proposed mechanism for Cas9-mediated DNA targeting and cleavage.

When Cas9 is in the apo state, its PAM-interacting cleft (dotted circle) is largely disordered. In the pretarget state, the PAM-interacting domain and seed sequence from guide RNA are preorganized for PAM recognition, followed by dsDNA melting next to PAM. The nonseed region is disordered and indicated as a dotted line. Base pairing between the seed sequence and the target DNA drives Cas9 into a near-active conformation; complete base pairing between the full guide segment and the target DNA strand enables Cas9 to reach a fully active state.
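The stages of this proposed model can be summarized as a toy decision function in Python. It is only a schematic reading of Fig. 4 under stated assumptions: the function name, the argument names, and the use of a 10-bp seed and 20-bp full guide as hard cutoffs are illustrative choices, whereas the real transitions are conformational and presumably not this discrete.

```python
def cas9_state(sgrna_bound, pam_recognized, paired_from_pam,
               guide_len=20, seed_len=10):
    """Return a label for the stage of the proposed targeting model.

    paired_from_pam: contiguous guide:target base pairs counted from the
    PAM-proximal end. The numeric cutoffs are assumptions for illustration.
    """
    if not sgrna_bound:
        return "apo"            # PAM-interacting cleft largely disordered
    if not pam_recognized:
        return "pretarget"      # seed and PAM cleft preorganized
    if paired_from_pam >= guide_len:
        return "fully active"   # complete guide:target pairing
    if paired_from_pam >= seed_len:
        return "near-active"    # seed paired, non-seed region not yet
    return "PAM-bound"          # DNA melting and seed nucleation under way
```

For example, `cas9_state(True, True, 2)` returns `"PAM-bound"`, corresponding to the 1- or 2-bp nucleation step proposed in the text.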

Our results suggest the apparent convergent evolution of a similar seed-based targeting mechanism in CRISPR-Cas9. Collectively, our structural and biochemical data show that Cas9 is subject to multilayered regulation during its activation. The preordered RNA seed sequence and protein PAM-interacting cleft enable the Cas9-sgRNA complex to interact productively with potential DNA sequences for target sampling. The inactive conformation of apo Cas9, as well as the additional conformational changes required for the complex to reach its ultimate catalytically active state, could help to avoid spurious DNA cleavage within the host genome and hence minimize off-target effects in Cas9-based genome editing.

Supplementary Materials

Materials and Methods

Supplementary Text

Figs. S1 to S12

Tables S1 and S2

Movies S1 and S2

References (28–41)

References and Notes

  1. Acknowledgments: Atomic coordinates of Cas9-sgRNA and dCas9-sgRNA structures have been deposited in the Protein Data Bank with accession codes 4ZT0 and 4ZT9. We thank G. Meigs, J. Holton (beamline 8.3.1 of the Advanced Light Source, Lawrence Berkeley National Laboratory), and M. Miller for helpful discussion about data collection and processing; D. King and A. Iavarone for mass spectrometric data analysis; and S. H. Sternberg, M. L. Hochstrasser, M. Jinek, and C. Anders for critical reading of the manuscript. Supported by NSF grant 1244557 (J.A.D.). F.J. is a Merck Fellow of the Damon Runyon Cancer Research Foundation (DRG-2201-14); J.A.D. is a Howard Hughes Medical Institute Investigator.