Sequence- and Structure-Specific RNA Processing by a CRISPR Endonuclease


Science 10 Sep 2010:
Vol. 329, Issue 5997, pp. 1355-1358
DOI: 10.1126/science.1192272


Many bacteria and archaea contain clustered regularly interspaced short palindromic repeats (CRISPRs) that confer resistance to invasive genetic elements. Central to this immune system is the production of CRISPR-derived RNAs (crRNAs) after transcription of the CRISPR locus. Here, we identify the endoribonuclease (Csy4) responsible for CRISPR transcript (pre-crRNA) processing in Pseudomonas aeruginosa. A 1.8 angstrom crystal structure of Csy4 bound to its cognate RNA reveals that Csy4 makes sequence-specific interactions in the major groove of the crRNA repeat stem-loop. Together with electrostatic contacts to the phosphate backbone, these enable Csy4 to bind selectively and cleave pre-crRNAs using phylogenetically conserved serine and histidine residues in the active site. The RNA recognition mechanism identified here explains sequence- and structure-specific processing by a large family of CRISPR-specific endoribonucleases.

In prokaryotes, fragments of foreign DNA are integrated into clustered regularly interspaced short palindromic repeat (CRISPR) loci that are transcribed as long RNAs containing a repetitive sequence element derived from the host (1–6). These CRISPR transcripts (pre-crRNAs) are posttranscriptionally processed into short crRNAs that serve as homing oligonucleotides to prevent the propagation of invading viruses or plasmids harboring cognate sequences (1, 7, 8). CRISPR loci coexist with genes encoding CRISPR-associated (Cas) proteins (3, 5, 9, 10).

The opportunistic pathogen Pseudomonas aeruginosa UCBPP-PA14 (Pa14) harbors a CRISPR/Cas system that contains two CRISPR elements flanked by six Cas genes (Fig. 1A). Both CRISPRs comprise a characteristic arrangement of 28-nucleotide near-identical repeats interspersed with ~32-nucleotide spacers, some of which match sequences found in bacteriophages or plasmids (11). Processing of primary CRISPR transcripts yields crRNAs that contain one spacer sequence flanked by sequences derived from the repeat element (1, 2, 12–15).
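The repeat-spacer bookkeeping can be made concrete with a minimal Python sketch. It cuts a synthetic array once per repeat, between repeat nucleotides 20 and 21, which is the G20/C21 scissile phosphate identified in the crystal structure. Two inputs are assumptions and do not appear in this text. The 28-nucleotide `REPEAT` string is the Pa14 repeat as it is commonly cited; it does reproduce the C6-G20 stem and GUAUA pentaloop described here. The spacers are hypothetical placeholders, not real Pa14 spacers. With these inputs, each internal fragment comes out as 8 repeat-derived nucleotides, then the spacer, then 20 repeat-derived nucleotides.

```python
# Minimal sketch: cut a pre-crRNA (repeat-spacer array) into crRNAs by
# cleaving every repeat copy between its nucleotides 20 and 21 (G20/C21).
# ASSUMPTION: REPEAT is the commonly cited 28-nt Pa14 repeat (not given in
# the paper text); spacers passed to build_array are hypothetical.

REPEAT = "GUUCACUGCCGUAUAGGCAGCUAAGAAA"  # assumed Pa14 repeat, 28 nt
CUT_AFTER = 20  # scissile phosphate lies between repeat nt 20 and nt 21


def build_array(spacers: list[str]) -> str:
    """Return repeat-spacer-repeat-...-repeat, as transcribed from a CRISPR locus."""
    parts = [REPEAT]
    for spacer in spacers:
        parts += [spacer, REPEAT]
    return "".join(parts)


def cleavage_sites(pre_crrna: str) -> list[int]:
    """0-based offsets of the cut inside each repeat copy (just after its nt 20)."""
    sites = []
    start = pre_crrna.find(REPEAT)
    while start != -1:
        sites.append(start + CUT_AFTER)
        start = pre_crrna.find(REPEAT, start + len(REPEAT))
    return sites


def process(pre_crrna: str) -> list[str]:
    """Return the fragments between consecutive cuts (the mature crRNAs).

    The 5' leader before the first cut and the 3' tail after the last cut
    are discarded in this toy model.
    """
    sites = cleavage_sites(pre_crrna)
    return [pre_crrna[a:b] for a, b in zip(sites, sites[1:])]
```

With two 32-nucleotide spacers, `process(build_array(["A" * 32, "C" * 32]))` returns two 60-nucleotide crRNAs. Each one begins with the 8-nucleotide repeat remnant `CUAAGAAA` and ends with the 20-nucleotide repeat hairpin.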

Fig. 1

Csy4 cleaves only its cognate pre-crRNA substrate. (A) Schematic of the CRISPR/Cas locus in Pa14. The six Cas genes are flanked by two CRISPR loci, each consisting of a series of 28-nucleotide repeats (black lettering) separated by distinct 32-nucleotide spacer sequences (blue). Red arrows denote the cleavage site. (B) In vitro transcribed Pa14 pre-crRNA (Pa, lanes 1 to 6) was incubated with Csy4 in the absence of exogenous metal ions (lanes 2 to 4) or in the presence of MgCl₂ (lane 5) or EDTA (lane 6). S. thermophilus (St) pre-crRNA (lanes 7 and 8) served as a negative control. Products were separated by means of denaturing polyacrylamide gel electrophoresis (PAGE) and visualized with SYBR Gold staining (Invitrogen, Carlsbad, CA). (C) Csy4 was expressed in E. coli in the presence (+) or absence (–) of a plasmid expressing a Pa14 CRISPR transcript. Csy4 was affinity purified; copurifying RNA was extracted and analyzed by means of denaturing PAGE and SYBR Gold staining. The ~19-nucleotide RNA corresponds to a protected fragment derived from the CRISPR repeat. (D) The Pa14 crRNA stem-loop (C6-G20) was 3′-end labeled with [5′-³²P]pCp, resulting in a minimal Csy4 substrate containing the ³²P radiolabel at the position of the scissile phosphate. The RNA was incubated in the presence (+) or absence (–) of Csy4. Products were separated by means of denaturing PAGE and visualized by phosphorimaging.

To identify the protein (or proteins) responsible for producing crRNAs from pre-crRNAs in Pa14, we tested the six recombinant Cas proteins from Pa14 for endoribonuclease activity and observed sequence-specific pre-crRNA processing with Csy4 (Fig. 1B) (16). Csy4 did not cleave pre-crRNA from Streptococcus thermophilus, whose repeat stem-loop differs in sequence from that of Pa14 (Fig. 1B). CRISPR transcript cleavage is a rapid, metal ion–independent reaction, as observed for crRNA processing within two other CRISPR/Cas subtypes (1, 2).

Csy4 RNA recognition is highly specific for CRISPR-derived transcripts. When expressed in Escherichia coli together with a synthetic Pa14 CRISPR RNA consisting of eight repeat sequences (derived from the CRISPR locus proximal to the Cas1 ORF) and seven identical spacer sequences, Csy4 copurified with a protected ~19-nucleotide fragment derived from the Pa14 crRNA repeat (Fig. 1C and fig. S1). To explore the protein/RNA interactions required for Csy4 substrate recognition and cleavage, we performed in vitro assays using RNA oligonucleotides corresponding to different regions of the 28-nucleotide Pa14 CRISPR repeat sequence. A 16-nucleotide minimal RNA fragment, consisting of the repeat-derived stem-loop and one downstream nucleotide, was sufficient for Csy4-catalyzed cleavage (fig. S2A). Csy4-mediated cleavage yielded a downstream product carrying a 5′-hydroxyl group and an upstream product carrying a 3′-phosphate (or 2′-3′ cyclic phosphate) group (Fig. 1D).
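The arithmetic behind the minimal substrate can be shown with a small Python sketch. The stem-loop C6 to G20 spans 15 nucleotides, and adding the downstream C21 gives 16. The sketch uses the paper's 1-based numbering and labels only the two termini that the cut creates. The `REPEAT` string is an assumption (the commonly cited Pa14 repeat). `nts`, `Product`, and `cleave` are hypothetical helpers written for illustration.

```python
from dataclasses import dataclass

# ASSUMPTION: commonly cited 28-nt Pa14 CRISPR repeat (not given in this text).
REPEAT = "GUUCACUGCCGUAUAGGCAGCUAAGAAA"


def nts(seq: str, first: int, last: int) -> str:
    """Return nucleotides first..last using 1-based, inclusive numbering (as in the paper)."""
    return seq[first - 1:last]


@dataclass
class Product:
    seq: str
    new_end: str  # chemistry of the terminus created by Csy4 cleavage


def cleave(substrate: str, cut_index: int) -> tuple[Product, Product]:
    """Split at a 0-based offset and label only the two ends the cut creates."""
    upstream = Product(substrate[:cut_index], "3'-phosphate or 2'-3' cyclic phosphate")
    downstream = Product(substrate[cut_index:], "5'-hydroxyl")
    return upstream, downstream


# Minimal substrate: stem-loop C6..G20 (15 nt) plus the downstream C21.
minimal = nts(REPEAT, 6, 21)
# The scissile phosphate between G20 and C21 sits after the 15th nucleotide.
upstream, downstream = cleave(minimal, cut_index=15)
```

In this model, `minimal` is `CUGCCGUAUAGGCAGC` (16 nt). The upstream product is the 15-nucleotide stem-loop with a new 3′-terminal phosphate. The downstream product is the single C21 with a 5′-hydroxyl.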

Additionally, Csy4 activity required a 2′-hydroxyl group on the nucleotide immediately upstream of the cleavage site: 2′-deoxyribonucleotide substitution at this position abrogated cleavage but did not disrupt Csy4 binding (fig. S2). We cocrystallized Csy4 with this noncleavable 16-nucleotide minimal RNA substrate in three distinct crystal forms, one containing wild-type Csy4 and two containing a catalytically active point mutant (S22C) of Csy4 (figs. S3 and S4), and solved the structures at resolutions of 2.3, 2.6, and 1.8 Å, respectively (table S1). In all three structures, the RNA binds to Csy4 in an almost identical manner: the protein makes extensive interactions with the single-stranded RNA (ssRNA)–double-stranded RNA (dsRNA) junction at the base of the crRNA stem as well as with the major groove of the RNA hairpin (Fig. 2A). The RNA is clamped in a highly basic groove between the main body of the protein and an arginine-rich helix (α3, residues 108 to 120) that inserts into the major groove of the hairpin (Fig. 2B).

Fig. 2

The crystal structure of Csy4 bound to RNA substrate. (A) Front and back views of the complex. Csy4 is colored in blue, and the RNA backbone is colored in orange. (B) Csy4 is shown as a surface representation colored according to electrostatic potential [in the same orientation as in (A), right]. The RNA is shown in ribbon representation and colored orange. (C) Magnified view of the interactions between Csy4 and the major groove of the RNA hairpin. Hydrogen bonding is depicted with dashed lines. (D) Expanded view of the interactions between the arginine-rich helix α3 (blue) and the RNA phosphate backbone (shown in stick format, orange).

In the complex, Csy4 adopts a two-domain architecture consisting of an N-terminal ferredoxin-like domain (residues 1 to 94) and a C-terminal domain (residues 95 to 187) that mediates most of the interactions with the RNA (fig. S5A). At the sequence level, Csy4 shares less than 10% identity with the two other known endoribonucleases involved in crRNA biogenesis, CasE from Thermus thermophilus (17) and Cas6 from Pyrococcus furiosus (2). The crystal structures of CasE and Cas6 in their nucleic acid–free states showed that these proteins possess a duplicated ferredoxin fold. The N-terminal ferredoxin fold is preserved in Csy4; structural superpositions made by using the DALI server (18) indicate that Csy4 in its RNA-binding conformation superimposes with CasE and Cas6 with root-mean-square deviations (RMSDs) of 3.8 Å (over 111 N-terminal Cα atoms) and 3.9 Å (over 104 Cα atoms), respectively. Although the C-terminal domain of Csy4 (residues 95 to 187) shares the same secondary structure connectivity as a ferredoxin-like fold, its conformation is markedly different, possibly as a result of RNA binding (fig. S5B).
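The RMSD values quoted above come from pairing N Cα atoms after two structures have been superposed. For those paired atoms, RMSD = sqrt((1/N) Σ |x_i − y_i|²). The plain Python sketch below computes only this final step. It assumes the coordinates are already superposed and paired. It does not reproduce DALI, which aligns structures by comparing intramolecular distance matrices, and it does not implement an optimal-superposition method such as Kabsch.

```python
import math

Coord = tuple[float, float, float]


def rmsd(a: list[Coord], b: list[Coord]) -> float:
    """RMSD over matched atom pairs, assuming a and b are ALREADY superposed.

    No fitting is done here; a[i] must correspond to b[i] (e.g. paired C-alpha atoms).
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(a, b)
    )
    return math.sqrt(squared / len(a))
```

For example, a single atom displaced by (3, 4, 0) Å gives `rmsd([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)]) == 5.0`. Every pair moved uniformly by 3.8 Å gives an RMSD of 3.8 Å. A real comparison would call this only after alignment and superposition.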

The crRNA substrate forms a stem-loop structure (19). Nucleotides 6 to 10 and 16 to 20 base-pair to produce a regular A-form helical stem. The GUAUA pentaloop contains a sheared G11-A15 base pair and an extruded nucleotide, U14, an arrangement that closely resembles the structures adopted by other GNR(N)A pentaloops (20, 21). In the Csy4-RNA complex, the RNA stem-loop straddles the β-hairpin formed by strands β6-β7 of Csy4, with the C6-G20 base pair stacking directly onto the aromatic side chain of Phe155 (Fig. 2C). In the context of the full-length CRISPR transcript, this arrangement allows Csy4 to recognize the ssRNA-dsRNA junctions in the pre-crRNA and anchor the RNA stem-loop to permit sequence-specific interactions in the major groove.
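The numbering in this paragraph can be checked against a sequence. The sketch below pairs nucleotides 6 to 10 with 20 to 16 in antiparallel order and tests loop nucleotides 11 to 15 against the pentaloop (GNRNA) form of the GNR(N)A motif. The `REPEAT` string is an assumption (the commonly cited Pa14 repeat). The helpers are hypothetical. The code checks Watson-Crick complementarity only. It does not predict secondary structure or model the non-canonical sheared G11-A15 pair.

```python
# ASSUMPTION: commonly cited 28-nt Pa14 CRISPR repeat (not given in this text).
REPEAT = "GUUCACUGCCGUAUAGGCAGCUAAGAAA"
WATSON_CRICK = {"AU", "UA", "GC", "CG"}


def nt(i: int) -> str:
    """Nucleotide at 1-based position i of the repeat."""
    return REPEAT[i - 1]


def stem_pairs(first: int = 6, last: int = 20, stem_len: int = 5) -> list[tuple[int, int, str]]:
    """Antiparallel pairs closing inward from first-last: 6-20, 7-19, ..., 10-16."""
    return [(first + k, last - k, nt(first + k) + nt(last - k)) for k in range(stem_len)]


def is_gnrna(loop: str) -> bool:
    """GNRNA pentaloop pattern: G, any, purine (A or G), any, A."""
    return len(loop) == 5 and loop[0] == "G" and loop[2] in "AG" and loop[4] == "A"


pairs = stem_pairs()
loop = "".join(nt(i) for i in range(11, 16))  # nucleotides 11 to 15
```

Under the assumed sequence, the five pairs are C6-G20, U7-A19, G8-C18, C9-G17, and C10-G16, all Watson-Crick. The loop reads `GUAUA`, which matches the text.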

Arg102 and Gln104, located in a linker segment connecting the body of Csy4 to the arginine-rich helix, make sequence-specific hydrogen-bonding contacts in the major groove of the RNA stem to nucleotides G20 and A19, respectively (Fig. 2C). The Csy4-crRNA interaction is further stabilized by the insertion of the arginine-rich helix into the major groove of the RNA hairpin near the pentaloop (Fig. 2D). The side chains of Arg114, Arg115, Arg118, Arg119, and His120 contact the phosphate groups of nucleotides 7 to 12. Additionally, the side chain of Arg115 hydrogen-bonds to the base of G11. The binding of the arginine-rich helix to the major groove of the crRNA hairpin is reminiscent of the N-peptide/boxB RNA interaction in lambdoid phages (fig. S6) (22) and of lentiviral Rev-RRE and Tat-TAR complexes (23, 24).

Csy4 recognizes the hairpin element of the CRISPR repeat sequence and cleaves immediately downstream of it. In the Csy4-RNA complex structure, where RNA cleavage is abrogated by a 2′-deoxy modification in nucleotide G20, ordered electron density is only evident for the scissile phosphate between G20 and C21. The ribose and cytosine moieties of C21 are not resolved and presumably disordered. The scissile phosphate binds in a pocket located between the β6-β7 hairpin turn on one side and helix α1 on the other (Fig. 3A), hydrogen-bonding to the backbone amide of Gln149 and the side chain of His29. Ser148 is adjacent to the 2′ ribose carbon atom of nucleotide G20 (4.6 Å) and may make a hydrogen-bonding interaction with the 2′-hydroxyl group of G20 in a bona fide pre-crRNA substrate. Mutation of the strictly conserved His29 or Ser148 (to alanine and cysteine, respectively) abolished cleavage activity without disrupting RNA binding (Fig. 3B and figs. S7 and S8), suggesting that these two residues participate in catalysis. A strongly conserved tyrosine (Tyr176) is also positioned near the scissile phosphate (Fig. 3A). However, mutation of Tyr176 to phenylalanine had only a minimal effect on activity (Fig. 3B).

Fig. 3

Functional analysis of catalytic residues in Csy4. (A) Detailed view of the catalytic center. Only the phosphate group of nucleotide C21 (the scissile phosphate, indicated with an asterisk) is visible in electron density maps. Strictly conserved residues found in the proximity of the scissile phosphate are shown in stick format. The arrow indicates the distance between the hydroxyl group of Ser148 and the 2′ ribose carbon of G20. (B) Cleavage activity of Csy4. Wild-type (WT) Csy4 and a series of single-point mutants were incubated with in vitro transcribed pre-crRNA for 5 min at 25°C. Products were resolved by means of denaturing PAGE and visualized with SYBR Gold staining.

The requirement for a 2′-hydroxyl group on the nucleotide immediately preceding the cleavage site suggests that the catalytic mechanism of Csy4 may proceed through a 2′-3′ cyclic intermediate. In this context, the observation of an invariant serine residue (Ser148) adjacent to the 2′ ribose position upstream of the scissile phosphate is unprecedented and points to Ser148 playing a role in activating and/or positioning the 2′-hydroxyl for a nucleophilic attack on the scissile phosphate. The other functionally critical active-site residue, His29, may act as a proton donor to the 5′-hydroxyl leaving group: mutation of His29 to lysine, another residue that can donate a proton, partially preserved catalytic activity (fig. S9).
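The chemistry proposed in this paragraph can be summarized as a generic 2′-hydroxyl transesterification at the G20/C21 phosphate. The scheme is a sketch of the hypothesis stated above, not an established Csy4 mechanism. The residue labels mark only the roles proposed in the text. If the cyclic phosphate is subsequently hydrolyzed to a 3′-phosphate, both end states remain compatible with the products described earlier.

```latex
% Sketch of the proposed reaction (requires amsmath for \xrightarrow)
\[
\mathrm{G}_{20}(2'\text{-OH})\text{-}p\text{-}\mathrm{C}_{21}
\;\xrightarrow[\text{His29: proton donor to the }5'\text{-O (proposed)}]
              {\text{Ser148: positions/activates the }2'\text{-OH (proposed)}}\;
\mathrm{G}_{20}\ 2'\text{-}3'\ \text{cyclic phosphate}
\;+\;
\mathrm{HO}\text{-}5'\text{-}\mathrm{C}_{21}
\]
```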

We next tested the functional importance of Csy4 residues involved in crRNA recognition. Alanine substitution of Arg102 abolished pre-crRNA processing in vitro, whereas mutation of Gln104 to alanine did not substantially disrupt activity (Fig. 3B). Mutation of Phe155 to alanine severely impaired crRNA processing, suggesting that this residue also plays an important role in substrate orientation. However, none of the above mutations severely disrupted crRNA binding, as judged by means of electrophoretic mobility shift assays, indicating that the structural integrity of the mutant proteins was not compromised (fig. S8). Thus, interaction between Csy4 and the closing base pair of the RNA stem is critical for pre-crRNA processing, whereas sequence-specific recognition of the penultimate base pair in the stem is less important. Incubation of Csy4 with a panel of short RNA oligonucleotides containing a variety of mutations in the CRISPR repeat stem-loop sequence further confirmed that Csy4 requires a C-G base pair closing the RNA stem and that Csy4 can accommodate different nucleotides at the penultimate RNA base pair (fig. S10).
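The inference in this paragraph, and for His29 and Ser148 above, rests on separating binding from cleavage. A mutant that still binds RNA but no longer cleaves it points to a role in catalysis or substrate positioning rather than in overall folding. The sketch encodes only the qualitative outcomes stated in the text, for mutants whose binding was reported: cleavage from Fig. 3B, binding from EMSAs in figs. S7 and S8. The category names and the `interpret` rule are hypothetical. Y176F and H29K are omitted because their binding is not described here.

```python
from enum import Enum


class Cleavage(Enum):
    ABOLISHED = "abolished"
    SEVERELY_IMPAIRED = "severely impaired"
    LARGELY_INTACT = "largely intact"


# Qualitative readouts as stated in the text: (pre-crRNA cleavage, RNA binding retained?).
# These are gel/EMSA summaries, not measured rates or affinities.
OBSERVED = {
    "H29A": (Cleavage.ABOLISHED, True),
    "S148C": (Cleavage.ABOLISHED, True),
    "R102A": (Cleavage.ABOLISHED, True),
    "F155A": (Cleavage.SEVERELY_IMPAIRED, True),
    "Q104A": (Cleavage.LARGELY_INTACT, True),
}


def interpret(cleavage: Cleavage, binds: bool) -> str:
    """Hypothetical rule: a cleavage defect is informative only if binding (and hence the fold) survives."""
    if not binds:
        return "uninformative: binding lost, so a catalytic role cannot be isolated"
    if cleavage is Cleavage.LARGELY_INTACT:
        return "dispensable for processing"
    return "required after binding: catalysis or substrate positioning"


for name, (cleavage, binds) in OBSERVED.items():
    print(f"{name}: {interpret(cleavage, binds)}")
```

Under this rule, R102A and F155A fall with the catalytic mutants H29A and S148C, and Q104A is classed as dispensable. This matches the conclusion that the closing base pair matters more than the penultimate one.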

Phylogenetic analysis of CRISPR loci suggests that CRISPR repeat sequences and structures have co-evolved with the Cas genes (19). The similarity of Csy4 at the fold level to the CRISPR-processing endonucleases CasE and Cas6 suggests that collectively they are likely to have descended from a single ancestral endoribonuclease enzyme that has diverged throughout evolution. The structure described here reveals how Csy4 and related endonucleases from the same CRISPR/Cas subfamily combine recognition of the stem-loop structure with sequence-specific contacts in its major groove to discriminate crRNA substrates from other cellular RNAs. This illustrates the importance of co-evolution in shaping molecular recognition mechanisms in the CRISPR pathway. Furthermore, the ability of Csy4 to form a tight complex with the cleaved crRNA product points to Csy4 having a functional role within the CRISPR pathway that extends beyond pre-crRNA cleavage.

Supporting Online Material

Materials and Methods

Figs. S1 to S10

Table S1


References and Notes

  16. Materials and methods are available as supporting material on Science Online.
  25. We thank W. Westphal for help with purification of Csy4 constructs; J. van der Oost for discussion; J. Doudna Cate and members of the Doudna laboratory for critical reading of the manuscript; and C. Ralston and J. Holton (Beamlines 8.2.2 and 8.3.1, Advanced Light Source, Lawrence Berkeley National Laboratory) and S. Coyle for assistance with X-ray data collection. R.E.H. is supported by the U.S. NIH training grant 5 T32 GM08295. M.J. is supported by a Human Frontier Science Program Long-Term Fellowship. B.W. is a Howard Hughes Medical Institute Fellow of the Life Sciences Research Foundation. This work was supported in part by grants from NSF and the Bill and Melinda Gates Foundation. J.A.D. is a Howard Hughes Medical Institute Investigator. Coordinates and structure factors for the Csy4-crRNA complex have been deposited in the Protein Data Bank under accession codes 2xli, 2xlj, and 2xlk. The authors have filed a related patent.