Research Article

Direct CRISPR spacer acquisition from RNA by a natural reverse transcriptase–Cas1 fusion protein

Science  26 Feb 2016:
Vol. 351, Issue 6276, aad4234
DOI: 10.1126/science.aad4234

CRISPR-Cas captures invading RNA

The CRISPR (clustered regularly interspaced short palindromic repeat) system provides bacteria with an adaptive immune response. DNA captured from viruses and plasmids by CRISPR-associated protein 1 (Cas1) is used by bacteria to target the invaders for destruction. Silas et al. discover that certain classes of the Cas1 gene are fused to a reverse transcriptase gene (RT-Cas1) (see the Perspective by Sontheimer and Marraffini). These RT-Cas1 proteins are able to capture and directly incorporate both DNA and RNA into CRISPR loci. RT-Cas1 systems could be effective against parasitic RNA species, or might even modulate bacterial gene expression.

Science, this issue p. 10.1126/science.aad4234; see also p. 920

Structured Abstract


Cells use a variety of mechanisms to prevent the propagation of parasitic information. A family of adaptive immune systems associated with CRISPRs in prokaryotes has been shown to protect cell populations from “selfish” DNA, including viruses and plasmids. CRISPR-mediated immunity begins with an “adaptation” phase, involving the heritable acquisition of short sequence segments (spacers) from the genome of the infectious agent by the host. This information is stored within CRISPR arrays in the host genome and is used by CRISPR-associated (Cas) nucleases in the subsequent “interference” phase to identify and disrupt infections by the same invader. CRISPR-Cas systems include those capable of interfering with DNA and RNA targets. In several characterized systems, adaptation involves acquisition from DNA templates through the action of a subset of the Cas proteins. One of these proteins (Cas1) plays a catalytic role in spacer acquisition from DNA in all systems analyzed so far.


We sought to determine whether some CRISPR-Cas systems build CRISPR arrays through the acquisition of spacer sequences from RNA. CRISPR systems are phylogenetically grouped into five types (types I to V); in some type III CRISPR systems, Cas1 is naturally fused to a reverse transcriptase (RT). This suggests the possibility of a concerted spacer integration mechanism involving Cas1 integrase activity and the reverse transcription of RNA to DNA. This would enable the acquisition of new spacers from RNA, potentially generating adaptive immunity against RNA-based invaders. To test this hypothesis, we characterized the spacer acquisition machinery of the RT-Cas1–containing type III-B CRISPR system in the bacterium Marinomonas mediterranea (MMB-1), by means of in vivo assays and in vitro reconstitution.


To examine the acquisition capabilities of the MMB-1 type III-B system, we overexpressed RT-Cas1 and associated adaptation genes from MMB-1 in the native host. The resulting strains acquired a variety of new spacer elements in their type III-B CRISPR arrays. These sequences matched segments from the MMB-1 genome and our expression plasmids, with substantially more acquisitions deriving from highly transcribed genes. The transcription-associated acquisition of spacers was dependent on functional Cas1 and RT domains of the RT-Cas1 protein, supporting the idea of an RNA capture mechanism that combines Cas1 integrase activity with reverse transcription of cellular RNA. While Cas1 catalytic mutations abolished spacer acquisition, deletion or mutational inactivation of the RT domain yielded a system capable of integration with no transcriptional bias, revealing an alternative Cas1 activity on DNA substrates. To test whether the MMB-1 system can acquire sequences from RNA, we engineered a self-splicing intron into plasmid copies of two MMB-1 genes that were well sampled by RT-Cas1, simultaneously introducing mutations flanking the splice sites to yield a novel exon-junction sequence that was present as RNA but not DNA. Newly acquired spacers containing the exon-junction sequences confirmed that RT-Cas1 can acquire spacers from RNA. To investigate the relationship between the integrase and RT activities of RT-Cas1, we studied the acquisition machinery in vitro. RT-Cas1 and the associated Cas2 protein promote the precise integration of single-stranded RNA, single-stranded DNA, and double-stranded DNA oligonucleotides directly into a linear CRISPR DNA substrate, indicating that RT-Cas1 acquires spacers directly from RNA. The in vitro studies are consistent with a mechanism in which the Cas1-fused RT domain then reverse-transcribes the integrated RNA, converting it to a cDNA sequence between CRISPR repeats. 
The concerted integrase-RT mechanism suggested by the in vitro studies has similarities to the genomic integration mechanism used by the bacterial retrotransposons known as mobile group II introns, which encode a related RT.


We showed that a natural RT-Cas1 fusion protein in a type III CRISPR system can enable the acquisition of new spacers directly from RNA. With other type III CRISPR systems known to target RNA for degradation, RT-associated CRISPR-Cas systems would effectively generate adaptive immunity against RNA parasites. RNA spacer acquisition could also contribute to immune responses against highly transcribed regions of DNA-based invaders through targeted interference at both the DNA and RNA levels.

Spacer acquisition from RNA.

(Left) Small segments of invasive DNA are assimilated into CRISPR arrays by Cas1 and Cas2 in a canonical spacer acquisition process that allows adaptive immunity in a wide variety of bacteria and archaea. (Right) In some type III CRISPR systems, a RT fused to Cas1 enables the acquisition of spacer sequences directly from RNA. This process might mediate adaptive immunity against RNA-based parasites.


CRISPR systems mediate adaptive immunity in diverse prokaryotes. CRISPR-associated Cas1 and Cas2 proteins have been shown to enable adaptation to new threats in type I and II CRISPR systems by the acquisition of short segments of DNA (spacers) from invasive elements. In several type III CRISPR systems, Cas1 is naturally fused to a reverse transcriptase (RT). In the marine bacterium Marinomonas mediterranea (MMB-1), we showed that a RT-Cas1 fusion protein enables the acquisition of RNA spacers in vivo in a RT-dependent manner. In vitro, the MMB-1 RT-Cas1 and Cas2 proteins catalyze the ligation of RNA segments into the CRISPR array, which is followed by reverse transcription. These observations outline a host-mediated mechanism for reverse information flow from RNA to DNA.

RNA-guided host defense mechanisms associated with CRISPR arrays exist in most bacteria and archaea (1, 2). Their target specificity derives from a series of spacers (many of which are identical to DNA sequences from phage, transposon, and plasmid mobilomes) interspersed within CRISPR arrays (3–5). Transcripts from these CRISPR arrays are processed into short structured RNAs, which form a complex with CRISPR-associated (Cas) endonucleases and target invasive nucleic acids, thereby conferring immunity (6, 7). CRISPR-Cas systems have been phylogenetically grouped into five types (8, 9). Homologs of the cas1 and cas2 genes are conserved across diverse CRISPR types (9, 10), with direct evidence for a role in the physical integration of new spacers from invasive DNA into CRISPR arrays in a few type I and II systems (11–14). Spacer acquisition allows the host to adapt to new threats.

The ability of type III systems to target RNA in addition to DNA (15–21) raises the possibility of natural spacer acquisition from RNA species. Direct acquisition of RNA spacers would add to the handful of known mechanisms for the reverse flow of genetic information from RNA into DNA genomes (22–27).

Examination of bacterial genomes has revealed a class of CRISPR-associated coding regions in which cas1 is fused to a putative reverse transcriptase (RT) (10, 28–30). These RT-Cas1 fusions raise the possibility of a concerted mechanism of spacer acquisition involving reverse transcription of RNA to DNA, a potentially host-beneficial mechanism for RNA-to-DNA information flow.

Common features of RT-Cas1 fusions

To examine the phylogenetic distribution of fused RT-Cas1–encoding genes, we used the National Center for Biotechnology Information (NCBI) Conserved Domain Architecture Retrieval Tool (CDART) to retrieve protein records containing both a Cas1 domain (Pfam domain PF01867) and a RT domain of any origin (PF00078). Of 93 RT-Cas1–bearing species, all were from bacteria and none were from archaea. RT-Cas1 fusions were most prevalent among cyanobacteria, with 21% of cas1-bearing cyanobacteria carrying such fusions (Fig. 1, A and B). RT-Cas1 fusions with sufficient flanking sequence for type classification were exclusively associated with type III CRISPR systems (table S1); conversely, ~8% of bacterial type III CRISPR systems carried RT-Cas1 fusions.
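The domain co-occurrence criterion described above can be illustrated with a minimal sketch. It does not query CDART; it only shows the filtering logic, assuming the domain annotations have already been retrieved into a simple mapping. The function names and the toy records are hypothetical and are not real database output; only the two Pfam accessions come from the text.

```python
# Pfam accessions named in the text.
CAS1_PFAM = "PF01867"
RT_PFAM = "PF00078"


def has_rt_cas1_architecture(domains):
    """Return True if a record carries both a Cas1 domain and an RT domain."""
    domain_set = set(domains)
    return CAS1_PFAM in domain_set and RT_PFAM in domain_set


def select_rt_cas1(records):
    """Return sorted identifiers of records annotated with both domains.

    `records` maps a protein identifier to a list of Pfam accessions.
    """
    return sorted(
        pid for pid, doms in records.items() if has_rt_cas1_architecture(doms)
    )


# Made-up records for illustration only.
toy_records = {
    "protA": ["PF00078", "PF01867"],  # RT + Cas1: retained
    "protB": ["PF01867"],             # Cas1 only: discarded
    "protC": ["PF00078"],             # RT only: discarded
}

print(select_rt_cas1(toy_records))
```

A filter of this kind reports only that both domains are annotated on one record; confirming a genuine fusion and its CRISPR type still requires inspection of the flanking sequence, as noted in the text.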

Fig. 1 Phylogenetic distribution and domain structure of RT-Cas1 fusion proteins.

(A) Taxonomic summary of unique RT-Cas1 protein records obtained from the NCBI CDART engine (current as of May 2015). Shown are numbers of Cas1 protein records and bacterial species with (left) a fused RT domain, (center) RT and an additional N-terminal extension containing a Cas6-like motif, and (right) Cas1 with no additional annotated domain. Only phyla containing RT-Cas1 fusions are listed. (B) 16S rRNA–based tree showing major bacterial phyla, with phyla that contain RT-Cas1 in red [adapted from (48)]. (C) Schematic showing the domain organization of HIV RT (UniProtKB/Swiss-Prot P03366), a group II intron RT (TeI4c from T. elongatus BP-1; GenPept WP_011056164), A. platensis RT-Cas1 (WP_006620498), M. mediterranea RT-Cas1 (WP_013659858), and E. coli Cas1 (NP_417235). Conserved RT motifs as defined in (49) are labeled 1 to 7. Motifs 0 and 2a are conserved in mobile group II intron and non-LTR–retrotransposon RTs (32). The YXDD sequence found in motif 5 contains two aspartic acid residues at the RT active site. Three α-helices found in the thumb/X domain of HIV and group II intron RTs are indicated. Numbers below the bars indicate amino acid positions. D, DNA binding domain; En, endonuclease domain.

The Cas1-fused RT domains were most closely related to RTs encoded by mobile genetic elements (retrotransposons) known as mobile group II introns (29, 30). We identified two related structural families of RT-Cas1 proteins. The more abundant family carries a canonical N-terminal RT domain with a conserved RT-0 motif characteristic of group II intron and non–long terminal repeat (non-LTR) retrotransposon RTs (31, 32). The other lacks the RT-0 motif, starting instead with an additional N-terminal domain containing a putative Cas6-like RNA recognition motif of the RAMP [repeat-associated mysterious protein (10)] superfamily. Alignments of the retrovirus HIV-1 RT and a group II intron RT [Thermosynechococcus elongatus TeI4c RT (33)] with representatives of the two RT-Cas1 fusion families (from Arthrospira platensis and Marinomonas mediterranea) revealed that both Cas1-fused RTs contain the seven conserved sequence motifs characteristic of the finger and palm regions of retroviral RTs. Each also shares the RT-2a motif, which is conserved in group II intron RTs and related proteins but not present in retroviral RTs, such as the HIV-1 RT (31, 32). The thumb/X domain, which is found in retroviral and group II intron RTs just downstream of the RT domain, appears to be missing in the Cas1-associated RTs (Fig. 1C).

The structural subcategories, limited phylogenetic distribution, and exclusive association with a subset of CRISPR types are consistent with a small number of common origins of RT-Cas1 fusions (10, 29).

Spacer acquisition by the M. mediterranea type III-B machinery in an E. coli host

To test whether RT-Cas1 proteins could facilitate the acquisition of new spacers, and to determine whether such spacers might be acquired from RNA, we chose the type III-B CRISPR locus in M. mediterranea (MMB-1) (34), because this is an easily cultured, nonpathogenic member of the well-studied γ-proteobacterium class that contains a RT-Cas1–encoding gene.

We first assessed spacer acquisition after transplantation of the locus into the canonical γ-proteobacterium experimental model, Escherichia coli. We constructed expression vectors carrying the type III-B operon of MMB-1 in two configurations, either as a single cassette consisting of the CRISPR03 array (35), the genes encoding RT-Cas1 and Cas2, and an adjacent gene (encoding Marme_0670) with limited homology to the NERD (nuclease-related domain) family (36), or together with a second cassette additionally encoding the remaining CRISPR-associated factors, Cmr1 to Cmr6 and Marme_0671 (Fig. 2, A and B). The acquisition of new spacers into CRISPR03 was evident from polymerase chain reaction (PCR) amplification of the region between the leader sequence and the first native spacer, followed by high-throughput sequencing. We identified newly acquired spacers in transformants expressing either the full complement of Cas genes, or the subset containing only the potential “adaptation” genes (encoding RT-Cas1, Cas2, and Marme_0670). Bona fide spacer acquisition is evidenced by the precise junctions between the inserted spacer DNA and CRISPR repeats (fig. S1A) and by the diversity of acquired spacers (fig. S1, B and D).
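The idea of reading new spacers out of sequenced amplicons can be sketched as follows. This is a simplified illustration, not the pipeline used in the study: it assumes exact repeat matches, ignores sequencing errors and read orientation, and uses a made-up repeat and made-up spacers rather than the CRISPR03 sequences. A sequence is reported as a spacer only when it is flanked on both sides by a repeat, which mirrors the requirement for precise spacer-repeat junctions.

```python
def extract_spacers(read, repeat):
    """Return the sequences lying between consecutive exact copies of `repeat`."""
    spacers = []
    start = read.find(repeat)
    while start != -1:
        next_start = read.find(repeat, start + len(repeat))
        if next_start == -1:
            break  # no downstream repeat, so nothing more is flanked on both sides
        spacers.append(read[start + len(repeat):next_start])
        start = next_start
    return spacers


def new_spacers(read, repeat, native_spacers):
    """Return spacers in `read` that are not in the known native set."""
    native = set(native_spacers)
    return [s for s in extract_spacers(read, repeat) if s not in native]


# Toy sequences for illustration only (hypothetical, not from MMB-1).
TOY_REPEAT = "GTTCCAAT"
TOY_NATIVE = "TGCATGCATG"
TOY_NEW = "ACGTACGTAC"
toy_read = "AAAA" + TOY_REPEAT + TOY_NEW + TOY_REPEAT + TOY_NATIVE + TOY_REPEAT + "CC"

print(extract_spacers(toy_read, TOY_REPEAT))
print(new_spacers(toy_read, TOY_REPEAT, [TOY_NATIVE]))
```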

Fig. 2 Spacer acquisition in E. coli by ectopic expression of MMB-1 type III-B CRISPR components.

(A) The MMB-1 type III-B CRISPR operon consists of an 8-spacer CRISPR array (CRISPR03), followed by a canonical six-gene cassette putatively encoding the type III-B Cmr effector complex, two genes of unknown function (Marme_0671 and Marme_0670), the genes encoding RT-Cas1 and Cas2, and lastly a larger 58-spacer CRISPR array (CRISPR02). The locus is flanked by two ~200-bp direct repeats (green arrows). The black arrows indicate promoters. (B) Arrangement of MMB-1 type III-B CRISPR components under inducible promoters (Para, Ptrc, and Plac) on pBAD vectors for ectopic expression in E. coli. (C) Spacer detection frequency after overnight induction of E. coli carrying pBAD expression vectors with arabinose and IPTG. Wild-type RT-Cas1, RT active site mutant (YAAA), and Cas1 domain mutants E790A and E870A were tested with or without the Plac-driven gene cassette encoding the Cmr effector complex. Cas2 Δ32–92 and RT domain Δ299–588 mutants (shown in the two rightmost columns) were tested without the Cmr cassette. Range bars indicate values for two biological replicates (n.d., not determined). (D) Histogram showing normalized counts of E. coli genomic protospacers from the wild-type RT-Cas1 and RTΔ spacer acquisition experiments, distributed by mappable length. Pooled data from several experiments are presented. (E) Nucleotide probabilities at each position along the wild-type RT-Cas1–acquired protospacers in (D), including 15 bp of flanking sequence on each side. Because of varying protospacer lengths, two panels are shown with the spacer 5′ and 3′ ends anchored at positions 15 and 35, respectively. (F) Cumulative normalized distribution of spacers in (D) among E. coli protein-coding open reading frames (ORFs) sorted by expression level [normalized RNAseq read counts from (47); FPKM, fragments per kilobase per million reads], with the most highly expressed genes listed first. Included are 2470 wild-type RT-Cas1– and 5569 RTΔ-acquired spacers mapping to E. coli genes. 
Dashed black lines show the range of values from a Monte Carlo simulation with random assortment (no transcription-related bias).

Specificity was further tested by evaluating the requirements for RT-Cas1 and Cas2 in spacer acquisition. We constructed two point mutations, E870A and E790A, in the putative Cas1 active site of MMB-1 RT-Cas1, based on a three-dimensional homology model computed using the Archaeoglobus fulgidus Cas1 crystal structure (37). Each point mutation abolished spacer acquisition, as did a 60–amino acid C-terminal deletion in Cas2 (Fig. 2C).

The majority (~85%) of newly acquired spacers mapped to the E. coli genome, with the rest being derived from plasmid DNA (fig. S1D). Over 70% of the spacers were 34 to 36 base pairs (bp) in length (Fig. 2D). Consistent with observations of interference mechanisms in other type III CRISPR systems (7), we found no evidence for a conserved protospacer-adjacent motif (PAM) or other sequence signature associated with protospacer choice (Fig. 2E). We observed no bias for the sense strand among spacers acquired from annotated E. coli genes (fig. S2A) and no enrichment of spacers derived from highly transcribed genes (Fig. 2F). Spacer acquisition was unhindered when the RT domain of RT-Cas1 was mutated or deleted (Fig. 2C), consistent with a DNA-based mechanism operating under these conditions. Deletion of the entire 290–amino acid conserved region of the RT domain resulted in a ~20-fold increase in spacer acquisition frequency (38), with no apparent differences in the characteristics of the pool of acquired spacers (Fig. 2, C to F, and figs. S2A and S3A).

Transcription-associated spacer acquisition in MMB-1 is RT-dependent

Our inability to detect RNA spacer acquisition in the ectopic E. coli assay could reflect the absence of required factors or conditions that are present in the native host, MMB-1. To assay spacer acquisition in MMB-1, we overexpressed RT-Cas1 and Cas2 along with Marme_0670 from a broad-host-range plasmid (pKT230), using the 100-bp sequence upstream of the MMB-1 16S ribosomal RNA (rRNA) gene as a promoter (Fig. 3A). We recovered newly acquired spacers from the genomic copy of the CRISPR03 array and found that the vast majority (~95%) mapped to the MMB-1 genome, with an expected proportion mapping to the expression vector (figs. S1, C and D, and S4). Although the endogenous type III-B CRISPR operon was still present in these strains, we found that plasmid-driven overexpression of adaptation genes was critical for detectable acquisition of new spacers: Parallel analysis of transconjugants in which plasmid-driven RT-Cas1 had the mutation E870A or E790A at the putative Cas1 active site, or of transconjugants carrying an empty vector, failed to identify any new spacers (Fig. 3B). As in E. coli, most (>75%) of the new protospacers were 34 to 36 bp in length (Fig. 3C), and we did not observe a PAM-like sequence at either the 5′ or 3′ ends of the acquired spacers (Fig. 3D).

Fig. 3 RT-Cas1–mediated spacer acquisition in MMB-1.

(A) Arrangement of genes encoding Marme_0670, RT-Cas1, and Cas2 on pKT230 broad-host-range vectors under the control of the putative 16S rRNA promoter (P16S; 100-bp sequence upstream of the MMB-1 16S rRNA gene) for overexpression in MMB-1. New spacers were amplified from the genomic CRISPR03 array. (B) Spacer detection frequency after overnight growth of MMB-1 transconjugants carrying pKT230 overexpression vectors. Two clones each from two independent conjugations carrying either wild-type RT-Cas1, Cas1 domain mutants E790A or E870A, RT domain Δ299–588 mutants, or an empty pKT230 vector were tested. Range bars depict spacer acquisition frequencies for two transconjugants. (C) Histogram showing normalized counts of MMB-1 genomic protospacers from the wild-type RT-Cas1 and RTΔ spacer acquisition experiments, distributed by mappable length. Pooled data from several experiments are presented. (D) Nucleotide probabilities at each position along the wild-type RT-Cas1–acquired protospacers in (C), including 15 bp of flanking sequence on each side. Because of varying protospacer lengths, two panels are shown with the spacer 5′ and 3′ ends anchored at positions 15 and 35, respectively. (E) Cumulative distribution of spacers in (C) among MMB-1 genes sorted by RNAseq FPKM, with the most highly expressed genes listed first. Included are 455 wild-type RT-Cas1– and 341 RTΔ-acquired spacers mapping to MMB-1 genes. Guides are drawn along the x axis at top-10% and top-50% genes by expression level. Monte Carlo bounds were calculated as in Fig. 2F. rRNA genes have been excluded from this analysis because spacers were rarely acquired from rRNA.

In contrast to the E. coli data set, the genomic regions most frequently sampled by the RT-Cas1 spacer acquisition machinery in MMB-1 appeared to be genes that are typically highly expressed in bacteria. We further investigated this association between expression and spacer capture by obtaining RNA sequencing (RNAseq) expression profiles of two independent MMB-1 transconjugants carrying the RT-Cas1 expression vector. The 10% most highly expressed genes accounted for over 50% of newly acquired spacers, with the top 50% of expressed genes accounting for 90% of newly acquired spacers (Fig. 3E). Next, we tested whether this transcriptional association was dependent on the RT domain of RT-Cas1. Deletion of the conserved RT domain of RT-Cas1 abolished the preference for highly transcribed genes (Fig. 3E and fig. S5), while maintaining a comparable length and sequence distribution for the acquired spacer repertoire (Fig. 3, B and C, and figs. S2B, S3B, and S4). Together, these data demonstrate a RT-dependent bias toward the acquisition of spacers from highly transcribed regions.
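The cumulative analysis of spacers against expression rank, and the comparison with a random-assortment null, can be illustrated with a minimal sketch. All gene names, counts, expression values, and lengths below are made up. Weighting the null draws by gene length is an assumption of this sketch; the text specifies only random assortment with no transcription-related bias.

```python
import random


def top_fraction_share(spacer_counts, expression, top_fraction):
    """Share of spacers mapping to the most highly expressed fraction of genes."""
    genes = sorted(expression, key=expression.get, reverse=True)
    n_top = max(1, int(round(len(genes) * top_fraction)))
    total = sum(spacer_counts.get(g, 0) for g in genes)
    if total == 0:
        return 0.0
    in_top = sum(spacer_counts.get(g, 0) for g in genes[:n_top])
    return in_top / total


def null_range(expression, gene_lengths, n_spacers, top_fraction,
               n_trials=200, seed=0):
    """Range of shares when spacers are drawn with no expression bias.

    Genes are sampled in proportion to their length (an assumption here).
    """
    rng = random.Random(seed)
    genes = list(expression)
    weights = [gene_lengths[g] for g in genes]
    shares = []
    for _ in range(n_trials):
        counts = {}
        for g in rng.choices(genes, weights=weights, k=n_spacers):
            counts[g] = counts.get(g, 0) + 1
        shares.append(top_fraction_share(counts, expression, top_fraction))
    return min(shares), max(shares)


# Toy data for illustration only.
toy_expression = {f"gene{i}": float(100 - 10 * i) for i in range(10)}
toy_lengths = {g: 1000 for g in toy_expression}
toy_counts = {"gene0": 6, "gene1": 2, "gene5": 2}

print(top_fraction_share(toy_counts, toy_expression, 0.10))
print(top_fraction_share(toy_counts, toy_expression, 0.50))
print(null_range(toy_expression, toy_lengths, n_spacers=10, top_fraction=0.10))
```

An observed share that lies well outside the simulated range is the kind of signal described for wild-type RT-Cas1, whereas a share within the range corresponds to the behavior reported for the RT-domain deletion.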

Spacers acquired from transcribed regions could conceivably be integrated into the CRISPR array in either a negative or a positive orientation. Among spacers that mapped to MMB-1 transcripts, we observed at most a limited preference for the sense strand (fig. S2, B and C). The lack of a strong bias implies a degree of directional flexibility in the integration mechanism, potentially yielding a system in which only a fraction of spacers is able to protect against a single-stranded DNA or RNA target.

RT-Cas1–mediated spacer acquisition from RNA

The observed association between the gene expression level and the frequency of spacer acquisition in MMB-1, combined with the requirement of the RT domain for this association, is consistent with an acquisition process involving reverse transcription of an RNA molecule. Nonetheless, an alternative hypothesis is that acquisition of DNA spacers could result from increased accessibility of DNA in regions of high transcriptional activity.

The acquisition of DNA spacer sequences from an RNA molecule can be tested by placing a functional intron into a transcript, which is spliced to yield a ligated-exon junction sequence that is then captured as DNA (25). To test whether the RT-Cas1 complex could acquire spacers directly from RNA, we used the self-splicing td group I intron, a ribozyme that catalyzes its own excision from its parent transcript, leaving behind a splice junction that was not present as a DNA sequence (39). We produced intron-interrupted versions of two MMB-1 genes—the ssrA gene, encoding a small noncoding RNA [transfer-messenger RNA (tmRNA) (40)] and Marme_0982, encoding ribosomal protein S15—in both cases inserting the intron at sites that were well sampled in our spacer libraries. Each construct was designed with four or five mutations to optimize the flanking exon sequences for td intron splicing. These mutations allow us to unambiguously distinguish between spliced (plasmid-expressed) and native (genomic) ssrA and ribosomal protein S15 transcripts (Fig. 4A). After confirming self-splicing in vitro (fig. S6A), we placed the td intron–containing genes on our RT-Cas1 overexpression plasmids and expressed them in MMB-1 from their native promoters. To assess the transcription level of the engineered coding regions relative to their endogenous counterparts in vivo, we performed high-throughput sequencing of RT-PCR amplicons spanning the splice junctions. We found that ~30% of all ribosomal protein S15 transcripts and ~16% of all ssrA tmRNA transcripts were produced by splicing in the respective transconjugants (fig. S6B).

Fig. 4 Spacer acquisition from RNA in the MMB-1 type III-B system.

(A) Spacers acquired from a host genome could conceivably originate from either RNA or DNA. To test for an RNA origin, we used an engineered self-splicing transcript, which produces an RNA sequence junction that is not encoded by DNA. Bases that were mutated to provide flanking exon sequences favorable for td intron splicing were separated by the 393-bp intron in the DNA template. After transcription and splicing, the two exons were brought together to form a novel junction containing the identifying mutations. Newly acquired spacers that contain this exon-junction indicate spacer acquisition from an RNA target. (B) Alignments of some of the genome-contiguous spacers (gray) and several newly acquired exon-junction–spanning spacers (red) to the genomic and split-gene sequences, respectively. Bases mutated to facilitate td intron splicing are underlined in the genomic sequences. Identifying mutations are depicted as colored bases, and the splice sites are indicated by green triangles. The highlighted ssrA exon-junction–spanning spacer (bottom) is antisense to the spliced tmRNA and differs from a putative DNA template by the five expected mutations. (C) All unique spacers spanning the td intron splice site that did not carry the engineered mutations. The maximum number of mismatches (MM) when these spacers were mapped to the wild-type genomic locus is indicated. None of the identifying mutations were observed among these sporadic mismatches. The spacers in (B) were in addition to four spacers (one for the S15 and three for the ssrA construct) that align to the unspliced exon-intron junction and could have been derived from either DNA or (nascent) RNA.

We assayed for newly integrated spacers in plasmid copies of CRISPR03, recovering 80,136 new spacers that map to the MMB-1 genome. The protospacer length, sequence composition, and bias for highly expressed genes remained consistent with our previous results in MMB-1 (fig. S7). We found two spacers spanning the splice junction of ribosomal protein S15 and six spacers spanning the splice junction of tmRNA from two independent cultures of two independent transconjugants, thereby confirming that the RT-Cas1 spacer acquisition machinery is capable of acquiring spacers from RNA molecules (Fig. 4, B and C). We observed both sense and antisense spacers spanning the synthetic splice junctions from both the ssrA and ribosomal protein S15 constructs (Fig. 4B), further indicating flexibility in the orientation of spacer acquisition relative to the leader. We considered the possibility that these spacers might have been acquired from an extended cDNA copy of the spliced transcripts that was generated through indiscriminate RT activity. Such cDNA sequences would have been detectable by highly sensitive targeted sequencing assays and were not observed (fig. S6C).
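The logic of identifying exon-junction–spanning spacers can be sketched as below. This is an illustrative simplification under stated assumptions: exact matching only, a hypothetical minimum overhang on each side of the junction, and made-up exon sequences rather than the ssrA or S15 constructs. Both strands are checked because sense and antisense spacers were observed.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def spans_junction(spacer, exon1, exon2, min_overhang=3):
    """True if `spacer`, on either strand, matches the spliced sequence
    across the exon1/exon2 junction with at least `min_overhang` bases
    on each side (`min_overhang` is a hypothetical threshold)."""
    spliced = exon1 + exon2
    junction = len(exon1)
    for query in (spacer, reverse_complement(spacer)):
        pos = spliced.find(query)
        while pos != -1:
            if pos + min_overhang <= junction <= pos + len(query) - min_overhang:
                return True
            pos = spliced.find(query, pos + 1)
    return False


# Toy exon sequences for illustration only.
toy_exon1 = "ATGGCTAAGCGT"
toy_exon2 = "TCAGGACTTAGC"
junction_spacer = toy_exon1[-6:] + toy_exon2[:6]
exon_only_spacer = toy_exon1[:9]

print(spans_junction(junction_spacer, toy_exon1, toy_exon2))
print(spans_junction(reverse_complement(junction_spacer), toy_exon1, toy_exon2))
print(spans_junction(exon_only_spacer, toy_exon1, toy_exon2))
```

In the experiments described here, the engineered mutations flanking the splice site provide the additional check that a junction-spanning spacer derives from the spliced plasmid transcript rather than from the genomic locus; a sketch of this kind would need the mutated exon sequences as input to reflect that.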

Whereas these experiments demonstrate the ability of this system to acquire spacers from RNA, the RT-domain deletion experiments in which spacer acquisition was not biased toward transcribed regions (Fig. 3E) indicate that the system can also acquire spacers from DNA. Nonetheless, the strong transcriptional bias observed with wild-type RT-Cas1 in MMB-1 indicates that most spacer acquisitions driven by the intact RT-Cas1 fusion protein under our conditions are from RNA.

Ligation of RNA and DNA oligonucleotides directly into CRISPR repeats by a RT-Cas1–Cas2 complex

The E. coli Cas1-Cas2 complex has been shown to ligate double-stranded DNA (dsDNA) directly into a supercoiled plasmid containing a CRISPR array by means of a concerted cleavage-ligation (transesterification) mechanism, analogous to that of retroviral integrases (41). To investigate how MMB-1 RT-Cas1 functions in spacer acquisition, we reconstituted this activity in vitro using purified RT-Cas1 and Cas2 proteins. We confirmed that wild-type RT-Cas1 protein has RT activity that is abolished by the deletion of the RT domain (RTΔ) or mutations at the RT active site (YADD to YAAA at amino acid positions 530 to 533) (fig. S8). To assay spacer acquisition, the purified RT-Cas1 and Cas2 proteins were incubated with (i) putative spacer precursors (protospacers) corresponding to DNA or RNA oligonucleotides of different lengths and (ii) a linear 268-bp internally labeled CRISPR DNA substrate containing the leader, the first two repeats, and interspersed spacer sequences from the MMB-1 CRISPR03 array (Fig. 5A). The reactions also included deoxynucleotide triphosphates (dNTPs) to enable reverse transcription of a ligated RNA oligonucleotide.

Fig. 5 Site-specific CRISPR DNA cleavage-ligation by the RT-Cas1–Cas2 complex.

(A) Schematic of CRISPR DNA substrates and products of cleavage-ligation reactions. The substrate was a 268-bp DNA containing the leader (gray), the first two repeats (R1 and R2, orange) and spacers (S1 and S2, green), and part of the third repeat (R3, orange) of the MMB-1 CRISPR03 array. Cleavages (arrowheads) occur at the boundaries of the first repeat with concomitant ligation of a DNA or RNA oligonucleotide (oligo, blue) to the 3′ fragment, yielding products of the sizes shown. (B) Internally labeled CRISPR DNA and a 33-nt dsDNA were incubated with no protein (lane 1), RT-Cas1 (lane 2), Cas2 (lane 3), or a 1:2 mixture of RT-Cas1 and Cas2 (lane 4). The sizes of products determined from sequencing ladders in parallel lanes are indicated on the left. (C) Internally labeled CRISPR DNA was incubated with wild-type (WT) RT-Cas1 and Cas2 without (lane 1) or with a 21-nt RNA (lane 2), 35-nt RNA (lane 3), or 29-nt ssDNA (lane 4). (D) Internally labeled CRISPR DNA was incubated with WT RT-Cas1 plus Cas2 in the absence (lane 1) or presence of a 29-nt ssDNA with either a 3′ OH (lane 2) or a 3′ phosphate (lane 3). (E) Nuclease digestion of 5′-end–labeled RNA and DNA oligonucleotides ligated to CRISPR DNA. Ligation reactions were performed as in (C). After extraction with phenol-CIA and ethanol precipitation, the products were incubated with the indicated nucleases. An asterisk indicates that the sample was boiled to denature the DNA before adding the nuclease. (F) Ligation of 5′-end–labeled RNA and DNA oligonucleotides into CRISPR DNA by WT and mutant RT-Cas1 proteins. Lanes 1 and 6 show control reactions of internally labeled CRISPR with WT RT-Cas1 plus Cas2 and an unlabeled 35-nt ssRNA or 29-nt ssDNA oligonucleotide for comparison. Lanes 2 to 5 and 7 to 10 show reactions of unlabeled CRISPR DNA with 5′-end–labeled 35-nt ssRNA and 29-nt ssDNA, respectively, and WT, E870A, and RTΔ RT-Cas1 plus Cas2. All reactions were carried out in the presence of dNTPs.
(G) Effect of dNTPs. In the gel on the left, internally labeled CRISPR DNA was incubated with WT RT-Cas1 plus Cas2 in the presence of a 29-nt ssDNA (lanes 1 and 2) or 35-nt ssRNA (lanes 3 and 4) in the absence (lanes 1 and 3) or presence of 1 mM dNTPs (1 mM each of dATP, dCTP, dGTP, and dTTP; lanes 2 and 4). In the gel on the right, internally labeled CRISPR DNA was incubated with WT RT-Cas1 plus Cas2 in the presence of a 35-nt ssRNA oligonucleotide in the absence (lane 10) or presence of different dNTPs (1 mM) as indicated (lanes 5 to 9). Red and black dots indicate products resulting from cleavage and ligation of oligonucleotides at the junction of the leader and repeat 1 on the top strand and the junction of repeat 1 and spacer 1 on the bottom strand, respectively; cyan and purple dots indicate products of the size expected for cleavage and ligation of the oligonucleotide at the junctions of the second CRISPR repeat (see fig. S10).

In initial assays using a dsDNA oligonucleotide, products derived from cleavage of the CRISPR substrate were readily detected in the presence of RT-Cas1 and Cas2 together but not in the presence of either protein alone (Fig. 5B). The sizes of these products were consistent with cleavage at the junctions between the leader and first repeat on the top strand and between the first repeat and spacer on the bottom strand, as expected for staggered cuts that are known to occur in type I CRISPR systems (12). Structural features at the leader-repeat boundary might dictate cleavage at these sites (41). Bands of the sizes expected for free 3′ fragments [148 and 155 nucleotides (nt)] were much weaker than those for the corresponding 5′ fragments (120 and 113 nt), reflecting their replacement with prominent bands of the sizes expected for ligation of the oligonucleotide to their 5′ ends (148 and 155 nt plus oligonucleotide). Similar products were also detected using single-stranded DNA (ssDNA) and RNA oligonucleotides of various sizes (ssDNA, 19 to 59 nt; RNA, 21 to 50 nt) (Fig. 5, B and C, and figs. S9 and S10), presumably reflecting that the more uniform spacer size of 34 to 36 bp in vivo is due to processing of the spacers prior to their integration into the CRISPR array. Additionally, a 3′-phosphate modification of the ssDNA oligonucleotide almost completely abolished the cleavage-ligation reaction, suggesting a crucial role of the 3′OH of the donor oligonucleotide in the integration reaction (Fig. 5D). The ligation of both DNA and RNA oligonucleotides into the CRISPR DNA was confirmed by their expected ribonuclease (RNase) and/or deoxyribonuclease (DNase) sensitivity in reactions with 5′-end–labeled oligonucleotides and unlabeled CRISPR DNA (Fig. 5E). The ligated RNA oligonucleotide was sensitive to RNase H, indicating its presence in an RNA-DNA hybrid, as would be expected if it was used as a template for cDNA synthesis by RT-Cas1 (Fig. 5E).
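The fragment-size reasoning above is simple arithmetic and can be checked with a short sketch. The substrate length and fragment sizes are taken from the text; the variable and function names are hypothetical. Each cleavage splits one strand of the 268-nt substrate into a 5′ and a 3′ fragment, and ligation adds the oligonucleotide length to the 3′ fragment.

```python
SUBSTRATE_LENGTH = 268  # nt per strand of the linear CRISPR DNA substrate

# (5' fragment, 3' fragment) sizes in nt for the two cleavage sites,
# paired as in the text: 120 with 148, and 113 with 155.
FRAGMENT_PAIRS = [(120, 148), (113, 155)]


def fragments_are_consistent():
    """Check that each 5'/3' fragment pair sums to the substrate length."""
    return all(
        five_prime + three_prime == SUBSTRATE_LENGTH
        for five_prime, three_prime in FRAGMENT_PAIRS
    )


def ligation_product_sizes(oligo_length):
    """Expected sizes (nt) of 3' fragments after ligation of an oligonucleotide."""
    return [three_prime + oligo_length for _, three_prime in FRAGMENT_PAIRS]


print(fragments_are_consistent())
# Oligonucleotide lengths used in the assays described in the text.
for length in (21, 29, 33, 35):
    print(length, ligation_product_sizes(length))
```

For the 33-nt dsDNA oligonucleotide, for example, this gives expected ligation products of 181 and 188 nt, assuming one oligonucleotide strand is joined to each 3′ fragment.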

Although the MMB-1 RT-Cas1–Cas2 complex functions similarly to the E. coli Cas1-Cas2 complex to site-specifically integrate putative spacer precursors into CRISPR arrays, it differs in being able to use a linear CRISPR DNA substrate and to insert not only dsDNA but also ssDNA and RNA oligonucleotides. The ligation of RNA and DNA oligonucleotides into the CRISPR DNA substrate differs in two respects. First, whereas the E870A mutation at the Cas1 active site abolishes ligation of both RNA and DNA oligonucleotides, deletion of the RT domain (RTΔ) abolishes ligation of RNA but not DNA oligonucleotides (Fig. 5F). These findings mirror in vivo results showing that the E870A mutation abolishes the acquisition of both RNA and DNA spacers, whereas the RTΔ mutation abolishes the acquisition of RNA but not DNA spacers (Fig. 3, B and E). Second, dNTPs are required for ligation of RNA but not DNA oligonucleotides, with deoxyguanosine triphosphate (dGTP) or deoxyadenosine triphosphate (dATP) alone sufficient to support RNA ligation (Fig. 5G). Together, these findings suggest that the RT-Cas1 protein is modular, with the Cas1 domain catalyzing ligation of both RNA and DNA spacers into CRISPR repeats, but with ligation of RNA spacers requiring binding by the N-terminal and/or RT domains, possibly coupled to RT domain core closure and/or the initiation of reverse transcription on addition of dNTPs.

Integrated RNA oligonucleotides are reverse-transcribed by the RT-Cas1–Cas2 complex

We tested whether the RT-Cas1–Cas2 complex could reverse-transcribe an integrated RNA oligonucleotide in vitro to generate the cDNA precursor of a fully integrated RNA spacer. The cleavage-ligation reactions on either side of repeat R1 generate products with 5′ overhangs that could potentially be substrates for target DNA-primed reverse transcription (TPRT) reactions, in which the 3′ end of the opposite strand is extended to yield a DNA copy of the repeat plus the ligated RNA oligonucleotide (Fig. 6A). To detect the synthesis of such cDNAs, we incubated the CRISPR DNA with RT-Cas1–Cas2 in the presence of a 21-nt RNA oligonucleotide and supplied radioactive deoxycytidine triphosphate (dCTP) and other unlabeled dNTPs during the incubation (Fig. 6A). cDNA synthesis during the reactions was evident by the labeled products being of the same size as the two ligation products, as expected for a TPRT reaction extending through the R1 repeat and ligated RNA. The synthesis of these cDNAs depends on the presence of the RNA oligonucleotide, the CRISPR DNA, and RT-Cas1–Cas2 (Fig. 6B). The RTΔ mutant abolishes cDNA synthesis, whereas the E870A mutant, which retains RT activity (fig. S8) but cannot integrate the RNA oligonucleotide or create the 3′OH required for priming cDNA synthesis (Fig. 5F), produces only a heterogeneous background of labeled products (Fig. 6B). The TPRT products detected in our assays may represent an intermediate in spacer acquisition, with additional steps potentially including digestion of the ligated RNA spacer strand by a host RNase H, synthesis of a full dsDNA containing the spacer sequence by RT-Cas1 or a host DNA polymerase, and ligation of the unattached ends of the dsDNA into the CRISPR array. Our in vivo and in vitro data suggest that this can occur in either orientation and may involve host enzymes that are present in MMB-1 but not in E. coli.

Fig. 6 cDNA synthesis using RNA ligated to CRISPR DNA.

(A) Schematic showing the CRISPR DNA substrate and the expected products of cleavage and ligation (top), followed by TPRT of the ligated RNA oligonucleotide (blue). cDNAs are shown as black dashes, with arrowheads indicating the direction of cDNA synthesis. (B) WT or mutant RT-Cas1 plus Cas2 proteins were incubated with 268-bp CRISPR DNA in the presence of 21-nt RNA oligonucleotide, labeled dCTP, and unlabeled dATP, dGTP, and dTTP. The WT RT-Cas1–Cas2 complex yields labeled bands of the sizes expected (148 and 155 nt plus oligonucleotide) for TPRT of the RNA oligonucleotide that is ligated site-specifically at opposite boundaries of the first CRISPR DNA repeat (R1, lane 8). The labeled products were not detected with the RT domain (RTΔ, lane 9) or Cas1 active site (E870A, lane 10) mutants, but a background of labeled products is apparent in the E870A lane, due to the RT activity of the protein in the absence of cleavage and ligation (see fig. S8). Labeled products were not detected in the absence of the RNA oligonucleotide (lanes 3 to 6) or CRISPR DNA (lanes 11 and 12). Separate lanes from the same gel (lanes 1 and 2) show the positions of cleavage-ligation products for RT-Cas1 plus Cas2 with an internally labeled CRISPR DNA substrate. “None” indicates no protein added.


We showed that the MMB-1 RT-Cas1 fusion protein can mediate the direct acquisition of spacers from donor RNA, using the Cas1 integrase activity to directly ligate an RNA protospacer into CRISPR DNA repeats. The 3′ end generated by cleavage of the opposite DNA strand is then poised for use as a primer for TPRT (26). This mechanism shares features with group II intron retrohoming, in which the intron RNA uses its ribozyme activity to insert itself directly into the host genome and is then converted to an intron cDNA by using the 3′ end generated by cleavage of the opposite DNA strand for TPRT (42). Because type III CRISPR systems are known to target RNA for degradation, and RT-Cas1–encoding genes are exclusively associated with such systems, RNA spacer acquisition makes these CRISPRs distinctively capable of generating immunity against parasitic RNA sequences, potentially including RNA phages and/or other “selfish” RNAs that maintain themselves through the action of host machinery (43–46). The acquisition of RNA spacers might also contribute to immune responses to highly transcribed regions of DNA phages and plasmids, by coupling spacers from such regions to an interference system that targets DNA, RNA, or both (15–21).

It is possible that fusion between the RT and Cas1 domains may not be necessary to facilitate uptake of RNA spacers; there are several examples of CRISPR loci in which genes encoding similar group II intron–like RTs are adjacent but not fused to cas1 (29). Thus, the mechanisms described herein could potentially extend to species with separately encoded RT and Cas1 components. In addition, RNA spacer acquisition could be involved in gene regulation, providing a straightforward means for bacteria to down-regulate a set of target loci in response to activation of the CRISPR locus.

To fully assess the prevalence and importance of CRISPR adaptation to RNA, a greater understanding of the impact of invasive RNAs in bacteria is necessary. However, our knowledge of the abundance and distribution of RNA phages and other RNA parasites is limited, with the vast majority restricted to the Escherichia and Pseudomonas genera. Future research on the distribution of spacers in RT-associated CRISPR loci among natural populations of bacteria and their environments might help shed light on this topic.

Materials and methods

RT-Cas1 genomic neighborhood analysis

The genomic neighborhoods (up to 20 kb) of RT-Cas1–encoding genes were retrieved from 50 bacterial strains with a custom BioPython script that uses the NCBI tblastn software. The HMMER 3.0 algorithm was then used to identify whether the RT-Cas1–encoding genes were associated with type I, II, or III CRISPR systems, using Cas3 (TIGR 01587, 01596, 02562, 02621, and 03158), Cas9 (TIGR 01865 and 03031), and Cas10 (TIGR 02577 and 02578) hidden Markov models as “signature” genes for each type, respectively (8). Each result was assessed manually by iterative runs of BLAST (Basic Local Alignment Search Tool, NCBI) and the CRISPRfinder online suite.

Monte Carlo simulation of expected spacer acquisition characteristics for random sampling of all genes

We used a Monte Carlo simulation to evaluate a null hypothesis based on random assortment of spacer acquisitions from genomic DNA, with no dependence on gene expression level. For each system, a series of samples of 500 spacers each were randomly chosen in silico from a list of all genes, based on the sizes of the individual genes using the stochastic universal sampling algorithm. Sets of 1000 such trials were used to generate a range of null relationships between gene expression and spacer acquisition. The Monte Carlo bounds (black dotted lines in Figs. 2 and 3 and figs. S2, S5, and S7) depict the envelope of such simulated random assortments. Traces above this envelope indicate preferential spacer acquisition from highly expressed genes; traces below the envelope indicate spacer acquisition from poorly expressed genes more often than expected by random chance. RNAseq data from E. coli K12 were obtained from (47) (data set without computational background subtraction). MMB-1 expression data were generated by RNAseq analysis of the transconjugants used in this study (Fig. 3).
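The null model described above can be illustrated with a short Python sketch. It is not the script used in the study: the function names, the per-trial shuffling of gene order, and the choice of summary statistic (cumulative fraction of spacers across genes ranked by expression) are assumptions made for illustration. Only the use of stochastic universal sampling weighted by gene size, 500 spacers per sample, and 1000 trials come from the text.

```python
import random


def stochastic_universal_sampling(weights, n_samples, rng):
    """Pick n_samples indices with probability proportional to weights,
    using evenly spaced pointers that share a single random offset."""
    total = float(sum(weights))
    if total <= 0 or n_samples <= 0:
        raise ValueError("need positive total weight and sample count")
    step = total / n_samples
    start = rng.random() * step          # offset in [0, step)
    picks = []
    cumulative = 0.0                     # total weight of items before index i
    i = 0
    for k in range(n_samples):
        pointer = start + k * step
        # Advance until the pointer falls inside item i's weight interval.
        while i < len(weights) - 1 and cumulative + weights[i] <= pointer:
            cumulative += weights[i]
            i += 1
        picks.append(i)
    return picks


def null_envelope(gene_lengths, expression, n_spacers=500, n_trials=1000, seed=0):
    """Return (lower, upper) bounds of the cumulative spacer fraction over
    genes ranked from lowest to highest expression, across random trials."""
    rng = random.Random(seed)
    n_genes = len(gene_lengths)
    ranked = sorted(range(n_genes), key=lambda g: expression[g])
    lower = [1.0] * n_genes
    upper = [0.0] * n_genes
    for _ in range(n_trials):
        # Assumption: shuffle gene order each trial so that the evenly
        # spaced pointers do not interact with any fixed ordering.
        perm = list(range(n_genes))
        rng.shuffle(perm)
        picks = stochastic_universal_sampling(
            [gene_lengths[g] for g in perm], n_spacers, rng)
        counts = [0] * n_genes
        for p in picks:
            counts[perm[p]] += 1
        running = 0
        for rank, g in enumerate(ranked):
            running += counts[g]
            fraction = running / n_spacers
            lower[rank] = min(lower[rank], fraction)
            upper[rank] = max(upper[rank], fraction)
    return lower, upper
```

An observed cumulative trace would be compared against the returned bounds; where it sits relative to the envelope then depends on how the genes are ranked and on which axis is plotted.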

Construction of expression vectors

Plasmids for inducible overexpression of the MMB-1 type III-B CRISPR operon in E. coli were built on the pBAD/Myc–His B backbone (Life Technologies). RT-Cas1–associated genes [Marme_0670, Marme_0669 (RT-Cas1), and Marme_0668 (Cas2)] and green fluorescent protein (GFP) were driven by Para, and the CRISPR03 array was driven by Ptrc. The other seven genes [Marme_0677 to _0672 (Cmr1 to Cmr6) and Marme_0671] and lacZα were driven by Plac. GFP and lacZα ORFs enabled verification of expression of the transcripts containing RT-Cas1–associated adaptation genes and Cmr effector genes, respectively. Point mutants of the Cas1 (E790A or E870A) and RT domains (YADD to YAAA at amino acid positions 530 to 533) of RT-Cas1 were tested with overexpression of the RT-Cas1–associated subset, with and without the remaining seven genes. Deletion mutants of the RT domain of RT-Cas1 (Δ299–588), and Cas2 [Δ32–92] were tested with overexpression of the RT-Cas1–associated subset only.

Plasmids for the overexpression of the RT-Cas1–associated genes in MMB-1 cells were built on the pKT230 backbone (a gift from L. Banta, Williams College). The genes were driven by the 100-bp promoter–containing sequence (positions 306,879 to 306,978) upstream of a MMB-1 16S rRNA gene. Cas1 point mutants (E790A or E870A) and the RTΔ mutant were also tested. For experiments with td intron–containing constructs, a copy of the CRISPR03 array with its leader sequence was also placed on the pKT230 vector to increase the concentration of CRISPR arrays per unit input DNA in the PCR amplification step, and thus increase the efficiency of our spacer detection assay.

Plasmids for protein expression and purification were built on the pMal-c2X backbone [New England Biolabs (NEB)] for RT-Cas1 (wild type and mutants) and on the pET14b backbone (Novagen) for Cas2. Variants of RT-Cas1 were expressed with an N-terminal maltose-binding protein tag attached via a noncleavable rigid linker (50). Cas2 was expressed with an N-terminal 6xHis tag.

All plasmids were verified by sequencing. Plasmid structures are available upon request.

Strains and culture conditions

All bacterial strains used in this study were stored in 20% glycerol at –80°C. Two clones from each conjugation were maintained for each plasmid (referred to as independent transconjugants).

pBAD plasmids (AmpR) encoding MMB-1 type III-B operon components were transformed into chemically competent TOP10F' cells (Life Technologies). TOP10F'-derived strains were grown at 37°C on Luria-Bertani (LB) agar plates (10 g/l tryptone, 5 g/l yeast extract, 10 g/l NaCl, 15 g/l agar) with 100 μg/ml of ampicillin, 0.1% w/v arabinose, and 0.1 mM IPTG (isopropyl-β-D-thiogalactopyranoside) overnight.

pKT230 plasmids (KanR) encoding MMB-1 type III-B operon components were mobilized into a spontaneous rifampicin-resistant mutant of MMB-1 (strain ATCC 700492) from a donor E. coli strain carrying the pRL443 conjugal plasmid (a gift from M. Davison, Carnegie Institution), as described in (51). All transformed MMB-1 strains were grown on 2216 marine agar (Difco) with 50 μg/ml of kanamycin for 16 hours at 25°C.

For experiments with MMB-1 transconjugants carrying td intron constructs, 150-ml cultures were subsequently prepared in 2216 broth (Difco) with 50 μg/ml of kanamycin and shaken at 26° to 27°C in 1-liter flasks for 20 hours before midiprep.

E. coli strain DH5α (Life Technologies) was used for cloning, and Rosetta2 and Rosetta2(DE3) (Novagen) were used for protein expression. Bacteria were grown in LB medium with shaking at 200 rpm. Antibiotics were added when needed (ampicillin, 100 mg/l; chloramphenicol, 25 mg/l).

Nucleic acid extraction

Plasmid DNA from E. coli strains was extracted using the QIAprep Spin Miniprep Kit (QIAGEN). Genomic DNA from MMB-1 strains was extracted using a modified SDS–proteinase K method: Briefly, cells were scraped from plates and resuspended in 1 ml of lysis buffer (10 mM tris, 10 mM EDTA, 400 μg/ml proteinase K, 0.5% SDS) and incubated at 55°C for 1 hour. Batches of the digest (50 to 100 μl) were subsequently purified using the Genomic DNA Clean & Concentrator Kit (Zymo Research).

Total RNA was extracted from MMB-1 strains using a combined trizol–RNeasy method: Briefly, cells were scraped from plates and homogenized directly in 1 ml of trizol (Life Technologies) by vortexing, and total RNA was extracted with 200 μl of chloroform. Ethanol (500 μl) was added to an equal volume of the aqueous phase containing RNA, and the mixture was purified using the RNeasy Kit (QIAGEN) with on-column DNase digestion according to the manufacturer’s instructions. This protocol selects RNA >200 nt and thus depletes transfer RNAs.

Plasmid DNA was purified from large MMB-1 cultures using a custom midiprep method. Cells were harvested from 150- to 200-ml confluent cultures (3000g, 30 min, 4°C) and homogenized in 12 ml of alkaline lysis buffer (40 mM glucose, 10 mM tris, 4 mM EDTA, 0.1 N NaOH, 0.5% SDS) at 37°C by pipetting until clear (10 to 15 min). Chilled neutralization buffer (8 ml) was added (3 M CH3COOK, 2 M CH3COOH), and lysates were immediately transferred to ice to prevent digestion of genomic DNA. Samples were mixed by inverting, and the genomic DNA–containing precipitate was removed by centrifugation (20,000g, 20 min, 4°C). Clarified lysates were extracted twice with a 1:1 mixture of tris-saturated phenol (Life Technologies) and CHCl3 (Fisher Scientific) and once with CHCl3 in heavy phase lock gel tubes (5 Prime). Ethanol (50 ml) was added and DNA was pelleted by centrifugation (16,000g, 20 min, 4°C), washed twice in 80% ethanol, and resuspended in 500 μl of elution buffer (10 mM tris, pH 8.5). Samples were treated with 20 μg/ml RNase A (Life Technologies) at 37°C for 30 min, further digested with 150 μg/ml of proteinase K in 0.5% SDS at 50°C for 30 min, and purified by organic extraction. Plasmid DNA was resuspended in 0.5 ml of elution buffer, desalted with Illustra NAP-5 G-25 Sephadex columns (GE Healthcare), and eluted with 1 ml of water. Batches of 100 μl were linearized with PvuII-HF (NEB) to aid denaturation during PCR. Last, each digest was purified using a Genomic DNA Clean & Concentrator column (Zymo Research).

DNA and RNA preparations were quantified using a fluorometer (Qubit 2.0, Life Technologies).

Spacer sequencing

Leader-proximal spacers were amplified by PCR from 3 to 4 ng of genomic DNA per μl of PCR mix using forward primer AF-SS-119 (CGACGCTCTTCCGATCTNNNNNCTGAAATGATTGGAAAAAATAAGG) anchored in the leader sequence and reverse primer AF-SS-121 (ACTGACGCTAGTGCATCACGTGGCGGAGATCTTTAA) in the first native spacer. For each sample, 96 10-μl reactions were pooled. Sequencing adaptors were then attached in a second round of PCR with 0.5 μl of the previous reaction as a template in a 50-μl reaction, using AF-SS-44:55 (CAAGCAGAAGACGGCATACGAGAT NNNNNNNN GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCACTGACGCTAGTGCATCA) and AF-KLA-67:74 (AATGATACGGCGACCACCGAGATCTACAC NNNNNNNN ACACTCTTTCCCTACACGACGCTCTTCCGATCT), where the (N)8 barcodes correspond to reverse-complemented TruSeq HT indexes D701 to D712 and D501 to D508, respectively (Illumina). Template matching regions in primers are underlined. Phusion High-Fidelity PCR Master Mix with HF Buffer (Fisher Scientific) was used for all reactions. Cycling conditions for round 1 were as follows: one cycle at 98°C for 1 min; two cycles at 98°C for 10 s, 50°C for 20 s, and 72°C for 30 s; 24 cycles at 98°C for 15 s, 65°C for 15 s, and 72°C for 30 s; and one cycle at 72°C for 9 min. Conditions for round 2 were one cycle at 98°C for 1 min; two cycles at 98°C for 10 s, 54°C for 20 s, and 72°C for 30 s; five cycles at 98°C for 15 s, 70°C for 15 s, and 72°C for 30 s; and one cycle at 72°C for 9 min. The dominant amplicons containing the first native spacer from unmodified CRISPR templates after rounds 1 and 2 were 123 bp and 241 bp, respectively. We prepared sequencing libraries by blind excision of gel slices at 300 to 320 bp (70 bp above the 241-bp band, consistent with the expected size of an amplicon from an expanded CRISPR array) after agarose electrophoresis (3%, 4.2 V/cm, 2 hours) of the round 2 amplicons.

When amplifying spacers from plasmids, 1 ng of DNA was used per microliter of PCR mix, synthesis time was shortened to 15 s, and 20 and nine cycles were used in rounds 1 and 2 instead of 24 and five, respectively. Additionally, round 1 amplicons were purified by blind excision of gel slices at 180 to 200 nt after denaturing PAGE (polyacrylamide gel electrophoresis) [pre-run TBE-Urea 10% gels (Novex), 180 V, 80 min in XCell SureLock Mini-Cells (Life Technologies)], and agarose gel–purified libraries were further PAGE-purified by blind excision of gel slices at 300 to 320 nt (pre-run TBE-Urea 6% gels, 180 V, 90 min as above). In this way, spacer detection efficiency was increased ~100-fold. Libraries were quantified by Qubit and sequenced with MiSeq v3 kits (Illumina) (150 cycles, read 1; 8 cycles, index 1; and 8 cycles, index 2).

Spacers were trimmed from reads using a custom Python script and considered identical if they differed by only one nucleotide. Protospacers were mapped using Bowtie 2.0 (“--very-sensitive-local” alignments). These methods preserve strand information.
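The rule for treating near-identical spacers as one can be sketched as follows. This is an illustrative reconstruction, not the custom script used in the study: it assumes that "differed by only one nucleotide" means a single substitution between equal-length sequences, and it uses a greedy pass that merges each spacer into the most abundant compatible representative. The function names are hypothetical.

```python
from collections import Counter


def differs_by_at_most_one(a, b):
    """True if a and b have equal length and at most one mismatch."""
    if len(a) != len(b):
        return False
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > 1:
                return False
    return True


def collapse_spacers(spacers):
    """Greedily merge spacers into representatives, most abundant first.
    Returns a dict mapping representative sequence -> total read count."""
    counts = Counter(spacers)
    representatives = {}
    # Sort by descending count, then alphabetically for a stable order.
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for rep in representatives:
            if differs_by_at_most_one(seq, rep):
                representatives[rep] += n
                break
        else:
            representatives[seq] = n
    return representatives
```

For example, `collapse_spacers(["ACGT", "ACGT", "ACGA", "TTTT"])` groups the single-mismatch variant `ACGA` with `ACGT` and keeps `TTTT` separate.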

Directional RNAseq profiling of MMB-1 strains

Total RNA (1 μg) was incubated at 95°C in alkaline fragmentation buffer (2 mM EDTA, 10 mM Na2CO3, 90 mM NaHCO3; pH ~9.3) for 45 min and PAGE-purified [pre-run 15% TBE-Urea precast gels, 200 V, 45 min in Mini-PROTEAN electrophoresis cells (Bio-Rad)] to select 30- to 80-nt fragments. RNA fragments were 3′-dephosphorylated with T4 polynucleotide kinase (NEB) at 37°C for 60 min in the supplied buffer, then desalted by ethanol precipitation. Dephosphorylated RNA was denatured again in adenylated ligation buffer [3.3 mM dithiothreitol (DTT), 10 mM MgCl2, 10 μg/ml acetylated BSA, 8.3% glycerol, 50 mM HEPES-KOH; pH ~8.3] for 1 min at 98°C and ligated to preadenylated adaptor AF-JA-34 (/5rApp/NNNNNNAGATCGGAAGAGCACACGTCT/3ddC/) at 22°C for 4 hours using 10 U T4 RNA Ligase I (NEB). The (N)6 barcode for each RNA fragment allowed us to computationally collapse PCR bias. Excess adaptor was removed by treatment with 5′ deadenylase (NEB) followed by RecJf (NEB) treatment and organic extraction to purify ligation products. RNA was reverse-transcribed using primer AF-JA-126 (/5Phos/AGATCGGAAGAGCGTCGTGT/iSp18/CACTCA/iSp18/GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT) with SuperScript II (Life Technologies) and subsequently hydrolyzed in 0.2 M NaOH at 70°C for 15 min. cDNA was PAGE-purified (pre-run 10% TBE-urea gels, 200 V, 45 min in Mini-PROTEAN electrophoresis cells) to select 90- to 150-nt fragments and circularized with 50 U CircLigase I (Epicentre). Libraries were prepared by six to 14 cycles of PCR with universal adaptor AF-JA-158 (AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT) and indexing primers AF-JA-118:125 (CAAGCAGAAGACGGCATACGAGAT NNNNNN GTGACTGGAGTTCAGACGTGTGCTCTTCCG) where the (N)6 barcodes correspond to TruSeq LT indexes AD001 to AD008 (Illumina). Amplicons of 160 to 200 bp were gel-purified by agarose electrophoresis.
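The way a random (N)6 barcode lets PCR duplicates be collapsed can be shown with a minimal Python sketch. The record layout (gene, mapped position, strand, barcode) and the function name are hypothetical and chosen only for illustration; the study's actual pipeline is not described at this level of detail. The idea is that reads sharing both a mapping coordinate and a barcode are counted once, as copies of a single ligated RNA fragment.

```python
def count_unique_fragments(reads):
    """reads: iterable of (gene, position, strand, barcode) tuples.
    Returns a dict mapping gene -> number of distinct fragments, where
    reads with the same position, strand, and barcode count once."""
    seen = set()
    per_gene = {}
    for gene, position, strand, barcode in reads:
        key = (gene, position, strand, barcode)
        if key in seen:
            continue  # same molecule amplified more than once
        seen.add(key)
        per_gene[gene] = per_gene.get(gene, 0) + 1
    return per_gene
```

Two reads at the same coordinate with different barcodes are kept as separate molecules, which is what distinguishes this from plain position-based deduplication.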

Construction and validation of td intron constructs

Constructs with the following features were ordered as gBlocks (Integrated DNA Technologies) and cloned downstream of the T7 promoter in pCR-Blunt II-TOPO (Life Technologies). Bases 208 to 216 (CTTAAGCGT) of the ribosomal protein S15 gene (Marme_0982) and bases 67 to 75 (CGTAAATCC) of the ssrA tmRNA gene (Marme_R0008) were replaced with the wild-type td intron splice junction (CTTGGGT|CT). The 393-bp intron sequence was inserted at the exon junction ‘|’. Included were 128 bp of upstream sequence for Marme_0982 and 183 bp of upstream sequence and 30 bp of downstream sequence for Marme_R0008. Transcripts were generated from linearized plasmids using the MEGAscript T7 Transcription kit (Life Technologies). Mostly unspliced RNA was obtained by arresting the transcription reaction after 5 min at 37°C and subsequently extracted with acidified phenol:CHCl3 (Life Technologies). One-third of the reaction product was incubated in a splicing buffer (40 mM tris, pH 7.5; 6 mM MgCl2; 100 mM KCl; 1 mM ribo-GTP) at 37°C for 30 min and desalted by ethanol precipitation. Spliced and unspliced transcripts were visualized by 1/4x tris-acetate-EDTA native agarose gel electrophoresis, with a 100-bp Quickload dsDNA ladder (NEB) providing approximate sizing. Intron-containing genes were then transferred to pKT230-derived MMB-1 overexpression vectors carrying RT-Cas1–associated genes and a copy of the CRISPR03 array. One clone each from two independent conjugations was isolated for each vector.

In vivo splicing efficiency was measured by high-throughput sequencing as follows. Total RNA was extracted and 1 μg was reverse-transcribed (SuperScript III, high GC content protocol; Life Technologies) with gene-specific primers downstream of the splice junctions that would bind both spliced and unspliced transcripts: AF-SS-238 (CTTAGCGACGTAGACCTAGTTTTT) for Marme_0982 and AF-SS-241 (GGTTATTAAGCTGCTAAAGCGTAG) for Marme_R0008. cDNA was treated with RNase H, and libraries were prepared by a two-round PCR method adapted from the CRISPR spacer sequencing method described above. Round 1 of PCR was performed at annealing temperatures of 48° and 65°C for two and 19 cycles, respectively, with primers AF-SS-242 (CGACGCTCTTCCGATCTNNNNNGATTCGCATGGTAAAC) and AF-SS-243 (ACTGACGCTAGTGCATCAAACTAGTGTAACGTGCTG) for Marme_0982, and for two and 16 cycles, respectively, with primers AF-SS-247 (CGACGCTCTTCCGATCTNNNNNCACGAACCTGAGGTG) and AF-SS-248 (ACTGACGCTAGTGCATCACGTCGTTTGCGACTATATAATTGA) for Marme_R0008. This approach simultaneously generated amplicons of identical length for both spliced and unspliced transcripts, which were then attached to flowcell adaptors (Illumina) with a second round of PCR as before.

The presence of exon-junction sequences corresponding to the td intron constructs in DNA form outside the CRISPR arrays was also tested by high-throughput sequencing. Libraries consisting of the ~100-bp region containing the td intron insertion sites in Marme_R0008 and Marme_0982 were prepared by a two-round PCR method identical to the one described above for measuring splicing efficiency by RT-PCR, using 100 ng of genomic DNA (~2 × 10⁷ copies) as a template instead of reverse-transcribed cDNA. Round 1 of PCR was performed at annealing temperatures of 57°C and 68°C for two and 16 cycles, respectively, with primers AF-SS-318 (CGACGCTCTTCCGATCTNNNNNCACATTCATGACCACCATTCTCG) and AF-SS-309 (ACTGACGCTAGTGCATCACTTCGGTCTTAGCGACGTAGAC) for Marme_0982 and primers AF-SS-310 (CGACGCTCTTCCGATCTNNNNNGGGGTGACATGGTTTCGACG) and AF-SS-311 (ACTGACGCTAGTGCATCAGCAGGTTATTAAGCTGCTAAAGCG) for Marme_R0008. The amplicons were then attached to flowcell adaptors (Illumina) with a second round of PCR as before. Each library was sequenced to a depth of ~5 million reads. To ensure that the PCR was not bottlenecked, we also included a spike-in (1 molecule per 1000 copies of the MMB-1 genome) of synthetic ssDNA templates, AF-SS-312 (TAAAAACATTGAAGGTCTACAAGGTCACTTTAAAGCTCACATTCATGACCACCATTCTCGTCGCNNNNNNNNNNNNATGGTAAACCAACGTCGTAAGTTGTTGGATTACCAGCTGCGTAAAGACGCAGCACGTTACACTAGTTTGANNNNNNNNNNNNGTCTACGTCGCTAAGACCGAAG) for Marme_0982 and AF-SS-313 (GGGGTGACATGGTTTCGACGNNNNNNNNNNNNCCTGAGGTGCATGTCGAGAGTGATACGTGATCTCAGCTGTCCCCTCGTATCAATTATATAGTCGCAAANNNNNNNNNNNNCGCTTTAGCAGCTTAATAACCTGCTAGTGTGCTGCCCTCAGGTTGCTTGTAGCCCGAGATTCCGCAGT) for Marme_R0008, that could be amplified concomitantly by the same primer sets to yield identically sized amplicons.

The spike-in–derived reads are easily identified by sequence, with the diversity of randomized (N)12 segments used to evaluate the degree to which distinct reads in the amplified pool represent independent molecules from the pre-amplification mixture. A large number of spike-in barcodes (ideally a different barcode for every spike-in read) indicate that a high fraction of reads from the amplified pool represent unique molecules in the initial sample, whereas repeated appearances of a small number of (N)12 barcodes in the amplified pool would be indicative of bottleneck formation during PCR (and hence a less than optimal relationship between read counts and molecules in the initial pool). For the purpose of estimating the number of molecules sampled from an initial pool, we calculated a nonredundancy fraction, which is the ratio of spike-in–derived barcodes to total spike-in–derived reads. The nonredundancy fraction provides a multiplier that can be used to correct raw read counts from an amplified pool to obtain an estimate of the contributing number of molecules from the initial pool. This is particularly applicable for estimating a minimal incidence of a rare class (i.e., setting a detection limit for spliced copies of the td intron–containing DNA constructs in this work). Given nonredundancy fractions of >0.45 for all samples in these experiments, the observed totals of control (nonspliced, genomic) sequence reads (fig. S6C) would have been sufficient to detect the presence of extended spliced td intron–containing DNA molecules, even at the low incidence of 10⁻⁶.
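The nonredundancy fraction and its use as a read-count multiplier can be written out as a small Python sketch. The function names are hypothetical, and the detection criterion (at least one expected molecule of the rare class among the molecules sampled) is an assumption used here to make the arithmetic concrete; the text does not state the exact criterion applied.

```python
def nonredundancy_fraction(spike_in_barcodes):
    """Ratio of distinct spike-in barcodes to total spike-in reads.
    spike_in_barcodes: one (N)12 barcode string per spike-in read."""
    barcodes = list(spike_in_barcodes)
    if not barcodes:
        raise ValueError("no spike-in reads supplied")
    return len(set(barcodes)) / len(barcodes)


def estimated_molecules(raw_read_count, fraction):
    """Correct a raw read count to an estimated number of input molecules."""
    return raw_read_count * fraction


def could_detect(raw_read_count, fraction, incidence):
    """Assumed criterion: the expected number of rare-class molecules
    among those sampled is at least one."""
    return estimated_molecules(raw_read_count, fraction) * incidence >= 1.0
```

As a rough illustration, a library of about 5 million reads (the approximate sequencing depth stated above, used here as a stand-in for the control read totals in fig. S6C) with a nonredundancy fraction of 0.45 corresponds to about 2.25 million sampled molecules, so a class present at an incidence of 10⁻⁶ would be expected about twice.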

The same cultures of MMB-1 were used to assess both splicing efficiency and the presence of exon-junction sequences in DNA form.

PCR fidelity

Analyzing sequence distributions through PCR and sequencing entails certain best practices in terms of both experimental protocols and analysis. In particular, several precautions were observed in constructing sequencing libraries for spacer sequencing. PCR titrations were performed to ensure that the amplification kinetics were in the linear range of the reactions before any size selection step (e.g., band excision from native agarose gels); this avoids renaturation artifacts in complex sequence pools. The overall error rate was empirically determined for every experiment by analyzing the distribution of mismatches in the sequences obtained from the first native spacer in the CRISPR03 array; this enabled the estimation of the error rate in the region of the sequencing reads that contained newly acquired spacers. PCR bottlenecking was also measured as the number of repeat occurrences of any given new spacer. All synthetic sequences that could lead to confounding contamination issues were avoided: No sequences from E. coli, MMB-1, or other sources have been synthesized as amplifiable substrates. As a benchmark for recovery of individual sequences, a nonbacterial sequence was synthesized as a spacer flanked by the appropriate CRISPR repeats. This repeat-flanked spacer sequence (CTGGGACATATAATATCGTCCCCGTAGATGCCTAT; a segment of the phage MS2) was recovered effectively in experiments with an E. coli transformant carrying a plasmid with the indicated template. Appearances of MS2 sequences in other trials were limited to this single sequence, indicating a likely source due to a low level of cross-sample “bleeding.”

Protein purification

Expression plasmids were transformed into E. coli strains Rosetta2 (pMal derivatives) or Rosetta2(DE3) (pET14b derivatives), and single transformed colonies were grown in LB medium supplemented with appropriate antibiotics overnight at 37°C with shaking. Six flasks each containing 1 liter LB were inoculated with 1% of the overnight culture and grown at 37°C with shaking to log phase. After the culture reached an optical density at 600 nm of ~0.8, IPTG was added to 1 mM final concentration and the cultures were incubated at 19°C for 20 to 24 hours. Cells were harvested by centrifugation, and the pellet was dissolved in A1 buffer (25 mM KPO4, pH 7; 500 mM NaCl; 10% glycerol; 10 mM β-mercaptoethanol; 10 ml/g cell paste) on ice. Lysozyme was added to 1 mg/ml final concentration and incubated at 4°C for 0.5 hours. Cells were then sonicated (Branson Sonifier 450; three bursts of 15 s each with 15 s between each burst). The lysate was cleared by centrifugation (29,400g, 25 min, 4°C), and polyethyleneimine (PEI) was added to the supernatant in six steps on ice with stirring to a final concentration of 0.4%. After 10 min, precipitated nucleic acids were removed by centrifugation (29,400g, 25 min, 4°C), and proteins were precipitated from the supernatant by adding ammonium sulfate to 60% saturation on ice and incubating for 30 min. Proteins were collected by centrifugation (29,400g, 25 min, 4°C), dissolved in 20 ml A1 buffer, and filtered through a 0.45-μm polyethersulfone membrane (Whatman Puradisc).

Protein purification was achieved by using a BioLogic fast protein liquid chromatography system (Bio-Rad). RT-Cas1 was purified by loading the filtered crude protein onto an amylose column (30 ml; NEB Amylose High Flow resin), washing with 50 ml of A1 buffer, followed by 30 ml A1 plus 1.5 M NaCl and 30 ml of A1 buffer. Bound proteins were eluted with 50 ml of 10 mM maltose in A1 buffer. Fractions containing RT-Cas1 were identified by SDS-PAGE, pooled, and diluted to 250 mM NaCl. The protein was then loaded onto a 5-ml heparin-Sepharose column (HiTrap Heparin HP column; GE Healthcare) and eluted with a 0.1- to 1-M NaCl gradient. Peak fractions (~700 mM NaCl) were identified by SDS-PAGE, pooled, and dialyzed into A1 buffer. The dialyzed protein was concentrated to >10 μM using an Amicon Ultra Centrifugal Filter (Ultracel-50K). The protein was stable in A1 buffer on ice for about 3 months.

The initial steps in the Cas2 purification were similar, except that the cell paste was resuspended in N1 buffer (25 mM tris-HCl, pH 7.5; 500 mM KCl; 10 mM imidazole; 10% glycerol; 10 mM DTT) and the ammonium sulfate precipitation step was omitted. Instead, the Cas2 PEI supernatant was loaded directly onto a 5-ml nickel column (HiTrap Nickel HP column; GE Healthcare) and eluted with an imidazole gradient (60 ml 10 to 500 mM in N1 buffer). Peak fractions containing Cas2 were identified by SDS-PAGE and pooled. After adjusting the KCl concentration to 200 mM, the pooled fractions were loaded onto two 5-ml heparin-Sepharose columns arranged in tandem. The protein was eluted with a linear KCl gradient (50 ml, 100 mM to 1 M), and Cas2 peak fractions (~800 mM KCl) were identified by SDS-PAGE and stored on ice in elution buffer. The protein was stable on ice for several months.

All protein concentrations were measured using the Qubit Protein assay kit (Life Technologies) according to the manufacturer’s protocol. Proteins were >80% pure based on densitometry.

Formation of RT-Cas1–Cas2 complex

Purified RT-Cas1 (2500 pmol) was mixed with a twofold excess of purified Cas2 in 250 mM KCl, 250 mM NaCl, 12.5 mM tris-HCl (pH 7.5), 12.5 mM KPO4 (pH 7), 5 mM DTT, 5 mM β-mercaptoethanol (BME), and 10% glycerol and incubated on ice for >16 hours prior to reactions.

RT assay

RT assays with poly(rA)/oligo(dT)24 were performed by preincubating poly(rA)/oligo(dT)24 (80 μM and 50 μM, respectively) in 200 mM KCl, 50 mM NaCl, 10 mM MgCl2, and 20 mM tris-HCl (pH 7.5); 1 mM unlabeled deoxythymidine triphosphate (dTTP); and 5 μCi [α-32P]-dTTP (3000 Ci/mmol; PerkinElmer) for 2 min at the desired temperature, then initiating the reaction by adding the RT-Cas1 proteins (1 to 2 μM final concentration). The reactions (20 to 30 μl) were incubated for times up to 30 min. A 3-μl sample was withdrawn at each time point and added to 10 μl of stop solution (0.5% SDS, 25 mM EDTA). Reaction products were spotted onto Whatman DE81 paper (10- × 7.5-cm sheets; GE Healthcare Biosciences), which was then washed three times with 0.3 M NaCl and 0.03 M sodium citrate, dried, and scanned with a PhosphorImager (Typhoon Trio Variable Mode Imager; GE Healthcare Biosciences) to quantify the bound radioactivity.

CRISPR DNA cleavage/ligation assay

MMB-1 CRISPR DNA substrate was a PCR product amplified with primers MMB1crisp5b (CACTCGACCGGAATTATCGACGAA) and MMB1crisp3 (TCTGAAACTCTGAATACTAACGAAAAATAG) using Phusion High-Fidelity DNA polymerase according to the manufacturer’s protocol (NEB or Thermo Scientific). The resulting 268-bp PCR fragment contains 120 bp of the leader, 35 bp of repeat 1, 33 bp of spacer 1, 35 bp of repeat 2, 37 bp of spacer 2, and 8 bp of repeat 3. Internally labeled substrate was prepared by adding 25 μCi [α-32P]-dTTP or dCTP (PerkinElmer) and 40 μM dTTP or dCTP, respectively, to the PCR reactions. Labeled DNA was purified by electrophoresis in a native 6% polyacrylamide gel, cutting out the labeled band, and electroeluting the DNA using midi D-Tube dialyzer cartridges (Novagen). The eluted DNA was extracted with phenol:chloroform:isoamyl alcohol (phenol-CIA), ethanol-precipitated, and quantitated using a Qubit dsDNA assay kit (Life Technologies).
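The segment lengths given for the PCR fragment can be laid out as 1-based coordinates with a short Python sketch. The segment lengths come from the text; the coordinate convention (position 1 at the leader-distal end of the fragment) and the function name are assumptions made for illustration.

```python
# Segment lengths (bp) as stated in the text, in order along the fragment.
SEGMENTS = [
    ("leader", 120),
    ("repeat 1", 35),
    ("spacer 1", 33),
    ("repeat 2", 35),
    ("spacer 2", 37),
    ("repeat 3 (partial)", 8),
]


def layout(segments):
    """Return (name, start, end) tuples using 1-based inclusive coordinates."""
    coords = []
    start = 1
    for name, length in segments:
        end = start + length - 1
        coords.append((name, start, end))
        start = end + 1
    return coords


coords = layout(SEGMENTS)
total_bp = coords[-1][2]  # end coordinate of the last segment

for name, start, end in coords:
    print(f"{name:<20}{start:>4} to {end:>3}")
print(f"total length: {total_bp} bp")
```

The segment lengths sum to 268 bp, matching the stated fragment size, and the leader-repeat 1 junction falls between positions 120 and 121 in this convention.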

CRISPR DNA cleavage-ligation assays contained RTCas1–Cas2 complex (500 nM final), MMB-1 CRISPR substrate (1 nM), 20 mM tris (pH 7.5), and 7.5 mM free MgCl2. DNA or RNA oligonucleotides and an equimolar solution of dNTPs and Mg2+ were added at 2.5 μM and 1 mM final concentrations, respectively, as indicated for individual experiments. Reactions were incubated at 37°C for 1 hour and stopped by adding phenol-CIA. The supernatant was mixed at a 2:1 ratio with loading dye (90% formamide, 20 mM EDTA, and 0.25 mg/ml bromophenol blue and xylene cyanol), and nucleic acids were analyzed in a 6% polyacrylamide 7 M urea gel. Gels were dried and scanned with a phosphorimager.
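For readers setting up a comparable reaction, the final concentrations above can be converted to pipetting volumes with the usual C1V1 = C2V2 relation. In the Python sketch below, only the final concentrations are taken from the text; the 20-μl reaction volume and every stock concentration are hypothetical placeholders, and the function name is illustrative.

```python
def stock_volume_ul(final_conc, stock_conc, reaction_ul):
    """Volume of stock needed (C1*V1 = C2*V2). Both concentrations must
    be expressed in the same unit."""
    if stock_conc < final_conc:
        raise ValueError("stock is more dilute than the requested final concentration")
    return final_conc * reaction_ul / stock_conc


REACTION_UL = 20.0  # hypothetical reaction volume, not stated in the text

# name: (final concentration from the text, HYPOTHETICAL stock concentration)
COMPONENTS = {
    "RTCas1-Cas2 complex (nM)": (500.0, 5000.0),
    "CRISPR substrate (nM)": (1.0, 20.0),
    "tris pH 7.5 (mM)": (20.0, 200.0),
    "MgCl2 (mM)": (7.5, 75.0),
}

volumes = {
    name: stock_volume_ul(final, stock, REACTION_UL)
    for name, (final, stock) in COMPONENTS.items()
}
water_ul = REACTION_UL - sum(volumes.values())

for name, vol in volumes.items():
    print(f"{name:<28}{vol:>5.1f} ul")
print(f"{'water':<28}{water_ul:>5.1f} ul")
```

Salt and glycerol carried over from the protein storage buffer would also contribute to the final composition and are not accounted for here.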

Labeled DNA or RNA oligonucleotide ligation assays were performed as described above but using 22.5 μM unlabeled CRISPR PCR fragment and ~0.25 μM 5′-end–labeled gel-purified oligonucleotides. Control assays were performed without adding CRISPR PCR fragment. For nuclease treatment of oligonucleotide ligation to CRISPR DNA, reactions were scaled up fourfold, treated with phenol-CIA, and ethanol-precipitated. The precipitated nucleic acids were dissolved in 30 μl of water. Equal amounts were then either left untreated or treated with RNase H (2 units, Invitrogen), DNase I (RNase-free, 10 units, Roche), and RNase A/T1 mix [0.5 μg RNase A (Sigma) and 500 units RNase T1 (Ambion)] in 40 mM tris (pH 7.9), 10 mM NaCl, 6 mM MgCl2, and 1 mM CaCl2 for 20 min at 37°C. Samples were extracted with phenol-CIA to terminate the reaction and analyzed by electrophoresis in a denaturing polyacrylamide gel, as described above.

Labeled cDNA extension reactions were carried out as above but using cold CRISPR DNA and oligonucleotides with 0.25 mM unlabeled dATP, dGTP, and dTTP and 5 μCi [α-32P]-dCTP (3000 Ci/mmol, PerkinElmer).


Supplementary Materials

Figs. S1 to S10

Tables S1 and S2

Reference (52)


  1. Of the two RT-Cas1–associated type III-B CRISPR arrays in this system, CRISPR03 was chosen for spacer acquisition assays, because the other array (CRISPR02) has unusual truncated repeats at the leader-proximal end (1).
  2. One potential contributor to increased spacer acquisition frequency (Fig. 2C) after RT deletion could be the higher growth rate that was observed for the cells expressing the RTΔ mutant.
Acknowledgments: We thank J. Shor, S. Cohen, M. Bagdasarian, C. Pourcel, L. Mindich, C.P. Wolk, M. Poranen, and laboratory colleagues for help and advice; the Howard Hughes Medical Institute and Stanford for fellowship support (S.S.); NIH (grants R01-GM37706 to A.Z.F., R01-GM37949 to A.M.L., and R01-GM37951 to A.M.L.); and the Welch Foundation (grant F-1607 to A.M.L.). Sequencing data are archived in the Sequence Read Archive under SRA ID-SRP066108. S.S. and A.Z.F. conceived the project. S.S., G.M., A.M.L., and A.Z.F. designed experiments, analyzed data, and wrote the paper with inputs from other authors. S.S. performed all genetics experiments. G.M., D.J.S., and L.M.M. performed all biochemistry experiments. A.S.-A. and D.B. provided protocols and conceptual guidance.
