Report

Programmed DNA destruction by miniature CRISPR-Cas14 enzymes


Science  16 Nov 2018:
Vol. 362, Issue 6416, pp. 839-842
DOI: 10.1126/science.aav4294

A programmable type of CRISPR system

CRISPR-Cas9 systems have been causing a revolution in biology. Harrington et al. describe the discovery and technological implementation of an additional type of CRISPR system based on an extracompact effector protein, Cas14. Metagenomics data, particularly from uncultivated samples, uncovered the CRISPR-Cas14 systems containing all the components necessary for adaptive immunity in prokaryotes. At half the size of class 2 CRISPR effectors, Cas14 appears to target single-stranded DNA without class 2 sequence restrictions. By leveraging this activity, a fast and high-fidelity nucleic acid detection system enabled detection of single-nucleotide polymorphisms.

Science, this issue p. 839

Abstract

CRISPR-Cas systems provide microbes with adaptive immunity to infectious nucleic acids and are widely employed as genome editing tools. These tools use RNA-guided Cas proteins whose large size (950 to 1400 amino acids) has been considered essential to their specific DNA- or RNA-targeting activities. Here we present a set of CRISPR-Cas systems from uncultivated archaea that contain Cas14, a family of exceptionally compact RNA-guided nucleases (400 to 700 amino acids). Despite their small size, Cas14 proteins are capable of targeted single-stranded DNA (ssDNA) cleavage without restrictive sequence requirements. Moreover, target recognition by Cas14 triggers nonspecific cutting of ssDNA molecules, an activity that enables high-fidelity single-nucleotide polymorphism genotyping (Cas14-DETECTR). Metagenomic data show that multiple CRISPR-Cas14 systems evolved independently and suggest a potential evolutionary origin of single-effector CRISPR-based adaptive immunity.

Competition between microbes and viruses stimulated the evolution of CRISPR-based adaptive immunity to provide protection against infectious agents (1, 2). In class 2 CRISPR-Cas systems, a single 100- to 200-kDa CRISPR-associated (Cas) protein with multiple functional domains carries out RNA-guided binding and cutting of DNA or RNA substrates (3, 4). To determine whether simpler, smaller RNA-guided proteins occur in nature, we queried terabase-scale metagenomic datasets (5–9) for uncharacterized genes proximal to both a CRISPR array and cas1, the gene that encodes the universal CRISPR integrase (10, 11). This analysis identified a diverse family of CRISPR-Cas systems that contain cas1, cas2, cas4, and a previously unrecognized gene, cas14, encoding a 40- to 70-kDa polypeptide (Fig. 1A). We initially identified 24 different cas14 gene variants that cluster into three subgroups (Cas14a, Cas14b, and Cas14c) on the basis of comparative sequence analysis (Fig. 1, A and B, and figs. S1 and S2). Cas14 proteins are ~400 to 700 amino acids (aa), about half the size of previously known class 2 CRISPR RNA-guided enzymes (950 to 1400 aa) (Fig. 1, C and D). Though the identified Cas14 proteins exhibit considerable sequence diversity, all are united by the presence of a predicted RuvC nuclease domain, whose organization is characteristic of type V CRISPR-Cas DNA-targeting enzymes (Fig. 1D) (3, 12, 13).
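The correspondence between the quoted lengths (400 to 700 aa) and masses (40 to 70 kDa) can be checked with a back-of-the-envelope conversion. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb that is not a value taken from this report; the function name is illustrative.

```python
# Average mass of one amino acid residue in a protein, in daltons.
# ASSUMPTION: ~110 Da is a rule of thumb, not a figure from the report.
AVG_RESIDUE_MASS_DA = 110.0


def approx_mass_kda(n_residues: int) -> float:
    """Rough protein mass in kDa estimated from its length in residues."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


# Length ranges quoted in the text.
RANGES_AA = {
    "Cas14": (400, 700),
    "previously known class 2 effectors": (950, 1400),
}

for label, (shortest, longest) in RANGES_AA.items():
    low = approx_mass_kda(shortest)
    high = approx_mass_kda(longest)
    print(f"{label}: {shortest}-{longest} aa is roughly {low:.0f}-{high:.0f} kDa")
```

The estimate is coarse, but it lands in the same range as the masses quoted above and shows the roughly twofold size difference between the two groups.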

Fig. 1 Architecture and phylogeny of CRISPR-Cas14 genomic loci.

(A) Phylogenetic tree of type V CRISPR systems. Newly identified miniature CRISPR systems are highlighted in orange. (B) Representative loci architectures for C2c10 and CRISPR-Cas14 systems. (C) Length distribution of Cas14a to Cas14c systems compared with Cas12a to Cas12e and Cas9. (D) Domain organization of Cas14a compared with Cas9 and Cas12a. Nuclease domains (RuvC and HNH) are indicated. Protein lengths are drawn to scale.

The Cas14 proteins we identified occur almost exclusively within DPANN, a superphylum of symbiotic archaea characterized by small cell and genome sizes (14, 15). Phylogenetic comparisons showed that Cas14 proteins are widely diverse with similarities to C2c10 and C2c9, families of bacterial RuvC domain–containing proteins that are sometimes found near a CRISPR array but not together with other cas genes (Fig. 1B and fig. S1) (3). This observation and the small size of the c2c10, c2c9, and cas14 genes made it improbable that these systems could function as stand-alone CRISPR effectors (3).

On the basis of their proximity to conserved genes responsible for creating genetic memory of infection (cas1, cas2, cas4) (fig. S3A), we explored whether CRISPR-Cas14 systems can actively acquire DNA sequences into their CRISPR arrays. Assembled metagenomic contiguous DNA sequences (contigs) for multiple CRISPR-Cas14 loci revealed that otherwise identical CRISPR systems showed diversity in their CRISPR arrays. These results are consistent with active adaptation to new infections, although without longitudinal sampling these data could also be explained by alternative biological mechanisms (Fig. 2A and fig. S3B) (13). The evidence suggesting acquisition of new DNA sequences led us to hypothesize that these CRISPR-Cas14 loci encode functional enzymes with nucleic acid targeting activity despite their small size. To test this possibility, we first investigated whether RNA components are produced from CRISPR-Cas14 loci. Environmental metatranscriptomic sequencing data were analyzed for the presence of RNA from the native archaeal host that contains CRISPR-Cas14a (Fig. 2B and fig. S4A). In addition to CRISPR RNAs (crRNAs), a highly abundant noncoding RNA was mapped to a ~130–base pair sequence located between cas14a and the adjacent CRISPR array. Notably, the 3′ end of this transcript was mostly complementary to the repeat segment of the crRNA (Fig. 2C and fig. S4B), as observed for trans-activating CRISPR RNAs (tracrRNAs) found in association with Cas9, Cas12b, and Cas12e CRISPR systems (12, 13, 16). In these previously studied systems, the double-stranded RNA–cutting enzyme ribonuclease III (RNase III) generates mature tracrRNAs and crRNAs, but no genes encoding RNase III were present in cas14-containing reconstructed genomes (fig. S5A), nor did Cas14a cleave its own pre-crRNA when tested biochemically (fig. S5B). These observations imply that an alternative mechanism for CRISPR-associated RNA processing exists in these hosts.
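The comparison of spacer content between otherwise identical loci can be sketched in a few lines. This is a minimal illustration, assuming exact repeat matches and using made-up sequences; it is not the analysis pipeline used in the study, and the function names are hypothetical.

```python
def extract_spacers(array_seq: str, repeat: str) -> list[str]:
    """Return the sequences lying between consecutive exact copies of `repeat`."""
    parts = array_seq.split(repeat)
    # parts[0] is the leader before the first repeat and parts[-1] the
    # trailer after the last repeat; only the pieces in between are spacers.
    return [p for p in parts[1:-1] if p]


def compare_arrays(array_a: str, array_b: str, repeat: str) -> dict[str, set[str]]:
    """Classify spacers as shared between two arrays or unique to one of them."""
    a = set(extract_spacers(array_a, repeat))
    b = set(extract_spacers(array_b, repeat))
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


# Toy data (hypothetical sequences, far shorter than real repeats and spacers).
REPEAT = "GTTTCAGA"
S1, S2, S3 = "ACGTACGT", "TTGGCCAA", "CATCATCA"
ARRAY_A = "AT" + REPEAT + S1 + REPEAT + S2 + REPEAT + "TA"
# ARRAY_B carries one extra spacer (S3) next to the leader.
ARRAY_B = "AT" + REPEAT + S3 + REPEAT + S1 + REPEAT + S2 + REPEAT + "TA"

RESULT = compare_arrays(ARRAY_A, ARRAY_B, REPEAT)
print(RESULT)
```

A spacer present in only one of two otherwise identical arrays is the kind of difference described above; real data would also need tolerance for sequencing errors and degenerate repeats.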

Fig. 2 CRISPR-Cas14a actively adapts and encodes a tracrRNA.

(A) Spacer diversity for Cas14b4 and Cas14b14 with CRISPR repeats diagramed in tan and distinct spacers shown in different colors. (B) Metatranscriptomic reads mapped to Cas14a1 and Cas14a3 loci. The insets show an expanded view of the most abundant repeat and spacer sequence. nt, nucleotides. (C) In silico predicted structure of Cas14a1 crRNA and tracrRNA. Notably, RNase III orthologs were not identified in host genomes (fig. S5A). (D) Fraction of various CRISPR complex masses made up of RNA and protein.

To test whether the Cas14a proteins and associated RNA components can assemble together in a heterologous organism, we introduced a plasmid into Escherichia coli containing a minimal CRISPR-Cas14a locus that includes the cas14 gene, the CRISPR array, and intergenic regions containing the putative tracrRNA. Affinity purification of the Cas14a protein from cell lysate and sequencing of copurifying RNA revealed a highly abundant mature crRNA as well as the putative tracrRNA, albeit in lower relative abundance than what was shown by environmental metatranscriptomics, suggesting that Cas14 associates with both crRNA and tracrRNA (fig. S5C). The calculated mass of the assembled Cas14a protein–tracrRNA–crRNA particle is 48% RNA by weight compared with just 17% for Streptococcus pyogenes Cas9 (SpCas9) and 8% for Francisella novicida Cas12a (FnCas12a) (Fig. 2D), hinting at a central role of the RNA in the architecture of the Cas14a complex. Known class 2 CRISPR systems require a short sequence called a protospacer adjacent motif (PAM) to target double-stranded DNA (dsDNA) (17). To test whether Cas14a requires a PAM and can conduct dsDNA interference, we transformed E. coli expressing a minimal Cas14a locus with a dsDNA plasmid containing a randomized PAM region next to a sequence matching the target-encoding sequence (spacer) in the Cas14 array. Notably, no depletion of a PAM sequence was detected among E. coli transformants, suggesting that the CRISPR-Cas14a system is unable to target dsDNA, can do so without requiring a PAM, or is inactive in this heterologous host (fig. S6, A and B).
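The logic of a PAM depletion readout can be illustrated with a small calculation: a PAM that licenses interference should be under-represented among surviving transformants relative to the input library. The sketch below is a generic illustration with invented counts; the scoring scheme (log2 frequency ratio with a pseudocount) and all names are assumptions, not the analysis reported in fig. S6.

```python
import math
from collections import Counter


def pam_depletion(library_pams, transformant_pams, pseudocount=1.0):
    """log2(frequency among transformants / frequency in library) per PAM.

    Strongly negative scores indicate depletion. The pseudocount keeps
    the ratio finite when a PAM is absent from the transformants.
    """
    lib = Counter(library_pams)
    out = Counter(transformant_pams)
    n_kinds = len(lib)
    lib_total = sum(lib.values()) + pseudocount * n_kinds
    out_total = sum(out.values()) + pseudocount * n_kinds
    scores = {}
    for pam, count in lib.items():
        f_lib = (count + pseudocount) / lib_total
        f_out = (out[pam] + pseudocount) / out_total
        scores[pam] = math.log2(f_out / f_lib)
    return scores


# Invented read counts for three hypothetical PAM sequences.
LIBRARY = ["TTTA"] * 100 + ["GGCC"] * 100 + ["ACGT"] * 100
UNCHANGED = list(LIBRARY)                                   # no interference
DEPLETED = ["TTTA"] * 5 + ["GGCC"] * 100 + ["ACGT"] * 100   # TTTA targeted

NO_SIGNAL = pam_depletion(LIBRARY, UNCHANGED)
SIGNAL = pam_depletion(LIBRARY, DEPLETED)
print(NO_SIGNAL)
print(SIGNAL)
```

In the first case every score is zero, which corresponds to the "no depletion" outcome described above. On its own that outcome cannot distinguish between an inability to target dsDNA, PAM-independent targeting, and an inactive system.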

We next tested whether purified Cas14a-tracrRNA-crRNA complexes are capable of RNA-guided nucleic acid cleavage in vitro. All currently reconstituted DNA-targeting class 2 interference complexes are able to recognize both dsDNA and single-stranded DNA (ssDNA) substrates (18–20). We incubated purified Cas14a-tracrRNA-crRNA complexes with radiolabeled target oligonucleotides (ssDNA, dsDNA, and ssRNA) bearing a 20-nucleotide sequence complementary to the crRNA guide sequence or a noncomplementary ssDNA, and we analyzed these substrates for Cas14a-mediated cleavage. Only in the presence of a complementary ssDNA substrate was any cleavage product detected (Fig. 3A and fig. S7, A to C), and cleavage was dependent on the presence of both tracrRNA and crRNA, which could also be combined into a single-guide RNA (sgRNA) (Fig. 3B and fig. S8). The lack of detectable dsDNA cleavage suggests that Cas14a targets ssDNA selectively, although it is possible that some other host factor or sequence requirement could enable dsDNA recognition in the native host. Mutation of the conserved active site residues in the Cas14a RuvC domain eliminated cleavage activity (fig. S7, D and E), implicating RuvC as the domain responsible for DNA cutting. Moreover, Cas14a DNA cleavage was sensitive to truncation of the RNA components to lengths shorter than the naturally produced sequences (fig. S9, A to D). These results establish Cas14a as the smallest class 2 CRISPR effector demonstrated to conduct programmable RNA-guided DNA cleavage thus far.

Fig. 3 CRISPR-Cas14a is an RNA-guided DNA endonuclease.

(A) Cleavage kinetics of Cas14a1 targeting ssDNA, dsDNA, ssRNA, and off-target ssDNA. (B) Diagram of Cas14a RNP bound to target ssDNA and Cas14a1 cleavage kinetics of radiolabeled ssDNA in the presence of various RNA components. (C) Tiling of a ssDNA substrate by Cas14a1 guide sequences. (D) Cleavage of the ssDNA viral M13 genome with activated Cas14a1.

Although we were unable to identify a dsDNA PAM in vivo, we tested whether Cas14a requires a PAM for ssDNA cleavage in vitro by tiling Cas14a guides across a ssDNA substrate (Fig. 3C). Despite sequence variation adjacent to the targets of these different guides, we observed cleavage for all four sequences. Notably, the cleavage sites occur beyond the guide-complementary region of the ssDNA and shift in response to guide binding position (Fig. 3C). These data demonstrate Cas14a is a ssDNA-targeting CRISPR endonuclease that does not require a PAM for activation.
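Tiling guides across a substrate amounts to sliding a fixed-length window along the ssDNA and taking the complementary RNA for each window. The sketch below uses the 20-nucleotide guide length mentioned earlier; the substrate sequence and the 5-nucleotide step are hypothetical choices made only so that four guides result.

```python
# DNA base -> complementary RNA base.
COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}


def guide_for_target(target_dna: str) -> str:
    """RNA guide (5'->3') complementary to a ssDNA target written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(target_dna))


def tile_guides(substrate: str, guide_len: int = 20, step: int = 5):
    """Yield (start, target_site, guide) for windows tiled across the substrate."""
    for start in range(0, len(substrate) - guide_len + 1, step):
        site = substrate[start:start + guide_len]
        yield start, site, guide_for_target(site)


# Hypothetical 35-nt ssDNA substrate (not a sequence from the study).
SUBSTRATE = "ATGCGTACCTGAAGTCCATGGCTTAGCAATCGGTA"

GUIDES = list(tile_guides(SUBSTRATE))
for start, site, guide in GUIDES:
    # The bases flanking each site differ from window to window, which is
    # what lets a tiling experiment probe for an adjacent-motif requirement.
    print(start, site, guide)
```

Because each window has different flanking bases, cleavage at every tiled position argues against a strict adjacent-sequence requirement.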

On the basis of the observation that Cas14a cuts outside of the crRNA/DNA targeting heteroduplex, we hypothesized that Cas14a might possess target-activated nonspecific ssDNA cleavage activity, similar to the RuvC-containing enzyme Cas12a (20, 21). To test this possibility, we incubated Cas14a-tracrRNA-crRNA with a complementary activator DNA and an aliquot of M13 bacteriophage ssDNA bearing no sequence complementarity to the Cas14a crRNA or activator (Fig. 3D). The M13 ssDNA was rapidly degraded to small fragments, an activity that was eliminated by mutation of the conserved Cas14a RuvC active site, suggesting that activation of Cas14a results in nonspecific ssDNA degradation. However, we were unable to observe Cas14a-mediated interference against the ssDNA bacteriophage ΦX174 when we expressed Cas14a heterologously in E. coli (fig. S10, A to C), possibly due to the dissimilarity between E. coli and Cas14a’s native archaeal host. To investigate the specificity of target-dependent nonspecific DNA cutting activity by Cas14a, we adapted a fluorophore-quencher (FQ) assay in which cleavage of dye-labeled ssDNA generates a fluorescent signal (Fig. 4A) (22). When Cas14a was incubated with various guide RNA–target ssDNA pairs, a fluorescent signal was observed only in the presence of the cognate target and showed strong preference for longer FQ-containing substrates (fig. S10D and Fig. 4A). We next tested Cas14a mismatch tolerance by tiling 2-nucleotide mismatches across the targeted region in various ssDNA substrates. Surprisingly, mismatches near the middle of the ssDNA target strongly inhibited Cas14a activity, revealing an internal seed sequence that is distinct from the PAM-proximal seed region observed for dsDNA-targeting CRISPR-Cas systems (Fig. 4B and fig. S11, A to D). Moreover, DNA substrates containing strong secondary structure resulted in reduced activation of Cas14a (fig. S11E). Truncation of ssDNA substrates also resulted in reduced or undetectable trans cleavage (fig. S11F). Together, these results suggest a mechanism of fidelity distinct from dsDNA-targeting class 2 CRISPR systems, possibly using a mechanism similar to the ssRNA-targeting Cas13a enzymes (23–25).
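The mismatch-tiling design described above can be sketched as a set of substrate variants, each carrying two consecutive substituted bases at a different position. The target sequence is hypothetical, and replacing each base with its complement is an arbitrary choice for illustration; the text does not state which substitutions were used.

```python
# ASSUMPTION: each mismatched base is swapped for its complement. Any
# substitution that breaks pairing with the guide would serve equally well.
SWAP = {"A": "T", "T": "A", "C": "G", "G": "C"}


def tile_mismatches(target: str, width: int = 2) -> list[tuple[int, str]]:
    """Return (position, variant) pairs; each variant has `width` adjacent substitutions."""
    variants = []
    for pos in range(0, len(target) - width + 1, width):
        mutated = "".join(SWAP[base] for base in target[pos:pos + width])
        variants.append((pos, target[:pos] + mutated + target[pos + width:]))
    return variants


# Hypothetical 20-nt target region (not a sequence from the study).
TARGET = "CAGGCTTCAAGTCCATGCAG"

VARIANTS = tile_mismatches(TARGET)
for pos, variant in VARIANTS:
    # Lower-case letters mark the substituted positions for easy reading.
    marked = variant[:pos] + variant[pos:pos + 2].lower() + variant[pos + 2:]
    print(f"{pos:2d} {marked}")
```

Measuring activation for each variant and plotting it against position is what would reveal a sensitive region such as the internal seed described above.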

Fig. 4 High-fidelity ssDNA SNP detection by CRISPR-Cas14a.

(A) FQ assay for detection of ssDNA by Cas14a1 and the cleavage kinetics for FQ substrates of various lengths. AU, arbitrary units. (B) Cleavage kinetics for Cas14a1 with mismatches tiled across the substrate (individual points represent replicate measurements). kObs, observed rate constant. (C) Diagram of Cas14-DETECTR strategy and HERC2 eye color SNP. (D) Titration of T7 exonuclease and effect on Cas14a-DETECTR. Bkgd, background. (E) SNP detection using Cas14a-DETECTR with a blue-eye targeting guide for saliva samples from blue-eyed and brown-eyed individuals compared with ssDNA detection using Cas12a.
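An observed rate constant such as the kObs plotted in panel (B) is typically obtained by fitting a kinetic model to the fluorescence time course. The sketch below assumes single-exponential kinetics with a known plateau and fits synthetic data by linearization; the model, the data, and the function name are assumptions for illustration, since the fitting procedure is not described in this text.

```python
import math


def fit_kobs(times, signal, plateau):
    """Estimate k in signal = plateau * (1 - exp(-k * t)).

    Linearizes to -ln(1 - signal / plateau) = k * t and solves the
    least-squares line through the origin.
    """
    num = 0.0
    den = 0.0
    for t, f in zip(times, signal):
        frac_left = 1.0 - f / plateau
        if t <= 0 or frac_left <= 0:
            continue  # the log transform is undefined or uninformative here
        num += t * -math.log(frac_left)
        den += t * t
    if den == 0:
        raise ValueError("no usable time points")
    return num / den


# Synthetic, noise-free time course generated with k = 0.1 per minute.
TRUE_K = 0.1
PLATEAU = 100.0
TIMES = [0, 5, 10, 20, 40]
SIGNAL = [PLATEAU * (1 - math.exp(-TRUE_K * t)) for t in TIMES]

K_FIT = fit_kobs(TIMES, SIGNAL, PLATEAU)
print(f"fitted k = {K_FIT:.4f} per minute")
```

With real, noisy data a nonlinear fit that also estimates the plateau would usually be preferable; the linearized form is shown only because it needs no external libraries.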

The target-dependent, nonspecific DNase activity of Cas12a serves as a DNA detection platform (DNA endonuclease-targeted CRISPR trans reporter, or DETECTR) for diagnostic uses (20, 26). Although Cas12a exhibits low fidelity in discriminating against ssDNA substrates (20), Cas14a requires complementarity in the seed region for ssDNA substrate recognition. This improved specificity raised the possibility of using Cas14a for high-fidelity detection of DNA single-nucleotide polymorphisms (SNPs) without the constraint of a PAM sequence. To test this idea, DNA substrates were amplified using a phosphorothioate (PT)–containing primer to protect one strand from degradation by exonucleases. Upon addition of T7 exonuclease, the unmodified strand was degraded, leaving ssDNA substrates that can be detected by Cas14a (Fig. 4, C and D). As a proof of principle, we aimed to detect the human HERC2 gene, which contains a SNP responsible for eye color (27). We amplified the HERC2 gene from DNA in human saliva from both blue-eyed and brown-eyed individuals, using the PT amplification approach described above. When programmed with a guide RNA targeting the blue-eyed SNP, Cas12a failed to discriminate between the two ssDNA targets, exhibiting robust trans activity in both cases, whereas Cas14a exhibited strong activation in recognition of the blue-eyed SNP with near-background signal for the brown-eyed sample (Fig. 4E). The development of Cas14-DETECTR now allows for CRISPR-based detection of medically and ecologically important ssDNA pathogens as well as high-fidelity detection of SNPs without the constraint of a PAM sequence.
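Because mismatches near the middle of the target most strongly inhibit Cas14a, one plausible design rule for SNP discrimination is to centre the guide on the variant position. The sketch below illustrates that idea with invented sequences (not HERC2); the centring rule, the 20-nt window, and all function names are assumptions rather than the authors' stated design procedure.

```python
# DNA base -> complementary RNA base.
COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}


def centered_target(strand: str, snp_index: int, length: int = 20) -> tuple[int, str]:
    """Pick a `length`-nt window that puts the SNP as near the middle as possible."""
    if len(strand) < length:
        raise ValueError("strand is shorter than the requested window")
    start = max(0, min(snp_index - length // 2, len(strand) - length))
    return start, strand[start:start + length]


def guide_rna(target: str) -> str:
    """RNA guide (5'->3') complementary to a ssDNA target written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(target))


def count_mismatches(guide: str, target: str) -> int:
    """Count unpaired positions when guide and target are aligned antiparallel."""
    return sum(1 for g, t in zip(guide, reversed(target)) if COMPLEMENT[t] != g)


# Invented 30-nt strand; the two alleles differ only at index 15 (G vs A).
SNP_INDEX = 15
ALLELE_A = "GATTACAGGCTTCAAGTCCATGCAGGTCAT"
ALLELE_B = ALLELE_A[:SNP_INDEX] + "A" + ALLELE_A[SNP_INDEX + 1:]

START, SITE_A = centered_target(ALLELE_A, SNP_INDEX)
_, SITE_B = centered_target(ALLELE_B, SNP_INDEX)
GUIDE = guide_rna(SITE_A)  # guide matched to allele A

print("window start:", START, "SNP offset in window:", SNP_INDEX - START)
print("mismatches vs allele A:", count_mismatches(GUIDE, SITE_A))
print("mismatches vs allele B:", count_mismatches(GUIDE, SITE_B))
```

The guide pairs perfectly with one allele and carries a single central mismatch against the other, which is the situation in which the seed-region sensitivity described above would be expected to give discrimination.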

Further investigation of compact type V systems in metagenomic data revealed a large diversity of systems that, like Cas14a to Cas14c, include a gene encoding a short RuvC-containing protein adjacent to acquisition-associated cas genes and a CRISPR array. We found 20 additional such systems in various uncultivated microbes that cluster into five main families (Cas14d to Cas14h). Excluding cas14g, which is related to cas12b, the cas14-like genes form separate clades on the type V effector phylogeny (fig. S12, A and B), suggesting that these families evolved from independent domestication events of TnpB, the transposase-associated protein implicated as the evolutionary ancestor of type V CRISPR effectors (3). Phylogenetic reconstruction of their associated cas1 genes indicated that they too have different origins for the cas14 subtypes (fig. S2). Altogether, we identified 38 CRISPR-Cas14 systems belonging to eight families (Cas14a to Cas14h) and eight additional systems that could not be clustered with our analysis (termed Cas14u) (data S3).

The small size of the Cas14 proteins described here and their resemblance to type V effector proteins suggest that RNA-guided ssDNA cleavage may have existed as an ancestral class 2 CRISPR system (28, 29). In this scenario, a small, domesticated TnpB-like ssDNA interference complex may have gained additional domains over time, gradually improving dsDNA recognition and cleavage. Related to this hypothesis, smaller Cas9 orthologs exhibit weaker dsDNA-targeting activity than their larger counterparts but retain the ability to robustly cleave ssDNA (19). Aside from the evolutionary implications, the ability of Cas14 to specifically target ssDNA suggests a role in defense against ssDNA viruses or mobile genetic elements that propagate through ssDNA intermediates (30). A ssDNA-targeting CRISPR system would be particularly advantageous in certain ecosystems where ssDNA viruses constitute the vast majority of viral abundance (31). The unexpected finding that these miniature CRISPR proteins can conduct targeted DNA cleavage highlights the diversity of CRISPR systems hidden in uncultivated organisms. Ongoing exploration of these underrepresented microbial lineages will likely continue to reveal new, unexpected insights into this microscopic arms race and lead to continued development of valuable CRISPR-based technologies.

Supplementary Materials

www.sciencemag.org/content/362/6416/839/suppl/DC1

Materials and Methods

Figs. S1 to S12

References (32–42)

Data S1 to S5

References and Notes

Acknowledgments: We thank N. Ma and K. Zhou for technical assistance and P. Harrington for graphic design assistance. Funding: D.B. was supported by a grant from the Paul Allen Frontiers Group. L.B.H., J.C.C., and J.S.C. were supported by U.S. National Science Foundation Graduate Research Fellowships. J.A.D. is an investigator of the Howard Hughes Medical Institute. This work was supported in part by a Frontiers Science award from the Paul Allen Institute to J.A.D. and J.F.B., a grant from the National Science Foundation (MCB-1244557 to J.A.D.), the Lawrence Berkeley National Laboratory’s Sustainable Systems Scientific Focus Area funded by the U.S. Department of Energy (DE-AC02-05CH11231 to J.F.B.), and the Office of Science of the U.S. Department of Energy under contract no. DE-AC02-05CH11231. Author contributions: D.B. and D.P.-E. conducted the computational analysis. L.B.H., J.S.C., I.P.W., E.M., and J.C.C. designed and executed biochemical investigation of Cas14. L.B.H. designed and conducted experiments investigating Cas14 activity and assembly in E. coli. L.B.H. and D.B. conceived of the study. N.C.K., J.F.B., and J.A.D. supervised the research and experimental design. J.A.D., L.B.H., and D.B. wrote and revised the manuscript. All authors read, edited, and approved the manuscript. Competing interests: UC Regents have filed patents related to this work on which D.B., J.F.B., L.B.H., D.P.-E., J.S.C., and J.A.D. are inventors. L.B.H. and J.S.C. are cofounders of Mammoth Biosciences. I.P.W. is a consultant for Mammoth Biosciences. J.F.B. is a founder of Metagenomi. J.A.D. is a cofounder of Caribou Biosciences, Editas Medicine, Intellia Therapeutics, Scribe Therapeutics, and Mammoth Biosciences. J.A.D. is a scientific advisory board member of Caribou Biosciences, Intellia Therapeutics, eFFECTOR Therapeutics, Scribe Therapeutics, Synthego, Metagenomi, Mammoth Biosciences, and Inari. J.A.D. is a member of the board of directors at Driver and Johnson & Johnson and has sponsored research projects by Roche Biopharma and Biogen. Data and materials availability: Data S1 specifies the accession numbers, coordinates, and samples of origin for all CRISPR-Cas14 systems described in this study.

Correction (14 November 2018): In the bottom graphic of Fig. 2A, one instance of "Cas14b" was incorrectly labeled as "Cas14c."
