Review

Diverse evolutionary roots and mechanistic variations of the CRISPR-Cas systems

Science  05 Aug 2016:
Vol. 353, Issue 6299, aad5147
DOI: 10.1126/science.aad5147

Structured Abstract

BACKGROUND

Prokaryotes have evolved multiple systems to combat invaders such as viruses and plasmids. Examples of such defense systems include receptor masking, restriction-modification (R-M) systems, DNA interference (Argonaute), bacteriophage exclusion (BREX or PGL), and abortive infection, all of which act in an innate, nonspecific manner. In addition, prokaryotes have evolved adaptive, heritable immune systems: clustered regularly interspaced short palindromic repeats (CRISPR) and the CRISPR-associated proteins (CRISPR-Cas). Adaptive immunity is conferred by the integration of DNA sequences from an invading element into the CRISPR array (adaptation), which is transcribed into long pre-CRISPR RNAs (pre-crRNAs) and processed into short crRNAs (expression), which guide Cas proteins to specifically degrade the cognate DNA on subsequent exposures (interference).

ADVANCES

A plethora of distinct CRISPR-Cas systems are represented in genomes of most archaea and almost half of the bacteria. The latest CRISPR-Cas classification scheme delineates two classes that are each subdivided into three types. Integration of biochemistry and molecular genetics has contributed substantially to revealing many of the unique features of the variant CRISPR-Cas types. Additionally, structural analysis and single-molecule studies have further advanced our understanding of the molecular basis of CRISPR-Cas functionality. Recent progress includes relevant steps in the adaptation stage, when fragments of foreign DNA are processed and incorporated as new spacers into the CRISPR array. In addition, three novel CRISPR-Cas types (IV, V, and VI) have been identified, and in particular, the type V interference complexes have been experimentally characterized. Moreover, the ability to easily program sequence-specific DNA targeting and cleavage by CRISPR-Cas components, as demonstrated for Cas9 and Cpf1, allows for the application of CRISPR-Cas components as highly effective tools for genetic engineering and gene regulation in a wide range of eukaryotes and prokaryotes. The pressing issue of off-target cleavage by the Cas9 nuclease is being actively addressed using structure-guided engineering.

OUTLOOK

Although our understanding of the CRISPR-Cas system has increased tremendously over the past few years, much remains to be revealed. The continuing discovery of CRISPR-Cas variants will provide direct tests of the recently proposed modular scenario for the evolution of CRISPR-Cas systems. The recent discovery and characterization of new CRISPR-Cas types with previously unknown features implies that our current knowledge has relatively limited power for predicting the functional details of distantly related CRISPR-Cas variants. Hence, newly discovered CRISPR-Cas systems need to be dissected thoroughly to gain insight into their biological roles, to unravel their molecular mechanisms, and to harness their potential for biotechnology. Key outstanding questions regarding CRISPR-Cas biology include the ecological roles of microbial adaptive immunity, the high rates of CRISPR-Cas horizontal transfer, and the coevolution of CRISPR-Cas and phage-encoded anti-CRISPR proteins. Relatively little is known about the regulation of CRISPR-Cas expression, and about the roles of CRISPR-Cas in processes other than defense. With respect to the CRISPR-Cas mechanism, details illuminating the connection between the adaptation stage and the interference stage in primed spacer acquisition remain elusive. A key aspect of CRISPR-Cas that is poorly understood at present is self/nonself discrimination. The discrimination mechanisms seem to differ substantially among CRISPR variants. Recent comparison of class 2 type effector complexes (Cas9/Cpf1) has revealed overall architectural similarities as well as structural and mechanistic differences, as had previously been found for the distinct types of class 1 effector complexes (Cascade/Cmr). These variations may translate into complementary biotechnological applications. 
As well as innovative tools for basic research, CRISPR-associated effector complexes will be instrumental for developing the next generation of antiviral prophylactics and therapeutics. For applications in human gene therapy, improved methods for efficient and safe delivery of Cas9/Cpf1 and their guide RNAs to cells and tissues are still needed. Further insight into the basic details of CRISPR-Cas structure, functions, and biology—and characterization of new Cas effector proteins in particular—is crucial for optimizing and further expanding the diverse applications of CRISPR-Cas systems.

Evolution of CRISPR-Cas systems resulted in incredible structural and functional diversity.

Class 1 CRISPR-Cas systems are considered to be the evolutionary ancestral systems. The class 2 systems have evolved from class 1 systems via the insertion of transposable elements encoding various nucleases, and are now being used as tools for genome editing.

Abstract

Adaptive immunity had been long thought of as an exclusive feature of animals. However, the discovery of the CRISPR-Cas defense system, present in almost half of prokaryotic genomes, proves otherwise. Because of the everlasting parasite-host arms race, CRISPR-Cas has rapidly evolved through horizontal transfer of complete loci or individual modules, resulting in extreme structural and functional diversity. CRISPR-Cas systems are divided into two distinct classes that each consist of three types and multiple subtypes. We discuss recent advances in CRISPR-Cas research that reveal elaborate molecular mechanisms and provide for a plausible scenario of CRISPR-Cas evolution. We also briefly describe the latest developments of a wide range of CRISPR-based applications.

Bacteria and archaea suffer constant predation by viruses, which are extremely abundant in almost all environments (1). Accordingly, bacteria and archaea have evolved a wide range of antivirus defense mechanisms (2). Because viruses generally have high rates of mutation and recombination, they have the potential to rapidly escape these prokaryotic defense systems. Thus, the hosts’ defenses must also adjust and evolve rapidly, leading to an ongoing virus-host arms race. Protective systems provide innate immunity at all stages of the parasite’s infection cycle via receptor masking, restriction-modification (R-M) systems, DNA interference [prokaryotic Argonaute proteins protect the host against mobile genetic elements (MGEs) through DNA-guided DNA interference], bacteriophage exclusion (BREX systems allow phage adsorption but block phage DNA replication; PGL systems have been hypothesized to modify the phage progeny DNA to inhibit their growth upon reinfection), and abortive infection (2–8).

The innate immunity strategies are complemented by an adaptive immune function of the systems of prokaryotic clustered regularly interspaced short palindromic repeats (CRISPR) and the associated Cas proteins (9, 10). Diverse variants of the CRISPR-Cas systems are present in the examined genomes of most archaea and almost half of the bacteria (2). Here, we discuss insights into the evolution and functionality of class 1 and class 2 CRISPR-Cas systems. This progress has enabled the development of sophisticated tools for genetic engineering in molecular biology, biotechnology, and molecular medicine.

CRISPR-Cas defense

The CRISPR-Cas systems provide protection against MGEs (in particular, viruses and plasmids) by sequence-specific targeting of foreign DNA or RNA (9, 11–15). A CRISPR-cas locus generally consists of an operon of CRISPR-associated (cas) genes and a CRISPR array composed of a series of direct repeats interspaced by variable DNA sequences (known as spacers) (Fig. 1A). The repeat sequences and lengths as well as the number of repeats in CRISPR arrays vary broadly, but all arrays possess the characteristic arrangement of alternating repeat and spacer sequences. The spacers are key elements of adaptive immunity, as they store the “memory” of an organism’s encounters with specific MGEs acquired as a result of a previous unsuccessful infection (16–19). This memory enables the recognition and neutralization of the invaders upon subsequent infections (9).
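The alternating repeat-spacer arrangement described above can be illustrated with a minimal Python sketch. It assumes that every repeat in the array is an exact copy of one known sequence, which real arrays do not guarantee, and the function name `split_crispr_array` and all sequences are invented for illustration.

```python
def split_crispr_array(locus: str, repeat: str) -> list[str]:
    """Return the spacers lying between consecutive copies of `repeat`."""
    if not repeat:
        raise ValueError("repeat must be a non-empty string")
    pieces = locus.split(repeat)
    # pieces[0] is whatever precedes the first repeat (e.g. the leader);
    # pieces[-1] is whatever follows the last repeat. Everything in
    # between is flanked by repeats on both sides, i.e. a spacer.
    return pieces[1:-1]


# Toy sequences, invented for illustration only (not from a real genome).
toy_repeat = "GTTTTAGAGCTA"
toy_spacers = ["ACGTACGTAC", "TTGACCGGTA"]
toy_locus = (
    "ATATAT" + toy_repeat + toy_spacers[0] + toy_repeat
    + toy_spacers[1] + toy_repeat + "GG"
)

print(split_crispr_array(toy_locus, toy_repeat))
```

A locus with fewer than two repeats yields an empty list, because no sequence is then flanked by repeats on both sides.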

Fig. 1 Overview of the CRISPR-Cas systems.

(A) Architecture of class 1 (multiprotein effector complexes) and class 2 (single-protein effector complexes) CRISPR-Cas systems. (B) CRISPR-Cas adaptive immunity is mediated by CRISPR RNAs (crRNAs) and Cas proteins, which form multicomponent CRISPR ribonucleoprotein (crRNP) complexes. The first stage is adaptation, which occurs upon entry of an invading mobile genetic element (in this case, a viral genome). Cas1 (blue) and Cas2 (yellow) proteins select and process the invading DNA, and thereafter, a protospacer (orange) is integrated as a new spacer at the leader end of the CRISPR array [repeat sequences (gray) that separate similar-sized, invader-derived spacers (multiple colors)]. During the second stage, expression, the CRISPR locus is transcribed and the pre-crRNA is processed into mature crRNA guides by Cas (e.g., Cas6) or non-Cas proteins (e.g., RNase III). During the final interference stage, the Cas-crRNA complex scans invading DNA for a complementary nucleic acid target, after which the target is degraded by a Cas nuclease.

CRISPR-mediated adaptive immunity involves three steps: adaptation, expression, and interference (14, 20–23) (Fig. 1B). During the adaptation step, fragments of foreign DNA (known as protospacers) from invading elements are processed and incorporated as new spacers into the CRISPR array. The expression step involves the transcription of the CRISPR array, which is followed by processing of the precursor transcript into mature CRISPR RNAs (crRNAs). The crRNAs are assembled with one or more Cas proteins into CRISPR ribonucleoprotein (crRNP) complexes. The interference step involves crRNA-directed cleavage of invading cognate virus or plasmid nucleic acids by Cas nucleases within the crRNP complex (14, 20, 24). The multifaceted and modular architecture of the CRISPR-Cas systems also allows them to play nondefense roles, such as biofilm formation, cell differentiation, and pathogenicity (25–27).

CRISPR-Cas diversity, classification, and evolution

The rapid evolution of highly diverse CRISPR-Cas systems is thought to be driven by the continuous arms race with the invading MGEs (28, 29). The latest classification scheme for CRISPR-Cas systems, which takes into account the repertoire of cas genes and the sequence similarity between Cas proteins and the locus architecture, includes two classes that are currently subdivided into six types and 19 subtypes (30, 31). The key feature of the organization and evolution of the CRISPR-Cas loci is their pronounced modularity. The module responsible for the adaptation step is largely uniform among the diverse CRISPR-Cas systems and consists of the cas1 and cas2 genes, both of which are essential for the acquisition of spacers. In many CRISPR-Cas variants, the adaptation module also includes the cas4 gene. By contrast, the CRISPR-Cas effector module, which is involved in the maturation of the crRNAs as well as in target recognition and cleavage, shows a far greater versatility (Fig. 2A) (30).

Fig. 2 CRISPR diversity and evolution.

(A) Modular organization of the CRISPR-Cas systems. LS, large subunit; SS, small subunit. A putative small subunit that might be fused to the large subunit in several type I subtypes is indicated by an asterisk. Cas3 is shown as fusion of two distinct genes encoding the helicase Cas3′ and the nuclease HD Cas3′′; in some type I systems, these domains are encoded by separate genes. Functionally dispensable components are indicated by dashed outlines. Cas6 is shown with a thin solid outline for type I because it is dispensable in some systems, and by a dashed line for type III because most systems lack this gene and use the Cas6 provided in trans by other CRISPR-Cas loci. The two colors for Cas4 and C2c2 and three colors for Cas9 and Cpf1 reflect the contributions of these proteins to different stages of the CRISPR-Cas response (see text). The question marks indicate currently unknown components. [Modified with permission from (30)] (B) Evolutionary scenario for the CRISPR-Cas systems. TR, terminal repeats; TS, terminal sequences; HD, HD-family endonuclease; HNH, HNH-family endonuclease; RuvC, RuvC-family endonuclease; HEPN, putative endoribonuclease of HEPN superfamily. Genes and portions of genes shown in gray denote sequences that are thought to have been encoded in the respective mobile elements but were eliminated in the course of evolution of CRISPR-Cas systems. [Modified with permission from (31)]

The two classes of CRISPR-Cas systems differ fundamentally with respect to the organization of the effector module (30). Class 1 systems (including types I, III, and IV) are present in bacteria and archaea, and encompass effector complexes composed of four to seven Cas protein subunits in an uneven stoichiometry [e.g., the CRISPR-associated complex for antiviral defense (Cascade) of type I systems, and the Csm/Cmr complexes of type III systems]. Most of the subunits of the class 1 effector complexes, in particular Cas5, Cas6, and Cas7, contain variants of the RNA-binding RRM (RNA recognition motif) domain (32, 33). Although the sequence similarity between the individual subunits of type I and type III effector complexes is generally low, the complexes share strikingly similar overall architectures that suggest a common origin (31, 32, 34, 35). The ancestral CRISPR-Cas effector complex most likely resembled the extant type III complexes, as indicated by the presence of the archetypal type III protein, the large Cas10 subunit, which appears to be an active enzyme of the DNA polymerase–nucleotide cyclase superfamily, unlike its inactive type I counterpart (Cas8) (31–33).

In the less common class 2 CRISPR-Cas systems (types II, V, and VI), which are almost completely restricted to bacteria, the effector complex is represented by a single multidomain protein (30). The best-characterized class 2 effector is Cas9 (type II), the RNA-dependent endonuclease that contains two unrelated nuclease domains, HNH and RuvC, that are responsible for the cleavage of the target and the displaced strand, respectively, in the crRNA–target DNA complex (36). The type II loci also encode a trans-acting CRISPR RNA (tracrRNA) that evolved from the corresponding CRISPR repeat and is essential for pre-crRNA processing and target recognition in type II systems (37, 38). The prototype type V effector Cpf1 (subtype V-A) contains only one nuclease domain (RuvC-like) that is identifiable by sequence analysis (39). However, analysis of the recently solved structure of Cpf1 complexed with the crRNA and target DNA has revealed a second nuclease domain, the fold of which is unrelated to HNH or any other known nucleases. In analogy to the HNH domain in Cas9, the novel nuclease domain in Cpf1 is inserted into the RuvC domain, and it is responsible for cleavage of the target strand (40).

Screening of microbial genomes and metagenomes for undiscovered class 2 systems (31) has resulted in the identification of three novel CRISPR-Cas variants. These include subtypes V-B and V-C, which resemble Cpf1 in that their predicted effector proteins contain a single, RuvC-like nuclease domain. Cleavage of target DNA by the type V-B effector, denoted C2c1, has been experimentally demonstrated (31). Type VI is unique in that its effector protein contains two conserved HEPN domains that possess ribonuclease (RNase) activity (Fig. 2A).

Recent comparative genomic analyses of variant CRISPR-Cas systems (Fig. 2B) (31) have revealed a strong modular evolution with multiple combinations of adaptation modules and effector modules, as well as a pivotal contribution of mobile genetic elements to the origin and diversification of the CRISPR-Cas systems. The ancestral prokaryotic adaptive immune system could have emerged via the insertion of a casposon (a recently discovered distinct class of self-synthesizing transposons that appear to encode a Cas1 homolog) next to an innate immunity locus (probably consisting of genes encoding a Cas10 nuclease and possibly one or more RNA binding proteins). Apart from providing the Cas1 nuclease/integrase that is required for recombination during spacer acquisition (41–43), the casposon may also have contributed the prototype CRISPR repeat unit that could have evolved from one of the inverted terminal repeats of the casposon (44). An additional toxin-antitoxin module that inserted either in the ancestral casposon or in the evolving adaptive immunity locus probably provided the cas2 gene, thus completing the adaptation module. The Cas10 nuclease and one or more additional proteins with an RRM fold (the ultimate origin of which could be a polymerase or cyclase that gave rise to Cas10) of the hybrid locus could have subsequently evolved to become the ancestral CRISPR-Cas effector module (31–33, 44).

The widespread occurrence of class 1 systems in archaea and bacteria, together with the proliferation of the ancient RRM domain in class 1 effector proteins, strongly suggests that the ancestral CRISPR-Cas belonged to class 1. Most likely, the multiple class 2 variants then evolved via several independent replacements of the class 1 effector locus with nuclease genes that were derived from distinct MGEs (Fig. 2B). In particular, type V effector variants (Cpf1) seem to have evolved from different families of the TnpB transposase genes that are widespread in transposons (31), whereas the type II effector (Cas9) may have evolved from IscB, a protein with two nuclease domains that belongs to a recently identified distinct transposon family (45). Notably, class 2 CRISPR-Cas systems, in their entirety, appear to have been derived from different MGEs: Cas1 from a casposon, Cas2 from a toxin-antitoxin module, and the different effector proteins (such as Cas9 and Cpf1) from respective transposable elements (31).

CRISPR adaptation

The spacers of a CRISPR array represent a chronological archive of previous invader encounters. The captured spacer sequences are integrated into the CRISPR loci after exposure to MGEs, at the leader end of the array that contains the start site of CRISPR transcription (9, 14, 46). Analysis of invader target sequences (also called protospacers) has revealed a short motif directly adjacent to the target sequence, called the protospacer adjacent motif (PAM) (47). This PAM motif allows self/nonself discrimination by the host in two ways: (i) because its presence in alien targets is required for nonself interference, and (ii) because its absence in the host’s CRISPR array avoids self-targeting (48). In class 1–type I and class 2–type II systems, the PAM is not only involved in interference, but also plays a role in spacer selection during the adaptation stage, implying the acquisition of functional spacers only (49, 50). The PAM is a short [2 to 7 nucleotides (nt)], partially redundant sequence that in itself cannot preclude incorporation of spacers from the host DNA because of the low information content of the motif. The short PAM appears to be the result of an evolutionary trade-off between efficient incorporation of spacers from nonself DNA and preventing an autoimmune reaction.
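The two-way role of the PAM in self/nonself discrimination can be sketched in a few lines of Python. This is a toy model under stated assumptions: the PAM is taken to lie immediately 3′ of the protospacer on the strand being scanned (the actual position differs among CRISPR-Cas types), `N` matches any nucleotide, and the motif `NGG`, the spacer length of 10, the function names, and all sequences are placeholders chosen for illustration.

```python
def matches_pam(site: str, pam: str) -> bool:
    """True if `site` fits the `pam` pattern, where 'N' is a wildcard."""
    return len(site) == len(pam) and all(
        p == "N" or p == s for p, s in zip(pam, site)
    )


def find_protospacers(dna: str, pam: str, length: int) -> list[tuple[int, str]]:
    """List (start, sequence) of every `length`-nt stretch followed by a PAM.

    Assumption: the PAM sits directly 3' of the protospacer on this strand.
    """
    hits = []
    for start in range(len(dna) - length - len(pam) + 1):
        pam_start = start + length
        if matches_pam(dna[pam_start:pam_start + len(pam)], pam):
            hits.append((start, dna[start:pam_start]))
    return hits


# Invented toy sequences: the same 10-nt stretch in an "invader" (followed
# by a PAM) and in the host array (followed by a repeat, not a PAM).
invader = "ACGTACGTAC" + "TGG" + "AAAA"
host_array = "ACGTACGTAC" + "GTTTTAGAGC"

print(find_protospacers(invader, "NGG", 10))     # PAM present: candidate target
print(find_protospacers(host_array, "NGG", 10))  # no PAM: array is not targeted
```

Because the motif is so short, a scan like this over a whole host chromosome would return many hits, which is the low-information-content problem noted above.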

Although host chromosomal fragments can be incorporated as new CRISPR spacers, detection of such events obviously implies that this did not result in a lethal phenotype, either due to a modified PAM and/or to an inactivated CRISPR-Cas effector module (51). Indeed, in the absence of the effector module, elevated frequencies of self-spacer acquisition occur in Escherichia coli (52). Similarly, in Streptococcus thermophilus, a catalytically inactive Cas9 results in a major increase in spacers derived from the host genome (53). In addition, there is a strong preference for the integration of plasmid over chromosomal spacer sequences (52, 54, 55), with plasmid sequences incorporated more frequently than host DNA by two to three orders of magnitude (56). Spacer acquisition in E. coli requires active replication of the protospacer-containing DNA (56). Thus, small, fast-replicating plasmid genomes are a much better source of spacers than the large host DNA, and such findings are consistent with acquisition of spacers from an infecting virus genome in the archaeon Sulfolobus islandicus requiring its active replication (57). In E. coli, the CRISPR-Cas system derives the spacers primarily from products of RecBCD-catalyzed DNA degradation that are formed during the repair of double-stranded breaks associated with stalled replication forks (58). Other possible sources of substrates for CRISPR adaptation include DNA fragments generated either by other defense systems, such as restriction-modification systems (59), or by the CRISPR-Cas system itself (49).

Cas1 and Cas2 play crucial roles in spacer acquisition in all CRISPR-Cas systems (50, 52). In addition, these proteins can function in trans, provided that the repeats involved are sufficiently similar in size and structure. Accordingly, cas1 and cas2 genes are missing in many active CRISPR-Cas loci, in particular those of type III as well as types IV and VI (30). Overexpression of Cas1 and Cas2 from the E. coli type I-E system has been shown to be sufficient for the extension of the CRISPR array (52). Mutations in the active site of Cas1 abolish spacer integration in E. coli (52), whereas the nuclease activity of Cas2 is dispensable (55). In E. coli, a central Cas2 dimer and two flanking Cas1 dimers form a complex that binds and processes PAM-containing DNA fragments (Fig. 3A) (55, 60), after which the newly generated spacers can be integrated into a CRISPR array via a recombination mechanism akin to that of retroviral integrases and transposases (61) (Fig. 3B).

Fig. 3 Spacer acquisition.

(A) Crystal structure of the complex of Cas1-Cas2 bound to the dual-forked DNA (PDB accession 5DQZ). The target DNA is shown in dark blue; the Cas1 and Cas2 dimers of the complex are indicated in blue and yellow, respectively. (B) Model explaining the capture of new DNA sequences from invading nucleic acid and the subsequent DNA integration into the host CRISPR array. The numbers on the left correspond to the order of events as described in the text. The dashed lines indicate nucleotides; the nucleotides C and N on the two sides of the protospacer are shown in red and green to clarify the orientation.

In several type III CRISPR-Cas systems, Cas1 is fused to reverse transcriptase (20), and it was recently shown that these systems are capable of acquisition of RNA spacers by direct incorporation of an RNA segment into the CRISPR array followed by reverse transcription and replacement of the RNA strand by DNA (62). Although the biological function of this process remains to be elucidated, these findings demonstrate remarkable versatility of adaptation pathways.

Spacer acquisition (adaptation) in type I systems proceeds along two distinct paths: (i) naïve acquisition, which occurs during an initial infection, and (ii) primed acquisition, when the CRISPR contains a previously integrated spacer that is complementary to the invading DNA (63). According to the proposed model, naïve spacer adaptation involves five steps (Fig. 3B):

1) Fragmentation of (mainly) invasive nucleic acids by non-Cas systems [e.g., by RecBCD after stalling a replication fork, or by restriction enzymes (restriction-modification systems) (56, 59)] or by CRISPR-associated nucleases (49). Although this step may be non-essential, it probably enhances the efficiency of the overall process and its specificity toward invading DNA.

2) Selection of DNA fragments for (proto)spacers by scanning for potential PAMs (after partial target unwinding) by one of the four Cas1 subunits of the Cas1-Cas2 complex (64).

3) Measuring of the selected protospacer by the Cas1 nuclease, which generates fragments of the correct size with 3′ hydroxyl groups.

4) Nicking of both strands of the leader-proximal repeat of the CRISPR array at the 5′ ends through a direct nucleophilic attack by the generated 3′ OH groups, resulting in covalent links of each of the strands of the newly selected spacer to the single-stranded repeat ends.

5) Second-strand synthesis and ligation of the repeat flanks by a non-Cas repair system (46, 61).
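The net outcome of steps 4 and 5, a new spacer at the leader end together with one additional repeat, can be captured as simple bookkeeping. The sketch below is only that: it models sequences as strings, ignores strands and enzymology entirely, and the class name `CrisprArray`, its fields, and the toy sequences are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CrisprArray:
    leader: str
    repeat: str
    # Index 0 is the leader-proximal (most recently acquired) spacer.
    spacers: list[str] = field(default_factory=list)

    def integrate(self, spacer: str) -> None:
        """Insert a new spacer at the leader end of the array."""
        self.spacers.insert(0, spacer)

    def sequence(self) -> str:
        """Leader, then repeat, then (spacer + repeat) for each spacer.

        An array with n spacers therefore carries n + 1 repeats, so each
        integration effectively duplicates the leader-proximal repeat.
        """
        return self.leader + self.repeat + "".join(
            s + self.repeat for s in self.spacers
        )


# Invented toy sequences.
array = CrisprArray(leader="AT", repeat="GG", spacers=["CCC"])
array.integrate("TTT")
print(array.spacers)     # newest spacer first
print(array.sequence())
```

Reading `spacers` from index 0 onward gives the encounters from most recent to oldest, which is the chronological archive described at the start of this section.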

Primed spacer adaptation so far has been demonstrated only in type I systems (50, 65, 66). This priming mechanism constitutes a positive feedback loop that facilitates the acquisition of new spacers from formerly encountered genetic elements (67). Priming can occur even with spacers that contain several mismatches, making them incompetent as guides for targeting the cognate foreign DNA (67). Based on PAM selection, functional spacers are preferentially acquired during naïve adaptation. This initial acquisition event triggers a rapid priming response after subsequent infections. Priming appears to be a major pathway of CRISPR adaptation, at least for some type I systems (65). Primed adaptation strongly depends on the spacer sequence (68), and the acquisition efficiency is highest in close proximity to the priming site. In addition, the orientation of newly inserted spacers indicates a strand bias, which is consistent with the involvement of single-stranded adaptation intermediates (69). According to one proposed model (70), replication forks in the invader’s DNA are blocked by the Cascade complex bound to the priming crRNA, enabling the RecG helicase and the Cas3 helicase/nuclease proteins to attack the DNA. The ends at the collapsed forks then could be targeted by RecBCD, which provides DNA fragments for new spacer generation (70). Given that the use of crRNA for priming has much less strict sequence requirements than direct targeting of the invading DNA, priming is a powerful strategy that might have evolved in the course of the host-parasite arms race to reduce the escape by viral mutants, to provide robust resistance against invading DNA, and to enhance self/nonself discrimination. Naïve as well as primed adaptation in the subtype I-F CRISPR-Cas system of Pseudomonas aeruginosa require both the adaptation and the effector module (69).

In the type II-A system, the Cas9-tracrRNA complex and Csn2 are involved in spacer acquisition along with the Cas1-Cas2 complex (53, 71); the involvement of Cas9 in adaptation is likely to be a general feature of type II systems. Although the key residues of Cas9 involved in PAM recognition are dispensable for spacer acquisition, they are essential for the incorporation of new spacers with the correct PAM sequence (71). The involvement of Cas9 in PAM recognition and protospacer selection (71) suggests that in type II systems Cas1 may have lost this role. Similarly, Cas4 that is present in subtypes I-A to I-D and II-B has been proposed to be involved in the CRISPR adaptation process, and this prediction has been validated experimentally for type I-B (65). Cas4 is absent in the subtype II-C system of Campylobacter jejuni. Nonetheless, a conserved Cas4-like protein found in Campylobacter bacteriophages can activate spacer acquisition to use host DNA as an effective decoy to bacteriophage DNA. Bacteria that acquire self-spacers and escape phage infection must either overcome CRISPR-mediated autoimmunity by loss of the interference functions, leaving them susceptible to foreign DNA invasions, or tolerate changes in gene regulation (72). Furthermore, in subtypes I-U and V-B, Cas4 is fused to Cas1, which implies cooperation between these proteins during adaptation. In type I-F systems, Cas2 is fused to Cas3 (19), which suggests a dual role for Cas3 (20): involvement in adaptation as well as in interference. These findings support the coupling between the adaptation and interference stages of CRISPR-Cas defense during priming.

Biogenesis of crRNAs

The short mature crRNAs contain spacer sequences, which are the guides that are responsible for the specificity of CRISPR-Cas immunity (12). They associate with one or more Cas proteins to form effector complexes that target invading MGEs through crRNA:target sequence–specific recognition. The CRISPR arrays are transcribed as long precursors, known as pre-crRNA, that may contain secondary structured elements (hairpins) in those cases where the CRISPR contains palindromic repeats. The processing of the pre-crRNA typically yields 30- to 65-nt mature crRNAs that consist of a single spacer flanked by a partial repeat at either one or both ends (12, 73).

The pathways of crRNA biogenesis differ among the different CRISPR-Cas types. In class 1 systems, the Cas6 protein is critical for the primary processing of pre-crRNA. Cas6 is a metal-independent endoribonuclease that recognizes and cleaves a single phosphodiester bond in the repeat sequences of a pre-crRNA transcript (12, 74, 75). Members of the Cas6 family contain two RRM-type RNA-binding domains. The primary cleavage by Cas6 results in crRNAs containing a repeat-derived 5′ “handle” of 8 nt with a 5′ hydroxyl group, followed by the complete spacer sequence and a repeat-derived 3′ handle of variable size that in some subtypes forms a hairpin structure with either a 3′-phosphate or a cyclic 2′3′-phosphate (12, 74, 76). The Cas6 family proteins show considerable structural variation that might reflect the cleavage specificity (73, 77, 78).
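The primary cleavage pattern described above, one cut per repeat that leaves an 8-nt repeat-derived 5′ handle, can be sketched as string slicing. This is a simplified model that assumes identical repeats, treats the cut as falling exactly 8 nt upstream of each repeat's 3′ end, and ignores secondary structure and end chemistry; the function name `cas6_like_cleavage` and the toy RNA sequences are invented.

```python
def cas6_like_cleavage(pre_crrna: str, repeat: str, handle_len: int = 8) -> list[str]:
    """Cut once inside every repeat and return the fragments between cuts.

    Each returned fragment is: the last `handle_len` nt of one repeat
    (5' handle) + a spacer + the remaining 5' part of the next repeat
    (3' handle).
    """
    if not 0 < handle_len <= len(repeat):
        raise ValueError("handle_len must be between 1 and len(repeat)")
    cut_sites = []
    pos = pre_crrna.find(repeat)
    while pos != -1:
        cut_sites.append(pos + len(repeat) - handle_len)
        pos = pre_crrna.find(repeat, pos + len(repeat))
    # Only fragments bounded by two cuts are complete crRNAs.
    return [pre_crrna[a:b] for a, b in zip(cut_sites, cut_sites[1:])]


# Invented toy transcript: three 12-nt repeats separating two 8-nt spacers.
toy_repeat = "CCCCGAAAUUUG"
toy_pre_crrna = toy_repeat + "ACACACAU" + toy_repeat + "UGUGUGUA" + toy_repeat

for crrna in cas6_like_cleavage(toy_pre_crrna, toy_repeat):
    print(crrna[:8], crrna[8:-4], crrna[-4:])  # 5' handle, spacer, 3' handle
```

The 3′ trimming that follows in type III systems would correspond to shortening each fragment from its right-hand end, which this sketch does not attempt.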

In type I-E and I-F systems, the Cas6 ribonuclease is a single-turnover enzyme that remains attached to the crRNA cleavage product. In these cases, Cas6 is a subunit of a multisubunit Cascade complex (12, 79) (Fig. 4A). In the type I-F systems, the crRNP complex consists of the crRNA, Cas6f, and Csy1, Csy2, and Csy3 proteins (80–82). In other systems (subtypes I-A, I-B, I-D, and III-A to III-D), Cas6 is not associated with the crRNA-processing complex. The absence of a Cas6 subunit in the complex correlates with the lack of a hairpin structure of the 3′ handle and a variable 3′ end. The absence of a cas6 gene in type I-C is complemented by another double RRM-fold subunit, Cas5d, which has adopted the role of the endoribonuclease that in other subtypes is carried out by Cas6 (83). Some systems coexisting in the same species have been demonstrated to share the same set of guides; examples include type III-A (Csm) and type III-B (Cmr) of Thermus thermophilus (84) and type III-B (Cmr), type I-A (Csa), and type I-G (Cst) of Pyrococcus furiosus (85). Given that the type III loci usually lack cas6 genes, a single stand-alone Cas6 nuclease is likely to be responsible for the supply of crRNAs to the type III complexes in T. thermophilus (84). In P. furiosus, Cas6 nuclease of type I generates the crRNAs from all CRISPR loci for the different coexisting complexes (85). Cas6-based processing of pre-crRNA in type III systems is typically followed by a sequence-unspecific trimming at the 3′ end (by RNases yet to be identified) to yield mature crRNAs with a defined 8-nt 5′ end and a variable 3′ end (34, 86, 87).

Fig. 4 Guide expression and processing.

(A) Generation of CRISPR RNA (crRNA) guides in type I and type III CRISPR-Cas systems. Primary processing of the pre-crRNA is catalyzed by Cas6, which typically results in a crRNA with a 5′ handle of 8 nt, a central spacer sequence, and (in some subtypes) a longer 3′ handle. Shown here is the guide processing (red triangles) for subtype I-E by Cas6e. The occasional secondary processing of the 3′ end of crRNA is catalyzed by one or more unknown RNases. (B) In type II-A CRISPR-Cas systems, the repeat sequences of the pre-crRNA hybridize with complementary sequences of transactivating CRISPR RNA (tracrRNA). The double-stranded RNA is cleaved by RNase III (red triangles); further trimming of the 5′ end of the spacer is carried out by unknown RNase(s) (pink). (C) CRISPR with transcriptional start site (TSS) in repeats, as observed in type II-B CRISPR-Cas systems.

Type II systems use a unique mechanism for crRNA biogenesis whereby processing depends on Cas9, a host RNase III, and a tracrRNA that forms base pairs with the repeats of the pre-crRNA (36, 37, 73) (Fig. 4B). The cleaved crRNA-tracrRNA hybrid is bound and stabilized by Cas9, triggering a conformational change toward a state compatible with target scanning, recognition, and interference (36, 37, 88). Trimming of the 5′ end of the crRNA probably occurs by a non-Cas RNase. The absence of type II systems in archaea is consistent with the absence of RNase III genes in most archaeal genomes (89). In the type II-C system of Neisseria meningitidis, short intermediate crRNA guides are transcribed from multiple promoters embedded within the repeats of the CRISPR array, implying that the system does not require RNase III (90) (Fig. 4C). Expression of tracrRNA has also been demonstrated for the subtype V-B system, suggestive of a crRNA processing pathway analogous to that in type II. By contrast, in subtype V-A and type VI systems, no tracrRNA is coexpressed with the pre-crRNA (31, 39). Class 2 CRISPR-Cas systems lacking tracrRNA can be expected to function using novel mechanisms of crRNA biogenesis, including processing by other host RNases or by the effector proteins themselves.

A third variant of guide maturation has recently been described for the Cpf1 effector complex, a class 2 system that (unlike Cas9) does not associate with a tracrRNA. It has been demonstrated that Cpf1 has an intrinsic RNase activity that allows for the primary processing of the pre-crRNA to crRNA guides with a 5′ hairpin (91). The biosynthesis of crRNAs by the Cpf1 system is metal-, sequence-, and structure-dependent (91). Secondary processing of CRISPR guides probably occurs via a non-Cas RNase; maturation of Cas9-associated guides occurs by trimming at the 5′ end (Fig. 4B), whereas in Cpf1 the 3′ flanks of the crRNA are removed.

Target interference

Selection of CRISPR-Cas targets is a stepwise process that relies on recognition of a nonself sequence, a complementary spacer of which is stored in the CRISPR locus. In most cases, with the exception of the RNA-targeting type III systems, cognate protospacer sequences flanked by a PAM sequence are recognized by a CRISPR ribonucleoprotein (crRNP) complex [type I Cascade, type II Cas9, type V Cpf1 (Fig. 5)] and specifically degraded (12, 14, 39). In addition, selection of an appropriate target sequence depends on a so-called seed sequence on the guide (79, 92). The seed is a sequence of seven or eight base pairs in close proximity to the PAM. Matching PAM and seed sequences are crucial for target interference (79, 92, 93) and act as a quality control step that is required for the complete displacement of the noncomplementary strand of the target DNA by the crRNA guide, the so-called R-loop conformation. Downstream of the seed region, mismatches between spacer and protospacer are tolerated to some extent (see below) (92).

Fig. 5 CRISPR RNP complexes.

Crystal structures of the CRISPR ribonucleoprotein (crRNP) complexes responsible for target interference. Shown are the type I-E Cascade complex (PDB accession 4QYZ) and type III-B Cmr complex (PDB accession 3X1L) from class 1, and the type II-A Cas9 complex (PDB accession 4OO8) and type V-A Cpf1 complex (PDB accession 5B43) from class 2. Colors of nucleic acid fragments are the same as in Fig. 6.

In type I systems, the Cascade RNP complex scans DNA for complementary target sites, initially by identifying an appropriate PAM motif, followed by partial melting and base pairing by the guide’s seed sequence, and eventually by formation of a complete R-loop structure (76, 94). Upon reaching a PAM-proximal mismatch, the R-loop propagation stalls and the interference is aborted (95). When base pairing between guide and protospacer is complete, the R-loop structure appears to be locked in a state to license DNA degradation by the Cas3 nuclease/helicase (12, 19, 95).

Single-molecule experiments with E. coli Cascade demonstrate that crRNA-guided Cascade exhibits two distinct binding modes for matching and mismatched targets, which trigger either interference (matching target) or primed spacer acquisition (mismatched target). Unlike the interference of matching targets, mismatched targets are recognized with low fidelity, as indicated by a short-lived binding. The latter association is PAM- and seed-independent and can involve base pairing by any part of the crRNA spacer. In this case, the Cascade complex does not adopt a conformation that allows docking of Cas3 (96), precluding DNA interference. Instead, this Cascade-target complex primes the formation of a spacer acquisition complex that consists of Cas3 and Cas1-Cas2, and generates DNA fragments that are integrated as new spacers in the CRISPR array (94). These dual roles of Cascade allow for efficient degradation of bona fide targets and priming the acquisition of new spacers from mismatched targets (e.g., from viral escape mutants) as an update of the CRISPR memory (96).

Although type III systems are structurally related to the type I system (Fig. 5) (34, 35, 60, 97–101), they show some substantial mechanistic variations. Initial analyses indicated that Csm (III-A) complexes target DNA (13), whereas Cmr (III-B) complexes target RNA (11, 102, 103). However, it has recently been demonstrated that both type III complexes are transcription-dependent DNA nucleases (84, 104); that is, they initially recognize their target through specific interaction of the crRNA guide with a complementary nascent mRNA, after which cleavage of the flanking DNA sequences occurs (105–110). Robust interference by these systems relies on the concerted cleavage of the transcript RNA and the transcribed DNA. The Cas7-like backbone subunits (Csm3, Cmr4) are responsible for the RNase activity, typically resulting in cleavage of the target RNA at 6-nt intervals (84, 99, 103, 104, 111–113). Binding of the Cmr complex to its complementary RNA target induces a conformational change (35, 99) that results in activation of the Cas10 DNA-cleaving subunit (Csm1/Cmr2) (106, 107, 109). Disruption of the RNase active sites (in Csm3/Cmr4), at least in some cases, does not hamper the activation of the DNA nuclease activity of the complexes (104, 106). Exonucleolytic cleavage of single-stranded DNA and RNA by recombinant Staphylococcus epidermidis Csm1 (Cas10) and by Thermotoga maritima and P. furiosus Cmr2 has been demonstrated in vitro (106, 107, 114). In the S. epidermidis system, a Csx1 ortholog (Csm6) provides an auxiliary RNA-targeting activity that operates in conjunction with the RNA- and DNA-targeting endonuclease activities of the Csm effector complex (115–117); in the P. furiosus Cmr system, Csx1 appears not to be an essential component (104). The relative contribution of the different nuclease subunits appears to vary in the different type III systems and under different conditions, and awaits further characterization.

Another unique feature of type III systems concerns the mechanism of self/nonself discrimination. Genetic analyses have revealed that type III systems do not use the PAM-based “nonself-activation” mechanism of type I (Cascade), type II (Cas9), and type V (Cpf1). The mechanism used by the S. epidermidis Csm system apparently involves crRNA- or protein-based recognition of the repeats in the CRISPR locus, resulting in “self-inactivation” (118, 119). However, the DNA cleavage activity of the P. furiosus Cmr complex was recently reported to require the presence of a short sequence adjacent to the target sequence within the activating target RNA (i.e., an RNA PAM) (107). Additional analysis is required to reveal whether the reported motifs are typical features that distinguish the two subtypes.

Class 2 systems require only a single protein for interference. In type II, the crRNP complex involved in target recognition and degradation consists of Cas9 bound to the crRNA guide base-paired with the tracrRNA (37). The crystal structures of Cas9 reveal two distinct lobes that are involved in target recognition and nuclease activity (Fig. 5). The positively charged groove at the interface of the two lobes accommodates the crRNA-DNA heteroduplex (120, 121). A major step in Cas9 activation is the reorientation of the structural lobes upon crRNA/tracrRNA loading, which results in the formation of a central channel that accommodates the target DNA (120). Binding and cleavage of the target DNA by the Cas9-crRNA effector complex depend on the recognition of an appropriate PAM located at the 3′ end of the protospacer (93), which serves as a licensing element in subsequent DNA strand displacement and R-loop formation. The PAM motif resides in a base-paired DNA duplex. Sequence-specific PAM readout by Arg1333 and Arg1335 in Cas9 positions the DNA duplex such that the +1 phosphate group of the target strand interacts with the phosphate lock loop (122). This promotes local duplex melting, allowing the Cas9-RNA complex to probe the identity of the nucleotides immediately upstream of the PAM. Base pairing between a 12-nt seed sequence of the guide RNA and the target DNA strand (93) drives further stepwise destabilization of the target DNA duplex and directional formation of the guide RNA–target DNA heteroduplex (122). This R-loop triggers a conformational change of the two nuclease domains (HNH and RuvC) of Cas9, which adopt an active state that allows for the completion of interference by cleavage of both target strands (121, 123). Cas9 generates a blunt-end double-strand break, typically located 3 nt from the 3′ end of the protospacer (14, 124). Recently, however, PAM-independent single-stranded targeting by N. meningitidis Cas9 has been described (125).
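The PAM- and seed-dependent target selection by Cas9 described above can be sketched in Python as follows. The sketch assumes a 3-nt PAM of the form NGG (as commonly used for the Streptococcus pyogenes enzyme; other orthologs differ), a spacer written as DNA in the same sense as the protospacer strand, exact matching within the 12-nt PAM-proximal seed, and a blunt cut 3 nt from the 3′ end of the protospacer. The function name, parameter names, and sequences are hypothetical; only one strand is scanned, and PAM-distal mismatches are merely counted rather than scored.

```python
import re

# Hypothetical 20-nt spacer, written as DNA in protospacer sense.
TOY_SPACER = "ATGCGTACCGTTAGCTAGCA"


def find_cas9_sites(dna, spacer, pam_regex="[ACGT]GG", seed_len=12, cut_offset=3):
    """Scan one DNA strand (5'->3') for protospacers followed by a 3' PAM.

    A site is reported only if the PAM matches and the PAM-proximal seed
    (last seed_len nt of the protospacer) is identical to the spacer seed.
    Mismatches outside the seed are tolerated and counted.
    """
    n = len(spacer)
    sites = []
    # The window must leave room for a 3-nt PAM after the protospacer.
    for start in range(len(dna) - n - 2):
        proto = dna[start:start + n]
        pam = dna[start + n:start + n + 3]
        if not re.fullmatch(pam_regex, pam):
            continue  # no PAM: the site is never interrogated
        if proto[-seed_len:] != spacer[-seed_len:]:
            continue  # seed mismatch: R-loop formation is aborted
        distal = sum(a != b for a, b in zip(proto[:-seed_len], spacer[:-seed_len]))
        sites.append({
            "start": start,
            "pam": pam,
            "distal_mismatches": distal,
            # Blunt cut: index of the first base 3' of the break.
            "cut": start + n - cut_offset,
        })
    return sites


if __name__ == "__main__":
    target = "TTT" + TOY_SPACER + "TGG" + "AAA"
    print(find_cas9_sites(target, TOY_SPACER))
```

The order of the checks mirrors the stepwise process in the text: PAM readout first, then seed pairing, and only then extension of the heteroduplex across the PAM-distal region.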

Similar to type II, the effector modules of type V systems consist of a large multidomain protein complex (Cpf1 and C2c1 in subtypes V-A and V-B, respectively). Like Cas9, these proteins encompass a RuvC-like nuclease domain and an arginine-rich bridging helix. However, in contrast to Cas9, the RuvC-like domain of Cpf1 and C2c1 is more compact and the HNH domain is missing (Fig. 6). Subtype V-B systems resemble type II with respect to the requirement for a tracrRNA, both for processing and for interference. In contrast, Cpf1-crRNA (type V-A) complexes are single RNA-guided endonucleases that cleave target DNA molecules in the absence of a tracrRNA (39). A model is proposed for a stepwise cleavage of the target DNA by Cpf1 (i.e., initial RuvC-dependent cleavage of the displaced strand, followed by cleavage of the target strand by the novel nuclease domain) (40). The observation that inactivation of the RuvC domain prevents cleavage of both strands of the target DNA (39, 91) suggests that the novel nuclease is allosterically activated by the RuvC cleavage event. Although allosteric control has also been demonstrated in interference by Cas9 (123), details appear to differ (40). Both Cpf1 and C2c1 from different bacteria efficiently cleave target DNA containing a well-defined T-rich PAM at the 5′ end of the protospacer (5′-PAM) (31, 39), in contrast to the more variable, G-rich 3′-PAM sequence of Cas9 (126). Structural analysis has shown that Cpf1 recognizes its PAM through a combination of base and shape readout, in which several PAM-interacting amino acid residues that are conserved in the Cpf1 family are involved (40). Another unique feature of the Cpf1 endonuclease is the generation of staggered double-stranded DNA breaks with 4- or 5-nt 5′ overhangs (39); in the Cpf1 structure, the unique nuclease domain is positioned so as to cleave the target strand outside the heteroduplex, as opposed to the HNH domain of Cas9, in which the active site contacts the target within the heteroduplex (40) (Fig. 6).
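The difference between the blunt ends left by Cas9 and the staggered ends left by Cpf1 comes down to simple arithmetic on the two cut positions, as the following Python sketch shows. Both cuts are expressed as the number of protospacer nucleotides lying 5′ of the break, counted along the nontarget strand. The specific Cpf1 positions used in the example (18 or 19 on the nontarget strand and 23 on the target strand) are illustrative assumptions chosen to reproduce the 4- or 5-nt 5′ overhangs mentioned above, and the function name is hypothetical.

```python
def end_type(nontarget_cut: int, target_cut: int):
    """Classify the ends produced by one cut on each strand of a duplex.

    Both arguments are cut positions in the coordinate system of the
    nontarget strand read 5'->3' (number of nt 5' of the break).
    Returns a (kind, overhang_length) tuple.
    """
    length = abs(target_cut - nontarget_cut)
    if length == 0:
        return ("blunt", 0)
    # If the target strand is cut further 3' (in nontarget-strand coordinates)
    # than the nontarget strand, the single-stranded tails end in 5' termini.
    kind = "5' overhang" if target_cut > nontarget_cut else "3' overhang"
    return (kind, length)


if __name__ == "__main__":
    # Cas9 on a 20-nt protospacer: both strands cut 3 nt from the 3' end.
    print("Cas9:", end_type(17, 17))
    # Cpf1, assumed cut positions counted from the PAM-proximal 5' end.
    print("Cpf1:", end_type(18, 23))
    print("Cpf1:", end_type(19, 23))
```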

Fig. 6 Target interference.

Genomic loci architecture of the components of class 1 and class 2 CRISPR-Cas systems and schematic representation of target interference for the different subtypes. The double-stranded DNA (target) is shown in black, the target RNA in gray, the CRISPR RNA (crRNA) repeat in blue, the spacer region of the crRNA in green, and the transactivating CRISPR RNA (tracrRNA) in red.

The type VI systems contain a unique effector protein (C2c2) with two HEPN domains. The Leptotrichia shahii C2c2 protein provides efficient interference against the RNA phage MS2. C2c2 is guided by a single crRNA and can be programmed to cleave ssRNA targets carrying complementary protospacers (127) (Fig. 6). Spacers with a G immediately flanking the 3′ end of the protospacer are less fit relative to all other nucleotides at this position, which suggests that the 3′ protospacer flanking site (PFS) affects the efficacy of C2c2-mediated targeting (128) (Fig. 6). Remarkably, once primed with the cognate target RNA, the C2c2 protein turns into a sequence-nonspecific RNase that causes a toxic effect in bacteria (127). Thus, the defense strategy of type VI systems appears to couple adaptive immunity with programmed cell death or dormancy induction.

Phages are constantly evolving multiple tactics to avoid, circumvent, or subvert prokaryotic defense mechanisms (8). Phages can evade CRISPR interference through single-nucleotide substitution in the protospacer region or in the conserved protospacer-adjacent motif (47). Additionally, P. aeruginosa phages encode several proteins affecting the activity of type I-E and I-F systems (128). Diverse sequences of these proteins and mechanisms of action, coupled with the strong selection imposed by different antiviral systems, suggest an abundance of anti-CRISPR proteins yet to be discovered. Strikingly, some bacteriophages themselves encode a CRISPR-Cas system that in this case functions as an antidefense device targeting an antiphage island of the bacterial host and thus enabling productive infection (129). Together, these findings emphasize the complexity of the virus-host arms race in which CRISPR-Cas systems are involved and suggest that many important aspects of this race remain to be characterized.

Very recently, an unexpected claim has been published on the existence of a CRISPR-like defense system in a giant mimivirus infecting unicellular eukaryotes (amoeba) (130). This system, named “mimivirus virophage resistance element” (MIMIVIRE), has been proposed to protect certain mimivirus strains from the Zamilon virophage, a small virus that parasitizes mimiviruses. However, the MIMIVIRE locus lacks CRISPR-like repeats or a Cas1 homolog and encodes only very distant, generic homologs of two Cas proteins (a helicase and a nuclease that belong to the same protein superfamilies as Cas3 and Cas4, respectively, but lack any specific relationship with these Cas proteins). Thus, any analogy between this putative eukaryotic giant virus defense system and CRISPR-Cas should be treated with caution.

Genome editing applications

The molecular features of CRISPR-Cas systems, particularly class 2 systems with single-protein effectors, have made them attractive starting points for researchers interested in developing programmable genome editing tools. In 2013, the first reports of harnessing Cas9 for multiplex gene editing in human cells appeared (131–134). These studies have demonstrated that Cas9 could efficiently create indels at precise locations and that by supplying exogenous repair templates, insertion of a new sequence at target sites could be achieved via homologous recombination. A “dead” Cas9 (dCas9) variant with inactivating mutations in the HNH and RuvC domains binds DNA without cutting, providing a programmable platform for recruiting different functional moieties to target sites. The dCas9 has been used for transcriptional activation and repression (135–138), localizing fluorescent protein labels (139), and recruiting histone-modifying enzymes (140, 141). Other applications of Cas9 include building gene circuits (142–144), creating new antimicrobials (145) and antivirals (146–148), and large-scale gain- and loss-of-function screening (149–152).

The genome editing toolbox has been expanding through the discovery of novel class 2 effector proteins, such as Cpf1 (39). In human cells, the Cpf1 nuclease shows on-target efficiencies comparable to those of Cas9. Cpf1 is also highly specific in its targeting, as minimal or no off-target cleavage has been detected (153, 154). Cpf1 does not require a tracrRNA, further simplifying the system for genome editing applications. In addition, it generates sticky ends, which could potentially increase the efficiency of insertion of new DNA sequences relative to the blunt ends created by Cas9 (39).

Central to the success of any Cas-based genome editing tool is the specificity of the enzyme, and many approaches to increase specificity have been reported. For example, “double-nicking,” which uses dimers of two Cas9 variants, each mutated to create a nick in one strand of the DNA, improves specificity by requiring two target matches to create the double-strand break (155, 156). Another tactic is to control the amount of Cas9 in the cell via an inducible system that expresses a low level of Cas9 (157, 158). Shortening the region of complementarity in the guide RNA also reduces off-target cleavage (159). Finally, structure-guided engineering has been used to mutate specific residues in Cas9, to weaken the interaction with the nontarget strand or to decrease nonspecific interactions with the target DNA site, favoring cleavage at sites that are perfectly complementary to the guide RNA and reducing off-target effects to undetectable levels at many sites (160, 161).

A major outstanding challenge for realizing the full potential of Cas-based genome editing, including its use as a therapeutic, is efficient and tissue-specific delivery. Some progress has been made in this area, including the use of a smaller Cas9 ortholog (162), which is more amenable to packaging into viral vectors. Other approaches are also being pursued, including nonviral methods for delivery of DNA or mRNA by nanoparticles (163) and electroporation (164), or direct delivery of Cas9 protein (165). Additionally, the long-term effects of Cas9 expression in heterologous eukaryotic cells remain unexplored. Finally, the potential for editing the human genome as well as the possibility of using Cas-based gene drives for ecosystem engineering (166) raise ethical concerns that must be fully considered.

Outlook

The intensive research over the past few years on structural and functional features of variant CRISPR-Cas systems has revealed that they encompass many homologous components and share common mechanistic principles but also show enormous variability. A key aspect of this variability is module shuffling, which involves frequent recombination of adaptation and effector modules coming from different types of CRISPR-Cas within the same locus. Apart from major differences in the architectures of the effector complexes, functional diversity of CRISPR-Cas includes versatile mechanisms of crRNA guide processing, self/nonself discrimination, and target cleavage. The versatility of class 2 systems in particular, where distinct subtypes apparently evolved via independent recombination of adaptation modules with widely different effectors, is notable, given the potential of these systems as genome editing tools. The in-depth analysis of a few well-characterized CRISPR systems has revealed key structural and mechanistic features. However, the continuing discovery of novel CRISPR-Cas variants and new molecular mechanisms implies that our current insights have limited power for predicting functional details of distantly related variants. Hence, such new CRISPR-Cas systems need to be meticulously analyzed to understand the biology of prokaryotic adaptive immunity and harness its potential for biotechnology. In this Review, we could not cover in any detail several fascinating aspects of CRISPR-Cas biology, such as coevolution of immune systems with viruses, the interplay between CRISPR-Cas activity and horizontal gene transfer, or nonimmune functions of CRISPR-Cas. The complexity and extreme variability of the CRISPR-Cas systems ensure that researchers in this field will have much to do for many years to come.

References and Notes

Acknowledgments: E.V.K. and K.S.M. are supported by the intramural program of the U.S. Department of Health and Human Services (to the National Library of Medicine). J.v.d.O. and P.M. are supported by the NWO/TOP grant 714.015.001. F.Z. is a New York Stem Cell Foundation–Robertson Investigator. F.Z. is supported by the NIH through NIMH (5DP1-MH100706 and 1R01-MH110049) and NIDDK (5R01DK097768-03), the New York Stem Cell, Simons, Paul G. Allen Family, and Vallee Foundations, and D. R. Cheng, T. Harriman, and R. Metcalfe. F.Z. is a founder of Editas Medicine and scientific advisor for Editas Medicine and Horizon Discovery.