Review

CRISPR-Cas: Adapting to change


Science  07 Apr 2017:
Vol. 356, Issue 6333, eaal5056
DOI: 10.1126/science.aal5056

Variation in prokaryote adaptive immunity

To repel infection by phage and mobile genetic elements, prokaryotes have a form of adaptive immune response and memory invested in clustered regularly interspaced short palindromic repeats and associated proteins (CRISPR-Cas). This molecular machinery can recognize and remember foreign nucleic acids by capturing and retaining small nucleotide sequences. On subsequent encounters, the cognate CRISPR-Cas marshals enzymatic defenses to destroy infecting elements that contain the same sequences. Jackson et al. review the molecular mechanisms by which diverse CRISPR-Cas systems adapt and anticipate novel threats and evasive countermeasures from mobile genetic elements.

Science, this issue p. eaal5056

Structured Abstract

BACKGROUND

The arms race between prokaryotes and their perpetually evolving predators has fueled the evolution of a defense arsenal. The so-called CRISPR-Cas systems—clustered regularly interspaced short palindromic repeats and associated proteins—are adaptive immune defense systems found in bacteria and archaea. The recent exponential growth of research in the CRISPR field has led to the discovery of a diverse range of CRISPR-Cas systems and insight into their defense functions. These systems are divided into two major classes and six types. Each system consists of two components: a locus for memory storage (the CRISPR array) and cas genes that encode the machinery driving immunity. Information stored within CRISPR arrays is used to direct the sequence-specific destruction of invading genetic elements, including viruses and plasmids. As such, all CRISPR-Cas immune systems are reliant on the formation of CRISPR memories, known as spacers, to facilitate future defense. To form these memories, small fragments of invader nucleic acids are added as spacers to the CRISPR memory banks in a process termed CRISPR adaptation. The genetic basis of immunity means that CRISPR adaptation provides heritable benefits, an attribute that is unparalleled in eukaryotic immune systems. There is widespread evidence of highly active CRISPR adaptation in nature, and it is clear that these systems play important roles in shaping microbial evolution and global ecological networks.

ADVANCES

CRISPR adaptation requires several processes, including selection and processing of spacer precursors and their subsequent localization to, and integration into, the CRISPR loci. Although our understanding of all facets of the CRISPR adaptation pathway is not yet complete, considerable progress has been made in the past few years. At the heart of CRISPR adaptation is a protein complex, the Cas1-Cas2 “workhorse,” which catalyzes the addition of new spacers to CRISPR memory banks. A combination of functional assays and high-resolution structures of Cas1-Cas2 complexes has recently led to major advances. There is now a sound understanding of how foreign DNA is converted to prespacer substrates and captured by the Cas1-Cas2 complex. After this, Cas1-Cas2 locates the genomic CRISPR locus and docks in the appropriate position for insertion of the new spacer into the CRISPR array, while duplicating a CRISPR repeat. The cues directing the docking of substrate-laden Cas1-Cas2 differ between systems, with some relying on intrinsic sequence specificity and others assisted by host proteins.

Before integration, accurate processing of the spacer precursors is required to ensure that the new spacers are compatible with the protein machinery in order to elicit CRISPR-Cas defense. For a given CRISPR-Cas system, spacers must typically be of a certain length and be inserted into the CRISPR in a specific orientation. It is becoming increasingly apparent that Cas1-Cas2 complexes from diverse systems are capable of ensuring that these system-specific factors are met with high fidelity.

New findings also account for the ordering of stored memories: Typically, the insertion of new spacers is directed to one end of CRISPR arrays, and it has been shown that this enhances immunity against recently encountered invaders. The chronological ordering of new spacers has enabled insights into the temporal dynamics of interactions between hosts and invaders that are constantly changing. Some CRISPR-Cas systems use existing spacers to recognize previously encountered elements and promote the formation of new CRISPR memories, a process known as primed CRISPR adaptation. Viruses and plasmids that have escaped previous CRISPR-Cas defenses through genetic mutations trigger primed CRISPR adaptation. Several recent studies have revealed that primed CRISPR adaptation is also strongly promoted by recurrent invaders, even in the absence of escape mutations. This has led to previously separate paradigms of invader destruction and primed CRISPR adaptation beginning to converge into a unified model.

OUTLOOK

CRISPR adaptation is crucial for ensuring both population-level protection through spacer diversity and protection of the host through invader clearance. Although many studies have explored CRISPR adaptation in a broad range of host-specific and metagenomic contexts, much of the mechanistic detail has been gleaned from studying a relatively small subset of systems. Thus, despite the relative wealth of mechanistic information about CRISPR adaptation in a few specific types, work in other systems continues to reveal distinct modes of operation for spacer acquisition. Therefore, studies of CRISPR adaptation in alternative systems are necessary to determine which processes are conserved and which are system-specific. An important remaining question is why the enhanced primed CRISPR adaptation commonly found in type I systems has not yet been observed in other types. Do other systems possess analogous mechanisms that have yet to be discovered, or does the absence of priming in these systems explain the prevalence of type I systems in nature? Future expansion of our understanding of how CRISPR adaptation is carried out in the diverse repertoire of CRISPR-Cas systems is vital for maximizing the potential for repurposing the spacer acquisition machinery in biotechnological applications. Commandeering CRISPR adaptation for on-demand memory formation will usher in a new era of biological information storage, with many applications that await discovery.

Many hues of the CRISPR-Cas adaptation machinery.

The complex of the Cas1 and Cas2 proteins, which is the workhorse of CRISPR adaptation in diverse CRISPR-Cas prokaryotic immune systems, is depicted with a DNA substrate. Despite the near-ubiquitous Cas1-Cas2 molecular machinery, type-specific differences in the insertion of new information into CRISPR memory banks are beginning to come to light.

Abstract

Bacteria and archaea are engaged in a constant arms race to defend against the ever-present threats of viruses and invasion by mobile genetic elements. The most flexible weapons in the prokaryotic defense arsenal are the CRISPR-Cas adaptive immune systems. These systems are capable of selective identification and neutralization of foreign DNA and/or RNA. CRISPR-Cas systems rely on stored genetic memories to facilitate target recognition. Thus, to keep pace with a changing pool of hostile invaders, the CRISPR memory banks must be regularly updated with new information through a process termed CRISPR adaptation. In this Review, we outline the recent advances in our understanding of the molecular mechanisms governing CRISPR adaptation. Specifically, the conserved protein machinery Cas1-Cas2 is the cornerstone of adaptive immunity in a range of diverse CRISPR-Cas systems.

Bacteria and archaea are constantly threatened by phage infection and invasion by mobile genetic elements (MGEs) through conjugation and transformation. In response, a defense arsenal has evolved, including various innate mechanisms and the CRISPR-Cas (clustered regularly interspaced short palindromic repeats and associated proteins) adaptive immune systems (1–3). CRISPR-Cas systems are widely distributed, occurring in 50 and 87% of complete bacterial and archaeal genomes, respectively. These systems function as RNA-guided nucleases that provide sequence-specific defense against invading MGEs (4, 5). The repurposing of sequence-specific Cas nucleases, particularly Cas9, has stimulated a biotechnological revolution in genome editing (6). In native hosts, the advantage conferred by CRISPR-Cas systems over innate defenses lies in the ability to update the resistance repertoire in response to infection (a process termed CRISPR adaptation). CRISPR adaptation is achieved by incorporating short DNA fragments from MGEs into CRISPR arrays to form memory units called spacers. Early bioinformatic studies showed that many spacers were of foreign origin, hinting that CRISPR loci may act as a form of memory for a prokaryotic immune system (7–10). Subsequent confirmation of the link between spacers and resistance to phage and MGEs was gained experimentally (4, 5, 11). An overview of CRISPR-Cas–mediated defense and CRISPR adaptation is provided in Box 1.

Box 1

A roadmap of CRISPR-Cas adaptation and defense.

In the example illustrated, a bacterial cell is infected by a bacteriophage. The first stage of CRISPR-Cas defense is CRISPR adaptation. This involves the incorporation of small fragments of DNA from the invader into the host CRISPR array. This forms a genetic “memory” of the infection. The memories are stored as spacers (colored squares) between repeat sequences (R), and new spacers are added at the leader-proximal (L) end of the array. The Cas1 and Cas2 proteins, encoded within the cas gene operon, form a Cas1-Cas2 complex (blue)—the “workhorse” of CRISPR adaptation. In this example, the Cas1-Cas2 complex catalyzes the addition of a spacer from the phage genome (purple) into the CRISPR array. The second stage of CRISPR-Cas defense involves transcription of the CRISPR array and subsequent processing of the precursor transcript to generate CRISPR RNAs (crRNAs). Each crRNA contains a single spacer unit that is typically flanked by parts of the adjoining repeat sequences (gray). Individual crRNAs assemble with Cas effector proteins (light green) to form crRNA-effector complexes. The crRNA-effector complexes catalyze the sequence-specific recognition and destruction of foreign DNA and/or RNA elements. This process is known as interference.



Red Queen CRISPR adaptation

The ability to keep defenses up to date by acquiring new spacers is central to the success of CRISPR-Cas systems. Typically, new spacers are inserted at a specific end of the CRISPR array, adjacent to a “leader” region that contains conserved sequence motifs (4, 12–14). The leader usually also contains the promoter driving CRISPR transcription, and it has been demonstrated that integration of new spacers at the leader end enhances defense against phages and MGEs encountered recently (15). This “polarized” addition of spacers into CRISPR loci produces a chronological account of the encounters between phages and bacteria that can provide insights into phage-host co-occurrences, evolution, and ecology (16, 17). However, phage and MGE variants with genetic mutations can avoid detection by existing CRISPR spacers; these evaders are termed “escape mutants.” Additionally, spacers can be lost from CRISPR arrays by recombination between the repeats (16, 18). Thus, maintenance of CRISPR-Cas defense is reliant on the addition of new spacers into CRISPR arrays (19, 20). The continuous competition between host CRISPR adaptation and MGE escape, akin to Red Queen dynamics, has been exposed in several recent metagenome studies (21, 22). Individual cells within a prokaryotic community acquire different, and often multiple, spacers during CRISPR adaptation (23, 24). The diversity of CRISPR loci within cell populations optimizes defense by limiting the reproductive success of mutants that escape the CRISPR-Cas defenses of individual cells (25). Furthermore, the resulting polymorphisms in CRISPR loci enable fast and accurate differentiation of species subtypes, which may prove to have economic and clinical benefits, for example, by enabling the tracing of pathogens during outbreaks (7, 26).
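The polarized, leader-proximal addition of spacers and their loss by recombination between repeats can be pictured with a toy data structure. The sketch below is illustrative only: the class name, the string labels standing in for spacer and repeat sequences, and the reduction of recombination to deleting a run of spacers are simplifying assumptions, not a model taken from the cited studies.

```python
from dataclasses import dataclass, field


@dataclass
class CrisprArray:
    """Toy CRISPR array: spacers listed from the leader end (index 0) to the trailer end."""

    spacers: list[str] = field(default_factory=list)

    def acquire(self, spacer: str) -> None:
        # Polarized adaptation: each new spacer (plus a duplicated repeat)
        # is added next to the leader, so index 0 is always the newest.
        self.spacers.insert(0, spacer)

    def recombine(self, first: int, last: int) -> list[str]:
        # Recombination between the repeats flanking spacers first..last
        # (inclusive) deletes those spacers; the lost spacers are returned.
        lost = self.spacers[first:last + 1]
        del self.spacers[first:last + 1]
        return lost

    def newest_to_oldest(self) -> list[str]:
        # Reading leader -> trailer recovers the acquisition history, newest first.
        return list(self.spacers)

    def render(self, repeat: str = "R", leader: str = "L") -> str:
        # Leader, then repeat-spacer units, ending with a terminal repeat.
        return leader + "".join(repeat + s for s in self.spacers) + repeat
```

For example, acquiring spacers from hypothetical elements `phageA`, `plasmidB`, and `phageC` in that order yields `L-R-phageC-R-plasmidB-R-phageA-R-` when rendered with `repeat="-R-"`, so the most recent encounter sits next to the leader.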

Origins of CRISPR adaptation

According to their constituent Cas proteins, CRISPR-Cas systems are classified into two major classes consisting of six types and 19 subtypes (Fig. 1) (27, 28). Comparative genomics indicates that all known CRISPR-Cas systems evolved from a single ancestor (27, 28). The more compact class 2 CRISPR-Cas systems likely evolved from class 1 ancestors through the acquisition of genes encoding new single-subunit effector proteins and the loss of additional cas genes (28). However, despite the divergence of CRISPR-Cas systems into several types, the proteins primarily responsible for catalyzing spacer acquisition—namely, Cas1 and Cas2—remain relatively conserved, and the genes encoding these proteins are associated with nearly all CRISPR-Cas systems (27). Indeed, as long as spacers can be acquired from MGEs, distinct effector machineries capable of using the information stored in CRISPRs are likely to arise. Knowledge of the structure and function of CRISPR-Cas effector complexes has advanced rapidly in recent years (6, 28). In addition, considerable progress has been made lately toward elucidating the molecular basis of how, when, and why CRISPR adaptation occurs. Here we review these recent findings and highlight the insights that they offer on the function of different CRISPR adaptation mechanisms used by diverse CRISPR-Cas systems.

Fig. 1 Target interactions and the PAMs of diverse CRISPR-Cas types.

Recognition of the invading DNA target by the crRNA-Cas effector complexes of types I, II, and V results in the formation of an RNA-DNA hybrid in which the nontarget DNA strand is displaced. The target strand contains the protospacer (red), which is complementary to the spacer sequence in the crRNA (orange). The protospacer-adjacent motif (PAM, blue) is located at either the 3′ end (types I and V) or the 5′ end (type II) of the protospacer. Types III and VI recognize RNA targets, with type III exhibiting additional transcription-dependent DNA targeting. Some type III systems require an RNA-based PAM (rPAM). Type VI systems exhibit specificity for a protospacer-flanking sequence (PFS) motif, which is analogous to a PAM.

The molecular basis of CRISPR adaptation

CRISPR adaptation requires the integration of new spacers into CRISPR loci and duplication of the associated repeat sequences. The Cas1 and Cas2 proteins, which form a Cas1₄-Cas2₂ complex (hereafter, Cas1-Cas2) (29, 30), constitute the “workhorse” of spacer integration. Spacers added to CRISPR arrays must be compatible with the diverse range of type-specific effector complex machinery (Fig. 1). Thus, although Cas1-Cas2 is near-ubiquitous among CRISPR-Cas types, its homologs must meet the varied requirements for the acquisition of appropriate spacer sequences in different systems. For example, the effector complexes of several CRISPR-Cas types only recognize targets containing a specific sequence adjacent to where the CRISPR RNA (crRNA) base-pairs with the target strand of an MGE (Fig. 1) (31). The crRNA-paired target sequence is termed the protospacer, and the adjacent target-recognition motif is called a protospacer-adjacent motif (PAM) (32). PAM-based target discrimination prevents the unintentional recognition and self-destruction of the CRISPR locus by the crRNA-effector complex, yet canonical PAM sequences vary between and sometimes within systems.
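PAM-based self/nonself discrimination can be sketched as a simple sequence check. This is a minimal toy, assuming exact spacer matches, a single scanned strand, and the nontarget-strand convention in which the PAM sits immediately 3′ of the protospacer, as for the 5′-NGG-3′ PAM of S. pyogenes Cas9; real systems differ in PAM position and tolerate some mismatches (Fig. 1). The function names and all sequences are hypothetical.

```python
def pam_matches(site: str, pam: str) -> bool:
    # 'N' in the PAM pattern matches any base.
    return len(site) == len(pam) and all(p in ("N", b) for p, b in zip(pam, site))


def is_licensed_target(dna: str, spacer: str, pam: str = "NGG") -> bool:
    """True if some exact copy of `spacer` in `dna` is immediately followed by the PAM."""
    start = dna.find(spacer)
    while start != -1:
        end = start + len(spacer)
        if pam_matches(dna[end:end + len(pam)], pam):
            return True
        start = dna.find(spacer, start + 1)
    return False


spacer = "GACCTTAGCAGTCGATCCAG"      # hypothetical 20-nt spacer
repeat = "GTTCAAGGCC"                # hypothetical repeat fragment
phage = "TTTT" + spacer + "TGG" + "AAAA"     # protospacer + PAM
locus = repeat + spacer + repeat             # the spacer as stored in the CRISPR array
escape = "TTTT" + spacer + "TCG" + "AAAA"    # PAM mutant
```

In the locus, the spacer is flanked by repeat sequence rather than a PAM, so the check fails and the array is spared; a single substitution in the phage PAM has the same effect, which is one route to escape.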

Much of what we know about the Cas1-Cas2 molecular structure and function has been gained from studies in the Escherichia coli type I-E system. Within the Cas1₄-Cas2₂ complex, the Cas1 subunits form two dimers that are bridged by a central Cas2 dimer (Fig. 2A) (29, 33, 34). Cas1-Cas2–mediated spacer integration prefers double-stranded DNA (dsDNA) substrates and proceeds through a mechanism resembling retroviral integration (35, 36). In addition to Cas1-Cas2, at least one CRISPR repeat, part of the leader sequence (12, 13, 15, 37), and several host factors for repair of the insertion sites (e.g., DNA polymerase) are required (38). Spacer acquisition involves three main processes: substrate capture, recognition of the CRISPR locus, and integration within the array.

Fig. 2 Cas1-Cas2–mediated spacer acquisition.

(A) The Cas1-Cas2 protein complex loaded with a prespacer substrate (the E. coli type I-E structure is shown; Protein Data Bank ID, 5DQZ). (B) The Cas1 PAM sensing site, showing the canonical type I-E PAM (CTT, yellow) residue-specific interactions (a residue from the noncatalytic Cas1 monomer is annotated with an asterisk) and the site of PAM processing (scissors). H, histidine; K, lysine; Q, glutamine; R, arginine; Y, tyrosine. (C) A schematic representation of the substrate-loaded Cas1-Cas2 protein complex with the active PAM-sensing site highlighted (light purple) and a partially duplexed DNA prespacer substrate (strands are purple and pink). The ruler mechanism determining spacer length for the E. coli type I-E system uses two conserved tyrosine residues (the “Cas1 wedge”; gray hexagons). (D) Spacer integration proceeds as follows: (i) The Cas1-Cas2–prespacer complex binds to the leader (green) and first repeat (black). For type I and type II systems, Cas1-Cas2 docking to the leader-proximal repeat is assisted by integration host factor (IHF) and recognition of the leader-anchoring site (LAS), respectively. (ii) The first nucleophilic attack most likely occurs at the leader-repeat junction and gives rise to a half-site intermediate. (iii) The second nucleophilic attack occurs at the boundary between the repeat and the spacer (orange), resulting in full-site integration. (iv) Host DNA repair enzymes fill the integration site. (E) The type I-E repeat is magnified to indicate the inverted repeats within its sequence and highlight the anchoring sites of the molecular rulers that determine the point of integration. nt, nucleotides.

Cas1-Cas2 substrate capture

During substrate capture, Cas1-Cas2 is loaded with an integration-compatible prespacer, which is thought to be partially duplexed dsDNA (35). For type I systems, the presence of a canonical PAM within the prespacer substrate increases the affinity for Cas1-Cas2 binding but is not requisite (33). Details of how prespacer substrates are produced from foreign DNA are discussed later. For the E. coli type I-E Cas1-Cas2–prespacer complex, the ends of the dsDNA prespacer are splayed by tyrosine wedges in each Cas1 dimer, which lock open the DNA branch points while fixing in place a core 23–base pair dsDNA region. The 3′ single-stranded ends of the prespacer extend into active subunits of each corresponding Cas1 dimer (Fig. 2, A to C) (33, 34). The length of new spacers is governed by the fixed distances between the two Cas1 wedges and from the branch points to the integrase sites. Many CRISPR-Cas systems have highly consistent yet system-specific spacer lengths, and it is likely that analogous wedge-based Cas1-Cas2 “molecular rulers” exist in these systems to control prespacer length (33, 34). However, in some systems, such as type III, the length of spacers found within CRISPR arrays appears more variable, and studies of Cas1-Cas2 structure and function in these systems are lacking.
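The wedge-based ruler can be expressed as a length calculation: a duplex core of fixed length is clamped between the two wedges, and each 3′ single-stranded tail is cut a fixed distance from its branch point. In the sketch below, the 23-bp core comes from the text, but the 5-nt overhang length, the function name, and the strand bookkeeping are hypothetical choices for illustration, not measured parameters of any system.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

CORE_BP = 23       # duplex length fixed between the two Cas1 tyrosine wedges (type I-E)
OVERHANG_NT = 5    # HYPOTHETICAL branch-point-to-active-site distance, illustration only


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def ruler_trim(core: str, top_tail: str, bottom_tail: str,
               core_bp: int = CORE_BP, overhang_nt: int = OVERHANG_NT) -> str:
    """Return the top strand of the spacer defined by a toy two-wedge ruler.

    The prespacer is a duplex `core` (top strand, 5'->3') with a 3' single-stranded
    tail on each strand: `top_tail` follows the core on the top strand, and
    `bottom_tail` follows revcomp(core) on the bottom strand.
    """
    if len(core) != core_bp:
        raise ValueError("duplex core does not fit between the wedges")
    if min(len(top_tail), len(bottom_tail)) < overhang_nt:
        raise ValueError("3' tail too short to reach the active site")
    # Each 3' tail is cut a fixed distance from its branch point.
    top_cut = top_tail[:overhang_nt]
    bottom_cut = bottom_tail[:overhang_nt]
    # The bottom tail extends past the 5' end of the top strand, so in top-strand
    # coordinates it appears (reverse-complemented) before the core.
    return revcomp(bottom_cut) + core + top_cut
```

With these hypothetical values every accepted substrate yields a spacer of 23 + 2 × 5 = 33 bp regardless of how long the input tails are, which is the property a ruler provides.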

Recognition of the CRISPR array

Before integration, the substrate-bound Cas1-Cas2 complex must locate the CRISPR leader-repeat sequence. Specific sequences upstream of CRISPR arrays direct leader-polarized spacer integration, both through direct Cas1-Cas2 recognition and assisted by host proteins. The Cas1-Cas2 complexes of several systems show intrinsic affinity for the leader-repeat region in vitro (35, 39), yet this is not always wholly sufficient to provide the specificity observed in vivo. It was recently discovered that for the type I-E system, leader-repeat recognition is assisted by the integration host factor (IHF) heterodimer (40). IHF binds the CRISPR leader in a sequence-specific manner and induces ∼120° DNA bending, providing a cue to accurately localize Cas1-Cas2 to the leader-repeat junction (40, 41). A conserved sequence motif upstream of the IHF pivot is proposed to stabilize the Cas1-Cas2–leader-repeat interaction and increase the efficiency of spacer acquisition, supporting binding of the adaptation complex to DNA sites on either side of the bound IHF (41).

IHF is absent in many prokaryotes, including archaea, indicating that other leader-proximal integration mechanisms exist. Indeed, type II-A Cas1-Cas2 from Streptococcus pyogenes catalyzed leader-proximal integration in vitro at a level of precision comparable to that of the type I-E system with IHF (39, 40). In type II systems, a short leader-anchoring site (LAS) adjacent to the first repeat and ≤6 base pairs of this repeat are essential for CRISPR adaptation (15, 37, 39) and are conserved in systems with similar repeats. Placement of an additional LAS in front of a nonleader repeat resulted in the integration of spacers at both sites (37), whereas LAS deletion caused ectopic integration at a downstream repeat adjacent to a spacer containing a LAS-like sequence (15). Hence, in contrast to type I-E systems, type II-A systems appear to rely solely on intrinsic sequence specificity for the leader-repeat junction.
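The type II-A observations (an extra LAS creates a second integration site; deleting the LAS redirects integration to a repeat downstream of a LAS-like sequence) follow from a rule of the form "integrate at a repeat whose start is immediately preceded by the LAS." A minimal sketch of that rule, assuming a hypothetical LAS, a hypothetical repeat, a 6-nt repeat prefix requirement, and a simple mismatch allowance for "LAS-like":

```python
def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def integration_sites(locus: str, las: str, repeat: str,
                      repeat_prefix: int = 6, max_mismatch: int = 0) -> list[int]:
    """Positions where a repeat starts directly after a (near-)LAS sequence."""
    k = len(las)
    sites = []
    for i in range(k, len(locus) - repeat_prefix + 1):
        if locus[i:i + repeat_prefix] != repeat[:repeat_prefix]:
            continue  # the first few bases of the repeat are required
        if hamming(locus[i - k:i], las) <= max_mismatch:
            sites.append(i)
    return sites


LAS = "GATCAG"                      # hypothetical leader-anchoring site
REPEAT = "GTTCAAGGCCTTACGA"         # hypothetical repeat
S1 = "ACCTGATTGCAACGTAGCTT"         # hypothetical spacers
S2 = "TGCATCCGAGTTAACCGTAA"
LEADER = "CCCCCCCCCC"

wild_type = LEADER + LAS + REPEAT + S1 + REPEAT + S2 + REPEAT
extra_las = LEADER + LAS + REPEAT + S1 + LAS + REPEAT + S2 + REPEAT
s1_las_like = S1[:-6] + "GATCAT"    # spacer ending in a one-mismatch LAS-like sequence
las_deleted = LEADER + REPEAT + s1_las_like + REPEAT + S2 + REPEAT
```

Here `integration_sites(wild_type, LAS, REPEAT)` returns only the leader-proximal repeat, the extra LAS adds a second site, and in the deletion mutant only the repeat after the LAS-like spacer qualifies once one mismatch is tolerated.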

Integration into the CRISPR array

For CRISPR-Cas types that are reliant on PAM sequences for recognition of targets, the acquisition of interference-proficient spacers requires processing of the prespacer substrate at a specific position relative to the PAM. Each of the four Cas1 monomers in the Cas1-Cas2 complex contains a PAM-sensing domain. The presence of a PAM in the active site of just one of the Cas1 monomers is sufficient to appropriately position the substrate and PAM relative to the cleavage site (Fig. 2B) (33, 34). Furthermore, the presence of a PAM within the prespacer substrate ensures integration into the CRISPR in the correct orientation (23, 42–44). This directional fidelity is critical because otherwise the PAM in the MGE target would lie at the wrong end of the crRNA target binding site, thus precluding target recognition (Fig. 1). To avoid premature loss of the PAM directional cue, processing of the prespacer likely occurs after Cas1-Cas2 orients and docks at the leader-proximal repeat (Fig. 2D). Cas1-mediated processing of the prespacer creates two 3′-OH ends required for nucleophilic attack on each strand of the leader-proximal repeat (35, 36, 45). The initial nucleophilic attack most likely occurs at the leader-repeat junction and forms a half-site intermediate; then, a second attack at the existing repeat-spacer junction generates the full-site integration product (Fig. 2D). The precise order of the prespacer processing and integration steps remains to be fully determined, but considerable progress toward elucidating the reaction mechanisms has been made.
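Why two staggered attacks duplicate the repeat can be checked by tracking both DNA strands through steps (ii) to (iv) of Fig. 2D. The sketch below models only this strand bookkeeping; it assumes a fully processed prespacer, takes the top-strand attack as the first one, and uses a hypothetical function name and toy sequences.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def integrate_spacer(leader: str, repeat: str, rest: str, prespacer: str) -> str:
    """Toy two-step integration; returns the repaired top strand (5'->3').

    `rest` is everything downstream of the first repeat; `prespacer` is the
    processed top strand of the new spacer (its complement is the bottom strand).
    """
    top = leader + repeat + rest
    bottom = revcomp(top)
    # (ii) Half-site: the top-strand prespacer 3'-OH attacks the leader-repeat
    # junction, leaving the leader with a free 3'-OH.
    top_left, top_right = top[:len(leader)], prespacer + top[len(leader):]
    # (iii) Full-site: the bottom-strand prespacer 3'-OH attacks the repeat-spacer
    # junction, which lies len(rest) nt from the 5' end of the bottom strand.
    bottom_left = bottom[:len(rest)]
    bottom_right = revcomp(prespacer) + bottom[len(rest):]
    # The attacks are staggered by one repeat, leaving a repeat-length gap on each
    # strand. (iv) Host repair copies the opposite strand into each gap.
    gap = len(repeat)
    fill_top = revcomp(bottom_right)[len(top_left):len(top_left) + gap]
    fill_bottom = revcomp(top_right)[len(bottom_left):len(bottom_left) + gap]
    new_top = top_left + fill_top + top_right
    new_bottom = bottom_left + fill_bottom + bottom_right
    if new_bottom != revcomp(new_top):  # sanity check: repaired strands pair fully
        raise RuntimeError("strands do not pair after repair")
    return new_top
```

The result is leader, repeat, new spacer, repeat, then the old array: the gap fill-in, not the integrase itself, produces the second copy of the repeat.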

After the first nucleophilic attack, the intrinsic sequence specificity of the Cas1-Cas2 complex defines the site of the second attack and ensures accurate repeat duplication. CRISPR repeats are often semi-palindromic, containing two short inverted repeat (IR) elements, but the location of these can vary (46). In type I-B and I-E systems, the IRs occur close to the center of the repeat (Fig. 2E) and are important for spacer acquisition (47, 48). In the type I-E system, both IRs act as anchors for the Cas1-Cas2 complex, which contains two molecular rulers to position the Cas1 active site for the second nucleophilic attack at the repeat-spacer boundary (47). However, in the type I-B system from Haloarcula hispanica, only the first IR is essential for integration, and a single molecular ruler, directed by an anchor between the IRs, has been proposed (48). In the type II-A systems of Streptococcus thermophilus and S. pyogenes, the IRs are located distally within the repeats, suggesting that these short sequences may directly position the nucleophilic attacks without a need for molecular rulers (37, 39). Although these recent findings suggest that leader-repeat regions at the beginning of CRISPR arrays contain sequences to ensure appropriate Cas1-Cas2 localization, further work is required to determine how the spacer integration events are specifically orchestrated in the diverse range of CRISPR-Cas types.
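Locating the short inverted repeat (IR) elements within a CRISPR repeat amounts to finding k-mers whose reverse complement occurs further along the same sequence. A minimal scan, assuming exact matches and a hypothetical toy repeat rather than a real one:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def inverted_repeats(seq: str, k: int) -> list[tuple[int, int]]:
    """Return (i, j) pairs where seq[j:j+k] is the reverse complement of seq[i:i+k]
    and the two k-mers do not overlap (j >= i + k)."""
    pairs = []
    for i in range(len(seq) - k + 1):
        target = revcomp(seq[i:i + k])
        j = seq.find(target, i + k)
        while j != -1:
            pairs.append((i, j))
            j = seq.find(target, j + 1)
    return pairs


toy_repeat = "AATCCGCATTTGCGGATA"   # hypothetical repeat with a central IR pair
```

Here `CCGCA` at position 3 pairs with `TGCGG` at position 10, near the center of the toy repeat, as in the type I-B and I-E layout; a distal pair would instead sit near the repeat ends, as described for the type II-A repeats.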

Production of prespacers from foreign DNA

Despite the elegance of memory-directed defense, CRISPR adaptation is not without complications. For example, the inadvertent acquisition of spacers from host DNA must be avoided because this will result in cytotoxic self-targeting, akin to autoimmunity in eukaryotic adaptive immune systems (49, 50). Therefore, production of prespacer substrates from MGEs should outweigh production from host DNA. In the following sections, we outline the routes to prespacer generation in different CRISPR-Cas systems.

Naïve CRISPR adaptation

Acquisition of spacers from MGEs that are not already cataloged in host CRISPRs is termed naïve CRISPR adaptation (Fig. 3) (51). For naïve CRISPR adaptation, prespacer substrates are generated from foreign material and loaded onto Cas1-Cas2. The main known source of these precursors is the host RecBCD complex (52). Stalled replication forks that occur during DNA replication can result in double-strand breaks (DSBs), which are repaired through RecBCD-mediated unwinding and degradation of the dsDNA ends back to the nearest Chi sites (53). During this repair process, RecBCD produces single-stranded DNA (ssDNA) fragments, which have been proposed to subsequently anneal to form partially duplexed prespacer substrates for Cas1-Cas2 (52). The greater number of active origins of replication and the paucity of Chi sites on MGEs, compared with the host chromosome, bias naïve adaptation toward foreign DNA. Furthermore, RecBCD recognizes the unprotected dsDNA ends that are commonly present in phage genomes upon injection or before packaging, which theoretically provides an additional phage-specific source of naïve prespacer substrates (52).
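The Chi-site argument can be made concrete by asking how much DNA RecBCD degrades, and therefore how much is available as prespacer source material, before it stops at the nearest Chi site. The sketch uses the E. coli Chi sequence 5′-GCTGGTGG-3′ but is otherwise a simplification: it assumes RecBCD enters at the right-hand (3′) end of the top strand at a break, stops at the first correctly oriented Chi it meets, and ignores that Chi recognition is not guaranteed on every pass.

```python
CHI = "GCTGGTGG"  # E. coli Chi site, written 5'->3' on the top strand


def degraded_before_chi(fragment: str, chi: str = CHI) -> int:
    """Length of DNA a toy RecBCD degrades after entering at the 3' end of `fragment`
    (a double-strand break at the right-hand end) and moving leftward until it meets
    a Chi site. Returns len(fragment) if no Chi is present."""
    hit = fragment.rfind(chi)  # the Chi site closest to the break
    if hit == -1:
        return len(fragment)
    return len(fragment) - (hit + len(chi))


host_like = "A" * 500 + CHI + "T" * 200   # hypothetical Chi-containing chromosome segment
mge_like = "A" * 708                      # hypothetical Chi-free MGE segment of equal length
```

For equal-length segments, the Chi-free element is degraded along its entire length (708 nt here) whereas the Chi-containing one loses only the 200 nt between the break and Chi, mirroring how a paucity of Chi sites biases prespacer production toward MGEs.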

Fig. 3 Cas1-Cas2 substrate production pathways.

(A) Naïve generation of substrates by RecBCD activity on DNA ends due to double-strand breaks that occur as a result of stalled replication forks, innate defenses such as restriction endonuclease activity, or the ends of phage genomes (not shown). (B) Primed prespacer production in type I systems, which requires Cas3 helicase and nuclease activity. (C) Cas9-dependent spacer selection in type II systems, which for some subtypes is dependent on the activity of accessory proteins, such as Csn2 or Cas4. The PAM specificity of the Cas9 protein determines the selection of PAMs in prespacer substrates.

Despite the role of RecBCD in substrate generation, naïve CRISPR adaptation can occur in its absence, albeit with reduced bias toward foreign DNA (52). Thus, events other than DSBs might also stimulate naïve CRISPR adaptation, such as R-loops that occur during plasmid replication, lagging ends of incoming conjugative elements (54), and even CRISPR-Cas–mediated spacer integration events themselves (23, 52). Furthermore, we do not know whether all CRISPR-Cas systems have an intrinsic bias toward production of prespacers from foreign DNA. In high-throughput studies of native systems, the frequency of acquisition of spacers from host genomes is likely to be underestimated, because the autoimmunity resulting from self-targeting spacers means that these genotypes are typically lethal (23, 49, 50, 55). For example, in the S. thermophilus type II-A system, spacer acquisition appears biased toward MGEs, yet nuclease-deficient Cas9 fails to discriminate between host and foreign DNA (55). It is unknown whether CRISPR adaptation in type II systems is reliant on DNA break repair. Further studies in a range of host systems are required to clarify how diverse CRISPR-Cas systems balance the requirement for naïve production of prespacers from MGEs against the risk of acquiring spacers from host DNA.

crRNA-directed CRISPR adaptation (priming)

Mutations in the target PAM or protospacer sequences can abrogate immunity, allowing MGEs to escape CRISPR-Cas defenses (56–58). Furthermore, the protection conferred by individual spacers varies: Often, several MGE-specific spacers are required to mount an effective defense (24, 59) and to prevent proliferation of escape mutants (17, 25). Thus, to maintain effective immunity, CRISPR-Cas systems need to undergo CRISPR adaptation faster than MGEs can evade targeting. Indeed, type I systems have evolved a mechanism known as primed CRISPR adaptation (or priming) to facilitate rapid spacer acquisition (44, 60), even against highly divergent invaders (58) (Fig. 3). Priming uses MGE target recognition that is facilitated by preexisting spacers to trigger the acquisition of additional spacers from previously encountered elements. Thus, priming is advantageous when MGE replication within the host cell exceeds defense capabilities. This can occur when cells are infected by MGE escape mutants or when the levels of CRISPR-Cas activity are insufficient to provide complete immunity using only the existing spacers, even in the absence of MGE escape mutations (44, 58, 60–63).

Priming begins with target recognition by crRNA-effector complexes. Therefore, factors that influence target recognition (i.e., the formation and stability of the crRNA-DNA hybrid; Fig. 1), including PAM sensing and crRNA-target complementarity, affect the efficiency of primed CRISPR adaptation (58, 59, 64–69). Furthermore, these same factors can induce conformational rearrangements in the target-bound crRNA-effector complex that result in favoring of either the interference or priming pathways (65–67, 70). In type I-E systems, the Cas8e (Cse1) subunit of Cascade can adopt one of two conformational modes (67, 70), which may promote either direct or Cas1-Cas2–stimulated recruitment of the effector Cas3 nuclease (65, 66, 70).

Cas3, which is found in all type I systems, exhibits 3′ to 5′ helicase and endonuclease activity that nicks, unwinds, and degrades target DNA (71–73). In vitro activity of the type I-E Cas3 produces ssDNA fragments of ~30 to 100 nucleotides that are enriched for PAMs in their 3′ ends and that anneal to provide partially duplexed prespacer substrates (64). The spatial positioning of Cas1-Cas2 during primed substrate generation has not been clearly established, although Cas1-Cas2–facilitated recruitment of Cas3 would imply that the CRISPR adaptation machinery is localized close to the site of prespacer production (65, 70). In type I-F systems, Cas3 is fused to the C terminus of Cas2 (Cas2-3), so these systems form Cas1–Cas2-3 complexes (30) that couple the CRISPR adaptation machinery directly to the source of prespacer generation during priming (23, 74).
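PAM enrichment at fragment 3′ ends is an observed-over-expected ratio. The sketch below assumes equal base frequencies as the background and takes the PAM as a plain string parameter; the example fragments are hypothetical, and which string to use depends on the strand being analyzed, which the sketch does not resolve.

```python
def pam_enrichment(fragments: list[str], pam: str) -> float:
    """Fraction of fragments whose 3' end is `pam`, divided by the fraction
    expected if all four bases were equally likely (1 / 4**len(pam))."""
    if not fragments:
        raise ValueError("no fragments")
    observed = sum(f.endswith(pam) for f in fragments) / len(fragments)
    expected = 1 / 4 ** len(pam)
    return observed / expected


toy_fragments = ["AAAACTT", "GGGGCTT", "ACGTACG", "TTTTAAA"]  # hypothetical reads
```

With half of the toy fragments ending in a 3-nt motif whose chance frequency is 1/64, the enrichment is 32-fold; a real analysis would also need a background model matched to the composition of the degraded DNA.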

Despite different target recognition modes favoring distinct Cas3 recruitment routes, primed CRISPR adaptation can be provoked by both MGE escape mutants and non-escape (interference-proficient) targets (23, 44, 60, 75). However, when differences in the intracellular copy number of the MGE are excluded, interference-proficient targets promote greater spacer acquisition than escape mutants (23, 75). This forms a positive feedback loop, reinforcing immunity against recurrent threats even in the absence of escapees (23, 44). If the copy number of the MGE within the host cell is factored in, then escape mutants actually trigger more spacer acquisition. This is because interference rapidly clears targeted MGEs from the cell, whereas escape mutants that evade immediate clearance by existing CRISPR-Cas immunity persist for longer. Over time, the prolonged presence of the escape MGE, combined with the priming-centric CRISPR-Cas target recognition mode, results in higher net production of prespacer substrates and spacer integration (23, 63, 64, 75).
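The distinction between per-copy and net acquisition can be illustrated with a toy difference model in which acquisition in each round is proportional to the number of MGE copies present. All parameter values below are hypothetical and were chosen only to show how a lower per-copy rate can still give a higher total when the element is not cleared; they are not fitted to any experiment.

```python
def total_acquisition(copies: float, growth: float, clearance: float,
                      per_copy_rate: float, steps: int, max_copies: float) -> float:
    """Expected spacer-acquisition events summed over `steps` rounds for one MGE.

    Each round: acquisition proportional to the copies present, then replication
    (capped at max_copies) followed by CRISPR-Cas clearance of a fixed fraction.
    """
    total = 0.0
    for _ in range(steps):
        total += per_copy_rate * copies
        copies = min(copies * (1 + growth), max_copies) * (1 - clearance)
    return total


# Hypothetical parameters: interference-proficient targets prime more per copy
# but are cleared; escape mutants prime less per copy but persist and replicate.
interfering = total_acquisition(10, growth=0.2, clearance=0.5,
                                per_copy_rate=0.01, steps=20, max_copies=100)
escaping = total_acquisition(10, growth=0.2, clearance=0.0,
                             per_copy_rate=0.002, steps=20, max_copies=100)
```

With these values the interfering element contributes roughly 0.25 expected events before it is cleared, whereas the escape mutant accumulates several times more because its copy number rises to the cap and stays there.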

Because priming is initiated by site-specific target recognition (i.e., targeting a priming protospacer), Cas1-Cas2–compatible prespacers are subsequently produced from MGEs with locational biases (Fig. 4). Mapping the MGE sequence positions and strands targeted by newly acquired spacers (i.e., their corresponding protospacers) has revealed subtype-specific patterns and has provided much of our insight into the mechanisms of primed CRISPR adaptation (23, 43, 44, 60, 74, 76, 77). In type I-E systems, new protospacers typically map to the same strand as the priming protospacer (43, 44) (Fig. 4). For type I-B priming, Cas3 is predicted to load onto either strand at the priming protospacer, resulting in a bidirectional distribution of new protospacers (76). For type I-F priming, the first new protospacer typically maps to the strand opposite the priming protospacer, in a direction consistent with Cas3 loading and 3′ to 5′ helicase activity on the nontarget strand. Once the first spacer is acquired, two protospacer targets in the MGE are recognized, and prespacer production can be driven from both locations (23, 74) (Fig. 4). However, priming is stimulated more strongly from the interference-proficient protospacer than from the original priming protospacer. Thus, subsequent spacers (i.e., the second and those following) result from targeting by the first new spacer (23) (Fig. 4). The dominance of the first new spacer also holds true for type I-E (44, 75) and likely all other systems that display priming. However, these are generalized models, and many aspects remain unresolved, such as the mechanisms resulting in strand selection and why some spacer sequences are acquired from MGEs more frequently than others. Further analyses of priming in different systems, particularly with respect to the order of new spacers acquired, will greatly inform our understanding of primed Cas1-Cas2 substrate production.
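Protospacer mapping of the kind described above can be sketched as an exact-match search on both strands of an MGE. This is a simplified illustration, not the pipeline used in the cited studies: it assumes exact matches only (real analyses typically tolerate mismatches, which matters for escape mutants), ignores PAMs, and adopts the convention that a protospacer is assigned to the strand whose sequence is identical to the spacer. All function names and the toy sequence are hypothetical.

```python
# Sketch: exact-match protospacer mapping on both strands of an MGE (hypothetical helpers).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(haystack: str, needle: str) -> list[int]:
    """Return every 0-based start index of needle in haystack, overlaps included."""
    hits = []
    start = haystack.find(needle)
    while start != -1:
        hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits


def map_protospacer(spacer: str, mge: str) -> list[tuple[int, str]]:
    """Map a spacer to the MGE, reporting (forward-strand start, strand).

    Convention assumed here: strand "+" means the spacer sequence itself occurs on
    the forward strand; strand "-" means it occurs on the reverse strand (so its
    reverse complement is found in the forward sequence).
    """
    spacer, mge = spacer.upper(), mge.upper()
    plus = [(i, "+") for i in find_all(mge, spacer)]
    minus = [(i, "-") for i in find_all(mge, reverse_complement(spacer))]
    return sorted(plus + minus)


def relative_to_priming(hit: tuple[int, str], priming: tuple[int, str]) -> tuple[str, int]:
    """Classify a new protospacer as same/opposite strand and give its offset (nt)."""
    (pos, strand), (p_pos, p_strand) = hit, priming
    return ("same" if strand == p_strand else "opposite", pos - p_pos)


mge = "TTACCGTTGGGCATCAAA"                  # toy MGE sequence (made up)
priming = map_protospacer("ACCGTT", mge)[0]  # -> (2, "+")
new = map_protospacer("TGATGC", mge)[0]      # -> (10, "-")
print(priming, new, relative_to_priming(new, priming))
```

Tallying `relative_to_priming` over many newly acquired spacers is one simple way to summarize same-strand versus opposite-strand biases of the kind reported for type I-E and I-F priming.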

Fig. 4 Primed CRISPR adaptation from a multicopy MGE by type I-E and I-F CRISPR-Cas systems.

(A) An existing spacer (brown) with homology to an MGE sequence that has escaped interference (the priming protospacer, denoted with an asterisk) directs target recognition. The PAM is shown in black. The crRNA-effector complex recruits Cas3 (or Cas1–Cas2-3 for type I-F), and the 3′ to 5′ helicase activity (illustrated by the red arrow) results in the acquisition of a new spacer from a site distal to the initial priming location. The new spacer maps to an interference-proficient protospacer (orange). Spacer acquisition in the type I-E system requires the Cas1-Cas2 complex, and spacer acquisition in the type I-F system uses a Cas1–Cas2-3 complex. (B) The new spacer (orange) perfectly matches the MGE sequence at the orange protospacer location and facilitates targeting of the MGE and recruitment of Cas3. Hence, subsequent spacers (mapping to blue protospacers) typically originate from Cas3 activity (red arrows) beginning at this location.

Cas protein–assisted production of spacers

Given the apparent advantages conferred by priming in type I systems, analogous mechanisms that stimulate priming-like CRISPR adaptation are likely to exist in other CRISPR-Cas types. For example, DNA breaks induced by interference activity of class 2 CRISPR-Cas effector complexes could trigger host DNA repair mechanisms (e.g., RecBCD), thereby providing substrates for Cas1-Cas2. In agreement with a model for DNA break–stimulated enhancement of CRISPR adaptation, restriction enzyme activity can stimulate RecBCD-facilitated production of prespacer substrates (52). RecBCD activity may also partially account for the enhanced CRISPR adaptation observed during phage infection of a host possessing an innate restriction-modification defense system (78), although whether the enhancement was RecBCD-dependent in this example is unknown. In a CRISPR-Cas–induced DNA break model, the production of prespacer substrates is preceded by sequence-specific target recognition, which could be considered to be related to priming (79). Although direct evidence to support this concept is lacking, CRISPR adaptation in type II-A systems requires Cas1-Cas2, Cas9, a transactivating crRNA (tracrRNA; a cofactor for crRNA processing and interference in type II systems), and Csn2 (55, 79). The PAM-sensing domain of Cas9 enhances the acquisition of spacers with interference-proficient PAMs (79). However, Cas9 nuclease activity is dispensable (55), and existing spacers are not strictly necessary (79), suggesting that the PAM interactions of Cas9 could be sufficient to select appropriate new spacers. Some Cas9 variants can also function with non-CRISPR RNAs and tracrRNA (80). This raises the possibility that host or MGE-derived RNAs might direct promiscuous Cas9 activity, resulting in DNA breaks or replication fork stalling that could potentially result in prespacer generation.
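The idea that Cas9 PAM recognition could be sufficient to select new spacers can be illustrated by enumerating PAM-adjacent candidate prespacers. This sketch assumes the 5′-NGG-3′ PAM of Streptococcus pyogenes Cas9 located immediately 3′ of the protospacer and a protospacer length of 30 nt; neither value is stated in this section, and other Cas9 orthologs recognize different PAMs. Only the supplied strand is scanned (pass the reverse complement to scan the other strand), and the function name is hypothetical.

```python
# Sketch: enumerate candidate prespacers that sit immediately 5' of an NGG PAM.
# Assumptions (not from the text): S. pyogenes Cas9 PAM 5'-NGG-3', 30-nt protospacers.


def ngg_prespacer_candidates(seq: str, length: int = 30) -> list[tuple[int, str]]:
    """Return (start, protospacer) pairs whose 3' flank on this strand is N-G-G."""
    seq = seq.upper()
    candidates = []
    # i is the index of the PAM's "N"; the protospacer occupies seq[i - length : i].
    for i in range(length, len(seq) - 2):
        if seq[i + 1 : i + 3] == "GG":
            candidates.append((i - length, seq[i - length : i]))
    return candidates


toy = "A" * 30 + "TGG" + "C" * 5          # made-up sequence with one NGG PAM
print(ngg_prespacer_candidates(toy))        # one candidate starting at position 0
```

A PAM-only filter of this kind says nothing about which candidates are actually captured in vivo; it only illustrates how PAM sensing alone could restrict the pool of interference-proficient spacers.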

Roles of accessory Cas proteins in CRISPR adaptation

Although Cas1 and Cas2 play a central role in CRISPR adaptation, type-specific variations in cas gene clusters occur. In many systems, Cas1-Cas2 is assisted by accessory Cas proteins, which are often mutually exclusive and type-specific (27). For example, in the S. thermophilus type II-A system, deletion of csn2 impaired the acquisition of spacers from invading phages (4). Direct interaction between Cas1 and Csn2 also suggests a role for Csn2 in conjunction with the spacer acquisition machinery (81). Csn2 multimers cooperatively bind to the free ends of linear dsDNA and can translocate by rotation-coupled movement (82, 83). Given that substrate-loaded type II-A Cas1-Cas2 is capable of full-site spacer integration in vitro (39), Csn2 may be required for prespacer substrate production, selection, or processing. Potentially, Csn2 binding to the free ends of dsDNA provides a cue for nucleases to assist in prespacer generation (82).

Cas4, another ring-forming accessory protein, is found in type I, II-B, and V systems (27). Confirming its role in CRISPR adaptation, Cas4 is necessary for type I-B priming in H. hispanica (76) and interacts with a Cas1-Cas2 fusion protein in the Thermoproteus tenax type I-A system (84). Fusions between Cas4 and Cas1 are found in several systems, which indicates a functional association with the spacer acquisition machinery. Cas4 contains a RecB-like domain and four conserved cysteine residues, which are presumably involved in the coordination of an iron-sulfur cluster (85). However, Cas4 proteins appear to be functionally diverse, with some possessing uni- or bidirectional exonuclease activity, whereas others exhibit ssDNA endonuclease activity and unwinding activity on dsDNA (85, 86). Because of its nuclease activity, Cas4 is hypothesized to be involved in prespacer generation.

In type III systems, spacers complementary to RNA transcribed from MGEs are required for immunity (Fig. 1) (87, 88). Some bacterial type III systems contain fusions of Cas1 with reverse transcriptase domains (RTs) that provide a mechanism to integrate spacers from RNA substrates (89). The RT-Cas1 fusion from M. mediterranea can integrate RNA precursors into an array, which are subsequently reverse-transcribed to generate DNA spacers (89). However, integration of DNA-derived spacers also occurs, indicating that the RNA derived–spacer route is not exclusive (89). Hence, the combined integrase and reverse transcriptase activity of RT-Cas1–Cas2 enhances CRISPR adaptation against highly transcribed DNA MGEs and potentially against RNA-based invaders.

Other host proteins may also be necessary for prespacer substrate production. For example, RecG is required for efficient primed CRISPR adaptation in type I-E and I-F systems, but its precise role remains speculative (38, 90). Additionally, it is still enigmatic why some CRISPR-Cas systems require accessory proteins, whereas closely related types do not. For example, type II-C systems lack csn2 and cas4, which assist CRISPR adaptation in type II-A and II-B systems, respectively. These type-specific differences exemplify the diversity that has arisen during the evolution of CRISPR-Cas systems.

The genesis of adaptive immunity in prokaryotes

Expanding knowledge of the molecular mechanisms underlying CRISPR adaptation provides clues to the evolutionary origin of CRISPR-Cas systems (91). Casposons are transposon-like elements typified by the presence of Cas1 homologs, or casposases, which catalyze site-specific DNA integration and result in the duplication of repeat sites, analogously to spacer acquisition (92, 93). It is possible that ancestral innate defenses gained DNA integration functionality from casposases, thus seeding the genesis of prokaryotic adaptive immunity (94). The innate ancestor remains unidentified but is likely to be a nuclease-based system. Co-occurrence of casposon-derived terminal inverted repeats (IRs) and casposases in the absence of full casposons might represent an intermediate of the signature CRISPR repeat-spacer-repeat structures (95). However, the evolutionary journey from the innate immunity–casposase hybrid to full adaptive immunity is unclear. Evolution of diverse CRISPR-Cas types would have required stringent coevolution of the Cas1-Cas2 spacer acquisition machinery, PAM and leader-repeat sequences, crRNA processing mechanisms, and effector complexes.

In some systems, mechanisms to enhance the production of Cas1-Cas2–compatible prespacers from MGEs, such as priming, might have arisen because naïve CRISPR adaptation is an inefficient process with a high probability of acquiring spacers from host DNA. However, it was recently shown that promiscuous binding of crRNA-effector complexes to the host genome results in a basal level of lethal “self-priming” in a type I-F system (23). Host CRISPR and cas gene regulation mechanisms might have arisen to balance the likelihood of self-acquisition events against the requirement to adapt to new threats—for example, when the risk of phage infection or horizontal gene transfer is high (96, 97). Alternatively, it has been proposed that selective acquisition of self-targeting spacers could provide benefits, such as invoking altruistic cell death (98), facilitating rapid genome evolution (50), regulating host processes (99, 100), or even preventing the uptake of other CRISPR-Cas systems (101).

Outlook

Recent years have seen rapid progress in understanding the mechanisms of CRISPR adaptation. Despite this progress, many facets of CRISPR adaptation need more work. Synergy between innate defense systems and CRISPR adaptation is relatively unexplored, but the two may be linked in at least two ways. First, DNA breaks (52) could stimulate the generation of substrates for spacer acquisition (Fig. 3), and second, the stalling of infection could buy time for CRISPR adaptation (78, 102, 103). Similarly, it remains to be determined whether interference by CRISPR-Cas systems other than type I can also stimulate primed CRISPR adaptation. If not, the benefits of priming might explain why type I systems are the most prevalent and diverse CRISPR-Cas type.

It is also unclear why many CRISPR-Cas systems have more than one CRISPR array that is used by a single set of Cas proteins. Given that Cas1-Cas2 is directed to leader-repeat junctions during integration, multiple arrays might provide additional integration sites, increasing CRISPR adaptation efficiency. In addition, parallel CRISPR arrays should increase crRNA production from spacers that were acquired recently (because of the polarized insertion of new spacers next to the promoter-containing CRISPR leaders) (15). Whereas some strains have multiple CRISPR arrays belonging to the same type, other hosts have several different types of CRISPR-Cas systems simultaneously. The benefits of harboring multiple CRISPR-Cas systems are not entirely clear, but it can result in CRISPRs being shared by different systems to extend targeting to both RNA and DNA (104). From a CRISPR adaptation perspective, multiple systems might also enable a wider PAM repertoire to be sampled during spacer selection. Additional systems in a single host could also be a result of phage- and MGE-encoded anti-CRISPR proteins, which can inhibit both interference and primed CRISPR adaptation (105–107). Alternatively, additional systems may allow some systems to function in defense, whereas others perform noncanonical roles (100).
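Polarized, leader-proximal integration, and the repeat duplication that accompanies each new spacer, can be modeled with a simple list-based data structure. This is a sequence-level cartoon under stated assumptions: the class name `CrisprArray` and its methods are hypothetical, the leader and repeat are placeholder strings, and the cleavage and ligation chemistry at the leader-repeat junction is not modeled. It shows why the newest spacers sit closest to the promoter-containing leader and why reading the array from the leader-distal end recovers the order of acquisition.

```python
from dataclasses import dataclass, field


@dataclass
class CrisprArray:
    """Hypothetical model: leader, then repeat, then (spacer + repeat) units."""
    leader: str
    repeat: str
    spacers: list[str] = field(default_factory=list)  # index 0 = leader-proximal (newest)

    def integrate(self, spacer: str) -> None:
        """Insert a new spacer at the leader-repeat junction.

        Each added spacer brings one extra repeat copy in sequence(), mirroring the
        duplication of the leader-proximal repeat during spacer acquisition.
        """
        self.spacers.insert(0, spacer)

    def sequence(self) -> str:
        """Render leader + repeat + (spacer + repeat) for every spacer."""
        return self.leader + self.repeat + "".join(s + self.repeat for s in self.spacers)

    def chronological(self) -> list[str]:
        """Oldest spacer first (read from the leader-distal end)."""
        return self.spacers[::-1]


array = CrisprArray(leader="LEADER", repeat="RRRR")
array.integrate("spacerA")
array.integrate("spacerB")
print(array.sequence())       # LEADERRRRRspacerBRRRRspacerARRRR
print(array.chronological())  # ['spacerA', 'spacerB']
```

Because each array keeps its own leader-proximal insertion point, a host with several arrays would, in this cartoon, simply hold several such objects, each with its most recent spacers nearest the leader.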

Although Cas effector nucleases, such as Cas9, have been harnessed for many biotechnological applications, the use of repurposed CRISPR-Cas adaptation machinery has yet to be widely exploited. The sequence-specific integrase activity of Cas1-Cas2 holds promise in synthetic biology, such as for the insertion of specific sequences (or barcodes) to mark and track cells in a population. In E. coli, the feasibility of such an approach has been demonstrated (42), but translation to eukaryotic systems, in which lineage and cell fate could be tracked, would provide the greatest utility. A similar approach has been demonstrated by exploiting Cas9 nuclease activity (108). The elements required for leader-specific integration must be carefully considered for the introduction of CRISPR-Cas spacer acquisition machinery into eukaryotic cells, because unintended ectopic integrations could be problematic, given the larger eukaryotic sequence space. Ultimately, our understanding of CRISPR adaptation in prokaryotes may lead to applications in which entire CRISPR systems are transplanted into eukaryotic cells to prevent viral invaders. As we begin to comprehend CRISPR adaptation in more detail, the opportunities to repurpose other parts of these remarkable prokaryotic immune systems are increasingly becoming a reality.

References and Notes

Acknowledgments: Work in the Fineran laboratory on CRISPR-Cas is supported by a Rutherford Discovery Fellowship (to P.C.F.) from the Royal Society of New Zealand (RSNZ), the Marsden Fund of the RSNZ, the University of Otago, and the Bio-Protection Research Centre (Tertiary Education Commission). The Brouns laboratory is funded by a European Research Council starting grant (639707), a Netherlands Organisation for Scientific Research (NWO) Vidi grant (864.11.005), and a Foundation for Fundamental Research on Matter (FOM) projectruimte grant (15PR3188). We thank members of the Fineran and Brouns groups for useful feedback on the manuscript.