Research Article

Systematic discovery of antiphage defense systems in the microbial pangenome

See allHide authors and affiliations

Science  02 Mar 2018:
Vol. 359, Issue 6379, eaar4120
DOI: 10.1126/science.aar4120

Maps of defense arsenals in microbial genomes

To survive the attack of foreign invaders such as viruses and plasmids, bacteria and archaea fight back with immune systems that are usually clustered in “defense islands” in their genomes. Doron et al. took advantage of this property to map microbial defense systems systematically (see the Perspective by Kim). Candidate immune systems were then experimentally validated for their activities. Like well-known defense arsenals such as restriction-modification and CRISPR systems, these additional immune systems now require mechanistic investigation and could potentially be engineered into useful molecular tools in the future.

Science, this issue p. eaar4120; see also p. 993

Structured Abstract


Bacteria and archaea are frequently attacked by viruses (phages) and as a result have developed multiple, sophisticated lines of active defense that can collectively be referred to as the prokaryotic “immune system.” Although bacterial defense against phages has been studied for decades, it was suggested that many currently unknown defense systems reside in the genomes of nonmodel bacteria and archaea and await discovery.


Antiphage defense systems are known to be frequently physically clustered in microbial genomes such that, for example, genes encoding restriction enzymes commonly reside in the vicinity of genes encoding other phage resistance systems. The observation that defense systems are clustered in genomic “defense islands” has led to the hypothesis that genes of unknown function residing within such defense islands may also participate in antiphage defense. In this study, we aimed to comprehensively identify and experimentally verify new defense systems based on their enrichment within defense islands in an attempt to systematically map the arsenal of defense tools that are at the disposal of microbes in their fight against phages.


We searched for gene cassettes of unknown function that are enriched near known defense systems in more than 45,000 available bacterial and archaeal genome sequences. Such gene cassettes were defined as candidate defense systems and were systematically engineered into model bacteria, which were then infected by an array of phages to test for antiphage activities. This yielded the discovery of nine new families of antiphage defense systems and one additional family of antiplasmid systems that are widespread in microbes and shown to strongly protect against foreign DNA invasion. The systems discovered include ones that seem to have adopted components of the bacterial flagella and chromosome maintenance complexes and use these components for defensive capacities. Our data also show that genes with Toll-interleukin receptor (TIR) domains are involved in bacterial defense against phages, providing evidence for a common, ancient ancestry of innate immunity components shared between animals, plants, and bacteria.


Our study expands the known arsenal of defense systems used by prokaryotes for protection against phages, exposing tens of thousands of instances of defense systems that were so far unknown. Some of these systems appear to employ completely new mechanisms of defense against phages. In the past, the discovery and mechanistic understanding of antiphage defense systems led to the development of important biotechnological tools, as exemplified by the use of restriction enzymes and CRISPR-Cas for biotechnological and biomedical applications. One may envision that some of the systems discovered in the current study, once their mechanism is deciphered, will also be adapted into useful molecular tools in the future.

A pipeline for systematic discovery of defense systems.

Microbial genomes (more than 45,000 in the current study) are mined for genetic systems that are physically enriched next to known defense systems such as restriction-modification and CRISPR-Cas. Candidate predicted systems are cloned into model bacteria, and these bacteria are then infected by an array of phages from various families to determine whether they provide defense.


The arms race between bacteria and phages led to the development of sophisticated antiphage defense systems, including CRISPR-Cas and restriction-modification systems. Evidence suggests that known and unknown defense systems are located in “defense islands” in microbial genomes. Here, we comprehensively characterized the bacterial defensive arsenal by examining gene families that are clustered next to known defense genes in prokaryotic genomes. Candidate defense systems were systematically engineered and validated in model bacteria for their antiphage activities. We report nine previously unknown antiphage systems and one antiplasmid system that are widespread in microbes and strongly protect against foreign invaders. These include systems that adopted components of the bacterial flagella and condensin complexes. Our data also suggest a common, ancient ancestry of innate immunity components shared between animals, plants, and bacteria.

Bacteria and archaea are frequently attacked by viruses (phages) and as a result have developed multiple, sophisticated lines of active defense (13) that can collectively be referred to as the prokaryotic “immune system.” Antiphage defense strategies include restriction-modification (R-M) systems that target specific sequences on the invading phage (4); CRISPR-Cas, which provides acquired immunity through memorization of past phage attacks (5); abortive infection systems (Abi) that lead to cell death or metabolic arrest upon infection (6); and additional systems whose mechanism of action is not yet clear, such as BREX (7), prokaryotic Argonautes (pAgos) (8), and DISARM (9). Different bacteria encode different sets of defense systems: CRISPR-Cas systems are found in about 40% of all sequenced bacteria (10, 11), R-M systems are found in about 75% of prokaryote genomes (12), and pAgos and BREX appear in about 10% (7, 13). It has been suggested that many currently unknown defense systems reside in genomes and plasmids of nonmodel bacteria and archaea and await discovery (2, 14).

Antiphage defense systems were found to be frequently physically clustered in bacterial and archaeal genomes such that, for example, genes encoding restriction enzymes commonly reside in the vicinity of genes encoding abortive infection systems and other phage-resistance systems (14, 15). The observation that defense systems are clustered in genomic “defense islands” has led to the suggestion that genes of unknown function residing within such defense islands may also participate in antiphage defense (15, 16). Indeed, recent studies that focused on individual genes enriched next to known defense genes resulted in the discovery of new systems that protect bacteria against phages (7, 9, 17).

Identification of putative defense gene families

We have set out to comprehensively identify new defense systems enriched within defense islands, in an attempt to systematically map the arsenal of defense systems that are at the disposal of bacteria and archaea in their fight against phages. As a first step in this discovery effort, we sought to identify gene families that are enriched near known defense systems in the microbial pangenome. For this, we analyzed 14,083 protein families (pfams) in >45,000 available bacterial and archaeal genomes (overall encoding >120 million genes). Each pfam represents a set of genes sharing a common protein domain (18). We calculated, for each pfam, the tendency of its member genes to reside in the vicinity of one or more known defense genes (Fig. 1, A and B) (see Methods). We further selected pfams that at least 65% of their member genes were found next to defense genes and that their member genes appeared in diverse defense contexts within different genomes (at least 10% variability) (Fig. 1C). These thresholds were selected because they capture the majority of pfams that comprise known defense systems—e.g., restriction enzymes and Abi genes (Fig. 1, B and C, and table S1) (see Methods). The resulting set of 277 candidate pfams was supplemented with 35 non-pfam gene families that were previously predicted to be associated with known defense systems (15), as well as 23 pfams that were predicted in the same study as putatively defensive but did not pass our thresholds, altogether yielding a list of 335 candidate gene families (table S2).

Fig. 1 Discovery of new antiphage defense systems in defense islands.

(A) Illustration of the computational analysis employed for each pfam found to be enriched in defense islands. Pfams that are enriched in the vicinity of known defense genes are identified, and their neighboring genes are clustered based on sequence homology to identify conserved cassettes that represent putative defense systems. (B) Tendency of protein families to occur next to defense genes. The genomic neighborhood for each member gene in each pfam is examined, and the fraction of member genes occurring in the vicinity (10 genes on each side) of one or more known defense genes is recorded. Pink, a set of 123 pfams known to participate in antiphage defense (“positive set”); blue, the remaining 13,960 pfams analyzed in this study. (C) Neighborhood variability score for the analyzed pfams. Score represents the fraction of pfam members occurring in different defense neighborhoods out of total occurrences of pfam members (see Methods). Pink, the 123 positive pfams; blue, a set of 576 pfams that passed the 65% threshold for fraction of members occurring with defense genes in proximity.

From defense genes to defense systems

Antiphage defense systems are usually composed of multiple genes that work in concert to achieve defense—for example, cas1, cas2, cas3, and the cascade genes in type I CRISPR-Cas systems (19), and the R, M and S genes in type I restriction-modification systems (3). Genes functioning within the same defense system are frequently encoded on the same operon, and the gene order within the operon is highly conserved among distantly related organisms sharing the same system (3, 7, 9, 16, 19, 20). To investigate whether the defense-associated pfams belong to multigene systems, we used each such pfam as an anchor around which we searched for commonly associated genes (Fig. 1A). For this, we collected all the neighboring genes (10 genes from each side) from all the genomes in which members of the anchor pfam occurred and clustered these genes based on sequence homology (see Methods). We then searched for cassettes of gene clusters that, together with the anchor gene, show conserved order across multiple different genomes, marking such cassettes as candidate multigene systems (see Methods) (Fig. 1A).

The gene annotations in the resulting candidate systems were manually inspected to filter out likely false predictions. We found that 39% of the cases (129 of 335) represented nondefense, mobile genetic elements, such as transposons and integrases, that are known to colocalize with defense islands (15) (table S2). An additional 30% (102 of 335) represented known defense systems whose pfams were not included in our original set of known defense pfams, and 17% belonged to operons probably performing metabolic or other functions not associated with defense (fig. S1A). The remaining systems possibly represent putative new defense systems. To expand our predictions with new pfams that may be specifically enriched next to the putative new defense systems, a second prediction cycle was performed, this time adding the members of the predicted new systems to the positive defense pfam set (Fig. 1A and fig. S1B) (see Methods). Altogether, 41 candidate single-gene or multigene systems were retrieved from the two prediction cycles of this analysis (table S3). We further filtered from this set systems that were largely confined to a specific taxonomic clade (e.g., systems appearing only in cyanobacteria), resulting in a set of 28 candidate systems that showed broad phylogenetic distribution.

Experimental verification strategy

We selected two bacteria, Escherichia coli str. MG1655 and Bacillus subtilis str. BEST7003, as model organisms to experimentally examine whether the predicted systems confer defense against phages (Fig. 2A). None of the candidate new systems are naturally present in the genomes of these two bacterial strains. For each candidate system we selected source organisms from which the system was taken and heterologously cloned into one of the model organisms. To increase the probability that the cloned system would be compatible and functionally expressed within the receiving bacterium, we selected systems from mesophilic organisms as close phylogenetically as possible to E. coli or to B. subtilis and included the upstream and downstream intergenic regions so that promoters, terminators, or other regulatory sequences would be preserved. Where possible, we took at least two instances of each system (from two different source genomes), to account for the possibility that some systems may not be active in their source organism (21, 22). The DNA of each system, spanning the predicted genes and the intergenic spaces, was synthesized or amplified from the source genome and cloned into the phylogenetically closest model organism—either to E. coli (on a plasmid) or to B. subtilis (genomically integrated). As a control, we repeated the procedure with five known defense systems (instances of types I, II, and III R-M systems, a type III toxin/antitoxin system, and an abortive infection gene of the AbiH family) for which source organisms were similarly selected and cloning was performed into B. subtilis, as well as a sixth control composed of the recently discovered DISARM defense system (9) (table S4).

Fig. 2 Experimentally verified defense systems.

(A) Flowchart of the experimental verification strategy. (B) Active defense systems cloned into B. subtilis. (C) Active defense systems cloned into E. coli. For (B) and (C), fold protection was measured using serial dilution plaque assays, comparing the system-containing strain to a control strain that lacks the system and has an empty vector instead. Data represent average of three replicates; see figs. S2 and S3. Numbers below phage names represent phage genome size. On the right, gene organization of the defense systems, with identified domains indicated (DUF, domain of unknown function). Gene sizes are drawn to scale; scale bar, 400 amino acids.

Altogether, we attempted to heterologously clone 61 representative instances of the 28 candidate new systems, and successful cloning was verified by whole-genome sequencing (table S4). For 27 of these 28 systems, there was at least one candidate locus for which cloning was successful, and RNA sequencing (RNA-seq) of the transformants showed that, for 26 of the systems, at least one of the candidate loci was expressed in the receiving E. coli or B. subtilis strain.

The engineered bacteria were then challenged by an array of phages consisting of 10 B. subtilis and six E. coli phages, spanning the three major families of tailed double-stranded DNA (dsDNA) phages (myo-, sipho-, and podophages), as well as one single-stranded DNA (ssDNA) phage infecting E. coli (Fig. 2, B and C). Measuring phage efficiency of plating (EOP) on system-containing bacteria versus control cells, we found that 9 of the 26 tested systems (35%) showed protection from infection by at least one phage (Fig. 2, B and C and figs. S2 and S3). In comparison, three of the six positive control systems showed defense, with the remaining three showing no protection against the 10 B. subtilis phages tested (see Discussion).

We named the nine verified new systems after protective deities from various world mythologies. These defense systems comprise between 1 and 5 genes and span between 2 and 12 kb of genomic DNA (Table 1 and table S5). Where possible, we verified system consistency by testing for phage resistance in systems where individual genes were deleted (Figs. 3 to 5 and figs. S4 and S5). We found between several hundred and several thousand representations of each of the defense systems in sequenced microbial genomes, usually with broad phylogenetic distribution (fig. S6 and tables S6 to S15). Most systems were detected in >10 taxonomic phyla, and 7 of them appear in archaea (fig. S6). Some of the systems seem to target a specific family of phages (e.g., the Thoeris system appears to specifically protect from myophages), whereas others, such as the Hachiman system, provide broader defense (Fig. 2B). The genes comprising the new systems encode many protein domains that are commonly present in antiviral systems such as CRISPR-Cas and RNA interference (RNAi), including helicases, nucleases, and nucleic acid binding domains, in addition to many domains of unknown function and also atypical domains as described below. Three of the systems contain membrane-associated proteins, as predicted by the presence of multiple transmembrane helices. Below, we focus on further functional analyses for a selected set of systems.

Table 1. Composition of defense systems reported in this study.

View this table:
Fig. 3 The Zorya system.

(A) Representative instances of the type I Zorya system and their defense island context. Genes known to be involved in defense are orange. Mobilome genes are in dark gray. RM, restriction-modification; TA, toxin-antitoxin; Abi, abortive infection; Wadjet and Druantia are systems identified as defensive in this study (see below). (B) Representative instances of the type II Zorya system. (C) Domain organization of the two types of Zorya. (D) Model of the flagellum base. The position of the MotAB complex is indicated. (E) EOP of phage SECphi27 infecting wild-type (WT) type I Zorya, deletion strains, and strains with point mutations. Data represent plaque-forming units per ml; average of three replicates. Error bars, mean ± SD. ZorA:T147A/S184A and ZorB:D26N are predicted to abolish proton flux; ZorC:E400A/H443A are mutations in two conserved residues in pfam15611 (EH domain) whose function is unknown (23); ZorD:D730A/E731A are mutations in the Walker B motif, predicted to abolish ATP hydrolysis.

Fig. 4 The Thoeris system.

(A) Representative instances of the Thoeris system and their defense island context. Thoeris genes thsA (containing NAD+ binding domain) and thsB (TIR domain) are marked dark and light green, respectively. Genes known to be involved in defense are orange. Mobilome genes are in dark gray. RM, restriction-modification; TA, toxin-antitoxin; Abi, abortive infection. (B) The two Thoeris systems shown in this study to protect against myophages. Locus tag accessions are indicated for the individual genes. (C) EOP of phage SBSphiJ infection with WT and mutated versions of the B. amyloliquefaciens Y2 Thoeris (top set) or B. cereus MSX-D12 Thoeris (bottom set) cloned into B. subtilis BEST7003. Average of three replicates; error bars, mean ± SD.

Fig. 5 The Wadjet system provides protection against plasmid transformation in B. subtilis.

(A) Representative instances of the Wadjet system and their defense island context. Genes known to be involved in defense are orange. RM, restriction-modification; TA, toxin-antitoxin; Abi, abortive infection. (B) Domain organization of the three types of Wadjet. Pfam and COG domains were assigned according to the information in the IMG database (48). (C) Wadjet reduces plasmid transformation efficiency in B. subtilis. Instances of Wadjet systems were taken from B. cereus Q1 (type I), B. vireti LMG 21834 (type II), and B. thuringiensis serovar finitimus YBT-020 (type III) (table S4) and cloned into B. subtilis BEST7003. Gene deletions and point mutations are of the B. cereus Q1 type I Wadjet. Transformation efficiency of plasmid pHCMC05 into Wadjet-containing strains is presented as a percentage of the transformation efficiency to B. subtilis BEST7003 carrying an empty vector instead of the Wadjet system. Average of three replicates; error bars, mean ± SD.

The Zorya defense system

The Zorya system (named after a deity from Slavic mythology) was identified based on the enrichment of the anchor pfam15611, representing a domain of unknown function, within defense islands. Pfam15611-containing gene clusters were previously reported as genomically associated with tellurium- and stress-resistance genes (23). The reconstructed system is composed of the four genes zorABCD, overall encompassing ~9 kb of DNA, with pfam15611 being the third gene in the system (zorC) (Fig. 3C and Table 1). A representative Zorya operon from E. coli E24377A was cloned into E. coli MG1655 and provided 10- to 10,000-fold protection against infection by T7, SECphi27, and lambda-vir phages (Figs. 2C and 3, A and C, and fig. S3). Further searches based on homologies to the first two genes of the system, zorA and zorB, revealed a second type of Zorya, comprised of the three genes zorABE. A type II Zorya was cloned from E. coli ATCC8739 into E. coli MG1655 and provided defense against T7 and the ssDNA phage SECphi17 (Figs. 2C and 3, B and C, and fig. S3).

The first two genes of the Zorya system, zorA and zorB, contain protein domains sharing distant, but clear, homology with domains in motA and motB, respectively (Fig. 3C). MotA and MotB are inner membrane proteins that are part of the flagellar motor of bacteria. They assemble into a MotAB complex, which forms the stator of the flagellar motor (the static part within which the flagellar rotor swivels) (24). The MotAB complex also forms the proton channel that provides the energy for flagellar rotation, coupling transport of protons into the cell with the rotation (Fig. 3D) (25, 26). Whereas zorB shares the same size and domain organization with motB (including the pfam13677 and pfam00691 domains), zorA contains, in addition to the MotA domain (pfam01618), a long C-terminal helical domain that is sometimes identified as a methyl-accepting chemotaxis domain (COG0840). In addition to these two genes, type I Zorya contains zorC, a gene of unknown function, and zorD, which encodes a large protein (1200 amino acids) with a helicase domain that in some cases also encodes a C-terminal Mrr-like nuclease domain. Type II Zorya lacks zorC and zorD and instead contains zorE, a smaller gene encoding an HNH-endonuclease domain.

The gene composition of the Zorya system may point to several hypotheses as to its mechanism of action. It is possible that the system has adopted the MotAB proton channel to achieve depolarization of membrane potential upon phage infection. ZorC, ZorD, and ZorE may possibly be involved in the sensing and inactivation of phage DNA, and if phage inactivation fails, the ZorAB proton channel opens up, leading to membrane depolarization and cell death. Under this hypothesis, Zorya may be a conditional abortive infection system. Indeed, although Zorya-containing cells that were infected by phage T7 did not yield phage progeny in >80% of infection events, infection of Zorya-containing cells in liquid cultures has led to an eventual culture collapse, suggesting that Zorya-mediated defense involves death or metabolic arrest of the infected cells (fig. S7).

We further experimented with mutated forms of type I Zorya. All four genes in the system appear to be essential for its functionality, because deletion of each of the genes resulted in loss of protection from phage infection (Fig. 3E). Moreover, the activity of the ZorAB putative proton channel is necessary for the system’s functionality, because point mutations in residues predicted to be critical for proton translocation through the channel (either ZorA:T147A/S184A or ZorB:D26N) yielded a nonfunctional system (Fig. 3E). Similarly, point mutations inactivating the Walker B motif of the ZorD helicase domain, predicted to prevent adenosine triphosphate (ATP) hydrolysis, resulted in loss of protection from phage infection.

We identified 1829 instances of the Zorya system within 1663 sequenced bacterial genomes, belonging to 12 phyla, marking this system as prevalent in at least 3% of sequenced bacteria (fig. S6 and table S12). We did not find the system in archaea. The system is enriched in Proteobacteria and is markedly underrepresented in Gram-positive bacteria (Firmicutes and Actinobacteria) (fig. S6), suggesting that its functionality may depend on the double-membrane organization of Gram-negative bacteria or on differences between flagellar organization of Gram-negative and Gram-positive bacteria. Because the Zorya system protects against phages that do not use flagella as their receptor [e.g., T7 (27)], Zorya protection is unlikely to stem from a receptor-masking effect.

The Thoeris defense system

Thoeris (Egyptian protective deity of childbirth and fertility) is a system that was detected based on the enrichment of pfam08937 [Toll-interleukin receptor (TIR) domain] next to known defense genes (Fig. 4A). This domain was previously reported as associated with prokaryotic argonaute genes (28). The first gene in the Thoeris system, denoted thsA, has an nicotinamide adenine dinucleotide (NAD)–binding domain that is sometimes annotated as sirtuin (SIR2)–like domain or Macro domain. The second gene, thsB, contains the TIR domain and can appear in one or more copies (Fig. 4A). In some Thoeris versions, ThsA has a multitransmembrane N-terminal domain. Two instances of this system, one from Bacillus amyloliquefaciens Y2 (where ThsA is predicted to be membrane-associated) and the other from Bacillus cereus MSX-D12 (ThsA predicted as cytoplasmic), were engineered into B. subtilis BEST7003 and were found to confer defense against myophages (Fig. 2B, Fig. 4, B and C, and fig. S2). Because the three myophages that we tested are very different from each other and share few homologous genes, it is possible that the Thoeris system senses or targets a general feature in the biology of myophages rather than a specific sequence or genome modification. Both Thoeris genes, thsA and thsB, are essential in the system because deletion of either of them rendered the system inactive (Fig. 4C).

Interestingly, the TIR domain is an important component of the innate immune systems of mammals, plants, and invertebrates, where it mainly serves as a connector domain that transfers the immune signal once a molecular pattern of an offensive pathogen is sensed (29). In animals, this domain frequently forms the intracellular portion of membrane-bound Toll-like receptors, whereas in plants it is often connected to intracellular R genes (30) and can also be involved in direct recognition of pathogens (31, 32). Our finding marks a common involvement of TIR domains in innate immunity across the three domains of life and implies that the ancestry of this important component of eukaryotic innate immune systems may have stemmed from prokaryotic defense against phages.

The Thoeris system is broadly distributed in bacteria and archaea and can be detected in at least 4% of the sequenced genomes that we analyzed (2070 genomes) (table S6 and fig. S6). The TIR domain gene, thsB, has a strong tendency (52% of cases) to appear in multiple, diverse copies clustered around the thsA gene (Fig. 4A and table S6). Presence in multiple copies is typical to specificity-conferring genes in defense systems (such as the S subunit in type I R-M systems), where duplication followed by diversification serves for multiple specificities of the system (3335). It is therefore possible that the TIR domain gene is responsible for identification of specific phage patterns, with multiple TIR domain genes serving for recognition of different phage components. Under this hypothesis, it is tempting to suggest that Thoeris is the prokaryotic ancestral form of pathogen-associated molecular pattern (PAMP) receptors.

A recent study showed that TIR domains can have enzymatic NAD+ (oxidized form of NAD) hydrolase activities (36), which is in line with predictions that these domains process nucleotide derivatives (37). In Caenorhabditis elegans, this activity was shown to be involved in antifungal and antibacterial defense (38), whereas in animal neurons, NAD+ hydrolysis by the SARM1 TIR domain–containing gene leads to NAD+ depletion and generation of linear and cyclic adenosine diphosphate ribose (ADP–ribose) signaling molecules that regulate axonal degeneration (39). An E99A point mutation in the B. amyloliquefaciens Y2 ThsB protein, which aligns with the catalytic residue in the SARM1 NAD-cleaving TIR domain (fig. S8), abolished the protective activity of Thoeris (Fig. 4C). Moreover, point mutations in the ThsA NAD+ binding site, predicted to abolish NAD+ binding, also resulted in system inactivation (ThsA:N112A and ThsA:D100A/N115A for the B. cereus and B. amyloliquefaciens systems, respectively). These results suggest that NAD+ binding and hydrolysis are essential for the antiphage activity of the Thoeris system.

The Druantia system

Another system worth discussing briefly is the Druantia system (named after a deity from the Gallic mythology). This system is characterized by a gene encoding a very large protein (~1800 to 2100 amino acids) containing a domain of unknown function (DUF1998) as well as a helicase signature and a Walker A/B motif suggestive of ATP utilization. This large gene is typically preceded by a set of highly variable genes with no recognizable domains or function prediction—either three genes sized 350 to 600 amino acids (type I), or two genes sized 700 to 900 amino acids (type II), or a single large gene of 1000 to 1200 amino acids (type III) (fig. S5, A and B). In some cases, type I systems are preceded by a gene annotated as DUF4338, encoding yet another domain of unknown function; and type II systems are also associated with a cytosine methylase (fig. S5, A and B). A type I system cloned from E. coli UMEA 4076-1 into E. coli MG1655 rendered the engineered strain resistant against four of the six phages tested, and by serially deleting four of the genes in this system, we verified that all four are essential for its activity (fig. S5C). Notably, DUF1998-containing genes are among the components of the recently reported DISARM (9) and Dpd (40) defense systems, where their function is also unknown. The sheer size of the Druantia system (12 kb of genomic DNA) suggests a complex function, and the near-complete absence of recognizable domains in its genes suggests a new mode of defense not shared by prokaryotic defense systems whose mechanism is currently understood.

Defense against plasmid transformation

Some of the putative defense systems that we experimentally tested did not show any antiphage activity despite being strongly associated with known defense genes. We reasoned that some of these systems may defend against other forms of foreign DNA. To test this hypothesis, we selected one such system, which we denote Wadjet (god protector of ancient Egypt), for further experimentation. Wadjet is a four-gene system, composed of the genes jetABCD, which is common in microbial genomes and is very frequently found next to defense genes (Fig. 5A). Three instances representing three different types of Wadjet (see below) were cloned from three separate Bacillus species into B. subtilis BEST7003. Although none of these systems provided protection against any of the 10 Bacillus phages in our array, all three consistently and significantly reduced transformation efficiency of the episomal plasmid pHCMC05 (Fig. 5C). These results suggest that Wadjet may be a defense system specifically targeting foreign plasmids.

We identified three different domain compositions, each encoding a different set of pfams, but all with common sequence signatures marking them as three types of Wadjet (Fig. 5B). Whereas the pfam domains of Wadjet genes are mostly defined as “domains of unknown function,” structural modeling using Phyre2 (41) showed structural homology between JetA, JetB, and JetC and genes belonging to the housekeeping condensin system MukF, MukE, and MukB, respectively. Bacterial condensins are chromosome-organizing complexes that are responsible for DNA condensation and accurate segregation during replication (42), and mutations in the housekeeping condensins lead to severe defects in chromosome segregation and viability (43). Several versions of housekeeping condensins appear in bacterial genomes: SMC, MukBEF, and MksBEF (44); the Wadjet system was previously noted as a distant homolog of the MksBEF system described in P. aeruginosa (45).

Although the domain organization of the jetABC genes may lead to the hypothesis that Wadjet is an alternative condensin system involved in bacterial chromosome maintenance, our data imply that its role is defensive. This system is highly enriched within defense islands, undergoes extensive horizontal gene transfer, and is only sporadically found within strains of the same species, all of which is inconsistent with a core, essential role in chromosome maintenance. We hypothesize that the Wadjet system has been adapted from a MukBEF condensin ancestor to become a defense system. Possibly, the system identifies foreign plasmids and uses its condensin properties to interfere with proper plasmid segregation into daughter cells. Notably, plasmid transformation in B. subtilis takes place via the natural competence of this organism, during which the plasmid DNA is transformed to the cell through dedicated transporters as ssDNA (46). It is possible that the Wadjet system protects against rampant natural transformation or, alternatively, may specifically target ssDNA phages. However, because no ssDNA phage was reported for B. subtilis, we were not able to determine whether ssDNA phages are specifically blocked by the Wadjet systems cloned in B. subtilis BEST7003.

The Wadjet system is broadly spread in bacterial and archaeal genomes (found in ~6% of the genomes we studied), where it presents high sequence diversity (table S15 and fig. S6). Deletion of each of the four genes in type I Wadjet from B. cereus Q1 abolished its activity and restored plasmid transformation, indicating that each of the genes is essential for antiplasmid defense (Fig. 5C). Moreover, point mutations E59K/K60E in JetB, predicted to disrupt the MukE-MukF–like protein-protein interactions, resulted in loss of protective activity against plasmids and so has the E1025Q mutation in the Walker B motif of JetC that is predicted to abolish adenosine triphosphatase (ATPase) activity. The JetD gene, which has no homology to genes in the Muk system, has a putative topoisomerase VI domain based on structural predictions; a point mutation JetD:E226A, predicted to diminish binding of the topoisomerase VI domain to DNA, also abolished the protective activity of the system.


Our study considerably expands the known arsenal of defense systems used by prokaryotes for protection against phages. However, our results do not yet expose the complete set of prokaryotic defense systems. Out of the 26 candidate systems we tested, nine were verified as antiphage defense systems, and an additional one showed protection against plasmids. The remaining 16, although not verified by our experiments, do not necessarily represent false predictions, as exemplified by the fact that only 50% of our positive control systems showed defense in our assays. Lack of activity of positive control systems or candidate systems could possibly stem from incompatibility of some tested systems with the recipient organism (E. coli or B. subtilis) or could be due to pseudogenization of some systems in their genome of origin. Some systems may be highly specific against a certain type of phages or foreign genetic element not represented in our phage set, whereas others may work in a specific condition not tested in our study. Clade-specific potential systems, such as those found only in archaea or cyanobacteria (table S3), were not tested in this study and can represent a more specialized defense arsenal specific only to a subset of organisms. Finally, we may have missed some true systems by falsely tagging them as belonging to the “mobilome” (table S2), as mobile genetic elements have an intimate evolutionary relationship with defense systems (47).

In the past, the discovery and mechanistic understanding of antiviral defense systems led to the development of important biotechnological tools. For example, the discovery of restriction enzymes resulted in a revolution in genetic engineering, and CRISPR-Cas now revolutionizes the genome editing field. Eukaryotic immune systems, such as RNAi and antibodies, have also become widely used tools. The tendency of defense systems to turn into revolutionary molecular tools stems from their intrinsic high degree of flexible molecular specificity (to differentiate between self and nonself), as well as their inherent capability to target the identified molecule. One may envision that some of the new systems we discovered, once their mechanism is deciphered, may also be adapted into useful molecular tools in the future.

Materials and methods

Computational prediction of defense systems

A set of gene families known to participate in defense

A set of pfams and COGs that are known to participate in antiphage defense was compiled based on the gene families present in table S10 from Makarova et al. 2011 (15) with the addition of pfams/COGs present in the BREX (7) and DISARM (9) antiphage systems. This set is found in table S1.

Identification of pfams enriched near defense genes

The genome sequences, gene annotations, and taxonomy annotations of all publicly available sequenced bacterial and archaeal genomes were downloaded from the NCBI FTP site ( and, respectively) on April 2016. Pfam annotations for bacterial and archaeal genes were obtained from the Integrated Microbial Genomes (IMG) database (48) on December 2015, and cross-referenced to the genes in the genomes downloaded from NCBI using the locus_tag GenBank field. All pfams annotated in at least 20 genes (“members”) across the analyzed genomes (14,083 pfams) were scanned. For each pfam, the number of member genes for which a gene having an annotation of a known defense gene family (table S1) was present in proximity (up to 10 genes upstream and 10 genes downstream) was recorded. The fraction of defense-associated members out of total members (“defense score”) was calculated per pfam. A second score (“defense context variability score”) was calculated for each pfam as follows: for each member gene occurring with at least one defense gene in proximity, a list of the proximal defense genes was recorded, and the fraction of unique lists out of total number of lists for that pfam represents the score (for example: if pfamX is found within 20 genes in our set, with 15 of them having Cas9 nearby and 5 having type I R-M nearby, the number of unique lists is two, and the “defense context variability score” is 2/20 = 0.1). Pfams with defense score ≥ 65% and defense context variability score ≥0.1 were taken for further analysis. This list was supplemented with 35 non-pfam gene families that were predicted to be associated with defense by Makarova et al. 2011 (15), as well as 23 pfams that were predicted in the same study but did not pass the thresholds above (table S2).

From genes to systems

Each of the putative defense-related gene families was used as an anchor to search for multigene systems, as follows. The protein coding sequences for neighboring genes (±10 genes) for all family members were clustered based on sequence homology (for example, if pfamY is found within 50 genomes in our set, the 20 neighboring genes in each genome, plus the pfamY gene in each genome, were taken—altogether 50*21 = 1050 genes to be clustered). Clustering was done with OrthoMCL software v2.0.9 (49) with blastp parameters [-F 'm S' -v 100,000 -b 100,000 -e 1e-5 -m 8] and with mcl v12.068 downloaded from (50, 51) with inflation value of 1.1. When the number of blastp hits for a given anchor pfam was too large and prohibitive for OrthoMCL to generate clusters (>75 million blastp hits), a subset of genomes, containing only bacterial and archaeal genomes annotated as “complete” (rather than “draft”) was used for clustering.

To detect the most prevalent genes around the anchor pfam, only the 10% largest clusters (“frequent clusters”) were considered. For the sake of cluster size calculation, genes originating from the same species (derived from the strain name in the NCBI annotation) were counted as one gene, to prevent organisms for which many strains have been sequenced from inflating the cluster size. An edge between cluster(i) and cluster(j) was defined if a gene from cluster(j) followed a gene from cluster(i) in a given genome with no other genes belonging to frequent clusters found in between, with edge weight (“thickness”) defined as the number of such adjacency cases. Again, edge weights were adjusted such that multiple appearances of a cluster pair originating from the same species were recorded as a single appearance. Only the 10% thickest edges were retained for further analysis. In each genome, the maximal “path” that included the anchor pfam gene and was composed of the retained (largest) clusters and the retained (thickest) edges was recorded. Such a “path,” representing a set of genes appearing in a conserved order in multiple genomes, was considered a candidate multigene system. Infrequent variations on the gene order and composition of common systems were merged into the common system if they shared at least 50% of their clusters and had less than 25% appearances than the common system. Only systems with five or more appearances from different species were further analyzed.

The domains within the gene members of each system were analyzed bioinformatically using the tools HHpred (52, 53), Phyre2 (41), PSI-BLAST (54) and NCBI’s Conserved Domain Database (CDD) (55). The systems were then manually filtered, based on this analysis, to remove (i) known defense systems whose domains did not appear in our initial set of gene families known to participate in defense; (ii) systems likely representing mobile genetic elements (“mobilome”) and; (iii) systems likely participating in nondefensive functions or house-keeping systems (table S2).

A second cycle of prediction was then performed, expanding the set of “positive” gene families from table S1 to include the gene families participating in the candidate new defense systems, as well as the gene families participating in known defense systems that were previously missing from our set and detected in the first round. All pfams were again scanned and the same thresholds were applied (defense score 65%, context variability score 0.1). New pfams retrieved from the second cycle were analyzed as above to generate and annotate multigene systems.

Candidate new systems were further prioritized to select instances for experimental validations. Systems tagged as “questionable,” due to uncertainty whether they represent defense genes or mobile genetic elements, were filtered out (table S3). Systems existing in only a narrow range of organisms, as well as systems that were not found in organisms phylogenetically close either to E. coli or B. subtilis, were not tested experimentally (table S3).

For system selection for experimental testing, we first attempted to select candidate systems from organisms close to B. subtilis as the receiving model organism, as in this organism genomic integration of large fragments of DNA is straightforward and results in a single-copy addition of the system. In case no source organisms sufficiently close to B. subtilis were found, we switched to E. coli as the model organism for experimentation.

Phylogenetic distribution analysis of new systems

For each validated defense system, several loci, including the locus that was experimentally verified, were taken as seeds for psi-Blast. psi-Blast version 2.5.0 of BLAST+ (54, 56), with parameters [-num_iterations 10 -max_hsps 1 -max_target_seqs 100,000 -evalue 1e-10], was performed for each protein of each system, against all microbial genomes downloaded from NCBI on April 2016. When the hits of all proteins of a system were found closely localized on a genome, spanning no more than 150% of the length of the original system, this genome was recorded as containing the system. For the Druantia and Wadjet systems, -evalue 1e-5 was used to enable detection of distant homologs. For systems with four or five genes (Zorya type I, Druantia types I and II, and Wadjet), systems were reported if at least three of their genes were identified. For the Druantia system, systems with hits to the DruE protein were retained if the DruE size was >1300 amino acids. For the Thoeris system, multiple thsB genes near the thsA gene were recorded if they were within 10 kb of genomic DNA around the identified thsA. Phylum for each genome was obtained using the JGI taxonomy server (

Experimental validation of defense systems

Cloning of candidate systems into E. coli MG1655 and B. subtilis BEST7003

A cloning shuttle vector for large fragments was constructed as previously described (9). The vector contains a p15a origin of replication and ampicillin resistance for plasmid propagation in E. coli, and amyE integration cassette with spectinomycin resistance for genomic integration into B. subtilis. The backbone of this vector was amplified using primers OGO309+OGO310, adding to it a BamHI restriction site and a terminator site upstream to the insert cloning site. The multiple cloning site of plasmid pBS1C (57), received from the Bacillus Genetic Stock Center (BGSC) (accession ECE257), was amplified using primers OGO311+OGO312. Both fragments were digested using AscI and BamHI, ligated using T4 ligase and transformed into E. coli, resulting in plasmid pSG1-rfp.

The loci of most systems were commercially synthesized and cloned, by Genscript Corp., directly into pSG1-rfp between the AscI and NotI sites of the multiple cloning site (table S4, “Cloning method” column). In one case (the type I Wadjet system) the DNA was synthesized by Gen9 (Boston, MA) with synonymous modifications to optimize GC content. In case the donor strains were readily available, the system was not synthesized but instead was directly amplified from the genomic DNA of the donor strain using KAPA HiFi HotStart ReadyMix (Kapa Biosystems KK2601) with primers as detailed in table S16. For long systems (>10,000 bases) when the donor strain was not available, the system was commercially synthesized in overlapping fragments (table S4, “Cloning method” column). Systems amplified from genomic DNA or ordered as overlapping fragments were cloned into pSG1-rfp between the AscI and NotI sites using NEBuilder HiFI DNA Assembly cloning kit (NEB E5520S). The full list of sources used for cloning the systems into our model organisms is found in table S4, including the accessions of all strains ordered.

Transformation to B. subtilis was performed using MC medium as previously described (58). MC medium was composed of 80 mM K2HPO4, 30 mM KH2PO4, 2% glucose, 30 mM trisodium citrate, 22 μg/ml ferric ammonium citrate, 0.1% casein hydrolysate (CAA), 0.2% potassium glutamate. From an overnight starter of bacteria, 10 μl were diluted in 1 ml of MC medium supplemented with 10 μl 1M MgSO4. After 3 hours of incubation (37°C, 200 rpm), 300 μl was transferred to a new 15 ml tube and ~200 ng of plasmid DNA was added. The tube was incubated for another 3 hours (37°C, 200 rpm), and the entire reaction was plated on LB agar plates supplemented with 100 μg/ml spectinomycin and incubated overnight at 30°C.

For systems tested in E. coli, the cloned vector was transformed into E. coli MG1655 cells (ATCC 47076), and the resulting transformants were verified by PCR. For systems to be tested in B. subtilis, the cloned vector was transformed into B. subtilis BEST7003 cells, kindly provided previously by M. Itaya. The system was integrated into the amyE locus, and resulting transformants were screened on starch plates for amylase-deficient phenotype. Whole-genome sequencing was then applied to all transformed B. subtilis and E. coli clones as described in (9) to verify system’s integrity and lack of mutations.

As a negative control for transformation into B. subtilis, a transformant with an empty plasmid, containing only the spectinomycin-resistance gene in the amyE locus, was used. As a negative control for transformation into E. coli, the wild-type E. coli MG1655 carrying an empty plasmid was used.

For strains with gene deletions and point mutations, plasmids containing systems with these deletions/mutations were commercially synthesized by Genscript. The mutated systems were transformed into B. subtilis and E. coli as described above, and clones used were fully sequenced to verify proper integration and sequence of the mutated systems.

Phage strains, cultivation, and plaque assay

The following B. subtilis phages were obtained from the BGSC: SPO1 (BGSCID 1P4), phi3T (BGSCID 1L1), SPβ (BGSCID 1L5), SPR (BGSCID 1L56), phi105 (BGSCID 1L11), rho14 (BGSCID 1L15), and SPP1 (BGSCID 1P7). Phage phi29 was obtained from the DSMZ (DSM 5546). Phages SBSphiJ and SBSphiC were isolated by us from mixed soil and leaves samples on B. subtilis BEST7003. For this, soil and leaves samples were added to a log phase B. subtilis BEST7003 culture and incubated overnight to enrich for B. subtilis phages. The enriched sample was centrifuged and filtered through 0.2 μm filters, and the filtered supernatant was used to perform double layer plaque assays as described in Kropinski et al. (59). Single plaques that appeared after overnight incubation were picked, re-isolated three times, and amplified as described below.

E. coli phages (T4, T7, and lambda-vir) were kindly provided by U. Qimron. Phages SECphi17, SECphi18, and SECphi27 were isolated as described in Wommack et al. (60) from sewage samples on E. coli MG1655. 0.2 μm filtered concentrated sewage samples were used to perform double layer plaque assays, individual plaques were picked, re-isolated three times, and amplified as described below.

All phages isolated by us were Illumina sequenced following a library prep using the Nextera protocol (61) and assembled using SPAdes v. 3.10.1 using the –careful and –cov-cutoff auto modifiers (62). Assembled genomes and raw reads were deposited in the European Nucleotide Archive (ENA) under study accession PRJEB23070. Phage classification was done according to sequence homology to the closest known similar phage. Phage SECphi17 (ENA ERS1981053) has a 5,538 bp genome and its closest relative is Coliphage WA3 (GenBank DQ079897.1, 66% coverage, 81% identity), indicating that it is an ssDNA phage of the Microviridae family. Phage SECphi18 (ENA ERS1981054) has a 44,798 bp genome and its closest relative is Escherichia phage Gluttony (GenBank KX534336.1, 92% coverage, 93% identity), indicating that it is a member of the Siphoviridae family. Phage SECphi27 (ENA ERS1981055) has a 51,811 bp genome, and its closest relative is Escherichia phage vB_Eco_swan01 (GenBank LT841304.1, 91% coverage, 98% identity), indicating that it is a member of the Siphoviridae family. Phage SBSphiJ (ENA ERS1981056) has a 156,875 bp genome, and its closest relative is Bacillus phage Grass (GenBank KF669652.1, 91% coverage, 95% identity), indicating that it is a member of the family Myoviridae. Phage SBSphiC (ENA ERS1981057) has a 144,651 bp genome, and its closest relative is Bacillus phage SP10 (GenBank AB605730.1, 94% coverage, 90% identity), indicating that it is a member of the Myoviridae family. Siphoviridae and Myoviridae phage morphologies were verified by electron microscopy (EM). For the EM experiments, phage lysates were blotted onto copper grids, stained using uranyl acetate 2%, and visualized in FEI Tecnai T12 transmitting electron microscope.

Phages were propagated on either E. coli MG1655 or B. subtilis BEST7003 using the plate lysate method as previously described (63). Lysate titer was determined using the small drop plaque assay method as previously described (64). Bacteria were mixed with MMB agar (LB + 0.1 mM MnCl2 + 5 mM MgCl2 + 5 mM CaCl2 + 0.5% agar), and serial dilutions of phage lysate in MMB were dropped on top of them. After the drops dried up, plates were incubated at room temperature overnight. EOP was measured by performing small drop plaque assay with the same phage lysate on control bacteria and bacteria containing the candidate defense system, and comparing the ratio of plaque formation.

To determine the number of infective centers during infection with T7 phage of control bacteria and bacteria containing type I or type II Zorya, we used a modified version of the technique described in (65). Zorya-lacking E. coli MG1655 or Zorya-containing cells were infected with T7 phage at MOI 0.05 and incubated for 10 min at 37°C to allow adsorption. Cells with adsorbed phages were then centrifuged (1 min, 14,000 rpm) at 4°C, washed once with ice-cold MMB medium, and resuspended in 200 μl ice-cold MMB medium. Then, 100 μl aliquots of 10-fold dilutions of resuspended phage-infected cells were mixed with 100 μl of a Zorya-lacking E. coli MG1655 culture grown to O.D. 0.3. The mixture was plated using the double agar overlay method and infection centers (plaques) were counted after overnight incubation in room temperature.

For the liquid culture infection with T7 phage, overnight cultures of Zorya-lacking E. coli MG1655 or Zorya-containing cells were diluted 1:100 in MMB medium. 180 μl volumes of the diluted culture were dispersed into wells in a 96-well plate and grown at 37°C with vigorous shaking until early log phase (O.D.600 0.3). 20 μl of T7 phage lysate were added at multiplicities of infection 0.05, 0.5 and 5 in three replicates. Optical density measurements at a wavelength of 600 nm were taken every 15 min using a TECAN Infinite 200 plate reader in a 96-well plate as previously described (9).

Transformation efficiency assay

Transformation was performed using the MC medium as described above. To test plasmid transformation efficiency, the episomal Bacillus plasmid pHCMC05 was used (66). Transformation efficiency was calculated by dividing the number of transformants that grew on LB plates containing 5 μg/ml chloramphenicol by the live count on LB plates.

DNA-seq and RNA-seq

DNA was extracted from bacteria using Qiagen DNeasy blood and tissue kit (Qiagen 69504). DNA libraries were constructed using the Nextera library preparation protocol as previously published (61). RNA-seq was performed with the NEBNext Ultra Directional RNA Library Prep Kit (NEB, E7420) according to the manufacturer’s instructions with modifications as previously described (67). Prior to library preparation, equal amounts of extracted RNA from 3 to 7 strain samples were pooled together and processed as a single library. All libraries were sequenced using the Illumina NextSeq500. The sequencing reads were aligned to the reference genomes of B. subtilis BEST7003 (GenBank: AP012496) and E. coli MG1655 (GenBank: NC_000913), and to the plasmid sequence of each system, using Novoalign 3.02.02 (Novocraft Technologies Sdn Bhd, with the default parameters and [-r Random]. The coverage along the reference genomes was calculated, to determine whether each system exists in the genome (DNA-seq) or expressed (RNA-seq). The pooled RNA library was sequenced to a depth of 5 million reads per sample and later aligned to the reference genomes as described.

Supplementary Materials

References and Notes

Acknowledgments: We thank H. Sberro, O. Zuqert, O. Cohen, S. Mintzer, R. Shenhav, Z. Erez, D. Dar, M. Voichek, and N. Tal for useful discussion during the course of this study. We also thank M. Voichek for assistance with RNA-seq and S. Silverman for help with phage isolation. R.S. was supported, in part, by the Israel Science Foundation (personal grant 1360/16 and I-CORE grant 1796/12), the European Research Council (ERC) (grant ERC-CoG 681203), the Abisch-Frenkel Foundation, the David and Fela Shapell Family Foundation, the Benoziyo Advancement of Science grant, the Minerva Foundation, and the Pasteur-Weizmann Council. Assembled phage genomes and raw reads were deposited in the European Nucleotide Archive (ENA) under study accession PRJEB23070. Competing interests: R.S. is a scientific cofounder and advisor of BiomX Ltd. S.D., S.M., G.A., A. Leavitt, and R.S. are inventors on U.S. provisional patent application 62/586,911. S.D., S.M., G.O., and R.S. are inventors on U.S. provisional patent applications 62/512,216 and 62/512,219.
View Abstract

Navigate This Article