Research Article

Virus Population Dynamics and Acquired Virus Resistance in Natural Microbial Communities

Science  23 May 2008:
Vol. 320, Issue 5879, pp. 1047-1050
DOI: 10.1126/science.1157358


Viruses shape microbial community structure and function by altering the fitness of their hosts and by promoting genetic exchange. The complexity of most natural ecosystems has precluded detailed studies of virus-host interactions. We reconstructed virus and host bacterial and archaeal genome sequences from community genomic data from two natural acidophilic biofilms. Viruses were matched to their hosts by analyzing spacer sequences that occur among clustered regularly interspaced short palindromic repeats (CRISPRs) that are a hallmark of virus resistance. Virus population genomic analyses provided evidence that extensive recombination shuffles sequence motifs sufficiently to evade CRISPR spacers. Only the most recently acquired spacers match coexisting viruses, which suggests that community stability is achieved by rapid but compensatory shifts in host resistance levels and virus population structure.

Viruses are arguably the most abundant and diverse components of natural environments. They can potentially alter the makeup, and thus the functioning, of microbial communities (1–3) and excise and transfer host DNA, facilitating genetic exchange and driving host evolution [e.g., (4, 5)]. An important recent advance has been the recognition of a virus resistance system in bacteria and archaea that is based on genetically encoded spacers within the clustered regularly interspaced short palindromic repeats (CRISPR) loci (6–8). Experiments using Streptococcus thermophilus strains and their viruses established that CRISPR spacers with sequence identity to viral genomes confer specific resistance to viruses. Resistance is lost if there is even a single nucleotide mismatch between spacer and virus genome (9).

Although a suite of CRISPR-associated (Cas) proteins is implicated in microbial resistance (10, 11), the details of the CRISPR-based mechanism and the dynamics of interactions involving viruses and the host CRISPR loci are not yet known. Community genomic analyses can capture roles for population heterogeneity in virus-microbe interactions in natural systems that are not apparent in pure culture experiments. Previously, we analyzed two closely related bacterial Leptospirillum populations and found patterns of variation consistent with very rapid evolution of the CRISPR locus (12). Given their likely viral origin (6–9), we hypothesized that spacers of CRISPR loci recovered from microbial communities could be used to fish out viral sequences from among the numerous, small, and otherwise unassigned community genomic fragments and to link viruses to their coexisting host bacteria and archaea (fig. S1A). Sufficient viral sequence was recovered to enable extensive reconstruction of multiple virus genomes from natural microbial communities.

DNA was extracted from two biofilms collected within the Richmond Mine, Redding, CA (fig. S1B). The subaerial UBA biofilm, dominated by bacterial Leptospirillum groups II (13) and III, was growing in 39°C pH 1.1 acid mine drainage (AMD) and was collected in June 2005. The floating UBA BS biofilm, growing in 39°C, pH 1.5 AMD, and collected in November 2005, contained abundant E-, G-, and A-plasma (14) and the more distantly related I-plasma, all uncultivated members of the Thermoplasmatales lineage of Euryarchaea. The sample also contained Leptospirillum groups II and III and other novel archaea (14). Approximately 100 Mb of genomic sequence was obtained from each of the small insert libraries constructed from the UBA and UBA BS samples (15). Essentially complete genomes of Leptospirillum groups II (13) and III were reconstructed from the UBA sample. In addition, near-complete genomes of I-plasma (∼20-fold coverage) and E-plasma (∼10-fold coverage) and lower coverage but near-complete genomes of G-plasma and A-plasma were reconstructed from sequencing reads derived from both samples (15).

In the combined community genomics data set, we identified 476 reads encoding 37 different CRISPR repeat sequences (fig. S1A and table S1). A subset of the sequences were mapped to CRISPR loci of Leptospirillum group II (16), Leptospirillum group III, I-plasma, E-plasma, A-plasma, and G-plasma genomes and bacterial plasmids (15). From the CRISPR reads, 6044 spacer sequences, representing 2348 unique sequences (28 to 54 nucleotides in length), were extracted (table S2). As initial confirmation that reads exactly matching CRISPR spacers (and lacking repeats) derive from coexisting viruses, all spacer-containing non-CRISPR (SNC) reads were compared with sequences in public databases. The 911 SNC reads were significantly enriched in genes with matches to viral proteins, compared with random reads (Fisher's exact test, P < 10⁻¹⁶) (fig. S2), and were also enriched in genes encoding proteins with typical viral functions (fig. S3). Thus, the majority of SNC reads derived from viruses, although some corresponded to other mobile genetic elements, such as plasmids and transposons.
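The spacer extraction and SNC read definition described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study (see the Materials and Methods for that); it assumes exact, non-overlapping repeat copies and exact spacer matches on either strand. The function names and the example sequences are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def extract_spacers(read: str, repeat: str) -> list[str]:
    """Return the sequences lying between consecutive exact copies of `repeat`.

    Assumption: repeats are exact and non-overlapping; real CRISPR repeats
    can carry mismatches, which this sketch ignores.
    """
    positions = []
    start = read.find(repeat)
    while start != -1:
        positions.append(start)
        start = read.find(repeat, start + len(repeat))
    return [
        read[left + len(repeat):right]
        for left, right in zip(positions, positions[1:])
    ]


def is_snc_read(read: str, spacers: list[str], repeats: list[str]) -> bool:
    """True if the read exactly matches a spacer on either strand and
    contains no CRISPR repeat (spacer-containing non-CRISPR read)."""
    for repeat in repeats:
        if repeat in read or reverse_complement(repeat) in read:
            return False
    return any(
        spacer in read or reverse_complement(spacer) in read
        for spacer in spacers
    )


# Hypothetical toy data, far shorter than real repeats and spacers.
repeat = "GTTTCAGAC"
crispr_read = repeat + "AAAACCCC" + repeat + "TTTTGGGG" + repeat
spacers = extract_spacers(crispr_read, repeat)
```

Reads flagged by `is_snc_read` would then be the candidates for comparison against public databases, as in the text.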

Exact matches to spacer sequences were also found on assembled DNA fragments (contigs) up to 15 kb in length. SNC contigs were typically linked by clone mate pairs to other SNC contigs, but only rarely to archaeal or bacterial genome fragments (supporting online text). On the basis of mate-pair linkage and blocks of similar sequence, SNC contigs were condensed by manual assembly (15) to form larger fragments that we inferred represented partial and possibly complete viral genomes.
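As a rough illustration of grouping contigs by mate-pair linkage, the sketch below computes connected components over contig pairs joined by at least one clone. The assembly in the study was manual and also used blocks of similar sequence, so this is only a simplified stand-in; `link_contigs` and the contig names are hypothetical.

```python
from collections import defaultdict


def link_contigs(mate_pairs):
    """Group contigs into connected components.

    `mate_pairs` is an iterable of (contig_a, contig_b) tuples, one per
    clone whose two end reads fall on different contigs.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in mate_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for contig in parent:
        groups[find(contig)].add(contig)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical links: c1-c2-c3 form one group, c4-c5 another.
clusters = link_contigs([("c1", "c2"), ("c2", "c3"), ("c4", "c5")])
```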

On the basis of tetranucleotide frequencies (17), SNC contigs were clustered into three major groups (Fig. 1). The first cluster was formed from contigs that match CRISPR spacers of the bacterial Leptospirillum groups II and III. The second and third clusters were targeted by CRISPR spacers of archaea and of plasmids (supporting online text).

Fig. 1.

Virus-host associations in AMD biofilms. Putative viral (SNC) contigs were clustered based on tetranucleotide frequencies (left panel), and CRISPRs were clustered based on patterns of SNC contig matching (right panel) (15). Columns in the left panel represent tetranucleotides (reverse complementary pairs are combined); colors indicate frequencies (gray indicates absence). Columns in the right panel represent CRISPRs; colors indicate number of distinct spacer sequences matching SNC contigs. The majority of Cluster 1 (C1) contigs belong to the AMDV1 population, Cluster 2 (C2) to AMDV2, and Cluster 3 (C3) to AMDV3, AMDV4, and AMDV5 (see table S3 for details).
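A tetranucleotide frequency profile with reverse-complementary pairs combined, as in the left panel of Fig. 1, can be sketched as follows. The clustering procedure itself is described in (15, 17) and is not reproduced here; the Euclidean distance is only an assumed, illustrative way to compare two profiles, and all names are hypothetical.

```python
from collections import Counter
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Represent a k-mer and its reverse complement by one shared key."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


# 256 tetranucleotides collapse to 136 once reverse complements are combined.
CANONICAL_TETRAMERS = sorted(
    {canonical("".join(p)) for p in product("ACGT", repeat=4)}
)


def tetranucleotide_frequencies(seq: str) -> list[float]:
    """Relative frequency of each canonical tetranucleotide in `seq`.

    Windows containing characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in CANONICAL_TETRAMERS]


def euclidean_distance(u: list[float], v: list[float]) -> float:
    """Assumed distance between two profiles (not necessarily the study's)."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
```

Because complementary pairs are combined, a contig and its reverse complement yield the same profile, so strand orientation does not affect the comparison.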

Many of the SNC contigs of the first tetranucleotide cluster (Fig. 1) were linked by mate pairs into subclusters (table S3). One such subcluster (AMDV1) was targeted by Leptospirillum group II and III spacers. AMDV1 is similar to prophages integrated into the genomes of Gluconobacter oxydans 621H (18) and Acidiphilium cryptum JF-5 (fig. S4 and supporting online text).

The second tetranucleotide cluster (Fig. 1) includes a deeply sampled SNC contig (AMDV2) (fig. S6 and supporting online text) and smaller contigs linked to it by mate pairs and overlapping sequence (fig. S5 and table S3). AMDV2 is targeted by 33 spacers derived only from the two E-plasma CRISPR loci. The composite sequence of the discrete, linear 10-kb viral genome has a GC content of 23.7% and inverted 160 base pair (bp) repeats on each end, and encodes 17 (putative) genes, including a type B DNA polymerase, all on the same strand (Fig. 2A).

Fig. 2.

Diagram illustrating genome organization and recombination within virus population AMDV2. (A) Putative genes on the linear archaeal virus AMDV2. (B) Pattern of nucleotide polymorphisms (SNPs, colored bars) in a subset of sequencing reads within a region of the DNA polymerase gene (largest gene). The region was divided into equally spaced blocks (A to L), and the alleles were numbered based on SNP patterns to the left of the label. In the summary table below, colors are assigned to alleles based on the read in which the allele first appears. (C) Linkage disequilibrium, Mvol (36), plotted on the vertical axis as a function of interlocus distance. Allele and haplotype frequencies were calculated from aligned sequence reads for all pairs of single-nucleotide loci, 1 to 100 bp apart, on contig 14338 (AMDV2) (15). Mvol was significantly negatively correlated with interlocus distance (Spearman's rank correlation ρ = –0.32, P < 10⁻¹⁶). The box plots illustrate Mvol distributions within 5 bp intervals of interlocus distances (50% of data points are within boxes and 80% within whiskers).

The third tetranucleotide cluster (Fig. 1) comprises several subclusters of mate-pair-linked and partially overlapping contigs (fig. S5 and table S3). One subcluster, AMDV3, includes ∼18 kb of composite genome sequence plus many small strain variant contigs (fig. S7). Given targeting of AMDV3 by the CRISPR loci of E-, G-, and A-plasma [88 to 92% 16S ribosomal RNA (rRNA) gene sequence identity], we infer that this virus population has a relatively broad host range.

AMDV4 (∼56 kb composite genome) shares some sequence similarity with AMDV3 (fig. S5) and groups with it in tetranucleotide cluster 3. E-plasma is the likely host for this virus (table S3 and supporting online text). Although the E-plasma type #1 CRISPR locus targets AMDV4, more of its spacers target AMDV2. The interspersing of spacers matching to AMDV2 and AMDV4 on single CRISPR reads suggests that E-plasma is exposed to both of these narrow host range viruses simultaneously.

Only one virus, AMDV5, targets I-plasma, and it is only detected in the UBA BS sample that contains this archaeon. I-plasma shares only 80% 16S rRNA gene sequence identity with the closest of the other Thermoplasmatales archaea and has no spacers that would confer immunity to other Thermoplasmatales viruses (table S3). All 10 I-plasma spacers that match SNC reads colocalize to the two AMDV5 fragments (∼4 kb). Interestingly, three AMDV5 proteins have closest matches in eukaryal viruses (supporting online text).

CRISPR loci 11, 13, 17, and 19 are associated with plasmid-like populations (table S3 and supporting online text). These complex populations may be responsible for CRISPR lateral transfer (19).

Up to 40% of spacers in a single CRISPR locus matched virus sequences, the highest degree of correspondence being for the AMDV2 population. Given that each CRISPR read represents an individual cell, it is evident that most microbial cells target several different virus populations (e.g., some E-plasma cells target specific AMDV2, 3, and 4 variants). By mapping spacers onto reads coassembled into AMDV2 and AMDV3, we found that some spacers match the dominant sequence types, yet many match sequences characteristic of only one or a small number of genotypes within each population (e.g., fig. S8).

Comparative analyses of isolated bacteriophages reveal genome mosaicism (20–24). Less is known about natural population heterogeneity or recombination in archaeal viruses [but see e.g., (25–29)]. The AMDV2 population displays a high level of nucleotide variation (∼94% average similarity of aligned reads to each other). Combinatorial mixtures of small sequence motifs (Fig. 2B and fig. S9) suggest that the population has been shaped by extensive homologous recombination. We measured linkage disequilibrium as a function of interlocus distance and found a significant decline in linkage with distance (Spearman's correlation, P < 10⁻¹⁶) (Fig. 2C). Linkage becomes independent of distance at loci separations of >25 nucleotides (values remain close to 0.1). Thus, recombination has scrambled virus sequences so much that blocks shared by different individuals are often no more than 25 nucleotides in length, in agreement with the size of sequence motifs apparent in Fig. 2B. The persistence of some linkage over longer distances (Fig. 2C) is due to a low abundance of highly similar sequences. Recombination creates new virus sequences that are able to confound the function of 28 to 54 nucleotide CRISPR spacers, allowing the virus to evade the CRISPR-based host defense system (fig. S8). Compared with mutation, recombination generates new DNA signatures with less risk of altering protein function and limits purging to sequence motifs recognized by spacers rather than entire viral genotypes, preserving viral population diversity.
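The logic of the linkage analysis (pairwise linkage between single-nucleotide loci, then a rank correlation against interlocus distance) can be sketched in Python. Note the substitution: the study reports Mvol (36), whereas this sketch uses the more familiar r² statistic for biallelic sites as a stand-in, and it computes only Spearman's ρ, not a P value. It assumes reads are already aligned to equal length, with any non-ACGT character marking missing data; all function names are hypothetical.

```python
from itertools import combinations


def r_squared(reads, i, j):
    """r^2 between sites i and j over reads covering both.

    Returns None unless both sites are biallelic in the covering reads.
    """
    pairs = [(r[i], r[j]) for r in reads if r[i] in "ACGT" and r[j] in "ACGT"]
    alleles_i = sorted({a for a, _ in pairs})
    alleles_j = sorted({b for _, b in pairs})
    if len(alleles_i) != 2 or len(alleles_j) != 2:
        return None
    n = len(pairs)
    p_a = sum(a == alleles_i[0] for a, _ in pairs) / n
    p_b = sum(b == alleles_j[0] for _, b in pairs) / n
    p_ab = sum(a == alleles_i[0] and b == alleles_j[0] for a, b in pairs) / n
    d = p_ab - p_a * p_b  # deviation of haplotype frequency from independence
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def ld_by_distance(reads, max_distance=100):
    """List of (interlocus distance, r^2) for all usable site pairs."""
    results = []
    for i, j in combinations(range(len(reads[0])), 2):
        if j - i <= max_distance:
            value = r_squared(reads, i, j)
            if value is not None:
                results.append((j - i, value))
    return results


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in range(pos, end + 1):
            result[order[k]] = (pos + end) / 2 + 1
        pos = end + 1
    return result


def spearman_rho(x, y):
    """Spearman's rank correlation (Pearson correlation of the ranks).

    Assumes neither input is constant.
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

Given `pairs = ld_by_distance(aligned_reads)`, a negative value of `spearman_rho([d for d, _ in pairs], [v for _, v in pairs])` would indicate that linkage declines with distance, the pattern reported in Fig. 2C.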

Within the AMDV3 and AMDV4 populations, genotypes share larger sequence motifs than AMDV2 (Fig. 3). In AMDV3, substitution of highly divergent sequence blocks within some variants of a large membrane or capsid-like protein gene (Fig. 3 and fig. S7) may alter the host range, allowing infection of the A-, E-, and G-plasma lineages, analogous to specificity conferred by recombination in bacteriophage tail fiber proteins (30).

Fig. 3.

Diagram illustrating heterogeneity within a genomic region of the archaeal virus population, AMDV3. Genes (row of arrows) and a subset of the assembled sequencing reads on contig 15322 (bars are reads, thin white lines link mate pairs) are shown. The red box identifies the region of the gene magnified below, where colored bars indicate single-nucleotide polymorphisms (SNPs). Linkage patterns among defined SNP patterns are consistent with extensive recombination.

The set of CRISPR spacers in each cell within the host populations differs, and only a few CRISPR spacers are shared widely (table S4). The exception is I-plasma, where the first ∼800 bp of the CRISPR locus is clonal (i.e., each cell has the same spacer in the same position as every other cell for the first 12 of, on average, ∼15 spacers). The CRISPR locus diversifies quickly toward the cas genes; the terminal region provided 36 single-copy spacers. The colocation of the switch to nonclonal spacers and to spacers that match AMDV5 (supporting online text) suggests that the recent appearance of AMDV5 caused a selective sweep. The fact that mostly rare and recently incorporated (close to the cas genes) spacers (7, 9, 12, 31) match the I-plasma virus (11 of 13 are in single copy) also applies to other archaea (table S4).

The complexity of most previously studied systems has limited resolution of patterns of viral and host distribution over space and time. In the current study, most virus populations (AMDV2–4) and their hosts were present in the two samples collected about 5 months apart. However, the relative abundances of AMDV2 and AMDV4 populations, inferred to primarily target the same host (E-plasma), differed between samples (supporting online text). Interestingly, in the June sample, all of the matching spacers were found on the type 1 CRISPR locus, whereas the majority in the November sample were carried on the type 2 locus, which had expanded considerably (197 versus 37 distinct spacer sequences). Moreover, there was only one matching spacer common to the E-plasma populations sampled at the two time points. This suggests rapid evolution of the CRISPR loci and potential modulation of resistance levels on the time scale of months.

CRISPR loci cannot grow unchecked (12). Consequently, a host may be exposed to a virus for which it has no spacer-based immunity as the result of spacer loss. Alternatively, this may occur as a result of migration of a new virus type into the community (32) or evolution of a virus to predate a new host. For infections that lower the fitness of the host, the first cells to acquire either an effective CRISPR locus by lateral transfer (e.g., from the plasmid pool) or a spacer matching the new virus would thrive relative to other individuals in their population. Proliferation of these cells will spread effective spacers in the population, generating a pattern comparable to that observed in the I-plasma CRISPR locus. Increase in population-level immunity by spacer acquisition (33) will be countered by virus mutation (33) and recombination, and possibly other mechanisms of genome evolution (34). Resistance may increase over time to the point that the virus population declines, or a virus may occasionally become so virulent that it causes a crash in the host population, as predicted by the “kill the winner” model (35). Alternatively, if CRISPR and viral diversification remain in balance, a relatively stable virus and host community may result.

Supporting Online Material

Materials and Methods

SOM Text

Figs. S1 to S9

Tables S1 to S4
