In Situ Evolutionary Rate Measurements Show Ecological Success of Recently Emerged Bacterial Hybrids

Science  27 Apr 2012:
Vol. 336, Issue 6080, pp. 462-466
DOI: 10.1126/science.1218389


Few data are available on how quickly free-living microorganisms evolve. We analyzed biofilms collected from a well-defined acid mine drainage system over 9 years to investigate the processes and determine rates of bacterial evolution directly in the environment. Population metagenomic analyses of the dominant primary producer yielded the nucleotide substitution rate, which we used to show that proliferation of a series of recombinant bacterial strains occurred over the past few decades. The ecological success of hybrid bacterial types highlights the role of evolutionary processes in rapid adaptation within natural microbial communities.

Microbial communities, which drive Earth’s geochemical cycles (1), can rapidly respond to change, but the proportion of this response that can be attributed to evolutionary processes, rather than species composition or gene expression shifts, remains an unresolved question (2). Most available evolutionary rate estimates are nucleotide substitution rates derived from laboratory measurements (3, 4). It is difficult to know how relevant these rates are for geochemical environments, because studies on natural populations have been restricted to pathogens (5–7) and endosymbionts (8).

Our approach to measuring evolutionary rates for free-living bacterial populations involved tracking genome change in a time series of natural microbial community samples. Few systems are suitable for such a study; requirements include the availability of discrete, well-defined, and relatively homogeneous microbial community samples from a location where a reproducible community forms over time, with very restricted microbial input from other regions. The microbial communities catalyzing the formation of acid mine drainage (metal-rich, low-pH solutions) in the Richmond Mine (Iron Mountain, CA) provide such a system. Biofilms that develop at the air-solution interface on standing pools and slow-flowing underground streams have been used for more than a decade to study evolutionary processes and ecological complexity in nature (9). Key to the success of this environment as a model system has been its low species richness, which enables detailed culture-independent community genomics analyses of the genetic structure and dynamics of natural populations (10–13). Most biofilms within this system are dominated by the chemolithoautotrophic Leptospirillum group II. Previous metagenomic studies on two biofilms led to the reconstruction of genomes for Leptospirillum group II, types I and VI, the genes of which share ~94% average nucleotide identity (10, 13, 14). Inferences based on proteomics data indicate that large-scale homologous recombination events occurred between two parental Leptospirillum group II types, resulting in six recombinant (hybrid) genotypes (14, 15). One genotypic group, comprising types II through VI, predominates during initial colonization, whereas type I becomes abundant in later successional stages (16).

We have collected biofilms within the Richmond Mine over approximately 9 years. Sampling required field expeditions to locations where ambient conditions are often close to the limit of human endurance (e.g., 100% humidity, ~48°C, with depressed O₂ levels) at sites only accessible at times of the year when flow rates are low and danger from underground collapse is minimized. Working within these constraints, we collected microbial biofilms (both time series and spatially resolved) from six underground locations between March 2002 and December 2010 at intervals of 3 weeks to more than 1 year (Fig. 1, table S1, and figs. S1 and S2) (17). We selected 13 samples from the C75 location, a site indicated by proteomics-based inferences to be typically dominated by the Leptospirillum group II, type III genotype (15). This allowed us to analyze single-nucleotide substitution accumulation over 5 years, and thereby to estimate the substitution rate. On the basis of the same proteomics-based inferences, we also selected nine biofilms that provided multiple samplings of most of the other Leptospirillum group II genotypes (types I, III, IV, V, and VI). Together with the two sequence data sets from which the type I and type VI reference genomes had been reconstructed, this allowed us to reconstruct a lineage history (fig. S1).

Fig. 1

Leptospirillum group II genotype distribution shows dispersal across the Richmond Mine and type III strain turnover at the C75 location. (A) Richmond Mine schematic map, with pie charts indicating genotype proportions in 24 samples, estimated on the basis of read recruitment (figs. S5 and S9). (B) Acid mine drainage flow rate (measured at the mine entrance) and community composition at the C75 location, measured by fluorescence in situ hybridization (Arc, Archaea; L3, Leptospirillum group III; L2, Leptospirillum group II). Leptospirillum group II, type III strain transitions, revealed by SNP analysis, are indicated by an asterisk inside the Leptospirillum group II FISH data. (C) Flow data, 2001 to 2011. Shading indicates period displayed in (B).

We generated community genomic data sets comprising ~63 billion base pairs of sequence from 24 samples, each of which contained ~10⁸ Leptospirillum group II cells (10). Using sequences obtained from the first five C75 location biofilms, for which longer read lengths were available, we reconstructed a type III genome de novo (17). The genome is indeed a recombinant hybrid of the previously reconstructed type I and type VI genomes, as deduced from proteomic analysis (15). All identified recombination points were located within genes (table S2 and fig. S3). All Leptospirillum group II sequences were accounted for (fig. S4) (17), excluding the possibility of other genotypes in these five samples.

The sequencing reads from each of the 13 C75 samples collected over 5 years were recruited to the previously reconstructed Leptospirillum group II, type I and type VI genomes (13, 18) to identify the recombinant block structure of the genome at each time point at this location. Only alignment of reads to the sequence with the highest similarity was permitted. Twelve biofilms contained only the type III genotype (>99% of cells); the remaining one contained ~84% type III and ~16% type I genotype (Fig. 1 and figs. S5 and S6A). Because 1.7 million to 8 million Illumina sequencing reads were obtained per sample (table S1), and because each read is an independent measurement of population-level sequence variation, we could unambiguously identify high-frequency single-nucleotide polymorphisms (SNPs) (>90% of all reads).
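The high-frequency SNP criterion described above (a variant base supported by more than 90% of the reads covering a position) can be illustrated with a minimal sketch. The function name, the toy reference and pileup, and the `min_depth` filter are assumptions introduced here for illustration; they are not taken from the study's pipeline.

```python
from collections import Counter


def high_frequency_snps(pileup, reference, min_fraction=0.90, min_depth=10):
    """Return {position: base} for non-reference bases seen in more than
    min_fraction of the reads covering that position.

    pileup    : dict mapping 0-based position -> string of read bases there
    reference : reference sequence as a string
    min_depth : hypothetical coverage filter (an assumption, not from the paper)
    """
    calls = {}
    for pos, bases in pileup.items():
        depth = len(bases)
        if depth < min_depth:
            continue  # too few reads to estimate a frequency
        base, count = Counter(bases).most_common(1)[0]
        if base != reference[pos] and count / depth > min_fraction:
            calls[pos] = base
    return calls


# Toy data, invented for illustration only.
reference = "ACGTACGT"
pileup = {
    2: "T" * 19 + "G",      # 95% T against reference G: called
    5: "A" * 12 + "C" * 8,  # 60% A against reference C: below threshold
    6: "G" * 20,            # matches the reference: not a variant
}
print(high_frequency_snps(pileup, reference))
```

Because each read is treated as an independent observation of the population, the per-position read fraction serves directly as the estimate of allele frequency in this sketch.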

Nucleotide by nucleotide, we compared the type I–like segments of the type III genome to the type I genome sampled in 2002. Similarly, type VI–like regions of the type III genome were compared to the type VI genome sampled in 2005. In total, relative to the reference sequences, 64 mutations were fixed in all populations sampled at the C75 location (fig. S7A).

We then tracked genome change at the C75 location between August 2006 and December 2010. In the 11 C75 samples taken after August 2006, we identified up to six additional high-frequency substitutions relative to the 2006 type III genome (fig. S7A). Notably, the number of high-frequency mutations increased over time (fig. S8, R² = 0.78). Using this information, we calculated a rate of 1.4 × 10⁻⁹ (SE, ±0.2 × 10⁻⁹) substitutions per nucleotide per generation (table S3). This number is in the higher range of previously reported bacterial genome-wide short-term substitution rates [7.2 × 10⁻¹¹ to 4.0 × 10⁻⁹ substitutions per nucleotide per generation (4, 7, 8)]. It is similar to the rate of 1.3 × 10⁻⁹ substitutions per nucleotide per generation predicted for the type III genome size using the universal mutations-per-genome rate suggested by Drake (3). Although the highest rates have been reported for endosymbionts and pathogens, which have small effective population sizes, correspondence with Drake’s prediction may indicate that genetic drift accelerated by population bottlenecks is not the main factor affecting the substitution rate in Leptospirillum group II.
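The arithmetic behind such a rate can be sketched as follows: regress the count of high-frequency substitutions on sampling time, then divide the slope by the number of sites and by the number of generations per year. This is a minimal illustration under stated assumptions, not the calculation reported in table S3. The time series, genome length, and generations per year below are hypothetical placeholders, and the per-genome rate of roughly 0.003 is the figure commonly attributed to Drake, used here as an assumption.

```python
def least_squares_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def substitution_rate(years, snp_counts, genome_length, generations_per_year):
    """Substitutions per nucleotide per generation from a SNP time series."""
    snps_per_year = least_squares_slope(years, snp_counts)
    return snps_per_year / (genome_length * generations_per_year)


def per_site_rate_from_genome_rate(per_genome_rate, genome_length):
    """Convert a per-genome, per-generation rate into a per-nucleotide rate."""
    return per_genome_rate / genome_length


# All values below are hypothetical placeholders, not data from the study.
years = [0.0, 1.0, 2.0, 3.0, 4.0]   # time since the first sample
snp_counts = [0, 1, 3, 4, 6]        # high-frequency substitutions observed
GENOME_LENGTH = 2_500_000           # assumed number of nucleotides
GENERATIONS_PER_YEAR = 400          # assumed; the text gives no generation time

rate = substitution_rate(years, snp_counts, GENOME_LENGTH, GENERATIONS_PER_YEAR)
print(f"time-series estimate: {rate:.2e} per nucleotide per generation")

# Assumed per-genome rate of about 0.003 per generation (attributed to Drake).
predicted = per_site_rate_from_genome_rate(0.003, GENOME_LENGTH)
print(f"per-genome prediction: {predicted:.2e} per nucleotide per generation")
```

The estimate scales inversely with the assumed generations per year, so any uncertainty in generation time carries over directly into the per-generation rate.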

To assess potential biases in the measured substitution rate introduced by spatial variation within the contiguous biofilms covering the C75 acid mine drainage pool, we sampled three locations ~1 m apart in June 2008. Samples 1 and 3 were dominated by the same genotype (with four SNPs relative to the type III genome), whereas sample 2 was dominated by a variant of this genotype lacking one high-frequency polymorphism (fig. S7A). The three samples also had only 93 (±8) replicated SNPs at frequency greater than 0.05 (average frequency 0.08). The low levels of variation, both within populations and across space, provide confidence that nucleotide changes in populations at the C75 location over time can be used to calculate the substitution rate.

Read recruitment of community genomic data from sites located across the underground system (five-way, UBA, C75, C10, AB muck, and AB20 locations) to types I and VI yielded six genotypes (Fig. 1 and fig. S9): type I, type VI, and four other types (III, IV, IVa, and V) that each consist of a mosaic of type I and type VI genome blocks tens to hundreds of kilobases in length (Fig. 2, A and B, and figs. S6B and S9). This again confirmed previous strain-resolved proteomic analysis–based inferences (14, 15). The presence of exactly the same transition points between the recombinant blocks in each of the genotypes in every sample indicates that each major recombination event occurred in a single cell, whose descendants rose to fixation (Fig. 2). Each genotype’s fixed mutations were identified by read recruitment to type III (Fig. 3 and fig. S7B) (17) and were used to construct a phylogenetic tree (Fig. 2C). Because read recruitment was performed relative to the type III genome, positions within type I–like regions not shared by type IV, V, or VI genotypes were designated as gaps for tree construction, and these gaps were treated as missing data in the calculation of tree branch length. Notably, the tree topology indicates that the larger the fraction of type I–like sequence recombined into the type VI background, the more recently the specific genotype emerged (Fig. 2C).
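The treatment of gaps as missing data can be illustrated with a small sketch that counts differences between two aligned genotypes while skipping any position where either sequence has a gap. This only illustrates the missing-data idea; it is not the maximum parsimony procedure used to build the tree, and the function name and toy sequences are assumptions.

```python
def differences_ignoring_gaps(seq_a, seq_b, gap="-"):
    """Count mismatches between two aligned sequences, treating gap
    positions in either sequence as missing data.

    Returns (differences, compared_positions).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = 0
    compared = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # missing data: contributes neither a match nor a mismatch
        compared += 1
        if a != b:
            differences += 1
    return differences, compared


# Toy variant-site alignment, invented for illustration.
type_iii_like = "ACGT-A"   # gap marks a region not shared with the other genotype
type_v_like = "ACCTTA"
print(differences_ignoring_gaps(type_iii_like, type_v_like))
```

Skipping gapped positions keeps regions that exist in only one genotype from inflating the apparent divergence between them.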

Fig. 2

Reconstruction of the recent evolutionary history of Leptospirillum group II. (A) The outer circle plots the counts of peptides unique to the type I (red) and type VI (blue) reference genomes at each protein locus (dots) for the C75 type III population [proteomics-inferred genotyping (PIGT); data from (15)]. Inner circles show Illumina read recruitment to the reference genomes, revealing the prevalence of type III, IV, IVa, V, and VI recombinants. (B) Correspondence of PIGT and recruitment data, highlighting the identical recombination point in genotypes III to V. (C) Evolutionary history of the sampled genotypes, based on the variant loci (table S7), inferred using the maximum parsimony method. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test is shown next to the branches. Dotted arrows indicate recombination events; circle schematics represent the regions affected. The timeline indicates the calculated time ranges of recombination events (table S3) as well as historical events (fig. S2) (18, 21). Branch D* is presented as two strains, each assigned half the total number of UBA 06/05 SNPs, because low incidence of SNPs precluded their linkage. BP, years before the present.

Fig. 3

Overview of the Leptospirillum group II high-frequency variants, based on read recruitment to the C75 June 2006 type III genotype. Dots indicate variant nucleotides relative to the last common ancestor of types III to VI. Functional categories (table S6) are indicated only for nonsynonymous substitutions and only for the first occurrence, starting from the most recent genotype (outside circle). Shading in inner circles indicates regions that were excluded for SNP analysis in a particular genotype because of recombination or indel events.

Using the substitution rate calculated from the C75 time series and assuming equal generation times for the different genotypes, we estimated that the times to coalescence for the suite of recombinant genotypes ranged between 2 and 44 years (Fig. 2 and table S4). The causes of the success of these hybrid genotypes are hard to establish. In Bacteria and Archaea, little is known about the importance of hybridization in adaptation to change. However, one recent study hypothesized that agricultural practices led to proliferation of hybrids of Campylobacter jejuni and Campylobacter coli formed by homologous recombination (19). In Eukarya, hybrids are often reported to have little ecological success (20). However, increased fitness relative to parental types has been observed when the adaptive landscape changes as a result of natural- or human-induced environmental alterations (21, 22). The Richmond Mine site is subject to both natural (seasonal and year-to-year variation; Fig. 1C) and human perturbations (fig. S2) (17), a subset of which may have provided opportunity for hybrid proliferation.
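The conversion from a substitution count to elapsed time can be sketched as the number of fixed substitutions on a lineage divided by the expected substitutions per year. The rate and its standard error are taken from the text; the substitution count, number of comparable sites, and generations per year are hypothetical placeholders, and the function name is an assumption.

```python
def years_since_divergence(n_substitutions, rate, sites, generations_per_year):
    """Elapsed years implied by n_substitutions fixed along one lineage.

    rate  : substitutions per nucleotide per generation
    sites : number of nucleotide positions compared
    """
    substitutions_per_year = rate * sites * generations_per_year
    return n_substitutions / substitutions_per_year


RATE = 1.4e-9      # from the text
RATE_SE = 0.2e-9   # from the text

# Hypothetical placeholders, not values from the study.
N_SUBSTITUTIONS = 7
SITES = 2_500_000
GENERATIONS_PER_YEAR = 400   # assumed equal for all genotypes, as in the text

point = years_since_divergence(N_SUBSTITUTIONS, RATE, SITES, GENERATIONS_PER_YEAR)
# A faster rate implies a more recent event, so rate + SE gives the lower bound.
youngest = years_since_divergence(
    N_SUBSTITUTIONS, RATE + RATE_SE, SITES, GENERATIONS_PER_YEAR)
oldest = years_since_divergence(
    N_SUBSTITUTIONS, RATE - RATE_SE, SITES, GENERATIONS_PER_YEAR)
print(f"{point:.1f} years (range {youngest:.1f} to {oldest:.1f})")
```

The range here reflects only the standard error of the rate; it ignores stochastic variation in the substitution count itself.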

Although we cannot connect specific events to emergence of the hybrid genotypes, we can evaluate the role of neutral versus selection-based evolutionary processes in lineage divergence. For example, extreme population bottlenecks such as flushing events that remove biofilms during the rainy season may purge diversity independent of the fitness of particular strains. However, there are several lines of evidence indicating that hybrid genotypes arose as the result of selective processes. We detected evidence for positive selection (McDonald-Kreitman test, P < 0.05) for mutations specific to one type VI population (branch C; see Fig. 2), type V (branch F), and the ancestor of types III to V (branch E) (table S5). Selection was also supported by the disproportionately high number of signal transduction genes (Fig. 3, functional category T) and transcriptional regulation genes (Fig. 3, category K), global regulators in particular, that were affected by fixed mutations (Fig. 2, branches E, F, D*, and H; see also tables S5 and S6). Previous observations in laboratory settings (23–25) and a natural system (16) similarly identified evolution of gene expression as an important factor underpinning early ecological divergence. Finally, an earlier study analyzing the distribution of the different Leptospirillum group II genotypes in the context of geochemical conditions indicated selection among type I through type VI recombinants (15). Evidence for selection also emerged from the current study, where two sites located ~140 m apart along a single flow path were compared (C10 and C75). The type III population, which dominated all upstream C75 biofilms, was found in only one of three biofilms sampled at the same time at the downstream C10 location. Even when the type III population was present at both sites, the genomes were different (Fig. 1 and fig. S10). This result suggests selection between genotypes that differ by only a few nucleotides, a finding consistent with prior laboratory studies showing increased fitness due to adaptive SNPs (4, 25). It also indicates that lack of diversity is not the explanation for dominance of many biofilms by a single genotype.
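The logic of a McDonald-Kreitman test can be sketched as a 2 × 2 comparison of nonsynonymous and synonymous changes among fixed differences versus polymorphisms. The sketch below assesses the table with a two-sided Fisher's exact test, which is an assumption made here: the text does not state which significance test was applied. The counts are hypothetical, and the function names are introduced for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


def mcdonald_kreitman(dn, ds, pn, ps):
    """Return (neutrality_index, p_value).

    dn, ds : fixed nonsynonymous and synonymous differences
    pn, ps : polymorphic nonsynonymous and synonymous sites
    A neutrality index below 1 indicates an excess of fixed nonsynonymous
    changes, the pattern expected under positive selection.
    """
    neutrality_index = (pn / ps) / (dn / ds)
    return neutrality_index, fisher_exact_two_sided(dn, ds, pn, ps)


# Hypothetical counts, not taken from table S5.
ni, p_value = mcdonald_kreitman(dn=12, ds=3, pn=5, ps=15)
print(f"neutrality index = {ni:.3f}, P = {p_value:.4f}")
```

The neutrality index is undefined when either synonymous count or the fixed nonsynonymous count is zero, so sparse branches would need separate handling.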

Our population genomic analyses provided data on the accumulation of genome change in free-living populations that underpin geochemical cycles. Application of these rates revealed that a series of important divergence events occurred over a time scale of years to decades within the natural model system of the Richmond Mine. We attribute these to major selection episodes, although we cannot conclude whether they were mediated by natural or human factors. Rapid adaptive evolution may have assisted in the maintenance of Leptospirillum group II as both the dominant primary producer and most active iron oxidizer responsible for acid mine drainage formation. Our results contribute to the development of a predictive understanding of how microbial systems respond to both natural and anthropogenic change (2).

Supplementary Materials

Materials and Methods

Figs. S1 to S11

Tables S1 to S7

References (26–48)

References and Notes

  1. See supplementary materials on Science Online.
  2. Acknowledgments: We thank T. W. Arman (president, Iron Mountain Mines Inc.) and R. Sugarek (U.S. Environmental Protection Agency) for site access; R. Carver and M. Jones for on-site assistance; R. Carver for Richmond Mine flow data; J. Sickles (U.S. EPA) for providing site history documentation; Banfield laboratory members for their contributions to sample collection; C. Miller, C. Sun, and B. Thomas (University of California, Berkeley) for assistance with next-generation sequencing data analysis; and C. Miller and K. Vogel (University of Arizona) and three anonymous reviewers for constructive criticism. Supported by the Office of Biological and Environmental Research, U.S. Department of Energy, through the Genomic Sciences (DE-FG02-05ER64134) and Carbon-Cycling (DE-FG02-10ER64996) programs. The authors declare no conflict of interest. All 454 and Illumina sequencing reads generated for this study have been deposited in the NCBI sequence reads archive (accession number SRA047370). The Leptospirillum group II C75 strain genome has been deposited at DDBJ/EMBL/GenBank under the Whole Genome Shotgun project accession number AIJM00000000. The version described in this paper, containing the assembled and annotated contigs, is the first version, AIJM01000000.
