Temporal Fragmentation of Speciation in Bacteria


Science  24 Aug 2007:
Vol. 317, Issue 5841, pp. 1093-1096
DOI: 10.1126/science.1144876


Because bacterial recombination involves the occasional transfer of small DNA fragments between strains, different sets of niche-specific genes may be maintained in populations that freely recombine at other loci. Therefore, genetic isolation may be established at different times for different chromosomal regions during speciation as recombination at niche-specific genes is curtailed. To test this model, we separated sequence divergence into rate and time components, revealing that different regions of the Escherichia coli and Salmonella enterica chromosomes diverged over a ∼70-million-year period. Genetic isolation first occurred at regions carrying species-specific genes, indicating that physiological distinctiveness between the nascent Escherichia and Salmonella lineages was maintained for tens of millions of years before the complete genetic isolation of their chromosomes.

The proper identification and delineation of bacterial species play critical roles in medical diagnosis, food safety, epidemiology, and bioterrorism mitigation. Human responses are guided by perceptions of the biological properties and capabilities of a named species, as well as by an understanding of its natural variability and potential to change. The biological species concept (BSC) considers a species to be a group of organisms that readily exchange genetic information only with each other (1). In eukaryotes, recombination (here defined as allelic exchange) is often tied to reproduction, whereby meiosis is followed by the karyogamy of two entire haploid genomes. Consequently, as new species arise, genetic isolation would occur simultaneously for all loci, meaning that all pairs of orthologous genes would have been diverging for about the same amount of time. Although bacterial speciation is a complex process (2–4), the BSC has also been applied to bacteria such as E. coli (5). Bacterial recombination involves the occasional, unidirectional transfer of small DNA fragments from one strain into the homologous locus of another strain. Because only a small portion of the genome is transferred in each event, orthologs would have diverged for differing amounts of time (fig. S1, A and B). Interspecies transfer is limited by mechanisms that increasingly reject recombination as donor and recipient sequences become more divergent (6, 7). Yet this process does not explain how recombination ceases within a group of recombining strains (wherein allelic differences are few), thereby allowing two genetically distinct groups to form.

Given the vast range of recombination rates seen for bacterial populations (8, 9), we propose two models for lineage separation after the emergence of and selection for a differentially adapted genotype. First, nucleotide substitutions and lineage-specific loci could be acquired quickly, relative to the rate of recombination (10). In this model, genetic isolation would be established almost simultaneously for all orthologs (fig. S1A). Alternatively, niche-specific changes may be acquired more slowly, relative to the rate of recombination (fig. S1B), and gene conversion events would continue at loci unlinked to niche-defining genes (11). In this case, between-population selective sweeps (12) would occur freely at loci that are unlinked to genes imparting ecological distinctiveness; in contrast, recombinants losing niche-specific functions would be poorly adapted to either environment and would be counterselected (4). Thus, alleles may undergo selective sweeps across “species” boundaries when not proximate to niche-specific loci; over time, all loci would become genetically isolated as mismatches accumulate and the number of niche-specific loci increases. This fragmented speciation model further predicts that early-diverging genes will be linked to loci that interfered with effective interlineage recombination, such as those encoding niche-specific traits or those subject to diversifying or frequency-dependent selection (13, 14). As a caveat, regions of strongly dissimilar DNA can interfere with recombination at highly similar flanking regions, independent of selection against recombinants (7); the consequences of this issue are beyond the scope of this work.

To detect temporal fragmentation of speciation, we must first distinguish between early- and late-diverging orthologs. Because divergence is a function of both time and evolutionary rate, time may be estimated from divergence once the evolutionary rate is determined (fig. S1C). At synonymous sites, evolutionary rate can be estimated from the codon adaptation index (CAI), an intragenomic, time-independent measure of selection (15). Divergence is measured as the number of synonymous substitutions per synonymous site (Ks); because Ks decreases as CAI increases (fig. S1C), CAI can generate the expected value of Ks if divergence times are uniform among genes (16). Early-diverging orthologs will have larger than expected Ks values because more time has elapsed since their divergence, and late-diverging orthologs will have smaller than expected values (fig. S1C).
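The separation of divergence into rate and time components can be written out explicitly. The following is a minimal sketch, not the authors' derivation: it assumes that synonymous divergence is simply the product of rate and time (ignoring saturation), and the symbols r_i, t_i, t̄, f, and R_i are introduced here for illustration only.

```latex
% K_{s,i}: observed synonymous divergence of ortholog pair i
% r_i: its synonymous substitution rate, assumed to depend only on CAI_i
% t_i: time since pair i became genetically isolated
% \bar{t}: the divergence time implied by the genome-wide CAI-Ks trend
\begin{align}
K_{s,i} &\approx r_i \, t_i, \\
\hat{K}_{s,i} &= f(\mathrm{CAI}_i) \approx r_i \, \bar{t}, \\
R_i &= \frac{K_{s,i}}{\hat{K}_{s,i}} \approx \frac{t_i}{\bar{t}}.
\end{align}
```

Under these assumptions, the rate term cancels in the ratio, so R_i > 1 marks an early-diverging ortholog and R_i < 1 a late-diverging one.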

We applied this method to the genomes of E. coli and S. enterica; recombination is common within either taxon (8, 10, 17), whereas interspecies recombination is inhibited (18). We analyzed genes with orthologs present in each of three different E. coli and S. enterica genomes representing the most diverse available sequences (16). These six strains share a chromosomal backbone of 2677 sets of orthologs (table S1). CAI and between-species Ks were computed for protein-coding genes, and the relationship was fit by polynomial regression (Fig. 1). As expected, increasing selection for preferred codons (high CAI) is generally reflected by lower divergence (low Ks). We ignored 527 pairs of genes with <50 synonymous sites, either because their Ks values were in saturation or because the relationship between CAI and Ks was unclear (Fig. 1). The effect of map position on Ks (19) was estimated by treating CAI-corrected Ks as a linear function of the gene's distance from the E. coli K12 replication origin (Fig. 1, inset). Ultimately, relative divergences of 2150 genes along the chromosomal backbone (Fig. 2) were calculated as the ratio of observed Ks to that expected from CAI and map position (16).
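The two-step correction described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the published procedure (which is detailed in the supporting material): the function names are hypothetical, the third-order polynomial follows the Fig. 1 caption, and dividing by each fitted value in turn is only one plausible way of combining the CAI and map-position expectations.

```python
from typing import List, Sequence

def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def polyfit(xs: Sequence[float], ys: Sequence[float], degree: int) -> List[float]:
    """Least-squares polynomial coefficients (lowest order first) via normal equations."""
    k = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve_linear(a, b)

def polyval(coeffs: Sequence[float], x: float) -> float:
    """Evaluate a polynomial whose coefficients are given lowest order first."""
    return sum(c * x ** i for i, c in enumerate(coeffs))

def relative_divergence(cai: Sequence[float], ks: Sequence[float],
                        origin_dist: Sequence[float], degree: int = 3) -> List[float]:
    """Observed Ks divided by the Ks expected from CAI and map position."""
    rate_fit = polyfit(cai, ks, degree)
    # Step 1: remove the rate effect predicted by codon bias.
    cai_corrected = [k / polyval(rate_fit, c) for k, c in zip(ks, cai)]
    # Step 2: remove the linear trend with distance from the replication origin.
    pos_fit = polyfit(origin_dist, cai_corrected, 1)
    return [r / polyval(pos_fit, d) for r, d in zip(cai_corrected, origin_dist)]
```

Values near 1 indicate genes whose divergence matches the genome-wide expectation. The normal-equations solver is used only to keep the sketch dependency-free; a numerical library would be preferable for real data.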

Fig. 1.

Influences on synonymous substitution rate. Synonymous substitutions as a function of mean codon bias of the open reading frames are shown, with polynomial least-squares regression lines. The dashed vertical line indicates the value of CAI above which the relationship between CAI and Ks was unclear. (Inset) Scatter plot of third-order regression residuals as a function of distance from the E. coli K12-MG1655 origin, with a linear least-squares regression line. Mb, megabase.

Fig. 2.

Time of divergence of chromosomal regions. Relative divergence for orthologs is plotted against E. coli K12-MG1655 chromosomal position, averaged across a seven-gene window. Dark gray bars indicate divergence times of regions longer than six genes. Dashed lines delineate 95% of the range of divergence values. Shared loci are indicated in italics; Escherichia- and Salmonella-specific loci are indicated at their corresponding location on the backbone in inverse and bold-faced type. (Inset) Intraclass correlations of relative divergence for gene pairs as a function of distance. The solid line indicates all gene pairs, whereas the dotted line indicates gene pairs not within runs of consecutive genes transcribed in the same direction.

Although stochastic variation in the accumulation of substitutions will account for much of the variability in relative divergence, genes that have recombined more recently should have lower Ks values. To detect this footprint of recombination, we rely on a mechanistic constraint of bacterial gene exchange: Physically proximate genes will be transferred in the same recombination events. As a result, early- and late-diverging genes will not be randomly distributed throughout the genome but will cluster in regions defined by the most recent interlineage exchange. Therefore, physical association among genes with Ks values higher or lower than expected can be taken as evidence for recombination. The scale of recombination regions was estimated from the correlation of relative divergence values for pairs of orthologs (solid line in Fig. 2, inset). Adjacent genes showed a strong intraclass correlation [intraclass correlation coefficient (ICC) = 24%, P < 10⁻²⁴, F test] that decreased as pairs of orthologs became more distantly situated on the chromosomal backbone, becoming undetectable when separated by more than 20 genes (∼32 kb). These results are consistent with the boundary of recombination interference observed for the rfb locus (20, 21). To remove any correlation in Ks resulting from transcription-associated repair or selection for mRNA stability, we recalculated ICCs, having excluded comparisons between consecutive genes transcribed in the same direction. Despite the decreased sample size, a significant correlation (ICC = 11%, P < 10⁻², F test) extended to the same distance (dashed line in Fig. 2, inset). Thus, genes within the Escherichia and Salmonella chromosomes diverged at significantly different times at different locations.
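The distance-dependent correlation can be illustrated with a short Python sketch. It assumes the classical pairwise (Fisher) form of the intraclass correlation, which may differ from the estimator and F test actually used; the function names are hypothetical, and the filter for genes transcribed in the same direction is omitted for brevity.

```python
from typing import Dict, List, Sequence, Tuple

def pairwise_icc(pairs: Sequence[Tuple[float, float]]) -> float:
    """Fisher's pairwise intraclass correlation for unordered pairs of values."""
    n = len(pairs)
    # Both members of every pair share one pooled mean and one pooled variance.
    mean = sum(a + b for a, b in pairs) / (2 * n)
    var = sum((a - mean) ** 2 + (b - mean) ** 2 for a, b in pairs) / (2 * n)
    if var == 0:
        return 0.0  # no variation at all, so correlation is undefined; report 0
    cov = sum((a - mean) * (b - mean) for a, b in pairs) / n
    return cov / var

def icc_by_distance(values: Sequence[float], max_distance: int) -> Dict[int, float]:
    """ICC of relative divergence for gene pairs separated by 1..max_distance genes.

    `values` must be ordered by position along the chromosomal backbone.
    """
    result: Dict[int, float] = {}
    for d in range(1, max_distance + 1):
        pairs: List[Tuple[float, float]] = [
            (values[i], values[i + d]) for i in range(len(values) - d)
        ]
        result[d] = pairwise_icc(pairs)
    return result
```

If neighboring genes tend to share a recombination history, the ICC in such a calculation is positive at short distances and decays toward zero as the separation grows past the typical length of a transferred fragment.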

Potential recombination events were delineated with an agglomerative clustering algorithm that minimized variability of relative divergence within clusters (16); the algorithm terminated when the distribution of cluster sizes most closely reflected the magnitude of ICCs across segments of differing length. The most robust segments (SE < 0.013; each longer than six genes, covering 49% of genes) appear as dark gray bars in Fig. 2. If Escherichia and Salmonella genes have been diverging for ∼140 million years on average (22), the distribution of divergence times shows that genetic isolation developed over a period of ∼70 My (region between the dashed lines in Fig. 2). As expected, among the first regions to diverge were those containing genes producing surface structures, such as the rfa, rfb, rff, flg, mipA, and phoE loci, which are often subject to frequency-dependent or diversifying selection. Other early-diverging regions are associated with differences in gene content, such as those adjacent to (i) the Salmonella cbi, pdu, std, and tct operons and (ii) the Escherichia lac and xdh operons (Fig. 2), most of which encode physiological functions that distinguish the two species (23). In contrast, the regions flanking Salmonella Pathogenicity Islands 1 and 2 (SPI1 and SPI2) diverged more recently, suggesting that they did not promote the separation of Escherichia and Salmonella. Even though relative divergence was corrected for evolutionary rate, the major peaks in Fig. 2 consistently represent clusters of genes with high CAI values. These slowly evolving regions may offer longer stretches of DNA with high similarity, thereby postponing the establishment of recombination barriers.
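A contiguity-constrained agglomerative clustering of this kind can be sketched as follows. The sketch makes two loudly stated assumptions that are not taken from the text: adjacent segments are merged by the smallest increase in within-segment sum of squares (a Ward-style criterion), and merging stops at a caller-supplied number of segments rather than at the ICC-matching stopping rule described above. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Stats = Tuple[int, float, float]  # (gene count, sum, sum of squares)

def _sse(n: int, s: float, ss: float) -> float:
    """Within-segment sum of squared deviations from the segment mean."""
    return ss - s * s / n

def _merge(a: Stats, b: Stats) -> Stats:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def cluster_adjacent(values: Sequence[float],
                     n_clusters: int) -> List[Tuple[int, int, float]]:
    """Greedily merge neighboring segments; return (start, end, mean) per segment.

    `values` are relative divergences in chromosomal order; `end` is exclusive.
    """
    segs: List[Stats] = [(1, v, v * v) for v in values]
    while len(segs) > n_clusters:
        best_i, best_cost = 0, float("inf")
        for i in range(len(segs) - 1):
            merged = _merge(segs[i], segs[i + 1])
            # Cost of a merge = increase in total within-segment variability.
            cost = _sse(*merged) - _sse(*segs[i]) - _sse(*segs[i + 1])
            if cost < best_cost:
                best_i, best_cost = i, cost
        segs[best_i:best_i + 2] = [_merge(segs[best_i], segs[best_i + 1])]
    out: List[Tuple[int, int, float]] = []
    start = 0
    for n, s, _ in segs:
        out.append((start, start + n, s / n))
        start += n
    return out

def divergence_time_my(relative: float, mean_my: float = 140.0) -> float:
    """Scale relative divergence to millions of years, assuming linear scaling."""
    return relative * mean_my
```

Under the linear scaling assumed in `divergence_time_my`, a spread of ∼70 My around a ∼140-My mean corresponds to relative divergence values spanning about half of the mean (70/140 = 0.5).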

As a control, we compared the genomes of Buchnera aphidicola strains APS and Sg, whose 489 conserved protein-coding genes show divergence similar to that of the Escherichia-Salmonella comparison. Buchnera are recA-deficient intracellular endosymbionts believed to recombine rarely (24). We would therefore expect lineage diversification to affect all loci simultaneously, and analysis of these genomes showed no significant correlation in relative divergence for adjacent genes (fig. S2). To control for the lower sample size, we examined all regions of the Escherichia-Salmonella comparative backbone with equal gene numbers; these regions showed significant ICC values that were invariably stronger than the Buchnera ICC value, suggesting that the lack of correlation for Buchnera reflects a lack of recombination.

The fragmented speciation model (fig. S1B) predicts that genetic and ecological differentiation developed even as recombination continued at loci not conferring ecological distinctiveness. Niche-specific traits often arise by gene gain or loss, where altered physiology allows cells to thrive in conditions that are hostile to parental strains (25). If recombination between incipient Escherichia and Salmonella lineages continued at some loci even after lineage-specific loci had arisen, then shared regions around the lineage-specific genes should have been among the first to become genetically isolated, because recombination in those regions would have eliminated the gene-content differences at those loci. Conversely, if the differences in gene content developed only after interlineage recombination had effectively ceased, then these genes should be distributed without regard to the divergence time of the surrounding region.

We defined a locus as a pair of genes in the Escherichia-Salmonella comparative backbone. There were 514 dynamic loci (685 genes; some genes contributed to 2 loci; table S2), at which a pair of conserved genes was separated by at least one gene that had been gained or lost in any genome (16); the remaining 2106 static loci showed no insertion or deletion events (white bars in Fig. 3). Genes at static loci have an average divergence time that is 4.4% younger than the average for the entire genome (P < 10⁻⁵ by randomization), likely because longer stretches of uninterrupted, slowly evolving genes allow for continued recombination. A fraction of dynamic loci (178 loci, table S2) show species-specific differences, in which the conserved gene pair was interrupted in the three strains of one species by genes absent from the three strains of the other species. These loci would include sites at which differences arose while the Escherichia and Salmonella lineages were diverging. Other dynamic loci (e.g., those where only a single strain shows a difference) would have arisen only after recombination had effectively ceased between the two lineages. Genes adjacent to species-specific loci are 6.2% older than genes adjacent to other dynamic loci (P < 10⁻² by randomization; gray bars in Fig. 3); thus, species-specific genes are not randomly distributed but are found preferentially in the older regions, indicating that the incipient Escherichia and Salmonella lineages continued to participate in recombination at loci unlinked to lineage-specific genes.
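A randomization test of the kind cited for these comparisons can be sketched in Python. The details here are assumptions rather than the published procedure: random gene sets of the same size are drawn without replacement from the whole backbone, the test is two-sided, and the function name, iteration count, and seed are hypothetical.

```python
import random
from typing import Sequence

def randomization_p_value(all_values: Sequence[float],
                          subset_indices: Sequence[int],
                          n_iter: int = 10000,
                          seed: int = 0) -> float:
    """Two-sided P value for the mean of a gene subset versus random subsets.

    `all_values` holds relative divergence for every backbone gene;
    `subset_indices` selects the genes of interest (e.g., those next to
    species-specific loci).
    """
    rng = random.Random(seed)
    k = len(subset_indices)
    grand_mean = sum(all_values) / len(all_values)
    observed = sum(all_values[i] for i in subset_indices) / k - grand_mean
    hits = 0
    for _ in range(n_iter):
        # Draw a random gene set of the same size, without replacement.
        sample = rng.sample(range(len(all_values)), k)
        diff = sum(all_values[i] for i in sample) / k - grand_mean
        if abs(diff) >= abs(observed):
            hits += 1
    # Add-one correction keeps the estimate away from an impossible P = 0.
    return (hits + 1) / (n_iter + 1)
```

With this estimator, the smallest reportable P value is 1/(n_iter + 1), so resolving values on the order of 10⁻⁵ would require at least 10⁵ iterations.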

Fig. 3.

Relative divergence based on region character. Bars show the mean relative divergence of sets of orthologs classified according to adjacency to loci that distinguish genomes; above the bars are the number of orthologs (top row) and the number of loci (bottom row in parentheses). Dashed lines show the average value for the entire set. Error bars show 1 SE for the distribution of randomized samples. *, P < 0.01; **, P < 0.000001.

In contrast to the rapid formation of eukaryotic species boundaries, the ∼70-My time frame over which genetic isolation evolved between Escherichia and Salmonella represents a temporal fragmentation of speciation. Because separate lineages arise within populations that continue to recombine at some loci for tens of millions of years, relationships among species inferred from few loci may underestimate their underlying complexity. Taxa may show different relationships depending on the genes compared. Long periods of partial genetic isolation allow extant, named species (such as E. coli) to contain multiple nascent species. Although one can observe recombination at some genes within E. coli as a whole, strains also have niche-specific loci that may act as genetic progenitors for the creation of new species. That is, it may not be possible to make a clear distinction between intraspecific and interspecific variability (26), and clearly defined species cannot represent newly formed lineages. Therefore, the species concept proposed by Dykhuizen and Green [in which gene phylogenies are congruent among representatives of different species but are incongruent among members of the same species (5)] works to delineate long-established species but fails to recognize incipient species.

Supporting Online Material

Materials and Methods

Figs. S1 to S3

Tables S1 to S3


References and Notes
