The Dynamics and Time Scale of Ongoing Genomic Erosion in Symbiotic Bacteria


Science  16 Jan 2009:
Vol. 323, Issue 5912, pp. 379-382
DOI: 10.1126/science.1167140


Among cellular organisms, symbiotic bacteria provide the extreme examples of genome degradation and reduction. However, only isolated snapshots of eroding symbiont genomes have previously been available. We documented the dynamics of symbiont genome evolution by sequencing seven strains of Buchnera aphidicola from pea aphid hosts. We estimated a spontaneous mutation rate of at least 4 × 10⁻⁹ substitutions per site per replication, which is more than 10 times as high as the rates previously estimated for any bacteria. We observed a high rate of small insertions and deletions associated with abundant DNA homopolymers, and occasional larger deletions. Although purifying selection eliminates many mutations, some persist, resulting in ongoing loss of genes and DNA from this already tiny genome. Our results provide a general model for the stepwise process leading to genome reduction.

Obligate symbionts and pathogens, which have evolved repeatedly from free-living bacterial ancestors, show striking convergence in fundamental genomic features. In several symbionts of insects, most ancestral genes are eliminated by deletion, resulting in some of the smallest known cellular genomes (1–4). Symbionts also display rapid evolution at both the DNA and peptide sequence levels and have highly biased nucleotide base compositions, with elevated frequencies of adenine and thymine (A+T). Because these genomes are asexual and do not acquire foreign DNA, each gene loss is irreversible (2, 5–7). These genomic features have been ascribed to increases in genetic drift associated with a host-restricted lifestyle (8, 9) and, potentially, to an increased mutation rate and biased mutational profile stemming from the loss of DNA-repair genes, which are among the gene categories most depleted in symbiont genomes (1, 10).

Although numerous sequenced examples of reduced genomes in obligate symbionts or pathogens are available, these are too distantly related to permit stepwise reconstruction of genomic changes. As a result, the dynamics of ongoing genomic erosion, the extent to which mutation rate is elevated, the effectiveness of natural selection in purging mutations, and the nature of the mutational events that lead to further loss of DNA and metabolic functions are unclear. To illuminate these evolutionary processes, we sequenced several closely related genomes of the obligate symbiont Buchnera aphidicola from a single host species, the pea aphid Acyrthosiphon pisum (Buchnera-Ap). A previously sequenced genome of Buchnera-Ap showed a gene set typical for an obligate symbiont (1) lacking most ancestral genes, including genes underlying transcriptional regulation, biosynthesis of cofactors present in hosts, DNA repair, and other processes. The 607 retained genes encode machinery for replication, transcription, translation, and other essential processes, as well as biosynthetic pathways for essential amino acids required by hosts (1).

A. pisum is native to Eurasia but has been introduced worldwide. It was first detected in North America in the 1870s (11). We sequenced the genomes of seven Buchnera-Ap strains descended from two colonizers of North America (and hence diverging up to 135 years ago), including two strains diverging in the laboratory for 7.5 years. Solexa sequencing was combined with verification by Sanger sequencing (12) to determine the genomic sequences of these seven strains (Table 1). A total of 2392 positions (0.4% of sites on the 641-kb chromosome) showed a nucleotide substitution. These single-nucleotide polymorphisms (SNPs) were distributed approximately evenly around the chromosome (fig. S1). We also detected a total of 149 insertion or deletion events (indels): 134 single-base indels, 12 indels of 2 to 16 bases, and 3 large deletions (220 to 1131 bases), also dispersed around the genome (fig. S1).

Table 1.

Description of sequence data.


Parsimony analysis of SNPs yielded a single phylogenetic tree with no homoplasy, as expected for clonal lineages if each base substitution is a singular event (Fig. 1A). Indels showed almost no homoplasy; all but two mapped as single events (Fig. 1B). The newly sequenced genomes comprised two tight clusters that were divergent from each other and even more divergent from the reference strain, Tokyo1998 (Fig. 1A). Rooting the phylogeny on the branch leading to Tokyo1998 enabled us to assign direction of change for both base changes and indels on the lineages leading to the two clusters.
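The absence of homoplasy can be illustrated with a pairwise compatibility check. The sketch below is not the parsimony analysis used in the study; it swaps in the simpler four-gamete test, which applies to two-state characters such as biallelic SNPs. The function names and the toy 0/1 allele patterns are hypothetical.

```python
from itertools import combinations

def compatible(site_a, site_b):
    """Four-gamete test for two binary characters scored across the same strains.

    Two sites can both have arisen by a single change on one tree only if
    fewer than four distinct allele combinations are observed.
    """
    return len(set(zip(site_a, site_b))) < 4

def all_sites_compatible(sites):
    """True if every pair of sites passes the four-gamete test."""
    return all(compatible(a, b) for a, b in combinations(sites, 2))

# Toy data (hypothetical): each tuple is one SNP scored in five strains.
clonal_sites = [
    (0, 0, 1, 1, 1),
    (0, 0, 0, 1, 1),
    (1, 0, 0, 0, 0),
]
conflicting_site = (0, 1, 0, 1, 0)

print(all_sites_compatible(clonal_sites))
print(all_sites_compatible(clonal_sites + [conflicting_site]))
```

For binary characters, passing every pairwise test means a tree exists on which each site changes exactly once, which is the pattern expected for clonal lineages.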

Fig. 1.

A maximum-parsimony tree showing the evolutionary reconstruction of phylogenetic relationships and changes in the genomes of Buchnera-Ap strains. (A) Single-nucleotide substitutions. (B) Small and large insertions and deletions.

We inferred that the two clusters consist of descendants of two separate female colonizers, each arriving in North America sometime after 1870, by constructing a phylogeny on the basis of a 1.1-kb DNA fragment from 38 Buchnera-Ap samples collected in America, Asia, and Europe (12). The clades corresponding to these two clusters contain the large majority of North American samples, but are absent (Cluster 2) or rare (Cluster 1) among samples from Eurasia, where diverse lineages are present (fig. S2). Thus, we dated the common ancestor of each cluster representing an introduced matriline to a maximum of 135 years ago. Averaging the pairwise divergences through the ancestral node of each cluster, we calculated rates of nucleotide substitution of 19 SNPs per genome per 270 years for each U.S. haplotype cluster (divergence times are doubled to derive the rate along a single evolving lineage). After pooling the observed changes, we estimated the rate as 0.70 substitutions per genome per decade [95% confidence interval (CI): 0.51 to 0.97 substitutions per genome per decade], or 1.1 × 10⁻⁷ substitutions per site per year (95% CI: 0.8 to 1.5 × 10⁻⁷ substitutions per site per year). The rate is doubled (2.2 × 10⁻⁷; 95% CI: 1.4 to 3.3 × 10⁻⁷ substitutions per site per year) if calculated on the basis of changes at intergenic spacers and synonymous sites, that is, genomic sites that generally can tolerate mutations with little effect on fitness and that are thus expected to approximate the mutation rate (Table 2) (12). Adjusting for Buchnera replications per year [by estimating Buchnera divisions per aphid generation and aphid generations per year (13)] gives an estimate of 4 × 10⁻⁹ substitutions per site per replication.
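The arithmetic behind these point estimates can be retraced with a short sketch. It uses only figures quoted in the text; the function and variable names are illustrative, and the replications-per-year value is back-calculated from two reported rates rather than taken from the study.

```python
# Figures quoted in the text.
SNPS_PER_CLUSTER = 19        # substitutions per genome through the ancestral node
YEARS_SINCE_ARRIVAL = 135    # maximum age of each introduced matriline
GENOME_SITES = 641_000       # approximate chromosome length (641 kb)

def substitution_rates(snps, years_since_split, genome_sites):
    """Return (per genome per decade, per site per year) along one lineage."""
    # Pairwise divergence accumulates along both descendant lineages,
    # so the elapsed time is doubled (135 years -> 270 years).
    lineage_years = 2 * years_since_split
    per_genome_per_year = snps / lineage_years
    return per_genome_per_year * 10, per_genome_per_year / genome_sites

per_decade, per_site_year = substitution_rates(
    SNPS_PER_CLUSTER, YEARS_SINCE_ARRIVAL, GENOME_SITES
)
print(f"{per_decade:.2f} substitutions per genome per decade")
print(f"{per_site_year:.1e} substitutions per site per year")

# Assumption: back-calculated here, not reported in the text. Dividing the
# neutral-site rate by the per-replication rate gives the number of
# replications per year that the two figures jointly imply.
NEUTRAL_RATE_PER_SITE_YEAR = 2.2e-7
RATE_PER_SITE_REPLICATION = 4e-9
implied_replications_per_year = NEUTRAL_RATE_PER_SITE_YEAR / RATE_PER_SITE_REPLICATION
print(f"about {implied_replications_per_year:.0f} replications per year implied")
```

The confidence intervals are not reproduced here because the text does not state how they were derived.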

Table 2.

Mutational patterns in genomes of Buchnera symbionts of pea aphids, based on all base changes and insertion/deletion events shown in Fig. 1.


Our estimated mutation rate for base changes was unexpectedly high: more than 10 times the previous estimates of mutation rate calculated on the basis of silent site divergences in both Buchnera and free-living bacteria (5, 13). Although several artifacts could affect these calculations, the main source of error, a more recent coalescence of introduced clusters than estimated, would actually make this an underestimate of the mutation rate. Also, even spacers and silent sites may be subject to some purifying selection. Thus, the rate of spontaneous mutation (or substitution at neutral sites) is almost certainly higher than our estimates. A high mutation rate was also supported by the finding of two base substitutions fixed in a laboratory line (5AR) during 7.5 years (Fig. 1A). Although previous estimates of mutation rate in Buchnera, from genome pairs diverging 60 million years ago, were lower, those calculations were unreliable because intergenic spacers were too divergent to allow alignment and because silent sites underwent too many substitutions for accurate estimation of divergence (5).

This rate calibration can be used to estimate divergence times of older lineages of Buchnera-Ap used in this study. If the root of the tree is on the branch leading to the Tokyo1998 strain, we calculated that the lineage leading to the two clusters we sequenced (showing an average divergence of 1617 substitutions per genome) diverged from the Tokyo1998 strain 11,489 (95% CI: 8340 to 15,790) years ago. Calculations made only on the basis of intergenic spacers and synonymous changes gave similar estimates [12,555 (95% CI: 8292 to 20,030) years].
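The dating step amounts to dividing the observed divergence by twice the per-lineage rate. The sketch below assumes that simple strict-clock calculation (the text does not spell out the exact procedure) and uses the rate of 19 substitutions per 270 years together with the reported rate bounds of 0.51 and 0.97 substitutions per genome per decade. The function name is illustrative.

```python
def divergence_time_years(substitutions_between_lineages, rate_per_genome_per_year):
    """Years since two lineages split, assuming a constant rate on both."""
    # Differences accumulate on both lineages, hence the factor of 2.
    return substitutions_between_lineages / (2 * rate_per_genome_per_year)

DIVERGENCE = 1617            # average substitutions per genome (from the text)
RATE = 19 / 270              # substitutions per genome per year

point_estimate = divergence_time_years(DIVERGENCE, RATE)
# A faster rate gives a younger split, so the rate bounds swap roles.
younger_bound = divergence_time_years(DIVERGENCE, 0.97 / 10)
older_bound = divergence_time_years(DIVERGENCE, 0.51 / 10)

print(round(point_estimate), round(younger_bound), round(older_bound))
```

Under these assumptions the point estimate matches the reported 11,489 years, and the bounds land close to, but not exactly on, the reported interval, presumably because the published rate bounds are rounded.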

We next considered trends in nucleotide-composition bias. A distinctive feature of most small bacterial genomes is an elevated A+T content, reflecting biased mutational patterns. Our data show no evidence of continued evolution toward increased A+T content in Buchnera-Ap. The 58 substitutions in the terminal branches of the tree had little effect on base composition, with 21 increasing, 31 decreasing, and 6 not affecting A+T content (Fig. 1A). For the 1423 substitutions on the branches leading to the two clusters, base composition was in near-equilibrium, with 47% decreasing and 48% increasing overall G+C content. This implies that the overall genomic base composition near 25% G+C is an approximate equilibrium, consistent with a mutation rate from G/C to A/T that is three times as high as that for A/T to G/C (Fig. 2C) (14). However, this equilibrium could be disturbed if additional DNA repair functions are lost.
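The link between a 25% G+C equilibrium and a threefold mutational bias follows from a two-state model. The sketch below assumes such a model, with a per-site rate u for G/C to A/T changes and v for A/T to G/C changes, and ignores selection. The model and the function names are an illustration, not the analysis performed in the study.

```python
def equilibrium_gc(u_gc_to_at, v_at_to_gc):
    """Stationary G+C fraction when GC * u equals AT * v."""
    return v_at_to_gc / (u_gc_to_at + v_at_to_gc)

def implied_bias(gc_fraction):
    """Ratio u / v needed to hold the given G+C fraction stationary."""
    return (1 - gc_fraction) / gc_fraction

print(equilibrium_gc(3.0, 1.0))   # G+C fraction under a threefold bias
print(implied_bias(0.25))         # bias implied by 25% G+C
```

At 25% G+C, a site class three times rarer mutates three times as often per site, so the numbers of changes in each direction balance. This is the pattern seen in the near-equal 47% and 48% figures.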

Fig. 2.

Observed mutations in Buchnera-Ap. Frequencies of single-base indels in homopolymers of different lengths (A) per homopolymer and (B) genome-wide. (C) Relative frequencies of base substitutions genome-wide and per nucleotide base.

To estimate the effect of purifying selection on the ongoing evolution of Buchnera-Ap genomes, we analyzed base substitutions in coding regions. On the basis of the genome-wide frequencies of mutation types acting on each base (Fig. 2C) and codon frequencies calculated for the entire Buchnera-Ap genome, mutations causing amino acid replacements are expected to arise 4.5 times as often as those affecting only codon choice (12). But only 36% of observed substitutions were replacement substitutions, giving a per-site ratio of replacement to silent changes (dN/dS) of 0.125 and implying that most mutations affecting polypeptide sequence are purged by selection. Purifying selection was also evident from the concentration of indels in intergenic spacers, which are largely selectively neutral regions sometimes recognizable as eroding pseudogenes (5). Of 146 small indels, 134 (92%) occur in intergenic spacers, which constitute only 13.5% of the genome. The SNP-to-indel ratio is 3.1 in spacers and 166.4 in coding regions (Table 2). This paucity of indels within coding genes reflects the fact that most indels cause frameshifts, leading to dysfunctional protein products, and are eliminated by selection. Indeed, 11 of 12 indels observed in coding regions imposed frameshifts. Thus, during the evolution of these Buchnera-Ap lineages, an estimated 82% of new indels have been purged by selection.
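The dN/dS figure can be recovered from the two numbers given in this paragraph. The sketch below assumes the ratio is the observed replacement-to-silent ratio divided by the ratio expected from mutation alone. The function name is illustrative, and the enrichment factor at the end is derived here rather than reported in the text.

```python
def dn_ds(replacement_fraction, expected_replacement_to_silent):
    """Observed replacement:silent ratio scaled by the mutational expectation."""
    observed_ratio = replacement_fraction / (1.0 - replacement_fraction)
    return observed_ratio / expected_replacement_to_silent

# 36% of observed coding substitutions are replacements; mutation alone is
# expected to produce replacements 4.5 times as often as silent changes.
ratio = dn_ds(0.36, 4.5)
print(f"dN/dS = {ratio:.3f}")

# Share of small indels found in spacers, and how strongly that exceeds the
# 13.5% of the genome that spacers occupy (enrichment is derived here).
spacer_indel_share = 134 / 146
enrichment = spacer_indel_share / 0.135
print(f"{spacer_indel_share:.0%} of small indels in spacers, "
      f"{enrichment:.1f}x their share of the genome")
```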

To determine whether genome erosion is ongoing in the closely related genomes that we sequenced, we addressed whether the 146 detected indels contributed to genome reduction. Our findings indicate that small indels do not directly cause DNA loss in Buchnera, because small deletions were balanced by small insertions on all branches of the tree, regardless of root position (Fig. 1B). But three large deletions did effect a net DNA loss: (i) a 220–base pair (bp) deletion in the znuC-pykA spacer in the lineage leading to Cluster 1; (ii) a 274-bp deletion in the gapA-fldA spacer in the lineage leading to Tokyo1998; and (iii) a 1131-bp deletion corresponding to part of the gene yaeT and the entire sequence of the gene fabZ, also in the lineage leading to Tokyo1998. All large deletions corresponded to positions of extra genes (znuA-yebA, queF, and fabZ) in the Buchnera–Schizaphis graminum genome (5), suggesting that the detected deletions eliminated relics of these genes from the Buchnera-Ap genomes. No large insertions were identified, consistent with previous evidence that Buchnera does not acquire foreign genes (2, 5, 6). Together these three deletions account for a loss of 1625 nucleotides (Fig. 1B). This corresponds to DNA loss at a rate of roughly 1 kb per 10,000 years, although this estimate carries high uncertainty because so few large deletions were observed.

We next studied whether ongoing loss of functional genes has occurred during the divergence of the Buchnera-Ap genomes. We observed 16 genes that appear to be inactivated, either through a 1- to 2-base indel causing a frameshift (11 genes), a base substitution generating a stop codon (three genes), or a large deletion (one event, two genes) (table S1). Functions of these genes include DNA repair (ung, sbcB), biosynthesis potentially affecting host amino acid or vitamin nutrition (argC, trpB, glyA, ribD2), fatty acid biosynthesis (fabZ), or cell envelope production (murF), and genes involved in transport (ynfM) or secretion (fliK, flgB) (table S1). Genes in these functional categories have been noted to undergo degradation or loss in distantly related strains of Buchnera and other obligate symbionts (2, 5, 6). Such losses could influence further genomic evolution (by affecting mutation), as well as the ability to provision hosts. Genes with frameshift mutations may retain partial functionality, through production of some in-frame transcripts due to slippage of RNA polymerase (15), but the notable concentration of indels in intergenic spacers implies that most frameshifts adversely affect gene function.

As indicated above, SNPs are about three times as common among new mutations as are small indels (Table 2), but indels mediated most inferred gene inactivations. Small indels were heavily concentrated in mononucleotide runs (“homopolymers”), with 93% of single-base indels linked to runs of at least five and 66% in runs of at least seven consecutive A's or T's (Fig. 2, A and B). [Solexa sequencing resolves homopolymer length with low error and without bias (12).] The incidence of indels per run was highest in the longest runs (10 and above), but because the longest runs were rare, most indels were found in runs of 7 to 9 bases (Fig. 2, A and B).
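Locating homopolymers is the first step in relating indels to run length. The sketch below shows one way to do it on a plain sequence string. The function names, the minimum length of five, and the toy sequence are assumptions for illustration, not the study's pipeline.

```python
import re
from collections import Counter

# One alternative per base, so each match is a run of a single nucleotide.
RUN_PATTERN = re.compile(r"A+|C+|G+|T+")

def homopolymer_runs(sequence, min_length=5):
    """Yield (start, base, length) for each run of at least min_length."""
    for match in RUN_PATTERN.finditer(sequence.upper()):
        run = match.group()
        if len(run) >= min_length:
            yield match.start(), run[0], len(run)

def run_length_histogram(sequence, min_length=5):
    """Count how many runs of each length occur in the sequence."""
    return Counter(length for _, _, length in homopolymer_runs(sequence, min_length))

# Toy sequence (hypothetical) with runs of 7 A's, 5 T's, and 10 A's.
toy = "GC" + "A" * 7 + "TTGC" + "T" * 5 + "GC" + "A" * 10 + "GC"

for start, base, length in homopolymer_runs(toy):
    print(start, base, length)
print(run_length_histogram(toy))
```

Dividing the number of indels observed at each run length by the histogram count for that length would give a per-run incidence of the kind plotted in Fig. 2A, while the raw indel counts per length correspond to the genome-wide view in Fig. 2B.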

Together, the evolutionary trends observed in Buchnera-Ap converge on a model describing a stepwise process of symbiont genome erosion (Fig. 3). The shift toward high A+T content that is common in host-restricted bacteria leads to increased occurrence of A/T homopolymers. These, in turn, are hot spots for small indels, which arise frequently in homopolymers through replication slippage and become even more frequent when certain DNA repair pathways are compromised (16). Our data imply that many new indels disrupt reading frames and that most are removed by selection. However, a minority persists, leading to inactivated genes. The resulting pseudogenes undergo rapid sequence evolution due to the lack of purifying selection and are eventually removed by large deletions. Because large deletions do not precisely excise inactivated genes, intergenic spacers often persist in the positions of former genes.

Fig. 3.

Model of symbiont genome erosion, from mutational patterns revealed by sequencing the complete genomes of seven Buchnera-Ap strains.

This process fits well with previous observations comparing more distantly related symbiont genomes [e.g., (5–7)]; however, those studies lacked the precision needed to detect the critical role of homopolymers and frameshifts in gene inactivation. Our model predicts that the initial step leading to genome reduction is a shift in nucleotide composition toward higher A+T content. Loss of DNA-repair functions has been proposed as the cause for this shift (17). A consequence of high A+T content is an excess of homopolymers and a resulting high incidence of small indels (Fig. 2, A and B) leading to gene inactivations. Indeed, A+T-biased genomes, including Buchnera genomes, show higher frequencies of A/T homopolymers than expected by chance alone (18), reflecting mutational patterns that yield longer A/T runs through replication slippage or other processes (16).

Most sequenced insect symbiont genomes are between 0.6 and 1 megabases in size and contain more than 500 genes, similar to the smallest known pathogen genomes and consistent with previous suggestions that cellular genomes have a minimal size threshold (1, 5, 6, 19, 20). This study, as well as the recent discovery of symbiont genomes containing only 182 to 450 genes (2–4), suggests instead that the process of gene loss has no clearly defined limit. We identified a surprisingly high rate of new mutations, including both base changes and indels, in the genomes of Buchnera-Ap. Although most mutations impairing gene function are removed by selection, others persist, leading to the permanent inactivation of genes and the subsequent loss of the corresponding DNA through larger deletions.

Supporting Online Material

Materials and Methods

Figs. S1 and S2

Table S1


References and Notes
