50 Million Years of Genomic Stasis in Endosymbiotic Bacteria

Science  28 Jun 2002:
Vol. 296, Issue 5577, pp. 2376-2379
DOI: 10.1126/science.1071278


Comparison of two fully sequenced genomes of Buchnera aphidicola, the obligate endosymbionts of aphids, reveals the most extreme genome stability to date: no chromosome rearrangements or gene acquisitions have occurred in the past 50 to 70 million years, despite substantial sequence evolution and the inactivation and loss of individual genes. In contrast, the genomes of their closest free-living relatives, Escherichia coli and Salmonella spp., are more than 2000-fold more labile in content and gene order. The genomic stasis of B. aphidicola, likely attributable to the loss of phages, repeated sequences, and recA, indicates that B. aphidicola is no longer a source of ecological innovation for its hosts.

The availability of genome sequences for related bacteria is providing exciting insights into evolution, but one limitation has been the lack of identifiable bacterial fossils to provide a time frame for these studies. We have quantified total rates of genomic evolution for Buchnera aphidicola, an obligate mutualistic symbiont of aphids, by sequencing the genome of the B. aphidicola symbiont of Schizaphis graminum (Sg) (1) and analyzing its divergence from the published sequence of the B. aphidicola symbiont of Acyrthosiphon pisum (Ap) (2).

This case allows genome evolution to be calibrated reliably with respect to time. Because the symbiont phylogeny mirrors that of its aphid hosts, indicating synchronous diversification, divergence dates reconstructed for ancestral aphids can be extended to the corresponding B. aphidicola ancestors. This approach has been used to infer that this endosymbiosis was established at least 150 million years ago (Ma) and that the lineages represented by B. aphidicola (Sg) and B. aphidicola (Ap) diverged 50 to 70 Ma (3, 4) (Fig. 1A). These are the only fully sequenced organisms that have eliminated recA, which is expected to lower the incidence of recombination events (5).

Figure 1

Comparison of gene order structures and location of pseudogenes in B. aphidicola (Ap) and B. aphidicola (Sg). (A) During the past 50 My, A. pisum and S. graminum have diverged 2-fold in size and 10-fold in body weight. (B) The positions of pseudogenes in the B. aphidicola (Ap) and B. aphidicola (Sg) genomes are indicated with arrows above and below the line, respectively. The axes are the ranked position of each ortholog along the chromosome, with the zero position corresponding to the putative origin of replication. Symbols and colors differentiate pseudogenes (ψ; yellow) and genes uniquely present in one species (Δ; blue). (C) and (D) Representation of pseudogenes in the cysteine and murein biosynthetic operons, respectively, in B. aphidicola (Sg). The nine extra genes in E. coli are not related to cysteine biosynthesis. Filled and open boxes represent gene and pseudogene sequences, respectively. Numbers show the position of mutations (black numbers, −1 frameshifts; numbers in boxes, +1 frameshifts; underlined numbers, stop codons).

The genomes of B. aphidicola (Sg) and B. aphidicola (Ap) are similar in size [0.64 megabases (Mb)] and are among the smallest of bacterial genomes. Their gene content is also very similar, with 526 genes shared of the 564 and 545 intact genes present in B. aphidicola (Ap) and B. aphidicola (Sg), respectively (Table 1). A comparison of the aligned genome sequences (1) confirms a high degree of divergence at the nucleotide sequence level. On the basis of a divergence date of 50 million years (My), average rates of sequence evolution were estimated at 9.0 × 10⁻⁹ synonymous substitutions per site per year and 1.65 × 10⁻⁹ nonsynonymous substitutions per site per year. The observed divergence at synonymous sites shows low variance among genes (1), suggesting that the synonymous divergence level corresponds to the mutation rate of B. aphidicola, which is similar to or slightly higher than the rate estimated in E. coli and Salmonella typhimurium (4, 6, 7).

Table 1

Comparison of genome features for B. aphidicola (Sg) and B. aphidicola (Ap).

Despite high levels of sequence divergence, the two B. aphidicola genomes show complete conservation of genomic architecture (Fig. 1B). No inversions, translocations, duplications, or gene acquisitions have occurred in either lineage since their divergence. Of the 564 protein-coding genes originally annotated in B. aphidicola (Ap), only four (yba1 to yba4) were reported not to have orthologs in E. coli (2), a closely related free-living species (Fig. 2A). Our analyses suggest that even these genes were present before the establishment of the symbiosis (1), providing even stronger evidence that the symbiotic life-style did not involve the uptake of novel genes. Many of the fli and flg homologs (loci involved in flagellar biosynthesis and protein export) show unusually high levels of amino acid divergence within B. aphidicola relative to the divergence between E. coli and Salmonella typhi (Fig. 2B). This suggests modification of function of these loci, which is also supported by lack of a flagellum in micrographs of B. aphidicola and lack of fliC, encoding the filament protein.

Figure 2

Comparisons of genome stability among fully sequenced genome pairs. (A) Phylogenetic trees (1) derived from an alignment of 10 concatenated ribosomal protein sequences. The scale bar shows amino acid substitutions per site; the boxes contain bootstrap indices. (B) Frequencies of nonsynonymous substitutions per site (Ka) for 449 genes of B. aphidicola compared with those for orthologous loci in E. coli and S. typhi (1). Among the outliers are six genes (triangles) that function in formation of the flagellum in E. coli and have unknown functions in B. aphidicola. (C and D) Genome structure divergence (18) normalized to nonsynonymous substitutions averaged over 56 genes (1) in comparison to repeat content (1) and genome size in B. aphidicola (black circles) and other obligate host-associated bacteria (open circles), facultative intracellular parasites (open triangles), and free-living bacteria (gray squares). Abbreviations and numbers are as follows: Bu, Buchnera (Ap)-Buchnera (Sg); K12-0157, E. coli K-12-E. coli 0157; St-Sty, S. typhi-S. typhimurium; 1, M. genitalium-M. pneumoniae; 2, C. trachomatis-C. muridarum; 3, C. muridarum-C. pneumoniae AR39; 4, C. trachomatis-C. pneumoniae AR39; 5, R. prowazekii-R. conorii; 6, H. pylori 26695-H. pylori J99; 7, E. coli K12-S. typhimurium; 8, E. coli K12-S. typhi; 9, B. subtilis-B. halodurans.

Our comparison highlights the first clear pattern of genome-scale evolution: obligately host-associated bacteria show enhanced stability of genome architecture relative to sequence evolution (Fig. 2), which is also indicated by comparisons of pathogen genomes (5). B. aphidicola is the most extreme organism analyzed so far, with no rearrangements or gene acquisitions and only a few gene losses during the past 50 My (Fig. 2, C and D). This stasis is remarkable because E. coli, S. typhi, and S. typhimurium, the closest relatives of B. aphidicola, have highly labile genomes (Fig. 2, C and D). The ratio of insertions and deletions (indels) and rearrangements per nonsynonymous substitutions is more than 2000-fold higher in modern E. coli and Salmonella spp.; this represents a massive difference even when normalized for the eightfold difference in genome size.

The B. aphidicola (Sg) genome sequence also provides insight into the ecological role of the endosymbionts in the supplementation of the host's phloem sap diet, which is deficient in the 10 essential amino acids required in animal diets (8). Both genomes possess 54 genes that produce these amino acids (1, 2). However, five genes, cysN, -D, -G, -H, and -I, involved in sulphur reduction and biosynthesis of cysteine contain frameshifts and stop codons in B. aphidicola (Sg) but are intact in B. aphidicola (Ap) (Fig. 1C). S. graminum ingests more cysteine than does A. pisum due to differences in phloem composition of their respective food plants (grasses versus legumes) (9) as well as the phytotoxic effects of feeding by S. graminum (8). This enrichment of the host diet renders superfluous the capacity for sulphur assimilation by B. aphidicola (Sg), and inactivation of the underlying genes exemplifies evolution of the symbiont in response to host environmental conditions.

In total, we identified 13 and 38 pseudogenes (1) in B. aphidicola (Ap) and B. aphidicola (Sg), respectively; half are located in biosynthetic operons with defects in several contiguous genes (Fig. 1). Another 14 genes, each showing homology to some E. coli gene, are present in one B. aphidicola genome but are missing entirely from the other. In numerous instances, the intergenic region in one species is as long as the corresponding gene in the other species (1). These regions probably consist of genes degraded beyond recognition due to the juxtaposition of several gene remnants and/or due to early inactivation followed by extensive nucleotide substitution.

Among new pseudogenes in B. aphidicola (Sg), five are associated with DNA repair processes, including base excision repair. The mutational spectrum in the weakly degraded genes in B. aphidicola (Sg) is dominated by deletions of 1 to 2 nucleotides per event, consistent with a reduced capacity for repairing errors caused by bulky residues and/or photoproducts. The remarkably high copy number of the B. aphidicola chromosome (10) may result from the loss of other genes involved in the metabolism of DNA, such as seqA and datA that coordinate replication in the cell cycle of related bacteria (11).

Overall, we estimate that coding capacity in B. aphidicola has been lost at a rate of one complete gene elimination per 5 to 10 My during the divergence of these two B. aphidicola species. The evidence for reduced effectiveness of selection and lability at the nucleotide sequence level (12) makes the stability of gene inventory and gene arrangements in B. aphidicola all the more striking.

These seemingly contradictory patterns can be explained by two kinds of losses during genome reduction of obligate host-associated bacteria (13). First, eliminated sequences include elements that normally mediate genome dynamics and gene mobility, such as phages, plasmids, repeated sequences, and transposons. Indeed, a survey of the repeat contents in microbial genomes has revealed a decreased density of repeated sequences in obligate intracellular bacteria with genomes in the 1 Mb range (14). Here, B. aphidicola is the extreme, containing no prophages, a single rRNA operon, and no repeated sequences longer than 30 base pairs (bp). Second, gene losses also include loci that facilitate recombination and incorporation of foreign DNA. Here, the two B. aphidicola genomes are also distinct in their lack of recA and recF; the absence of the corresponding gene functions is expected to lower the incidence of genome rearrangements (5, 15).

If the mutational input of rearrangements is extremely low due to these losses, the frequency of such events that are beneficial or selectively neutral will approach zero, resulting in genomic stasis during lineage evolution. Also, selection on gene content and gene order may be unusually restrictive in small symbiont genomes, further reducing the fixation rate of rearrangements. It is unlikely that sequestering resulting from the symbiotic lifestyle prevents gene uptake, because other bacteria regularly coinfect aphids (16).

This leads to the testable hypothesis that the reduction of genome size caused by transitions to obligate host-associated lifestyles is ultimately halted by a corresponding increase in genome stability because of the loss of genetic elements that mediate recombination events. Scaling genome divergence by nucleotide substitutions of orthologous genes reveals a dramatic positive relation between the frequencies of rearrangements and indels and the genomic content of repeats (Fig. 2, C and D). This relation is expected because the number of recombination sites is n(n – 1)/2, where n represents the number of identical repeats per genome. Thus, the number of possible genome variants that can be generated will decrease rapidly as the repeat content and genome size are reduced. The result is a correlation between genome rearrangements and lifestyle, because obligate host-associated bacteria tend to have smaller genomes with lower content of repeats and less efficient recombination systems than free-living bacteria.
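The pair-count relation n(n – 1)/2 can be illustrated with a minimal sketch. The function name and the repeat counts below are illustrative assumptions chosen to show the quadratic fall-off; they are not values from the study.

```python
def recombination_site_pairs(n_repeats: int) -> int:
    """Number of distinct pairs among n identical repeats: n(n - 1)/2."""
    if n_repeats < 0:
        raise ValueError("repeat count must be non-negative")
    return n_repeats * (n_repeats - 1) // 2


# Hypothetical repeat counts (not measured values from any genome).
for n in (100, 10, 2, 1, 0):
    print(f"{n:>3} repeats -> {recombination_site_pairs(n)} possible repeat pairs")
```

Under these assumed counts, a tenfold drop in repeats (100 to 10) reduces the number of candidate pairs more than a hundredfold (4950 to 45), and a genome with a single copy or no copies offers none.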

Reconstruction of the ancestor shared with E. coli shows that the B. aphidicola lineage eliminated at least 2000 genes and underwent multiple chromosomal inversions before the divergence of B. aphidicola (Sg) and B. aphidicola (Ap) (17). This degree of reduction would have required over 10¹⁰ years if gene disappearance in the early B. aphidicola lineage occurred at the rate (14 genes per 50 My) estimated for the period in which these two genomes diverged. Thus, more rapid genomic changes must have been characteristic of the early stages of B. aphidicola evolution. This may be attributable to both more repeats and a greater proportion of expendable genes in the ancestor of B. aphidicola, allowing deletions of multigene fragments (17).
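The time estimate in this paragraph can be approximated from figures given in the text. The sketch below is a rough reconstruction, not the authors' own calculation. It assumes that the 14 complete gene losses are spread over both lineages, so that the total branch length is twice the 50 My divergence time. The variable and function names are hypothetical.

```python
GENES_LOST_EARLY = 2000   # "at least 2000 genes" eliminated before the split
GENES_LOST_RECENT = 14    # genes present in one genome but missing from the other
DIVERGENCE_MY = 50        # divergence date used in the text, in My
N_LINEAGES = 2            # assumption: losses are counted across both lineages


def my_per_gene_loss(losses: int, divergence_my: float, n_lineages: int) -> float:
    """Average time per gene loss along a single lineage, in My."""
    return divergence_my * n_lineages / losses


def years_to_lose(n_genes: int, my_per_loss: float) -> float:
    """Years needed to lose n_genes at a constant per-lineage loss rate."""
    return n_genes * my_per_loss * 1e6


interval = my_per_gene_loss(GENES_LOST_RECENT, DIVERGENCE_MY, N_LINEAGES)
required = years_to_lose(GENES_LOST_EARLY, interval)
print(f"{interval:.1f} My per gene loss")
print(f"{required:.2e} years to lose {GENES_LOST_EARLY} genes")
```

Under this assumption the rate is about one loss per 7 My, within the 5 to 10 My range stated earlier, and 2000 losses would take roughly 1.4 × 10^10 years, which is several times the age of the Earth.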

Although the original acquisition of a bacterial symbiont enabled aphids and other sap-feeding insects to exploit food resources that would be otherwise nutritionally unsuitable, the dependence on B. aphidicola has not conferred continued evolutionary plasticity in nutritional capabilities and diet breadth. Rather, our study has shown that B. aphidicola remains stable in genome content and architecture and has even lost pathways that may affect the ecological range of the aphids. This stability, particularly the complete absence of gene acquisition, implies effectively invariant or diminishing biosynthetic capabilities of the symbionts over periods that span many evolutionary shifts in the diet and life cycles of hosts. Within the clade of aphids, including S. graminum and A. pisum, there are about 3000 different species living on a wide range of monocots, dicots, and even ferns; yet the corresponding lineages of B. aphidicola have not obtained new genes or novel capabilities. Thus, the ecological diversification of aphids cannot be attributed to the genetic diversity of B. aphidicola.

  • * These authors contributed equally to this work.

  • To whom correspondence should be addressed. E-mail: Siv.Andersson{at}

