Technical Comments

Comment on "The Origins of Genome Complexity"

Science  05 Nov 2004:
Vol. 306, Issue 5698, p. 978
DOI: 10.1126/science.1098469

Lynch and Conery (1) argued that genome complexity reflects a history of genetic drift caused by small genetic population size (Ne), which in turn enables the spread of mildly deleterious selfish elements and duplications. Under this argument, large organisms, which tend to have small Ne, also exhibit larger (complex) genomes, and the small size of bacterial genomes reflects their position at the extreme large end of the range for Ne. Genome size can vary among bacteria by at least an order of magnitude, but the fraction of noncoding DNA is consistently low (∼15%). Consequently, larger bacterial genomes are expected to result from more selection, because they maintain more functional DNA. Indeed, large genome size in bacteria commonly has been considered an adaptation to changing environments (2–6). Furthermore, small Ne in symbiotic bacteria appears to result in reduced genomes through gene loss (7, 8). Thus, the relation between Ne and genome size in bacteria is, if anything, expected to be the opposite of that proposed by Lynch and Conery (1).

The claimed relationship between Ne and genome size is based on inferring Neu (where u is the mutation rate per nucleotide) from estimates of polymorphism using sequence divergences among isolates (1). This approach, however, has several drawbacks, which are especially severe in bacteria. One concern is the assumption of similar mutation rates across bacteria, which is required if estimates of Neu are interpreted as directly reflecting variation in Ne. Lynch and Conery (1) justify this assumption by citing a survey (9) that included only a single bacterial strain; other evidence indicates that bacterial lineages vary substantially in mutation rates (10)—as would be expected, because their genomes differ markedly in content of genes known to impact mutation. Thus, polymorphism levels reflect mutational input as well as Ne.
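The dependence on mutation rate can be made concrete with the standard neutral expectation that silent-site diversity in a haploid population is approximately 2Neu. The sketch below is only an illustration under that assumption; the function name and the numerical values are hypothetical and are not taken from Lynch and Conery (1) or from the data discussed here.

```python
def implied_ne(pi, u, ploidy=1):
    """Ne implied by silent-site diversity pi, assuming pi = 2 * ploidy * Ne * u.

    Assumption: neutral, equilibrium population; ploidy=1 gives the haploid
    expectation 2*Ne*u, ploidy=2 gives the diploid expectation 4*Ne*u.
    """
    if pi < 0:
        raise ValueError("diversity must be non-negative")
    if u <= 0:
        raise ValueError("mutation rate must be positive")
    return pi / (2 * ploidy * u)


# Hypothetical values: the same observed diversity under two mutation rates.
observed_pi = 0.05
ne_low_u = implied_ne(observed_pi, u=1e-10)
ne_high_u = implied_ne(observed_pi, u=1e-9)

# A tenfold difference in u changes the implied Ne tenfold.
print(ne_low_u / ne_high_u)
```

Because only the product Neu is estimated, any unrecognized difference in u between lineages is attributed to Ne.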

Even more problematic is the assumption that taxonomic names correspond to cohesive species, which is implicit in the use by Lynch and Conery (1) of polymorphism data to estimate Ne. Although this assumption can be considered approximately true for most sexual eukaryotic species, it is often far from valid for bacterial strains grouped under a particular species name. These usually constitute phylogenetically and ecologically distinct clusters separated by niche and genetic boundaries; thus, genetic drift does not act within the named “species” as a whole but within smaller, divergent subunits. The position of bacteria at the extreme high end of the range of Neu values cited by Lynch and Conery (1) is based in large part on lumping independently evolving strains for calculations of polymorphism. For example, the highest Neu value listed by Lynch and Conery, for the free-living marine cyanobacterial genus Prochlorococcus, is derived from divergent species with different ecological niches (11), dramatically different gene inventories reflecting habitat differences, and extreme differences in genomic base composition (12). This use of divergence to calculate Ne is comparable to using the divergence of cat, dog, human, and mouse to calculate Ne for the species “mammal,” an exercise that would give an enormous population size.
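The effect of lumping can be illustrated with a toy calculation of mean pairwise differences per site. This is a minimal sketch with invented sequences and a hypothetical helper name; it is not the procedure used by Lynch and Conery (1), and it only shows how pooling two divergent lineages inflates apparent polymorphism relative to the diversity within either lineage.

```python
from itertools import combinations


def pairwise_diversity(seqs):
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned, non-empty, and equal in length")
    # Count mismatching sites for every pair, then average per pair and per site.
    differences = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return differences / (len(pairs) * length)


# Invented toy alignments: two lineages fixed for different bases at five sites,
# each carrying one segregating site of its own.
lineage_a = ["AAAAAAAAAA", "AAAAAAAAAT"]
lineage_b = ["CCCCCAAAAA", "CCCCCAAAAT"]

within_a = pairwise_diversity(lineage_a)
within_b = pairwise_diversity(lineage_b)
pooled = pairwise_diversity(lineage_a + lineage_b)

# Pooled diversity exceeds the diversity within either lineage.
print(within_a, within_b, pooled)
```

In this toy case the excess in the pooled value comes entirely from fixed differences between lineages, which reflect divergence time rather than the population size of either lineage.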

Similarly, Lynch and Conery (1) treat Salmonella enterica as a cohesive species, yet it consists of distinct, highly clonal lineages living in different vertebrate hosts (13, 14). Lateral gene transfer, which is relatively common in bacteria [and reported for the S. enterica data set (13)] will also radically affect the calculated degree of polymorphism if undetected (or ignored). The complexities of relating polymorphism to Ne in bacteria are widely recognized, having been noted in early enzyme electrophoresis studies, which revealed that polymorphism levels were unexpectedly low in view of the census population sizes of bacterial species (15).

Even if issues of population subdivision, recombination, and variation in mutational input could be resolved, estimates of Ne based on polymorphism would reflect the very recent evolutionary past rather than the longer period over which genome features evolved. Major reductions in polymorphism are observed for recently derived pathogens relative to ecologically generalized parental species [see, for example, (16)], but genomic traits, including genome size, are little altered.

A more reliable index of genetic drift over evolutionary time is the ratio of Ka (nonsynonymous substitutions per site) to Ks (synonymous substitutions per site) for a large set of genes, based on comparisons of related species. The Ka/Ks ratio, which is almost always less than one, is widely used as an indicator of the extent of purifying selection acting to conserve coding sequences [see, for example, (17)]. Although Ka/Ks can be elevated in particular genes as a result of positive selection for amino acid changes or selection conserving codon choice, a pattern of genome-wide differences in Ka/Ks indicates persistent levels of differences in Ne. Therefore, the proposal of Lynch and Conery (1) predicts a positive correlation between Ka/Ks and genome size and, thus, higher Ka/Ks ratios in eukaryotes. Another advantage of this approach is that it is not invalidated by variation in mutation rate among lineages.
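Why the ratio is insensitive to mutation rate can be seen in a deliberately simple model, sketched below. The assumptions are ours for illustration: synonymous changes are neutral, a fixed fraction of nonsynonymous mutations escapes purifying selection, divergence is small enough to ignore multiple hits, and all function names and numbers are hypothetical. The saturation cutoff of 2 for Ks follows the threshold stated in the legend of Fig. 1.

```python
def expected_divergence(u, generations, fixation_fraction=1.0):
    """Expected substitutions per site between two lineages.

    Assumes each lineage has evolved independently for `generations`, that
    mutations arise at rate u per site per generation, and that a fraction
    `fixation_fraction` of them behaves as neutral (1.0 for synonymous sites).
    Multiple substitutions at the same site are ignored.
    """
    return 2.0 * u * generations * fixation_fraction


def ka_ks(ka, ks, ks_saturation=2.0):
    """Ka/Ks ratio, refusing pairs whose Ks is zero or saturated."""
    if ks <= 0:
        raise ValueError("Ks must be positive")
    if ks >= ks_saturation:
        raise ValueError("Ks is saturated; the ratio is unreliable")
    return ka / ks


# Hypothetical lineages differing tenfold in mutation rate but sharing the same
# fraction (0.1) of effectively neutral nonsynonymous mutations.
for u in (1e-9, 1e-8):
    ks = expected_divergence(u, generations=1e7)
    ka = expected_divergence(u, generations=1e7, fixation_fraction=0.1)
    print(u, ka_ks(ka, ks))
```

In this model u and divergence time appear in both numerator and denominator, so the ratio depends only on the fraction of nonsynonymous mutations that escape selection.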

No clear relation is observed between average Ka/Ks and genome size across bacteria, whether all genes or a common set of genes are used (Fig. 1). Because Ks values can be depressed by selection favoring certain codons, we used different approaches to define a genome-wide Ka/Ks but found no relation with genome size (Fig. 1 and Table S1). Furthermore, genomes known to have strong codon biases, such as E. coli-S. enterica or Bacillus species, show low Ka/Ks, and cases known to have low codon bias, such as Buchnera (18), have high Ka/Ks, which indicates that purifying selection on codon use has less effect on Ka/Ks than does selection on amino acid residue.
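A rank correlation is one simple way to check for a monotonic relation between two such variables. The sketch below is illustrative only: it is not the analysis behind Fig. 1, the function names are hypothetical, and the usage values are invented placeholders rather than data from Table S1.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the current run of tied values.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sample is constant")
    return cov / (var_x * var_y) ** 0.5


# Invented placeholder values (genome size in Mb, average Ka/Ks) for usage only.
toy_genome_size = [0.6, 1.8, 4.6, 7.0]
toy_ka_ks = [0.20, 0.05, 0.04, 0.15]
print(spearman_rho(toy_genome_size, toy_ka_ks))
```

A coefficient near zero would be consistent with no monotonic relation, whereas the proposal of Lynch and Conery (1) predicts a clearly positive value.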

Fig. 1.

Average Ka/Ks (26) versus genome size for 23 pairs of related bacteria (filled circles) and several pairs of eukaryotes (open circles). Pairs span the range of bacterial genome size and include all available genome pairs for which Ks is not saturated (values < 2). Because the value of Ka depends on gene function, we selected the single-copy orthologs (defined as reciprocal unique BLAST hits) present in at least 15 of the 23 genome pairs, to ensure that comparisons among genomes encompass similar gene sets. Similar lack of relation was observed using the complete set of orthologs for each genome pair and using different methods of calculating Ka/Ks (26–28). For comparison, Ka/Ks and genome size values for four pairs of eukaryotic genomes (in order of increasing genome size: yeasts, nematodes, flies, and mammals) are presented. Data and taxon names for bacteria are presented in the Supporting Online Material.

As is evident in Fig. 1 [and as noted previously based on more limited data (19)], average Ka/Ks can be very high in certain bacteria, including not only symbiotic lineages but also others, such as several groups of soil-dwelling bacteria. At least for symbionts, high Ka/Ks corresponds to very low levels of nucleotide polymorphism (20, 21), implicating small Ne as the likely cause. The conventional view, that large Ne is typical of bacteria, was initially established on the basis of the limited early sequence data available for E. coli and S. enterica, which, as it turns out, show exceptionally low Ka/Ks (Supporting Online Material). Genomic data and polymorphism studies now suggest that this early view was simplistic and that Ne varies widely across bacterial groups.

For comparison, we have included Ka/Ks values calculated for eukaryotes [derived from genome-wide comparisons of orthologs for fungi, mammals, nematodes, and flies (Fig. 1)]. These fall within the range of values found in bacteria, which suggests that there is no consistent difference in the efficacy of purifying selection between prokaryotes and eukaryotes, despite the gap between their genome sizes.

As Lynch and Conery (1) argue, Ne probably does affect the ability to eliminate useless or deleterious DNA. However, their model supposes that lack of effective selection will lead to genome growth because of the dynamics of selfish elements. That supposition may be true for eukaryotes but does not hold for bacteria, which display a mutational bias in nature favoring deletion and preventing major accumulation of so-called junk DNA (22–24). This deletional bias may itself reflect vulnerability to selfish DNA (23), resulting from distinctive biological features of bacteria, including lack of a nuclear envelope or meiotic sex. Although bacteria undergoing severe genetic drift may experience outbreaks of selfish elements, the principal outcome of reduced population size is the random inactivation and deletion of genes participating in basic cellular processes (7), ultimately resulting in an overall genome reduction [see, for example, (25)]. Hence, the model of Lynch and Conery does not explain the differences in genome sizes among bacteria. Their conclusions that Ne is consistently enormous in bacteria and that the resulting low levels of genetic drift underlie the lack of genomic and phenotypic complexity in these organisms are not warranted.

Supporting Online Material

SOM Text

Table S1

References and Notes
