Technical Comments

Response to Comment on "The Origins of Genome Complexity"


Science  05 Nov 2004:
Vol. 306, Issue 5698, pp. 978
DOI: 10.1126/science.1100559

Our study (1) argued that the long-term genetic effective size of a population (Ne) plays a central role in dictating the types of genomic evolution that can occur, and that many aspects of eukaryotic genome complexity may have arisen owing to a reduction in Ne that began near the time of origin of eukaryotes and became much more pronounced in lineages of multicellular species. If these general principles naturally extend to prokaryotes, then the diminutive genomes of prokaryotes may stem largely from the enhanced efficiency of selection against very mildly deleterious insertions, and one need not resort to arguments based on perceived constraints of cell morphology.

Daubin and Moran (2), by contrast, imply that the power of genetic drift is as great in prokaryotes as in eukaryotes and that unique selective forces must account for streamlined prokaryotic genomes. Their arguments, however, are inconsistent with both empirical observation and population-genetic theory.

First, contrary to the suggestion in (2), the strong correlation between genome size and gene number in prokaryotes is consistent with theoretical expectations. The key issue here is why, despite the range in variation among taxa, the average number of genes harbored within prokaryotic genomes is so much smaller than that in eukaryotes. Numerous results (1, 3–5) show that microbes have gene duplication rates comparable to or greater than those in multicellular eukaryotes, so the small size of prokaryotic genomes can be explained only by a reduced rate of duplicate-gene retention. Theoretical work demonstrates that a common mechanism of duplicate-gene preservation in animals and plants, subfunctionalization of preexisting functions by degenerative mutations (6, 7), is nearly inoperable in populations with large effective sizes (8, 9). In contrast, the preservation of duplicate genes by rare beneficial mutations to new functions is much more likely in populations of enormous absolute size (prokaryotes).

Thus, if the small number of duplicate genes retained in prokaryotes owes its maintenance to neofunctionalization (or, perhaps, dosage requirements), as Daubin and Moran imply, the tight correlation between gene number and genome size (10, 11) does not contradict our model. Rather, the observed pattern is entirely consistent with our argument that selection efficiency in prokaryotes is typically too great for substantial mobile-element and intron proliferation, which in turn means that most such genomes consist largely of functional genes. This said, although the mobile-element contribution to the genome content of prokaryotes is almost always <10%, a positive scaling with genome size is entirely continuous with that for eukaryotic genomes (12), consistent with our theory. The reduction in gene number in endosymbiotic prokaryotes is also consistent with our hypothesis that random genetic drift is a major force in genomic evolution. If a substantial fraction of genes are only weakly favorable to organismal fitness, then the adoption of an endosymbiotic lifestyle would magnify the likelihood of loss of such genes, both because functions supplied by the host would reduce the selective advantage of gene retention and because the reduction in Ne would reduce the efficiency of selection.

Second, Daubin and Moran object to the use of standing variation at silent sites, η, as an indicator of Ne for prokaryotes (13). They argue that η reflects variation in u, the mutation rate per base pair per generation, as well as variation in Ne. This point, however, only strengthens the conclusion that Ne in prokaryotes substantially exceeds that in most eukaryotes (1). Many factors can influence the mutation rate within and among genomes, but all evidence suggests that u is much lower in prokaryotes than in eukaryotes, increasing with generation time and number of germ-line cell divisions in the latter (14, 15). A composite direct estimate of u for the eubacterium Escherichia coli, 0.36 × 10⁻⁹ (16), is similar to that for the archaebacterium Sulfolobus acidocaldarius, 0.26 × 10⁻⁹ (17), and not greatly different from that for the single-celled yeast Saccharomyces cerevisiae, 0.19 × 10⁻⁹ (14). In contrast, a variety of methods suggest that u ≅ 22.0 × 10⁻⁹ in humans (18, 19) and on the order of 2.0 to 9.0 × 10⁻⁹ in invertebrates (Drosophila and Caenorhabditis) and plants (14, 20, 21). This ∼100-fold increase in u from the smallest prokaryotes to the largest eukaryotes implies that the decline in Ne with increasing organismal size is much more pronounced than the already strong decline of η we depicted [figure 1 in (1)]. This is a conservative conclusion because it does not factor in the enhanced efficiency of selection on silent sites in microbes relative to multicellular species.
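The arithmetic behind this argument can be checked in a few lines. The sketch below is illustrative only. It uses the point estimates of u quoted in the preceding paragraph and assumes the standard neutral expectation η ≈ 2Neu for haploids (4Neu for diploids), a relation the text relies on but does not state. The function and variable names are hypothetical.

```python
# Per-site, per-generation mutation rates quoted in the text.
U_PER_SITE = {
    "Escherichia coli": 0.36e-9,
    "Sulfolobus acidocaldarius": 0.26e-9,
    "Saccharomyces cerevisiae": 0.19e-9,
    "Homo sapiens": 22.0e-9,
}


def fold_change(u_high, u_low):
    """Ratio of two mutation rates."""
    if u_low <= 0:
        raise ValueError("mutation rate must be positive")
    return u_high / u_low


def implied_ne(eta, u, ploidy=1):
    """Effective size implied by silent-site diversity eta.

    ASSUMPTION (not stated in the text): neutral equilibrium,
    eta = 2 * ploidy * Ne * u, i.e. 2*Ne*u for haploids, 4*Ne*u for diploids.
    """
    if u <= 0:
        raise ValueError("mutation rate must be positive")
    return eta / (2 * ploidy * u)


if __name__ == "__main__":
    human = U_PER_SITE["Homo sapiens"]
    for name, u in U_PER_SITE.items():
        print(f"{name}: human u is {fold_change(human, u):.1f}-fold higher")

    # The same diversity value implies very different Ne once u is accounted for.
    eta = 0.0412  # average prokaryotic estimate reported later in the text
    print(implied_ne(eta, U_PER_SITE["Escherichia coli"], ploidy=1))
    print(implied_ne(eta, human, ploidy=2))
```

Under these assumptions, equal values of η correspond to an Ne roughly two orders of magnitude larger when u is taken from a prokaryote rather than from humans, which is the sense in which the η gradient understates the Ne gradient.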

Third, the contention by Daubin and Moran that population subdivision invalidates application of the Ne concept is incorrect. Genome evolution is a long-term process, defined by the effective number of individuals in the entire species, not by the size of local demes. Not only is the concept of Ne entirely valid (provided some cohesion exists between various population segments), but it was specifically developed to deal with population structures that deviate from an ideal random-mating situation (22).

Daubin and Moran do raise an important point—the inherent difficulty in defining species boundaries in prokaryotes (23). However, although molecular surveys of prokaryotes reveal clades within lineages classically defined as species, all genealogical data exhibit such structure. The key issue is whether the observed patterns in prokaryotes represent inappropriate mixes of two or more permanently separated lineages or simple stochastic variation in lineage structure, including variation among demes exhibiting local adaptation. Even in an ideal population, the expected degree of separation of the two most deeply branching lineages in a neutral gene genealogy is half the depth of the entire tree (24), and population subdivision will induce deeper furrows in a tree.
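The "half the depth" expectation follows from standard neutral coalescent theory. As a sketch under that assumption (Kingman coalescent, constant population size, time measured in units of Ne generations for a haploid population), the expected time during which k ancestral lineages persist is 2/(k(k − 1)). The function names below are illustrative and not part of the original analysis.

```python
from fractions import Fraction


def expected_interval(k):
    """Expected time with exactly k ancestral lineages (coalescent units)."""
    if k < 2:
        raise ValueError("need at least two lineages")
    return Fraction(2, k * (k - 1))


def expected_tree_depth(n):
    """Expected time to the most recent common ancestor of n sampled genes."""
    if n < 2:
        raise ValueError("need at least two sampled genes")
    return sum(expected_interval(k) for k in range(2, n + 1))


def deepest_split_fraction(n):
    """Expected time with only two lineages, as a fraction of expected depth."""
    return expected_interval(2) / expected_tree_depth(n)


if __name__ == "__main__":
    for n in (2, 10, 100, 1000):
        print(n, float(deepest_split_fraction(n)))
```

The fraction equals 1/(2(1 − 1/n)) and approaches one half as the sample size n grows, so two deeply separated clades are the expected appearance of a single, ideal population rather than evidence of separate species.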

To avoid arbitrary decisions in estimating η, we took species designations as imposed by investigators to be the natural boundaries of analysis. That could have resulted in some upwardly biased estimates of prokaryotic η, but that is also true for eukaryotic species (particularly unicellular species). To examine this problem further, we have supplemented our previous survey of η in prokaryotes with recently published studies (Supporting Online Material) and with gene sequence surveys that had not been previously converted to η, conservatively restricting analyses to individual clades when noted by the authors. The average estimate of η from 39 taxa, 0.0412 (0.0123), remains quite high relative to estimates in eukaryotes. In many of these studies, the authors indicate evidence for within-species recombination. In the extreme case of Prochlorococcus, even when different light-adapted clades are treated separately, average η is still 0.4748. Because the gradient in η is already greatly biased downwardly relative to Ne, it remains extremely unlikely that the average Ne of prokaryotes is anywhere near as low as that in eukaryotes.

The implication by Daubin and Moran that microbial lineages with silent-site divergence similar to that of phenotypically diverse mammals must also have equivalent levels of ecological or functional genetic divergence fails to consider the basic principles that lead to the maintenance of near-neutral variation. Provided Ne is large enough, there is nothing to prevent the development of very high η while maintaining stability of protein functions. To begin to address these issues, we have sequenced a number of nuclear genes in global isolates within Paramecium species (25), whose species designations have been confirmed by laboratory crosses. The average η for silent-site variation is 0.04, higher than that for all but a single eukaryote in figure 1 of (1) and slightly less than the divergence between cat and dog (24). Unless lineages within currently described microbial species can be shown to be completely impervious to gene flow, there is no reason to abandon the use of η as a surrogate measure of recent Ne.

Although a meaningful measure of Ne that extends beyond the applicable time scale for η (the past 2Ne generations for haploids) is highly desirable, we do not share the enthusiasm of Daubin and Moran for the ratio of replacement-site (Ka) to silent-site (Ks) substitutions among species as an indicator of long-term Ne. Although it is common to use Ka/Ks as a relative measure of the width of the selective sieve (that is, as 1.0 minus the fraction of mutations that are eliminated by selection) within species, there are substantial problems in employing this logic among species with dramatically different Ne. Under this ideal interpretation, Ks is an unbiased estimate of the mutation rate, but selection on silent sites associated with codon usage can cause Ks to decline with increasing Ne, inducing a tendency for Ka/Ks to increase.

Although an increasing fraction of mildly deleterious mutations at replacement sites is also expected to be purged with larger Ne, whether the reduction in Ka will exceed the reduction in Ks, yielding the qualitative behavior that Daubin and Moran expect, depends on (i) the distribution of mutational effects, and (ii) the effect of selection on linked loci. As to point (i), a plausible case can be made that most beneficial mutations have very small effects (26), in which case Ka could actually increase with Ne in some range of Ne (where the increased fixation of beneficial mutations exceeds the reduction in purging of deleterious mutations). Recent work supports this view in showing that a substantial fraction of Ka separating metazoan species consists of adaptive mutations (27, 28). Because selection should be more efficient in species with higher Ne, this effect is likely to be even greater in prokaryotes. As to (ii), the effect of selection on linked loci is especially pronounced in species with major clonal phases of reproduction and is probably the main reason that Ne scales substantially less than linearly with absolute population size (29). The case has been made that with background selection on beneficial mutations, Ks could first decrease and then increase with increasing Ne as very mildly deleterious mutations at silent sites are fixed by hitchhiking with beneficial mutations at replacement sites (30). If that is true, then our conclusions based on silent-site diversity are even more conservative than suggested above. At any rate, although the overall level of adaptation is expected to generally increase with Ne, this need not be reflected in a monotonic negative scaling between Ne and between-species Ka/Ks.
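The dependence of Ka on Ne sketched in point (i) can be illustrated with Kimura's diffusion approximation for the fixation probability of a new mutation. This is a minimal sketch under assumptions the text does not state: weak selection, census size equal to Ne, and a scaled selection coefficient S = 2Nes for haploids (4Nes for diploids with additive effects). The rate of substitution relative to the neutral rate is then S/(1 − e^(−S)). The function name and the example values of s are hypothetical.

```python
import math


def relative_fixation_rate(ne, s, ploidy=1):
    """Substitution rate of mutations with coefficient s, relative to neutral.

    ASSUMPTION: Kimura's diffusion approximation with S = 2 * ploidy * Ne * s.
    """
    scaled = 2 * ploidy * ne * s
    if abs(scaled) < 1e-12:
        return 1.0  # neutral limit
    if scaled > 0:
        # S / (1 - exp(-S)), written with expm1 for accuracy at small S
        return scaled / -math.expm1(-scaled)
    # Algebraically identical form for S < 0 that cannot overflow
    a = -scaled
    return a * math.exp(-a) / -math.expm1(-a)


if __name__ == "__main__":
    for ne in (1e4, 1e6, 1e8):
        deleterious = relative_fixation_rate(ne, -1e-6)
        beneficial = relative_fixation_rate(ne, 1e-6)
        print(f"Ne={ne:.0e}  deleterious={deleterious:.3g}  beneficial={beneficial:.3g}")
```

With |s| = 10⁻⁶, both classes behave as nearly neutral at Ne = 10⁴, whereas at Ne = 10⁸ the deleterious class is effectively purged and the beneficial class fixes at roughly 200 times the neutral rate. Whether total Ka rises or falls with Ne therefore depends on the relative abundance of the two classes, which is the point made in the text.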

The arguments of Daubin and Moran about the behavior of Ka/Ks in species with high versus low levels of codon bias provide no insight into these issues, because they neglect the fact that codon bias is a function of both mutation and selection. An absence of codon bias cannot be taken at face value as evidence for the absence of selection, nor can the presence of codon bias be taken as evidence of selection. In addition, these interpretive issues aside, the data reported in (2) have fundamental quantitative uncertainties. Silent sites are saturated with changes in many of their comparisons (Ks > 1.0 in 12 cases), and many of their Ka/Ks estimates are ratios of average Ka to average Ks. Because the expected value of a ratio is unequal to the ratio of expectations, such ratios are biased with respect to what Daubin and Moran claim to be measuring. Moreover, this treatment explicitly contradicts the contention by Daubin and Moran that the use of Ka/Ks controls for potential variation in mutation rates among genes. This diverse combination of conceptual and analytical problems raises enough questions about statistical validity and biological meaning to render the indices of Daubin and Moran uninterpretable.
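The statistical point about ratios can be made concrete with a toy calculation. The per-gene Ka and Ks values below are invented purely for illustration and are not taken from (2); the function names are likewise hypothetical.

```python
from statistics import mean


def ratio_of_means(ka, ks):
    """Average Ka divided by average Ks across genes."""
    if len(ka) != len(ks):
        raise ValueError("ka and ks must have the same length")
    return mean(ka) / mean(ks)


def mean_of_ratios(ka, ks):
    """Average of the per-gene Ka/Ks ratios."""
    if len(ka) != len(ks):
        raise ValueError("ka and ks must have the same length")
    return mean(a / s for a, s in zip(ka, ks))


if __name__ == "__main__":
    # Hypothetical per-gene estimates (illustration only).
    ka = [0.01, 0.02, 0.10]
    ks = [0.10, 0.40, 0.50]
    print("ratio of means:", ratio_of_means(ka, ks))
    print("mean of ratios:", mean_of_ratios(ka, ks))
```

The two summaries disagree for the same data because the ratio of means weights each gene by its Ks. Genes with high silent-site divergence dominate, so differences in mutation rate among genes are not controlled for.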

Finally, Daubin and Moran imply that prokaryotes harbor an endogenous mutational deletion bias that keeps genomes streamlined without the need for direct selection. This assumption is crucial to their line of thinking; Mira et al. (10) have argued that there is no observable selective cost of genome size in prokaryotes. However, the references cited by Daubin and Moran do not actually consider the distribution of mutational effects, but rather simply report on the substitution process. Thus, as in the case of codon bias, they have failed to separate mutational from selective effects. Although some of the cited studies involve divergence of pseudogenes, it is risky to assume that mutational changes in such sequences are entirely neutral. In species with large enough Ne to promote codon bias, one cannot rule out the ability of natural selection to promote deletions over insertions in otherwise selectively neutral sequence. It is, moreover, becoming increasingly clear that numerous pseudogene sequences in a variety of species have major functional consequences (31–33). The one study that has provided an unbiased estimate of the mutational size spectrum suggests a substantial excess of insertion relative to deletion mutations (34).

Although many questions regarding microbial genomic evolution remain unanswered, the idea that eukaryotes have experienced reductions in Ne no longer seems to be in doubt. Individual estimates of Ne may be inaccurate, but the general patterns are clear and, because of the intrinsic biases discussed above, are likely to be even more pronounced than what we have suggested. Contrary to the assertion of Daubin and Moran, we do not suggest that random genetic drift is absent in prokaryotes; indeed, given the low estimates of prokaryotic Ne relative to their enormous absolute population sizes, we are suggesting quite the opposite. Although the data in our study and a substantial body of theoretical work support the idea that an increase in the relative power of random genetic drift from microbes to multicellular species induces radical differences in the types of genomic evolution that can proceed, we see no compelling reason to abandon the idea that the same principles of population genetics guide genomic evolution in all prokaryotes and eukaryotes.

Supporting Online Material
