Special Perspectives

Selection on Major Components of Angiosperm Genomes


Science  25 Apr 2008:
Vol. 320, Issue 5875, pp. 484-486
DOI: 10.1126/science.1153586


Angiosperms are a relatively recent evolutionary innovation, but their genome sizes have diversified remarkably since their origin, at a rate beyond that of most other taxa. Genome size is often correlated with plant growth and ecology, and extremely large genomes may be limited both ecologically and evolutionarily. Yet the relationship between genome size and natural selection remains poorly understood. Because large genomes have manifold cellular and physiological effects, selection may act not only on genome size itself but also on the major components that contribute to it, such as transposable elements and gene duplication. To understand the nature of selection on these genomic components, both population-genetic and comparative approaches are needed.

Flowering plants are relative newcomers to the evolutionary stage, appearing for the first time 150 to 200 million years ago. Angiosperms have since radiated across the globe, quickly becoming a dominant life form on the planet. Mirroring their rapid diversification, the size of angiosperm genomes has changed rapidly as well: Higher plants vary ∼2000-fold in genome size, from the 64-Mb genomes of Genlisea (corkscrew plants) (1) to the 124-Gb genomes of Fritillaria (the fritillary lilies) (Fig. 1) (2). Still, the nature of the relationship between genome size and natural selection is not well understood.
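A quick check on the range quoted above: the sketch below computes the fold difference between the smallest and largest genomes and converts both sizes to picograms, the unit used in Fig. 1. It assumes the commonly used conversion of roughly 978 Mb per picogram of DNA. The constant and function names are illustrative only.

```python
# Assumed conversion factor: roughly 0.978e9 base pairs per picogram of DNA.
BP_PER_PG = 0.978e9


def bp_to_pg(bp: float) -> float:
    """Convert a genome size in base pairs to picograms of DNA."""
    return bp / BP_PER_PG


genlisea_bp = 64e6       # 64 Mb (Genlisea), as quoted in the text
fritillaria_bp = 124e9   # 124 Gb (Fritillaria), as quoted in the text

fold_range = fritillaria_bp / genlisea_bp
print(f"fold range: {fold_range:,.0f}")
print(f"Genlisea: {bp_to_pg(genlisea_bp):.3f} pg; "
      f"Fritillaria: {bp_to_pg(fritillaria_bp):.1f} pg")
```

The ratio comes out just under 2000, which is consistent with the ∼2000-fold figure in the text.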

Fig. 1.

(A) Despite being among the most recent of the groups depicted, monocots and eudicots encompass the widest range of genome sizes. Data are from (1, 2). pgDNA, picograms of DNA. (B) Photos depict the flower and metaphase squashes of Genlisea (1) on the left and a triploid Fritillaria (30) on the right. The arrow indicates the dividing Genlisea nucleus. Scale bars represent 10 μm. [Photo credit: Fernando Rivadavia (Genlisea) and Christine Skelmersdale (Fritillaria)]

Genome size correlates with broad-scale patterns of plant biology. Plant species with large genomes tend to have large cells and large seeds, factors that are associated with a number of life-history traits. But plants with large genomes also have lower photosynthetic rates, grow more slowly, and are underrepresented in extreme environments (3). The ecological costs incurred by large genome size have a parallel evolutionary cost: Plant genera with the largest genomes tend to have the lowest species diversity, suggesting that genome size affects speciation rates (3).

Genome size also varies among individuals within a species, and such variation has been linked to selection. Individuals with the same chromosome number can vary by as much as 40% in genome size (4). This intraspecific diversity correlates with environmental clines and growth characteristics (5) and may also respond to indirect selection on other traits (6). However, the mechanisms that connect genome size to phenotype remain unclear. One possibility is that DNA content affects cell volume and the duration of DNA replication, leading to generally lower growth rates. The accrual of DNA may also have functional effects via gene regulation or copy-number variation (5). In any case, the manifold cellular and physiological effects of a larger genome may result in direct selection either on genome size itself or on the major components that contribute to it.

The largest contributor to genome size is repetitive DNA, particularly transposable elements (TEs). Indeed, transposon-derived DNA commonly makes up the majority of a plant genome (Fig. 2). Much has been learned about TEs from genomic sequence data, including their distribution among species, the genomic regions in which they accumulate, the mechanisms by which they are purged from genomes, and their rates of proliferation. These rates can be striking: the rice genome has increased by >2% in size over the past few hundred thousand years because of TE activity alone (7). Individual animal genomes may also be element-rich (8), but the transposon-derived component of plant genomes appears to vary more rapidly.
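To put the rice figure in perspective, a back-of-the-envelope calculation converts a percentage increase into an average number of base pairs gained per year. Only the 2% comes from the text. The other two inputs are assumptions chosen for illustration: a genome size of roughly 400 Mb, and a span of 300,000 years standing in for "a few hundred thousand years".

```python
def average_gain(genome_bp: float, fraction_gained: float,
                 years: float) -> tuple[float, float]:
    """Return (total bp gained, average bp gained per year).

    Assumes the gain is spread evenly over the interval, so any variation
    in TE activity over time is averaged away.
    """
    gained = genome_bp * fraction_gained
    return gained, gained / years


# Illustrative inputs: ~400 Mb genome (assumed), 2% growth (lower bound from
# the text), 300,000 years (assumed stand-in for "a few hundred thousand").
gained_bp, bp_per_year = average_gain(400e6, 0.02, 300_000)
print(f"{gained_bp / 1e6:.1f} Mb gained, about {bp_per_year:.0f} bp per year")
```

Because the text gives >2%, the numbers above are lower bounds under these assumptions.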

Fig. 2.

TE content and gene-family size for representative sets of eukaryote genomes. (A) Genomic TE content [from (8)]. Numbers above each bar represent the percentage of each genome made up of TEs. (B) Percent of the genome made up by gene families of varying sizes (single copy at left to greater than five copies at right) [from (31)].

Given the apparently detrimental consequences of a large genome, the accumulation of transposons is probably deleterious to plant fitness. Many individual transposon insertions, such as those into coding regions, may be strongly deleterious, leading to their rapid loss from the gene pool (9). However, understanding the role of natural selection in shaping transposon diversity ultimately requires a population-genetic approach. In Drosophila, humans, Arabidopsis, and pufferfish, quantitative estimates of the strength of selection from population-genetic data suggest that TE insertions are on average slightly deleterious (10, 11) and are thus expected to be purged from populations by natural selection. But the effectiveness of selection against TEs depends on the composite parameter $N_e s$, which includes not only the strength of selection $s$ but also the effective population size $N_e$. Even if selection is relatively strong, species with low $N_e$ may be unable to prevent transposons from accumulating within their genomes, because genetic drift overwhelms selection when $|N_e s|$ is small. The proliferation of elements within plant genomes may thus reflect low $N_e$ as much as low $s$ (12). Although plant TE population genetics has received surprisingly little study, this approach could go a long way toward illuminating the selective forces acting on plant genomes.
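Kimura's diffusion approximation makes the dependence on $N_e s$ concrete. It gives the fixation probability of a single new mutation in a diploid population. The sketch below assumes the following:

- genic (additive) selection with effect $s$ per copy;
- a census size equal to $N_e$;
- an illustrative coefficient $s = -10^{-4}$ against an insertion.

These values are hypothetical, not estimates from the cited studies.

```python
import math


def fixation_probability(s: float, ne: int) -> float:
    """Kimura's diffusion approximation for the fixation probability of one new
    mutant copy in a diploid population whose census size equals ne.

    Assumes genic (additive) selection with selective effect s per copy;
    s < 0 means the insertion is deleterious.
    """
    p = 1.0 / (2 * ne)          # starting frequency of a single new copy
    if s == 0:
        return p                # neutral case: u equals the starting frequency
    x = -4 * ne * s             # u = (1 - exp(x*p)) / (1 - exp(x))
    if x > 700:                 # strongly deleterious: exp(x) would overflow,
        # so use expm1(x) ~ exp(x) for large x
        return math.expm1(x * p) * math.exp(-x)
    return math.expm1(x * p) / math.expm1(x)


s = -1e-4  # hypothetical selection coefficient against a TE insertion
for ne in (1_000, 100_000):
    u = fixation_probability(s, ne)
    neutral = 1 / (2 * ne)
    print(f"Ne = {ne:>7,}  Ne*s = {ne * s:+.1f}  u / u_neutral = {u / neutral:.3g}")
```

The results differ sharply with population size even though $s$ is the same:

- **Small population ($|N_e s| = 0.1$).** The insertion fixes at roughly 80% of the neutral rate, so drift dominates.
- **Large population ($|N_e s| = 10$).** Fixation is effectively impossible.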

Population-genetic approaches have also been useful for identifying adaptive transposon insertions (13). This and other evidence suggests that TEs are not merely deleterious but also contribute to genome function. In plants, transposons have been domesticated to become functional genes (14), have inserted complete exons into expressed genes (15), and have facilitated the formation of new genes via reverse transcriptase (16). Transposons are also potential sources of cis-regulatory elements and small RNAs. Moreover, TE activity can accelerate the response to selection, presumably by producing genetic variation on which selection can act. One example is selection on bristle number in Drosophila melanogaster: lines carrying P elements responded rapidly to selection, whereas lines without active P elements did not (17). This study suggests that natural or artificial selection on a phenotypic trait could drive correlated increases in transposon activity, in opposition to selection against large genome size.

Gene duplication is another major contributor to plant genome size. The angiosperm genomes sequenced to date contain more gene duplicates than animal genomes (Fig. 2). Much of this duplication is due to polyploidy, which creates complete genetic redundancy by copying every gene in the genome. Many duplicated genes are lost as they accumulate mutations and deletions, but this loss is nonrandom (18). Genes related to transcription, signal transduction, and development are more likely to be retained as duplicates than genes in other functional categories. This biased retention may reflect differences among genes in sensitivity to dosage, with selection acting to maintain proper stoichiometric ratios (18). Retention biases can also be taxon-specific, perhaps explaining the high abundance of proteins involved in aroma production in grapes (19).
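Claims that some functional categories are "more likely to be retained as duplicates" are commonly evaluated with enrichment tests. One such test is a one-sided Fisher's exact test, equivalently the upper tail of the hypergeometric distribution, sketched below. The gene counts are invented for illustration and do not come from reference (18).

```python
from math import comb


def retention_enrichment_p(retained_in_cat: int, cat_size: int,
                           retained_total: int, genes_total: int) -> float:
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Probability of observing at least `retained_in_cat` retained duplicates in a
    category of `cat_size` genes, if `retained_total` of `genes_total` genes were
    retained at random with respect to category.
    """
    upper = min(cat_size, retained_total)
    tail = sum(
        comb(cat_size, k) * comb(genes_total - cat_size, retained_total - k)
        for k in range(retained_in_cat, upper + 1)
    )
    # Exact integer arithmetic until the final division.
    return tail / comb(genes_total, retained_total)


# Hypothetical counts: 2,000 genes duplicated by a polyploidy event, 500 still
# retained in duplicate; 300 are transcription-related, 120 of them retained
# (75 would be expected if retention ignored function).
p = retention_enrichment_p(120, 300, 500, 2000)
print(f"one-sided p = {p:.2e}")
```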

Tandem duplication is another major source of duplicate genes. Tandemly duplicated genes represent ∼15% of genes in angiosperm genomes (20); curiously, this proportion closely mirrors the 10 to 17% range of tandem duplicates found across animal genomes (21). Four lines of evidence suggest that tandemly duplicated genes are subject to different selection pressures than genes duplicated by polyploidy. (i) Tandem events tend to duplicate only one component of a genetic network, as opposed to entire networks. (ii) Tandem duplicates are biased toward a different set of genes: they are overrepresented among genes encoding membrane proteins and genes that respond to abiotic stimuli (20). These genes tend to act at the ends of biosynthetic pathways, suggesting that tandem duplicates are retained more readily if they do not perturb key branch points of networks. (iii) Differences between tandem and polyploid duplicates extend to gene expression: tandem duplicates diverge more rapidly in expression (22). (iv) Tandem duplication events are ongoing and common. It has been conservatively estimated that 1 in ∼700 Arabidopsis thaliana seeds carries a copy-number variant caused by unequal crossing over between tandem duplicates (23).
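Simple arithmetic turns the Arabidopsis estimate into expected counts. The sketch below assumes that each seed independently carries a new tandem-duplication copy-number variant with probability 1/700, the rate quoted above. The seed numbers are hypothetical.

```python
def cnv_expectations(n_seeds: int, rate: float = 1 / 700) -> tuple[float, float]:
    """Expected number of seeds carrying a new CNV, and the probability that at
    least one does, assuming seeds are independent with per-seed rate `rate`."""
    expected = n_seeds * rate
    p_at_least_one = 1 - (1 - rate) ** n_seeds
    return expected, p_at_least_one


for n in (100, 700, 5_000):  # hypothetical seed numbers per plant
    expected, p_any = cnv_expectations(n)
    print(f"{n:>5} seeds: expected {expected:.2f}, P(at least one) = {p_any:.3f}")
```

Because the published estimate is described as conservative, the true values may be higher.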

Tandem duplication is common enough to occur on an ecological time scale and may thus be a particularly potent source of genetic innovation for local adaptation. Tandem duplications have been shown to mediate boron tolerance in barley (24), submergence tolerance in rice (25), and diversification of secondary metabolites in A. thaliana (26). However, little is yet known about the extent of copy-number variation in plants, the role of copy-number variants in local adaptation, or their contribution to genome-size variation among individuals. Careful characterization in humans indicates that up to 12% of the genome may vary in copy number (27). Even this impressive figure is unlikely to account for the 40% difference in genome size among some plant populations.

Our discussion underscores the need for genome-wide assessments of all types of genetic variation (including nucleotide polymorphism, copy-number variation, and TEs) at the population level. Such information is a necessary precursor both for characterizing recent selection on plant genomes and for understanding the mechanisms that contribute to genome-size variation. Population genomic data can address the relative strengths of purifying, balancing, and directional selection. They can also reveal the genomic components that contribute to adaptation and identify genes that have been targets of selection. These diversity assessments will eventually need to include explicit multipopulation sampling so that signatures of local adaptation can be detected. Thus far, the only genome-level polymorphism surveys in plants have targeted A. thaliana (28, 29). These surveys have yielded insights about selection on coding regions and revealed unexpected trends (such as high levels of diversity in genes that mediate interactions with the biotic environment). Unfortunately, technological limitations inherent to these studies have prevented a comprehensive assessment of the relative frequency and amount of copy-number versus transposon polymorphism. These important components of genome size and function therefore remain poorly characterized.
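Tajima's D is one example of how polymorphism data bear on the mode of selection. It compares two estimators of diversity: the mean number of pairwise differences and the number of segregating sites.

- **Negative values** can reflect purifying selection against weakly deleterious variants, or a recent selective sweep.
- **Positive values** can reflect balancing selection.

Demographic history can produce the same signatures, which is why demographic models matter for interpretation. The sketch assumes aligned, equal-length sequences without gaps or missing data, and the toy alignments are hypothetical.

```python
import math
from itertools import combinations


def tajimas_d(seqs: list[str]) -> float:
    """Tajima's D for aligned, equal-length sequences (no gaps or missing data)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences")  # variance terms vanish below 4
    length = len(seqs[0])
    # Number of segregating (polymorphic) sites.
    s = sum(1 for i in range(length) if len({seq[i] for seq in seqs}) > 1)
    if s == 0:
        return float("nan")  # D is undefined without polymorphism
    # Mean number of pairwise differences (theta_pi).
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


rare = ["AAAA", "AAAA", "AAAA", "TTTT"]          # variants at low frequency
intermediate = ["AAAA", "AAAA", "TTTT", "TTTT"]  # variants at intermediate frequency
print(f"{tajimas_d(rare):+.2f}  {tajimas_d(intermediate):+.2f}")
```

In real surveys, D is usually computed per gene or genomic window. Multipopulation sampling allows the same statistic to be compared across populations.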

Population genomic data provide information about recent selection, but interspecies comparisons may uncover selection manifested over longer time periods. Yet there has been shockingly little comparative analysis of plant genomes, owing to the substantial evolutionary distances among the four angiosperm genome sequences published to date. This gap highlights the need for dense sequencing within recently diverged clades. For example, the A. lyrata and sorghum genome sequences will provide fitting contrasts to those of A. thaliana and maize, respectively. These contrasts will yield information about the type and strength of selection on coding regions, the molecular-evolutionary patterns that characterize species divergence, and the basic dynamics of plant genome evolution. We do not yet know, for example, whether plant genomes contain large, conserved intergenic regions like those of animals, or whether such regions constrain the lower limit of genome size. Similarly, we do not have a clear picture of the evolution of gene complement, particularly over modest (intrafamilial) evolutionary distances. Comparative data will also facilitate a broader understanding of the dynamics of gene duplication and TE accumulation.
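One standard measure of the type and strength of selection on coding regions in such contrasts is the ratio of nonsynonymous to synonymous divergence, $d_N/d_S$. The sketch below assumes that site and difference counts for an aligned ortholog pair have already been tallied, for example with the Nei–Gojobori counting method. It then applies the Jukes-Cantor correction. The counts are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance from a proportion of differing sites."""
    if p >= 0.75:
        raise ValueError("proportion too high: Jukes-Cantor distance is undefined")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(n_sites: float, s_sites: float, n_diffs: float, s_diffs: float) -> float:
    """omega = dN/dS from precomputed nonsynonymous/synonymous site and difference counts."""
    d_n = jukes_cantor(n_diffs / n_sites)
    d_s = jukes_cantor(s_diffs / s_sites)
    if d_s == 0:
        raise ValueError("no synonymous divergence: dN/dS is undefined")
    return d_n / d_s


# Hypothetical ortholog pair: 750 nonsynonymous and 250 synonymous sites,
# with 15 nonsynonymous and 25 synonymous differences.
omega = dn_ds(750, 250, 15, 25)
print(f"dN/dS = {omega:.2f}")  # values well below 1 suggest purifying selection
```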

Nevertheless, additional comparative and population-genetic data alone will not yield a complete understanding of selection on plant genomes or of the processes that govern genome-size variation. First, there is a pressing need for theoretical advances that provide a conceptual framework for interpreting polymorphism data, especially in the context of demographic change in structured populations. Similarly, the population-genetic theory of gene duplication is in its infancy, as is our understanding of whether standing genetic variation commonly contributes to adaptation. In addition, we need to better understand biological factors that affect the process of selection but are usually not included in molecular-evolutionary or population-genetic models, such as paramutation, methylation, epistasis, and gene conversion. Finally, inferences about selection must be complemented with functional assays, particularly if the goal is to correctly identify the genetic variants that have been targeted by selection. Given the need for additional data and theoretical models, it is clear that we are only beginning to understand the complex interplay among phenotypic diversity, genome size, and natural selection.

References and Notes
