Perspective: Genomics

Gene Duplication and Evolution


Science  09 Aug 2002:
Vol. 297, Issue 5583, pp. 945-947
DOI: 10.1126/science.1075472

In the transmission of genetic material from parent to offspring, accidents occasionally happen. Such accidents may result in the duplication of a chromosomal segment that then becomes separated from the original segment, ending up in a different chromosomal location. A number of human genetic disorders are known to be associated with the increased expression of genes contained within such duplications. However, evolutionary biologists have long been enthralled by the idea that duplicate genes could provide the ultimate substrate on which evolution could work. There are two ways in which gene duplication could generate a substrate suitable for adaptive evolution. Either one member of the duplicate gene pair could take on a new function, or the two duplicate genes could divide the multiple functions of the ancestral gene between them, with natural selection then refining each copy to a more restricted set of tasks (1). On page 1003 of this issue, Bailey and colleagues (2) contribute to our understanding of gene duplication by calculating the number of segmental duplications in the human genome.

The most common fate of duplicate genes appears to be the simple silencing of one member of the pair. The average time before silencing of one duplicate gene pair member is ∼4 million years in animals (3, 4). By restoring the content of a genome to its original state, the silencing of duplicate genes has little direct effect on adaptive evolution. But, because either the ancestral or the descendant copy can be silenced, recurrent duplication of genes at unlinked chromosomal locations can passively give rise to small-scale chromosomal rearrangements (5, 6). When combined with geographical isolation, these small-scale gene rearrangements may contribute to the emergence of new reproductively isolated species.

Consider a pair of unlinked copies of an essential gene in an ancestral species. If, because of functional redundancy, one random member of the duplicate gene pair is destined to become silenced in each population, there is a 50% chance that different copies will be silenced in two geographically isolated populations, thereby resulting in different chromosomal locations for the gene (see the figure). The contribution of this process to the evolution of genetic incompatibility between the two populations depends on the rate at which gene duplication takes place. Thus, it is noteworthy that all recently characterized eukaryotic genomes harbor substantial numbers of very young gene duplicates, many showing less divergence than gene copies (alleles) at the same chromosomal position (locus).

Divergent silencing of duplicate genes in sister species.

The ancestral species may acquire a duplication of an essential gene (represented by the black bars on two pairs of parental chromosomes). Either copy of the duplicate gene pair (white bars) may be randomly lost in the two descendant populations. In this case, the first-generation (F1) hybrid progeny of these two populations will contain two “absentee” alleles, one at each chromosomal locus. As a consequence of independent assortment, 25% of the gametes produced by these individuals will be entirely lacking an active copy of the original ancestral gene.
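
The two probabilities in this scenario (a 50% chance of divergent silencing, and 25% of F1 gametes lacking an active copy) can be checked by brute-force enumeration. The sketch below is only an illustration: the locus labels `A` and `B` and the `+`/`-` allele symbols are invented here, and it assumes two unlinked loci, random silencing of one copy per population, and independent assortment.

```python
from itertools import product

# Two unlinked loci, "A" and "B" (hypothetical labels), each carrying one
# copy of the essential gene after the duplication. Each isolated population
# silences one of the two copies at random.
outcomes = list(product("AB", repeat=2))  # (silenced in pop 1, silenced in pop 2)
divergent = [o for o in outcomes if o[0] != o[1]]
p_divergent = len(divergent) / len(outcomes)

# F1 hybrid after divergent silencing: population 1 contributes an active
# copy at A and a null at B; population 2 contributes the reverse.
# "+" = active allele, "-" = silenced ("absentee") allele.
f1_genotype = {"A": ("+", "-"), "B": ("-", "+")}

# Independent assortment: a gamete draws one allele per locus, each equally likely.
gametes = list(product(f1_genotype["A"], f1_genotype["B"]))
null_gametes = [g for g in gametes if g == ("-", "-")]
p_null = len(null_gametes) / len(gametes)

print(f"P(divergent silencing) = {p_divergent:.2f}")
print(f"P(gamete lacks any active copy) = {p_null:.2f}")
```

Under these assumptions the enumeration reproduces the 50% and 25% figures given in the text.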

By applying demographic principles and genome sequence analysis to species-wide surveys of duplicate genes, scientists have calculated an average duplication rate of ∼1% per gene per million years (3). This estimate holds up under a reanalysis of additional and better curated genome sequences (4). Using a new statistical approach to infer the presence of duplicated regions in pools of random human genome sequences, Bailey et al. estimate that at least 5% of the human genome consists of segmental duplications. They calculate that the span of the segmental duplications in the human genome ranges from tens to hundreds of kilobases, with >90% identity at the nucleotide level between ancestral and duplicated segments. Considering that roughly 50% of human gene duplicates have been silenced by the time the duplicated regions have diverged by 5% (3, 7), one can estimate that the high incidence of young gene duplicates in the human genome requires a duplication rate of at least 0.4% per gene per million years.

Given that DNA breakpoints occur randomly during the duplication process, some duplicated genes may lack key regulatory elements and/or regions of coding DNA. Thus, the rate of origin of functional gene duplicates may be somewhat less than 1% per gene per million years. Nevertheless, Bailey and colleagues show that a substantial fraction of human segmental duplications contain expressed genes, and another study suggests that at least 25% of the identifiable duplicate genes in the worm Caenorhabditis elegans are apparently still functional (8). So, we can now be fairly confident that many, if not all, eukaryotic genomes are subject to ongoing stochastic turnover of functional genes resulting from the origin and loss of small duplicated regions. The dynamics of this birth-death process are sufficiently pronounced that a random subset of 1 to 10% of the genes in a typical eukaryote are predicted to be in transient coexistence with their duplicates at any one time.
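
As a rough plausibility check on the 1 to 10% range, one can treat duplicate turnover as a simple steady-state birth-death process, in which the fraction of genes coexisting with a duplicate is approximately the duplication rate multiplied by the mean time a duplicate persists. This steady-state shortcut is an assumption made here for illustration, not a method stated in the cited studies. The inputs are the rates quoted above (0.4% and ∼1% per gene per million years) and the ∼4 million year average time to silencing.

```python
def coexisting_fraction(dup_rate_per_my: float, mean_lifetime_my: float) -> float:
    """Steady-state fraction of genes with a surviving duplicate.

    Assumes a simple birth-death balance (births per gene per million years
    times mean lifetime in million years); valid only when the result is small.
    """
    return dup_rate_per_my * mean_lifetime_my


MEAN_LIFETIME_MY = 4.0  # average time before silencing, from the text

low = coexisting_fraction(0.004, MEAN_LIFETIME_MY)   # lower-bound human rate
high = coexisting_fraction(0.01, MEAN_LIFETIME_MY)   # ~1% per gene per My

print(f"low estimate:  {low:.1%}")
print(f"high estimate: {high:.1%}")
```

Both values fall inside the predicted 1 to 10% window, although the agreement depends entirely on the simplifying assumptions named above.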

How does this level of genomic traffic affect the passive origin of incompatibilities between the genomes of different species? Let's be conservative and assume that only 0.1% of the genes in an ancestral species are in a duplicated state and subject to divergent silencing (that is, one member of the pair becomes silenced in one lineage, and its opposite becomes silenced in the other lineage). In this case, a genome containing ∼20,000 genes (which approximates the number in many multicellular eukaryotes) would harbor the potential for ∼10 changes in gene location (map changes) per genome over a period of a few million years. The continuous influx of new duplications keeps the process going, and with a duplication rate of only 0.1% per gene per million years, ∼10 additional map changes are expected to accrue per population per million years of divergence. The consequences of these microchromosomal rearrangements for reproductive fitness (viability and fertility) of hybrid progeny depend on a number of factors, including the extent to which “absentee” alleles influence hybrid fitness, the degree of redundancy elsewhere in the genome, the involvement of autosomes versus sex chromosomes, and the level of gene expression in gametes (6). The salient point is that divergent silencing of just one pair of duplicates distributed on different autosomes results in the complete absence of a functional allele in 25% of the gametes produced by a first-generation (F1) individual (see the figure). And with just 10 divergently resolved loci, 94% of F1 gametes will carry null genes at one or more loci. These conservative calculations, extrapolated from single-species analyses, suggest that chromosomal repatterning driven by gene duplication may be involved in the passive origin of isolating barriers on a time scale relevant to speciation (9).
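
The arithmetic behind these figures can be sketched in a few lines. The factor of 0.5 that turns 20 duplicated genes into ∼10 map changes is an assumption: it may reflect the 50% chance of divergent silencing, or the counting of duplicates as pairs, and the text does not say which. The function names are illustrative only.

```python
def expected_map_changes(n_genes: int, frac_duplicated: float,
                         resolution_factor: float = 0.5) -> float:
    """Expected number of gene-location changes.

    resolution_factor = 0.5 is an assumed conversion from duplicated genes
    to map changes; the source gives only the end result (~10).
    """
    return n_genes * frac_duplicated * resolution_factor


def frac_gametes_with_null(n_pairs: int, p_null_per_pair: float = 0.25) -> float:
    """Fraction of F1 gametes lacking an active copy for at least one of
    n_pairs divergently resolved, independently assorting duplicate pairs."""
    return 1.0 - (1.0 - p_null_per_pair) ** n_pairs


changes = expected_map_changes(n_genes=20_000, frac_duplicated=0.001)
one_pair = frac_gametes_with_null(1)
ten_pairs = frac_gametes_with_null(10)

print(f"expected map changes: {changes:.0f}")
print(f"null gametes, 1 pair:   {one_pair:.0%}")
print(f"null gametes, 10 pairs: {ten_pairs:.0%}")
```

The second function is the complement of every pair transmitting an active copy: with a 75% chance per pair, ten independent pairs give 1 − 0.75^10, or about 94%.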

More direct insight into the rate of microchromosomal repatterning will require fine-scale comparisons of closely related species. An impressive start in this direction has been made by Coghlan and Wolfe (10). By comparing the genomic sequences of C. elegans and its relative C. briggsae, they estimate that 4030 rearrangements have occurred over ∼80 million years, implying ∼25 rearrangements per genome per million years. Contributing to the total pool of rearrangements (in a 1:1:2 ratio) are reciprocal exchanges between two genomic locations, local inversions, and transpositions from one location to another. The vast majority of genomic rearrangements are quite small, involving segments that include five or fewer genes. Because the gradual replacement of one member of a duplicate gene pair by another does not impose a bottleneck in fitness on a population, many of these structural genomic changes may have arisen through an initial phase of gene duplication followed by loss of the ancestral locus. The potential for this process is quite high in C. elegans. With an estimated gene duplication rate of ∼1.6% per gene per million years (3), this species experiences ∼250 duplication events per genome per million years. Moreover, consistent with the types of rearrangements observed, a large fraction of the intrachromosomal duplications in C. elegans are inverted, and most duplication segments contain only one or two genes (11). Perhaps the high level of microchromosomal repatterning is peculiar to the C. elegans genome. However, extensive comparative genome mapping among the grasses paints a picture similar to that in worms (12), and an intermediate amount of small-scale rearrangement is apparent in the Drosophila genome (13).
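
The rates quoted for the two worm species can be put side by side with a short calculation. Dividing the 4030 rearrangements across two diverging lineages is an assumption about how the ∼25 per million years figure was reached; the other numbers are taken directly from the text.

```python
TOTAL_REARRANGEMENTS = 4030   # C. elegans vs. C. briggsae
DIVERGENCE_TIME_MY = 80       # approximate time since divergence
N_LINEAGES = 2                # assumption: changes accrue along both lineages

per_lineage_rate = TOTAL_REARRANGEMENTS / (DIVERGENCE_TIME_MY * N_LINEAGES)

# The 1:1:2 ratio of reciprocal exchanges, inversions, and transpositions,
# expressed as shares of the total.
ratio = {"reciprocal exchange": 1, "inversion": 1, "transposition": 2}
total_parts = sum(ratio.values())
shares = {kind: parts / total_parts for kind, parts in ratio.items()}

# Duplication events per genome per million years, as quoted in the text.
DUPLICATIONS_PER_MY = 250
duplications_per_rearrangement = DUPLICATIONS_PER_MY / per_lineage_rate

print(f"rearrangements per lineage per My: {per_lineage_rate:.1f}")
for kind, share in shares.items():
    print(f"  {kind}: {share:.0%}")
print(f"duplications per fixed rearrangement: {duplications_per_rearrangement:.1f}")
```

On these figures, duplication events outnumber fixed rearrangements roughly tenfold, so even a modest fraction of duplications ending in loss of the ancestral locus could account for the observed repatterning.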

In the early days of speciation genetics, models of chromosomal rearrangements were popular, particularly among plant biologists. However, recent studies (mostly in Drosophila) implicate changes within interacting genes that could be potential sources of genomic incompatibility between species (14). This shift in emphasis was motivated in part by the lack of obvious structural genomic rearrangements between many pairs of reproductively isolated species, as well as by perceived barriers to the full establishment of changes that cause meiotic problems in individuals heterozygous for large-scale chromosomal rearrangements. However, recent findings including those of Bailey et al. demonstrate how misleading large-scale chromosomal analyses can be. In terms of abundance, microchromosomal rearrangements are far more common than previously imagined, vastly outnumbering the macrochromosomal alterations favored by earlier studies. Containing only one or a few genes, such small-scale rearrangements are generally far too tiny to be revealed by visual inspection of chromosomes, and they may cause few, if any, problems during gamete production (meiosis).

Chromosomal rearrangements and nucleotide substitutions are by no means mutually exclusive contributors to the origin of reproductive isolation and the formation of new species. Indeed, the case has been made that chromosomal rearrangements can facilitate gene divergence (15), and the preservation of one member of a pair of duplicates by positive selection for a new function can promote the origin of changes in chromosomal location (16). Undoubtedly, some phylogenetic lineages will harbor genomic architectures that make them much more prone to speciation by genomic rearrangement than others (6), but at least one fundamental issue is now a bit clearer. With a rate of origin of duplicate genes on the order of 1% per gene per million years, the segmental duplication-rearrangement process is on a par with adaptive nucleotide changes within genes (genic evolution) as a mechanism for the origin of species incompatibilities. Consider that the average gene contains ∼1000 nucleotides, that the mutation rate is ∼0.1% per nucleotide per million years, and that the vast majority of mutations are either neutral or deleterious. In this case, the rate of nucleotide sequence changes per gene relevant to reproductive isolation and evolutionary adaptation could easily be less than the rate of gene duplication. Moreover, duplicated segments of DNA, even those not containing functional genes, contribute to chromosomal rearrangements in an indirect way—by serving as sites for nonhomologous recombination, thereby promoting secondary rearrangements (2, 7).
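
The comparison between genic evolution and gene duplication comes down to a break-even fraction, sketched below with the round numbers from the text. The variable names are illustrative, and the calculation ignores differences in fixation probability, so it is an order-of-magnitude guide at best.

```python
GENE_LENGTH_NT = 1000        # average gene length, nucleotides
MUTATION_RATE = 0.001        # per nucleotide per million years (~0.1%)
DUPLICATION_RATE = 0.01      # per gene per million years (~1%)

# Total mutations arising per gene per million years.
mutations_per_gene_per_my = GENE_LENGTH_NT * MUTATION_RATE

# Fraction of those mutations that would have to matter for adaptation or
# reproductive isolation for genic change to keep pace with duplication.
break_even_fraction = DUPLICATION_RATE / mutations_per_gene_per_my

print(f"mutations per gene per My: {mutations_per_gene_per_my:.2f}")
print(f"break-even relevant fraction: {break_even_fraction:.1%}")
```

A gene accumulates about one mutation per million years, so if fewer than roughly 1% of mutations are relevant, with the rest neutral or deleterious, the rate of relevant sequence change per gene falls below the duplication rate.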

The recent observations of genome researchers raise two compelling issues for evolutionary biologists to ponder. First, discriminating between models of speciation that invoke negative epistatic interactions among genomes and those that invoke microchromosomal rearrangements induced by gene duplication will require attention to the finest of details, at scales of less than a few kilobases (the size of a typical rearrangement). Second, although the two big engines of evolution, adaptation and speciation, may be studied in isolation, they are frequently interconnected through a single mechanism: gene duplication.

References and Notes
