Chromosomal Speciation and Molecular Divergence: Accelerated Evolution in Rearranged Chromosomes


Science  11 Apr 2003:
Vol. 300, Issue 5617, pp. 321-324
DOI: 10.1126/science.1080600


Humans and their closest evolutionary relatives, the chimpanzees, differ in ∼1.24% of their genomic DNA sequences. The fraction of these changes accumulated during the speciation processes that have separated the two lineages may be of special relevance in understanding the basis of their differences. We analyzed human and chimpanzee sequence data to search for the patterns of divergence and polymorphism predicted by a theoretical model of speciation. According to the model, positively selected changes should accumulate in chromosomes that present fixed structural differences, such as inversions, between the two species. Protein evolution was more than 2.2 times faster in chromosomes that had undergone structural rearrangements compared with colinear chromosomes. Also, nucleotide variability is slightly lower in rearranged chromosomes. These patterns of divergence and polymorphism may be, at least in part, the molecular footprint of speciation events in the human and chimpanzee lineages.

If speciation processes have left any molecular footprints, detecting them could not only shed light on speciation along the human lineage but also help identify the specific genomic regions responsible for the separation of humans from other primates, bringing us closer to the genetic differences that may underlie our morphological, behavioral, and cognitive differences. The role of chromosomal rearrangements in speciation is particularly well supported by several lines of evidence (1–6). Classical models of chromosomal speciation state that, because heterozygous individuals are partly sterile (i.e., underdominant), chromosomal changes act as genetic barriers to gene flow between populations fixed for different arrangements (1, 3) and thus facilitate reproductive isolation. However, these models are burdened by a paradox that renders them weak and unconvincing (2): if underdominance were strong, new rearrangements would be very unlikely to become established; if, on the other hand, underdominance were weak enough for fixation to be likely, chromosomal rearrangements would be very poor barriers to gene flow and thus unlikely to contribute to speciation. Recently, a new class of models has been proposed in which chromosomal changes are strong genetic barriers because they reduce recombination in heterokaryotypes, not because of underdominance (3, 4, 6, 7). Such strong barriers would facilitate divergence in the rearranged region while the diverging populations are in parapatry, i.e., have limited gene flow. Their effects would be especially pronounced if divergence proceeds through the accumulation of incompatible alleles, as proposed by Bateson, Dobzhansky, and Muller (8). The spread of a new favorable allele through a species will be delayed if it is linked to a chromosomal difference that is already established within that species. 
This gives different alleles time to spread through the rest of the species range; if these alleles are incompatible with each other, they will be trapped at the chromosomal barrier and will strengthen reproductive isolation. In contrast, favorable alleles that are not impeded by the barrier will spread quickly through the whole species, making parapatric divergence less likely. The accumulation of incompatibilities facilitated by chromosomal differences generates genetic barriers of growing strength that eventually produce complete reproductive isolation and, therefore, speciation (fig. S1). This accumulation of positively selected alleles in chromosomes carrying rearrangements underlies the key prediction of this and similar models of chromosomal speciation (3): the molecular signature of positive selection should be stronger in chromosomes carrying rearrangements than in colinear chromosomes (7).

Within-population polymorphism for rearrangements has rarely been described in mammals. However, parapatric coexistence of different arrangements is common (1) and has been described, in particular, in Pongo (9) and other primates (10). Human and chimpanzee karyotypes are highly homologous, but some major chromosomal changes differentiating the two species can be detected in banded metaphase chromosomes. Human chromosomes 1, 4, 5, 9, 12, 15, 16, 17, and 18 are separated from their chimpanzee homologs by pericentric inversions, and human chromosome 2 is the result of the tandem fusion of two acrocentric chromosomes that are still separate in the other great apes (11, 12). If one or more of these rearrangements were polymorphic while the human and chimpanzee lineages were diverging, or during the successive speciation events that led to modern humans and chimpanzees, and if speciation took place along the lines proposed in the model, we might be able to detect the molecular signature of positive selection on rearranged chromosomes. Previous work on the X-4 fusion in Drosophila americana provides evidence that selectively favorable variants can accumulate on rearranged chromosomal segments in a natural population (13).

The rate of protein evolution of a gene can be used to uncover the footprints of past positive selection (14). It is measured as the rate of nonsynonymous nucleotide substitution per nonsynonymous site ($K_A$) relative to the underlying neutral mutation rate, which is given by the rate of synonymous substitution per synonymous site ($K_S$). Most amino acid changes are deleterious and are removed by purifying selection (15); thus, $K_A < K_S$. However, if positive selection is strong enough, $K_A$ may increase and even exceed $K_S$. Therefore, $K_A/K_S$ ratios larger than 1 suggest positive selection.
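As an illustration of the decision rule just described, the following Python sketch computes $K_A/K_S$ from precomputed divergence values and flags ratios above 1. The function names are hypothetical, and estimating $K_A$ and $K_S$ themselves (done in this study with Li's method in DAMBE) is outside the sketch. Genes with $K_S = 0$ and $K_A > 0$ are treated as having a ratio above 1, an assumption that mirrors the convention stated in the Table 1 legend.

```python
def ka_ks_ratio(ka, ks):
    """Return K_A/K_S, or None when K_S == 0 and the ratio is undefined."""
    if ks == 0:
        return None
    return ka / ks


def suggests_positive_selection(ka, ks):
    """Flag a gene whose K_A/K_S exceeds 1.

    Assumption (following the Table 1 legend): a gene with K_S == 0 but
    K_A > 0 counts as K_A/K_S > 1; a gene with no divergence is not flagged.
    """
    ratio = ka_ks_ratio(ka, ks)
    if ratio is None:
        return ka > 0
    return ratio > 1


# Hypothetical divergence values (percent); only their ratio matters.
print(suggests_positive_selection(1.2, 0.8))    # True
print(suggests_positive_selection(0.76, 1.53))  # False
```

Returning `None` rather than `float("inf")` keeps undefined ratios out of any averaging step, which is why such genes had to be excluded from most analyses.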

We gathered human-chimpanzee divergence measures for the coding regions of 115 annotated autosomal genes (16). We calculated $K_S$ and $K_A$ values for these genes by the method of Li (17), as implemented in the program DAMBE (18). The cytological position of every gene in the human genome was ascertained with OMIM and MapViewer. Chromosomes were classified as rearranged or colinear according to the presence or absence of chromosomal rearrangements (11, 12).
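The classification step can be sketched as a lookup on the chromosome number parsed from each gene's cytological position. This is a minimal Python sketch assuming positions are written like `17q21.3`; the helper names and the parsing rule are hypothetical, and the set of rearranged chromosomes is the one listed earlier in the text (11, 12).

```python
import re

# Human autosomes listed in the text as differing from their chimpanzee
# homologs: pericentric inversions on 1, 4, 5, 9, 12, 15, 16, 17 and 18,
# plus chromosome 2, which arose by fusion.
REARRANGED = frozenset({1, 2, 4, 5, 9, 12, 15, 16, 17, 18})


def chromosome_of(cytoband):
    """Extract the autosome number from a position such as '17q21.3'.

    Hypothetical helper: assumes the string starts with the chromosome
    number followed by the arm ('p' or 'q').
    """
    match = re.match(r"(\d+)[pq]", cytoband)
    if match is None:
        raise ValueError(f"unrecognized cytological position: {cytoband!r}")
    return int(match.group(1))


def chromosome_class(cytoband):
    """Return 'rearranged' or 'colinear' for a gene's cytological position."""
    chrom = chromosome_of(cytoband)
    if not 1 <= chrom <= 22:
        raise ValueError("only autosomes 1-22 are classified")
    return "rearranged" if chrom in REARRANGED else "colinear"


print(chromosome_class("17q21.3"))  # rearranged
print(chromosome_class("7p13"))     # colinear
```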

The $K_S$ and $K_A$ values of the 115 genes are listed in table S1. Both the average $K_S$ (1.53%) and the average $K_A$ (0.76%) were within the range of previous measures of coding-sequence divergence (19). They were somewhat high because this data set was compiled to examine $K_A/K_S$ ratios, so genes with no divergence had to be excluded. The average $K_A/K_S$ ratio was 0.61. This figure is based on only 108 genes: seven genes with $K_S$ = 0% were excluded because their $K_A/K_S$ ratio is infinite, which also precluded their use in most of our analyses. The full data set, however, was used in a simple test. We built a contingency table by classifying genes according to two criteria: their cytological position (genes mapping to rearranged and to colinear chromosomes falling in different classes) and their $K_A/K_S$ ratio (with separate classes for genes with $K_A/K_S$ > 1 and genes with $K_A/K_S$ ≤ 1). Genes with $K_A/K_S$ > 1 tended to cluster in rearranged chromosomes [Table 1, all genes (top), G test, P = 0.0024]. This pattern was also visible when only the 108 genes with $K_S$ > 0% were considered [Table 1 (bottom), G test, P = 0.0043].
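The 2 × 2 contingency test can be sketched as follows. The counts in Table 1 are not reproduced in this text, so the example counts are hypothetical. The text also does not say whether a correction (such as Williams') was applied to G; this sketch uses the uncorrected statistic with 1 degree of freedom, so it should not be expected to reproduce the reported P values exactly.

```python
import math


def g_test_2x2(table):
    """Uncorrected G test of independence for a 2x2 table of counts.

    table[0] = [rearranged & ratio > 1, rearranged & ratio <= 1]
    table[1] = [colinear & ratio > 1,   colinear & ratio <= 1]
    Returns (G, P) with P from a chi-square distribution with 1 df.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    g = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            if observed > 0:  # empty cells contribute 0 * ln(0) = 0
                expected = row_totals[i] * col_totals[j] / n
                g += observed * math.log(observed / expected)
    g = max(2.0 * g, 0.0)  # guard against tiny negative rounding error
    # For 1 df, P(chi2 > g) = erfc(sqrt(g / 2)).
    p = math.erfc(math.sqrt(g / 2.0))
    return g, p


# Hypothetical counts, for illustration only.
g, p = g_test_2x2([[12, 40], [3, 53]])
print(f"G = {g:.3f}, P = {p:.4f}")
```

The closed form `erfc(sqrt(g / 2))` is exact for one degree of freedom, which avoids depending on a statistics library for this small case.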

Table 1

Contingency tests. For all genes, G = 11.223, P = 0.0024 (genes with $K_S$ = 0% are considered to have $K_A/K_S$ > 1). For genes with $K_S$ > 0, G = 8.415, P = 0.0043.


The distribution of $K_A/K_S$ ratios in rearranged and colinear chromosomes is shown in Fig. 1. The average ratio for rearranged chromosomes was 0.84, more than twice that of colinear chromosomes ($K_A/K_S$ = 0.37). The difference is significant both by a Mann-Whitney U test (one-tailed P = 0.008) and by a permutation test that used the difference in average $K_A/K_S$ between rearranged and colinear chromosomes as the test statistic (P = 0.0005). Genes in rearranged chromosomes can be further classified according to their position. Genes mapping within a chromosomal rearrangement or in the same chromosomal region (20) were classified as “close” to chromosomal changes; genes mapping in other chromosomal regions were classified as “far” from them. The average $K_A/K_S$ ratio was 0.68 for genes far from chromosomal changes and 0.91 for genes close to them. This trend was not significant (P = 0.23 in a permutation test), but it is in the direction expected under the model of chromosomal speciation discussed here.
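The permutation test on the difference in mean $K_A/K_S$ can be sketched in Python as below. It is a generic one-tailed label-shuffling test that assumes genes are exchangeable between the two chromosome classes under the null hypothesis; the number of permutations, the seed, and the add-one P-value estimate are illustrative choices, not details taken from the study. The Mann-Whitney U test reported alongside it is not shown.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test_mean_diff(rearranged, colinear, n_perm=10000, seed=1):
    """One-tailed test of mean(rearranged) - mean(colinear) > 0.

    Chromosome-class labels are shuffled across genes; the P value is the
    fraction of shuffles whose difference is at least the observed one
    (with an add-one correction so it is never exactly zero).
    """
    observed = mean(rearranged) - mean(colinear)
    pooled = list(rearranged) + list(colinear)
    k = len(rearranged)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean(pooled[:k]) - mean(pooled[k:]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical K_A/K_S ratios, for illustration only.
diff, p = permutation_test_mean_diff([0.9, 1.4, 0.6, 1.1, 0.7], [0.3, 0.5, 0.2, 0.4, 0.6])
print(f"difference = {diff:.2f}, one-tailed P = {p:.4f}")
```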

Figure 1

Distribution of $K_A/K_S$ ratios in colinear and rearranged chromosomes.

If lower gene flow in rearranged chromosomes triggered the process that produced their higher $K_A/K_S$ ratios, it might also have resulted in higher neutral divergence in these chromosomes (21). In our sample, the average $K_S$ of genes in rearranged chromosomes was 1.60% (1.67% in colinear chromosomes) when only genes with available $K_A/K_S$ ratios were included. When all genes were included, the average $K_S$ became 1.46% in rearranged and 1.61% in colinear chromosomes. Neither difference is significant, and their standard errors are large. Moreover, as explained above, we compiled this data set to measure $K_A/K_S$ ratios, not absolute levels of divergence. To test for differences in neutral divergence, we used two other divergence data sets (22–24). In both, the values of K, the number of nucleotide substitutions per 100 sites, are higher for rearranged than for colinear chromosomes (1.40% versus 1.15% in the data from (22), and 1.25% versus 1.23% in the data from (23)). Although the difference is significant only for the first data set (P < 0.001 in a permutation test), both measures are in the direction expected if gene flow had been lower in rearranged chromosomes.

Neutral divergence is not randomly distributed in the genome. It has been shown that genomic regions with more CpG sites experienced higher rates of divergence between humans and chimpanzees (23); that linked genes have similar $K_A$ and $K_S$ values in human-rodent comparisons (25); that genes are frequently associated with CpG-rich islands (26); that proteins of linked genes have similar $K_A/K_S$ ratios (27, 28); and that different chromosomes have different rates of protein evolution (25). Our data also show evidence of this clustering of protein evolution rates. We built a list of all possible pairs of genes within each chromosome. The cytological (in number of subbands) and physical (in Mb) distances between the genes of each pair were measured and correlated with the similarity of their $K_A/K_S$ ratios. Ratios tended to be more similar for closer pairs of genes (Spearman rank correlation: 0.252, P < 0.01 for cytological distances; 0.204, P < 0.01 for physical distances). Because comparing all possible pairs of genes artificially inflates sample size (the pairs are not independent) (25), we also performed a permutation test on the Spearman rank correlation, randomly shuffling the positions of genes. The test is significant (P < 0.001) for both cytological and physical distances. The same tests were performed separately for rearranged and colinear chromosomes. For both physical and cytological distances, clustering is significant in colinear chromosomes (P = 0.002 and P = 0.005) but only marginally significant in rearranged chromosomes (P = 0.096 and P = 0.074). To assess the effect of clustering on our primary result of higher $K_A/K_S$ ratios in rearranged chromosomes, we performed two further permutation tests. These were as described above, except that instead of randomly shuffling individual genes between rearranged and colinear chromosomes, we shuffled blocks of genes. 
In the first test, blocks were defined by genomic regions; in the second, by chromosomal bands. In both cases, the probability of obtaining a difference in $K_A/K_S$ ratios between rearranged and colinear chromosomes as large as that in our data is P < 0.001. Therefore, the differences between rearranged and colinear chromosomes are not explained by clustering of $K_A/K_S$ values.
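The two kinds of permutation test described in this paragraph can be sketched in Python. The first computes a Spearman rank correlation between within-chromosome distance and the absolute difference in $K_A/K_S$ for every gene pair (a positive value meaning that closer genes have more similar ratios) and compares it with a null distribution obtained by shuffling ratios over the fixed set of gene positions; using the absolute difference as the similarity measure is an assumption of this sketch. The second swaps rearranged/colinear labels between whole blocks of genes rather than between individual genes, so that linked genes always move together; it assumes each block lies entirely within one chromosome class. All names, seeds, and permutation counts are illustrative.

```python
import math
import random


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def within_chromosome_pairs(slots, ratios):
    """Distance and |ratio difference| for every gene pair on the same chromosome.

    slots[i] = (chromosome, position), e.g. position in Mb or in subbands.
    """
    dists, diffs = [], []
    for i in range(len(slots)):
        for j in range(i + 1, len(slots)):
            (chrom_i, pos_i), (chrom_j, pos_j) = slots[i], slots[j]
            if chrom_i == chrom_j:
                dists.append(abs(pos_i - pos_j))
                diffs.append(abs(ratios[i] - ratios[j]))
    return dists, diffs


def clustering_test(slots, ratios, n_perm=1000, seed=1):
    """Observed rho and the fraction of position shuffles with rho at least as large."""
    rho = spearman(*within_chromosome_pairs(slots, ratios))
    rng = random.Random(seed)
    shuffled = list(ratios)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign genes to the fixed set of positions
        if spearman(*within_chromosome_pairs(slots, shuffled)) >= rho:
            hits += 1
    return rho, (hits + 1) / (n_perm + 1)


def block_permutation_test(genes, n_perm=1000, seed=1):
    """genes: (block_id, is_rearranged, ratio). Labels move with whole blocks."""
    blocks, label = {}, {}
    for block, rearranged, ratio in genes:
        blocks.setdefault(block, []).append(ratio)
        label[block] = rearranged
    ids = list(blocks)
    labels = [label[b] for b in ids]

    def mean_diff(labs):
        r = [x for b, lab in zip(ids, labs) if lab for x in blocks[b]]
        c = [x for b, lab in zip(ids, labs) if not lab for x in blocks[b]]
        return sum(r) / len(r) - sum(c) / len(c)

    observed = mean_diff(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean_diff(shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Shuffling whole blocks keeps the within-block correlation intact in every permuted data set, so a significant result cannot be driven by a few tightly linked clusters of fast-evolving genes.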

Even though the differential $K_A/K_S$ values are not explained by clustering of protein evolution rates, they may be affected by genomic variables such as levels of mutation or recombination. GC content has been shown to correlate positively with levels of synonymous substitution in the mammalian genome (22, 28, 29), and GC content at fourfold redundant sites (GC4) correlates negatively with $K_A/K_S$ values in mouse-rat comparisons (27). Thus, the higher K values found in the two divergence data sets analyzed here (22, 23) could be due to higher GC content. Analysis of these data sets shows that, far from being higher, GC content is actually lower in rearranged than in colinear chromosomes [47.80% versus 47.96% for (22) and 36.50% versus 40.17% for (23)]. Therefore, the tendency of rearranged chromosomes to have higher neutral divergence cannot be explained by higher GC content. Furthermore, analysis of GC4 in the set of 115 genes used to estimate $K_A/K_S$ ratios (30) shows no significant difference between the two chromosomal classes, although GC4 is slightly lower in genes mapping to rearranged chromosomes (56% versus 62%). Because GC4 correlates negatively with $K_A/K_S$, it is plausible that this lower GC4 inflates the $K_A/K_S$ ratio in rearranged chromosomes, even though the difference is not significant. To make sure that the significance of our primary result is not affected by covariation in GC4 content, we performed a permutation test in which the genomic distribution of GC4 was kept constant between permutations. Following the method in (25), we divided our data set into classes of similar GC4 content, each containing 10% of the full data set, and swapped genes only within classes. Even when GC4 content is controlled for in this way, genes in rearranged chromosomes have significantly higher $K_A/K_S$ ratios (P = 0.0076). 
Finally, GC4 is thought to correlate with recombination rates in mice (31), and rates of protein evolution might partly reflect the local strength of purifying selection, which depends on recombination rates (27). To check whether rearranged and colinear chromosomes have different recombination rates, we used the high-resolution recombination map of the human genome in (32). The average recombination rate is 1.10 cM/Mb in rearranged and 1.17 cM/Mb in colinear chromosomes, and the difference is not significant. We therefore conclude that differential levels of mutation and recombination do not explain the differential $K_A/K_S$ ratios between rearranged and colinear chromosomes.
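The GC4-controlled permutation test can be sketched as follows: genes are sorted by GC4, split into ten classes of (nearly) equal size, and rearranged/colinear labels are shuffled only within each class, so every permuted data set has the same number of rearranged genes in each GC4 class as the real one. How ties in GC4 and class boundaries were handled in (25) is not stated here; the equal-size split below is an assumption, and the names are hypothetical.

```python
import random


def gc4_stratified_test(genes, n_classes=10, n_perm=10000, seed=1):
    """Permutation test of mean K_A/K_S (rearranged - colinear) within GC4 classes.

    genes: list of (gc4, is_rearranged, ratio). Genes are sorted by GC4 and cut
    into n_classes classes of (nearly) equal size; labels are shuffled only
    within a class.
    """
    ordered = sorted(genes, key=lambda g: g[0])
    size = len(ordered)
    classes = [ordered[i * size // n_classes:(i + 1) * size // n_classes]
               for i in range(n_classes)]
    ratios = [[g[2] for g in cls] for cls in classes]
    labels = [[g[1] for g in cls] for cls in classes]

    def mean_diff(class_labels):
        rearr, colin = [], []
        for rs, ls in zip(ratios, class_labels):
            for ratio, is_rearr in zip(rs, ls):
                (rearr if is_rearr else colin).append(ratio)
        return sum(rearr) / len(rearr) - sum(colin) / len(colin)

    observed = mean_diff(labels)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # random.sample with k = len(ls) returns a shuffled copy of the class labels
        permuted = [rng.sample(ls, len(ls)) for ls in labels]
        if mean_diff(permuted) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```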

If speciation involved the differential accumulation of incompatible favorable alleles in chromosomes with rearrangements, then the selective sweeps produced by the fixation of such alleles should have decreased neutral polymorphism in these chromosomes (14). This effect should be detectable if the favorable alleles were fixed recently. In the human-chimpanzee case, we expect the effect to be very weak, if present at all, because the two lineages originally separated between 6 and 7 million years ago and because the successive speciation events along the two lineages, although of uncertain date, are certainly not recent (33, 34), so variability is likely to have recovered. Also, recent demographic events might have erased the signature of any ancient selective sweeps (35). Nevertheless, an analysis of the data set in (36) shows that variability is lower in rearranged chromosomes (37). As shown in Table 2, haplotype heterozygosity is weakly but significantly lower in rearranged chromosomes (P = 0.014, Mann-Whitney U test), and all the other measures, including Tajima's D, are in the direction expected if old sweeps did produce the higher $K_A/K_S$ ratios in rearranged chromosomes. These trends are compelling but not conclusive, because evidence from Drosophila (38) shows that the fixation of a chromosomal rearrangement can itself produce a selective sweep with similar effects on linked variability.
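Two of the variability measures in Table 2 can be sketched from aligned sequences. Haplotype heterozygosity is computed with the usual unbiased estimator $\frac{n}{n-1}\left(1 - \sum_i p_i^2\right)$, where $p_i$ is the frequency of haplotype $i$, and Tajima's D with the standard constants defined by Tajima. The estimators actually used in the analysis of (36) are not stated here, so this is a generic sketch that assumes gap-free, equal-length sequences; the function names are hypothetical.

```python
import math
from collections import Counter


def haplotype_heterozygosity(haplotypes):
    """Unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


def tajimas_d(seqs):
    """Tajima's D for n >= 4 aligned, gap-free sequences.

    Returns None when there are no segregating sites (D is undefined).
    """
    n, length = len(seqs), len(seqs[0])
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sequences")
    # S: number of segregating (polymorphic) sites
    s = sum(1 for k in range(length) if len({seq[k] for seq in seqs}) > 1)
    if s == 0:
        return None
    # pi: mean number of pairwise differences
    diffs = [sum(a != b for a, b in zip(seqs[i], seqs[j]))
             for i in range(n) for j in range(i + 1, n)]
    pi = sum(diffs) / len(diffs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

A selective sweep tends to leave an excess of rare variants, which pushes D below zero; this is why the sign of D is read alongside the reduction in heterozygosity.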

Table 2

Variability measures from (36).


Our observation that the relative rate of amino acid substitution is substantially higher in chromosomes that have undergone rearrangements during the divergence between chimpanzees and humans may be explained, at least in part, as a consequence of the barrier to gene flow generated by those rearrangements. Such an evolutionary explanation implies, first, that chromosomal changes are a major factor determining local rates of protein evolution; second, that the biological processes that led to the separation of our two lineages and/or to successive speciations were facilitated, or even triggered, by chromosomal changes; third, that these processes involved periods of hybridization before speciation was gradually completed; and, finally, that the key genomic regions involved in these crucial changes are those spanned by rearrangements. Nevertheless, a twofold difference in $K_A/K_S$ ratio between rearranged and colinear chromosomes is so substantial that it raises the question of whether it can be due exclusively to the speciation mechanism proposed here (7). Even taking into account that, in our particular sample, the lower GC4 of genes mapping to rearranged chromosomes slightly inflates their $K_A/K_S$ values, a twofold difference seems unlikely under the hypothesis of a time-constant rate of adaptive substitution, because it could only be explained if rearrangements had acted as barriers in parapatry for at least half of the divergence time of the two lineages. However, the evolutionary rate of a protein is not always constant through time (39, 40), and episodes of rapid change have been shown to be associated with speciation and the dispersal of new species (41, 42). As mentioned above, other speciation-related processes that have not yet been explored may have contributed to the difference. For example, by preventing gene flow, rearrangements might facilitate the accumulation of alleles under weak divergent selection. 
Moreover, favorable alleles that do not take part in incompatibilities might accumulate in the low-gene-flow regions defined by rearrangements, because such alleles spread through the population much faster than neutral variants. Even so, explanations unrelated to speciation that would inflate the difference in protein evolution rates cannot be completely ruled out. It is possible, for example, that adaptive changes in certain genes or regions facilitate the accumulation of further adaptive changes in the same genes or regions (43); that the accumulation of positively selected alleles in a new arrangement facilitates its fixation; that the reduction in population size associated with the fixation of a new rearrangement allows purifying selection to relax; or that rearrangements move linked regions into a different recombinational context with a different equilibrium base composition, thereby changing mutation rates. In any case, our results raise several fascinating questions. Did all the chromosomal changes currently separating the two species become fixed because of some speciation event? In which lineage and at what time did a given speciation, and the associated adaptation, take place? What is the functional content of the genomic regions involved in that adaptation? As more data become available, it will be possible to address these questions and to take a further step toward linking the structure and function of the genome.

Supporting Online Material

Fig. S1

Tables S1 and S2

* To whom correspondence should be addressed. E-mail: arcadi.navarro{at}



