Chromosomal Effects of Rapid Gene Evolution in Drosophila melanogaster


Science  05 Jan 2001:
Vol. 291, Issue 5501, pp. 128-130
DOI: 10.1126/science.291.5501.128


Rapid adaptive fixation of a new favorable mutation is expected to affect neighboring genes along the chromosome. Evolutionary theory predicts that the chromosomal region would show a reduced level of genetic variation and an excess of rare alleles. We have confirmed these predictions in a region of the X chromosome of Drosophila melanogaster that contains a newly evolved gene for a component of the sperm axoneme. In D. simulans, where the novel gene does not exist, the pattern of genetic variation is consistent with selection against recurrent deleterious mutations. These findings imply that the pattern of genetic variation along a chromosome may be useful for inferring its evolutionary history and for revealing regions in which recent adaptive fixations have taken place.

We have previously described the de novo evolution of a gene in the lineage of D. melanogaster (1). This gene, denoted Sdic, encodes a novel intermediate chain in a sperm-specific axonemal dynein. Changes that led to the creation of Sdic during the short evolutionary history of D. melanogaster [about 3 million years (2)] exhibit evidence for adaptive evolution. The gene was created from duplicated, and hence dispensable, copies of the genes for annexin X (AnnX) and the cytoplasmic dynein intermediate chain (Cdic). Three large deletions led to the fusion of the duplicated genes, whereupon a series of smaller deletions and nucleotide substitutions fashioned a new amino end of the Sdic polypeptide and created motifs characteristic of known axonemal dynein intermediate chains. The regulatory region of Sdic, including a spermatocyte-specific promoter element, also evolved from AnnX and Cdic sequences (1).

In principle, the evolutionary changes in Sdic could have taken place relatively rapidly during and immediately following speciation (3). In this case, current selection pressure on Sdic should be mainly to eliminate deleterious mutations. However, Sdic still appears to be evolving rapidly, as evidenced by the fact that the ratio of replacement to synonymous polymorphisms is in excess of 2:1 [(1) and additional data shown below].

The evidence for ongoing positive selection on Sdic prompted us to examine genetic variation in the surrounding genomic region to determine whether the theoretically predicted consequences of a rapid adaptive fixation (selective sweep) could be detected. The key issue is whether selection has been recent enough and strong enough to yield a statistically significant deviation from the pattern of genetic variation expected from nearly neutral polymorphisms affected only by random genetic drift and by selection against linked deleterious mutations [“background selection” (4, 5)]. Strong positive selection increases the frequency of a new favorable mutation and, in the process, displaces linked nucleotide polymorphisms (6). Theory predicts that a recent selective sweep should create a characteristic “trough” in the level of polymorphism in a region that includes the selected gene (7), as well as an excess of “singleton” polymorphisms (those present in only one sequence in the sample). On the other hand, theory also indicates that levels of polymorphism should be restored relatively rapidly after a selective sweep. The time required for effective recovery of Tajima's D (8) is approximately 2N generations, where N is the effective population number; in D. melanogaster, 2N generations are about 80,000 years. Tajima's D (9) is a conventional measure that compares the nucleotide diversity (mean pairwise differences) in a sample with the proportion of polymorphic sites; it is negative when there is an excess of low-frequency polymorphisms, such as singletons.
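To make the comparison behind Tajima's D concrete, here is a minimal sketch of the standard formulation, assuming the inputs are the number of segregating sites S and the mean number of pairwise differences for a sample of n sequences. The function names are illustrative, and the helper for pairwise differences assumes biallelic sites described by the count of one allele per site.

```python
import math


def mean_pairwise_differences(n, allele_counts):
    """Mean number of pairwise differences over biallelic sites.

    allele_counts -- for each segregating site, how many of the n sequences
                     carry one of the two alleles (hypothetical input format)
    """
    pairs = n * (n - 1) / 2
    return sum(k * (n - k) for k in allele_counts) / pairs


def tajimas_d(n, seg_sites, pi):
    """Tajima's D for n sequences (n >= 4 keeps the variance positive).

    seg_sites -- number of segregating sites S in the region
    pi        -- mean number of pairwise differences in the same region
    """
    if seg_sites == 0:
        return float("nan")  # D is undefined without any polymorphism
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    theta_w = seg_sites / a1  # Watterson's estimate from the number of segregating sites
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - theta_w) / math.sqrt(variance)
```

For a sample of 15 sequences with a single singleton site, `tajimas_d(15, 1, mean_pairwise_differences(15, [1]))` is negative: a singleton contributes little to pairwise diversity but counts fully as a segregating site. A site split 7 to 8 gives a positive value instead.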

To look for evidence of a selective sweep, we examined the spatial distribution of polymorphisms in the region at the base of the X chromosome that includes Sdic in D. melanogaster. The same analysis was carried out in the homologous region of the sibling species D. simulans, which lacks the Sdic gene. The pattern of polymorphism in D. simulans serves as a control, since without Sdic there is no a priori reason to expect a recent selective sweep in this region.

We sampled genes from polytene chromosome bands 18E1 to 20D. Messenger RNAs from 11 genes in D. melanogaster and 10 genes in D. simulans were reverse-transcribed, and the products were amplified by the polymerase chain reaction (PCR) and sequenced. Our analysis is based on an average of 903 base pairs per gene in each of 15 strains of D. melanogaster and 834 base pairs per gene in each of 7 strains of D. simulans (10). The analysis was confined to synonymous polymorphisms to eliminate possible artifacts due to different selective constraints or rates of amino acid replacement among the proteins.

To analyze the distribution of polymorphism along the chromosome, we used logistic regression. For each gene, let W(x) be the number of segregating synonymous sites and L(x) be the total number of synonymous sites in the sample. In these functions, x corresponds to the relative position of the gene along the chromosome. Under a simple model of background selection, the fraction of segregating sites, S(x) = W(x)/L(x), should decrease monotonically as x moves from the euchromatin of the X chromosome toward and into the pericentromeric heterochromatin, owing to the progressive decrease in the rate of recombination and in effective population size (11). The logistic regression model is used rather than an ordinary linear regression of S on x because S is necessarily bounded on (0, 1). This feature favors analysis of S rather than derivative quantities, such as θ (12), even though, in the present case, the logistic regressions would be equivalent because the sample size is the same for all genes.

Maximum likelihood was used for parameter estimation and hypothesis testing. For each gene, define Ŝ(x) as the predicted probability of polymorphism, given a logistic model with an nth-order polynomial function in x:

$$\ln\frac{\hat{S}(x)}{1-\hat{S}(x)} = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_n x^n$$

Under a model of background selection, with no effects of selective sweeps, the only parameters that should differ significantly from 0 are β0 and β1, the intercept and the coefficient of x, respectively. In contrast, if there has been a selective sweep in the recent past, then a model with quadratic and cubic terms should provide a significantly better fit. These terms would reflect the expected trough in the level of polymorphism across the region, but higher order terms would not necessarily be expected to be significant.
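The fitting step can be sketched as a binomial logistic regression: each gene contributes W(x) "successes" out of L(x) synonymous sites, and the logit of the polymorphism probability is a polynomial in x. The sketch below maximizes the likelihood by Newton-Raphson in plain Python. It is an illustration under stated assumptions, not the authors' code: positions are assumed rescaled to [0, 1] so the polynomial terms stay well conditioned, the function names are hypothetical, and the returned log-likelihood omits the binomial coefficients, a constant that cancels when models of different order are compared.

```python
import math


def _solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * m
    for r in range(m - 1, -1, -1):
        z[r] = (aug[r][m] - sum(aug[r][c] * z[c] for c in range(r + 1, m))) / aug[r][r]
    return z


def _expit(eta):
    """Numerically stable logistic function 1 / (1 + exp(-eta))."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def _log1pexp(eta):
    """Numerically stable log(1 + exp(eta))."""
    if eta > 0:
        return eta + math.log1p(math.exp(-eta))
    return math.log1p(math.exp(eta))


def fit_poly_logistic(x, w_seg, l_sites, order, max_iter=100, tol=1e-10):
    """Fit logit S(x) = b0 + b1 x + ... + b_order x^order by maximum likelihood.

    x       -- relative chromosomal position of each gene (assumed in [0, 1])
    w_seg   -- segregating synonymous sites per gene, W(x)
    l_sites -- total synonymous sites per gene, L(x)
    Returns (coefficients, log-likelihood up to an additive constant).
    """
    k = order + 1
    design = [[xi ** j for j in range(k)] for xi in x]
    beta = [0.0] * k
    for _ in range(max_iter):
        eta = [sum(b * d for b, d in zip(beta, row)) for row in design]
        prob = [_expit(e) for e in eta]
        # Score vector: X^T (W - L p)
        score = [sum(row[j] * (wi - li * p) for row, wi, li, p in zip(design, w_seg, l_sites, prob))
                 for j in range(k)]
        # Fisher information: X^T diag(L p (1 - p)) X
        info = [[sum(row[j] * row[m] * li * p * (1.0 - p) for row, li, p in zip(design, l_sites, prob))
                 for m in range(k)] for j in range(k)]
        step = _solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    eta = [sum(b * d for b, d in zip(beta, row)) for row in design]
    loglik = sum(wi * e - li * _log1pexp(e) for wi, li, e in zip(w_seg, l_sites, eta))
    return beta, loglik
```

Fitting orders 1, 2, 3, ... to the same genes gives a nested sequence of log-likelihoods; a trough such as the one predicted near Sdic shows up as a sizeable gain in log-likelihood once the quadratic and cubic terms are added.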

The data from D. simulans are shown in Table 1 and those from D. melanogaster in Table 2. The genes are listed in order along the X chromosome from distal to proximal with respect to the centromeric heterochromatin. The sequenced region of each gene is indicated, along with the number of synonymous nucleotide sites in the region and the observed number of polymorphic synonymous sites. For synonymous nucleotide sites, θsyn is the nucleotide polymorphism, estimated from S and the sample size (9); the parameter πsyn is estimated as the average proportion of pairwise differences per synonymous site. For neutral alleles in mutation-drift equilibrium, θsyn = πsyn = 4Nμ, where N is the effective population size and μ the nucleotide mutation rate (9).
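The two per-site estimators can be sketched as follows, assuming θsyn is the usual Watterson-type estimate (segregating sites divided by the harmonic sum over sample sizes) and πsyn is the mean pairwise difference divided by the number of sites. The input format is an assumption: aligned sequences in which every varying column is treated as a segregating site, so nonsynonymous differences would have to be masked beforehand, plus a site count (possibly fractional, as synonymous site counts often are) supplied by the caller. The name `theta_and_pi` is hypothetical.

```python
from itertools import combinations


def theta_and_pi(seqs, n_sites):
    """Per-site Watterson's theta and nucleotide diversity pi.

    seqs    -- aligned sequences of equal length; every varying column counts
               as a segregating site
    n_sites -- number of sites to express the estimates per (e.g. synonymous sites)
    """
    n = len(seqs)
    seg = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    a1 = sum(1.0 / i for i in range(1, n))  # harmonic sum used by Watterson's estimator
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2))
    pairs = n * (n - 1) / 2
    theta = seg / (a1 * n_sites)
    diversity = diffs / pairs / n_sites
    return theta, diversity
```

In mutation-drift equilibrium both numbers estimate the same quantity; a θsyn noticeably larger than πsyn is the signature of an excess of rare variants that Tajima's D summarizes.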

Table 1

Segregating synonymous sites in D. simulans.

Table 2

Segregating synonymous sites in D. melanogaster.


The results of the logistic regression analyses are shown by the curves in Fig. 1, along with their 50% confidence bands (13). In D. simulans (Fig. 1A), we find a monotonic decrease in nucleotide polymorphism as the genetic markers approach the centromeric heterochromatin. This pattern has previously been described in regions of low recombination in Drosophila (14), which include the centromeric heterochromatin, and has been attributed to the increased efficiency of both selective sweeps and background selection in such regions (15). In D. melanogaster (Fig. 1B), the level of polymorphism shows the depression near Sdic that we predicted from the evidence for positive selection on this gene. This trough in the level of polymorphism is consistent with a recent selective sweep in the region. A recent selective sweep is also implied by the frequency spectrum of the polymorphisms. Of the 10 D. melanogaster genes with one or more polymorphic nucleotides (Table 2), 7 show an excess of singleton polymorphisms, indicated by a negative value of Tajima's D. Although there are so few polymorphisms that none of the individual values of D is significantly different from 0, a one-tailed nonparametric Wilcoxon signed-rank test of Tajima's D across the region as a whole is significant (P = 0.04 for silent sites, P = 0.01 for all sites). In contrast, neither test yields a significant value of D for the data from D. simulans (P = 0.44 and P = 0.28, respectively).
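A one-tailed signed-rank test of this kind can be sketched with the exact null distribution, assuming the alternative is that the per-gene values of D tend to be negative. Under the null hypothesis each rank is equally likely to carry a plus or minus sign, so the distribution of the sum of positive ranks (W+) can be built by enumerating inclusion of each rank. Zeros are dropped and tied absolute values share an average rank; ranks are doubled internally so that averaged ranks stay integers. The function name is hypothetical, and the per-gene D values from Table 2 are not reproduced here.

```python
def signed_rank_test_less(values):
    """Exact one-tailed Wilcoxon signed-rank test of H1: values tend to be < 0.

    Zeros are dropped; tied absolute values get average ranks.
    Returns (W_plus, p) where p = P(W+ <= observed) under the null.
    """
    d = [v for v in values if v != 0]
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    doubled = [int(round(2 * r)) for r in ranks]  # integers even with half-ranks
    w_plus2 = sum(r for r, v in zip(doubled, d) if v > 0)
    # Null distribution of 2*W+: each rank is included with probability 1/2
    counts = {0: 1}
    for r in doubled:
        nxt = dict(counts)  # configurations where this rank's sign is negative
        for s, c in counts.items():
            nxt[s + r] = nxt.get(s + r, 0) + c  # configurations where it is positive
        counts = nxt
    total = 2 ** len(doubled)
    p = sum(c for s, c in counts.items() if s <= w_plus2) / total
    return w_plus2 / 2, p
```

With five genes all showing negative D, W+ is 0 and the exact one-tailed P is 1/32; mixing in positive values of large magnitude pushes P up quickly, which is why the direction of the excess matters as much as the count of negative values.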

Figure 1

(A) Results from D. simulans showing a monotonic decrease in the proportion of polymorphic sites (S) as a function of gene location at the base of the X chromosome. (B) Results from D. melanogaster showing a significant trough in the proportion of polymorphic sites (S) in the region around Sdic. The error bar on each point is the approximate 50% confidence interval.

Significance tests for the coefficients in the logistic regressions are given in Table 3. The test statistic is twice the difference in the log-likelihood of the data under polynomial regressions of different order, which is approximately chi-square distributed with degrees of freedom equal to the difference in the order of the polynomials (16). For the D. simulans data, the linear term is significant, and no higher order terms improve the goodness of fit. In the D. melanogaster regression, on the other hand, both the linear and the cubic terms are significant, and no higher order terms are significant. The cubic term is needed to fit the trough of polymorphism in the Sdic region.
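The likelihood-ratio comparison can be sketched as follows, assuming the standard form of the statistic (twice the gain in log-likelihood from the reduced to the fuller model) and a chi-square reference distribution whose degrees of freedom equal the number of added coefficients. The chi-square upper tail is computed in closed form for integer degrees of freedom so the sketch needs only the standard library; the function names are hypothetical, and the log-likelihoods can come from any maximum-likelihood fit of the nested polynomial models.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: the df = 1 tail plus a finite series of correction terms
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)  # increment from df = 1 to df = 3
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def likelihood_ratio_test(loglik_reduced, loglik_full, extra_params):
    """Return (statistic, p-value) for nested models differing by extra_params coefficients."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, extra_params)
```

Adding terms one order at a time (linear to quadratic, quadratic to cubic) gives one degree of freedom per test, which matches the description of degrees of freedom as the difference in polynomial order.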

Table 3

Results of log-likelihood ratio tests.


In summary, our prediction that Sdic has undergone one or more recent selective sweeps is supported by two independent features of the data. The first is the significant depression in the level of polymorphism near polytene chromosome region 19A (Fig. 1B), and the second is the frequency spectrum of polymorphisms skewed toward rare alleles including singletons (Table 2). Neither of these patterns is observed in the homologous chromosomal region in the sibling species D. simulans (Fig. 1A and Table 1). These analyses were based on silent sites alone. Yet another indication of ongoing selection for Sdic is evident in the fact that Sdic accounts for 46% of the nonsynonymous polymorphisms but only 15% of the nonsynonymous sites (P ≈ 0.001), whereas the level of synonymous polymorphism in Sdic is one of the lowest of the genes examined. Furthermore, nonsynonymous changes account for 70% of the Sdic polymorphisms, which is much higher than the average of 26% for other genes in D. melanogaster (17).
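The text does not state here which test produced P ≈ 0.001. One plausible framing, offered as an assumption rather than as the authors' method, is an exact one-sided binomial test: if nonsynonymous polymorphisms fell on sites at random, the number landing in Sdic would be binomial with success probability equal to Sdic's share of nonsynonymous sites (0.15 in the text). The underlying counts are not given in this section, so none are plugged in; the function name is hypothetical, and treating polymorphisms as independent draws is itself a simplification.

```python
from math import comb


def binom_upper_tail(k, n, p0):
    """P(X >= k) for X ~ Binomial(n, p0).

    k  -- hypothetical: nonsynonymous polymorphisms observed in the focal gene
    n  -- hypothetical: nonsynonymous polymorphisms across all genes examined
    p0 -- focal gene's share of nonsynonymous sites under the null
    """
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))
```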

These findings confirm our prediction that the newly evolved Sdic gene has undergone one or more recent selective sweeps. The more general significance of the findings is the demonstration that natural selection for improved gene function may often be of sufficient magnitude to cause the level of polymorphism to be markedly reduced in or near the target of selection and to generate a distinctive frequency spectrum skewed toward rare alleles including singletons. Analysis of genetic variation across contiguous regions of the genome may therefore be a promising approach for identifying the locations of recently selected genes in Drosophila and other organisms.

* To whom correspondence should be addressed. E-mail: dhartl{at}


