Nucleotide Variation Along the Drosophila melanogaster Fourth Chromosome


Science  04 Jan 2002:
Vol. 295, Issue 5552, pp. 134-137
DOI: 10.1126/science.1064521


The Drosophila melanogaster fourth chromosome, believed to be nonrecombining and invariable, is a classic example of the effect of natural selection in eliminating genetic variation in linked loci. However, in a chromosome-wide assay of nucleotide variation in natural populations, we have observed a high level of polymorphism in a ∼200-kilobase region and marked levels of polymorphism in several other fragments interspersed with regions of little variation, suggesting different evolutionary histories in different chromosomal domains. Statistical tests of neutral evolution showed that a few haplotypes predominate in the 200-kilobase polymorphic region. Finally, contrary to the expectation of no recombination, we identified six recombination events within the chromosome. Thus, positive Darwinian selection and recombination have affected the evolution of this chromosome.

Detecting evolutionary forces that shape the structure of genetic variation at the genomic level often relies on understanding the effects of natural selection on nearby linked loci (1–4), for which the fourth chromosome of Drosophila melanogaster is a classical model system (5–7). It has been thought to undergo no meiotic recombination except under certain experimental conditions (e.g., the interchromosomal effect introduced for the purpose of mapping) (8, 9). Two genetic models—“selective sweep” by the hitchhiking effect (1, 10), in which an advantageous allele is selected for and fixed in species rapidly, and the background selection model, in which deleterious mutations are selected against (2, 10)—predicted a lack of variation throughout the chromosome, which has been supported by limited data (6, 11). We have reexamined the level of variation and recombination in the fourth chromosome using a chromosome-wide assay of nucleotide variation.

We first investigated within-species nucleotide variation in two adjacent regions from the 102F cytological position of the fourth chromosome of D. melanogaster: (i) 4257 base pairs (bp) of the CG11091 locus, and (ii) 847 bp of an intron of the toy gene. CG11091 and toy are separated by ∼10 kb. We amplified these regions by polymerase chain reaction (PCR) and directly sequenced them from a worldwide sample of 23 chromosomes. We also sequenced an additional nine lines from one location in Israel (IS), resulting in 11 chromosomes sequenced from a single local population (Fig. 1).

Figure 1

Nucleotide variations in the CG11091-toy region of the D. melanogaster fourth chromosome. The sites defining the two haplotype groups are shown in bold type in the low-frequency haplotype group. The numbers for the positions, e.g., 113 and 90, indicate the positions of polymorphic sites in toy (site 113 to 758) and CG11091 (site 90 to 4004). i1/d1 and i2/d2 are two insertion-deletion (indel) polymorphisms: i1 indicates presence of the sequence ATTTTACAAAA and d1, absence of this sequence; i2 indicates presence of the sequence GGTGTATCATTTGCTTTC and d2, absence of this sequence. Other indel polymorphisms are shown directly; dashes indicate absence of nucleotides.

Both the worldwide and population samples show high levels of nucleotide variation in the CG11091-toy region. The entire region of 5104 nucleotides (nt) contains 47 segregating sites for the worldwide sample (nucleotide diversity π = 0.0028) and 32 segregating sites for the IS population (π = 0.0024). These segregating sites include six insertion/deletion (indel) sites. The π value for the toy gene is as high as 0.0049 for the worldwide sample and 0.0043 for the IS population. This level of variation is an order of magnitude greater than those previously observed in ci genes in the fourth chromosomes of D. melanogaster and Drosophila simulans (6) and in Drosophila sechellia and Drosophila mauritiana (11), for which the values range from 0 to 0.0003. The probability distribution of segregating sites (12) in these IS and worldwide samples revealed that the observed levels of variation are significantly higher than the nucleotide diversities of ci (0.0002) (6, 11) and other loci in regions of low recombination (0.0005) (13) (P < 0.001). Indeed, the observed levels of variation are in the range typically seen for autosomal genes in regions of normal recombination (13).
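The two summary statistics used above, the number of segregating sites and the nucleotide diversity π, can be illustrated with a minimal Python sketch. It assumes an ungapped alignment of equal-length sequences and treats every character as an ordinary state; the function names and the toy alignment are hypothetical and are not data or code from this study.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns in which more than one state is present."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site (pi)."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


# Hypothetical toy alignment, chosen only to exercise the functions.
toy_alignment = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGAACGTAC",
    "ACGAACGTTC",
]
toy_S = segregating_sites(toy_alignment)
toy_pi = nucleotide_diversity(toy_alignment)
```

How indel sites are scored (for example, as a single event regardless of length) is a separate choice that this sketch does not address.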

Figure 1 shows that nucleotide variation along the fourth chromosome is partitioned into two distinct sets of haplotypes with unequal frequencies. Hereafter, we refer to the high- and low-frequency haplotype groups as the major and minor haplotypes, respectively. The frequency of the major haplotype is similar in both the IS and worldwide data sets (8/11 and 14/23, respectively). Furthermore, these two haplotypes appear in all local populations tested (Fig. 1) (14), suggesting that the haplotype distribution in the IS population is typical worldwide. These results raise new questions about the evolutionary forces shaping variation and recombination on the fourth chromosome, prompting further analysis.

Using the 11 randomly sampled alleles from the IS population (Fig. 1), we carried out three different statistical tests of the null hypothesis of neutrality (15), assuming randomly generated haplotypes: haplotype partition test (HP test) (16), haplotype number test (K test) (17), and haplotype diversity test (H test) (17). We also calculated Tajima's D value (18), a measure of skewness in the frequency spectrum of polymorphic sites. For all three tests, probability values were estimated by Monte Carlo simulation (19). Because the sequenced region of toy is too short for powerful statistical testing and its pattern and levels of polymorphism are similar to those of CG11091, we pooled data from both gene regions, comprising 32 segregating sites and five haplotypes with a haplotype diversity of 0.618.
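As an illustration of the haplotype diversity statistic, the sketch below uses the common unbiased estimator H = n/(n − 1) × (1 − Σp²); that this exact estimator underlies the reported value is an assumption. The count vector is hypothetical: it is one configuration of 11 alleles in five haplotypes that happens to give a value near 0.618, and it should not be read as the observed configuration.

```python
def haplotype_diversity(counts):
    """Unbiased haplotype diversity from a list of per-haplotype counts."""
    n = sum(counts)
    sum_sq_freq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1 - sum_sq_freq)


# Hypothetical counts (11 alleles, 5 haplotypes); an assumption for illustration.
hypothetical_counts = [7, 1, 1, 1, 1]
H = haplotype_diversity(hypothetical_counts)
```

Under neutrality, the significance of an observed H would be judged against a simulated null distribution rather than from the point estimate alone.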

In the K test and H test, we found a significant reduction with respect to the neutral expectation in both the number of haplotypes and haplotype diversity of the sample (Table 1). The HP test further revealed that polymorphic sites are not uniformly distributed among the sequences in the sample (Table 1). It is possible that the two classes of haplotypes reflect a history of positive Darwinian selection, most likely balancing selection, which is also consistent with a slight excess of intermediate-frequency polymorphisms, as indicated by a positive, although not significant, Tajima's D value (0.6428; P = 0.2167).
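For readers unfamiliar with Tajima's D, the following sketch computes it from the sample size, the number of segregating sites, and the mean number of pairwise differences, using the standard constants from Tajima's formulation. The function name and the example inputs are hypothetical; the sketch does not attempt to reproduce the value reported here, which depends on the exact pairwise-difference count and on how indels were scored.

```python
import math


def tajimas_d(n, num_seg_sites, mean_pairwise_diffs):
    """Tajima's D from sample size n, segregating sites S, and mean pairwise differences."""
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    s = num_seg_sites
    variance = e1 * s + e2 * s * (s - 1)
    # Positive D: excess of intermediate-frequency variants; negative D: excess of rare variants.
    return (mean_pairwise_diffs - s / a1) / math.sqrt(variance)


# Hypothetical inputs: 11 alleles, 32 segregating sites, 12 mean pairwise differences.
d_example = tajimas_d(11, 32, 12.0)
```

A positive value arises whenever the mean pairwise difference exceeds S/a1, which is the pattern expected when two divergent haplotype classes are both at appreciable frequency.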

Table 1

Neutrality tests of haplotype structures in the D. melanogaster fourth chromosome genes.

An alternative interpretation is that the departure from the null hypothesis of neutrality is a consequence of ancient admixture of two differentiated populations, consistent with a demographic cause (20). However, the worldwide survey shows that all local populations contain both haplotypes, providing no evidence for population differentiation in terms of variation along the fourth chromosome. Furthermore, such a model predicts similar dimorphisms elsewhere in the genome; this has not been found in the many population genetic studies of D. melanogaster variation. Thus, it is more parsimonious to interpret the observed dimorphism as a consequence of balancing selection.

We also examined whether the haplotype structure was associated with a chromosomal inversion that may be under balancing selection, resulting in the observed pattern. We used fluorescence in situ hybridization (FISH) of polytene chromosomes (21) to ascertain the gene order within the haplotype region; this did not reveal any inversion. An alternative scenario of positive selection would involve the major haplotype undergoing a selective sweep, being driven by selection toward fixation. However, the fact that the two haplotypes are polymorphic in all locations argues against this possibility.

Because our results contrast with those from previous studies of ci in Drosophila, we sequenced 10 ci genes from our worldwide sample and again found no polymorphism (Fig. 2). The lack of variation at ci implies different evolutionary histories in different chromosomal regions and the existence of recombination between ci and CG11091-toy. We investigated these conjectures by examining patterns of linkage disequilibrium throughout the euchromatic region of the chromosome by sequencing an additional 15 gene regions (22) (Fig. 2). We also sequenced seven related genes in D. simulans, the sibling of D. melanogaster, to further investigate the role of selection and to determine the ancestral haplotype.

Figure 2

Distribution of variation on the D. melanogaster fourth chromosome. (A) The surveyed 18 gene regions along the D. melanogaster chromosome in the map (25) with the gene order between CG11153 and pho corrected on the basis of our FISH experiments. (B) Segregating sites in the nonrandomly sampled 10 chromosomes with dimorphism shown in bold type. Arrows indicate the minimum number of recombination (Rm) sites identified by the four-gamete method (24). Drosophila simulans is the outgroup. i/d denotes indel polymorphism of various lengths indicating presence (i) or absence (d) of nucleotide sequences. Seven indel polymorphic sites are shown: i1/d1 = 16 bp, i2/d2 = 2 bp, i3/d3 = ∼1 kb, i4/d4 = 13 bp, i5/d5 = 2 bp, i6/d6 = 11 bp, i7/d7 = 18 bp. (C) Ks: synonymous divergence per nucleotide site between D. melanogaster and D. simulans.

First, we found that the chromosome could be divided into three discrete domains: (i) the 200-kb dimorphic domain containing the CG11091-toy region with its characteristic two-haplotype organization of variation; (ii) a polymorphic proximal domain in which no such haplotype organization is seen; and (iii) a domain, distal to the centromere, where levels of variation are low. The first domain has the highest level of variation in comparison with the two other domains, although its average silent nucleotide diversity, 0.0021, is lower than the average nucleotide diversity in the genome (13). The latter two domains, both proximal and distal to the dimorphic domain, show no such dimorphism, and varying levels of polymorphism. Although many genes in these two domains show levels of polymorphism characteristic of regions of reduced recombination (13), 3 out of 11 gene regions in these areas show relatively high levels of nucleotide diversity (0.0012 to 0.0019). The boundaries between these domains can be narrowed to two short regions of 15 kb (boundary 1 between the regions CG11153 and B) and 7 kb (boundary 2 between toy and plexA).

A statistical test for heterogeneity of variation among gene regions, based on the Kreitman-Hudson X² test statistic (23), shows, for all 18 gene regions, significant heterogeneity (P ≪ 0.0001). This further indicates that the fourth chromosome is not evolving as a single unit; different regions appear to have different evolutionary histories.

All but one of the five gene regions in the dimorphic domain in the additional survey show the same haplotype structure as the CG11091-toy region; locus A, however, shows no variation. This domain thus displays linkage disequilibrium over some 200 kb. The major haplotype is less diverse than the minor haplotype: The major haplotype cluster contains 9 segregating sites (one indel site included), whereas the minor haplotype cluster contains 30 such sites (2 indel sites included) (Fig. 2). In the CG11091-toy region, the worldwide sample reveals that for the within-minor haplotype group π = 0.0015, and for the within-major haplotype group π = 0.0004. Consistent with these observations, the D. simulans outgroup sequences in gene regions CG11152 and CG11091 reveal that the minor haplotype is ancestral.

By pooling all 16 loci that contain polymorphisms, we have estimated a minimum number of six recombination events (Rm) by the four-gamete method (Fig. 2) (24). However, because we have not obtained a continuous sequence along the entire chromosome, the true Rm for the fourth chromosome is in all probability larger. Thus, to calculate an upper bound on the Rm density, we identified one event in CG11093 from 20,010 nt that we sequenced, yielding an Rm density of 1/20,010 nt = 0.05/kb in 10 chromosomes. To calculate a lower bound, we assume an Rm density of six events per 1156 kb (the chromosomal euchromatin length) = 0.0052/kb in these chromosomes. Although these results cannot be directly compared with the experimental recombination estimates, it is informative to compare them with the estimates from other population genetic data. The Adh gene in a moderate-recombination region has an Rm density of 1.84 events per kilobase in 11 alleles (24). Thus, qualitatively, rates of recombination on the fourth chromosome are 37- to 354-fold lower than those on normal autosomes. The amount of recombination observed here is low, consistent with genetic analysis of the chromosome (8). In contrast to previous predictions (1–3, 6), however, such a low rate has a considerable effect on the structure of genetic variation on the chromosome.
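The logic of the four-gamete method and the density arithmetic above can be sketched in Python. The sketch assumes biallelic sites coded as characters and counts Rm as the largest set of non-overlapping site intervals that each show all four gametes, in the spirit of the Hudson-Kaplan approach; the function names and the toy haplotypes are hypothetical. The density figures are taken from the text, and the fold ratios are computed from the rounded densities as reported.

```python
from itertools import combinations


def has_four_gametes(site_a, site_b):
    """True if two biallelic sites show all four two-site combinations."""
    return len(set(zip(site_a, site_b))) == 4


def min_recombination_events(haplotypes):
    """Minimum recombination events implied by the four-gamete test."""
    columns = list(zip(*haplotypes))
    intervals = [
        (i, j)
        for i, j in combinations(range(len(columns)), 2)
        if has_four_gametes(columns[i], columns[j])
    ]
    # Greedy selection by right endpoint gives the largest set of
    # non-overlapping intervals; each needs at least one event.
    intervals.sort(key=lambda iv: iv[1])
    rm, last_right = 0, -1
    for left, right in intervals:
        if left >= last_right:
            rm += 1
            last_right = right
    return rm


def events_per_kb(events, length_bp):
    """Recombination events per kilobase of sequence."""
    return events / (length_bp / 1000)


# Hypothetical toy haplotypes (0/1 coding), not data from this study.
toy_haplotypes = ["000", "011", "101", "110"]
toy_rm = min_recombination_events(toy_haplotypes)

# Figures quoted in the text.
upper_density = events_per_kb(1, 20_010)      # about 0.05 per kb
lower_density = events_per_kb(6, 1_156_000)   # about 0.0052 per kb
adh_density = 1.84                            # events per kb at Adh
fold_min = adh_density / round(upper_density, 2)
fold_max = adh_density / round(lower_density, 4)
```

Because Rm counts only detectable events, it is a conservative lower bound on the true number of recombination events in the sample's history.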

Recombination caused by crossovers at each end of the dimorphic domain may account for the different evolutionary histories of the three chromosome domains as described above. The genome sequence (25) reveals that both putative boundary regions contain many repetitive sequences that may facilitate genetic recombination.

Given that recombination does occur on the fourth chromosome, the maintenance of the huge dimorphic domain is anomalous—we would expect it to be eroded by recombination. However, it seems plausible to suppose that the dimorphism is the joint product of balancing selection on a locus within the region, and a low rate of recombination such that variation linked to one balanced allele is seldom, if ever, recombined into association with the other allele.

The significantly reduced variation outside the dimorphic domain could be due to either a reduced mutation rate, hitchhiking with positive Darwinian selection, or background selection. The first hypothesis, which predicts that low divergence between species will correspond to low variation within species, was not supported by the observed typical level of silent site substitutions, Ks (Ks = 0.0785 to 0.1463) (Fig. 2) (3, 6). For the second and third hypotheses, a Tajima's D test on pooled data from all seven gene regions in the centromere-proximal nondimorphic domain shows no significant bias in the polymorphism spectrum (D = −0.9745, P = 0.1739) and thus does not support a recent selective sweep over this long region (26). This leaves the possibility that other forms of selection (e.g., background selection or directional selection in local regions delineated by recombination) may play a role. Even if selective sweep does occur in some local regions, the low recombination rate would render it a slow process and make it unlikely to be global.

Previous studies, both theoretical and empirical, had concluded that the fourth chromosome lacks variation. However, we have found that it not only harbors high levels of nucleotide variation throughout the chromosome, but also has a unique dimorphism that extends across a long chromosome domain, suggesting the importance of positive Darwinian selection (balancing selection) in the evolution of this chromosome. These results may be viewed as empirical support for Dobzhansky's “coadapted gene complex” idea (27), with each haplotype representing a distinct complex. The evolution of such a complex—if it is to occur at all—is most likely to occur in regions of low recombination like the one in question. These results provide a starting point for reassessing the genetic and evolutionary forces that affect both this chromosome in particular, and low recombination regions in general.

  • * To whom correspondence should be addressed. E-mail: mlong{at}

