Report

A BAC-Based Physical Map of the Major Autosomes of Drosophila melanogaster

Science  24 Mar 2000:
Vol. 287, Issue 5461, pp. 2271-2274
DOI: 10.1126/science.287.5461.2271

This article has a correction. Please see:

Abstract

We constructed a bacterial artificial chromosome (BAC)–based physical map of chromosomes 2 and 3 of Drosophila melanogaster, which constitute 81% of the genome. Sequence tagged site (STS) content, restriction fingerprinting, and polytene chromosome in situ hybridization approaches were integrated to produce a map spanning the euchromatin. Three of five remaining gaps are in repeat-rich regions near the centromeres. A tiling path of clones spanning this map and STS maps of chromosomes X and 4 was sequenced to low coverage; the maps and tiling path sequence were used to support and verify the whole-genome sequence assembly, and tiling path BACs were used as templates in sequence finishing.

The fruit fly Drosophila melanogaster is a principal model organism in metazoan genetics and molecular biology. Here, we describe a BAC-based physical map of chromosomes 2 and 3 constructed as part of the effort to determine the D. melanogaster genome sequence (1). There are five chromosomes (X, 2, 3, 4, and Y), and the second and third together account for ∼97 Mb of the ∼120-Mb euchromatic portion of the genome. Several clone-based physical maps have been described previously. Low-resolution yeast artificial chromosome maps of the genome have been produced by polytene chromosome in situ hybridization (2), and cosmid maps of regions of the X chromosome have been made by STS content and fingerprint mapping (3). The most complete previous map is the P1-based map by Kimmerly et al. (4) [also see (5)], constructed by polymerase chain reaction–based STS content mapping and polytene chromosome in situ hybridization. On chromosomes 2 and 3, it comprises 348 sets of contiguously overlapping clones (contigs), each with at least two STS markers.

The contiguity of the P1 map was limited by the shallow genome coverage of the library (about sixfold) and the relatively small insert size of the clones (80 kb). BAC vectors can accommodate larger inserts, so we created a BAC map using the P1 map as a starting point. We constructed a BAC library (RPCI-98) from an isogenic y1; cn1 bw1 sp1 strain (6). High-molecular-weight (HMW) DNA was prepared from adults (7), partially digested with Eco RI and Eco RI methylase, size fractionated, and cloned into the pBACe3.6 vector (8). The library consists of 17,540 recombinant clones with an average insert size of 163 kb and represents ∼24-fold coverage of the euchromatic portion of the genome (9).
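The stated library depth can be checked with simple arithmetic. The sketch below assumes the ∼120-Mb euchromatic genome size quoted earlier and treats the average insert size as exact; the function name is illustrative and not taken from the paper.

```python
def fold_coverage(n_clones: int, avg_insert_kb: float, target_mb: float) -> float:
    """Return library depth: total cloned DNA divided by the target size."""
    total_cloned_kb = n_clones * avg_insert_kb
    return total_cloned_kb / (target_mb * 1000.0)


# Values from the text: 17,540 clones, 163-kb average insert.
# Assumption: the target is the ~120-Mb euchromatic portion of the genome.
rpci98_depth = fold_coverage(17_540, 163, 120)
print(f"RPCI-98 depth: {rpci98_depth:.1f}-fold")
```

Under these assumptions the result is just under 24-fold, consistent with the figure reported for the library.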

We hybridized radioactively labeled oligonucleotide probes made from STS markers selected from the P1 map to colony arrays representing the RPCI-98 library (10); 1226 markers from the P1 map are included in the BAC map, at an average spacing of 80 kb. Because these markers had been previously localized, the data for each of the four chromosome arms (2L, 2R, 3L, and 3R) could be assembled separately, and this reduced the complexity of the assembly process.

To join the initial contigs together, new markers were added to the map in multiple iterations of STS design, hybridization, and data assembly. The new markers included 690 designed from BAC end sequences (1), 5 designed from genomic sequences, and 2 designed from coding sequences of known genes. Potential markers with substantial sequence similarity to more than one location in the genome were rejected. These were identified by scanning databases of known repeats and scanning for instances of the sequence in multiple, nonoverlapping BAC and P1 clones. In the latter stages of the project, restriction fingerprints (see description below) were used in STS design to identify BACs that extended farthest into the map gaps. The map presented here includes 1923 markers at an average spacing of 50 kb.
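The marker counts and average spacings quoted above are mutually consistent, as a short calculation suggests. This sketch assumes the markers are spread over the ∼97 Mb of euchromatin on chromosomes 2 and 3; the variable and function names are illustrative.

```python
# Assumption: markers are distributed over ~97 Mb (chromosomes 2 and 3).
EUCHROMATIN_2_3_KB = 97_000

p1_markers = 1226
new_markers = {
    "BAC end sequences": 690,
    "genomic sequences": 5,
    "coding sequences of known genes": 2,
}

total_markers = p1_markers + sum(new_markers.values())


def mean_spacing_kb(n_markers: int, region_kb: float = EUCHROMATIN_2_3_KB) -> float:
    """Average distance between markers if they were evenly spaced."""
    return region_kb / n_markers


initial_spacing = mean_spacing_kb(p1_markers)
final_spacing = mean_spacing_kb(total_markers)
print(total_markers, round(initial_spacing), round(final_spacing))
```

The tally reproduces the 1923 markers in the final map, and the computed spacings (about 79 kb and 50 kb) agree with the rounded values of 80 kb and 50 kb.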

STS content data were assembled by chromosome arm in the program SEGMAP v3.49 (11) and manually edited. Cytological data associated with markers from the P1 map were used to identify false joins in the BAC map. These were due to markers that hybridized to multiple sites in the genome and were resolved by removing the markers from the map. Markers that had been mapped to the wrong chromosome arm in the P1 map were identified by their failure to incorporate into assemblies and were moved. The quality of the hybridization data resulted in a map with a high degree of internal consistency (12). The accuracy of the map has been confirmed by selecting a complete tiling path of clones and sequencing them to low coverage (1).
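The idea behind STS content assembly, and the way a repeated marker creates a false join, can be illustrated with a toy example. This is a minimal sketch and not the SEGMAP algorithm: it only groups clones that share at least one marker, using a union-find structure, and it does not order markers or clones. All clone and marker names are hypothetical.

```python
from collections import defaultdict


def group_clones_by_shared_markers(hits):
    """Group clones into candidate contigs.

    hits maps a clone name to the set of STS markers it hybridized to.
    Two clones land in the same group if a chain of shared markers links them.
    """
    parent = {clone: clone for clone in hits}

    def find(clone):
        # Follow parent links to the group representative, compressing the path.
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]
            clone = parent[clone]
        return clone

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_clone_for_marker = {}
    for clone, markers in hits.items():
        for marker in markers:
            if marker in first_clone_for_marker:
                union(first_clone_for_marker[marker], clone)
            else:
                first_clone_for_marker[marker] = clone

    groups = defaultdict(set)
    for clone in hits:
        groups[find(clone)].add(clone)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical data: two separate contigs.
hits = {
    "bacA": {"sts1", "sts2"},
    "bacB": {"sts2", "sts3"},
    "bacC": {"sts7"},
    "bacD": {"sts7", "sts8"},
}
clean_groups = group_clones_by_shared_markers(hits)

# A hypothetical marker that hybridizes to two sites falsely joins the contigs.
hits_with_repeat = {clone: set(markers) for clone, markers in hits.items()}
hits_with_repeat["bacB"].add("repeat_sts")
hits_with_repeat["bacC"].add("repeat_sts")
joined_groups = group_clones_by_shared_markers(hits_with_repeat)

print(clean_groups)
print(joined_groups)
```

In this toy model, removing the multi-site marker restores the two separate groups, which mirrors the way false joins were resolved in the map.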

The STS content map (5, 13) has five gaps outside of the centromeric heterochromatin, which is not represented in large-insert clone libraries. Three gaps are near the centromeres, and we have been unable to identify unique probes to close them. In an attempt to close the gaps at 57B4 and 64C5 (Figs. 1 and 2B), we screened an alternative BAC library (14), but no spanning clones were identified. The apparent absence of BACs covering these two gaps may reflect random fluctuations in the distribution of clones, an absence of appropriate restriction sites, or sequences that cannot be cloned in the BAC vector. None of the five gaps was spanned by clones in the whole-genome shotgun sequence assembly (1).

Figure 1

BAC-based physical map of D. melanogaster chromosomes 2 and 3. A representation of the euchromatic portion of the four chromosome arms is shown, indicating regions covered by overlapping BAC clones (gray bars). The extent of coverage has been determined by polytene chromosome in situ hybridization of BACs (Fig. 2). The scale indicates cytological map position along the chromosomes (2L 21A1–40F7, 2R 41A1–60F5, 3L 61A1–80F9, and 3R 81F1–100F5), and the lengths of the numbered divisions represent their estimated relative physical lengths (23). Regions not represented by mapped BACs are indicated, as are the positions of the telomeres (TEL) and centromeres (CEN).

Figure 2

Polytene chromosome in situ hybridization of BACs (31). DNAs for use as probes were prepared with an alkaline lysis procedure (9). The chromosomes are Giemsa-stained (blue), and hybridized BACs are stained with a diaminobenzidine reaction (brown). (A) BACs at contig ends demonstrating coverage of the euchromatin. (B) BACs flanking gaps in map coverage. (C) Overlapping BACs near the 2L telomere demonstrating resolution of the method.

We constructed a fingerprint map from Eco RI digests of BACs to corroborate the STS content assemblies, define the extent of clone overlaps, and provide a resource for ensuring that sequence assemblies accurately reflect the structure of the genome. Agarose gel–based restriction fingerprinting was carried out essentially as described by Marra et al. (15) on 10,253 random BACs representing ∼14-fold coverage of the genome. Fingerprint data were collected in IMAGE v3.9d (16). Fingerprinting with Eco RI, the enzyme used to make the library, simplified map assembly because no vector-insert junction fragments were generated.

Fingerprint data were assembled by means of the program FPC (fingerprinted contigs) v4.2 (17, 18); assemblies were edited manually to remove false joins, which were readily identified by means of the STS content map. We optimized stringency settings for the FPC assembly algorithm by comparing fingerprint assemblies to known BAC locations, STS order, and Eco RI sites in the finished sequence of the 2.9-Mb Adh region (19). Settings were optimized to yield large contigs, which reduced the number of manually directed merges required to achieve contiguity. We found lower stringency settings that reduced the number of contigs by 60% and resulted in <10% additional false joins relative to high-stringency settings (20).
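To make the role of stringency concrete, the sketch below counts restriction fragments that two clones share within a size tolerance. It is an illustration only and does not reproduce FPC's scoring or its parameters; the fragment sizes, the tolerance values, and the function name are all hypothetical.

```python
def count_shared_bands(bands_a, bands_b, tolerance):
    """Count fragments matched one-to-one between two fingerprints.

    Two fragments match when their sizes differ by at most `tolerance`.
    Both lists are sorted and walked with two pointers.
    """
    a = sorted(bands_a)
    b = sorted(bands_b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        diff = a[i] - b[j]
        if abs(diff) <= tolerance:
            shared += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # a[i] is too small to match any remaining band in b
        else:
            j += 1  # b[j] is too small to match any remaining band in a
    return shared


# Hypothetical fragment sizes (arbitrary units) for two clones.
clone_1 = [1200, 2500, 4100, 7300, 9800]
clone_2 = [1210, 2490, 5000, 7310, 12000]

loose = count_shared_bands(clone_1, clone_2, tolerance=20)
strict = count_shared_bands(clone_1, clone_2, tolerance=5)
print(loose, strict)
```

In this toy model a looser tolerance detects more apparent overlaps, which yields larger contigs at the cost of more false joins; that is the trade-off the stringency optimization addresses.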

The STS content map was used to divide the genome into segments, and restriction fingerprints of BACs within the segments were assembled and edited independently of one another. This strategy permitted multiple operators to edit segments in parallel and reduced the complexity of each assembly. First, BACs on chromosome arm 3L were assembled as a single ∼24-Mb project; automated assembly in FPC generated 153 contigs, and merges that were suggested by STS content data and confirmed by fingerprint data resulted in eight contigs. Next, chromosome arms 2L, 2R, and 3R were divided into 14 segments averaging 5 Mb in size. Automated assembly of these segments resulted in 225 contigs; fingerprint and STS content data were used to direct merges between contigs. We then merged the 5-Mb assemblies to yield a fingerprint map with 16 gaps relative to the STS content map. We collected directed fingerprints for 56 additional BACs selected from the STS content map, and these data closed four fingerprint gaps. The remaining 12 gaps may be due to sparse BAC coverage, the distribution of Eco RI sites, or low STS marker density in these regions. The fingerprint assemblies (21) corroborate the STS content assemblies, providing confidence in the integrated map.

The polytene chromosomes constitute the unambiguous physical map of D. melanogaster (22, 23). To align the BAC map with the cytological map, we mapped BACs by in situ hybridization to polytene chromosomes. First, random BACs were hybridized to provide anchor points throughout the genome; 173 mapped to specific locations on chromosomes 2 and 3. Next, an additional 547 BACs from the tiling path selected for sequencing were hybridized to provide finer alignment of the BAC map, the cytological map, and the genome sequence. These hybridized BACs represent ∼1.2-fold coverage of the euchromatic portion of the two chromosomes (5).
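The ∼1.2-fold figure follows from the clone counts if each hybridized BAC is assumed to carry the library's average 163-kb insert and the target is taken as the ∼97 Mb of euchromatin on chromosomes 2 and 3. Both assumptions are made here for illustration; the names are not from the paper.

```python
AVG_INSERT_KB = 163          # assumption: library average applies to these BACs
EUCHROMATIN_2_3_KB = 97_000  # ~97 Mb for chromosomes 2 and 3, from the text

random_bacs = 173        # random BACs mapped to chromosomes 2 and 3
tiling_path_bacs = 547   # additional tiling path BACs

hybridized_bacs = random_bacs + tiling_path_bacs
in_situ_depth = hybridized_bacs * AVG_INSERT_KB / EUCHROMATIN_2_3_KB
print(f"{hybridized_bacs} BACs, {in_situ_depth:.2f}-fold")
```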

The in situ data indicate that BAC coverage extends nearly to the telomeres (Fig. 2A). It is more difficult to determine how far the map extends toward the centromeres (Fig. 2A); in pericentric regions, the morphology of hybridized chromosomes is poorly preserved and difficult to interpret. These regions include a substantial amount of repetitive sequence, so BACs representing them often hybridize to multiple locations in the genome. However, each of the three small contigs near the centromeres (Fig. 1) contains at least one BAC that hybridizes to a single cytological location. The in situ data also permit estimation of the sizes of the euchromatic regions not represented in mapped BACs (Fig. 2B and Table 1). The resolution of in situ hybridization varies across the genome because of differences in the DNA content of each polytene chromosome band (Fig. 2C), and the relative DNA content of each band has been measured by Sorsa (23). We estimate that the map covers >97.9% of the euchromatic portion of the two chromosomes (Table 1).

Table 1

Estimated sizes of the euchromatic regions of chromosomes 2 and 3 that are not represented in mapped BACs. For each region, the relative sizes (23) of the unrepresented polytene bands and the flanking represented bands were summed. The relative size totals were adjusted to the known size of the euchromatic portion of the genome (1). These are overestimates, as some BACs at contig ends hybridize to dispersed repeats and the chromocenter and cannot be used in the calculation. The total unrepresented portion is 2.1% of the euchromatic portion of the two chromosomes.
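The scaling step described in this caption can be sketched as a proportion. The example assumes that relative sizes are scaled against the ∼97 Mb of euchromatin on chromosomes 2 and 3; the function name and the use of 100 relative units as the total are illustrative and not taken from the paper.

```python
def scale_to_mb(relative_size: float, total_relative_size: float, total_mb: float) -> float:
    """Convert a relative size to megabases by proportional scaling."""
    return relative_size / total_relative_size * total_mb


# Assumption: ~97 Mb of euchromatin on chromosomes 2 and 3.
EUCHROMATIN_2_3_MB = 97.0

# The caption reports that 2.1% of the euchromatin is unrepresented.
unrepresented_mb = scale_to_mb(2.1, 100.0, EUCHROMATIN_2_3_MB)
covered_percent = 100.0 - 2.1
print(f"{unrepresented_mb:.1f} Mb unrepresented, {covered_percent:.1f}% covered")
```

Under these assumptions roughly 2 Mb is unrepresented, and the complement matches the coverage of more than 97.9% estimated for the map.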

The construction of this BAC map and the recently reported BAC maps of Arabidopsis thaliana, which has a genome size similar to that of D. melanogaster, illustrate how hybridization-based STS content mapping and agarose gel–based restriction fingerprint mapping can be productively integrated to produce contiguous clone-based physical maps of large genomic regions. The STS content map of the ∼130-Mb A. thaliana genome had 130 contigs (24), and the restriction fingerprint map had 169 contigs (25); integration of these data resulted in a BAC map with 14 gaps, excluding the centromeres (24). The D. melanogaster BAC map presented here has five gaps, excluding the centromeric heterochromatin. We found it efficient to use the STS content map to direct fingerprint assembly and did not attempt to construct an independent fingerprint map. In our experience, STS content mapping with oligonucleotide probes is more effective for achieving contiguous clone coverage, and agarose gel–based restriction fingerprint mapping is more useful for measuring the extent of clone overlaps and confirming that sequence assemblies reflect the structure of the genome. The differing utilities of the two techniques arise because STS content data have higher specificity for detecting clone overlaps, and restriction fingerprint data have higher resolution for measuring them. Our success in combining STS content and restriction fingerprint data to produce an integrated, accurate, and essentially complete map argues for a similar approach for the human and mouse genomes.

The physical map described here played three key roles in the generation of the D. melanogaster genome sequence described by Adams et al. (1). First, the map provided an independent benchmark for evaluating the accuracy of whole-genome shotgun sequence assemblies (26). Second, a tiling path of overlapping BAC and P1 clones spanning the map of chromosomes 2 and 3 was shotgun sequenced to at least onefold coverage, and these data were assembled with the whole-genome shotgun data to increase total sequence coverage from 12- to 13.5-fold. These data also directly confirm the accuracy of clone overlaps in the BAC map. Third, the BACs composing the tiling path were used as templates for gap closure in sequence finishing. In addition to these roles in sequence assembly and validation, the mapped BACs facilitate the subcloning of any region of the genome.

BAC-based STS content maps of the X chromosome (27) and chromosome 4 (28) have been constructed by others. These maps will be integrated with the restriction fingerprint data to complete a BAC-based physical map of the whole genome. The contiguity and depth of coverage of these maps have ensured that the complete sequence of the euchromatic portion of the D. melanogaster genome could be correctly assembled and finished to high accuracy.

  • * To whom correspondence should be addressed. E-mail: hoskins{at}bdgp.lbl.gov

  • Present address: Children's Hospital Oakland Research Institute, Oakland, CA 94609, USA.

  • Present address: Parke Davis Laboratory for Molecular Genetics, Alameda, CA 94502, USA.

REFERENCES AND NOTES
