Report

A Physical Map of the 1-Gigabase Bread Wheat Chromosome 3B


Science  03 Oct 2008:
Vol. 322, Issue 5898, pp. 101-104
DOI: 10.1126/science.1161847

Abstract

As the staple food for 35% of the world's population, wheat is one of the most important crop species. To date, sequence-based tools to accelerate wheat improvement are lacking. As part of the international effort to sequence the 17–billion–base-pair hexaploid bread wheat genome (2n = 6x = 42 chromosomes), we constructed a bacterial artificial chromosome (BAC)–based integrated physical map of the largest chromosome, 3B, that alone is 995 megabases. A chromosome-specific BAC library was used to assemble 82% of the chromosome into 1036 contigs that were anchored with 1443 molecular markers, providing a major resource for genetic and genomic studies. This physical map establishes a template for the remaining wheat chromosomes and demonstrates the feasibility of constructing physical maps in large, complex, polyploid genomes with a chromosome-based approach.

Among plants providing food for humans and animals, one of the oldest and most widespread is wheat (Triticum aestivum L.). Despite its socioeconomic importance and the challenges that agriculture is facing today (1), wheat genomics and its application to crop improvement are lagging behind those of most other important crops. The wheat genome has always been viewed as impossible to sequence because of its large amount of repetitive sequences (>80%) and its size of 17 Gb, which is five times larger than the human genome. The largest wheat chromosome (3B) alone is more than twice the size of the entire 370-Mb rice genome (2), whereas the entire maize genome (2.6 Gb) is about the size of three wheat chromosomes (table S1). Further complicating the challenge, bread wheat is a relatively recent hexaploid (2n = 6x = 42) containing three homoeologous A, B, and D genomes of related progenitor species, meiotic recombination is not distributed homogeneously along the chromosomes, and intervarietal polymorphism is very low.

Genome sequencing is the foundation for understanding the molecular basis of phenotypic variation, accelerating breeding, and improving the exploitation of genetic diversity to develop new crop varieties with increased yield and improved resistance to biotic and abiotic stresses. These new varieties will be critical for meeting the challenges of the 21st century, such as climatic changes, modifications of diets, human population growth, and the increased demand for biofuels. Physical maps are essential for high-quality sequence assembly regardless of the sequencing strategy used [such as bacterial artificial chromosome (BAC)–by–BAC or whole-genome shotgun strategies], and they will remain pivotal for de novo sequencing even with the advent of short-read technologies (3). As the foundation for genome sequencing, physical maps have been established for a dozen plant species so far, including cereals such as maize, rice, and sorghum (4–6). Recently, the development of new genomic resources for analyzing wheat paved the way for physically mapping and ultimately sequencing a species for which this was unthinkable a few years ago.

A physical map with 10-fold coverage of the 17-Gb bread wheat genome would require more than 1.4 million BAC clones to be fingerprinted, assembled into contigs, and anchored to genetic maps. Although whole-genome BAC libraries are available and fingerprinting millions of BAC clones is technically feasible with high-information-content fingerprinting (HICF) (7), assembly to accurately depict individual chromosomes and the anchoring of homoeologous BAC contigs onto genetic maps remains daunting. To address these issues, we used a chromosome-based approach (8) to construct a physical map of the largest hexaploid wheat chromosome (3B) and, to compensate for the inherent limits of the wheat genome (the lack of recombination and polymorphism), we deployed a combination of genetic mapping strategies for anchoring the physical map.
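The clone-count estimate above follows from simple coverage arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, and because the text does not state an average BAC insert size, that value is back-calculated from the quoted figures (10-fold coverage, 17 Gb, 1.4 million clones) rather than taken from the study.

```python
def clones_needed(genome_bp: float, coverage: float, insert_bp: float) -> float:
    """Number of clones whose summed insert length equals coverage x genome size."""
    return coverage * genome_bp / insert_bp


def implied_insert_bp(genome_bp: float, coverage: float, n_clones: float) -> float:
    """Mean insert size implied by a given clone count at a given coverage."""
    return coverage * genome_bp / n_clones


GENOME_BP = 17e9   # bread wheat genome size quoted in the text
COVERAGE = 10      # 10-fold coverage quoted in the text
N_CLONES = 1.4e6   # lower bound on clone count quoted in the text

# Back-calculated, not reported by the authors.
insert_bp = implied_insert_bp(GENOME_BP, COVERAGE, N_CLONES)
print(f"implied mean insert size: {insert_bp / 1e3:.0f} kb")
print(f"clones needed at that insert size: {clones_needed(GENOME_BP, COVERAGE, insert_bp):,.0f}")
```

Any smaller mean insert size pushes the required clone count above 1.4 million, which is consistent with the "more than" wording in the text.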

Fingerprinting and contig assembly were performed with BAC clones originating from sorted 3B wheat chromosomes of Chinese Spring (9), the reference cultivar chosen for genome sequencing by the International Wheat Genome Sequencing Consortium because of its previous use for cytogenetic studies and the availability of a set of aneuploid lines (10). A total of 67,968 3B BAC clones were fingerprinted with a modified (11) HICF SNaPshot protocol (7), and 56,952 high-quality fingerprints (84%) were obtained. A first automated assembly (11) resulted in a final build of 1991 contigs with an average size of 482 kb for a total length of 960 Mb (table S2). One hundred ninety-seven contigs were larger than 1 Mb; the largest was 3852 kb in size. A minimal tiling path (MTP) consisting of 7440 overlapping BAC clones was defined for further analyses. After the preliminary automated assembly, contigs were merged manually (11), resulting in a final assembly of 1036 contigs with an average size of 783 kb (table S2) covering 811 Mb (∼82%) of the estimated 995 Mb (12) constituting chromosome 3B.
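The assembly figures in this paragraph can be cross-checked against one another. The short Python sketch below uses only numbers quoted in the text; the variable and function names are hypothetical, and the results are rounded, so they reproduce the reported values only to the precision given.

```python
def total_mb(n_contigs: int, mean_kb: float) -> float:
    """Total assembly length in Mb from contig count and mean contig size in kb."""
    return n_contigs * mean_kb / 1000.0


CHROMOSOME_MB = 995  # estimated size of chromosome 3B

usable_fraction = 56_952 / 67_968     # high-quality fingerprints / clones fingerprinted
automated_mb = total_mb(1991, 482)    # first automated build
manual_mb = total_mb(1036, 783)       # after manual merging
chromosome_coverage = manual_mb / CHROMOSOME_MB

print(f"usable fingerprints: {usable_fraction:.0%}")
print(f"automated build:     {automated_mb:.0f} Mb")
print(f"final assembly:      {manual_mb:.0f} Mb ({chromosome_coverage:.0%} of 3B)")
```

The automated build sums to more than the final assembly because unmerged contigs still overlap one another; manual merging removes that redundancy.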

Contig assembly was validated through BAC library screening with markers derived from BAC-end sequences (BESs) (13) and by genetic mapping (11). Out of 421 markers derived from BESs, 369 (88%) correctly identified the BAC clones belonging to computationally identified contigs. Conversely, 35 markers originating from the same contigs mapped to the same genetic locus. The wheat genome has a high content of long terminal repeat retrotransposons (>67%) (13) and a large number of tandemly repeated sequences, which may result in the misassembly of BAC clones. A statistical analysis (11) indicated that less than 10% of the fragments were randomly shared between any two non-overlapping BAC clones. We calculated that approximately 58% overlap between fingerprints was required for automated assembly (fig. S1), and therefore with the high stringency (Sulston score = 1e-75) used for the assembly (11), the repeated sequences did not affect the quality of the contig build.
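To illustrate why a stringent Sulston cutoff guards against chance overlaps, the Python sketch below implements a simplified form of the binomial coincidence probability commonly described for fingerprint contig assembly. It is not the authors' pipeline: the function name, the band counts, the tolerance, and the gel length are all illustrative assumptions, and the values printed are not intended to reproduce the cutoff or the overlap percentage reported above.

```python
from math import comb, fsum


def sulston_score(shared: int, n_low: int, n_high: int,
                  tolerance: float, gel_length: float) -> float:
    """Probability that at least `shared` of the `n_low` bands of one clone match
    a band of the other clone (`n_high` bands) purely by chance."""
    if not 0 <= shared <= n_low <= n_high:
        raise ValueError("require 0 <= shared <= n_low <= n_high")
    # Chance that one given band lies within +/- tolerance of any of n_high bands.
    p = 1.0 - (1.0 - 2.0 * tolerance / gel_length) ** n_high
    # Upper tail of a binomial distribution with n_low trials.
    return fsum(comb(n_low, m) * p**m * (1.0 - p) ** (n_low - m)
                for m in range(shared, n_low + 1))


# Hypothetical parameters chosen only for illustration.
N_BANDS = 100
TOLERANCE = 4
GEL_LENGTH = 36_000

for shared in (20, 40, 60, 80):
    score = sulston_score(shared, N_BANDS, N_BANDS, TOLERANCE, GEL_LENGTH)
    print(f"{shared:3d} shared bands -> score {score:.1e}")
```

Under these assumptions the score falls by many orders of magnitude as the number of shared bands grows, so a very small cutoff admits only clone pairs that share a large fraction of their fragments.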

The physical map was anchored with 685 microsatellites [simple sequence repeats (SSRs)], some of which were designed from the BESs, as well as expressed sequence tag (EST) markers (table S3) previously mapped to chromosome 3B (14) and identified on the basis of their synteny with rice chromosome 1. In total, 291 SSR markers were anchored to 203 contigs representing 219 Mb, and 394 ESTs were anchored to 250 individual contigs representing 283 Mb (Fig. 1 and table S3). Four hundred seventy-two additional contigs representing 452 Mb were then anchored with 711 insertion site–based polymorphism (ISBP) markers derived from 19,400 BESs (13) (Fig. 1 and table S3). We also tested the multiplexed and genomewide Diversity Arrays Technology (DArT) (15) by hybridizing a wheat array composed of 5000 DArT markers with three-dimensional pools of the MTP (11). Thirty-five DArTs were unambiguously assigned and anchored to 25 individual contigs (19 Mb) (table S3).

Fig. 1.

Relative contribution of ISBP, EST, and SSR markers for anchoring the 3B physical map. Markers found in BAC contigs are indicated in parentheses after the marker type. The numbers of contigs anchored, as well as their physical size (in brackets), are provided for each marker type. The Venn diagram illustrates the relative contribution of each marker type, with the number and total size of contigs anchored by one or more types of markers.

In total, 1443 molecular markers were linked to 680 BAC contigs representing 611 Mb and 75% of the 3B physical map. The longer contigs were anchored more easily (fig. S2), and more than 80% of contigs longer than 900 kb were anchored. Very few contigs (50, <80 Mb) were anchored with all marker types, and most (463 out of 680) were anchored by a single marker type (Fig. 1). This indicates that the different classes of markers cover different regions of the genome and should be used in combination to ensure optimal representation of the chromosome.

The anchored physical map can expedite map-based cloning, but its full value is achieved by determining the relative contig order along the chromosome and producing an integrated physical map. To integrate the 3B physical map, we used deletion mapping [in which the absence of two genetic sites in the same deletion line (that is, one of many lines containing small deletions in specific sites along the chromosome) is a measure of the maximal distance between them] and meiotic mapping (in which the relative order of markers is determined on the basis of recombination). This combined approach was necessary because in wheat, the resolution of meiotic mapping is limited by both the nonhomogeneous distribution of recombination events along the chromosome arms (for example, on chromosome 3B, 42% of the physical map length is represented by only 2.2% of the genetic map length in the centromeric regions) and the low level of polymorphism in the cultivated pool.
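The logic of deletion mapping can be sketched in a few lines of Python. The example assumes terminal deletions, so that a line whose breakpoint lies at fraction length b of the arm (measured from the centromere) retains only the segment from 0 to b. The two breakpoints are read off the bin names used in this report (3BS3-0.87-1.00 and 3BS8-0.78-0.87); the line labels, the function name, and the marker scores are hypothetical.

```python
def assign_bin(present_in: dict[str, bool],
               breakpoints: dict[str, float]) -> tuple[float, float]:
    """Return the (proximal, distal) fraction-length interval containing a marker.

    present_in maps each deletion line to whether the marker was detected in it.
    breakpoints maps each deletion line to the fraction of the arm it retains.
    """
    # The marker lies distal to every breakpoint of a line in which it is absent...
    lower = max((breakpoints[line] for line, ok in present_in.items() if not ok),
                default=0.0)
    # ...and proximal to every breakpoint of a line in which it is present.
    upper = min((breakpoints[line] for line, ok in present_in.items() if ok),
                default=1.0)
    if lower >= upper:
        raise ValueError("presence/absence pattern is inconsistent with terminal deletions")
    return lower, upper


# Breakpoints inferred from the bin names in the text; labels are assumptions.
BREAKPOINTS = {"3BS8": 0.78, "3BS3": 0.87}

print(assign_bin({"3BS8": False, "3BS3": False}, BREAKPOINTS))  # most distal bin
print(assign_bin({"3BS8": False, "3BS3": True}, BREAKPOINTS))   # next bin inward
```

Two markers assigned to the same interval are no farther apart than the width of that bin, which is why the bin size sets the maximal distance between them.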

Deletion mapping resulted in the integration of 599 contigs (556 Mb, ∼56% of the chromosome) in 16 physical intervals along chromosome 3B (table S3). To assess whether our physical map accurately depicts 3B, we systematically compared the total size of the contigs present in the 16 intervals defined by genetic deletions (so-called deletion bins) with the size of the genetic deletion as estimated cytologically from the chromosomes in the corresponding deletion line stocks (16) (table S3). Coverage within the bins ranged from 33 to 99%, with an average of 56% (fig. S3). The most terminal bin on the short arm (bin 3BS3-0.87-1.00) was only 10% covered and had a smaller average contig size (630 kb) than other bins (957 kb), suggesting that the telomeric and heterochromatic region of the short arm may be underrepresented in our BAC library.
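The per-bin coverage comparison described above amounts to dividing the summed size of the contigs assigned to a bin by the cytologically estimated size of that bin. The Python sketch below shows the calculation; the function name is hypothetical, and the arm length, bin boundaries, and contig sizes are invented for illustration and are not data from this study.

```python
def bin_coverage(contig_sizes_kb: list[float], fl_start: float, fl_end: float,
                 arm_mb: float) -> float:
    """Fraction of a deletion bin covered by the contigs assigned to it.

    The bin size is estimated as the fraction-length interval times the arm length.
    """
    bin_mb = (fl_end - fl_start) * arm_mb
    contigs_mb = sum(contig_sizes_kb) / 1000.0
    return contigs_mb / bin_mb


# Invented values: a 100-Mb arm, a bin spanning fraction lengths 0.5 to 1.0,
# and three contigs assigned to that bin.
example = bin_coverage([500, 1500, 3000], fl_start=0.5, fl_end=1.0, arm_mb=100)
print(f"bin coverage: {example:.0%}")
```

A bin whose coverage is far below the chromosome-wide average, as reported for the most terminal short-arm bin, points to under-representation of that region in the library rather than to an assembly error.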

We tested high-resolution radiation hybrid (RH) mapping, which relies on lines carrying specific radiation-induced chromosomal fragments in nonhomoeologous backgrounds (17) and measures the distance between genetic sites as the frequency with which they remain together after fragmentation. A panel of 184 RH lines developed for chromosome 3B (11) was tested with 65 ISBP markers (table S5) and indicated a resolution of about 263 kb per break. In addition, with a limited set of critical RH lines, we were able to order 35 loci (32 contigs) previously assigned to 3BL7-0.63-1.00 (table S5). We are in the process of increasing the number of RH lines for mapping all the contigs along chromosome 3B.

Further integration of the 3B physical map was achieved through meiotic mapping with a reference genetic map developed from an F2 population (CsRe) derived from a cross between Chinese Spring (Cs) and the French cultivar Renan (Re). Because the physical contigs originate from Chinese Spring, anchoring them to the CsRe genetic map guarantees high accuracy in ordering. Because BAC libraries are available for both cultivars, this physical map also provides an efficient platform for single-nucleotide polymorphism discovery. To date, 102 SSR and ISBP markers have been mapped with 376 F2 individuals of the CsRe population (11). Eighty-nine of them were anchored to contigs, permitting the integration of 75 individual contigs (77 Mb) to the genetic map, of which 80% (60/75) were ordered. Using the same criteria as the IBM Neighbors map of maize (18), we also established a 3B neighbor map (11) containing 636 SSR, restriction fragment length polymorphism, sequence tagged site (STS), DArT, and ISBP markers (table S6). In total, 213 contigs were anchored on this map, providing 225 Mb of sequence information for map-based cloning. A Gbrowse interface displaying the integrated chromosome 3B physical map is available at http://urgi.versailles.inra.fr/projects/Triticum/eng/index.php.

We also aligned the wheat 3B physical map against the rice genome using the genic sequences (ESTs/STSs) present on the 75 contigs integrated to the CsRe genetic map (11). Twenty-seven contigs carried 49 ESTs/STSs with homology to 56 orthologous rice genes, including 14 contigs located in the terminal part of the short arm of chromosome 3B, which is collinear with rice chromosome 1S (table S7). In this region, we identified four inversions as well as noncollinear genes (Fig. 2). This confirms, with a higher degree of resolution, rearrangements observed by deletion mapping between the two most conserved wheat and rice chromosomes (19) and suggests that many local rearrangements have occurred in globally collinear regions since the divergence of wheat and rice more than 50 million years ago. This result also indicates that great caution is required when the gene order of a sequenced but distantly related genome is used to order the physical maps of other genomes.

Fig. 2.

Integrated physical map at the telomeric end of chromosome 3BS and collinearity with rice. (A) Genetic map of wheat chromosome 3B. The blue and yellow sectors represent the two distal deletion bins 3BS3-0.87-1.00 and 3BS8-0.78-0.87, respectively. (B) BAC contigs integrated to the genetic map. The names and sizes of the 27 contigs are displayed in gray boxes, the sizes of which reflect the relative contig sizes. (C) Rice chromosome 1. Each line represents the relationship between an EST located in a wheat BAC contig and the orthologous rice gene. The red sectors indicate rearrangements between the two chromosomes. The four red asterisks designate wheat ESTs for which rice orthologs were found on noncollinear regions or chromosomes other than chromosome 1. The vertical dashed bracket represents a region containing more than 30 kinase-related genes in rice.

To date, fewer than a dozen wheat genes have been isolated through map-based cloning. About 40 genes and quantitative trait loci (QTLs) have been identified on chromosome 3B (http://wheat.pw.usda.gov/GG2/maps.shtml), and none has been cloned. Seventeen contigs representing 16.8 Mb are anchored with markers flanking or cosegregating with 16 of these genes and QTLs on our physical map (table S8). With an average contig size of 783 kb, the chromosome 3B physical map allows one to land on any target locus in a single step, provided that recombination is compatible with fine mapping of the target gene. This has been carried out for the stem rust resistance gene Sr2 (20) and the QTL Fhb1 conferring resistance to Fusarium head blight (21) (table S8). The 3B physical map also provides the foundation for sequencing and in-depth studies of the wheat genome composition and organization. Sequencing and annotation are under way for 13 BAC contigs of 800 kb to 3.2 Mb, originating from different regions of chromosome 3B. These large sequenced regions will also aid in SSR and ISBP marker development. Finally, a 15-fold coverage physical map is under way to support the sequencing of chromosome 3B, which is envisaged for the near future.

By establishing a physical map of the largest wheat chromosome, we demonstrate that the chromosome-based approach is feasible and suitable for the construction of the hexaploid wheat genome physical map. With this physical map, the structure of agronomically important target loci can be defined in a single step and marker development can be accelerated (22), opening up new possibilities for accessing regions important for yield, disease resistance, and trait improvement in wheat. An international collaborative effort is under way now to exploit this new resource and, using the same strategy, projects have begun for the 20 remaining chromosomes (table S1; www.ueb.cas.cz/Olomouc1/LMCC/Resources/resources.html#chrs and www.wheatgenome.org/projects.php), opening the way to future sequencing of the wheat genome.

Finally, although chromosome sorting has so far been applied only to a few other cereals [barley, durum wheat, and rye (8)], important legumes (pea and bean), and trees (Norway spruce) (23), this work exemplifies its broader potential for other complex nonmodel genomes heretofore considered impossible to sequence despite their socioeconomic importance. Until now, the selection of genomes for sequencing has been determined on the basis of genome simplicity and not agronomic relevance, with serious consequences for crop improvement and food security [for example, by neglecting wheat or choosing the diploid of cotton, Gossypium raimondii, to sequence first rather than focusing on the economically important tetraploid G. hirsutum (24)]. Our work may pave the way for a major change in how the next genomes for de novo sequencing are selected, thereby accelerating improvements in economically important crop species.

Supporting Online Material

www.sciencemag.org/cgi/content/full/322/5898/101/DC1

Materials and Methods

Figs. S1 to S3

Tables S1 to S8
