Wheat genome deciphered, assembled, and ordered. Seeds, or grains, are what counts with respect to wheat yields (left panel), but all parts of the plant contribute to crop performance. With complete access to the ordered sequence of all 21 wheat chromosomes, the context of regulatory sequences, and the interaction network of expressed genes—all shown here as a circular plot (right panel) with concentric tracks for diverse aspects of wheat genome composition—breeders and researchers now have the ability to rewrite the story of wheat crop improvement. Details on value ranges underlying the concentric heatmaps of the right panel are provided in the full article online.
Fig. 1 Structural, functional, and conserved synteny landscape of the 21 wheat chromosomes. (A) Circular diagram showing genomic features of wheat. The tracks toward the center of the circle display (a) chromosome name and size (100-Mb tick size; light gray bar indicates the short arm and dark gray indicates the long arm of the chromosome); (b) dimension of chromosomal segments R1, R2a, C, R2b, and R3 [(18) and table S29]; (c) K-mer 20-frequencies distribution; (d) LTR-retrotransposons density; (e) pseudogenes density (0 to 130 genes per Mb); (f) density of HC gene models (0 to 32 genes per Mb); (g) density of recombination rate; and (h) SNP density. Connecting lines in the center of the diagram highlight homeologous relationships of chromosomes (blue lines) and translocated regions (green lines). (B) Distribution of Pfam domain PF08284 “retroviral aspartyl protease” signatures across the different wheat chromosomes. (C) Positioning of the centromere in the 2D pseudomolecule. Top panel shows density of CENH3 ChIP-seq data along the wheat chromosome. Bottom panel shows distribution and proportion of the total pseudomolecule sequence composed of TEs of the Cereba and Quinta families. The bar below the bottom panel indicates pseudomolecule scaffolds assigned to the short (black) or long (blue) arm on the basis of CSS data (6) mapping. (D) Dot-plot visualization of collinearity between homeologous chromosomes 3A and 3B in relation to distribution of gene density and recombination frequency (left and bottom panel boxes: blue and purple lines, respectively). Chromosomal zones R1, R2a, C, R2b, and R3 are colored as in (A). cM, centimorgan.
Fig. 2 Evaluation of automated gene annotation. (A) Selected gene prediction statistics of IWGSC RefSeq Annotation v1.1, including number and subgenome distribution of HC and LC genes as well as pseudogenes. (B) BUSCO v3 gene model evaluation comparing IWGSC RefSeq Annotation v1.1 to earlier published bread wheat whole-genome annotations, as well as to annotations of related grass reference-genome sequences. BUSCO provides a measure for the recall of highly conserved gene models.
Fig. 3 Wheat atlas of transcription. (A) Schematic illustration of a mature wheat plant and high-level tissue definitions for “roots,” “leaves,” “spike,” and “grain” used in the further analysis. (B) Principal component (PC) analysis plots for similarity of overall transcription, with samples colored according to their high-level tissue of origin [as introduced in (A)]. The color key for tissue is shown at the bottom of the figure under (C). (C) Chromosomal distribution of the average expression breadth [number of tissues in which genes are expressed (total number of tissues, n = 32)]. The average (dark orange line) is calculated on the basis of a scaled position of each gene within the corresponding genomic compartment (blue, aqua, and light yellow background) across the 21 chromosomes (orange lines). (D) Heatmap illustrating the expression of a representative gene (eigengene) for the 38 coexpression modules defined by WGCNA. Modules are represented as columns, with the dendrogram illustrating eigengene relatedness. Each row represents one sample. Colored bars to the left indicate the high-level tissue of origin; the color key is shown at the bottom of the figure under (C). DESeq2-normalized expression levels are shown. Modules 1 and 5 (light green boxes) were most correlated with high-level leaf tissue, whereas modules 8 and 11 (dark green boxes) were most correlated with spike. (E) Bar plot of module assignment (same, near, or distant) of homeologous triads and duplets in the WGCNA network. (F) Simplified flowering pathway in polyploid wheat. Genes are colored according to their assignment to leaf (light green)– or spike (dark green)–correlated modules. (G) Excerpt from phylogenetic tree for MADS transcription factors, including known Arabidopsis flowering regulators SEP1, SEP2, and SEP4 (black) (for the full phylogenetic tree, see fig. S38). 
Green branches represent wheat orthologs of modules 8 and 11, whereas purple branches are wheat orthologs assigned to other modules (0 and 2). Gray branches indicate non-wheat genes.
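The expression breadth plotted in (C) is a simple count: the number of tissues, out of n = 32, in which a gene is expressed. A minimal sketch of that count follows. The per-tissue values, the tissue names, and the 0.5 detection threshold are illustrative assumptions, since the caption does not state the cutoff that was used.

```python
from typing import Dict, List

# Hypothetical detection cutoff; the caption does not specify one.
ASSUMED_THRESHOLD = 0.5


def expression_breadth(tissue_expression: Dict[str, float],
                       threshold: float = ASSUMED_THRESHOLD) -> int:
    """Return the number of tissues in which expression exceeds the threshold."""
    return sum(1 for value in tissue_expression.values() if value > threshold)


def average_breadth(genes: List[Dict[str, float]],
                    threshold: float = ASSUMED_THRESHOLD) -> float:
    """Average expression breadth over a set of genes (e.g. one genomic bin)."""
    if not genes:
        raise ValueError("at least one gene is required")
    return sum(expression_breadth(g, threshold) for g in genes) / len(genes)


# Toy data with four tissues instead of the 32 used in the figure.
toy_genes = [
    {"roots": 3.2, "leaves": 0.1, "spike": 0.7, "grain": 0.0},   # breadth 2
    {"roots": 5.0, "leaves": 4.1, "spike": 6.3, "grain": 2.2},   # breadth 4
]

if __name__ == "__main__":
    for gene in toy_genes:
        print(expression_breadth(gene))
    print(average_breadth(toy_genes))
```

Averaging the per-gene counts within a scaled positional bin, as `average_breadth` does for a list of genes, is one way to obtain a curve like the dark orange line in (C).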
Fig. 4 Gene families of wheat. (A) Heatmap of expanded and contracted gene families. Columns correspond to the individual gene families. Rows in the top panel illustrate the sets of gene-family expansions (++, red) and contractions (––, blue) found for the wheat A lineage (Triticum urartu and A subgenome); the D lineage (Aegilops tauschii and D subgenome); the A, B, or D subgenomes; or bread wheat (expanded and contracted in all subgenomes). In the latter four categories, expansions and contractions do not imply bread wheat–specific gene copy number variations. Similar dynamics might have remained unobserved in T. urartu or A. tauschii owing to the inherent limitations of the used draft genome assemblies (53, 54). Rows in the bottom panel heatmap (color scheme on z-score scale) indicate the fold expansion and contraction of gene families for the taxa and species included in the analysis [Oryza sativa (Osat), Sorghum bicolor (Sbic), Zea mays (Zmay), Brachypodium distachyon (Bdis), Hordeum vulgare (Hvul1/2), Secale cereale (Scer), A. tauschii (Aetau), T. urartu (Tura), and wheat A (TraesA), B (TraesB), and D (TraesD) subgenomes]. (B) All enriched TO terms for the gene families depicted in (A). Overrepresented TO terms were found for expanded families in bread wheat (all subgenomes, red), the B subgenome (green), and the A lineage (T. urartu and A subgenome, blue) only, respectively. The x axis represents the percentage of genes annotated with the respective TO term that were contained in the gene set in question. The size of the bubbles corresponds to the P (−log10) significance of expansion. (C) Genomic distribution of gene families associated with adaptation to biotic (light and dark blue) or abiotic stress (light and dark pink), RNA metabolism in organelles and male fertility (orange), or end-use quality (light, medium, and dark green). 
Known positions of agronomically important genes and loci are indicated by red arrows and arrowheads to the left of the chromosome bars. Recombination rates are displayed as heatmaps in the chromosome bars [7.2 cM/Mb (light green) to 0 cM/Mb (black)].
Fig. 5 IWGSC RefSeq v1.0–guided dissection of SSt1 and TaAGL33. (A) The Lillian-Vesper population genetic map was anchored to IWGSC RefSeq v1.0 (left), and differentially expressed genes were identified between solid- and hollow-stemmed lines of hexaploid (bread) and tetraploid (durum) wheat (right). (B) Cross-sectioned stems of Lillian (solid) and Vesper (hollow) are shown as a phenotypic reference (top). Increased copy number of TraesCS3B01G608800 [annotated as a DOF (DNA-binding one-zinc finger) transcription factor] is associated with stem phenotypic variation (bottom). (C) A high-throughput SNP marker tightly linked to TraesCS3B01G608800 reliably discriminates solid- from hollow-stemmed wheat lines. Relative intensities of the fluorophores (FAM and HEX) used in KASPar analysis are shown. Vertical axis shows FAM signal; horizontal axis shows HEX signal. (D) Schematic of the three TaAGL33 proteins, showing the typical MADS, I, K, and C domains. Triangles indicate the position of the five introns that occur in all three homeologs. Bars indicate the position of single-guide RNAs designed for exons 2 and 3. Three T-DNA vectors, each containing the bar selectable marker gene, CRISPR nuclease, and one of three single-guide RNA sequences, were used for Agrobacterium-mediated wheat transformation, essentially as described earlier (55). Transgenic plants were obtained with edits at the targeted positions in all TaAGL33 homeologs. The putative resulting protein sequence is displayed starting close to the edits, with wild-type amino acids (aa) in black font and amino acids resulting from the induced frame shifts in red font. * indicates premature termination codons. (E) Mean days to flowering (after 8 weeks of vernalization) for progeny of four homozygous edited plants (light gray bars) and the respective homozygous wild-type segregants (dark gray bars). Numbers in parentheses refer to the number of edited and wild-type plants examined, respectively. Error bars display SEM. 
Growth conditions were as described in (50).
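Panel (D) shows how an induced frame shift changes the downstream amino acids and introduces a premature termination codon. The sketch below illustrates that effect with the standard genetic code. The 30-nt coding sequence and the position of the single-base deletion are invented for illustration and are not taken from TaAGL33 or from the edits reported here.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon (*)."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


def first_divergence(wild_type: str, edited: str) -> int:
    """Index of the first residue that differs between two protein strings."""
    for i, (a, b) in enumerate(zip(wild_type, edited)):
        if a != b:
            return i
    return min(len(wild_type), len(edited))


# Hypothetical toy sequence, NOT a TaAGL33 sequence.
WILD_TYPE_CDS = "ATGGCTGAATTGACCAAGCTTTGGGATTAA"
# Hypothetical 1-bp deletion at index 6, mimicking a small CRISPR-induced indel.
EDITED_CDS = WILD_TYPE_CDS[:6] + WILD_TYPE_CDS[7:]

if __name__ == "__main__":
    wt = translate(WILD_TYPE_CDS)
    mut = translate(EDITED_CDS)
    print(wt)                          # wild-type protein
    print(mut)                         # frame-shifted, truncated protein
    print(first_divergence(wt, mut))   # first altered residue
```

In the toy case the deletion alters the third residue and the very next codon becomes a stop, which corresponds to the red residues followed by * in the panel.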
- Table 1 Assembly statistics of IWGSC RefSeq v1.0.

| Assembly characteristics | Values |
| --- | --- |
| Assembly size | 14.5 Gb |
| Number of scaffolds | 138,665 |
| Size of assembly in scaffolds ≥ 100 kb | 14.2 Gb |
| Number of scaffolds ≥ 100 kb | 4,443 |
| N50 contig length | 51.8 kb |
| Contig L50 number | 81,427 |
| N90 contig length | 11.7 kb |
| Contig L90 number | 294,934 |
| Largest contig | 580.5 kb |
| Ns in contigs | 0 |
| N50 scaffold length | 7.0 Mb |
| Scaffold L50 number | 571 |
| N90 scaffold length | 1.2 Mb |
| Scaffold L90 number | 2,390 |
| Largest scaffold | 45.8 Mb |
| Ns in scaffolds | 261.9 Mb |
| Gaps filled with BAC sequences | 183 (1.7 Mb) |
| Average size of inserted BAC sequence | 9.5 kb |
| N50 superscaffold length | 22.8 Mb |
| Superscaffold L50 number | 166 |
| N90 superscaffold length | 4.1 Mb |
| Superscaffold L90 number | 718 |
| Largest superscaffold | 165.9 Mb |
| Sequence assigned to chromosomes | 14.1 Gb (96.8%) |
| Sequence ≥ 100 kb assigned to chromosomes | 14.1 Gb (99.1%) |
| Number of superscaffolds on chromosomes | 1,601 |
| Number of oriented superscaffolds | 1,243 |
| Length of oriented sequence | 13.8 Gb (95%) |
| Length of oriented sequence ≥ 100 kb | 13.8 Gb (97.3%) |
| Smallest number of superscaffolds per subgenome chromosome | 35 (7A), 68 (2B), 36 (1D) |
| Largest number of superscaffolds per subgenome chromosome | 111 (4A), 176 (3B), 90 (3D) |
| Average number of superscaffolds per chromosome | 76 |

- Table 2 Relative proportions of the major elements of the wheat genome.
Proportions of TEs are given as the percentage of sequences assigned to each superfamily relative to genome size. Abbreviations in parentheses under the headings “Class 1” and “Class 2” indicate transposon types.

| Major elements | Subgenome AA | Subgenome BB | Subgenome DD | Total |
| --- | --- | --- | --- | --- |
| Assembled sequence assigned to chromosomes (Gb) | 4.935 | 5.180 | 3.951 | 14.066 |
| Size of TE-related sequences (Gb) | 4.240 | 4.388 | 3.285 | 11.913 |
| TEs (%) | 85.9 | 84.7 | 83.1 | 84.7 |
| Class 1 | | | | |
| LTR-retrotransposons | | | | |
| Gypsy (RLG) | 50.8 | 46.8 | 41.4 | 46.7 |
| Copia (RLC) | 17.4 | 16.2 | 16.3 | 16.7 |
| Unclassified LTR-retrotransposons (RLX) | 2.6 | 3.5 | 3.7 | 3.2 |
| Non-LTR-retrotransposons | | | | |
| Long interspersed nuclear elements (RIX) | 0.81 | 0.96 | 0.93 | 0.90 |
| Short interspersed nuclear elements (SIX) | 0.01 | 0.01 | 0.01 | 0.01 |
| Class 2 | | | | |
| DNA transposons | | | | |
| CACTA (DTC) | 12.8 | 15.5 | 19.0 | 15.5 |
| Mutator (DTM) | 0.30 | 0.38 | 0.48 | 0.38 |
| Unclassified with terminal inverted repeats | 0.21 | 0.20 | 0.22 | 0.21 |
| Harbinger (DTH) | 0.15 | 0.16 | 0.18 | 0.16 |
| Mariner (DTT) | 0.14 | 0.16 | 0.17 | 0.16 |
| Unclassified class 2 | 0.05 | 0.08 | 0.05 | 0.06 |
| hAT (DTA) | 0.01 | 0.01 | 0.01 | 0.01 |
| Helitrons (DHH) | 0.0046 | 0.0044 | 0.0036 | 0.0042 |
| Unclassified repeats | 0.55 | 0.85 | 0.63 | 0.68 |
| Coding DNA | 0.89 | 0.89 | 1.11 | 0.95 |
| Unannotated DNA | 13.2 | 14.4 | 15.7 | 14.4 |
| (Pre)-microRNAs | 0.039 | 0.057 | 0.046 | 0.047 |
| tRNAs | 0.0056 | 0.0050 | 0.0068 | 0.0057 |

- Table 3 Groups of homeologous genes in wheat.
Homeologous genes are “subgenome orthologs” and were inferred by species tree reconciliation in the respective gene family. Numbers include both HC and LC genes filtered for TEs (filtered gene set). Conserved subgenome-specific (orphan) genes are found only in one subgenome but have homologs in other plant genomes used in this study. This includes orphan outparalogs resulting from ancestral duplication events and conserved only in one of the subgenomes. Nonconserved orphans are either singletons or duplicated in the respective subgenome, but neither have obvious homologs in the other subgenomes or the other plant genomes studied. Microsynteny is defined as the conservation and collinearity of local gene ordering between orthologous chromosomal regions. Macrosynteny is defined as the conservation of chromosomal location and identity of genetic markers like homeologs but may include the occurrence of local inversions, insertions, or deletions. Additional data are presented in table S24.

| Homeologous group (A:B:D) | Number in wheat genome | Composition of groups (%) | Number of genes in A | Number of genes in B | Number of genes in D | Total number of genes |
| --- | --- | --- | --- | --- | --- | --- |
| 1:1:1 | 21,603 | 55.1 | 21,603 | 21,603 | 21,603 | 64,809 |
| 1:1:N | 644 | 1.6 | 644 | 644 | 1,482 | 2,770 |
| 1:N:1 | 998 | 2.5 | 998 | 2,396 | 998 | 4,392 |
| N:1:1 | 761 | 1.9 | 1,752 | 761 | 761 | 3,274 |
| 1:1:0 | 3,708 | 9.5 | 3,708 | 3,708 | 0 | 7,416 |
| 1:0:1 | 4,057 | 10.3 | 4,057 | 0 | 4,057 | 8,114 |
| 0:1:1 | 4,197 | 10.7 | 0 | 4,197 | 4,197 | 8,394 |
| Other ratios | 3,270 | 8.3 | 4,999 | 5,371 | 4,114 | 14,484 |
| 1:1:1 in microsynteny | 18,595 | 47.4 | 18,595 | 18,595 | 18,595 | 55,785 |
| Total in microsynteny | 30,339 | 77.3 | 27,240 | 27,063 | 28,005 | 82,308 |
| 1:1:1 in macrosynteny | 19,701 | 50.2 | 19,701 | 19,701 | 19,701 | 59,103 |
| Total in macrosynteny | 32,591 | 83.1 | 29,064 | 30,615 | 30,553 | 90,232 |
| Total in homeologous groups | 39,238 | 100.0 | 37,761 | 38,680 | 37,212 | 113,653 |
| Conserved subgenome orphans | | | 12,412 | 12,987 | 10,844 | 36,243 |
| Nonconserved subgenome singletons | | | 10,084 | 12,185 | 8,679 | 30,948 |
| Nonconserved subgenome duplicated orphans | | | 71 | 83 | 38 | 192 |
| Total (filtered) | | | 60,328 | 63,935 | 56,773 | 181,036 |

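The A:B:D labels in Table 3 summarize how many gene copies a homeologous group has in each subgenome, with N standing for more than one copy. The sketch below shows one way to derive such a label from per-subgenome copy counts and to recompute the "Composition of groups (%)" column. It assumes the groups have already been inferred (the table note says this was done by species tree reconciliation, which is not reproduced here); the function names and the "subgenome-specific" label for groups present in a single subgenome are illustrative assumptions.

```python
# Ratio labels that Table 3 lists explicitly; everything else is "Other ratios".
NAMED_GROUPS = {"1:1:1", "1:1:N", "1:N:1", "N:1:1", "1:1:0", "1:0:1", "0:1:1"}


def copy_symbol(count: int) -> str:
    """Map a copy count to the symbol used in the table: 0, 1, or N (>1)."""
    if count < 0:
        raise ValueError("copy counts cannot be negative")
    if count == 0:
        return "0"
    if count == 1:
        return "1"
    return "N"


def classify_group(a: int, b: int, d: int) -> str:
    """Label a group by its copy numbers in the A, B, and D subgenomes."""
    counts = (a, b, d)
    # Assumption: genes present in only one subgenome are treated separately,
    # mirroring the orphan and singleton rows of the table.
    if sum(1 for c in counts if c > 0) < 2:
        return "subgenome-specific"
    label = ":".join(copy_symbol(c) for c in counts)
    return label if label in NAMED_GROUPS else "Other ratios"


def composition_percent(groups_in_category: int, total_groups: int) -> float:
    """Share of all homeologous groups that fall in one category, in percent."""
    return round(100.0 * groups_in_category / total_groups, 1)


if __name__ == "__main__":
    print(classify_group(1, 1, 1))   # triad
    print(classify_group(1, 1, 3))   # extra copies in D
    print(classify_group(2, 2, 1))   # not one of the named ratios
    # Values from Table 3: 21,603 triads among 39,238 homeologous groups.
    print(composition_percent(21603, 39238))
```

With the table's own numbers, 21,603 of 39,238 groups gives the 55.1% reported for the 1:1:1 category.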
Supplementary Materials
www.sciencemag.org/content/361/6403/eaar7191/suppl/DC1
Materials and Methods
Figs. S1 to S59
Tables S1 to S43
References (56–186)
Databases S1 to S5
Additional Files
- Shifting the limits in wheat research and breeding using a fully annotated reference genome
International Wheat Genome Sequencing Consortium (IWGSC)
Materials/Methods, Supplementary Text, Tables, Figures, and/or References
- Materials and Methods
- Figs. S1 to S59
- Tables S1 to S43
- Captions for Databases S1 to S5
- References
- Metadata of 850 RNAseq samples used in the study
- SlimGO ubiquitous and Tissue-exclusive genes.
- GO terms of the WGCNA850 analysis.
- WGCNA Module Assignment.
- Module 8 and 11 TF Arabidopsis and rice orthologs.