Wild emmer genome architecture and diversity elucidate wheat evolution and domestication

See allHide authors and affiliations

Science  07 Jul 2017:
Vol. 357, Issue 6346, pp. 93-97
DOI: 10.1126/science.aan0032

Genomics and domestication of wheat

Modern wheat, which underlies the diet of many across the globe, has a long history of selection and crosses among different species. Avni et al. used the Hi-C method of genome confirmation capture to assemble and annotate the wild allotetraploid wheat (Triticum turgidum). They then identified the putative causal mutations in genes controlling shattering (a key domestication trait among cereal crops). They also performed an exome capture–based analysis of domestication among wild and domesticated genotypes of emmer wheat. The findings present a compelling overview of the emmer wheat genome and its usefulness in an agricultural context for understanding traits in modern bread wheat.

Science, this issue p. 93


Wheat (Triticum spp.) is one of the founder crops that likely drove the Neolithic transition to sedentary agrarian societies in the Fertile Crescent more than 10,000 years ago. Identifying genetic modifications underlying wheat’s domestication requires knowledge about the genome of its allo-tetraploid progenitor, wild emmer (T. turgidum ssp. dicoccoides). We report a 10.1-gigabase assembly of the 14 chromosomes of wild tetraploid wheat, as well as analyses of gene content, genome architecture, and genetic diversity. With this fully assembled polyploid wheat genome, we identified the causal mutations in Brittle Rachis 1 (TtBtr1) genes controlling shattering, a key domestication trait. A study of genomic diversity among wild and domesticated accessions revealed genomic regions bearing the signature of selection under domestication. This reference assembly will serve as a resource for accelerating the genome-assisted improvement of modern wheat varieties.

Wheat (Triticum spp.) is one of the Neolithic founder crop plants, domesticated ~10,000 years ago in the Near-Eastern Fertile Crescent (1). The initial domestication of the allo-tetraploid wild emmer wheat [WEW; T. turgidum ssp. dicoccoides (Körn.) Thell.; genome BBAA] and the subsequent evolution of hulled domesticated emmer wheat (DEW; T. turgidum ssp. dicoccum Schrank) led to the selection of free-threshing durum wheat [T. turgidum ssp. durum (Desf.) MacKey] (2). Subsequently, hexaploid bread wheat (T. aestivum L., genome BBAADD) arose from the hybridization of domesticated emmer with the diploid Aegilops tauschii Coss. (genome DD) (3), indicating that WEW was the direct progenitor of all economically important domesticated wheats.

Crop domestication is generally marked by rapid modification of key traits (i.e., the “domestication syndrome”) followed by subsequent evolution (4) introducing further, incremental changes in morpho-physiological traits (57). The domestication of WEW involved traits related primarily to seed dormancy, spike morphology, and grain development. For example, whereas the spikes of WEW shatter at maturity [trait: brittle rachis (BR)], all domesticated wheat spikes remain intact (i.e., “non-BR spikes”), enabling easier harvest. Characterizing the genetic mechanisms underlying wheat domestication and improvement may provide insights into basic wheat biology and human cultural evolution (8) and offer opportunities for further improvement [e.g., (9)]. To date, few genes involved in wild emmer domestication have been discovered, mainly due to lack of a comprehensive reference genome. The wheat reference genome has been difficult to obtain owing to its polyploid nature, large size (estimated ~12 Gb for tetraploid wheat and ~17 Gb for hexaploid wheat), and high repeat content (~80%) (10). Although the gene space of wheat has been described, most intergenic regions remain fragmented and unassigned to chromosomal locations [e.g., (10)].

Wild emmer accession “Zavitan” was chosen for this genome assembly to leverage the genetic data already collected for this line (7, 11). The WEW reference genome, constructed by whole-genome shotgun (WGS) sequencing of various insert-size libraries (12), produced contigs with an N50 of 57,378 base pairs (bp) and scaffolds with an N50 of 6,955,166 bp (table S1) (13). The scaffolds were validated with genetic data and combined with three-dimensional (3D) chromosome conformation capture sequencing (HiC) data (14), enabling construction of chromosome-scale assemblies (pseudomolecules) (Fig. 1, A and B). The resulting 10.5-Gb genome assembly is composed of 14 pseudomolecule sequences representing the 14 chromosomes of WEW (10.1 Gb) and one group of unassigned scaffolds (0.4 Gb). The gaps between scaffolds, estimated to represent ~1.5 Gb of the genome (13), are likely the result of technically difficult-to-sequence or difficult-to-assemble regions (Fig. 1, C and D). A strength of this assembly strategy is its ability, despite these gaps, to construct pseudomolecules with correct scaffold orientations, thereby enabling whole-genome analyses in polyploid wheat with high physical resolution.

Fig. 1 Schematic of the assembly process of the 10.1-gigabase wild tetraploid wheat (WEW) genome, consisting of 14 chromosomes constructed from 2781 scaffolds assembled from 11,536,004,714 short sequencing reads (13).

(A) Example of three scaffolds anchored by molecular markers to the genetic map of chromosome 5B. The average length of these scaffolds is 7.2 Mb, reflecting the total assembly N50 of 6,955,166 bp. (B) The linear order of scaffolds in the WEW pseudomolecules was determined by interscaffold HiC links (red arches), while the intrascaffold links (black arches) validated scaffold accuracy. (C) Comparative analysis between the WEW assembly and two durum BAC sequences (JX454959 and KY996403) revealed >99% similarity, except in the six numbered loci: (1) A 325-bp insertion in durum. (2) A gap (208 Ns) in WEW associated with 49 bp, including 13 consecutive guanines in durum. (3) A 426-bp insertion in durum. (4) A 16,280-bp insertion in WEW. (5) A gap (184 Ns) in WEW associated with 32 bp, including 23 consecutive cytosines in durum. (6) A gap (100 Ns representing junction between scaffolds) in WEW associated with 12,010 bp in durum. (D) Alignment of WEW paired-end (yellow) and mate-pair (red) sequencing reads on durum bacterial artificial chromosomes (BACs) JX454959 and KY996403 shows a lack of representation in durum-specific sequence demonstrated with the 325-bp insertion (1), whereas overrepresentation is associated with highly repetitive sequences as in the scaffold junction (6), which was unresolved in the WEW assembly.

In addition to gene models from other grasses, RNA sequencing reads generated from 20 different combinations of WEW tissues and developmental stages were used to annotate protein-coding genes in the WEW assembly (13). We identified 65,012 high-confidence (HC) gene models (fig. S1 and tables S2 and S3), and validation with the BUSCO (15) gene set (table S4) indicated that the assembly captures 98.4% of the WEW gene complement.

The density of HC genes was up to 14-fold higher in the distal compared to the pericentromeric regions of chromosome arms, confirming the reported gene density gradient along the centromere-telomere axis (10) (Fig. 2, A and B). Of the 62,813 genes assigned to chromosomes, 30.4% were expressed under all 20 conditions (i.e., tissue types + time points), 48% were expressed in at least one condition but not under all conditions, and 21.6% genes were expressed at a low level or not at all. Both the mean expression level and expression breadth (i.e., the number of conditions under which gene expression was detected) per gene were higher for genes in the proximal regions of chromosome arms (Fig. 2, C and D, and fig. S2). The mean lower expression levels in the distal regions may be due to the higher proportion of condition-specific genes expressed at a low level in these chromosomal regions, whereas in the proximal regions, a greater proportion of highly expressed housekeeping genes can be found. The total numbers of genes expressed per subgenome were roughly equivalent (fig. S3A), and only a slight but consistent higher expression ratio of genes from the A subgenome was observed (a 5% increase, on average; fig. S3B). This supports previous findings of a lack of genome-wide transcriptional dominance of individual subgenomes in bread wheat (10).

Fig. 2 Structural, functional, and conserved synteny landscape of the 14 chromosomes of wild emmer wheat (WEW).

Tracks from outside to inside, with respective scales, are as follows: (A) Chromosome name and size (each tick mark is 100 Mb). (B) Density of high-confidence genes (HC; 0 to 14 genes per Mb). (C) Expression of HC genes [log(FPKM + 1)]; mRNA expression ranges from 0 to 2.5; mean value of all 20 conditions (i.e., tissue types + time points). (D) Expression density of HC genes (number of samples in which genes were expressed; from 0 to 20). (E) K-mer frequencies. (F) Percent identity between homeologous genes (90 to 100%). Central links connect homeologous genes between subgenomes. Color of links is blue between homeologous chromosomes and green in cases of large translocations. (G) Density of Ty1/copia-like insertions older than 1.2 Ma (insertions per Mb).

Homeology analysis between the two WEW subgenomes revealed distinct homeologous pairs for most genes (45,386, 72.3%), another 9123 (14.5%) consisting of potentially duplicated genes within one of the subgenomes, and an unclassified remainder (e.g., singleton genes) (Fig. 2E, table S5, and database S1). Expression of homeologous gene pairs revealed a strong positive correlation across all samples (fig. S4). Both homeologs were expressed at least once in 18,306 pairs (36,612 genes), but expression of 2188 pairs was not detected in any sample. A portion of the homeologous gene pairs was exclusively expressed in only one of the two subgenomes, with expression of only the A subgenome gene for 1084 pairs and of only the B subgenome gene for 1115 pairs. These genes were functionally enriched for protein phosphorylation and general metabolic processes (table S6), suggesting that subgenome-regulated gene expression may contribute to adaptation of wheat varieties.

Most (82.2%) of the WEW assembly was annotated as transposable elements (TEs), and overall TE content appears to be similar for the two subgenomes (Fig. 2F, fig. S5, and table S7). The majority (7.1 Gb) of TEs derive from long-terminal repeat–retrotransposons (LTR-RTs) that can be divided into three groups: Ty1/copia-like, Ty3/gypsy-like, and unclassified (table S7). Mutations within full-length (FL) LTR-RTs can serve as a clock to track their activity in the genome over evolutionary time (16). In WEW, the age distribution of all FL-LTR-RT insertions suggests the occurrence of a general amplification wave around 1.5 million years ago (Ma), with the overall age distribution patterns of Ty3/gypsy-like and unclassified LTR-RT insertions appearing similar in both the A and B subgenomes (fig. S6 and database S2). The amplification wave of Ty1/copia-like elements, however, occurred more recently, around 0.5 Ma, which corresponds to the estimated time of hybridization of the A and B subgenomes (10). The B subgenome also exhibits an older Ty1/copia-like maximum dating roughly 1.2 Ma, indicating a subgenome-specific amplification wave prior to hybridization (Fig. 2G).

To illustrate the potential of this assembly in the genetic dissection of agronomically important traits, we targeted the domestication trait of nonshattering spike using a population derived from a cross between Zavitan and domesticated durum wheat (cultivar “Svevo”) (11). This analysis revealed genomic regions regulating the BR phenotype, including two major loci on WEW chromosomes 3A and 3B (15.5 and 32.5 Mb, respectively) containing homologies to the Btr1 and Btr2 genes controlling BR in barley (17). Among a complex of gene duplication clusters (three or four copies for each gene in the A and B subgenomes), we identified the orthologous wheat genes (chromosome-3A: TtBtr1-A and TtBtr2-A; chromosome-3B: TtBtr1-B and TtBtr2-B) (Fig. 3, fig. S7, and table S8) (13).

Fig. 3 Genome-enabled insight into spike brittleness in wheat.

(A) The left panel shows a schematic representation of the wild-type Zavitan allele of TtBtr1-A above the loss-of-function domesticated allele of Svevo, carrying a 2-bp deletion (290 bases from the start codon). This deletion introduces a premature stop codon, thereby truncating the functional 196–amino acid wild-type protein to only 97 amino acids. The right panel shows an analogous illustration of the wild-type Zavitan allele of TtBtr1-B above the loss-of-function domesticated allele of Svevo, carrying a 4-kb insertion (539 bases from start codon), which results in a longer C-terminus protein sequence compared to the wild-type 196–amino acid protein. (B) The BR phenotypes of Zavitan (full BR), NIL-3A (intermediate BR), NIL-3B (intermediate BR), and Svevo (non-BR). Below each spike image is a paired SEM image of an abscission scar of an upper spikelet, confirming that the NILs have smooth scars similar to those of Zavitan. The white arrow points to the smooth abscission scar responsible for the BR phenotype in Zavitan, while in Svevo the spikelets must be torn from the nonshattering rachis, producing rough edges.

The domesticated (i.e., Svevo) TtBtr1-A and TtBtr1-B alleles carry mutations predicted to disrupt the structures of the encoded proteins and are likely loss-of-function alleles (Fig. 3A). Notably, no polymorphisms were detected between the coding regions of the Zavitan and Svevo TtBtr2-A or TtBtr2-B alleles, leading us to postulate that the combination of the mutations in the two TtBtr1 genes could be complementary to achieve the non-BR domesticated phenotype. We thus developed a pair of near-isogenic lines (NILs), each carrying one functional allele (TtBtr1-A or TtBtr1-B) in the background of Svevo. Both NILs exhibited an intermediate BR phenotype, in which the upper part of the spike was brittle and the lower part was nonbrittle. Scanning electron microscopy (SEM) confirmed a smooth abscission site typical of WEW spikelets in the scar tissues of spikelets from the upper rachises of both NILs (Fig. 3B), whereas the lower nonbrittle spikelets had rough abscission sites similar to those of Svevo.

Thus, these two homozygous recessive mutations in orthologs of the Btr1 gene (but not in TtBtr2) appear to be minimally required for transforming the BR morphology of WEW. Diversity analysis of 113 WEW, 85 DEW, and 9 durum accessions showed that all domesticated (DEW and durum) accessions carry the loss-of-function alleles for both TtBtr1 genes (Ttbtr1-A and Ttbtr1-B) (table S9). Relative to the fixation of the non-BR phenotype in barley, requiring only one mutation in either the Btr1 or Btr2 genes (17), we speculate that selection for the non-BR phenotype in wheat may have been more gradual, as it requires at least two homozygous recessive mutations. This is supported by the archaeological records, which suggest that spike indehiscence took several millennia to become established (18).

To detect additional regions of the WEW genome under domestication selection, we examined DNA variation by exome capture sequencing (19) in a set of wild and domesticated emmer wheat accessions (fig. S8 and table S10). The phylogenetic tree of these genotypes (Fig. 4A) shows a clear separation between the domesticated and wild wheat accessions. WEW accessions from Southern (Israel, Syria, and Lebanon) and Northern (Turkey) Levant clustered separately, whereas the DEW accessions were clustered to regions of the Indian Ocean, Mediterranean, Eastern Europe, and Caucasus. Turkish accessions were present in each of main DEW groups, which is consistent with the diffusion of domesticated accessions from a single site of origin (20). Notably, the closest domesticated accession to WEW was from Turkey, the putative location of wheat domestication (20).

Fig. 4 Genome-wide diversity analyses.

(A) Maximum likelihood tree of 34 wild (red) and 31 domesticated emmer (blue) accessions (table S10), from pairwise estimates of genetic distance made by using an exome capture SNP data set. Bootstrap values are indicated on the nodes. (B) Relationship between genome-wide, window-based estimates of the ratio of genetic diversity in domesticated (πD) and wild (πW) emmer populations, and the genetic differentiation FST between them. The critical values of πD / πW and FST for detecting outliers are indicated by vertical and horizontal dashed lines, respectively. The specific values of πD / πW and FST are highlighted for the regions harboring TtBtr1-A (red) and TtBtr1-B (blue) genes. (C) The enrichment for nonsynonymous SNPs in the regions detected by three different diversity scans. The ratio of nonsynonymous to synonymous SNPs (Nonsyn/Syn) was estimated in the outliers of the πD / πW, FST, and Tajima’s D diversity scans. The statistical significance was assessed by performing 1000 permutations and estimating the Nonsyn/Syn ratio (gray dots) in the randomly selected genomic windows. The critical values corresponded to the 95th percentile of permuted data (black line). The actual Nonsyn/Syn ratio estimated in the regions identified in the diversity scans are shown by red lines.

Overall, only minor loss of genetic diversity is seen among DEW genotypes (mean nucleotide diversity in domesticated emmer, πD = 1.1 × 10−3) as compared to their wild ancestors (mean nucleotide diversity in wild emmer, πW = 1.3 × 10−3), consistent with observations in maize (21) and barley (22) (database S3). Similar to maize (23), the transition from wild to the domesticated form of tetraploid wheat was also accompanied by a shift in allele frequencies toward more common alleles, likely due to recent population bottlenecks (fig. S9). We investigated the patterns of genetic diversity (πDW diversity ratio), site frequency spectra (Tajima’s D), and genetic differentiation (FST) across the genome (fig. S10). Consistent with the expected effects of selection on genetic diversity patterns, the regions harboring TtBtr1-A and TtBtr1-B genes were outliers in at least one of the three diversity scans (database S4). The TtBtr1-A region also showed a low Tajima’s D value (–0.82; <5th percentile) within the population of DEW (Fig. 4B), a result often associated with a domestication bottleneck (24).

Depending on the diversity scan, between 32 and 154 genomic regions, spanning 0.6% (68 Mb) to 3.1% (373 Mb) of the WEW genome, emerge as regions potentially affected by selection. Regions of the wheat genome detected in all three selection scans were significantly enriched (>95th percentile) for nonsynonymous single-nucleotide polymorphisms (SNPs) (Fig. 4C), indicating that domestication preferentially enriched variants with possible functional effects in coding regions. A Gene Ontology (GO) enrichment analysis of the genes in the regions identified in the diversity scans (table S11) included genes involved in response to auxin stimulus (GO: 0009733), consistent with the evidence of strong selective sweeps on auxin-responsive genes in both maize and rice (25). Thus, as with the non-BR phenotype in barley and wheat, convergent evolution has likely played a role in crop domestication.

In the face of global challenges, agricultural research and plant breeding will be essential to increase crop yields. The availability of enhanced genomic resources like the WEW assembly, capturing both genic and intergenic regions, will underpin and accelerate global efforts in gene discovery, functional characterization, and breeding.

Supplementary Materials

Materials and Methods

Figs. S1 to S14

Tables S1 to S17

References (2658)

Databases S1 to S7

References and Notes

  1. See supplementary materials.
  2. Acknowledgments: Supported by Israel Science Foundation (grants 999/12, 1824/12, 322/15), the German-Israeli Foundation for Scientific Research and Development (GIF) grant I-1212-315.13, Integrated DNA Technologies, Inc. (IDT), the United States–Israel Binational Science Foundation (BSF grants 2013396 and 2015409), USDA NIFA (National Institute of Food and Agriculture) (grant 2016-67013-24473), German Ministry of Education and Research (grants 0314000, 0315954, 031A536), Genome Canada and Genome Prairie, Saskashewan Ministry of Agriculture and Western Grains Research Foundation (CTAG2 project), U.S. Agency for International Development Middle East Research and Cooperation (grant M34-037), Italian Ministry of Education and Research Flagship InterOmics (PB05), and CREA-interomics projects. Additional support was funded from the Manna Center Program for Food Safety and Security in Tel Aviv University, Montana Plant Sciences endowment, the Lieberman-Okinow endowment at the University of Minnesota, and the Charles and Tillie Lubin Center for Plant Biotechnology at the Weizmann Institute. We thank O. Savin, I. Hamerman, A. Fiebig, and C. Wright. The WEW sequence (GenBank: LSYQ00000000) and all other WEW genomic resources can be accessed at

Stay Connected to Science

Navigate This Article