The sheep genome illuminates biology of the rumen and lipid metabolism


Science  06 Jun 2014:
Vol. 344, Issue 6188, pp. 1168-1173
DOI: 10.1126/science.1252806

A genome for ewe and ewe

Sheep-specific genetic changes underlie differences in lipid metabolism between sheep and other mammals, and may have contributed to the production of wool. Jiang et al. sequenced the genome of two Texel sheep, a breed that produces high-value meat, milk, and wool. The genome information will provide an important resource for livestock production and aid in the understanding of mammalian evolution.

Science, this issue p. 1168


Sheep (Ovis aries) are a major source of meat, milk, and fiber in the form of wool and represent a distinct class of animals that have a specialized digestive organ, the rumen, that carries out the initial digestion of plant material. We have developed and analyzed a high-quality reference sheep genome and transcriptomes from 40 different tissues. We identified highly expressed genes encoding keratin cross-linking proteins associated with rumen evolution. We also identified genes involved in lipid metabolism that had been amplified and/or had altered tissue expression patterns. This may be in response to changes in the barrier lipids of the skin, an interaction between lipid metabolism and wool synthesis, and an increased role of volatile fatty acids in ruminants compared with nonruminant animals.

Sheep and goats are thought to be the first domesticated livestock species and thus integral to animal husbandry. Sheep are ruminants, digesting plant material in a four-chambered stomach (1). The largest compartment is the rumen, which uses microbial flora to ferment the feed, facilitating the conversion of lignocellulose-rich plant materials, of low value in the human diet, to animal protein (2). The rumen is thought to have evolved about 35 to 40 million years ago (3), coinciding with the emergence of grasslands, in a cooler climate and atmosphere containing lower CO2 than today (4, 5). Ruminants are now the dominant terrestrial herbivores. The rumen microbial flora also generate volatile fatty acids (VFAs) (6), requiring specialized energy and lipid metabolism in ruminants, and produce the greenhouse gas methane, which may be relevant to climate change (7). Another feature of sheep is wool, which has a substantial proportion of its weight made up of lanolin, formed primarily from wax esters (8, 9). Thus, synthesis of wool may be linked to fatty acid metabolism. Given these unusual evolutionary traits, sheep provide an area for exploration of the genetic underpinnings of digestion and fatty acid metabolism.

We assembled the reference genome sequence of the sheep (Fig. 1) from two individuals of the Texel breed, totaling ~150-fold sequence coverage (table S1), using linkage and radiation hybrid maps (tables S2 and S3) to order and orient the super-scaffolds (10). The final sheep genome assembly, Oar v3.1, has a contig N50 length of ~40 kb and a total assembled length of 2.61 Gb, with ~99% anchored onto the 26 autosomes and the X chromosome (table S4) (10). About 0.2% of each Texel’s genome consists of heterozygous loci, i.e., SNPs (single-nucleotide polymorphisms) (Fig. 1, C and D), a quarter of which are heterozygous in both animals. Due to selection for a beneficial muscle hypertrophy phenotype in the Texel breed (11) (Online Mendelian Inheritance in Animals 001426-9940), both sheep share a long run of homozygosity spanning the MSTN gene (Fig. 1, C and D). The SNPs also enabled us to identify allele-specific gene expression (Fig. 1E) (see the supplementary text) (10).
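The contig N50 quoted above is a standard assembly summary statistic: the length L such that contigs of length L or longer account for at least half of the total assembled bases. The following Python sketch shows how the statistic is conventionally computed. It is an illustration only, not the pipeline used for Oar v3.1; the function name `n50` and the toy lengths are assumptions made for this example.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    N50 is the length of the contig at which the running total of
    lengths, taken from longest to shortest, first reaches half of
    the summed length of all contigs.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in kb (not data from the assembly).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

A larger N50 means that more of the assembly sits in long contiguous pieces, which is why the statistic is reported alongside total assembled length.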

Fig. 1 The genome of sheep.

(A) A total of 1097 segmental duplications (length >5 kb) in the two Texel sheep (10). (B) Sheep versus cattle chromosome break points (10); gaps in Oar v3.1 are red; probable misassemblies in UMD 3.1 are yellow; and probable true structural differences are blue. (C) Distribution of SNPs in the Texel ewe in 1-Mb nonoverlapping windows; range of values 1 to 4954. (D) Distribution of SNPs in the Texel ram in 1-Mb nonoverlapping windows; range of values 3 to 5676. (E) Distribution of mono-allelically expressed SNPs in 500-kb sliding windows, range of values 0 to 42 (10). The scale is in Mb. Positions of loci discussed in the text are indicated.
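Panels C and D summarize SNP density as counts in 1-Mb nonoverlapping windows. A minimal Python sketch of that kind of binning is given here, assuming 1-based SNP coordinates on a single chromosome. The function name `snps_per_window` and the toy positions are hypothetical, and this is not the authors' code.

```python
from collections import Counter


def snps_per_window(positions, window_size=1_000_000):
    """Count SNPs in nonoverlapping windows along one chromosome.

    positions   -- iterable of 1-based SNP coordinates
    window_size -- window width in bases (1 Mb by default)

    Returns a dict mapping window index (0 = first window) to the
    number of SNPs in that window. Windows with no SNPs are omitted.
    """
    counts = Counter((pos - 1) // window_size for pos in positions)
    return dict(sorted(counts.items()))


# Hypothetical positions: three SNPs fall in the first megabase.
toy_positions = [1, 500_000, 1_000_000, 1_000_001, 2_500_000]
print(snps_per_window(toy_positions))
```

Because the windows do not overlap, each SNP is counted exactly once, unlike the sliding windows used for panel E.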

We identified segmental duplications in sheep (Fig. 1A and table S5) (10) and compared the genome assemblies of sheep, goat, and cattle (fig. S1), identifying 141 breakpoints between sheep and cattle (Fig. 1B and table S6) (see the supplementary text). We compared the sequences of sheep, goat, cattle, yak, pig, camel, horse, dog, mouse, opossum, and human proteins and identified 4850 single-copy orthologous genes from which we constructed a phylogenetic tree (Fig. 2 and fig. S2) (10). The separation of sheep from goats and the diversification of the bovids occurred contemporaneously with the expansion of the C4 grasses (which first fix CO2 into a four-carbon rather than a three-carbon compound) in the late Neogene (4). RNA-Seq transcriptome data were generated from 94 tissue samples (from 40 tissues), including 83 from four additional Texel individuals (table S7) (10). A protein clustering analysis among the 11 mammalian species identified 321 expanded subfamilies in the ruminant branch, of which 73 were ruminant specific (tables S8 and S9) (10). We identified sheep genes exhibiting changes in copy number (e.g., lysozyme C–related proteins, prolactin-related proteins, pregnancy-associated glycoproteins, RNASE1, ASIP, MOGAT2, and MOGAT3) and changes in tissue specificity of gene expression (e.g., MOGAT2, MOGAT3, and FABP9) (Figs. 1 and 2) (10).
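The selection of single-copy orthologous genes for tree building can be pictured as a filter over ortholog groups: a group is kept only if every species contributes exactly one gene. The Python sketch below illustrates that filter under the assumption that ortholog groups have already been computed by some clustering step. The data layout, the function name `single_copy_orthologs`, and the toy groups are hypothetical and do not reproduce the method described in the supplementary materials (10).

```python
def single_copy_orthologs(groups, species):
    """Return IDs of ortholog groups with exactly one gene per species.

    groups  -- dict: group ID -> dict of species name -> list of gene IDs
    species -- the species that must each contribute exactly one gene
    """
    wanted = set(species)
    return [
        group_id
        for group_id, members in groups.items()
        if all(len(members.get(sp, [])) == 1 for sp in wanted)
    ]


# Toy ortholog groups (hypothetical gene IDs).
toy_groups = {
    "OG1": {"sheep": ["s1"], "goat": ["g1"], "cattle": ["c1"]},
    "OG2": {"sheep": ["s2", "s3"], "goat": ["g2"], "cattle": ["c2"]},  # duplicated in sheep
    "OG3": {"sheep": ["s4"], "cattle": ["c3"]},                        # missing in goat
}
print(single_copy_orthologs(toy_groups, ["sheep", "goat", "cattle"]))
```

Groups with a duplication or a missing species are excluded, which avoids mixing paralogs into the alignment used for the phylogeny.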

Fig. 2 Sheep relative to human and livestock evolution.

A phylogenetic tree generated using single-copy orthologous genes (10). The origins (black) and amplifications (red) of genes discussed in the text are marked. The scale is in millions of years ago (Ma). Blue numbers on the nodes are the divergence time from present (Ma) and its confidence interval. Dates for major events in the evolution of grasses are from (4) and (5).

The mammalian epidermal development complex (EDC) region contains up to 70 genes encoding proteins involved in keratinized epidermal structure development, including the rumen, skin, and wool (12). The sheep EDC region (Figs. 1, 3) included several previously unidentified, or poorly annotated, genes in any mammalian genome (table S10). One such gene in the top 0.1% of all genes expressed in the rumen, but not expressed in the skin (Fig. 3A), is predicted to encode a large S100 fused-type protein (12). This protein has homology to trichohyalin (TCHH) (12), which is highly expressed in the skin, and we designated it Trichohyalin-like 2 (TCHHL2) (figs. S3 and S4). Expressed sequence tag (EST) data indicate that TCHHL2 is also expressed in the rumen of cattle (see the supplementary text). Although not previously annotated, syntenically conserved orthologous genes to TCHHL2 were detected in many mammalian genomes—including a marsupial, the Tasmanian devil, and a monotreme, the platypus (fig. S3)—but not in the chicken, suggesting that TCHHL2 may be specific to mammals. All TCHHL2 orthologs encode a protein containing up to 70 tandem copies of a highly variable 15 amino acid repeat, rich in arginine, glutamic acid, aspartic acid, and proline, which does not appear to be rumen specific (fig. S3). A short array of seven copies of the 15 amino acid repeat unit has been duplicated in the common ancestor of ruminants (Fig. 3B and fig. S4). S100 fused-type proteins are substrates for transglutaminase-mediated cross-linking to themselves and to other proteins, including keratins, during epidermal cornification and hair/wool development (12), suggesting that TCHHL2 may play a role in cross-linking the keratins at the rumen surface. TCHHL2 was expressed in other sheep epidermal tissues, but >1000-fold lower than in rumen; thus, it may have a similar, but less extensive, role in these tissues.

Fig. 3 EDC gene expression and sequence organization.

(A) Expression levels of the genes in the EDC region in selected sheep tissues. The abomasum is equivalent to the stomach of a nonruminant. Genes are ordered according to their genomic order, with pseudogenes omitted. Genes discussed in the text are highlighted. (B) Proposed evolution patterns of the core 15 amino acid repeat (boxed) of predicted TCHHL2 proteins. Percent identity between repeat units is within each species. (C) Maximum likelihood phylogenetic tree of PRD-SPRRII and SPRR2 proteins. For clarity, three representative PRD-SPRRII-like cattle sequences are shown. Bootstrap values ≥50% (500 replicates) are shown.

PRD-SPRRII (13) is also in the EDC region and the top 0.1% of genes expressed in the rumen, but not in any other tissue sampled, including skin (Fig. 3A). PRD-SPRRII is homologous to the SPRR2 gene family (12, 14) but encodes a protein with a distinctive proline- and histidine-rich sequence that disrupts the glutamine-rich amino-terminus present in SPRR2 proteins, potentially affecting its transglutamination sites (Fig. 3C) (15). We identified four additional genes related to PRD-SPRRII in the sheep EDC region, two of which were also highly expressed in the rumen but not in any other tissue sampled, including the skin (Fig. 3A), and eight related genes in the cattle EDC region also expressed in the rumen but none in nonruminants (Fig. 3C) (see the supplementary text). Thus, it appears that the ruminant-specific PRD-SPRRII family genes, resulting from the amplification and sequence divergence of an SPRR2 gene, have gained a new expression pattern, altered amino-terminal sequence, and modified function during rumen evolution. By analogy with SPRR2 (12, 16), the PRD-SPRRII family proteins are predicted to be major structural proteins and may function in the cornification of the keratin-rich surface of the rumen.

The primary role of the skin and wool, an important economic product of sheep, is to form a barrier between the organism and the external environment, reducing water and heat loss and pathogen entry (17). The sheep EDC gene, LOC101122906, which we designated LCE7A, represents a previously unrecognized subfamily of the late cornified envelope (LCE) genes (12, 18) (fig. S5). We identified LCE7A coding sequence in the genomes of most mammals, although it has not been previously annotated (fig. S5). LCE7A is expressed in sheep (Fig. 3A and table S10), cattle, and goat skin (see the supplementary text), but not in the rumen or any other tissue examined (Fig. 3A). In situ hybridization showed LCE7A expression in Merino wool follicles, including the inner root sheath (fig. S6). LCE7A contains the transglutaminase target site present in most LCE proteins (fig. S5) and is likely to be a substrate for transglutaminase-mediated cross-linking of proteins in the epidermis, inner root sheath, or wool (18). LCE7A also appears to be under positive selection in the sheep lineage, with a sheep versus cattle ratio of the number of nonsynonymous substitutions per nonsynonymous site to the number of synonymous substitutions per synonymous site of 2.5 (P < 0.005), possibly reflecting an association of LCE7A with wool development.
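The ratio described in words above is commonly written as ω (also dN/dS or Ka/Ks). The notation below is the conventional one and is an addition for clarity; the symbols do not appear in the original text.

```latex
\omega = \frac{d_N}{d_S}, \qquad
d_N = \frac{\text{nonsynonymous substitutions}}{\text{nonsynonymous sites}}, \qquad
d_S = \frac{\text{synonymous substitutions}}{\text{synonymous sites}}
```

Under the usual interpretation, ω < 1 suggests purifying selection, ω ≈ 1 neutral evolution, and ω > 1 positive selection, so the reported sheep versus cattle value of 2.5 falls in the range associated with positive selection.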

Wool grease (lanolin), secreted from the sebaceous glands attached to the wool follicles, constitutes 10 to 25% of the wool weight (9). The wool follicles are located in the dermal layer between the surface keratinocytes, which synthesize surface lipids (19), and the subcutaneous adipose tissue. We identified the genes encoding lipid metabolic enzymes expressed in the skin (table S11) and positioned them on known and putative lipid metabolic pathways likely to be involved in the storage and mobilization of long-chain fatty acid components of the sebum and epidermal lipids (Fig. 4A). Unexpectedly, the skin transcriptome revealed high expression of MOGAT2 and MOGAT3, members of the acylglycerol O-acyltransferase (DGAT2/MOGAT) gene family that are involved in the synthesis of diacylglyceride (DAG) and triacylglyceride (TAG) from monoacylglyceride (MAG) (Fig. 4A). In humans, MOGAT3 catalyzes an essential and rate-limiting step for the absorption of dietary fat in the small intestine (20) and is an important liver enzyme (21). MOGAT2 and MOGAT3 are single-copy genes in almost all mammals with available data. However, in ruminants, both genes have undergone tandem gene expansions, indicative of evolving functionality (Fig. 4B and figs. S7 and S8). MOGAT2 has more than five tandemly duplicated copies in sheep, with the first copy expressed in the duodenum and the last copy expressed in the skin and with no expression of any copy detected in the liver (fig. S7). Three nearly identical MOGAT3 copies were highly expressed in sheep skin and at a much lower level in white adipose and omentum (Fig. 4B). In contrast to humans, we detected no expression of functional MOGAT3 in sheep duodenum or liver. The skin had two MOGAT3 splice variants (fig. S9): The most common transcript encodes a predicted protein orthologous to the typical mammalian MOGAT3; the second contains a predicted alternate start codon and amino-terminal sequence that is missing the probable membrane anchor (fig. S10) and is predicted to be uncoupled from the endoplasmic reticulum membrane-bound TAG synthesis pathway (Fig. 4A).

Fig. 4 Proposed sheep-skin lipid metabolic pathways and the expression of the MOGAT3 region.

(A) Proposed sheep-skin lipid metabolic pathways. The proposed nonmembrane form of MOGAT3 is red; the membrane form of MOGAT2/3 is blue. The glycerol-3-phosphate and MOGAT pathways to DAG are in blue and red arrows, respectively. Dashed lines represent transport or diffusion of products. (B) Expression of eight MOGAT3 genes in sheep skin (Gansu fine wool); sheep omentum, white adipose, and liver (Texel); goat skin (Shanbei Cashmere); and goat liver (Yunling Black).

The presence of MOGAT2 and MOGAT3 in sheep skin indicates that there may be an alternative pathway for DAG synthesis, either from recycling MAG generated from the mobilization of TAG stored within a cell to generate fatty acids for incorporation into other products, or from external sources of MAG (Fig. 4A). The MOGAT pathway does not generate glycerol, which requires phosphorylation in the liver before reuse for TAG synthesis in the skin via the glycerol-3-phosphate (G3P) pathway (Fig. 4A), potentially increasing the efficiency of recycling of the glycerol backbone in sheep skin. The MOGAT pathway also bypasses 1-acyl-lysophosphatidic acid (LPA) and phosphatidic acid (PA) synthesis (Fig. 4A). Skin produces a lipase (LIPH) to cleave PA into 2-acyl LPA, which has a role in controlling hair-follicle development (22). LIPH mutations in several mammalian species result in wool-like hair due to changes in follicle shape (23). Thus, the MOGAT pathway in sheep skin may also reduce the coupling between TAG and PA synthesis, skin barrier lipid synthesis, and follicle development signaling, facilitating wool production.

In ruminants, the liver is primarily a gluconeogenic organ using propionate (a VFA) as the source of carbon. It contributes little to the synthesis of lipids, or to the uptake of lipids from circulation, and is inefficient at exporting TAG (24, 25). The apparent loss of MOGAT3 expression in the intestines and both MOGAT2 and MOGAT3 in the liver may reflect the greater importance of VFAs and the reduced importance of the liver in long-chain fatty acid metabolism in ruminants compared with nonruminants.

We identified major genomic signatures associated with interactions between diet, the digestive system, and metabolism in ruminants. These include two extensions of their biochemical capabilities that have been extensively exploited by humans: the production of wool by sheep and the evolution of an organ that houses a diverse community of microorganisms that enable efficient digestion of plants.

Supplementary Materials

Materials and Methods

Supplementary Text

Figs. S1 to S33

Tables S1 to S27

References (26–112)

References and Notes

  1. Materials and methods are available as supplementary materials on Science Online.
  2. Acknowledgments: The International Sheep Genomics Sequencing Consortium is grateful to the following for funding support for the sheep genome sequencing project: one 973 Program (no. 2013CB835200) and one CAS Program (XDB13000000) to Kunming Institute of Zoology, China; BGI-Shenzhen (ZYC200903240077A and ZYC200903240078A); China National GeneBank-Shenzhen for support for the storage of samples and data; Inner Mongolia Agricultural University (30960246 and 31260538); The Roslin Institute, University of Edinburgh and Biotechnology and Biological Sciences Research Council, UK (BBSRC): BB/1025360/1, BB/I025328/1, Institute Strategic Programme, and National Capability Grants; The Roslin Foundation; The Scottish Government, UK; Department for Environment, Food and Rural Affairs/Higher Education Funding Council/Scottish Higher Education Funding Council Veterinary Training and Research Initiative, UK; USDA-ARS, USA, USDA-National Research Initiative Competitive Grants Program, USA (grant nos. 2008-03923 and 2009-03305); Wellcome Trust (grant nos. WT095908 and WT098051), BBSRC (grant nos. BB/I025506/1 and BB/I025360/1) and European Molecular Biology Laboratory; USDA-NRSP-8, USA; USDA-ARS grant 5348-32000-031-00D; Meat and Livestock Australia and Australian Wool Innovation Limited through SheepGENOMICS, Australia; Australian Government International Science Linkages Grant (CG090143), Australia; University of Sydney, Australia; CSIRO, Australia; AgResearch, NZ, Beef + Lamb NZ through Ovita, New Zealand; INRA and Agence Nationale de la Recherche project SheepSNPQTL, France; European Union through the Seventh Framework Programme Quantomics (KBBE222664) and 3SR (KBBE245140) projects; the Ole Rømer grant from Danish Natural Science Research Council, BGI-Shenzhen, China; the Earmarked Fund for Modern China Wool & Cashmere Technology Research System (no. nycytx-40-3); and the Australian Department of Agriculture Food and Fisheries, Filling the Research Gap project, “Host control of methane emissions.” We thank B. Freking (USDA-ARS-U.S. Meat Animal Research Center) for provision of Texel ram tissue samples for DNA extraction and sequencing. We thank the sequencing teams and other contributors; full details are in the acknowledgements section of the supplementary materials. We thank SheepGENOMICS and Utah State University for access to the genotyping data for the SheepGENOMICS and Louisiana State University flocks, respectively. We thank L. Goodman for help with condensing the text.
The genome assemblies have been deposited in GenBank, Oar v1.0 (GCA_000005525.1), and Oar v3.1 (GCA_000298735.1) and in GigaDB, Oar v2.0 ( The Ensembl annotation of the Oar v3.1 assembly is available at, and the gene builds are available from The RNA-Seq data sets have been deposited in public databases: 83 samples from The Roslin Institute (European Nucleotide Archive (ENA) study accession PRJEB6169), seven tissues from the genome-sequenced Texel ewe and Gansu alpine fine wool sheep skin (GenBank accession GSE56643), and three blood samples (GenBank BioProject accession PRJNA245615). The methylated DNA immunoprecipitation sequencing (MeDIP-seq) raw reads from the Texel ewe liver have been deposited in GenBank (GSE56644). The bacterial artificial chromosome sequence assembly has been deposited in GenBank (KJ735098). The raw reads of the genome sequencing projects have been deposited in public nucleotide databases: Texel ewe (GenBank accession SRA059406), Texel ram (ENA study accession PRJEB6251, GenBank accession SRP015759), and 0.5-fold coverage 454 sequencing of six animals (GenBank accessions SRP000982, SRP003883, and SRP006794). The authors declare no competing financial interests.