Research Article

Large-scale ruminant genome sequencing provides insights into their evolution and distinct traits

See allHide authors and affiliations

Science  21 Jun 2019:
Vol. 364, Issue 6446, eaav6202
DOI: 10.1126/science.aav6202

Phylogeny and characteristics of ruminants

Ruminants are a diverse group of mammals that includes families containing well-known taxa such as deer, cows, and goats. However, their evolutionary relationships have been contentious, as have the origins of their distinctive digestive systems and headgear, including antlers and horns (see the Perspective by Ker and Yang). To understand the relationships among ruminants, L. Chen et al. sequenced 44 species representing 6 families and performed a phylogenetic analysis. From this analysis, they were able to resolve the phylogeny of many genera and document incomplete lineage sorting among major clades. Interestingly, they found evidence for large population reductions among many taxa starting at approximately 100,000 years ago, coinciding with the migration of humans out of Africa. Examining the bony appendages on the head—the so-called headgear—Wang et al. describe specific evolutionary changes in the ruminants and identify selection on cancer-related genes that may function in antler development in deer. Finally, Lin et al. take a close look at the reindeer genome and identify the genetic basis of adaptations that allow reindeer to survive in the harsh conditions of the Arctic.

Science, this issue p. eaav6202, p. eaav6335, p. eaav6312; see also p. 1130

Structured Abstract

INTRODUCTION

The ruminants are one of the most successful mammalian lineages, exhibiting extensive morphological and ecological diversity and containing several key livestock species, such as cattle, buffalo, yak, sheep, and goat. Ruminants have evolved several distinct characteristics such as a multichambered stomach, cranial appendages (headgear), specialized dentition, a highly cursorial locomotion, and a wide range of body size variations. Despite their biological prominence and value to human societies, the evolutionary history of ruminants has not been fully resolved, and the molecular mechanisms underlying their particular characteristics remains largely unknown.

RATIONALE

We seek to resolve the controversies in the ruminant phylogeny and reveal the genetic basis underpinning the evolutionary innovations in ruminants. Here, we report the newly sequenced genomes of 44 ruminant species, covering about half the genera and all six extant Ruminantia families. We included seven published ruminant genomes (five bovids and two cervids) to reconstruct the phylogenetic tree by using improved time calibrations. We also reconstructed the Pleistocene demographic histories of these ruminant species using whole-genome heterozygosity information. Together with transcriptomic data of 516 samples from 68 tissues of four species, we conducted comparative genomic analyses to reveal candidate genes and regulatory elements that might have contributed to the evolution of the distinct ruminant characteristics.

RESULTS

Using whole-genome orthologous sequences obtained from 51 ruminants, we have produced a new well-supported ruminant phylogenetic tree. The new tree resolves previous controversies over the deep branches of ruminant families, as well as the highly radiated Bovidae family. We estimated the emergence of crown Ruminantia to the late Oligocene (39.1 million to 32.3 million years ago) and that of Pecora to the Neocene (23.3 million to 20.8 million years ago). Investigations of demographic history revealed massive population decline events that occurred in most ruminant species, starting from ~100,000 to 50,000 years ago, which was temporally and spatially concurrent with the increased human activities on different continents during this period. We further identified many genomic changes that associate with important evolutionary innovations, such as the multichambered stomach, headgear, body size variation, cursorial locomotion, and dentition.

CONCLUSION

Our results demonstrate the power of using comparative phylogenomic approaches in resolving the deep branches of phylogeny that result from rapid radiations. The data and results presented in this study provide valuable resources and insights into the evolution of ruminant and mammalian biology.

Phylogeny and trait evolution of ruminants.

The phylogenic tree of ruminants is presented with the species within same families and subfamilies collapsed. The ruminants have many textbook examples of distinct traits. The four-chambered stomach with omasum chamber is a key innovation evolved in pecoran ruminants. Headgear keratinous sheath only appear in Bovidae and Antilocapridae lineages. Many ruminants have evolved high-crowned or hypsodont teethes. The Antilocapridae and two bovid lineages are among the mammals with highest cursorial locomotion ability.

Abstract

The ruminants are one of the most successful mammalian lineages, exhibiting morphological and habitat diversity and containing several key livestock species. To better understand their evolution, we generated and analyzed de novo assembled genomes of 44 ruminant species, representing all six Ruminantia families. We used these genomes to create a time-calibrated phylogeny to resolve topological controversies, overcoming the challenges of incomplete lineage sorting. Population dynamic analyses show that population declines commenced between 100,000 and 50,000 years ago, which is concomitant with expansion in human populations. We also reveal genes and regulatory elements that possibly contribute to the evolution of the digestive system, cranial appendages, immune system, metabolism, body size, cursorial locomotion, and dentition of the ruminants.

Ruminantia is an important group of terrestrial herbivores, including at least 200 extant species (1) spanning six families: Tragulidae, Antilocapridae, Giraffidae, Moschidae, Cervidae, and Bovidae. The most species-rich of these families is Bovidae, which encompasses at least 143 species (2, 3), including important livestock animals (such as cattle, buffalo, yak, sheep, and goat) (4, 5). Ruminants possess several distinct and characteristic anatomical hallmarks, such as a multichambered stomach and cranial appendages (headgear). The acquisition of the rumen in the ruminants and of the omasum in pecorans (all ruminants except the Tragulidae) allow these animals to use plant material with a higher efficiency than other herbivorous mammals, such as equids (68). This effect is believed to be associated with the evolutionary success of these animals in terms of diversity, abundance, and geographic range (9). Ruminants have also evolved extreme morphological diversity, ranging in body weight from <2 kg to >1200 kg (10, 11), and a wide variety of distinct behavioral and physiological traits. Ultimately, these adaptations in ruminants likely explain the remarkable abundance of these animals among domestic animals.

Despite their biological prominence and value to human civilization, much remains to be learned about the ruminants. For example, the phylogeny of ruminants is far from resolved, and inconsistencies remain even at the family level (1215). Moreover, the genetic basis underlying many of their characteristic traits remains unknown. To fill in our gaps in knowledge, we performed de novo assembly of the genomes of 44 ruminant species, representing 36 genera that span all six families. In combination with five previously published bovid genomes, two published cervid genomes (1622), and recently updated fossil information (15), we constructed a time-calibrated phylogenetic tree of the group, analyzed species population histories, and investigated the genomic evolution of these species. Our results not only provide data for understanding the origin and evolution of this important mammalian group and their particular traits but also have implications for placing ruminant livestock genomic resources into an evolutionary context and for conserving ruminant biodiversity.

Results

Genome sequencing, assembly, and annotation

We used Illumina sequencing technology (23) to generate more than 40 trillion base pairs (Tbp) of raw data and then de novo assembled genomes for 44 ruminant species (Table 1 and tables S1 and S2). Eleven of the species are listed as “vulnerable” or worse on the International Union for Conservation of Nature (IUCN) Red List (table S1). Forty of the genomes were assembled into large scaffolds (Table 1 and tables S3 and S4) with a series of mate-paired large-insert libraries, whereas four genomes—common eland (Taurotragus oryx), mountain nyala (Tragelaphus buxtoni), bongo (Tragelaphus eurycerus), and oribi (Ourebia ourebi)—were only assembled to the contig level becausse of degenerated samples but still qualified for most comparative genomic analyses (Table 1 and table S4). In addition, we improved the quality of the genome assembly of the black muntjac deer (Muntiacus crinifrons; contig N50 = 2.4 Mbp) using long reads generated from PacBio Single Molecule, Real-Time (SMRT) sequencing. This improved assembly of a Cervidae species facilitated synteny and other comparative analyses within the ruminants. Three genomes—reindeer (Rangifer tarandus), milu deer (Elaphurus davidianus), and Marco Polo sheep (Ovis ammon)—were released in data notes before this study (2426).

Table 1 Assembly statistics of 44 ruminant species.

View this table:

To evaluate the quality of these genome assemblies (fig. S1 and tables S5 to S7), we performed BUSCO and synteny analyses and observed high BUSCO (27) scores (average 87%, from 75 to 93%) (table S6) and synteny continuity (table S7), showing that the majority of the assemblies were of high quality for downstream comparative analyses. The assembled genomes ranged in size from 2.52 to 3.25 Gbp, which was mostly consistent with the cytological C values (28) and k-mer–based estimations (table S5), indicating relatively complete genome coverages for all species.

After masking repeats (table S8), we used both de novo and homology-based gene prediction to annotate the genomes. The protein sequences of human (Ensembl 87 release), cattle (Ensembl 87 release), and sheep (Ensembl 87 release) were used as templates for homology-based gene prediction. The final annotated gene numbers range from 19,304 to 25,753 (table S9) among different species, with variations driven primarily by the quality of the assembled genomes. The gene length and exon length distributions were similar to those of other mammals (fig. S2 and table S9). We also identified 221,166 ruminant-specific conserved nonexonic elements (RSCNEs) by comparing all the ruminant genomes to 12 mammalian outgroup species (fig. S3 and table S10). These RSCNEs occupy ~0.61% of the genomes (~16.5 Mbp) (table S11) (23). A user-friendly, publicly available genome browser database was established (http://animal.nwsuaf.edu.cn/code/index.php/Ruminantia) for visualization of the genomic and transcriptomic data presented in this study.

Resolving the ruminant phylogeny

The lack of a fully resolved phylogeny for ruminants (1215) still hinders an understanding of the evolution of ruminant diverse phenotypes. In particular, the phylogenetic positions of the Antilocapridae and Moschidae families have been strongly debated, and so have the relationships among subfamilies within the diverse Bovidae family [(12), review]. Furthermore, although “dwarf antelopes” have previously been grouped in the tribe Neotragini, recent molecular studies suggest that these animals are polyphyletic (14, 15). These controversies are probably attributable to convergent evolution (challenging morphology-based approaches) and incomplete lineage sorting (ILS) in conjunction with the short internal branches in the ruminant radiation (challenging genetic inference) (12).

To resolve these phylogenetic challenges, we first estimated a whole-genome phylogenetic tree with ExaML under the GTR+GAMMA model (23) using the killer whale genome (29) as an outgroup. In total, 373 Mbp of orthologous syntenic sequences were obtained from whole-genome alignment by using the goat as a reference genome (19), yielding a whole-genome tree with 100% bootstrap support for all nodes (Fig. 1A). We also performed phylogenetic analyses with other genome partitions, including 6406 orthologous protein-coding genes identified in the 51 ruminant species and the killer whale, the fourfold degenerate sites in these genes, conserved nonexonic elements (CNEs), and complete mitochondrial genomes (mtDNA). With the exception of the mtDNA tree, the topologies of all other trees were identical with that of whole-genome tree (Fig. 1A and figs. S4 to S10).

Fig. 1 Phylogeny of ruminants.

(A) The maximum likelihood phylogenetic tree from whole-genome sequences of 51 ruminant species and 13 fossil calibrations. To compute the node supports, 200 bootstraps were used, and all nodes have 100% support. The origin and credit of the portraits of different species are listed in table S54. (B) Prevalent discordance among 10,000 random WGTs was observed across different families of ruminant.

Although the phylogenetic tree constructed with concatenated nuclear genome sequences (nDNA tree) was highly supported, phylogenetic discordance was pervasive across genomic regions (Fig. 1B). To further assess genome-wide tree discordance, 10,000 random genomic 1-Kbp windows at least 50 Kbp distant from each other were extracted. These 1-Kbp windows had enough segregating sites to generate window-based gene trees (WGTs) of high resolution (fig. S11). Although all individual WGTs differed from the species-level nDNA tree topology, 21.3% of WGTs exhibited the same family-level topology as that of the concatenated nDNA tree (table S12). This type of tree incongruence is usually caused by ILS, which has been widely observed in phylogenies of many animal groups—such as African cichlids (30), birds (31), and great apes (32, 33)—and is known to be exacerbated by the effects of gene tree estimation error (34, 35).

To ensure that the reconstructed phylogeny is robust in the presence of ILS, we performed several analyses with the coalescent-based phylogenetic method ASTRAL (36) using the 10,000 WGTs. We did not use entire genes as the unit of gene tree construction because genes can span over several recombination blocks in the genome (37). The topology of the obtained ASTRAL coalescent tree (fig. S12) is identical to the nDNA tree (Fig. 1A). We also used another coalescent-based phylogenetic method, MP-EST (38), to carry out additional phylogenetic analyses using both the 10,000 windows as well as the 6406 orthologous genes and obtained an identical topology as those of the ASTRAL and nDNA trees (fig. S13), further supporting the robustness of the inferred ruminant phylogeny. To minimize negative impacts of low tree resolution in the WGTs, we contracted low-supported branches [below 3, 10, and 20%, bootstrap support, as suggested in (36)] and still observed the same topology as that of the nDNA tree in all cases (fig. S14).

We further tested whether gene flow could have contributed to the observed topological discordances. The results from the ABBA-BABA test (39), the DFOIL software (extend from 4 taxa to 5 taxa) (40), and admixturegraph (41) consistently suggested some possible ancient gene flows among Giraffidae, Cervidae, Bovidae, and Moschidae (fig. S15 and table S13). The strongest gene flow signal between Giraffidae and Cervidae-Bovidae-Moschidae coincides with an overrepresentation of this topology in the WGTs relative to the expectation of ILS, which requires that for a set of four species trees, alternative two phylogenetic topologies other than the species tree have equal proportion (fig. S16). By contrast, PhyloNet (42) and PhyloNetworks (43) were sensitive to model and parameter choices and did not generate consistent results (figs. S17 and S18). It is plausible that the high parameter space of the phylogenetic model precluded these methods from performing complete explorations of the parameter space in our large-time-scale phylogenomic data.

Our new ruminant tree supports a sister-group relationship for Antilocapridae and Giraffidae, representing the oldest branch among the extant pecoran families. This is different from the mtDNA-based phylogenies, which placed Antilocapridae as an outgroup to all other pecorans (figs. S9 and S10) (12, 14, 15). The sister relationship of these two groups has been proposed before (44, 45) but now received 100% bootstrap support by the nDNA tree and a local posterior probability (46) of 1 in the ASTRAL tree and had 23% higher frequency of WGTs than those of the alternative topologies (fig. S16). Another contentious issue is on the placement of Moschidae, which was previously placed at the base of Pecora (4749) or as a sister group to Bovidae (50) from morphological data. In some mtDNA studies, Moschidae was proposed as a sister group of Cervidae (51) or a sister taxon to Bovidae (14, 52, 53). Our nDNA tree confirmed that Moschidae is a sister group of Bovidae, and the frequency of WGTs further support this conclusion (fig. S16).

Our results also resolved some controversial subfamily relations in Bovidae, such as the placement of the Reduncinae (12). With the whole-genome tree, Reduncinae were confirmed to be a sister group to the ancestral lineage of Caprinae, Alcelaphinae, and Hippotraginae. This topology is also supported by the ASTRAL tree and has a slightly higher WGT frequency than those of alternatives (fig. S19). We further confirmed the previously ambiguous sister-group relationship between impala (Aepyceros melampus) and suni (Neotragus moschatus) (54, 55) and found no phylogenetic support for the “waste-bucket” Neotragini tribe (56). These findings mostly agree with the topology from Decker et al. (57), which used single-nucleotide polymorphism (SNP) chip data but lacked samples from Moschidae and Tragulidae. Furthermore, our results provided higher bootstrap supports for the deep nodes and also refined some species positions within Bovidae—the position of bushbuck (Tragelaphus scriptus) and mountain nyala (Tragelaphus buxtoni).

Using fossil calibrations (tables S14 and S15) (15), we estimated the emergence of crown Ruminantia at 39.1 million to 32.3 million years ago (late Oligocene), and the emergence of Pecora at 23.3 million to 20.8 million years ago (Neocene) (Fig. 1A and figs. S20 to S27). The evolutionary rate in the ancestral ruminant lineage was ~1.5 × 10−9, which was significantly higher than that in other mammals (Student’s t test, P < 0.01) (fig. S28). Among the ruminant families, Tragulidae had the highest evolutionary rate, and Giraffidae had the lowest evolutionary rate (fig. S28). We found a significant negative correlation between evolutionary rate and log body size (fig. S29).

Pleistocene population dynamics in ruminants

We used our whole-genome dataset to investigate the demographic histories of different ruminant species using the pairwise sequentially Markovian coalescent (PSMC) method (58), which can infer changes in the effective population size (Ne) over the Pleistocene (figs. S30 and S31 and table S16). The analyses produced a species-specific demographic pattern with no clear grouping of patterns according to their habitat types or feeding types (fig. S32). This might imply that the ruminant species responded differentially to biotic and abiotic pressures associated with their different ecological niches (59). However, we found population declines for many species (25 out of the 40 species with scaffolded genome assemblies) starting from ~100,000 to 50,000 years ago (Fig. 2A), which suggests that late Pleistocene large mammal declines were much more severe than previously suspected, involving major declines in the populations of most species along with the mass extinction of large mammals at this time (60). We speculate that this community-wide decline might be at least partially attributable to human activities. This is supported by ruminant population declines coinciding with increasing human effective population size after the dispersal out of Africa during this period (Fig. 2A) (61). Intriguingly, the onset of ruminant decline coincided with the sequential population expansion of humans on different continents in this period (Fig. 2B), which would not be expected if the decline signal was a structural artefact. By contrast, ruminant population size dynamics showed no apparent correlation with the climatic changes dominated by serial glaciations during the Pleistocene (figs. S30 and S31). Taken together, these results highlight a possible role for humans in the massive decline of mammalian species in the late Pleistocene (60).

Fig. 2 Population size history of ruminants.

(A) The normalized effective population sizes (Ne) of Ruminantia shows a clear decline trend from 100,000 to 50,000 years ago, in sharp contrast to the normalized Ne of human, which expanded dramatically at the same time. The Ne of each species is inferred by using PSMC (23). The normalized Ne is calculated by dividing the estimated value of Ne for each species at each time point with its maximum value. After normalizing each species’s Ne, we put 20 Ne datum points along the time axis into a window. For ruminants, a gray shadowed bar indicates the variation interval of different species at a window, and the trend line (red) is plotted by using the “smooth.spline” function in the R package. (B) Species grouped into continents Africa (26 species), Asia (seven species), and North America (four species). The trend line (red) indicates the dynamics of normalized effective population sizes in each continent. ka, thousand years.

Structure and evolution of ruminant genomes

Synteny

We explored the general architecture of ruminant genomes through comparison with other high-quality mammalian genomes (23). Specifically, we compared the syntenic relationship between the goat (the most well-assembled ruminant genome) (19) and human (Primates) (62), dog (Carnivora) (63), horse (Perissodactyla) (64), pig (Suina) (65), camel (Tylopoda) (66), killer whale (Cetacea) (29), and black muntjac (Cervidae of Ruminantia). The high-quality assembly of the black muntjac (Table 1) by use of PacBio reads allowed us to assess the within-ruminant synteny by including both a Cervidae representative as well as the high-quality Bovidae representative (goat). We found that Primates and Perissodactyla have more prevalent genome rearrangements (>200 rearrangements per million base pairs) compared with that of Ruminantia (Fig. 3A and table S17). Within the Cetartiodactyla, pig (2n = 18, Suina) and camel (2n = 74, Tylopoda) experienced more genome rearrangements (~120 rearrangements per million base pairs) than the killer whale (2n = 44, Cetacea). Thus, the greater level of conserved synteny between Cetacea and Ruminantia (68 rearrangements per million base pairs) is consistent with their sister relationship within the Cetartiodactyla. Few rearrangements (19 rearrangements per million base pairs) were observed between the black muntjac (2n = 8/9) and the goat (2n = 58), suggesting that synteny has been conserved among different lineages of Ruminantia, despite large variation in chromosome numbers (table S18).

Fig. 3 Structural characteristics and evolution of ruminant genomes.

(A) Goat (19) was used to represent Ruminantia-specific genomic rearrangements in comparisons with Primates (human), Perissodactyla (horse), Suina (pig), Tylopoda (camel), and Cetacea (Killer whale). Syntenic blocks are linked between genomes in a circos plot. The red number beside each circos quantifies the occurrences of rearrangement events per aligned megabase (Mb) sequence. The high-quality de novo assembly of the black muntjac was included for within-ruminant synteny inference. (B) The average genome sizes, TE sizes, and contents of different TE types of Primates (human, chimpanzee, gorilla, and orangutan), Carnivora (dog, cheetah, and polar bear), Perissodactyla (horse, przewalski’s horse, and rhinoceros), Tylopoda (dromedary camel, bactrian camel, and alpaca), Suina (pig), and Cetacea (minke whale, killer whale, beluga whale, sperm whale, and yangtze finless porpoise). Overall, the average genome size of Ruminantia is significantly (Student’s t test, P < 0.01) larger than those of Canivora, Perissodactyla, Tylopoda, Suina, and Cetacea. Tragulidae is marked with dashed lines because the genome assembly contains more gaps, which hindered the annotation of TEs. The proportions of LINE, SINE, LTR, and DNA transposons are presented in the stacked bar plot. LINE/BovB and LINE/L1 are highlighted here to present their dynamic changes among ruminants. ***P < 0.01. (C) The average contents of different SINE types are plotted in the stacked bar plot across mammalian orders and suborders of Cetartiodactyla, with different colors. (D) A large fragment insertion of 3,396,232 bp is observed in the goat genome, which is also validated in other ruminant families, containing a cluster of PAG genes, specifically containing 36 coding genes and 32 pseudogenes in goat.

Genome size

Our ruminant genome assembly sizes ranged from 2.52 Gbp [oribi (Ourebia ourebi)] to 3.25 Gbp [klipspringer (Oreotragus oreotragus)] (table S5). The average assembled genome size of ruminants was (2.7 Gbp), which is larger than Carnivora (~2.3 Gbp), Perissodactyla (~2.4 Gbp), Suina (~2.5 Gbp), Tylopoda (~2.0 Gbp), and Cetacea (~2.4 Gbp) and smaller than Primates (~3.0 Gbp) (Fig. 3B). Analyses (23) confirm that transposable element (TE) content is the major cause of genome size variation (Fig. 3, B and C, and table S19).

When comparing the goat genome with the human, horse, pig, and killer whale genomes, we also observed and validated (23) large insertions and deletions (over 50 kbp in length) in ruminants (table S20). The largest insertion from segmental duplication in the goat contains a cluster of PAG (pregnancy-associated glycoprotein) genes, with 36 coding sequences and 32 pseudogenes (Fig. 3D). The main functions of PAGs are immune regulation and maintenance of pregnancy (67). We also found other insertions with important gene annotations in ruminants, including interferon and olfactory receptors (fig. S33 and table S21).

Evolution of genes and gene families in Ruminants

We obtained a high-confidence orthologous gene set for the full set of 51 ruminant species using camel, cat, dog, horse, human, minke whale, killer whale, and pig as outgroups (23). Using the resolved phylogeny of Ruminantia, we identified rapidly evolving genes (REGs), positively selected genes (PSGs), and newly evolved genes (Fig. 4A and tables S22 to S25) (23).

Fig. 4 Newly evolved genes, expanded and contracted gene families, REGs, and PSGs in ruminants.

(A) Positively selected genes (PSGs), rapidly evolving genes (REGs), expanded genes, and newly evolved genes are shown along the phylogenetic tree. PSGs and REGs are identified with PAML. Newly evolved genes are identified based on the synteny relationships, with goat used as a reference. Expanded gene families are identified for the pecoran families. Tragulidae are excluded because of the relatively low quality of the assembled tragulid genome. (B) Diagram of the process of leukocytes crossing blood vessels, highlighting PSGs (orange) and REGs (blue). The orange rectangles indicate PSGs, and the blue rectangles represent the REGs. The solid lines represent direct interaction, and the dotted lines represent indirect interactions. (C) Diagram of nutrient metabolism evolution in Ruminantia displaying genes involved in nutrition metabolism. Expanded gene families are marked red, PSGs are marked yellow, and REGs are marked blue in (C) and (D). (D) Expansion of the lysozyme c gene family throughout the Ruminantia. The different expanded copies are presented with colors. Each line corresponds to the order or family in (A), and we chose the best assembled species genome sequences to draw the region. The slashes on each line indicate the ends of scaffolds in the assembled genomes.

Functional enrichment analyses of the PSGs (tables S26 and S27), the REGs (table S28), and expanded gene families (table S29) in the ancestral ruminant branch all exhibit enrichment in immune functions. As many as 20 PSGs and 12 REGs are involved in the crossing of blood vessels by leukocytes, which respectively constitutes 17.9 and 10.7% of this key pathway in active immunization (Fig. 4B and tables S30 and S31) (68, 69). We also observed gene family expansions in the ruminant ancestor (table S32), including the interferon family (IFNs) (figs. S34 to S36), PAG family (fig. S34), and cathelicidin and serpin peptidase inhibitor families (figs. S34, S37, and S38), which are all involved in immune system pathways. In addition, we identified a rumen-specifically expressed newly evolved gene, ENSBTAG00000038127 (23), which contains an immunoglobulin V-set domain, usually involved in mimicking the antibody variable domain of several diverse protein families (70). Although rapid evolution of immune genes was found in a wide range of animals, the number of PSGs from the leukocyte transendothelial pathway is highest in Ruminantia and could suggest a key role of this pathway in the evolution of ruminant immune system (table S33).

In addition to immune system–related genes, we also observed a series of PSGs, REGs, and expanded gene families in ruminants involved in lipid metabolism, glycolysis, oxidative phosphorylation, and amino acid metabolism (Fig. 4, C and D, and table S34) (23). Ruminants have a distinct digestive system, in which the main source of energy comes from volatile fatty acids (VFAs) and the rate of glycolysis is low (71). The genomic changes associated with metabolism may reflect the adaptation of the digestive system in ruminants and may therefore have played important parts in the success of the ruminants.

Genomic variations related to ruminant morphological characteristics

We were able to leverage our phylogenetic tree and genomic data to conduct evolutionary genomic analyses aimed at identifying genomic variations correlated with particular ruminant characteristics. Specifically, we investigated the evolution of the multichambered stomach, headgear, body size, cursorial locomotion, and dentition.

Evolution of the multichambered stomach

Whereas pecoran ruminants have a four-chambered stomach composed of the rumen, reticulum, omasum, and abomasum, the basal group of Ruminantia, the Tragulidae, lacks the omasum (72).

Using a transcriptomic dataset from 516 samples covering 50 tissues of sheep (table S35) (23), we found that the gene expression profiles of rumen, reticulum, and omasum are closest to that of esophagus (Fig. 5A), implying that the three stomach chambers might have originated from the esophagus, as suggested by Warner (73) and Xiang et al. (74). Regarding the distinct rumen organ, we found that several newly evolved genes played a role in its function. Among the 295 newly evolved genes identified in the ancestor of ruminants (Fig. 4A and fig. S39), seven were highly or specifically expressed in the rumen (fig. S40). Two genes, PRD-SPRRII and TCHHL2, are important structural genes in the rumen (16). Three newly evolved serpin genes may have inhibitory functions of different proteases or through anti-inflammatory functions (75, 76), and two KRT6A genes are likely involved in the activation of follicular keratinocytes (77, 78).

Fig. 5 Genomic features related to ruminant characteristics.

(A) Pairwise Spearman correlations of gene expressions indicated that the omasum, rumen, and reticulum evolved as an extension of the esophagus, whereas the abomasum evolved as an extension of the duodenum. Although it is anatomically closer to the reticulum, the omasum has a more similar gene expression atlas with that of the rumen than the reticulum, which mirrors the similar functions between them. (B) Diagram of the convergent feature of keratinous sheath in the Bovidae and Antilocapridae. The two red amino acids indicate convergent mutations of the KRT82 between Antilocapridae and Bovidae, and the red dots indicate the included species of Antilocapridae and Bovidae. (C) Body size contrast among ruminants and 11 genes related with bone development that have at least four specific mutations in giraffe and are involved in TGF-β, Hedgehog, Notch, Wnt, and FGF pathways. (D) A total of 642 genes in GO related to the development of body size are retrieved, and we calculated the dN/dS ratios on the branches, leading to large, medium, and small body sizes, under the two-ratio branch model (model 2) of PAML. Background dN/dS ratios are calculated under one-ratio branch model (model 1) of PAML. The distribution densities of the dN/dS values are shown. A box plot of dN/dS values in different body size categories is shown. The mean dN/dS values at the branches of large body size and small body size are significantly larger than background. ***P < 0.01. (E) Antilocapridae and Antilopinae species are among the most mobile land mammals, and both of them have specific mutations in several genes of the mitochondrial electron transport chain.

Although the abomasum is analogous to the true stomach in other mammals, it is hypothesized that the ruminant abomasum has evolved particular adaptations to digest the microbe-rich content from upstream chambers (72). The lysozyme c family, which degrades bacterial and microbial cells as members enter the abomasum to extract nutrients (79), has expanded to 10 or more copies in ruminants, whereas other mammals have only one or a few copies of this gene family (79). Our comparative genomics analyses revealed that the duplication of lysozyme c genes began in the ancestors of Ruminantia, continued to expand along the diversification of pecoran families, and eventually culminated with the highest copy numbers in Bovidae (Fig. 4D). This accumulated evolution of lysozyme c gene copy number in ruminants may be associated with an improved harvesting of nutrients from the rumen microbiome.

As a newly evolved organ in the pecorans, the omasum is adjacent to the reticulum, whereas its expression profile is closest to that of the rumen. This expression overlap may be linked to the resemblance in structure and function of the rumen and omasum, which are both involved in the absorption of water, minerals, electrolytes, and VFAs, whereas the reticulum mainly has a different function as a filter for the fermentation products from the rumen (6). Anatomically, the omasum is composed of the same stratified squamous epithelium of mucosal-layered tissue as the rumen and reticulum, which is different from the abomasum and intestines (6). To further reveal the genetic basis underlying the evolution of the omasum in pecorans, we identified 75 genes that were specifically highly expressed in the omasum compared with other organs (fig. S41). Among these, one gene was newly evolved (LOC101107119), and another (SCNN1D) exhibited pecoran-specific amino acid changes relative to Tragulidae and killer whale genomes (fig. S42). LOC101107119 is annotated as a prostaglandin F synthase 1–like gene and might have a similar function to that of the prostaglandin F synthase 1 (PGDF1), which is involved in ketone metabolism (80). SCNN1D encodes the δ subunit of the epithelial sodium channel, which mediates Na+ reabsorption and water absorption in the digestive tract (81).

Transcriptional factors and regulatory elements may have played important roles in the evolution of the omasum by changing the expression patterns of their host genes to be recruited in omasum functions. We found four genes with pecoran-specific CNEs within their immediate upstream/downstream 10-kbp regions (SIM2, PAX9, KCNK5, and DENND2C) (table S36), of which PAX9 and SIM2 are important transcriptional factors. PAX9 regulates squamous cell differentiation in the esophageal epithelium (82). A CNE with two pecoran-specific mutations was found at the 5′ upstream of PAX9 in pecorans, which might play a role in regulating PAX9 expression in the omasum. SIM2 is highly expressed in the omasum but inactive in the rumen [fragments per kilobase of exon per million fragments mapped (FPKM) < 1], and it works as a suppressor of cell proliferation in the epithelium (83, 84). The differential expression patterns of SIM2 in the omasum and rumen is consistent with the epithelial cells in the omasum not being completely renewed, as is the case in the rumen (85).

Evolution of headgear

The ruminant families exhibit spectacular variation in headgear morphology (86). To reveal the genetic basis of headgear origin in ruminants, its secondary loss in two independent lineages and the biologically exceptional, rapid regeneration ability of cervid antlers, we performed large-scale comparative transcriptomics and functional experiments in an accompanying paper (87). Substantial similarities in the transcriptome profiling of different headgear types (87) are consistent with a single origin of headgear in the pecoran ancestor. This ancestral headgear subsequently diversified into horns, antlers, ossicones, and pronghorns in the different pecoran families and was lost independently in the lineages of Moschidae and Hydropotinae perhaps because of the convergent pseudogenization of the horn-development RXFP2 gene (87). These complex patterns, along with the lack of a well-resolved phylogeny, have confounded a synthesis of headgear evolution.

We further examined the 201 highly expressed genes in the headgear of both bovids and cervids (87) and identified 36 of these genes harboring pecoran-specific CNEs (table S36). Nine genes (ALX1, VCAN, COL1A1, SATB2, RUNX2, POSTN, SP7, TNC, and COL4A2) were classified in Gene Ontology (GO) categories related to bone development. RUNX2 is a key transcriptional factor in the regulation of bone development (88) and has four pecoran-specific CNEs in its intronic regions (fig. S43A and table S36), potentially causing high RUNX2 expression in the headgear sprouts. SP7 encodes an important regulatory factor of the biomineralization and formation of bones (89) and has specific mutations in the pecoran 3′ untranslated region, one of which was located in the binding region of the microRNA bta-mir-145 (fig. S43B). These genes were possibly rewired for the formation of bony headgears of ruminants, laying out a hypothesis that can be tested experimentally in the future.

In addition, we explored the genetic basis of the keratinous sheath found convergently in Bovidae and Antilocapridae headgear (Fig. 5B). Transcriptomic data from horn sprouts of sheep and goat (87), both of which belong to Bovidae, identified seven highly expressed keratin genes: KRT1, KRT2, KRT3, KRT5, KRT10, KRT14, and KRT84 (fig. S44A). Except for KRT10 and KRT14, the above keratin genes encode Type II α-keratin proteins, suggesting an essential role for Type II α-keratin genes in the formation of the keratinous sheath of bovids. We further examined convergent amino acid substitutions between the pronghorn (Antilocapra americana) and bovids. Using the 12 mammalian species and other ruminant species as outgroups, we identified 106 proteins (table S37) that contain at least two convergent amino acid changes in these two ruminant families (23). Among these proteins, a horn-specific Type II α-keratin protein, KRT82, contained two convergently changed amino acid sites, Ala82Thr and Ser509Ala, of which the Ala82Thr is located in the keratin head domain (Fig. 5B and fig. S44B). This suggests that Type II α-keratin may be important in forming the keratinous sheath that evolved convergently in Bovidae and Antilocapridae.

Evolution of body size in ruminants

Ruminants, especially bovids, encompass a wide body size range that spans four orders of magnitude, from as little as 2 kg to as high as 1200 kg (Fig. 5C) (10, 11). We retrieved 642 genes related to the development of body size and estimated the ratios between the nonsynonymous substitution rate (dN) and synonymous substitution rate (dS) of these genes on branches that exhibit substantially increased body sizes and on branches that exhibit decreased body sizes (fig. S45). Compared with the background, these genes showed significantly higher dN/dS ratios in branches with large-sized and small-sized species (Student’s t test, P < 0.01) but similar ratios in branches with medium-sized species (Fig. 5D). We observed six genes (CXCL13, RNF115, NPNT, KL, SLC9A3R1, and MSTN) that had significantly elevated dN/dS ratios along with the increasing of body size (Student’s t test, P < 0.01) (table S38), and SLC9A3R1 and MSTN have dN/dS values > 1, an indication of positive selection. SLC9A3R1 affects osteogenesis by mineralizing osteoblasts, and disrupting this gene resulted in reduced body weights in mice (90). MSTN is an important gene in the regulation of muscle cell growth and differentiation (91), and mutations in MSTN affect the muscle mass of goat (92), sheep (93), and cattle (94, 95) as well as other mammals (96, 97). These results indicate that SLC9A3R1 and MSTN might be targets of natural selection favoring increased body sizes (fig. S45) by regulating the development of bone and muscle. In reduced body size branches, five genes (SBDS, BMP3, LRRN3, NFATC3, and SMARCAL1) had significantly elevated dN/dS ratios (table S38), and the dN/dS ratio of SBDS was larger than 1. SBDS is an important gene in cell proliferation and is the causal gene of Shwachman-Diamond syndrome in humans, which is characterized by skeletal abnormalities and short stature (98). BMP3 is a well-known gene involved in the regulation of osteogenesis in mammals (99). The genes mentioned above may therefore explain the body size variation in ruminants and may be relevant to livestock breeding application.

As the tallest terrestrial animal, giraffes have a distinct stature and body morphology, which likely are adaptions to their savanna habitat (100). Among the 366 genes related to bone development in the KEGG annotation, 115 genes had giraffe-specific mutations (table S39), including genes in the transforming growth factor–β (TGF- β), Hedgehog, Notch, Wnt, and FGF signaling pathways. Eleven genes with more than four nonsynonymous mutations may be related to the extreme elongation of body structures in giraffe because they are part of bone development pathways: TGF-β (CHRD and LTBP1), Wnt (APC, APC2, and CREBBP), Notch (NOTCH1, NOTCH3, and NOTCH4), Hedgehog (GLI2 and GLI3), and FGF (FGFRL1) (Fig. 5C). Three such genes (FGFRL1, NOTCH4, and CREBBP) had been identified in a previous comparative study by using a low-coverage genome assembly of giraffe (Giraffa camelopardalis) (table S39) (100).

Adaptations to cursorial locomotion

Although early ruminants were probably forest-dwelling (101), many lineages have adapted to open habitats by adopting more cursorial body plans designed for rapid and/or sustained movement in an open habitat (102). These include adaptations in limb morphology (103) and possibly also physiology. The pronghorn (Antilocapra americana) and members of Antilopinae such as the springbok (Antidorcas marsupialis) are among the fastest-running animals on Earth (104). Both lineages exhibit strong selection in their mitochondrial electron transport chains, although the targeted genes of selection differ between these species. In Antilopinae, two genes, SUOX (sulfite oxidase) and NLN (neurolysin), were under positive selection (χ2 test, P < 0.05) (table S24). In the pronghorn, two mitochondrial electron transport chain enzyme encoding genes [COX5A (cytochrome c oxidase subunit 5A) and PPOX (protoporphyrinogen oxidase)] were subjected to positive selection (χ2 test, P value < 0.05) (table S24). In addition, four proteins in the mitochondrial electron transport chain—NDUFA10 (reduced nicotinamide adenine dinucleotide dehydrogenase 1 α subcomplex subunit 10), SDHB (succinate dehydrogenase iron-sulfur subunit), UQCRC2 (cytochrome b-c1 complex subunit 2), and ATP5B (adenosine 5′-triphosphate synthase subunit β), functioning in oxidative phosphorylation—exhibited convergent amino acid changes (Fig. 5E and tables S40 and S41). Furthermore, all species of the bovid tribe Alcelaphini, which are adapted to open and seasonal grasslands and have cursorial locomotion, shared three specific mutations (Tyr211Phe, Gln481Arg, and Glu622Ala) in the ACE (angiotensin I converting enzyme) gene (fig. S46) and one specific mutation (Glu97Asp) in the EPO (erythropoietin) gene (fig. S47), both of which are important for endurance (105, 106). These observations imply that shared or distinct molecular pathways might have played roles in convergent phenotypic evolution during the radiation of ruminants.

Evolution of dentition

Ruminants have specialized dentition patterns, lacking upper incisors and having a high prevalence of high-crowned or hypsodont teeth—a likely adaptation to abrasive diets such as grass or feeding under abrasive conditions, as encountered in dry habitats (107). The distinct dentition is believed to have contributed to the evolutionary success of the ruminants by expanding their food sources (108). We observed that 11 genes related to dentition had RSCNEs in the 10-kbp downstream or upstream or in introns (table S42). Among these genes, ENAM functions in the mineralization and structural organization of enamel (109) and is possibly associated with enamel thickness in humans (110). Furthermore, ENAM is an important factor in the adaptation of dentition to specific diets (111). We observed seven RSCNEs in this gene region (fig. S48A). In addition to these RSCNEs, ENAM had a two–amino acid insertion and several mutations that were specific to ruminants (fig. S48, B and C). A genome-wide association analysis using the hypsodonty index of ruminants [data from (112)] identified one SNP significantly associated (Fisher’s exact test, P < 0.01) with the hypsodonty index located in the binding site of the EBF1 transcription factor in the intron of FGF14 (fig. S49), an important factor controlling tooth development (113). These genes provide candidates for future experimental studies on mammal dentition.

Conclusion

A well-resolved phylogeny of Ruminantia with full genome data provides an opportunity to understand the evolutionary processes and genetic basis of the distinct structure and evolutionary patterns of ruminants. Our comprehensive evolutionary and comparative analyses have revealed numerous genetic variations correlated with specific traits in ruminants. This study provides valuable genomic resources as well as insights into not only the evolution and diversification of ruminants but also our understanding of mammalian biology.

Materials and methods

We sequenced and assembled 44 ruminant genomes using Illumina reads, with the black muntjac (Muntiacus crinifrons) further sequenced with PacBio SMART long reads. The assemblies were performed with SOAPdenovo v1.05 (114), Platanus v1.2.4 (115), and Supernova assembler (116), and some contigs were further scaffolded with SSPACE v3.0 (117) and “cross_genome” commands in the Phusion2 package (118). Repeat elements were annotated by combining results of Tandem Repeat Finder v4.07b (119), RepeatMasker v4.0.5 (120), RepeatModeller v1.0.4 (120), and LTR_FINDER v1.0.6 (121). Genes were annotated with homologous TBLASTN (122) protein searches of human (Ensembl 87 release), cattle (Ensembl 87 release), and sheep (Ensembl 87 release), combining de novo prediction of SNAP (123), GENSCAN v1.0 (124), glimmerHMM v3.0.3 (125), and AUGUSTUS v2.5.5 (126). Whole-genome alignments were constructed with LAST v885 (127) and MULTIZ v11.2 (128), using goat as the reference. The maximum likelihood phylogenic trees were constructed with RAxML v8.2.9 (129) and ExaML v3.0.17 (130) by using a general time-reversible model (“GTR+GAMMA”). ASTRAL III v5.1.1 (36) and MP-EST v2.0 (38) were used to perform coalescent species tree estimations. DiscoVista v1.0 (131) was used to quantify and visualize gene tree discordance for alternative topologies. Gene flows among ruminant families were performed with ABBA-BABA test (39), DFOIL (40), admixturegraph v1.0.2 (41), PhyloNet v3.6.9 (42), and PhyloNetworks v0.9.0 (43). Time calibration was conducted with r8s v1.70 (132), BEAST v1.8.4 (133), MCMCTREE in PAML v4.8 (134), and Multidivtime (135). Demographic history reconstruction was carried out by means of PSMC analysis (56). PhyloFit v1.4 (136) and phastCons v1.4 (137) were used to infer conserved nonexonic elements in ruminants. PSGs and REGs were identified by use of branch and branch-site models in PAML v4.8 (134). The gene family expansion or contraction analysis was performed by using CAFÉ v4.7 (138). In-house scripts and pipelines are deposited in Zenodo (https://zenodo.org/record/2549147). Detailed methods and materials are described in the supplementary materials (23).

Supplementary Materials

science.sciencemag.org/content/364/6446/eaav6202/suppl/DC1

Materials and Methods

Figs. S1 to S54

Tables S1 to S54

References (140199)

References and Notes

  1. Materials and methods are available as supplementary materials.