Revealing the History of Sheep Domestication Using Retrovirus Integrations


Science  24 Apr 2009:
Vol. 324, Issue 5926, pp. 532-536
DOI: 10.1126/science.1170587


The domestication of livestock represented a crucial step in human history. By using endogenous retroviruses as genetic markers, we found that sheep, differentiated on the basis of their “retrotype” and morphological traits, dispersed across Eurasia and Africa via separate migratory episodes. Relicts of the first migrations include the Mouflon, as well as breeds previously recognized as “primitive” on the basis of their morphology, such as the Orkney, Soay, and the Nordic short-tailed sheep now confined to the periphery of northwest Europe. A later migratory episode, involving sheep with improved production traits, shaped the great majority of present-day breeds. The ability to differentiate genetically primitive sheep from more modern breeds provides valuable insights into the history of sheep domestication.

The first agricultural systems, based on the cultivation of cereals, legumes, and the rearing of domesticated livestock, developed within Southwest Asia ~11,000 years before present (yr B.P.) (1, 2). By 6000 yr B.P., agro-pastoralism introduced by the Neolithic agricultural revolution became the main system of food production throughout prehistoric Europe, from the Mediterranean north to Britain, Ireland, and Scandinavia (3); south into North Africa (4); and east into West and Central Asia (5).

Sheep and goats were the first livestock species to be domesticated (6). Multiple domestication events, inferred from the presence of multiple mitochondrial lineages, gave rise to domestic sheep, as similar events did for other domestic species (7–10). Initially, sheep were reared mainly for meat, but during the fifth millennium B.P. in Southwest Asia and the fourth millennium B.P. in Europe, specialization for “secondary” products such as wool became apparent. Sheep selected for secondary products appear to have replaced more primitive domestic populations. Whether specialization for secondary products occurred first in Southwest Asia or arose throughout Europe is not known with certainty, owing to the lack of definitive archaeological evidence for the beginning of wool production (6, 11, 12).

For this study, we used a family of endogenous retroviruses (ERVs) as genetic markers to examine the history of the domestic sheep. ERVs result from the stable integration of the retrovirus genome (“provirus”) into the germline of the host (13) and are transmitted vertically from generation to generation in a Mendelian fashion. The sheep genome contains at least 27 copies of ERVs related to the exogenous and pathogenic Jaagsiekte sheep retrovirus (enJSRVs) (14–16). Most enJSRV loci are fixed in domestic sheep, but some are differentially distributed between breeds and individuals (i.e., they are insertionally polymorphic) (14). enJSRVs can be used as highly informative genetic markers because the presence of each endogenous retrovirus in the host genome is the result of a single, irreversible integration event in a single animal, so populations sharing the same provirus at the same genomic location are de facto phylogenetically related.

We analyzed genomic DNA samples collected from 1362 animals belonging to 133 breeds of the domestic sheep (Ovis orientalis aries, usually referred to as Ovis aries) and its closest wild relatives (see below). The samples were divided into 65 groups, each formed by one or more breeds sharing a common geographical location and/or breeding links (table S1) (17). Samples tested also included the Urial sheep (Ovis vignei) and the Mediterranean and Asiatic Mouflon (Ovis orientalis musimon, Ovis orientalis ophion, and Ovis orientalis orientalis). Most of the breeds that we studied are local, historically related to specific geographical areas, and not subjected to the intensive breeding programs of commercial flocks.

Samples were tested for the presence or absence of six independently inherited insertionally polymorphic enJSRVs (enJSRV-18, enJSRV-7, enJSRV-8, enJSRV-15, enJSRV-16, and enJS5F16) by polymerase chain reaction with two sets of primers that amplify, respectively, the 5′ and 3′ long terminal repeats (LTRs) of each provirus (including the flanking genomic DNA sequences of the host) as described (14, 17). Provirus enJSRV-18 had by far the highest frequency in our data set (85%); enJSRV-7 and enJS5F16 were detected in 27% and 30% of the samples, respectively; and enJSRV-15, enJSRV-16, and enJSRV-8 were present in only 3 to 5% of the samples (Fig. 1A).

Fig. 1

Worldwide distribution of insertionally polymorphic enJSRVs. Distribution of the insertionally polymorphic enJSRV loci analyzed in this study in 65 sheep populations representing local breeds from the Old World. (A) Frequencies of each enJSRV locus in each population are represented by vertical bars arranged in descending order. Insertion frequencies were obtained with the software Arlequin 3.11 (27); the absence of a specific enJSRV provirus was treated as a recessive allele. (B) Locations of the sheep populations sampled. (C to F) Interpolation maps displaying the spatial distribution of estimated enJSRV frequencies. The geographical variation was visualized with the “Spatial Analyst Extension” of ArcView GIS 3.2 software (ESRI, Redlands, California, USA). Interpolated map values were calculated by inverse distance weighting with the 12 nearest neighbors and a power of 2, and interpolation surfaces were divided into 13 classes, with higher insertion frequencies indicated by darker shading. The central point of the sampling area was used as the geographic coordinates for each population (table S1).
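Because PCR only reveals whether an animal carries a provirus, not whether it carries one or two copies, treating absence as a recessive allele lets the insertion-allele frequency be estimated from the fraction of animals lacking the provirus. The sketch below shows the common square-root estimate under an assumed Hardy-Weinberg equilibrium; Arlequin's own estimator may differ in detail, and the function name and example counts are hypothetical.

```python
import math

def insertion_allele_frequency(n_present: int, n_total: int) -> float:
    """Estimate the frequency of the provirus-carrying allele at one locus.

    An animal scores "present" if it carries one or two copies of the
    provirus, so absence behaves as a recessive genotype. Assuming
    Hardy-Weinberg equilibrium, the fraction of animals lacking the
    provirus equals q**2, where q is the frequency of the empty allele.
    """
    if n_total <= 0 or not 0 <= n_present <= n_total:
        raise ValueError("need n_total > 0 and 0 <= n_present <= n_total")
    q_empty = math.sqrt((n_total - n_present) / n_total)  # recessive (empty) allele
    return 1.0 - q_empty  # frequency of the insertion allele

# Hypothetical population: 16 of 25 animals carry the provirus.
print(insertion_allele_frequency(16, 25))
```

Here 9 of 25 animals (36%) lack the provirus, so the empty allele is estimated at 0.6 and the insertion allele at about 0.4, noticeably lower than the 64% of animals that test positive.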

We inferred the distribution of the insertionally polymorphic enJSRV loci in the earliest domesticated sheep by determining their occurrence in the Urial sheep and in the Mediterranean/Asiatic Mouflon, and then by examining the molecular signatures indicative of the age of a provirus. The estimated divergence between the Urial (one of the closest living relatives of the domestic sheep) and the domestic sheep is ~800,000 yr B.P. (18). Consequently, any provirus shared between these two species predates the process of domestication. The same is true for the Asiatic Mouflon, which is believed to be the direct ancestor of the domestic sheep (19–21), whereas the closely related Mediterranean Mouflon is thought to be the remnant of the first domesticated sheep readapted to feral life (19, 22, 23). Despite its widespread distribution in the samples tested, enJSRV-18 was absent from the Urial sheep (n = 5), the Mediterranean Mouflon (n = 17), and the Asiatic Mouflon (n = 15). By contrast, the relatively rarer enJSRV-7 was detected in three of five Urial sheep, in most (86%) Asiatic Mouflons, and in all Mediterranean Mouflons. These data suggest that the integration of enJSRV-7 in the germline of the host predates the integration of enJSRV-18. Differences between the proximal (5′) and distal (3′) LTRs of enJSRV-7 confirm that it is the older provirus. The divergence between the 5′ and 3′ LTRs gives an estimate of the “age” of an endogenous provirus: upon infection, retroviruses reverse transcribe their RNA genome into DNA, and during this process they duplicate the genomic ends, giving rise to two identical LTRs. The proximal and distal LTRs of an endogenous retrovirus are therefore identical at integration, but they diverge over time at the rate of noncoding sequences (~2.3 × 10⁻⁹ to 5 × 10⁻⁹ substitutions per site per year). enJSRV-7 appears to be the oldest provirus in our samples because its 5′ and 3′ LTRs differ by five nucleotide (nt) substitutions over 445 nt, whereas all the other insertionally polymorphic proviruses (including enJSRV-18) have identical LTRs. These data suggest that the populations originating from the earliest domesticated sheep either carried none of the insertionally polymorphic enJSRVs used in this study or carried enJSRV-7.
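The LTR-dating argument can be worked through with the numbers given in the text. The text does not spell out the formula; this sketch assumes the common approach in which the two LTRs mutate independently after integration, so their per-site divergence K grows at twice the neutral rate r and the age is T = K / (2r). It uses the uncorrected p-distance (no multiple-hit correction), and the function name is illustrative.

```python
def ltr_age_years(substitutions: int, ltr_length: int, rate_per_site_per_year: float) -> float:
    """Rough provirus age from 5'/3' LTR divergence, assuming T = K / (2r).

    K is the observed fraction of differing sites between the two LTRs;
    both copies accumulate mutations independently, hence the factor 2.
    """
    k = substitutions / ltr_length  # uncorrected p-distance
    return k / (2.0 * rate_per_site_per_year)

# enJSRV-7: 5 differences over 445 nt, across the neutral-rate range quoted in the text.
for rate in (2.3e-9, 5e-9):
    print(f"r = {rate:.1e}: ~{ltr_age_years(5, 445, rate) / 1e6:.2f} million years")
```

Under these assumptions the estimate spans roughly 1.1 to 2.4 million years, which is compatible with enJSRV-7 being shared with the Urial (divergence ~800,000 yr B.P.). With only five differences in a 445-nt sequence, the stochastic uncertainty around such a point estimate is large.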

To visualize the geographical variation of all enJSRV loci, we constructed interpolation maps from their insertion frequency values (Fig. 1, B to F). The highest frequency of enJSRV-7 was found in the Mediterranean Mouflon and in Soay sheep now inhabiting the island of St. Kilda off northwest Scotland (Fig. 1C). enJSRV-18 was uniformly distributed at very high frequencies throughout the Old World. Low frequencies of enJSRV-18 were observed on the islands inhabited by the Mediterranean Mouflon and in peripheral regions of northwest Europe (Fig. 1D). Two enJSRV proviruses, enJS5F16 and enJSRV-8, showed a similar geographical pattern, with high frequencies in the British Isles and Scandinavia (Fig. 1, E and F). The less common enJSRV-15 and enJSRV-16 had less obvious geographical patterns (fig. S1).
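The interpolation maps were produced with ArcView's Spatial Analyst using inverse distance weighting (IDW) over the 12 nearest neighbors with a power of 2. The plain-Python sketch below reimplements that weighting rule for illustration only; it assumes planar (projected) coordinates, whereas raw longitude/latitude would need projection or great-circle distances, and the sample tuples are hypothetical.

```python
import math

def idw_estimate(x, y, samples, k=12, power=2.0):
    """Inverse-distance-weighted estimate at point (x, y).

    samples: list of (sx, sy, value) tuples, e.g. population centroids with
    their insertion frequency. Only the k nearest samples contribute, each
    weighted by 1 / distance**power.
    """
    dists = sorted((math.hypot(sx - x, sy - y), v) for sx, sy, v in samples)
    nearest = dists[:k]
    # A query point that coincides with a sample simply takes that sample's value.
    for d, v in nearest:
        if d == 0.0:
            return v
    weights = [1.0 / d ** power for d, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)

# Hypothetical centroids (projected units) with enJSRV insertion frequencies.
pops = [(0.0, 0.0, 0.9), (4.0, 0.0, 0.2), (0.0, 3.0, 0.5)]
print(idw_estimate(1.0, 1.0, pops))
```

Raising `power` makes the surface follow the nearest populations more closely; limiting the search to `k` neighbors keeps distant populations from flattening local patterns such as the low enJSRV-18 frequencies at the northwest European periphery.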

We then analyzed the combination of insertionally polymorphic enJSRVs (which we call “retrotype”) in each of the populations analyzed (Fig. 2). The R2 retrotype (representing the presence of enJSRV-18 only) was the predominant retrotype in most of the populations tested. The R4 retrotype, indicating the presence of enJSRV-18 and enJSRV-7 together (Fig. 2), was another common retrotype in the area corresponding to historical Phoenicia and in southern Europe, suggesting that maritime trade and colonization had a major influence on sheep movement in the Mediterranean, as confirmed by studies using sheep mitochondrial DNA variation (24, 25). Additional enJSRV insertions accounted for more complex retrotypes in populations of northern Europe (see also supporting online text). Sheep populations in Africa, Pakistan, and China displayed a similarly homogeneous R2 retrotype pattern common to the populations in Southwest Asia, suggesting direct migratory links of domestic sheep between these areas. Most of the populations from Scandinavia displayed retrotypes similar to those of the Icelandic and Faeroe Island populations, supporting the historically recorded movements of the Norse settlers during the later first millennium C.E. (26). To visualize the genetic relationships among the tested populations, we analyzed the data using two different approaches: a multidimensional scaling (MDS) plot obtained from the interpopulation matrix of Nei’s unbiased genetic distances and principal component analysis (PCA) computed from the correlation matrix among enJSRV insertion frequencies.
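The MDS analysis starts from pairwise Nei's unbiased genetic distances. As a simplified sketch, the function below applies Nei's (1978) formula to biallelic loci (insertion vs. empty allele), using allele frequencies already estimated per population. The study computed distances with TFPGA 1.3, which may apply dominant-marker corrections not reproduced here; the function name and example frequencies are hypothetical.

```python
import math

def nei_unbiased_distance(freqs_x, freqs_y, n_x, n_y):
    """Nei's (1978) unbiased genetic distance for biallelic loci.

    freqs_x, freqs_y: insertion-allele frequency at each locus in
    populations X and Y (same locus order).
    n_x, n_y: number of diploid individuals sampled in each population.
    """
    if len(freqs_x) != len(freqs_y) or not freqs_x:
        raise ValueError("need the same, non-empty list of loci for both populations")
    jx = jy = jxy = 0.0
    for p, q in zip(freqs_x, freqs_y):
        x_alleles = (p, 1.0 - p)  # insertion allele, empty allele
        y_alleles = (q, 1.0 - q)
        # Unbiased per-locus homozygosity: (2n * sum(x_i^2) - 1) / (2n - 1)
        jx += (2 * n_x * sum(a * a for a in x_alleles) - 1) / (2 * n_x - 1)
        jy += (2 * n_y * sum(b * b for b in y_alleles) - 1) / (2 * n_y - 1)
        jxy += sum(a * b for a, b in zip(x_alleles, y_alleles))
    n_loci = len(freqs_x)
    jx, jy, jxy = jx / n_loci, jy / n_loci, jxy / n_loci  # averages over loci
    return -math.log(jxy / math.sqrt(jx * jy))

# Hypothetical frequencies at four loci for two populations of 20 animals each.
print(nei_unbiased_distance([0.9, 0.1, 0.0, 0.2], [0.3, 0.4, 0.2, 0.0], 20, 20))
```

Because of the small-sample correction, the unbiased distance between two samples with identical polymorphic frequencies can come out slightly negative; this is a known property of the estimator, not a bug.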

Fig. 2

Combination of enJSRV proviruses (retrotypes) in the domestic sheep. Pie charts in the figure represent the frequency of each retrotype in the 65 populations tested. Each sheep tested was assigned a retrotype on the basis of the combination of insertionally polymorphic enJSRV proviruses present in its genome. Retrotypes R0 to R14 were defined as follows: R0 = no insertionally polymorphic enJSRVs; R1 = enJSRV-7; R2 = enJSRV-18; R3 = enJS5F16; R4 = enJSRV-7 + enJSRV-18; R5 = enJSRV-7 + enJS5F16; R6 = enJSRV-18 + enJS5F16; R7 = enJSRV-7 + enJSRV-18 + enJS5F16; R8 = enJSRV-8; R9 = enJS5F16 + enJSRV-8; R10 = enJSRV-7 + enJS5F16 + enJSRV-8; R11 = enJSRV-18 + enJSRV-8; R12 = enJSRV-18 + enJS5F16 + enJSRV-8; R13 = enJSRV-7 + enJSRV-18 + enJSRV-8; R14 = enJSRV-7 + enJSRV-18 + enJS5F16 + enJSRV-8. Each retrotype is represented with a different color (and pattern) as indicated in the figure. Numbers beside each pie chart identify each of the 65 populations tested as listed in table S1. Most of the populations in Southwest Asia, Central Asia, Southern Europe, and Africa possess R2 (i.e., presence of enJSRV-18 only, shown in green) as the predominant retrotype. Around the Mediterranean basin there is also a high proportion of R4, reflecting the co-occurrence of enJSRV-7 and enJSRV-18 (shown in yellow). The primitive breeds are characterized by a high proportion of animals with R0 (no insertionally polymorphic proviruses, shown in white) or R1 (presence of enJSRV-7 only, shown in red). A “Nordic” retrotype, R3 (shown in blue), was characterized by a low frequency of enJSRV-18 and a high frequency of enJS5F16; Nordic populations also had a relatively high frequency of sheep with none of the insertionally polymorphic proviruses tested.
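The retrotype definitions above amount to a lookup from the set of proviruses detected in one animal to a label. The sketch below encodes the legend as written and tallies labels for a hypothetical flock, as in one pie chart. Two points are assumptions rather than statements from the legend: enJSRV-15 and enJSRV-16 are ignored because no retrotype mentions them, and the one four-locus combination the legend does not name (enJSRV-7 + enJSRV-8) maps to `None`.

```python
from collections import Counter

# Retrotype labels transcribed from the Fig. 2 legend.
_LEGEND = {
    "R0": (),
    "R1": ("enJSRV-7",),
    "R2": ("enJSRV-18",),
    "R3": ("enJS5F16",),
    "R4": ("enJSRV-7", "enJSRV-18"),
    "R5": ("enJSRV-7", "enJS5F16"),
    "R6": ("enJSRV-18", "enJS5F16"),
    "R7": ("enJSRV-7", "enJSRV-18", "enJS5F16"),
    "R8": ("enJSRV-8",),
    "R9": ("enJS5F16", "enJSRV-8"),
    "R10": ("enJSRV-7", "enJS5F16", "enJSRV-8"),
    "R11": ("enJSRV-18", "enJSRV-8"),
    "R12": ("enJSRV-18", "enJS5F16", "enJSRV-8"),
    "R13": ("enJSRV-7", "enJSRV-18", "enJSRV-8"),
    "R14": ("enJSRV-7", "enJSRV-18", "enJS5F16", "enJSRV-8"),
}
RETROTYPES = {frozenset(loci): label for label, loci in _LEGEND.items()}
TYPED_LOCI = frozenset({"enJSRV-7", "enJSRV-18", "enJS5F16", "enJSRV-8"})

def assign_retrotype(detected):
    """Return the retrotype label for one animal's detected proviruses."""
    # Assumption: enJSRV-15/-16 do not enter the R0-R14 labels, so drop them.
    key = frozenset(detected) & TYPED_LOCI
    return RETROTYPES.get(key)  # None for the unlabelled enJSRV-7 + enJSRV-8 case

# Hypothetical flock: one set of detected proviruses per animal.
flock = [{"enJSRV-18"}, {"enJSRV-18", "enJSRV-7"}, {"enJSRV-18", "enJSRV-15"}, set()]
print(Counter(assign_retrotype(animal) for animal in flock))
```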

The MDS analysis revealed a marked separation (particularly evident in the first dimension) between the great majority of domestic breeds and an outer group formed by the Mouflon, Soay sheep, Hebrideans, Orkney sheep, Icelandic, and Nordic breeds (Fig. 3A). Similar results were obtained by PCA (Fig. 3B).
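To show how a distance matrix becomes the two-dimensional MDS plot, the sketch below implements classical (Torgerson, metric) MDS with NumPy. This is a swapped-in, simpler technique: the STATISTICA routine used in the study may use a different algorithm (for example, non-metric MDS), so coordinates will not match the published figure. The demo matrix is hypothetical.

```python
import numpy as np

def classical_mds(distances, n_dims=2):
    """Classical (Torgerson) MDS of a symmetric distance matrix."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain an inner-product matrix.
    b = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_dims]      # keep the largest ones
    top_vals = np.clip(eigvals[order], 0.0, None)   # negative eigenvalues -> 0
    return eigvecs[:, order] * np.sqrt(top_vals)    # one row of coordinates per population

# Hypothetical 3-population distance matrix (not data from the study).
demo = [[0.0, 0.2, 0.9], [0.2, 0.0, 0.8], [0.9, 0.8, 0.0]]
print(classical_mds(demo).round(3))
```

In a plot like Fig. 3A, populations whose first-dimension coordinates fall far from the main cluster would correspond to the outer group described above.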

Fig. 3

Genetic distances between sheep populations on the basis of enJSRV insertion frequencies. (A) Multidimensional scaling (MDS) plot computed from the matrix of Nei’s unbiased genetic distances (TFPGA 1.3 software) (28). The dominant nature of the enJSRVs as genetic markers was considered in all analyses. The matrix of interpopulation distances was summarized in two dimensions by MDS analysis as implemented in STATISTICA ’99 software (StatSoft, Tulsa, Oklahoma, USA). Each triangle represents one of the 65 populations tested. Only the populations outside the main cluster (enclosed within the dashed square and including most breeds from Africa, Asia, and Europe) are named. (B) Three-dimensional plot summarizing data obtained by PCA of the insertionally polymorphic enJSRV proviruses in the 65 sheep populations tested, computed with Proc Factor of the statistical package SAS/STAT (SAS Institute, Cary, North Carolina, USA) according to the recommendations of Cavalli-Sforza et al. (29). Four factors with eigenvalues ≥ 1 were identified, together accounting for 86.66% of the variation. Factor 1 (x axis) explained 30.09% of the variation and can be interpreted as the “Northern Sea factor,” distinguishing a group formed by some United Kingdom and continental European sheep populations (including Denmark and Texel) from the others. Factor 2 (y axis) explained 23.58% of the variation, separating the Texel population from the rest. Factor 3 (z axis) explained 22.92% of the variation and can be interpreted as the “primitive breed factor,” distinguishing the group formed by the Mouflon and Scandinavian populations (including the Hebridean, Orkney, and Soay populations) from the rest. For clarity, the populations that form the main cluster are not named.
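The PCA in panel B works on the correlation matrix of locus insertion frequencies and retains factors with eigenvalue ≥ 1. The NumPy sketch below reproduces that logic as an unrotated principal-component extraction; it is not SAS Proc Factor, which may apply extraction or rotation options that change the loadings. It assumes every locus varies across populations (a fixed locus would make the standardization divide by zero), and the demo data are random, not from the study.

```python
import numpy as np

def pca_correlation(freqs):
    """PCA on the correlation matrix of locus insertion frequencies.

    freqs: array of shape (n_populations, n_loci).
    Returns eigenvalues (descending), percent variance explained, population
    scores, and the number of components with eigenvalue >= 1 (Kaiser rule).
    """
    x = np.asarray(freqs, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)  # standardize each locus
    corr = np.corrcoef(x, rowvar=False)               # loci x loci correlations
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    pct = 100.0 * eigvals / eigvals.sum()             # eigenvalues sum to n_loci
    scores = z @ eigvecs                              # population coordinates
    n_keep = int(np.sum(eigvals >= 1.0))
    return eigvals, pct, scores, n_keep

# Random stand-in data: 65 populations x 6 enJSRV loci.
rng = np.random.default_rng(1)
vals, pct, scores, n_keep = pca_correlation(rng.random((65, 6)))
print(n_keep, pct.round(2))
```

Because the correlation matrix has ones on its diagonal, its eigenvalues sum to the number of loci, so the eigenvalue ≥ 1 rule keeps components that explain more variance than a single standardized locus.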

Collectively, the data we obtained indicate that relicts of the first migrations are still present in the Mouflon of Sardinia, Corsica, and Cyprus and in breeds in peripheral north European areas. On the basis of their retrotypes, these primitive populations are characterized by the absence of enJSRV-18 (fixed in most of the modern breeds) and either the presence of enJSRV-7 in high frequency or the lack of insertionally polymorphic enJSRVs (including enJSRV-7). By contrast, the retrotypes of the great majority of sheep breeds cluster together and are characterized by the high frequency or fixation of enJSRV-18.

The homogeneous retrotypes (R2 only, or both R2 and R4) that we observed in the sheep of modern-day Turkey, Iran, Saudi Arabia, Syria, Israel, and Egypt, combined with available archaeological evidence, suggest that selection of domestic sheep with the desired secondary characteristics common to the modern breeds occurred first in Southwest Asia and then spread successfully into Europe, Africa, and the rest of Asia. This may provide genetic support to the theory that specialized wool production arose in Southwest Asia and then spread throughout Europe (11). The primitive breeds survived the second migrations of improved breeds from Southwest Asia by returning to a feral or semiferal state on islands without predators or by occupying inaccessible areas less prone to commercial exchanges and associated introgression. Most, if not all, of the breeds we identified as of ancient origin were already considered primitive on the basis of morphological traits such as darker and coarser hair (instead of a whiter woolly fleece), a moulting coat, and the frequent presence of horns in females as well as males (Fig. 4).

Fig. 4

Morphological characteristics of primitive breeds. Breeds identified in this study as remnants of the first sheep migrations possess morphological characteristics (such as darker, coarser fleece; moulting coat; frequent presence of horns in females) similar to those of wilder sheep and the Mouflon. (A) Urial sheep; (B) Cyprus Mouflon; (C) Mediterranean Mouflon; (D) Orkney sheep; (E) Soay sheep; (F) Gute sheep; (G) Åland sheep; (H) Icelandic sheep; and (I) Hebridean sheep.

Our study also provides genetic evidence supporting the anecdotal origin of some less common sheep breeds. For example, one of the 10 populations analyzed from the British Isles, the Jacob sheep, displayed a homogeneous R2 retrotype very different from that of the other British populations and more similar to that of the southwestern Asiatic and African breeds. The origins of the Jacob are unknown. This breed owes its name to the Biblical story of Jacob who took “every speckled and spotted sheep” as a wage from his father-in-law Laban (Genesis 30:25–43; probably the first recorded use of selective breeding in livestock). Our retrotype analysis supports a direct link between the Jacob sheep and breeds in Southwest Asia or Africa rather than other British breeds. Our study also firmly links the Soay sheep with the Mediterranean and Asiatic Mouflon.

In conclusion, the polymorphic nature of enJSRVs revealed a remarkable secondary population expansion of improved domestic sheep, most likely out of Southwest Asia, providing valuable insights into the history of pastoralist societies whose economy included sheep husbandry. By differentiating genetically primitive breeds from modern ones, our study offers a rationale for identifying and preserving rare gene pools. Finally, we demonstrate the utility of ERVs as a new class of genetic markers for unraveling the history of a domesticated species.

Supporting Online Material

Materials and Methods

SOM Text

Figs. S1 to S3

Tables S1 and S2


References and Notes

  1. Materials and Methods are available as supporting material on Science Online.
  2. M. P. Miller, Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ (1997).
  3. We thank the ECONOGENE, North-SheD, and NordGen consortia, and C. Leroux, J. DeMartini, J. Trafford, S. Haywood, V. Andresdottir, S. Thorgeirsdottir, M. Ganter, M. Reichert, Y. Bayon Gonzalez, K. Voigt, V. Feinstein, F. San Primitivo, and P. Halstead for help in obtaining some of the samples used in this study and for useful comments. We thank P. Murcia for coining the term “retrotypes.” We are grateful to G. P. Di Meo and A. Perucatti for fluorescence in situ hybridization analysis. We also thank B. Huffman, NordGen, A. Ozgul, S. Jeppson, and K. Headspeath for images reproduced in Fig. 4. This study was supported by the Biotechnology and Biological Sciences Research Council, the Wellcome Trust, and in part by NIH grant HD05274, the Scottish Funding Council through a Strategic Research Developmental Grant, Fundação para a Ciência e a Tecnologia, “Misura P5 Biodiversita’ animale,” from the Regione autonoma della Sardegna, the Chinese Academy of Sciences, the National Natural Science Foundation of China, and the European Regional Fund (Centre of Excellence FIBIR). The Soay sheep project in St. Kilda is supported by the Natural Environment Research Council. IPATIMUP is partially supported by “Programa Operacional Ciência e Inovação 2010” (POCI 2010), VI Programa Quadro (2002–2006). M.P. is a Wolfson–Royal Society Research Merit awardee. The GenBank accession numbers of the enJSRV loci described in this paper are as follows: EE680319 (enJSRV-6), EF680306 (enJSRV-8), EF680298 (enJSRV-7), EF680299 (enJSRV-15), EF680300 (enJSRV-16), EF680301 (enJSRV-18), and AF136224 (enJS5F16).