The Genetic Legacy of Paleolithic Homo sapiens sapiens in Extant Europeans: A Y Chromosome Perspective


Science  10 Nov 2000:
Vol. 290, Issue 5494, pp. 1155-1159
DOI: 10.1126/science.290.5494.1155


A genetic perspective of human history in Europe was derived from 22 binary markers of the nonrecombining Y chromosome (NRY). Ten lineages account for >95% of the 1007 European Y chromosomes studied. Geographic distribution and age estimates of alleles are compatible with two Paleolithic and one Neolithic migratory episode that have contributed to the modern European gene pool. A significant correlation between the NRY haplotype data and principal components based on 95 protein markers was observed, indicating the effectiveness of NRY binary polymorphisms in the characterization of human population composition and history.

Various types of evidence suggest that the present European population arose from the merging of local Paleolithic groups and Neolithic farmers arriving from the Near East after the invention of agriculture in the Fertile Crescent (1–5). However, the origin of Paleolithic European groups and their contribution to the present gene pool have been debated (6, 7). Assuming no selection, local differentiation occurred in isolated and small Paleolithic groups by drift (8, 9). Range expansions and population convergences, which occurred at the end of the Paleolithic, were catalyzed by improved climate and new technologies and spread the present genetic characteristics to surrounding areas (8). The smaller effective population size of the NRY enhances the consequences of drift and founder effect relative to the autosomes, making NRY variation a potentially sensitive index of population composition. Previously, the distribution of two NRY restriction fragment length polymorphism (RFLP) markers suggested Paleolithic and Neolithic contributions to the European gene pool (10). NRY binary markers (11) representing unique mutational events in human history allow a more comprehensive reconstruction of European genetic history.

Twenty-two relevant binary markers [4 gathered from the literature and 18 detected by denaturing high-performance liquid chromatography (DHPLC) (12)] were genotyped in 1007 Y chromosomes from 25 different European and Middle Eastern geographic regions. More than 95% of the samples studied could be assigned to haplotypes or clades of haplotypes defined by just 10 key mutations (Fig. 1 and Table 1). The frequency distribution of Y chromosome haplotypes revealed here defines the basic structure of the male component of the extant European populations and provides testimony to population history, including the Paleolithic period. Two lineages (those characterized by M173 and M170) appear to have been present in Europe since Paleolithic times. The remaining lineages entered Europe most likely later during independent migrations from the Middle East and the Urals as they are found at higher frequencies and with more variation of linked microsatellites than in other continents (10–14).

Figure 1

(Top) Maximum parsimony phylogeny of the NRY markers found in Europe and elsewhere. YAP (32), TAT (14), RPS4 [= RPS4YC711T (33)], and 4064 [= SRY4064 (34)] were previously described. The remaining polymorphisms were identified with DHPLC (11, 12, 27) and are deposited in the National Center for Biotechnology Information (NCBI) dbSNP database. The phylogeny is rooted with the use of great ape sequences. (Bottom) The 19 haplotypes observed (Table 1) were pooled into six classes represented by different colors: Yellow indicates haplotype Eu4; blue includes Eu7 and Eu8, which both involve the M170 mutation; red groups three separate haplotypes for reasons explained in the text; pink includes haplotypes Eu13 and Eu14, which both involve the TAT mutation; and green indicates Eu18 and purple indicates Eu19, which despite sharing the M173 mutation are distinguished because they represent a distinct dichotomy in European phylogeography. The other nine observed haplotypes, which catalog the remaining <5% of the total samples, are shown as black dashed lines and are represented in the white sector of relevant pie charts. Three haplotypes, Eu2, Eu5, and Eu21, were not detected. The pie sectors are proportional to the relative frequencies of haplotypes or clades in each population. The two Basque samples have been pooled.

Table 1

Frequencies (in percent) of the haplotypes found in the examined European populations.

Of the 22 haplotypes that constitute the phylogeny in Fig. 1 (top), Eu18 and Eu19 characterize about 50% of the European Y chromosomes. Although they share M173, the two haplotypes show contrasting geographic distribution. The frequency of Eu18 decreases from west to east, being most frequent in Basques (Fig. 1, bottom, and Table 1). This lineage includes the previously described proto-European lineage that is characterized by the 49a,f haplotype 15 (10). In contrast, haplotype Eu19, which is derived from the M173 lineage and is distinguished by M17, is virtually absent in Western Europe. Its frequency increases eastward and reaches a maximum in Poland, Hungary, and Ukraine, where Eu18 in turn is virtually absent. Both haplotypes Eu18 and Eu19 share the derived M45 allele. The lineage characterized by M3, common in Native Americans (12) and a few Siberian populations (15), is also a derivative of M45. This observation suggests that M173 is an ancient Eurasiatic marker that was brought by or arose in the group of Homo sapiens sapiens who entered Europe and diffused from east to west about 40,000 to 35,000 years ago (16, 17), spreading the Aurignac culture. This culture also appeared almost simultaneously in Siberia (17), from which some groups eventually migrated to the Americas.

We interpret the differentiation and the distribution of haplotypes Eu18 and Eu19 as signatures of expansions from isolated population nuclei in the Iberian peninsula and the present Ukraine, following the Last Glacial Maximum (LGM). In fact, during this glacial period (20,000 to 13,000 years ago), human groups were forced to vacate Central Europe, with the exception of a refuge in the northern Balkans (16). Similar discrete patterns of the flora and fauna in Europe have been attributed to glaciation-modulated isolation followed by dispersal from climatic sanctuaries (18). This scenario is also supported by the finding that the maximum variation for microsatellites linked to Eu19 is found in Ukraine (19). In turn, the maximum variation for microsatellites linked to 49a,f Ht15 and its derivatives (and then to the Eu18 lineage) is in the Iberian peninsula (19). This is consistent with the diffusion of M173-marked Eu18 from its refuge after the LGM, in agreement with mitochondrial DNA (mtDNA) haplogroup V and some of the H lineages (20). Haplotype Eu19 has also been observed at substantial frequency in northern India and Pakistan (12) as well as in Central Asia (12). Its spread may have been magnified by the expansion of the Yamnaia culture from the “Kurgan culture” area (present-day southern Ukraine) into Europe and eastward, resulting in the spread of the Indo-European language (21). An alternative hypothesis of a Middle Eastern origin of Indo-European languages was proposed on the basis of archaeological data (3).

We estimated the age of M173 by using the variation of three microsatellites, namely DYS19, YCAIIa, and YCAIIb (22). Although an estimate of ∼30,000 years for M173 must be interpreted cautiously (23), it is consistent with our hypothesis that M173 marks the Aurignac settlement in Europe or, at least, predates the LGM.
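As a rough illustration of how microsatellite variation can be converted into an age, the sketch below assumes a symmetric single-step mutation model and a star-like genealogy, under which the average squared difference (ASD) in repeat number from the founding allele is expected to grow as the mutation rate multiplied by the number of generations. This is not necessarily the calculation used in (22); the repeat counts, mutation rate, generation time, and function names are all hypothetical.

```python
def mean_squared_distance(repeat_counts, ancestral_repeats):
    """Average squared difference in repeat number from the assumed founder allele."""
    return sum((c - ancestral_repeats) ** 2 for c in repeat_counts) / len(repeat_counts)


def estimate_age_years(loci, mutation_rate, generation_years):
    """Estimate lineage age as (mean ASD across loci) / mutation rate, in years.

    loci: list of (repeat_counts, ancestral_repeats) pairs, one per microsatellite.
    mutation_rate: assumed mutations per locus per generation (hypothetical value).
    generation_years: assumed length of one generation in years (hypothetical value).
    """
    asd_per_locus = [mean_squared_distance(counts, anc) for counts, anc in loci]
    mean_asd = sum(asd_per_locus) / len(asd_per_locus)
    generations = mean_asd / mutation_rate
    return generations * generation_years


# Hypothetical repeat counts at three loci; these are NOT data from the study.
example_loci = [
    ([14, 14, 15, 13, 14, 16], 14),
    ([19, 19, 20, 19, 18, 19], 19),
    ([22, 23, 22, 21, 22, 22], 22),
]
age = estimate_age_years(example_loci, mutation_rate=0.001, generation_years=25)
print(round(age))
```

Under these assumptions the estimate scales inversely with the assumed mutation rate, which is one reason such ages call for cautious interpretation.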

The polymorphism M170 represents another putative Paleolithic mutation whose age has been estimated to be ∼22,000 years (22, 23). With the exception of idiosyncratic distributions indicative of recent gene flow, M170 is confined to Europe (Eu7). The mutation is most frequent in central Eastern Europe and also occurs in Basques and Sardinians that have accumulated a subsequent mutation (M26) that distinguishes Eu8. The closest phylogenetic predecessor is the M89 mutation, from which the most important Middle Eastern lineages originated. We propose that M170 originated in Europe in descendants of men that arrived from the Middle East 20,000 to 25,000 years ago, who have been associated with the Gravettian culture (16). This migration may have coincided with that of mtDNA haplogroup H to Europe. It has been suggested that Gravettian and Aurignac groups coexisted for a few thousand years, maintaining their identities despite occasional contacts. During the LGM, Western Europe was isolated from Central Europe, where an Epi-Gravettian culture persisted in the area of present-day Austria, the Czech Republic, and the northern Balkans (16). After climatic improvement, this culture spread north and east (16). This finding is supported by the present Eu7 haplotype distribution. In this scenario, haplotype Eu8 would have originated in the western Paleolithic population during the LGM, as local differentiation of the M170 lineage. The frequency and the distribution of haplogroup H across Europe support gene flow between Gravettian and Western European Aurignac groups and suggest differential gender migratory phenomena (24).

The cline of frequencies for haplotypes marked by M35 (Eu4), M172 (Eu9), M89 (Eu10), and M201 (Eu11) decreases from the Middle East into Europe. Haplotype Eu4 is phylogenetically distinct from the other three and defines most European YAP+ chromosomes. The Eu4 haplotype appears to correspond to the previously reported Ht-4, defined by the absence of M2 (25). Comparative genotyping with the Y chromosome RFLPs 49a,f and 12f2 [(10) and citations therein] revealed that Eu9 and Eu10 share the 12f2-derived 8Kb allele, whereas Eu11 has the ancestral 12f2-10Kb allele. Haplotypes Eu9, Eu10, and Eu11 share the 49a,f haplotype 8 or its derivatives, which are not observed in any of the other 16 Eu haplotypes (19), suggesting a shared common ancestry. Thus, we have displayed the combined frequencies of haplotypes Eu9, Eu10, and Eu11 in Fig. 1. By correlation between Ht-4 ≈ Eu4 and 12f2-8Kb ≈ Eu9 and Eu10, the origin of these lineages has been estimated to be about 15,000 to 20,000 years ago (13). A similar date (17,000 years ago) for Eu11 has been estimated (22, 23). The molecular age of a mutation and its corresponding haplotype must predate the demographic migratory event it marks. The age estimates of these haplotypes, especially considering their approximation (22, 23), cannot distinguish whether they came to Europe before or after the LGM. However, the decreasing clinal pattern of haplotypes Eu4, Eu9, Eu10, and Eu11 from the Middle East to Europe would not be compatible with the localization of peoples carrying these Y chromosomes to refuges during the LGM. If these haplotypes were present in Europe before the LGM, we would expect to see a differentiation between the European and Middle Eastern lineages because of temporal and spatial isolation.
Unpublished data from a 49a,f system and seven short tandem repeats (STRs) in a large sample of these NRY haplotypes from Europe and the Middle East (19) have revealed that almost all the compound haplotypes observed in Europe were included in the smaller sample of the Middle East (19). A similar result was observed for mtDNA haplogroup J, which, although considered Paleolithic, is believed to have been introduced to Europe during the Neolithic (6). These observations suggest that the four NRY haplotypes, as well as mtDNA haplogroup J, had sufficient time to differentiate in the Middle East and then migrate toward Europe in sufficiently large numbers to account for most of the existing variation. Therefore, haplotypes Eu4, Eu9, Eu10, and Eu11 represent the male contribution of a demic diffusion of farmers from the Middle East to Europe. The contribution of the Neolithic farmers to the European gene pool seems to be more pronounced along the Mediterranean coast than in Central Europe. This is evident from Fig. 2, in which we have plotted the frequencies of haplotypes Eu4, Eu9, Eu10, and Eu11 against the geographic distances from the Middle East for each population. The regression line accounting for Mediterranean populations has a slope that is significantly different from the other populations, indicating that the diffusion of Neolithic farmers affected Southern more than Central Europe.

Figure 2

Abscissa: distances in thousands of kilometers of each population from the average of the two Middle Eastern populations (Lebanese and Syrians). Ordinate: logarithm of relative frequencies of Neolithic markers (sum of Eu4, Eu9, Eu10, and Eu11) in the Mediterranean and non-Mediterranean populations. The Middle Eastern point (X = 0) was considered for both series of points. The two regression lines are significantly different (P < 0.01).
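The comparison plotted in Fig. 2 can be sketched as two least-squares fits of log relative frequency against distance, with the Middle Eastern point included in both series. The sketch below is illustrative only: the data points are hypothetical, the natural logarithm is an assumption (the caption does not state the base), and no significance test of the slope difference is attempted.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def log_frequency_slope(points):
    """Slope of log(relative frequency) against distance for (distance, frequency) pairs."""
    xs = [d for d, _ in points]
    ys = [math.log(f) for _, f in points]
    return fit_line(xs, ys)[0]


# Hypothetical (distance in thousands of km, relative frequency) pairs, not the study's data.
middle_east = (0.0, 0.60)  # shared X = 0 point, used in both series
mediterranean = [middle_east, (1.0, 0.45), (2.0, 0.35), (3.0, 0.25)]
non_mediterranean = [middle_east, (1.5, 0.25), (2.5, 0.10), (3.5, 0.04)]

slope_med = log_frequency_slope(mediterranean)
slope_other = log_frequency_slope(non_mediterranean)
print(f"Mediterranean slope: {slope_med:.3f}")
print(f"Non-Mediterranean slope: {slope_other:.3f}")
```

With these made-up values both slopes are negative and the Mediterranean decline is the shallower of the two, which is the qualitative pattern the figure describes.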

While allelotyping M35 by DHPLC, we found a previously unknown mutation, M178, in 95% of all TAT chromosomes. The latter has been reported to be ∼4000 years old and marks a recent Uralic migration confined to Northern Europe (14). Neither TAT nor M178 was detected in Hungary, where a Uralic language is spoken.
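The marker-to-haplotype relationships described in the text so far can be summarized as an ordered lookup. This is a simplified sketch: it covers only the markers named in the text, it does not separate Eu13 from Eu14, and the rule table and function name are hypothetical conveniences rather than part of the study's genotyping procedure.

```python
# Ordered from most specific to least specific, following the relationships
# stated in the text. Simplification: markers not named in the text are ignored.
HAPLOTYPE_RULES = [
    ("M17", "Eu19"),       # M17 arises on the M173 background
    ("M173", "Eu18"),      # M173 without M17
    ("M26", "Eu8"),        # M26 arises on the M170 background
    ("M170", "Eu7"),       # M170 without M26
    ("TAT", "Eu13/Eu14"),  # both TAT haplotypes; not separated here
    ("M35", "Eu4"),
    ("M172", "Eu9"),
    ("M201", "Eu11"),
    ("M89", "Eu10"),       # M89 with none of the markers above
]


def assign_haplotype(derived_markers):
    """Return the haplotype label for a set of markers carrying the derived allele."""
    derived = set(derived_markers)
    for marker, label in HAPLOTYPE_RULES:
        if marker in derived:
            return label
    return "other"


print(assign_haplotype({"M45", "M173", "M17"}))  # prints Eu19
print(assign_haplotype({"M89", "M170"}))         # prints Eu7
```

Checking the most derived markers first means a chromosome that also carries an upstream marker (for example M89 together with M170) is assigned to the more specific haplotype.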

The first two principal components (PC) derived from the data in Table 1 are shown in Fig. 3. The Udmurts, Mari, and Saami were excluded because they monopolized the first PC and compressed the rest of the variation because of their high TAT/M178 frequency. In the plot, it is possible to see three clusters of distinct geography and culture. The first comprises Basques and Western Europeans, the second Middle Eastern, and the third Eastern European populations from Croatia, Ukraine, Hungary, and Poland. These three geographic clusters correspond to the major glacial refuges and to the region of origin of the farmers' expansion.

Figure 3

PC analysis of data in Table 1. The first PC accounts for 46.24% of the variance, whereas the second accounts for 34.69%.
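For readers unfamiliar with how the share of variance attributed to each PC arises, the following sketch computes the two leading principal components of a small haplotype frequency table and the fraction of total variance each explains. The frequency values are hypothetical, and power iteration with deflation is used here only to keep the example dependency-free; it is not claimed to be the procedure used for Fig. 3.

```python
import math


def covariance_matrix(rows):
    """Sample covariance between columns (haplotype classes) across rows (populations)."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(m)]
            for a in range(m)]


def mat_vec(matrix, vector):
    """Multiply a square matrix by a vector."""
    return [sum(row[k] * vector[k] for k in range(len(vector))) for row in matrix]


def leading_components(cov, n_components=2, iterations=1000):
    """Top eigenpairs of a symmetric matrix by power iteration with deflation."""
    m = len(cov)
    work = [row[:] for row in cov]
    pairs = []
    for _ in range(n_components):
        v = [1.0 / (k + 1) for k in range(m)]  # fixed, arbitrary starting vector
        for _ in range(iterations):
            w = mat_vec(work, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[k] * x for k, x in enumerate(mat_vec(work, v)))
        pairs.append((eigenvalue, v))
        # Deflate: remove the component just found before searching for the next.
        work = [[work[a][b] - eigenvalue * v[a] * v[b] for b in range(m)]
                for a in range(m)]
    return pairs


# Hypothetical frequencies (percent) of three haplotype classes in six populations.
frequencies = [
    [80, 2, 10], [60, 5, 20], [10, 55, 15],
    [5, 50, 25], [15, 5, 60], [20, 10, 50],
]
cov = covariance_matrix(frequencies)
total_variance = sum(cov[k][k] for k in range(len(cov)))
components = leading_components(cov)
explained = [eigenvalue / total_variance for eigenvalue, _ in components]
print([f"{share:.1%}" for share in explained])
```

Each eigenvalue divided by the total variance gives the proportion of variance carried by that component, which is the quantity reported in the caption for the first and second PC.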

The most comprehensive previous survey of the European gene pool has been the PC analysis of 95 autosomal protein polymorphisms (5, 8). We compared the frequency distribution of the major Eu Y chromosome haplotypes with the first three PCs of Europe (Table 2). Sardinians were not included in the original PC analysis because of their pronounced outlier phylogenetic status (5), so they were also excluded from our correlation analysis. The first PC, which was proposed to reflect the diffusion of Neolithic farmers (5, 8), correlates with Eu4, Eu9, Eu10, and Eu11. The second PC, whose meaning has never been fully assessed (5, 8), is correlated with the spread of Eu18 from Spain toward Central Europe and, on the opposite pole, with the spread of Uralic TAT/M178 (Eu13 and Eu14). The third PC, the meaning of which has been debated (3, 5, 8), correlates with the M17 mutation (Eu19). The concordance of protein-based PC and NRY data suggests that migration, more than natural selection, has influenced the pattern of NRY variation observed.

Table 2

Correlation between the first three PCs based on autosomal protein markers (5) and the frequency of the major European Y chromosome haplotypes.
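A correlation of this kind can be sketched as a coefficient computed across populations between a PC score and a haplotype frequency. The example below assumes a Pearson coefficient, which the text does not specify, and uses hypothetical values rather than entries from Table 2.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-population values, not taken from Table 2.
pc1_scores = [1.8, 1.1, 0.4, -0.3, -0.9, -1.5]
neolithic_freq = [0.55, 0.40, 0.30, 0.18, 0.12, 0.05]
print(f"r = {pearson_r(pc1_scores, neolithic_freq):.2f}")
```

A coefficient near +1 or -1 would indicate that the haplotype frequency tracks the PC closely across populations; the sign alone depends on the arbitrary orientation of the PC axis.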


Analyses of mtDNA sequence variation in European populations have been conducted (6, 20). These data suggest that the gene pool has ∼80% Paleolithic and ∼20% Neolithic ancestry. Our data support this observation because haplotypes Eu4, Eu9, Eu10, and Eu11 account for ∼22% of European Y chromosomes. Thus, the mtDNA and Y data corroborate the previous observation that the first PC of the 95 classical polymorphisms accounts for ∼28% of the overall genetic variation (5, 6). However, some differences exist between the mtDNA and Y data pertaining to the putative Paleolithic components. It has been proposed that mtDNA haplogroup U5 arrived from the Middle East 45,000 years ago (6, 26). We did not detect any corresponding Y haplotypes. Furthermore, most European mtDNA lineages, which account for 60 to 70% of the variation in Europe, have been interpreted as having arrived from the Middle East during the Paleolithic about 25,000 years ago (6). Correspondingly, ∼20% of contemporary Y lineages characterized by the M170 mutation derive from deep phylogenetic M89 ancestry, consistent with a Middle Eastern Paleolithic heritage. Moreover, the remaining ∼50% of Y lineages associated with the M173 mutation indicate a major influence on the extant gene pool from Central Asia ∼30,000 years ago. In contrast, Central Asian mtDNA 16223/C haplogroups (I, X, and W) account for only ∼7% of the contemporary composition (26). These discrepancies may be due in part to the apparently more recent molecular age of Y chromosomes relative to other loci (27), suggesting more rapid replacement of previous Y chromosomes. Gender-based differential migratory demographic behaviors will also influence the observed patterns of mtDNA and Y variation (24).

The Sardinians, Basques, and Saami, previously categorized as outliers (5), share basically the same Y binary components as the other Europeans. Their peculiar position with respect to frequency is probably a consequence of genetic drift and isolation. In addition, our analysis highlights the expansion of the Epi-Gravettian population from the northern Balkans.

Almost all of the European Y chromosomes analyzed in the present study belong to 10 lineages characterized by simple biallelic mutations. Furthermore, a substantial portion of the European gene pool appears to be of Upper Paleolithic origin, but it was relocated after the end of the LGM, when most of Europe was repopulated (16).

  • * To whom correspondence should be addressed. E-mail: semino{at}

  • These authors contributed equally to this work.

