Genomic structure in Europeans dating back at least 36,200 years


Science  28 Nov 2014:
Vol. 346, Issue 6213, pp. 1113-1118
DOI: 10.1126/science.aaa0114


The origin of contemporary Europeans remains contentious. We obtained a genome sequence from Kostenki 14 in European Russia dating from 38,700 to 36,200 years ago, one of the oldest fossils of anatomically modern humans from Europe. We find that Kostenki 14 shares a close ancestry with the 24,000-year-old Mal’ta boy from central Siberia, European Mesolithic hunter-gatherers, some contemporary western Siberians, and many Europeans, but not eastern Asians. Additionally, the Kostenki 14 genome shows evidence of shared ancestry with a population basal to all Eurasians that also relates to later European Neolithic farmers. We find that Kostenki 14 contains more Neandertal DNA than present Europeans and that this DNA is contained in longer tracts. Our findings reveal the timing of divergence of western Eurasians and East Asians to be more than 36,200 years ago and that European genomic structure today dates back to the Upper Paleolithic and derives from a metapopulation that at times stretched from Europe to central Asia.

Secrets of human ancestor evolution revealed

Studies of ancient humans help us understand the movement and evolution of modern populations of humans. Seguin-Orlando et al. present the genome of an ancient individual, K14, from northern Russia who lived over 36,000 years ago. K14 is more similar to west Eurasians and Europeans than to east Asians, indicating that these populations had already diverged.

Science, this issue p. 1113

The ancestors of contemporary Eurasians are believed to have left Africa some 60,000 to 50,000 years ago (60 to 50 ka) (1, 2), possibly 30,000 to 40,000 years later than Australo-Melanesian ancestors (3). Despite controversies about routes out of Africa, the first Upper Paleolithic (UP) industries of Eurasia are found in the Levant from ~48 ka (4, 5). Expansion into Europe took place through multiple events that by ~40 ka had generated a spatially and culturally structured anatomically modern human (AMH) population, from Russia (6) to Georgia (7), Bulgaria (8), southern Europe (9, 10), and the United Kingdom (11). The few AMH fossils associated with these initial UP industries are morphologically variable (9, 12–17). In western Eurasia, the distinctive Aurignacian toolkit, first observed at Willendorf (Austria) by 43.5 ka (18), became predominant across the earlier range by 39 ka. Although analyses of ancient human genomes have advanced our understanding of the European past, revealing contributions from Paleolithic Siberians, European Mesolithic, and Near Eastern Neolithic groups to the European gene pool (19–23), the possible contribution of the earliest Eurasians to these later cultures and to contemporary human populations remains unknown. To investigate this, we sequenced the genome of Kostenki 14 (K14, Markina Gora) (Fig. 1A).

Fig. 1 Sampling locations and genomic affinities of K14 and other ancient genomes.

(A) Location of Kostenki and the samples analyzed in this study. Kostenki (K14) is shown in red, while comparative ancient samples are shown in blue. (B) Admixture proportions for the ancient genomes, assuming nine ancestral components for a clustering analysis in a set of modern worldwide populations. We labeled the components according to the modern populations in which they are maximized for all but one case: The yellow component that we label HG is maximized in eastern Europeans. NEOL: Neolithic farmers. (C) Shared drift between K14 and a set of worldwide populations. For every modern population X on the map, we compute f3(Mbuti Pygmy; K14, X). The warmer colors indicate increased shared ancestry. (D) Shared drift between K14 and a set of European populations. This figure is a close-up of (C).

The locality of Kostenki-Borshchevo on the Middle Don River, Russia, has one of the most extensive Paleolithic records in eastern Europe. The K14 human skeleton was excavated in 1954 (24) and was recently dated to 33,250 ± 500 radiocarbon years before the present (B.P.) (25), 38.7 to 36.2 thousand calendar years B.P. (ky cal B.P.), in agreement with the stratigraphic position of the burial that cuts into the Campanian Ignimbrite ash layer dated to ~39.3 ky cal B.P. (26). Below the skeleton, there is a distinctive early UP industry, with end scrapers, burins, prismatic cores, and bone artifacts (layer IV); the cultural layer above (layer III) has a regionally local character (27, 28) [supplementary materials (SM) S1 and S2].

We performed 13 DNA extractions from a total of 1.285 g of the left tibia (dorsal side of the shaft), using two extraction methods based on silica purification (29, 30). We first constructed seven Illumina libraries and validated the presence of typical signatures of postmortem DNA damage, using a fraction of DNA extracts (SM S3). The remaining extracts were built into 63 libraries after enzymatic uracil-specific excision reagent (USER) treatment to limit the effect of nucleotide misincorporations in downstream analyses (31) (table S2). Additionally, a limited fraction of two DNA extracts was purified for methylated DNA fragments using methyl binding domain (MBD) enrichment (32) before USER treatment and library building, for a total of eight DNA libraries. Following stringent quality criteria for read alignment, we identified a total number of 175.2 million unique reads aligning against the human reference genome hg19, representing an average depth of coverage of 2.84X (SM S4). The eight USER-treated DNA libraries that exhibited limited error rates and contamination levels were selected for further analyses. This restricted the data set to 148.9 million unique reads, representing a final depth of coverage of 2.42X. We exploited the fact that K14 was a male and used the heterozygosity levels present in the X chromosome to estimate overall levels of contamination around 2.0% (SM S5 and S6 and table S5). The population genetics analyses results are robust to contamination of that level. In particular, we replicated the main analyses with selected libraries with varying contamination levels and observed no qualitative effect on the results (see SM S9 for details).

Mitochondrial analyses confirmed the sequence previously reported for K14 [haplogroup U2 (33)], which supports data authenticity. The Y chromosome belongs to haplogroup C M130, the same as in La Braña—a late Mesolithic hunter-gatherer (MHG) from northern Spain (22) (SM S7).

To identify patterns of shared ancestry and admixture among K14, other ancient genomes and contemporary Eurasians [based on a single-nucleotide polymorphism (SNP) array panel of 2091 individuals from 167 populations], we carried out a series of analyses—model-based clustering and principal component analysis (PCA)—to show the contribution of diverse genetic components within K14: D statistics to explore the affinity of K14 to pairs of populations (using Mbuti Pygmy as an outgroup); f4 statistics to test whether a given modern population is equidistant to an ancient individual and a particular recent group (here, Sardinians), given an outgroup (here, Papuans); and f3 statistics to explore both patterns of admixture (“admixture” f3) and shared ancestry (“outgroup” f3). Key results were also replicated using two whole-genome sequencing data sets of modern individuals from worldwide populations (23, 34).
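
As a rough illustration of the two uses of f3 described above, the following sketch computes a simplified f3 from per-SNP allele frequencies. It is not the pipeline used in the study: the function name `f3`, the toy frequencies, and the omission of the finite-sample correction applied by real tools are all assumptions made for the example.

```python
def f3(target, a, b):
    """Simplified f3(target; a, b): average over SNPs of (t - a) * (t - b).

    Inputs are lists of per-SNP allele frequencies in [0, 1]. The
    finite-sample bias correction used by real tools is omitted (assumption).
    """
    if not (len(target) == len(a) == len(b)) or not target:
        raise ValueError("need non-empty frequency lists of equal length")
    products = [(t - x) * (t - y) for t, x, y in zip(target, a, b)]
    return sum(products) / len(products)


# Toy, made-up allele frequencies at three SNPs (not data from the study).
outgroup = [0.1, 0.5, 0.9]
ancient = [0.6, 0.2, 0.4]
pop_near = [0.5, 0.3, 0.5]  # drifted away from the outgroup like `ancient`
pop_far = [0.1, 0.5, 0.9]   # identical to the outgroup: no shared drift

# "Outgroup" f3: a larger value means more drift shared by the two populations.
shared_near = f3(outgroup, ancient, pop_near)
shared_far = f3(outgroup, ancient, pop_far)

# "Admixture" f3: a target lying between two sources yields a negative value.
source_a = [0.2, 0.8, 0.4]
source_b = [0.6, 0.2, 0.8]
mixed = [(p + q) / 2 for p, q in zip(source_a, source_b)]
admixture_signal = f3(mixed, source_a, source_b)
```

With these toy inputs, `shared_near` exceeds `shared_far`, and `admixture_signal` is negative, which mirrors how the two forms of the statistic are read in the text.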

Model-based clustering analyses (35) show that K14 has different genetic components of substantial size (Fig. 1B and SM S10), suggesting the sharing of sets of alleles with different Eurasian groups. The largest fraction of K14’s ancestry derives from a component that is maximized in European MHGs and also predominant in contemporary northern and eastern Europeans. The genetic affinity of K14 to contemporary Europeans is also observed using outgroup f3 statistics (36). Using Mbuti Pygmy as an outgroup, we find that among a panel of 167 contemporary populations, Europeans have the greatest affinity (i.e., the largest f3) to K14 (Fig. 1C). This conclusion is also formally supported by comparing pairs of populations to K14 using the D statistics of the form D(Mbuti Pygmy, K14; Population 1, Population 2). This statistic is expected to be equal to zero if K14 is symmetrically related to Population 1 and Population 2, whereas its expectation is negative (positive) if K14 is more closely related to Population 1 (Population 2). For pairs of populations involving East Asians (Population 1) and Europeans (Population 2), K14 is always significantly more closely related to Europeans [e.g., Z = 12.1 (Han and Lithuanians)], in all data sets analyzed (SM S9 and table S7). We also confirmed that these results are robust to possible contamination from a modern DNA source by filtering for reads with a high likelihood of ancient DNA using a model-based approach (37), as well as calculating contamination-corrected D statistics (23) (SM S9 and fig. S18).
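
The sign convention of the D statistic described above can be sketched with a common allele-frequency form of the statistic. This is a minimal illustration under stated assumptions, not the study's implementation: the function name `d_statistic` and the four-SNP toy frequencies are hypothetical, and no standard error is computed here.

```python
def d_statistic(w, x, y, z):
    """D(w, x; y, z) from per-SNP allele frequencies.

    Numerator:   sum of (w - x) * (y - z)
    Denominator: sum of (w + x - 2wx) * (y + z - 2yz)
    With w as the outgroup, D > 0 when x shares more alleles with z,
    and D < 0 when x shares more alleles with y.
    """
    if not (len(w) == len(x) == len(y) == len(z)):
        raise ValueError("frequency lists must have equal length")
    numerator = sum((wi - xi) * (yi - zi) for wi, xi, yi, zi in zip(w, x, y, z))
    denominator = sum(
        (wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
        for wi, xi, yi, zi in zip(w, x, y, z)
    )
    if denominator == 0:
        raise ValueError("no informative sites")
    return numerator / denominator


# Toy, made-up frequencies at four SNPs (not data from the study).
outgroup = [0.0, 1.0, 0.0, 1.0]
ancient = [1.0, 0.0, 0.0, 1.0]
population_1 = [0.0, 1.0, 0.0, 1.0]  # matches the outgroup at every site
population_2 = [1.0, 0.0, 0.0, 1.0]  # matches the ancient sample at every site

d_value = d_statistic(outgroup, ancient, population_1, population_2)
d_swapped = d_statistic(outgroup, ancient, population_2, population_1)
```

Here `d_value` is positive because the toy ancient sample shares its alleles with `population_2`, and swapping the last two arguments flips the sign, matching the negative (positive) reading given in the text.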

Within Europe, northern Europeans show the closest affinity to K14, based both on the f3 (Fig. 1D) and D statistics [e.g., Z = 6.7 for Sardinians and Lithuanians; table S7 and fig. S16]. This pattern closely resembles that of European MHGs (La Braña, Ajv58, Loschbour, and Motala) and Mal’ta (MA1) (figs. S14 and S15), with the exception of the latter’s strong genetic affinity with Native Americans, which is unique to that individual. Furthermore, a direct comparison to ancient genomes in the outgroup f3 statistics shows that K14 has a higher affinity with MHGs (Loschbour and La Braña) than any other ancient individual or contemporary population (fig. S14). Together with the rare Y chromosome lineage shared with La Braña, these results provide strong evidence of shared ancestry and extensive gene flow between UP West Eurasian people related to K14 and European MHGs and their contemporary European descendants.

An interpretation of the above results would be that K14 is an early member of a lineage leading to western Eurasian MHGs after their split from the proposed ancestral northern Eurasian lineage, including MA1. However, D statistics of the form D(Mbuti Pygmy, Modern; Ancient, K14), which test whether K14 and an ancient individual form a clade with respect to a modern population, reject this simple tree-like relationship. We find that all contemporary non-Africans, except Australo-Melanesians, are closer to either MA1 or MHGs than to K14 [e.g., Z = –5.3 for D (Mbuti and Han; Loschbour and K14); SM S9, table S10, and fig. S19]. This would suggest a basal position of K14 with respect to MHGs and ancient north Eurasians, which is also shown in admixture graphs using TreeMix (SM S12 and figs. S24 and S25). In addition, a sizeable component of K14’s ancestry observed in the model-based clustering analyses is predominant in contemporary Middle Eastern/Caucasus (ME/C) populations and Neolithic ancient genomes (NEOL) (Gok2, Iceman, and Stuttgart) but absent in MA1 or MHGs (Fig. 1B and fig. S20). This component has been associated with a suggested “basal Eurasian” lineage contributing to NEOL to explain an observed increase in allele sharing between MHGs/MA1 and East Asians compared with NEOL (21). Because K14 shows the same pattern as NEOL, a parsimonious explanation would be that K14 also derives some ancestry from a related basal Eurasian lineage. Consistent with this hypothesis, we find that East Asians are equally distant to NEOL and K14, using D statistics as described above [e.g., Z = 0.0 for D (Mbuti, Han; Stuttgart, K14); tables S10 and S11]. This suggests that the main ancestral components proposed for contemporary Europeans, including the Middle Eastern component commonly attributed to the expansion of early farmers within Europe, were likely already genetically differentiated and related through complex gene flow by the time of K14, at least 36.2 ka (Fig. 2).

Fig. 2 Relationships of the K14 sample and MA1, MHG, NEOL, modern Europeans, and the modern populations in the Yenisei region.

This representation is a possible topology consistent with the results presented in this study in the context of the relationships described by Lazaridis et al. (21) for the modern European populations and Raghavan et al. (23) for MA1. Present-day populations are colored in blue, ancient populations in red, and ancestral populations in green. Solid lines represent descent without admixture events, and dashed lines show admixture events. Arrows do not depart from ancient samples (K14 and MA1) because they represent relationships of population ancestry. We only show the topology of the potential population tree: There is no notion of time in this representation. The tree is not the result of a model-fitting procedure but rather a possible topology consistent with the key results (A, B, and C) of this study.

We further investigated the relationship of K14 and the other ancient genomes to East Asian and Siberian populations using f4 statistics f4(Sardinian, Ancient; Modern, Papuan), which measure whether a modern population shares more alleles with contemporary Europeans or an ancient genome. We find that all Siberian and East Asians are equally distant from western MHGs (all |Z| < 1.9) (Fig. 3D and table S12), supporting the postulated early split between East Asians and western Eurasians. In contrast to MHGs and MA1, all Siberian populations are genetically closer to contemporary Europeans (Sardinians) than to K14 (3.1 < |Z| < 9.9) (table S12), particularly those from the Yenisei and Ob’ basins (e.g., Shors, Z = 8.0) (Fig. 3A). Furthermore, these populations derive parts of their ancestry from a European hunter-gatherer (HG) component inferred in the clustering analysis (Fig. 1D and fig. S20), with populations showing a higher HG ancestry proportion also being closer to contemporary Europeans, using the f4 statistic (Spearman ρ = 0.96; P = 3.0 × 10^−18) (Fig. 3D and table S13). Notably, the opposite pattern is observed with Scandinavian MHGs (Ajv58 and Motala), where the same populations tend to share more alleles with MHGs than contemporary Europeans and the HG component is negatively correlated with f4 (e.g., Motala ρ = –0.85; P = 6.2 × 10^−10) (Fig. 3, C and D). Calculating admixture f3 statistics, we find significant evidence for admixture in those populations, with a variety of Siberian and European source populations. The best pair of source populations (i.e., the most negative f3 statistic) involves Swedish MHGs (Motala) and Evens (a northeast Siberian population) [e.g., f3 (Shors; Evens, Motala) = –0.012; Z = –9.1] (table S14).
Altogether, these results suggest that contemporary Siberian populations from the Yenisei basin derive part of their gene pool from a Eurasian HG population that shares ancestry with K14 but is more closely related to Scandinavian MHGs than to either MA1 or western European MHGs, indicating gene flow between their ancestors and Scandinavian Europe after K14 but before the Mesolithic (between 36.2 and 7 ky B.P.).
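
The f4 values and Z scores quoted above can be illustrated with a small sketch that computes f4 from allele frequencies and attaches a delete-one-block jackknife standard error. The details here are assumptions for illustration only: the function names, the equal-sized contiguous blocks, and the toy data are hypothetical, and production tools typically use a weighted jackknife over genomic blocks of unequal size.

```python
from math import sqrt


def f4(a, b, c, d):
    """f4(a, b; c, d): average over SNPs of (a - b) * (c - d)."""
    products = [(p - q) * (r - s) for p, q, r, s in zip(a, b, c, d)]
    return sum(products) / len(products)


def f4_block_jackknife(a, b, c, d, n_blocks):
    """Return (estimate, standard error, Z) from a delete-one-block jackknife.

    Inputs are lists of per-SNP allele frequencies. SNPs are split into
    `n_blocks` contiguous, equal-sized blocks; leftover SNPs are dropped.
    """
    block_size = len(a) // n_blocks if n_blocks > 0 else 0
    if n_blocks < 2 or block_size < 1:
        raise ValueError("need at least two non-empty blocks")
    n = block_size * n_blocks
    a, b, c, d = a[:n], b[:n], c[:n], d[:n]
    estimate = f4(a, b, c, d)

    # Recompute the statistic with each block removed in turn.
    leave_one_out = []
    for i in range(n_blocks):
        lo, hi = i * block_size, (i + 1) * block_size
        leave_one_out.append(
            f4(a[:lo] + a[hi:], b[:lo] + b[hi:], c[:lo] + c[hi:], d[:lo] + d[hi:])
        )

    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (value - mean_loo) ** 2 for value in leave_one_out
    )
    se = sqrt(variance)
    if se == 0.0:
        raise ValueError("zero standard error; Z is undefined")
    return estimate, se, estimate / se


# Toy, made-up frequencies at six SNPs (not data from the study).
pop_a = [0.5, 0.6, 0.7, 0.8, 0.4, 0.9]
pop_b = [0.1, 0.2, 0.1, 0.3, 0.2, 0.3]
pop_c = [0.6, 0.7, 0.5, 0.9, 0.6, 0.8]
pop_d = [0.2, 0.1, 0.3, 0.2, 0.1, 0.4]

estimate, se, z = f4_block_jackknife(pop_a, pop_b, pop_c, pop_d, n_blocks=3)
```

Dividing the estimate by its jackknife standard error gives the Z score; with real data, blocks are usually defined by genomic distance so that linkage between neighboring SNPs does not understate the error.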

Fig. 3 Relationships of K14 and other HG genomes with contemporary East Asian and Siberian populations.

(A) Values of the f4 statistic for a set of Siberian and East Asian populations and K14. We compute the f4 statistic for a topology (Sardinian, K14; X, Papuan). Warmer values indicate departure from the topology (Sardinian, K14; X, Papuan) with increased ancestry between the modern population X and the Sardinian. The Yenisei region includes the Selkup, Shor, and Ket populations. (B) Values of the f4 statistic for a set of Siberian and East Asian populations and MA1. We compute the f4 statistic for a topology (Sardinian, MA1; X, Papuan). (C) Values of the f4 statistic for a set of Siberian and East Asian populations and Scandinavian hunter gatherers (Motala). (D) Relationship between the HG admixture proportion and the f4(Sardinian, K14; X, Papuan) shown in (A). The red lines are linear regressions for each case.

Finally, we estimated levels of Neandertal ancestry in K14 using f4-ratio statistics (38). Our estimates are consistent with previous analyses (34) showing a Neandertal contribution lower than 2% for most individuals (Fig. 4A). However, both La Braña and K14 show slightly elevated levels, with an estimated 2.4 ± 0.4% in K14 (tables S15 and S16). Restricting this analysis to genomic regions without evidence for Neandertal introgressed haplotypes in contemporary humans (38, 39) results in 0% estimated ancestry for most individuals except K14, where 0.9 ± 0.4% Neandertal ancestry is still detected (tables S17 and S18). The difference between K14 and modern genomes could be caused by several factors, including sampling effects and genetic drift, natural selection as argued in (38, 39), or by the effects of additional Neandertal admixture not represented in the modern gene pool. We next compared the size distribution of genomic tracts of archaic hominin origin in K14 and other ancient individuals (Fig. 4B) by identifying genomic regions with high frequencies of archaic alleles at sites where all modern Africans carry the ancestral allele. Neandertal tracts were longer in K14 than in other ancient individuals, with the longest tract totaling ~3 Mb on chromosome 6 (Fig. 4C). This is consistent with K14 being closer to the time of the admixture event with Neandertals, and carrying longer archaic tracts that have been affected by less recombination, than the other, younger ancient genomes (~11,000 to 30,000 years old). We then used the length distribution of shared ancestry to estimate the admixture time of Neandertals and humans based on the K14 sample and obtained an estimate of ~54,000 years (SM S15). We note that genomic data from a 45,000-year-old modern human from Siberia, which was published during the review process of this study, also shows longer segments of Neandertal ancestry, further supporting our conclusions (40).
Because of the divergent position of the K14 sample, we also examined whether it contained any fragments of introgressed DNA from other previously unsampled hominins. However, the distribution of tracts of divergent DNA provides no evidence for additional divergent introgressed DNA (SM S14).
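
The reasoning that longer Neandertal tracts imply less time since admixture can be sketched with a textbook single-pulse approximation, in which the mean introgressed tract length in Morgans is roughly 1 / ((1 − m) · t) for admixture fraction m and t generations of recombination. This is not necessarily the method of SM S15; the function names, the uniform recombination rate of 1 cM/Mb, the 29-year generation time, and the example inputs are all assumptions chosen for illustration.

```python
def generations_since_admixture(mean_tract_mb, cm_per_mb=1.0, admix_fraction=0.0):
    """Generations of recombination implied by a mean introgressed tract length.

    Single-pulse approximation: mean length (Morgans) = 1 / ((1 - m) * t).
    `cm_per_mb` is an assumed uniform recombination rate.
    """
    if mean_tract_mb <= 0 or not 0.0 <= admix_fraction < 1.0:
        raise ValueError("tract length must be positive and 0 <= m < 1")
    mean_tract_morgans = mean_tract_mb * cm_per_mb / 100.0
    return 1.0 / ((1.0 - admix_fraction) * mean_tract_morgans)


def admixture_date_years(mean_tract_mb, sample_age_years, generation_years=29.0,
                         cm_per_mb=1.0, admix_fraction=0.0):
    """Admixture date in years before present for an ancient sample.

    The tracts record recombination only up to the time the individual
    lived, so the sample's own age is added. `generation_years` is assumed.
    """
    generations = generations_since_admixture(mean_tract_mb, cm_per_mb, admix_fraction)
    return sample_age_years + generations * generation_years


# Hypothetical mean tract lengths (not values from the study).
older_sample_generations = generations_since_admixture(0.2)    # longer tracts
younger_sample_generations = generations_since_admixture(0.1)  # shorter tracts
```

Under this model, halving the mean tract length doubles the inferred number of generations, which is why a sample living closer to the admixture event is expected to carry longer tracts.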

Fig. 4 Neandertal admixture in K14 and other ancient genomes.

(A) Neandertal admixture proportions for the modern and ancient individuals from Eurasia. (B) Ancestry tract length distribution for tracts identified as Neandertal through a sliding-window approach. The sites are ascertained to be ancestral in the African populations. For each non-African, the tracts are identified as the regions where sites are derived in Neandertal and the individual shown in X. (C) The longest Neandertal haplotype identified in K14 through a sliding-window approach. Individuals were clustered using hierarchical clustering on the genotype matrix for the region. Missing data are shown in white; gray indicates homozygous ancestral, blue heterozygote-derived, and black homozygous-derived.

Several studies have reported on the basal genetic distinctiveness between western Eurasian and eastern Asian populations, as well as between all Eurasians and Australo-Melanesians (41–43). Our results show no close genetic relationship between K14 and Australo-Melanesians and support earlier studies that suggest Australo-Melanesians derive part of their ancestry from an early population divergence that predates the separation of Europeans and East Asians (3). The K14 genome shows that this early UP individual was clearly part of a western Eurasian lineage that had already diverged from eastern Asians, thus establishing a minimum date for that separation at least 36.2 ka. The fact that the limited genomic information on the ~40 ka Tianyuan modern human from China clusters with contemporary East Asian populations (44) suggests an even earlier date.

Our results further suggest that the early stages of the western Eurasian lineage were already complex (see also Fig. 2). Besides its core affinities with subsequent European groups, K14 also shares alleles with European Neolithic farmers and contemporary people from the Middle East/Caucasus, which are not found in MA1 and western European MHGs, indicating genetic exchange between K14 and a basal Eurasian lineage (which eventually contributed to Neolithic groups) after the ancestors of MA1 and subsequent European MHGs had diverged. This implies that early AMH populations became structured early in their history, but in the UP already contained the major genetic components found in Europeans today. As such, our findings show the existence of a metapopulation structure in Europe from the Upper Paleolithic onward, remnants of which are still found today despite migrations to and from Europe since the UP. The early UP contribution is greater among northern than southern Europeans, in agreement with the southeast to west and north gene flow cline resulting from the expansion of Neolithic farmers 9 to 6 ky cal B.P. (20, 45). However, descendants of the early UP population represented by K14 likely also contributed genes to western Siberian groups living around the mouth of the Yenisei River. Therefore, our findings support the view that these Uralic-speaking populations represent an ancient admixture between European and East Asian lineages. The recently proposed Holocene gene flow from East Asians into northern Europeans (21) can, in our view, be equally well explained by population structure of the hunter-gatherer metapopulation within Europe. As such, our results paint an increasingly complex picture of colonization history of Europe from the UP to today.
Instead of inferring a few discrete migration events from Asia into Europe, we now see evidence that humans in Western Eurasia formed a large metapopulation with gene flow in multiple directions occurring repeatedly and perhaps continuously.

Supplementary Materials

Materials and Methods

Supplementary Text

Figs. S1 to S26

Tables S1 to S18

References (46–183)

References and Notes

  1. Acknowledgments: We thank J. F. Hoffecker for help and discussion and the Danish National Sequencing Centre, especially C. Mortensen and K. Magnussen, for technical assistance. We also thank D. Poznik for providing the chromosome Y mask file and table of informative SNPs. GeoGenetics members were supported by the Lundbeck Foundation and the Danish National Research Foundation (DNRF94). A.-S.M. was supported by the Swiss National Science Foundation (PBSKP3_143529). Research on the archaeological background by P.R.N. was supported by a Marie Curie Career Integration Grant (322261). M.W. and D.L. thank the Australian Research Council for support. Data for this study are available under accession no. PRJEB7618 from the European Nucleotide Archive ( Source code for the main analyses in this study is available at GitHub (