Research Article

The first horse herders and the impact of early Bronze Age steppe expansions into Asia


Science  29 Jun 2018:
Vol. 360, Issue 6396, eaar7711
DOI: 10.1126/science.aar7711

Ancient steppes for human equestrians

The Eurasian steppes reach from Ukraine in Europe to Mongolia and China. Over the past 5000 years, these flat grasslands were thought to be the route for the ebb and flow of migrant humans, their horses, and their languages. de Barros Damgaard et al. probed whole-genome sequences from the remains of 74 individuals found across this region. Although there is evidence for migration into Europe from the steppes, the details of human movements are complex and involve independent acquisitions of horse cultures. Furthermore, it appears that the Indo-European Hittite language derived from Anatolia, not the steppes. The steppe people seem not to have penetrated South Asia. Genetic evidence indicates an independent history involving western Eurasian admixture into ancient South Asian peoples.

Science, this issue p. eaar7711

Structured Abstract


According to the commonly accepted “steppe hypothesis,” the initial spread of Indo-European (IE) languages into both Europe and Asia took place with migrations of Early Bronze Age Yamnaya pastoralists from the Pontic-Caspian steppe. This is believed to have been enabled by horse domestication, which revolutionized transport and warfare. Although in Europe there is much support for the steppe hypothesis, the impact of Early Bronze Age Western steppe pastoralists in Asia, including Anatolia and South Asia, remains less well understood, with limited archaeological evidence for their presence. Furthermore, the earliest secure evidence of horse husbandry comes from the Botai culture of Central Asia, whereas direct evidence for Yamnaya equestrianism remains elusive.


We investigated the genetic impact of Early Bronze Age migrations into Asia and interpret our findings in relation to the steppe hypothesis and early spread of IE languages. We generated whole-genome shotgun sequence data (~1 to 25 X average coverage) for 74 ancient individuals from Inner Asia and Anatolia, as well as 41 high-coverage present-day genomes from 17 Central Asian ethnicities.


We show that the population at Botai associated with the earliest evidence for horse husbandry derived from an ancient hunter-gatherer ancestry previously seen in the Upper Paleolithic Mal’ta (MA1) and was deeply diverged from the Western steppe pastoralists. They form part of a previously undescribed west-to-east cline of Holocene prehistoric steppe genetic ancestry in which Botai, Central Asians, and Baikal groups can be modeled with different amounts of Eastern hunter-gatherer (EHG) and Ancient East Asian genetic ancestry represented by Baikal_EN.

In Anatolia, Bronze Age samples, including from Hittite-speaking settlements associated with the first written evidence of IE languages, show genetic continuity with preceding Anatolian Copper Age (CA) samples and have substantial Caucasian hunter-gatherer (CHG)–related ancestry but no evidence of direct steppe admixture.

In South Asia, we identified at least two distinct waves of admixture from the west, the first occurring from a source related to the Copper Age Namazga farming culture from the southern edge of the steppe, who exhibit both the Iranian and the EHG components found in many contemporary Pakistani and Indian groups from across the subcontinent. The second came from Late Bronze Age steppe sources, with a genetic impact that is more localized in the north and west.


Our findings reveal that the early spread of Yamnaya Bronze Age pastoralists had limited genetic impact in Anatolia as well as Central and South Asia. As such, the Asian story of Early Bronze Age expansions differs from that of Europe. Intriguingly, we find that direct descendants of Upper Paleolithic hunter-gatherers of Central Asia, now extinct as a separate lineage, survived well into the Bronze Age. These groups likely engaged in early horse domestication as a prey-route transition from hunting to herding, as otherwise seen for reindeer. Our findings further suggest that West Eurasian ancestry entered South Asia before and after, rather than during, the initial expansion of western steppe pastoralists, with the later event consistent with a Late Bronze Age entry of IE languages into South Asia. Finally, the lack of steppe ancestry in samples from Anatolia indicates that the spread of the earliest branch of IE languages into that region was not associated with a major population migration from the steppe.

Model-based admixture proportions for selected ancient and present-day individuals, assuming K = 6, shown with their corresponding geographical locations.

Ancient groups are represented by larger admixture plots, with those sequenced in the present work surrounded by black borders and others used for providing context with blue borders. Present-day South Asian groups are represented by smaller admixture plots with dark red borders.


The Yamnaya expansions from the western steppe into Europe and Asia during the Early Bronze Age (~3000 BCE) are believed to have brought with them Indo-European languages and possibly horse husbandry. We analyzed 74 ancient whole-genome sequences from across Inner Asia and Anatolia and show that the Botai people associated with the earliest horse husbandry derived from a hunter-gatherer population deeply diverged from the Yamnaya. Our results also suggest distinct migrations bringing West Eurasian ancestry into South Asia before and after, but not at the time of, Yamnaya culture. We find no evidence of steppe ancestry in Bronze Age Anatolia from when Indo-European languages are attested there. Thus, in contrast to Europe, Early Bronze Age Yamnaya-related migrations had limited direct genetic impact in Asia.

The vast grasslands making up the Eurasian steppe zones, from Ukraine through Kazakhstan to Mongolia, have served as a crossroad for human population movements during the last 5000 years (1–3), but the dynamics of their human occupation, especially of the earliest period, remain poorly understood. The domestication of the horse at the transition from the Copper Age to the Bronze Age, ~3000 BCE, enhanced human mobility (4, 5) and may have triggered waves of migration. According to the “steppe hypothesis,” this expansion of groups in the western steppe related to the Yamnaya and Afanasievo cultures was associated with the spread of Indo-European (IE) languages into Europe and Asia (1, 2, 4, 6). The peoples who formed the Yamnaya and Afanasievo cultures belonged to the same genetically homogeneous population, with direct ancestry attributed both to Copper Age (CA) western steppe pastoralists, descending primarily from the European Eastern hunter-gatherers (EHG) of the Mesolithic, and to Caucasian groups (1, 2) related to Caucasus hunter-gatherers (CHG) (7).

Within Europe, the steppe hypothesis is supported by the reconstruction of Proto-IE (PIE) vocabulary (8), as well as by archaeological and genomic evidence of human mobility and Early Bronze Age (3000 to 2500 BCE) cultural dynamics (9). For Asia, however, several conflicting interpretations have long been debated. These concern the origins and genetic composition of the local Asian populations encountered by the Yamnaya- and Afanasievo-related populations, including the groups associated with Botai, a site that offers the earliest evidence for horse husbandry (10). In contrast, the more western sites that have been supposed by some to reflect the use of horses in the Copper Age (4) lack direct evidence of domesticated horses. Even the later use of horses among Yamnaya pastoralists has been questioned by some (11) despite the key role of horses in the steppe hypothesis. Furthermore, genetic, archaeological, and linguistic hypotheses diverge on the timing and processes by which steppe genetic ancestry and the IE languages spread into South Asia (4, 6, 12). Similarly, in present-day Turkey, the emergence of the Anatolian IE language branch, including the Hittite language, remains enigmatic, with conflicting hypotheses about population migrations leading to its emergence in Anatolia (4, 13).

Ancient genomes inform upon human movements within Asia

We analyzed whole-genome sequence data of 74 ancient humans (14, 15) (tables S1 to S3) ranging from the Mesolithic (~9000 BCE) to Medieval times, spanning ~5000 km across Eastern Europe, Central Asia, and Western Asia (Anatolia) (Fig. 1). Our genome data includes 3 Copper Age individuals (~3500 to 3300 BCE) from Botai in northern Kazakhstan (Botai_CA; 13.6X, 3.7X, and 3X coverage, respectively); 1 Early Bronze Age (~2900 BCE) Yamnaya sample from Karagash, Kazakhstan (16) (YamnayaKaragash_EBA; 25.2X); 1 Mesolithic (~9000 BCE) EHG from Sidelkino, Russia (SidelkinoEHG_ML; 2.9X); 2 Early/Middle Bronze Age (~2200 BCE) central steppe individuals (~4200 BP) (CentralSteppe_EMBA; 4.5X and 9.1X average coverage, respectively) from burials at Sholpan and Gregorievka that display cultural similarities to Yamnaya and Afanasievo (12); 19 individuals of the Bronze Age (~2500 to 2000 BCE) Okunevo culture of the Minusinsk Basin in the Altai region (Okunevo_EMBA; ~1X average coverage; 0.1 to 4.6X); 31 Baikal hunter-gatherer genomes (~1X average coverage; 0.2 to 4.5X) from the cis-Baikal region bordering on Mongolia and ranging in time from the Early Neolithic (~5200 to 4200 BCE; Baikal_EN) to the Early Bronze Age (~2200 to 1800 BCE; Baikal_EBA); 4 Copper Age individuals (~3300 to 3200 BCE; Namazga_CA; ~1X average coverage; 0.1 to 2.2X) from Kara-Depe and Geoksur in the Kopet Dag piedmont strip of Turkmenistan, affiliated with the period III cultural layers at Namazga-Depe (fig. S1), plus 1 Iron Age individual (Turkmenistan_IA; 2.5X) from Takhirbai in the same area dated to ~800 BCE; and 12 individuals from Central Turkey (figs. S2 to S4), spanning from the Early Bronze Age (~2200 BCE; Anatolia_EBA) to the Iron Age (~600 BCE; Anatolia_IA), and including 5 individuals from presumed Hittite-speaking settlements (~1600 BCE; Anatolia_MLBA), and 2 individuals dated to the Ottoman Empire (1500 CE; Anatolia_Ottoman; 0.3 to 0.9X). 
All the population labels including those referring to previously published ancient samples are listed in table S4 for contextualization. Additionally, we sequenced 41 high-coverage (30X) present-day Central Asian genomes, representing 17 self-declared ethnicities (fig. S5), and collected and genotyped 140 individuals from five IE-speaking populations in northern Pakistan.

Fig. 1 Geographic location and dates of ancient samples.

(A) Location of the 74 samples from the steppe, Lake Baikal region, Turkmenistan, and Anatolia analyzed in the present study. MA1, KK1, and Xiongnu_IA were previously published. Geographical background colors indicate the western steppe (pink), central steppe (orange) and eastern steppe (gray). (B) Timeline in years before present (BP) for each sample. ML, Mesolithic; EHG, Eastern hunter-gatherer; EN, Early Neolithic; LN, Late Neolithic; CA, Copper Age; EBA, Early Bronze Age; EMBA, Early/Middle Bronze Age; MLBA, Middle/Late Bronze Age; IA, Iron Age.

Tests indicated that the contamination proportion of the data was negligible (14) (see table S1), and we removed related individuals from frequency-based statistics (fig. S6 and table S5). Our high-coverage Yamnaya genome from Karagash is consistent with previously published Yamnaya and Afanasievo genomes, and our Sidelkino genome is consistent with previously published EHG genomes, on the basis that there is no statistically significant deviation from 0 of D statistics of the form D(Test, Mbuti; SidelkinoEHG_ML, EHG) (fig. S7) or of the form D(Test, Mbuti; YamnayaKaragash_EBA, Yamnaya) (fig. S8; additional D statistics shown in figs. S9 to S12).
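For readers unfamiliar with the notation, the D statistics of the form D(W, X; Y, Z) used here can be illustrated with a minimal sketch. It assumes the commonly used allele-frequency formulation, in which the numerator sums (w − x)(y − z) over sites and the denominator sums (w + x − 2wx)(y + z − 2yz). The function name and the toy inputs are hypothetical, and this is not the pipeline used in the study. Under this convention a value near 0 means that W is equally related to Y and Z, and a positive value means that W shares more alleles with Y.

```python
import numpy as np

def d_statistic(p_w, p_x, p_y, p_z):
    """Illustrative D(W, X; Y, Z) from per-SNP allele frequencies in [0, 1].

    Assumed convention: positive values mean W shares more alleles
    with Y than with Z (relative to X); values near 0 mean symmetry.
    """
    p_w, p_x, p_y, p_z = (np.asarray(p, dtype=float) for p in (p_w, p_x, p_y, p_z))
    numerator = (p_w - p_x) * (p_y - p_z)
    denominator = (p_w + p_x - 2 * p_w * p_x) * (p_y + p_z - 2 * p_y * p_z)
    return numerator.sum() / denominator.sum()

# Toy data (hypothetical): W matches Y and X matches Z at every site.
example = d_statistic([1, 1], [0, 0], [1, 1], [0, 0])
```

In practice the significance of D is judged from a Z score obtained by resampling blocks of the genome rather than from the raw value.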

Genetic origins of local Inner Asian populations

In the Early Bronze Age, ~3000 BCE, the Afanasievo culture was formed in the Altai region by people related to the Yamnaya, who migrated 3000 km across the central steppe from the western steppe (1) and are often identified as the ancestors of the IE-speaking Tocharians of first-millennium northwestern China (4, 6). At this time, the region they passed through was populated by horse hunter-herders (4, 10, 17), while further east the Baikal region hosted groups that had remained hunter-gatherers since the Paleolithic (18–22). Subsequently, the Okunevo culture replaced the Afanasievo culture. The genetic origins and relationships of these peoples have been largely unknown (23, 24).

To address these issues, we characterized the genomic ancestry of the local Inner Asian populations around the time of the Yamnaya and Afanasievo expansion. Comparing our ancient samples to a range of present-day and ancient samples with principal components analysis (PCA), we find that the Botai_CA, CentralSteppe_EMBA, Okunevo_EMBA, and Baikal populations (Baikal_EN and Baikal_EBA) are distributed along a previously undescribed genetic cline. This cline extends from the EHG of the western steppe to the Bronze Age (~2000 to 1800 BCE) and Neolithic (~5200 to 4200 BCE) hunter-gatherers of Lake Baikal in Central Asia, which are located on the PCA plot close to modern East Asians and two Early Neolithic (~5700 BCE) Devil’s Gate samples (25) (Fig. 2 and fig. S13). In accordance with their position along the west-to-east gradient in the PCA, increased East Asian ancestry is evident in ADMIXTURE model-based clustering (Fig. 3 and figs. S14 and S15) and by D statistics for Sholpan and Gregorievka (CentralSteppe_EMBA) and Okunevo_EMBA, relative to Botai_CA and the Baikal_EN sample: D(Baikal_EN, Mbuti; Botai_CA, Okunevo_EMBA) = –0.025, Z = –12; D(Baikal_EN, Mbuti; Botai_CA, Sholpan) = –0.028, Z = –8.34; D(Baikal_EN, Mbuti; Botai_CA, Gregorievka) = –0.026, Z = –7.1. The position of this cline suggests that the central steppe Bronze Age populations all form a continuation of the Ancient North Eurasian (ANE) population, previously known from the 24,000-year-old Mal’ta (MA1), the 17,000-year-old AG-2 (26), and the ~14,700-year-old AG-3 (27) individuals from Siberia.

Fig. 2 Principal component analyses using ancient and present-day genetic data.

(A) PCA of ancient and modern Eurasian populations. The ancient steppe ancestry cline from EHG to Baikal_EN is visible at the top outside present-day variation, whereas the YamnayaKaragash_EBA sample has additional CHG ancestry and locates to the left with other Yamnaya and Afanasievo samples. Additionally, a shift in ancestry is observed between the Baikal_EN and Baikal_LNBA, consistent with an increase in ANE-related ancestry in Baikal_LNBA. (B) PCA estimated with a subset of Eurasian ancient individuals from the steppe, Iran, and Anatolia as well as present-day South Asian populations. PC1 and PC2 broadly reflect west-east and north-south geography, respectively. Multiple clines of different ancestry are seen in the South Asians, with a prominent cline even within Dravidians in the direction of the Namazga_CA group, which is positioned above Iranian Neolithic in the direction of EHG. In the later Turkmenistan_IA sample, this shift is more pronounced and toward Steppe EBA and MLBA. The Anatolia_CA, EBA, and MLBA samples are all between Anatolia Neolithic and CHG, not in the direction of steppe samples.

Fig. 3 Model-based clustering analysis of present-day and ancient individuals assuming K = 6 ancestral components.

The main ancestry components at K = 6 correlate well with CHG (turquoise), a major component of Iran_N, Namazga_CA and South Asian clines; EHG (pale blue), a component of the steppe cline and present in South Asia; East Asia (yellow ochre), the other component of the steppe cline also in Tibeto-Burman South Asian populations; South Indian (pink), a core component of South Asian populations; Anatolian_N (purple), an important component of Anatolian Bronze Age and Steppe_MLBA; Onge (dark pink) forms its own component.

To investigate ancestral relationships between these populations, we used coalescent modeling with the momi (Moran Models for Inference) program (28) (Fig. 4, figs. S16 to S22, and tables S6 to S11). This exploits the full joint-site frequency spectrum and can separate genetic drift into divergence-time and population-size components, in comparison to PCA, admixture, and qpAdm approaches, which are based on pairwise covariances. We find that Botai_CA, CentralSteppe_EMBA, Okunevo_EMBA, and Baikal populations are deeply separated from other ancient and present-day populations and are best modeled as mixtures in different proportions of ANE ancestry and an Ancient East Asian (AEA) ancestry component represented by Baikal_EN, with mixing times dated to ~5000 BCE. Although some modern Siberian samples lie under the Baikal samples in Fig. 2A, these are separated out in a more limited PCA, involving just those populations and the ancient samples (fig. S23). Our momi model infers that the ANE lineage separated ~15,000 years ago in the Upper Paleolithic from the EHG lineage to the west, with no independent drift assigned to MA1. This suggests that MA1 may represent their common ancestor. Similarly, the AEA lineage to the east also separated ~15,000 years ago, with the component that leads to Baikal_EN and the AEA component of the steppe separating from the lineage leading to present-day East Asian populations represented by Han Chinese (figs. S19 to S21). The ANE and AEA lineages themselves are estimated as having separated approximately 40,000 years ago, relatively soon after the peopling of Eurasia by modern humans.

Fig. 4 Demographic model of 10 populations inferred by maximizing the likelihood of the site frequency spectrum (implemented in momi).

We used 300 parametric bootstrap simulations (shown in gray transparency) to estimate uncertainty. Bootstrap estimates for the bias and standard deviation of admixture proportions are listed beneath their point estimates. The uncertainty may be underestimated here, due to simplifications or additional uncertainty in the model specification.
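As an illustration of how the bias and standard deviation reported beneath each point estimate can be summarized, the following minimal sketch takes a point estimate and a set of bootstrap replicate estimates as input. The function name and the example values are hypothetical. Simulating the replicates from the fitted demographic model, which is what makes the bootstrap parametric, is outside the scope of this sketch.

```python
import numpy as np

def bootstrap_bias_sd(point_estimate, replicate_estimates):
    """Summarize bootstrap replicates of a single parameter.

    bias: mean of the replicates minus the point estimate.
    sd:   sample standard deviation of the replicates (ddof=1).
    """
    reps = np.asarray(replicate_estimates, dtype=float)
    bias = reps.mean() - point_estimate
    sd = reps.std(ddof=1)
    return bias, sd

# Hypothetical admixture proportion with three replicate estimates.
bias, sd = bootstrap_bias_sd(0.2, [0.1, 0.2, 0.3])
```

A bias close to 0 and a small standard deviation indicate a stable estimate under the assumed model only; as the caption notes, misspecification of the model itself is not captured.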

Because the ANE MA1 sample comes from the same cis-Baikal region as the AEA-derived Neolithic samples analyzed here, we document evidence for a population replacement between the Paleolithic and the Neolithic in this region. Furthermore, we observe a shift in genetic ancestry between the Early Neolithic (Baikal_EN) and the Late Neolithic/Bronze Age hunter-gatherers (Baikal_LNBA) (Fig. 2A), with the Baikal_LNBA cluster showing admixture from an ANE-related source. We estimate the ANE-related ancestry in the Baikal_LNBA to be ~5 to 11% (qpAdm) (table S12) (2), using MA1 as a source of ANE, Baikal_EN as a source of AEA, and a set of six outgroups. However, neither MA1 nor any of the other steppe populations lie in the direction of Baikal_LNBA from Baikal_EN on the PCA plot (fig. S23). This suggests that the new ANE ancestry in Baikal_LNBA stems from an unsampled source. Given that this source may have harbored East Asian ancestry, the contribution may be larger than 10%.

These serial changes in the Baikal populations are reflected in Y-chromosome lineages (Fig. 5A, figs. S24 to S27, and tables S13 and S14). MA1 carries the R haplogroup, whereas the majority of Baikal_EN males belong to N lineages, which were widely distributed across Northern Eurasia (29), and the Baikal_LNBA males all carry Q haplogroups, as do most of the Okunevo_EMBA as well as some present-day Central Asians and Siberians. Mitochondrial haplogroups show less turnover (Fig. 5B and table S15), which could either indicate male-mediated admixture or reflect bottlenecks in the male population.

Fig. 5 Y-chromosome and mitochondrial lineages identified in ancient and present-day individuals.

(A) Maximum likelihood Y-chromosome phylogenetic tree estimated with data from 109 high-coverage samples. Dashed lines represent the upper bound for the inclusion of 42 low-coverage ancient samples in specific Y-chromosome clades on the basis of the lineages identified. (B) Maximum likelihood mitochondrial phylogenetic tree estimated with 182 present-day and ancient individuals. The phylogenies displayed were restricted to a subset of clades relevant to the present work. Columns represent archaeological groups analyzed in the present study, ordered by time, and colored areas indicate membership of the major Y-chromosome and mitochondrial DNA (mtDNA) haplogroups.

The deep population structure among the local populations in Inner Asia around the Copper Age/Bronze Age transition is in line with distinct origins of central steppe hunter-herders related to Botai of the central steppe and those related to Altaian hunter-gatherers of the eastern steppe (30). Furthermore, this population structure, which is best described as part of the ANE metapopulation, persisted within Inner Asia from the Upper Paleolithic to the end of the Early Bronze Age. In the Baikal region, the results show that at least two genetic shifts occurred: first, a complete population replacement of the Upper Paleolithic hunter-gatherers belonging to the ANE by Early Neolithic communities of Ancient East Asian ancestry, and second, an admixture event between the latter and additional members of the ANE clade, occurring during the 1500-year period that separates the Neolithic from the Early Bronze Age. These genetic shifts complement previously observed severe cultural changes in the Baikal region (18–22).

Relevance for history of horse domestication

The earliest unambiguous evidence for horse husbandry is from the Copper Age Botai hunter-herder culture of the central steppe in Northern Kazakhstan ~3500 to 3000 BCE (5, 10, 23, 31–33). There was extensive debate over whether Botai horses were hunted or herded (33), but more recent studies have evidenced harnessing and milking (10, 17), the presence of likely corrals, and genetic domestication selection at the horse TRPM1 coat-color locus (32). Although horse husbandry has been demonstrated at Botai, it is also now clear from genetic studies that this was not the source of modern domestic horse stock (32). Some have suggested that the Botai were local hunter-gatherers who learned horse husbandry from an early eastward spread of western pastoralists, such as the Copper Age herders buried at Khvalynsk (~5150 to 3950 BCE), closely related to Yamnaya and Afanasievo (17). Others have suggested an in situ transition from the local hunter-gatherer community (5).

We therefore examined the genetic relationship between Yamnaya and Botai. First, we note that whereas Yamnaya is best modeled as an approximately equal mix of EHG and Caucasian HG ancestry, and the earlier Khvalynsk samples from the same area also show Caucasian ancestry, the Botai_CA samples show no signs of admixture with a Caucasian source (fig. S14). Similarly, while the Botai_CA have some Ancient East Asian ancestry, there is no sign of this in Khvalynsk or Yamnaya. Our momi model (Fig. 4) suggests that, although YamnayaKaragash_EBA shared ANE ancestry with Botai_CA from MA1 through EHG, their lineages diverged ~15,000 years ago in the Paleolithic. According to a parametric bootstrap, the amount of gene flow between YamnayaKaragash_EBA and Botai_CA inferred using the sample frequency spectrum (SFS) was not significantly different from 0 (P = 0.18 using 300 parametric bootstraps under a null model without admixture) (fig. S18). Additionally, the best-fitting SFS model without any recent gene flow fits the ratio of ABBA-BABA counts for (SidelkinoEHG_ML, YamnayaKaragash_EBA; Botai_CA, AncestralAllele), with Z = 0.45 using a block jackknife for this statistic. Consistent with this, a simple qpGraph model without direct gene flow between Botai_CA and Yamnaya, but with shared EHG-related ancestry between them, fits all f4 statistics (fig. S28), and qpAdm (2) successfully fits models for Yamnaya ancestry without any Botai_CA contribution (table S12).
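The block jackknife mentioned above can be sketched as follows. This is a minimal, unweighted delete-one-block version for a ratio statistic, such as ABBA minus BABA over ABBA plus BABA, compared against an expected value. It assumes that the genome has already been cut into contiguous blocks of roughly similar size and that per-block numerator and denominator sums are available. The function name, the `expected` argument, and the toy block values are hypothetical, and the block size and weighting scheme used in the study are not stated here.

```python
import numpy as np

def block_jackknife_z(num_blocks, den_blocks, expected=0.0):
    """Delete-one-block jackknife for the ratio sum(num) / sum(den).

    Returns (estimate, standard_error, z), where z measures how far
    the estimate lies from `expected` in standard-error units.
    Unweighted variant: assumes blocks of similar size.
    """
    num = np.asarray(num_blocks, dtype=float)
    den = np.asarray(den_blocks, dtype=float)
    n = len(num)
    estimate = num.sum() / den.sum()
    # Recompute the statistic with each block left out in turn.
    leave_one_out = (num.sum() - num) / (den.sum() - den)
    variance = (n - 1) / n * np.sum((leave_one_out - leave_one_out.mean()) ** 2)
    standard_error = np.sqrt(variance)
    return estimate, standard_error, (estimate - expected) / standard_error

# Toy per-block sums (hypothetical): ABBA - BABA and ABBA + BABA.
estimate, se, z = block_jackknife_z([1, -1, 1, -1], [2, 2, 2, 2])
```

Resampling whole blocks rather than single sites is what accounts for linkage between neighboring sites; a |Z| well below about 3 is conventionally read as no significant deviation.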

The separation between Botai and Yamnaya is further reinforced by a lack of overlap in Y-chromosomal lineages (Fig. 5A). Although our YamnayaKaragash_EBA sample carries the R1b1a2a2c1 lineage seen in other Yamnaya and present-day Eastern Europeans, one of the two Botai_CA males belongs to the basal N lineage, whose subclades have a predominantly Northern Eurasian distribution, whereas the second carries the R1b1a1 haplogroup, restricted almost exclusively to Central Asian and Siberian populations (34). Neither of these Botai lineages has been observed among Yamnaya males (table S13 and fig. S25).

Using ChromoPainter (35) (figs. S29 to S32) and rare variant sharing (36) (figs. S33 to S35), we also identify a disparity in affinities with present-day populations between our high-coverage Yamnaya and Botai genomes. Consistent with previous results (1, 2), we observe a contribution from YamnayaKaragash_EBA to present-day Europeans. Conversely, Botai_CA shows greater affinity to Central Asian, Siberian, and Native American populations, coupled with some sharing with northeastern European groups at a lower level than that for Yamnaya, due to their ANE ancestry.

Further toward the Altai, the genomes of two CentralSteppe_EMBA women, who were buried in Afanasievo-like pit graves, revealed them to be representatives of an unadmixed Inner Asian ANE-related group, almost indistinguishable from the Okunevo_EMBA of the Minusinsk Basin north of the Altai through D statistics (fig. S11). This lack of genetic and cultural congruence may be relevant to the interpretation of Afanasievo-type graves elsewhere in Central Asia and Mongolia (37). However, in contrast to the lack of identifiable admixture from Yamnaya and Afanasievo in the CentralSteppe_EMBA, there is an admixture signal of 10 to 20% Yamnaya and Afanasievo in the Okunevo_EMBA samples (fig. S21), consistent with evidence of western steppe influence. This signal is not seen on the X chromosome (qpAdm P value for admixture of 0.33 on X compared with 0.02 for autosomes), suggesting a male-derived admixture, also consistent with the fact that 1 of 10 Okunevo_EMBA males carries an R1b1a2a2 Y chromosome related to those found in western pastoralists (Fig. 5). In contrast, there is no evidence of western steppe admixture among the more eastern Baikal region Bronze Age (~2200 to 1800 BCE) samples (fig. S14).

The lack of evidence of admixture between Botai horse herders and western steppe pastoralists is consistent with these latter migrating through the central steppe but not settling until they reached the Altai to the east (4). Notably, this lack of admixture suggests that horses were domesticated by hunter-gatherers not previously familiar with farming, as was the case for dogs (38) and reindeer (39). Domestication of the horse thus may best parallel that of the reindeer, a food animal that can be milked and ridden, which has been proposed to have been domesticated by hunters via the “prey path” (40); indeed, anthropologists note similarities in cosmological beliefs between hunters and reindeer herders (41). In contrast, most animal domestications were achieved by settled agriculturalists (5).

Origins of Western Eurasian genetic signatures in South Asians

The presence of Western Eurasian ancestry in many present-day South Asian populations south of the central steppe has been used to argue for gene flow from Early Bronze Age (~3000 to 2500 BCE) western steppe pastoralists into the region (42, 43). However, direct influence of Yamnaya or related cultures of that period is not visible in the archaeological record, except perhaps for a single burial mound in Sarazm in present-day Tajikistan of contested age (44, 45). Additionally, linguistic reconstruction of protoculture coupled with the archaeological chronology evidences a Late (~2300 to 1200 BCE) rather than Early Bronze Age (~3000 to 2500 BCE) arrival of the Indo-Iranian languages into South Asia (16, 45, 46). Thus, debate persists as to how and when Western Eurasian genetic signatures and IE languages reached South Asia.

To address these issues, we investigated whether the source of the Western Eurasian signal in South Asians could derive from sources other than Yamnaya and Afanasievo (Fig. 1). Both Early Bronze Age (~3000 to 2500 BCE) steppe pastoralists Yamnaya and Afanasievo and Late Bronze Age (~2300 to 1200 BCE) Sintashta and Andronovo carry substantial amounts of EHG and CHG ancestry (1, 2, 7), but the latter group can be distinguished by a genetic component acquired through admixture with European Neolithic farmers during the formation of the Corded Ware complex (1, 2), reflecting a secondary push from Europe to the east through the forest-steppe zone.

We characterized a set of four south Turkmenistan samples from Namazga period III (~3300 BCE). In our PCA, the Namazga_CA individuals were placed in an intermediate position between Iran Neolithic and western steppe clusters (Fig. 2). Consistent with this, we find that the Namazga_CA individuals carry a significantly larger fraction of EHG-related ancestry than Neolithic skeletal material from Iran [D(EHG, Mbuti; Namazga_CA, Iran_N) Z = 4.49], and we are not able to reject a two-population qpAdm model in which Namazga_CA ancestry was derived from a mixture of Neolithic Iranians and EHG (~21%) (P = 0.49).

Although CHG contributed both to Copper Age steppe individuals (e.g., Khvalynsk, ~5150 to 3950 BCE) and substantially to Early Bronze Age (~3000 to 2500 BCE) steppe Yamnaya and Afanasievo (1, 2, 7, 47), we do not find evidence of CHG-specific ancestry in Namazga. Despite the adjacent placement of CHG and Namazga_CA on the PCA plot, D(CHG, Mbuti; Namazga_CA, Iran_N) does not deviate significantly from 0 (Z = 1.65), in agreement with ADMIXTURE results (Fig. 3 and fig. S14). Moreover, a three-population qpAdm model using Iran Neolithic, EHG, and CHG as sources yields a negative admixture coefficient for CHG. This suggests that while we cannot totally reject a minor presence of CHG ancestry, steppe-related admixture most likely arrived in the Namazga population before the Copper Age or from unadmixed sources related to EHG. This is consistent with the upper temporal boundary provided by the date of the Namazga_CA samples (~3300 BCE). In contrast, the Iron Age (~900 to 200 BCE) individual from the same region as Namazga (sample DA382, labeled Turkmenistan_IA) is closer to the steppe cluster in the PCA plot and does have CHG-specific ancestry. However, it also has European farmer–related ancestry typical of Late Bronze Age (~2300 to 1200 BCE) steppe populations (13, 47) [D(Neolithic European, Mbuti; Namazga_CA, Turkmenistan_IA) Z = -4.04], suggesting that it received admixture from Late (~2300 to 1200 BCE) rather than Early Bronze Age (~3000 to 2500 BCE) steppe populations.

In a PCA focused on South Asia (Fig. 2B), the first dimension corresponds approximately to west-east and the second dimension to north-south. Near the lower right are the Andamanese Onge, previously used to represent the Ancient South Asian component (12, 42). Contemporary South Asian populations are placed along both east-west and north-south gradients, reflecting the presence of three major ancestry components in South Asia deriving from West Eurasians, South Asians, and East Asians. Because the Namazga_CA individuals appear at one end of the West Eurasian/South Asian axis, and given their geographical proximity to South Asia, we tested this group as a potential source in a set of qpAdm models for the South Asian populations (Fig. 6).

Fig. 6 A summary of the four qpAdm models fitted for South Asian populations.

For each modern South Asian population, we fit different models with qpAdm to explain their ancestry composition using ancient groups and present the first model that we could not reject in the following priority order: 1. Namazga_CA + Onge, 2. Namazga_CA + Onge + Late Bronze Age Steppe, 3. Namazga_CA + Onge + Xiongnu_IA (East Asian proxy), and 4. Turkmenistan_IA + Xiongnu_IA. Xiongnu_IA were used here to represent East Asian ancestry. We observe that although South Asian Dravidian speakers can be modeled as a mixture of Onge and Namazga_CA, an additional source related to Late Bronze Age steppe groups is required for IE speakers. In Tibeto-Burman and Austro-Asiatic speakers, an East Asian rather than a Steppe_MLBA source is required.

We are not able to reject a two-population qpAdm model using Namazga_CA and Onge for nine modern southern and predominantly Dravidian-speaking populations (Fig. 6, fig. S36, and tables S16 and S17). In contrast, for seven other populations belonging to the northernmost Indic- and Iranian-speaking groups, this two-population model is rejected, but not a three-population model including an additional Late Bronze Age (~2300 to 1200 BCE) steppe source. Last, for seven southeastern Asian populations, six of which were Tibeto-Burman or Austro-Asiatic speakers, the three-population model with Late Bronze Age (~2300 to 1200 BCE) steppe ancestry was rejected, but not a model in which Late Bronze Age (~2300 to 1200 BCE) steppe ancestry was replaced with an East Asian ancestry source, as represented by the Late Iron Age (~200 BCE to 100 CE) Xiongnu (Xiongnu_IA) nomads from Mongolia (3). Interestingly, for two northern groups, the only tested model we could not reject included the Iron Age (~900 to 200 BCE) individual (Turkmenistan_IA) from the Zarafshan Mountains and the Xiongnu_IA as sources. These findings are consistent with the positions of the populations in PCA space (Fig. 2B) and are further supported by ADMIXTURE analysis (Fig. 3), with two minor exceptions: In both the Iyer and the Pakistani Gujar, we observe a minor presence of the Late Bronze Age (~2300 to 1200 BCE) steppe ancestry component (fig. S14) not detected by the qpAdm approach. Additionally, we document admixture along the West Eurasian and East Asian clines of all South Asian populations using D statistics (fig. S37).
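
The selection rule described in the Fig. 6 caption, presenting the first model in a fixed priority order that cannot be rejected, can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each candidate model has already been fitted and reduced to a single P value, the function name `first_unrejected_model` is hypothetical, the rejection threshold `alpha=0.05` is an assumption not stated in this section, and the P values in the example are invented for illustration.

```python
# Candidate source sets, in the priority order given in the Fig. 6 caption.
MODEL_PRIORITY = [
    ("Namazga_CA", "Onge"),
    ("Namazga_CA", "Onge", "Late Bronze Age Steppe"),
    ("Namazga_CA", "Onge", "Xiongnu_IA"),
    ("Turkmenistan_IA", "Xiongnu_IA"),
]


def first_unrejected_model(p_values, alpha=0.05):
    """Return the first source set whose P value is >= alpha, else None."""
    if len(p_values) != len(MODEL_PRIORITY):
        raise ValueError("expected one P value per candidate model")
    for sources, p in zip(MODEL_PRIORITY, p_values):
        if p >= alpha:
            return sources
    return None  # every tested model was rejected


# Hypothetical P values for one population (not taken from the paper).
print(first_unrejected_model([0.001, 0.32, 0.0004, 0.02]))
```

Because the simplest model is tried first, an additional source is reported only when the two-source model fails. A fuller treatment would presumably also set aside fits with infeasible mixture coefficients, such as the negative CHG coefficient noted earlier for Namazga, which this sketch does not model.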

Thus, we find that ancestries deriving from four major separate sources fully reconcile the population history of present-day South Asians (Figs. 3 and 6), one anciently South Asian, one from Namazga or a related population, a third from Late Bronze Age (~2300 to 1200 BCE) steppe pastoralists, and one from East Asia. They account for western ancestry in some Dravidian populations that lack CHG-specific ancestry while also fitting the observation that whenever there is CHG-specific ancestry and considerable EHG ancestry, there is also European Neolithic ancestry (Fig. 3). This implicates Late Bronze Age (~2300 to 1200 BCE) steppe rather than Early Bronze Age (~3000 to 2500 BCE) Yamnaya and Afanasievo admixture into South Asia. The proposal that the IE steppe ancestry arrived in the Late Bronze Age (~2300 to 1200 BCE) is also more consistent with archaeological and linguistic chronology (44, 45, 48, 49). Thus, it seems that the Yamnaya- and Afanasievo-related migrations did not have a direct genetic impact in South Asia.

Lack of steppe genetic impact in Anatolians

Finally, we consider the evidence for Bronze Age steppe genetic contributions in West Asia. There are conflicting models for the earliest dispersal of IE languages into Anatolia (4, 50). The now extinct Bronze Age Anatolian language group represents the earliest historically attested branch of the IE language family and is linguistically held to be the first branch to have split off from PIE (51, 52, 53). One key question is whether Proto-Anatolian is a direct linguistic descendant of the hypothesized Yamnaya PIE language or whether Proto-Anatolian and the PIE language spoken by Yamnaya were branches of a more ancient language ancestral to both (49, 53). Another key question relates to whether Proto-Anatolian speakers entered Anatolia as a result of a Copper Age western steppe migration (~5000 to 3000 BCE) involving movement of groups through the Balkans into Northwest Anatolia (4, 54, 55) or a Caucasian route that links language dispersal to intensified north-south population contacts facilitated by the trans-Caucasian Maykop culture ~3700 to 3000 BCE (50, 54).

Ancient DNA findings suggest extensive population contact between the Caucasus and the steppe during the Copper Age (~5000 to 3000 BCE) (1, 2, 42). Particularly, the first identified presence of Caucasian genomic ancestry in steppe populations is through the Khvalynsk burials (2, 47) and that of steppe ancestry in the Caucasus is through Armenian Copper Age individuals (42). These admixture processes likely gave rise to the ancestry that later became typical of the Yamnaya pastoralists (7), whose IE language may have evolved under the influence of a Caucasian language, possibly from the Maykop culture (50, 56). This scenario is consistent with both the Copper Age steppe (4) and the Caucasian models for the origin of the Proto-Anatolian language (57).

PCA (Fig. 2B) indicates that all the Anatolian genome sequences from the Early Bronze Age (~2200 BCE) and Late Bronze Age (~1600 BCE) cluster with a previously sequenced Copper Age (~3900 to 3700 BCE) individual from Northwestern Anatolia and lie between Anatolian Neolithic (Anatolia_N) samples and CHG samples but not between Anatolia_N and EHG samples. A test of the form D(CHG, Mbuti; Anatolia_EBA, Anatolia_N) shows that these individuals share more alleles with CHG than Neolithic Anatolians do (Z = 3.95), and we are not able to reject a two-population qpAdm model in which these groups derive ~60% of their ancestry from Anatolian farmers and ~40% from CHG-related ancestry (P = 0.5). This signal is not driven by Neolithic Iranian ancestry, because the result of a similar test of the form D(Iran_N, Mbuti; Anatolia_EBA, Anatolia_N) does not deviate from zero (Z = 1.02). Taken together with recent findings of CHG ancestry on Crete (58), our results support a widespread CHG-related gene flow, not only into Central Anatolia but also into the areas surrounding the Black Sea and Crete. The latter are not believed to have been influenced by steppe-related migrations and may thus correspond to a shared archaeological horizon of trade and innovation in metallurgy (59).

Importantly, a test of the form D(EHG, Mbuti; Anatolia_EBA, Anatolia_MLBA) indicates that the Central Anatolian gene pools, including those sampled from settlements thought to have been inhabited by Hittite speakers, were not affected by steppe populations during the Early and Middle Bronze Age (Z = -1.83). Both of these findings are further confirmed by results from clustering analysis (Fig. 3). The CHG-specific ancestry and the absence of EHG-related ancestry in Bronze Age Anatolia would be in accordance with intense cultural interactions between populations in the Caucasus and Anatolia observed during the late fifth millennium BCE that seem to come to an end in the first half of the fourth millennium BCE with the village-based egalitarian Kura-Araxes society (60, 61), thus preceding the emergence and dispersal of Proto-Anatolian.

Our results indicate that the early spread of IE languages into Anatolia was not associated with any large-scale steppe-related migration, as previously suggested (62). Additionally, and in agreement with the later historical record of the region (63), we find no correlation between genetic ancestry and exclusive ethnic or political identities among the populations of Bronze Age Central Anatolia, as has previously been hypothesized (64).


For Europe, ancient genomics has revealed extensive population migrations, replacements, and admixtures from the Upper Paleolithic to the Bronze Age (1, 2, 27, 65, 66), with a strong influence across the continent from the Early Bronze Age (~3000 to 2500 BCE) western steppe Yamnaya. In contrast, for Central Asia, continuity is observed from the Upper Paleolithic to the end of the Copper Age (~3500 to 3000 BCE), with descendants of Paleolithic hunter-gatherers persisting as largely isolated populations after the Yamnaya and Afanasievo pastoralist migrations. Instead of western pastoralists admixing with or replacing local groups, we see groups with East Asian ancestry replacing ANE populations in the Lake Baikal region. Thus, unlike in Europe, the hunting, gathering, and herding groups of Inner Asia were much less affected by the Yamnaya and Afanasievo expansion. This may be due to the rise of early horse husbandry, which likely originated through a local “prey route” (40) adaptation by horse-dependent hunter-gatherers at Botai. Work on ancient horse genomes (32) indicates that Botai horses were not the main source of modern domesticates, which suggests the existence of a second center of domestication, but whether this second center was associated with the Yamnaya and Afanasievo cultures remains uncertain in the absence of horse genetic data from their sites.

Our finding that the Copper Age (~3300 BCE) Namazga-related population from the borderlands between Central and South Asia contains both Iran Neolithic and EHG ancestry but not CHG-specific ancestry provides a solution to problems concerning the Western Eurasian genetic contribution to South Asians. Rather than invoking varying degrees of relative contribution of Iran Neolithic and Yamnaya ancestries, we explain the two western genetic components with two separate admixture events. The first event, potentially before the Bronze Age, spread from a non-IE-speaking farming population from the Namazga culture or a related source down to Southern India. The second came during the Late Bronze Age (~2300 to 1200 BCE) through established contacts between pastoral steppe nomads and the Indus Valley, bringing European Neolithic as well as CHG-specific ancestry, and with them Indo-Iranian languages, into northern South Asia. This is consistent with a long-range South Eurasian trade network ~2000 BCE (4), shared mythologies with steppe-influenced cultures (41, 60), and linguistic relationships between the Indic spoken in South Asia and written records from Western Asia from the first half of the 18th century BCE onward (49, 67).

In Anatolia, our samples do not genetically distinguish Hittite and other Bronze Age Anatolians from an earlier Copper Age sample (~3943 to 3708 BCE). All these samples contain a similar level of CHG ancestry but no EHG ancestry. This is consistent with Anatolian/Early European farmer ancestry, but not steppe ancestry, in the Copper Age Balkans (68) and implies that the Anatolian clade of IE languages did not derive from a large-scale Copper Age/Early Bronze Age population movement from the steppe [unlike the findings in (4)]. Our findings are thus consistent with historical models of cultural hybridity and “middle ground” in a multicultural and multilingual but genetically homogeneous Bronze Age Anatolia (69, 70).

Current linguistic estimations converge on dating the Proto-Anatolian split from residual PIE to the late fifth or early fourth millennia BCE (53, 71) and place the breakup of Anatolian IE inside Turkey before the mid-third millennium (51, 54, 72). In (49) we present new onomastic material (73) that pushes the period of Proto-Anatolian linguistic unity even further back in time. We cannot at this point reject a scenario in which the introduction of the Anatolian IE languages into Anatolia was coupled with the CHG-derived admixture before 3700 BCE, but note that this is contrary to the standard view that PIE arose in the steppe north of the Caucasus (4) and that CHG ancestry is also associated with several non-IE-speaking groups, historical and current. Indeed, our data are also consistent with the first speakers of Anatolian IE coming to the region by way of commercial contacts and small-scale movement during the Bronze Age. Among comparative linguists, a Balkan route for the introduction of Anatolian IE is generally considered more likely than a passage through the Caucasus, due, for example, to greater Anatolian IE presence and language diversity in the west (55). Further discussion of these options is given in the archaeological and linguistic supplementary discussions (48, 49).

Thus, while the steppe hypothesis, in the light of ancient genomics, has so far successfully explained the origin and dispersal of IE languages and culture in Europe, we find that several elements must be reinterpreted to account for Asia. First, we show that the earliest unambiguous example of horse herding emerged among hunter-gatherers, who had no substantial genetic interaction with western steppe herders. Second, we demonstrate that the Anatolian IE language branch, including Hittite, did not derive from a substantial steppe migration into Anatolia. And third, we conclude that Early Bronze Age steppe pastoralists did not migrate into South Asia but that genetic evidence fits better with the Indo-Iranian IE languages being brought to the region by descendants of Late Bronze Age steppe pastoralists.

Supplementary Materials

Supplementary Text

Figs. S1 to S37

Tables S1 to S17

References (74–168)

References and Notes

  1. See the supplementary materials.
Acknowledgments: We thank K. Magnussen, L. Petersen, and C. Mortensen at the Danish National Sequencing Centre for conducting the sequencing and P. Reimer and S. Hoper at the 14Chrono Center Belfast for providing the AMS dating. We thank S. Ellingvåg, B. E. Heyerdahl, and the Explico-Historical Research Foundation team, as well as N. Thompson, for involvement in field work. We thank the Turkish Ministry of Culture and Tourism, Kaman-Kalehöyük Archeology Museum, and Nevşehir Museum for permission to use samples of Kaman-Kalehöyük and Ovaören. We thank J. Stenderup, P. V. Olsen, and T. Brand for technical assistance in the laboratory. We thank T. Korneliussen for helpful discussions. We thank St. John’s College, Cambridge, for providing the setting for fruitful scientific discussions. We thank all involved archaeologists, historians, and collaborators from Pakistan who assisted I.U. in the field. We thank G. Baimbetov (Shejire DNA), I. Baimukhan, B. Daulet, A. Kusaev, A. Kopbassarova, Y. Yousupov, M. Akchurin, and V. Volkov for important assistance in the field. Funding: The study was supported by the Lundbeck Foundation (E.W.), the Danish National Research Foundation (E.W.), and KU2016 (E.W.). Research at the Sanger Institute was supported by the Wellcome Trust (grant 206194). R.M. was supported by an EMBO Long-Term Fellowship (ALTF 133-2017). J.K. was supported by the Human Frontiers Science Program (LT000402/2017). Botai fieldwork was supported by University of Exeter, Archeology Exploration Fund, and N. Thompson, Clearwater Documentary. A.B. was supported by NIH grant 5T32GM007197-43. G.K. was funded by Riksbankens Jubileumsfond and European Research Council. M.P. was funded by Netherlands Organization for Scientific Research (NWO), project number 276-70-028. I.U. was funded by the Higher Education Commission of Pakistan. 
Archaeological materials from Sholpan and Grigorievka were obtained with partial financial support of the budget program of the Ministry of Education and Science of the Republic of Kazakhstan “Grant financing of scientific research for 2018-2020” no. AP05133498 “Early Bronze Age of the Upper Irtysh.” Author contributions: E.W., K.K., A.K.O., and A.W. initiated the study. E.W., R.D., K.K., A.K.O., and P.B.D. designed the study. E.W. and R.D. led the study. K.K. and A.K.O. led the archaeological part of the study. G.K., M.P., and G.B. led the linguistic part of the study. P.B.D., C.Z., F.E.Y., I.U., C.d.F., M.I., H.S., A.S.-O., and M.E.A. produced data. P.B.D., R.M., J.K., J.V.M.-M., S.R., K.H.I., M.S., R.N., A.B., J.N., E.W., and R.D. analyzed or assisted in analysis of data. P.B.D., R.M., J.K., J.V.M.-M., R.D., E.W., A.K.O., K.K., G.K., M.P., G.B., B.H., M.S., and R.N. interpreted the results. P.B.D., E.W., R.M., R.D., A.K.O., G.K., J.K., G.B., J.V.M.-M., K.K., and M.P. wrote the manuscript with considerable input from B.H., M.S., M.E.A., and R.N. P.B.D., V.Z., V.M., I.M., N.B., E.U., V.L., F.E.Y., I.U., A.M., K.G.S., V.M., A.G., S.O., S.Y.S., C.M., H.A., A.H., A.S., N.G., M.H.K., A.W., L.O., and A.K.O. excavated, curated, and sampled and/or described analyzed skeletons. Competing interests: The authors declare no competing interests. Data and materials availability: Genomic data are available for download at the ENA (European Nucleotide Archive) with accession numbers ERP107300 and PRJEB26349. SNP array data from Pakistan can be obtained from EGA through accession number EGAS00001002965. Y chromosome and mtDNA data are available at Zenodo under DOI 10.5281/zenodo.1219431.

Correction (28 June 2018): In the Summary figure, “TIA - Iron Age” should be changed to “IA - Iron Age”.
