The prehistoric peopling of Southeast Asia


Science  06 Jul 2018:
Vol. 361, Issue 6397, pp. 88-92
DOI: 10.1126/science.aat3628

Ancient migrations in Southeast Asia

The past movements and peopling of Southeast Asia have been poorly represented in ancient DNA studies (see the Perspective by Bellwood). Lipson et al. generated sequences from people inhabiting Southeast Asia from about 1700 to 4100 years ago. Screening of more than a hundred individuals from five sites yielded ancient DNA from 18 individuals. Comparisons with present-day populations suggest two waves of mixing between resident populations. The first mix was between local hunter-gatherers and incoming farmers associated with the Neolithic spreading from South China. A second event resulted in an additional pulse of genetic material from China to Southeast Asia associated with a Bronze Age migration. McColl et al. sequenced 26 ancient genomes from Southeast Asia and Japan spanning from the late Neolithic to the Iron Age. They found that present-day populations are the result of mixing among four ancient populations, including multiple waves of genetic material from more northern East Asian populations.

Science, this issue p. 92, p. 88; see also p. 31


The human occupation history of Southeast Asia (SEA) remains heavily debated. Current evidence suggests that SEA was occupied by Hòabìnhian hunter-gatherers until ~4000 years ago, when farming economies developed and expanded, restricting foraging groups to remote habitats. Some argue that agricultural development was indigenous; others favor the “two-layer” hypothesis that posits a southward expansion of farmers giving rise to present-day Southeast Asian genetic diversity. By sequencing 26 ancient human genomes (25 from SEA, 1 Japanese Jōmon), we show that neither interpretation fits the complexity of Southeast Asian history: Both Hòabìnhian hunter-gatherers and East Asian farmers contributed to current Southeast Asian diversity, with further migrations affecting island SEA and Vietnam. Our results help resolve one of the long-standing controversies in Southeast Asian prehistory.

Anatomically modern humans expanded into Southeast Asia (SEA) at least 65 thousand years (ka) ago (1, 2), leading to the formation of the Hòabìnhian hunter-gatherer tradition first recognized by ~44 ka ago (3, 4). Though Hòabìnhian foragers are considered the ancestors of present-day hunter-gatherers from mainland Southeast Asia (MSEA) (5), the East Asian phenotypic affinities of the majority of present-day Southeast Asian populations suggest that diversity was influenced by later migrations involving rice and millet farmers from the north (4). These observations have generated two competing hypotheses: One states that the Hòabìnhian hunter-gatherers adopted agriculture without substantial external gene flow (6, 7), and the other (the “two-layer” hypothesis) states that farmers from East Asia (EA) replaced the indigenous Hòabìnhian inhabitants ~4 ka ago (8, 9). Studies of present-day populations have not resolved the extent to which migrations from EA affected the genetic makeup of SEA.

Obtaining ancient DNA evidence from SEA is challenging because of poor preservation conditions (10). We thus tested different whole-human-genome capture approaches and found that a modified version of MYbaits Enrichment performed best (11). We applied this method together with standard shotgun sequencing to DNA extracted from human skeletal material from Malaysia, Thailand, the Philippines, Vietnam, Indonesia, Laos, and Japan dating between 0.2 and 8 ka ago (11). We obtained 26 low-coverage ancient whole genomes, including those of a Japanese Ikawazu Jōmon individual and Hòabìnhian hunter-gatherers from Malaysia and Laos, as well as Late Neolithic, Bronze Age, and Iron Age farmers from across SEA (Fig. 1 and table S1) (11). We also sequenced mitochondrial DNA from 16 additional ancient individuals and high-coverage whole genomes from two present-day Jehai individuals from Northern Parak state, West Malaysia (table S3). All samples showed damage patterns typical of ancient DNA and minimal amounts of contamination (table S3) (11).

Fig. 1 Maps of ages and differential ancestry of ancient Southeast Asian genomes.

(A) Estimated mean sample ages for ancient individuals. (B to D) D statistics testing for differential affinity between (B) Papuans and Tiányuán (2240k dataset), (C) Önge and Tiányuán (2240k dataset), and (D) Mlabri and Hàn Chinese (Pan-Asia dataset).

We performed a principal component analysis (PCA) of worldwide present-day populations (12, 13) to find the strongest axes of genetic variation in our data and projected the ancient individuals onto the first two principal components. The two oldest samples, Hòabìnhians from Pha Faen, Laos [La368; 7950 to 7795 calendar years before the present (cal B.P.)] and Gua Cha, Malaysia (Ma911; 4415 to 4160 cal B.P.), henceforth labeled “group 1,” cluster most closely with present-day Önge from the Andaman Islands and away from other East Asian and Southeast Asian populations (Fig. 2), a pattern that differentiates them from all other ancient samples. We used ADMIXTURE (14) and fastNGSadmix (15) to model ancient genomes as mixtures of latent ancestry components (11). Group 1 individuals differ from the other Southeast Asian ancient samples in containing components shared with the supposed descendants of the Hòabìnhians: the Önge and the Jehai (Peninsular Malaysia), along with groups from India and Papua New Guinea.
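The projection step described above can be illustrated with a minimal sketch. It assumes genotypes coded as 0/1/2 allele counts, with NaN marking sites missing in a low-coverage ancient sample; the function names `fit_pca` and `project` are hypothetical, and this is not the pipeline used in the study. The axes are computed from the reference (present-day) samples only, and each ancient sample is then placed on those axes by least squares over the sites it covers.

```python
import numpy as np


def fit_pca(reference, n_components=2):
    """Compute principal axes from a (samples x SNPs) reference genotype matrix."""
    mean = reference.mean(axis=0)
    centered = reference - mean
    # Rows of vt are orthonormal principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def project(samples, mean, axes):
    """Place samples on fixed axes, using only the SNPs observed in each sample."""
    samples = np.atleast_2d(np.asarray(samples, dtype=float))
    coords = np.empty((samples.shape[0], axes.shape[0]))
    for i, row in enumerate(samples):
        observed = ~np.isnan(row)
        # Least-squares fit of the centered genotypes to the axes,
        # restricted to the observed sites.
        design = axes[:, observed].T
        solution = np.linalg.lstsq(design, row[observed] - mean[observed], rcond=None)[0]
        coords[i] = solution
    return coords
```

With complete data the least-squares fit reduces to the ordinary projection onto the axes; with missing sites it avoids pulling sparse samples toward the origin, which is one reason a projection of this kind is a common choice for low-coverage genomes.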

Fig. 2 Exploratory analyses of relationships of ancient Southeast Asian genomes to those of present-day populations.

Ancient samples are projected on the first two components of PCAs for (A) worldwide populations and (B) a subset of populations from EA and SEA. (C) fastNGSadmix plot at K = 13 (11). We refer to the following present-day language-speaking groups in relation to our ancient samples: Austroasiatic (bright green), Austronesian (pink), and Hmong-Mien (dark pink), along with a broad East Asian component (dark green). P.M., proto-Malay; M.N., Malaysian negrito; P.N., Philippines negrito; And. Is., Andaman Islands; NA, not applicable.

We also find a distinctive relationship between the group 1 samples and the Ikawazu Jōmon of Japan (IK002). Outgroup f3 statistics (11, 16) show that group 1 shares the most genetic drift with all ancient mainland samples and Jōmon (fig. S12 and table S4). All other ancient genomes share more drift with present-day East Asian and Southeast Asian populations than with Jōmon (figs. S13 to S19 and tables S4 to S11). This is apparent in the fastNGSadmix analysis when assuming six ancestral components (K = 6) (fig. S11), where the Jōmon sample contains East Asian components and components found in group 1. To detect populations with genetic affinities to Jōmon, relative to present-day Japanese, we computed D statistics of the form D(Japanese, Jōmon; X, Mbuti), setting X to be different present-day and ancient Southeast Asian individuals (table S22). The strongest signal is seen when X = Ma911 and La368 (group 1 individuals), showing a marginally nonsignificant affinity to Jōmon (11). This signal is not observed with X = Papuans or Önge, suggesting that the Jōmon and Hòabìnhians may share group 1 ancestry (11).
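As a rough illustration of the outgroup f3 statistic used above, the sketch below averages, over SNPs, the product of the allele-frequency differences between each test population and the outgroup. It assumes population allele frequencies are already estimated and omits the finite-sample bias correction that dedicated software applies; the function name `outgroup_f3` is hypothetical.

```python
def outgroup_f3(outgroup, pop_a, pop_b):
    """Naive f3(outgroup; A, B) from per-SNP allele frequencies.

    Larger values indicate more genetic drift shared by A and B
    after their split from the outgroup.
    """
    products = [(a - o) * (b - o) for o, a, b in zip(outgroup, pop_a, pop_b)]
    return sum(products) / len(products)
```

Ranking many candidate populations B by f3(outgroup; A, B) for a fixed A is how statements such as "shares the most drift with" are usually read.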

D-statistics of the form D(Papuan, Tiányuán; Y, Mbuti), where Y is a test population, are consistent with present-day East Asian populations and most populations of ancient and present-day SEA being more closely related to Tiányuán [a 40-ka-old East Asian individual (17)] than to Papuans (Fig. 1) (11, 18). However, this D statistic is not significantly different from 0 for Y = Jehai, Önge, Jarawa or group 1 (the ancient Hòabìnhians) (table S12). D statistics of the form D(Önge, Tiányuán; X, Mbuti), where X is Jarawa, Jehai, or group 1, show that these populations share more ancestry with Önge than with Tiányuán (Fig. 1) (11). Using TreeMix and qpGraph (16, 19) to explore admixture graphs that could potentially fit our data, we find that group 1 individuals are best modeled as a sister group to present-day Önge (Fig. 3, and figs. S21 to S23 and S35 to S37). Finally, the Jōmon individual is best-modeled as a mix between a population related to group 1/Önge and a population related to East Asians (Amis), whereas present-day Japanese can be modeled as a mixture of Jōmon and an additional East Asian component (Fig. 3 and fig. S29).
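The D statistics discussed above can be sketched as follows. This is a minimal version assuming per-SNP allele frequencies for four populations and a common frequency-based definition of D(W, X; Y, Z); sign conventions differ between tools, and the equal-sized contiguous blocks used for the jackknife here are a simplification of the genomic blocks used in practice. The function names are hypothetical.

```python
from math import sqrt


def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies.

    Under this convention D is positive when W shares more alleles
    with Y than X does, and near 0 when W and X are symmetric to Y.
    """
    sites = list(zip(w, x, y, z))
    num = sum((wi - xi) * (yi - zi) for wi, xi, yi, zi in sites)
    den = sum((wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
              for wi, xi, yi, zi in sites)
    return num / den


def jackknife_z(w, x, y, z, n_blocks=10):
    """Return (D, Z score) using a delete-one-block jackknife."""
    n_sites = len(w)
    full = d_statistic(w, x, y, z)
    estimates = []
    for b in range(n_blocks):
        start = b * n_sites // n_blocks
        stop = (b + 1) * n_sites // n_blocks
        # Recompute D with one contiguous block of sites left out.
        kept = [pop[:start] + pop[stop:] for pop in (list(w), list(x), list(y), list(z))]
        estimates.append(d_statistic(*kept))
    mean = sum(estimates) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum((e - mean) ** 2 for e in estimates)
    return full, full / sqrt(variance)
```

A statistic "not significantly different from 0" corresponds to a small absolute Z score; a threshold of about 3 is a common convention, stated here as an assumption rather than as the cutoff used in the study.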

Fig. 3 Admixture graphs fitting ancient Southeast Asian genomes.

TreeMix and qpGraph admixture graphs combining present-day populations and selected ancient samples with high single-nucleotide polymorphism coverage (11). (A) A graph including group 1 samples (Ma911 and La368) fits them as sister groups to present-day Önge. (B) A graph including the highest-coverage group 1 (La368) and group 2 (La364, Ma912) samples shows that group 2 receives ancestry from both group 1 and the East Asian branch. (C) Using qpGraph, we modeled present-day East Asians (represented by Amis) as a mixture of an Önge-like population and a population related to the Tiányuán individual. (D) The Jōmon individual is modeled as a mix of Hòabìnhian (La368) and East Asian ancestry.

The remaining ancient individuals are modeled in fastNGSadmix as containing East Asian and Southeast Asian components present in high proportions in present-day Austroasiatic, Austronesian, and Hmong-Mien speakers, along with a broad East Asian component. A PCA including only East Asian and Southeast Asian populations that did not show considerable Papuan or Önge-like ancestry (fig. S11) separates the present-day speakers of ancestral language families in the region: Trans-Himalayan (formerly Sino-Tibetan), Austroasiatic, and Austronesian/Kradai (20). The ancient individuals form five slightly differentiated clusters (groups 2 to 6) (Fig. 2B), in concordance with fastNGSadmix and f3 results (Fig. 2 and figs. S12 to S19) (11).

Group 2 contains late Neolithic and early Bronze Age individuals (4291 to 2184 cal B.P.) from Vietnam, Laos, and the Malay Peninsula who are closely related to present-day Austroasiatic language speakers such as the Mlabri and Htin (Fig. 1) (11). Compared with groups 3 to 6, group 2 individuals lack a broad East Asian ancestry component that is at its highest proportion in northern EA in fastNGSadmix. TreeMix analyses suggest that the two individuals with the highest coverage in group 2 (La364 and Ma912) form a clade resulting from admixture between the ancestors of East Asians and of La368 (Fig. 3 and figs. S24 to S27). This pattern of complex, localized admixture is also evident in the Jehai, fitted as an admixed population between group 2 (Ma912) and the branch leading to present-day Önge and La368 (fig. S28). Consistent with these results, La364 is best modeled as a mixture of a population ancestral to Amis and the group 1/Önge-like population (Fig. 3). The best model for present-day Dai populations is a mixture of group 2 individuals and a pulse of admixture from East Asians (fig. S39).
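The idea of modeling one population as a mixture of two sources can be illustrated with a deliberately simplified sketch. It assumes the target allele frequency at each SNP is a weighted average of two source frequencies and estimates the weight by least squares. This is not the TreeMix or qpGraph procedure, which fit whole graphs to many populations jointly; the function name `mixture_proportion` is hypothetical.

```python
def mixture_proportion(target, source_a, source_b):
    """Least-squares estimate of alpha in target = alpha * A + (1 - alpha) * B.

    All arguments are per-SNP allele frequencies of equal length.
    The sources must differ at one or more SNPs, otherwise alpha is undefined.
    """
    sites = list(zip(target, source_a, source_b))
    num = sum((t - b) * (a - b) for t, a, b in sites)
    den = sum((a - b) ** 2 for _, a, b in sites)
    return num / den
```

In real data, drift since the admixture event and the use of present-day proxies for the true sources both bias such a naive estimate, which is one motivation for the graph-based methods cited in the text.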

Group 6 individuals (1880 to 299 cal B.P.) originate from Malaysia and the Philippines and cluster with present-day Austronesians (11) (Fig. 2). Group 6 also contains Ma554, having the highest amounts of Denisovan-like ancestry relative to the other ancient samples, although we observe little variation in this archaic ancestry in our samples from MSEA (11).

Group 5 (2304 to 1818 cal B.P.) contains two individuals from Indonesia, modeled by fastNGSadmix as a mix of Austronesian- and Austroasiatic-like ancestry, similar to present-day western Indonesians, a finding consistent with their position in the PCA (Fig. 2) (11). Indeed, after Mlabri and Htin, the present-day populations sharing the most drift with group 2 are western Indonesian samples from Bali and Java previously identified as having mainland Southeast Asian ancestry (21) (fig. S13). TreeMix models the group 5 individuals as an admixed population receiving ancestry related to group 2 (figs. S30 and S31) and Amis. Despite the clear relationship with the mainland group 2 seen in all analyses, the small ancestry components in group 5 related to Jehai and Papuans visible in fastNGSadmix may be remnants of ancient Sundaland ancestry. These results suggest that group 2 and group 5 are related to a mainland migration that expanded southward across MSEA by 4 ka ago and into island Southeast Asia (ISEA) by 2 ka ago (22–24). A similar pattern is detected for Ma555 (fig. S33) in Borneo (505 to 326 cal B.P., group 6), although this may be a result of recent gene flow.

Group 3 is composed of several ancient individuals from northern Vietnam (2378 to 2041 cal B.P.) and one individual from Long Long Rak (LLR), Thailand (1691 to 1537 cal B.P.). They cluster in the PCA with the Dai, Amis, and Kradai speakers from Thailand, consistent with an Austro-Tai linguistic phylum, comprising both the Kradai and Austronesian language families (20, 25). Group 4 contains the remaining ancient individuals from LLR in Thailand (1815 to 1570 cal B.P.) and Vt778 from inland Vietnam (2750 to 2500 cal B.P.). These samples cluster with present-day Austroasiatic speakers from Thailand and China, in support of a South China origin for LLR (26). The genetic distinction between Austroasiatic and Kradai speakers is discussed further in (11).

Present-day Southeast Asian populations derive ancestry from at least four ancient populations (Fig. 4). The oldest layer consists of mainland Hòabìnhians (group 1), who share ancestry with present-day Andamanese Önge, Malaysian Jehai, and the ancient Japanese Ikawazu Jōmon. Consistent with the two-layer hypothesis in MSEA, we observe a change in ancestry by ~4 ka ago, supporting a demographic expansion from EA into SEA during the Neolithic transition to farming. However, despite changes in genetic structure coinciding with this transition, evidence of admixture indicates that migrations from EA did not simply replace the previous occupants. Additionally, late Neolithic farmers share ancestry with present-day Austroasiatic-speaking hill tribes, in agreement with the hypotheses of an early Austroasiatic farmer expansion (20). By 2 ka ago, Southeast Asian individuals carried additional East Asian ancestry components absent in the late Neolithic samples, much like present-day populations. One component likely represents the introduction of ancestral Kradai languages in MSEA (11), and another the Austronesian expansion into ISEA reaching Indonesia by 2.1 ka ago and the Philippines by 1.8 ka ago. The evidence described here favors a complex model including a demographic transition in which the original Hòabìnhians admixed with multiple incoming waves of East Asian migration associated with the Austroasiatic, Kradai, and Austronesian language speakers.

Fig. 4 Model for plausible migration routes into SEA.

This schematic is based on ancestry patterns observed in the ancient genomes. Because we do not have ancient samples to accurately resolve how the ancestors of Jōmon and Japanese populations entered the Japanese archipelago, these migrations are represented by dashed arrows. A mainland component in Indonesia is depicted by the dashed red-green line. Gr, group; Kra, Kradai.

Supplementary Materials

Supplementary Text

Figs. S1 to S43

Tables S1 to S25

References (27–111)

References and Notes

  11. See supplementary text.
Acknowledgments: We thank the National High-throughput DNA Sequencing Centre (Copenhagen Denmark) for advice and sequencing of samples, the Duckworth laboratory, University of Cambridge, for access to materials, K. Gregersen for making casts of teeth before sampling, and P. Tacon, ARCHE, Griffith University for assistance with sample transfer. E.W. thanks St. John’s College, University of Cambridge, for providing an inspiring environment for scientific thought. Funding: This work was supported by the Lundbeck Foundation, the Danish National Research Foundation, and the KU2016 program. H.Mc. is supported by the University of Adelaide’s George Murray Scholarship. R.S. thanks the Thailand Research Fund (TRF) for support (grants RTA6080001 and RDG55H0006). The excavation of the Jōmon individual was supported by a Grant-in-Aid for Scientific Research (B) (25284157) to Y.Y. The Jōmon genome project was organized by H.I.; as well as T.H. and H.O., who were supported by MEXT KAKENHI grants 16H06408 and 17H05132; and a Grants-in-Aid for Challenging Exploratory Research (23657167) and for Scientific Research (B) (17H03738). The Jōmon genome sequencing was supported by JSPS KAKENHI grant 16H06279 to A.T. and partly supported by the CHOZEN project in Kanazawa University and the Cooperative Research Project Program of the Medical Institute of Bioregulation, Kyushu University. Computations for the Jōmon genome were partially performed on the NIG supercomputer at ROIS National Institute of Genetics. M.M.L. is supported by the ERC award 295907. D.M.L. was supported by ARC grants LP120200144, LP150100583, and DP170101313. A.P. is supported by Leverhulme Project Research grant RPG-2016-235. M.E.P. acknowledges the Cardio-Metabolic research cluster at Jeffrey Cheah School of Medicine & Health Sciences, TMB research platform, Monash University Malaysia, and MOSTI Malaysia for research grant 100-RM1/BIOTEK 16/6/2B. A.S.M. was financed by the European Research Council (starting grant) and the Swiss National Science Foundation. Author contributions: E.W. initiated and led the study. E.W., D.M.L., L.V., M.E.A., H.O., M.E.D., A.S.M., L.O., H.Mc., and F.D. designed the study. E.W. and D.M.L. supervised the overall project, and L.V., F.D., F.R., V.S., T.S., M.M.S., R.S., T.M.H.N., C.H., K.W., E.P.E., J.C.G., R.K., H.B., C.P., H.I., T.H., M.E.D., F.A.A., A.S.M., and H.O. supervised specific aspects of the project. H.Mc., L.V., F.D., U.G.W., C.D., M.E.A., V.S., T.S., M.M.S., R.S., S.K., P.L., T.M.H.N., H.C.H., T.M.T., T.H.N., S.S., G.H.N., K.W., N.S., T.M., Y.Y., A.M.B., P.D., J.L.P., L.S., E.P.E., N.A.T., B.B.P., J.C.G., R.K., H.B., M.E.D., F.A.A., and C.P. excavated, curated, sampled, and/or described samples. H.Mc., L.V., T.G., A.S.O., S.W., P.B.D., M.Y., A.Ta., H.S., A.To., S.R., T.D., M.E.D., F.A.A., A.S.M., and T.H. produced data for analysis. H.Mc., F.R., L.V., T.G., J.V.M.M., C.D., T.K, T.S., H.Ma., S.N., S.W., A.M., A.S.M., M.E.D., L.O., and M.S. analyzed or assisted in the analysis of data. H.Mc., F.R., L.V., F.D., T.G., A.M., L.O., M.S., C.H., D.M.L., and E.W. interpreted the results. H.Mc., F.R., L.V., F.D., T.G., H.O., M.M.L., R.A.F., C.H., D.M.L., and E.W. wrote the manuscript with considerable input from J.V.M.M., C.D., S.W., G.V.D., A.P., V.S., T.S., M.M.S., R.S., T.M.H.N., H.C.H., T.H.N., K.W., T.H., S.N., and M.S. All authors discussed the results and contributed to the final manuscript. Competing interests: The authors declare no competing interests. Data and materials availability: This study has been evaluated by the Danish Bioethical Committee (H-16018872) and the Department of Orang Asli Affairs, Malaysia [JHEOA.PP.30.052 Iss.5 (17)]. MoU’s exist with local institutions where the sampling took place. Genomic data are available for download at the ENA (European Nucleotide Archive) with accession number PRJEB26721.

Correction (23 August 2018): The names of book editors A. Sanchez-Mazas and L. Sagart were misspelled in references 7 and 25, respectively. These errors have been corrected.
