The genomic history of the Iberian Peninsula over the past 8000 years


Science 15 Mar 2019:
Vol. 363, Issue 6432, pp. 1230-1234
DOI: 10.1126/science.aav4040

Genomics of the Iberian Peninsula

Ancient DNA studies have begun to help us understand the genetic history and movements of people across the globe. Focusing on the Iberian Peninsula, Olalde et al. report genome-wide data from 271 ancient individuals from Iberia (see the Perspective by Vander Linden). The findings provide a comprehensive genetic time transect of the region. Genetic analysis of archaeological human remains dating from about 7000 years ago to the present, considered alongside the distribution of languages, elucidates the genetic impact of prehistoric and historic migrations from Europe and North Africa.

Science, this issue p. 1230; see also p. 1153


We assembled genome-wide data from 271 ancient Iberians, of whom 176 are from the largely unsampled period after 2000 BCE, thereby providing a high-resolution time transect of the Iberian Peninsula. We document high genetic substructure between northwestern and southeastern hunter-gatherers before the spread of farming. We reveal sporadic contacts between Iberia and North Africa by ~2500 BCE and, by ~2000 BCE, the replacement of 40% of Iberia’s ancestry and nearly 100% of its Y-chromosomes by people with Steppe ancestry. We show that, in the Iron Age, Steppe ancestry had spread not only into Indo-European–speaking regions but also into non-Indo-European–speaking ones, and we reveal that present-day Basques are best described as a typical Iron Age population without the admixture events that later affected the rest of Iberia. Additionally, we document how, beginning at least in the Roman period, the ancestry of the peninsula was transformed by gene flow from North Africa and the eastern Mediterranean.

The Iberian Peninsula, lying at the extreme southwestern corner of Europe, provides an excellent context in which to assess the final impact of population movements entering the continent from the east as well as interactions with North Africa. To study the genetic impact of prehistoric and historic events in Iberia, we prepared next-generation sequencing libraries treated with uracil-DNA glycosylase (UDG) (1) and enriched them for ~1.2 million single-nucleotide polymorphisms (SNPs) (2, 3) to generate genome-wide data from 4 Mesolithic, 44 Neolithic, 47 Copper Age, 53 Bronze Age, 24 Iron Age, and 99 historical-period Iberians (Fig. 1, A and B, and tables S1 and S2). We also generated 26 radiocarbon dates (table S3). We co-analyzed the new genomic data with previously reported data from 1107 ancient individuals, including 132 from Iberia (Fig. 1B) (2, 4–9), and 2862 present-day individuals (10). We excluded from the analysis datasets individuals covered at fewer than 10,000 SNPs, individuals with evidence of contamination, and first-degree relatives of others in the dataset (table S1). We analyzed the data with principal components analysis (PCA) (Fig. 1, C and D), f-statistics (11), and qpAdm (12) and summarize the results in Fig. 1E. We confirmed the robustness of key findings by repeating analyses after removing SNPs in CpG dinucleotides (table S5), which are susceptible to cytosine-to-thymine errors even in UDG-treated libraries (1).
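The CpG robustness check rests on a simple positional filter. Methylated cytosines in a CpG context deaminate directly to thymine rather than to uracil, so UDG treatment cannot remove that damage. The Python sketch below flags SNPs whose reference base lies in a CpG dinucleotide on either strand; the function names and the toy reference string are hypothetical, and a real pipeline would read the reference genome and the ~1.2 million SNP panel from files rather than from in-memory strings.

```python
def in_cpg(ref_seq: str, pos: int) -> bool:
    """Return True if 0-based position `pos` of `ref_seq` is part of a CpG dinucleotide.

    A C followed by G counts, and so does a G preceded by C: that G is the
    C of the same dinucleotide read on the opposite strand.
    """
    base = ref_seq[pos].upper()
    prev = ref_seq[pos - 1].upper() if pos > 0 else ""
    nxt = ref_seq[pos + 1].upper() if pos + 1 < len(ref_seq) else ""
    return (base == "C" and nxt == "G") or (base == "G" and prev == "C")


def drop_cpg_snps(ref_seq: str, snp_positions: list[int]) -> list[int]:
    """Keep only the SNP positions that do not fall in a CpG context."""
    return [p for p in snp_positions if not in_cpg(ref_seq, p)]


if __name__ == "__main__":
    toy_reference = "ACGTTGCAACGG"  # hypothetical reference fragment
    print(drop_cpg_snps(toy_reference, [0, 1, 2, 5, 9, 10, 11]))  # [0, 5, 11]
```

Filtering by reference context alone is the conservative choice: it discards every SNP that could be affected, whatever the observed alleles.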

Fig. 1

Overview of the ancient Iberian genetic time transect. (A) Geographic distribution and (B) dates of new and previously reported samples. Random jitter is added for sites with multiple individuals. Sites mentioned in the text are labeled. (C) PCA of 989 present-day west Eurasian individuals (gray dots), with ancient individuals from Iberia and other regions (pale yellow) projected onto the first two principal components. (D) Section of the PCA in (C) marked with the dashed box. (E) Schematic representation of events documented in this study.
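Projecting ancient individuals onto axes computed from present-day genomes keeps sparse, low-coverage ancient data from distorting the principal components. The sketch below, in Python with NumPy, fits PCA on a complete present-day genotype matrix and then places each ancient sample by least squares using only the SNPs it actually covers, in the spirit of the lsqproject option of EIGENSOFT smartpca. The function names and toy data are hypothetical, and the sketch omits refinements such as per-SNP normalization and linkage pruning.

```python
import numpy as np


def fit_pca(modern: np.ndarray, k: int = 2):
    """Fit PCA on an (individuals x SNPs) genotype matrix with no missing values.

    Returns the per-SNP means and the top-k SNP loadings (k x SNPs, orthonormal rows).
    """
    mean = modern.mean(axis=0)
    _, _, vt = np.linalg.svd(modern - mean, full_matrices=False)
    return mean, vt[:k]


def project(sample: np.ndarray, mean: np.ndarray, loadings: np.ndarray) -> np.ndarray:
    """Least-squares PC coordinates for one sample, using only non-missing SNPs (NaN = missing)."""
    observed = ~np.isnan(sample)
    design = loadings[:, observed].T  # (observed SNPs x k)
    centered = sample[observed] - mean[observed]
    coords, *_ = np.linalg.lstsq(design, centered, rcond=None)
    return coords


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    modern = rng.integers(0, 3, size=(50, 400)).astype(float)  # toy genotypes coded 0/1/2
    mean, loadings = fit_pca(modern, k=2)
    ancient = modern[0].copy()
    ancient[rng.random(400) < 0.7] = np.nan  # pretend ~70% of SNPs are uncovered
    print(project(ancient, mean, loadings))
```

Solving only over observed SNPs is what lets an individual covered at a small fraction of the panel still land at a meaningful position in the plot.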

Previous knowledge of the genetic structure of Mesolithic Iberia comes from three individuals from the northwest: LaBraña1 (2), Canes1 (5), and Chan (5). We add LaBraña2, who was a brother of the previously reported LaBraña1 (figs. S1 and S2 and table S6), as well as individuals from Cueva de la Carigüela (fig. S10), Cingle del Mas Nou, and Cueva de la Cocina in the southeast. In northwest Iberia, we document a previously unappreciated ancestry shift before the arrival of farming (Fig. 2A, fig. S5, and table S7). The oldest individual, Chan, was similar to the ~19,000-year-old El Mirón, whereas the La Braña brothers from ~1300 years later were closer to central European hunter-gatherers like the Hungarian KO1, with an even more extreme shift ~700 years later in Canes1. This likely reflects gene flow affecting northwest Iberia but not the southeast, where individuals remained close to El Mirón (Fig. 2A). More data from the Mesolithic period, especially from currently unsampled areas, would provide additional insight into the geographical impact and archaeological correlates of this ancestry shift.
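Close kinship such as that between LaBraña1 and LaBraña2 is commonly detected in low-coverage ancient DNA from the pairwise mismatch rate (PMR) of pseudo-haploid calls, as in the READ approach. The article does not describe its method here, so the following Python sketch is illustrative only and assumes that approach: with one randomly sampled allele per site, the expected PMR relative to an unrelated baseline is 1/2 for the same individual, 3/4 for first-degree relatives, 7/8 for second-degree relatives, and 15/16 for third-degree relatives. The cutoffs are hypothetical midpoints, and PMR alone cannot distinguish siblings from parent and offspring.

```python
import numpy as np


def pairwise_mismatch_rate(a: np.ndarray, b: np.ndarray) -> float:
    """Fraction of SNPs covered in both individuals at which their pseudo-haploid calls differ.

    Calls are coded 0/1 (which allele was sampled); NaN marks an uncovered SNP.
    """
    both = ~np.isnan(a) & ~np.isnan(b)
    return float(np.mean(a[both] != b[both]))


def kinship_class(pmr: float, unrelated_baseline: float) -> str:
    """Classify a pair from its PMR divided by the PMR typical of unrelated pairs.

    Expected ratios: 1/2 same individual, 3/4 first-degree, 7/8 second-degree,
    15/16 third-degree. Cutoffs (hypothetical) sit midway between adjacent classes.
    """
    ratio = pmr / unrelated_baseline
    if ratio < 0.625:
        return "same individual or identical twins"
    if ratio < 0.8125:
        return "first-degree"
    if ratio < 0.90625:
        return "second-degree"
    return "unrelated or more distant"
```

For example, with invented numbers, `kinship_class(0.18, 0.24)` returns `"first-degree"` because the ratio is 0.75; distinguishing brothers from father and son would then require additional evidence such as age at death, dating, or uniparental markers.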

Fig. 2 Genome-wide admixture proportions using qpAdm.

(A) Modeling Mesolithic, Neolithic, and Copper Age populations as a mixture of Anatolian Neolithic, El Mirón, and KO1. Percentages indicate the proportion of El Mirón + KO1 ancestry. (B) Proportion of ancestry derived from central European Beaker/Bronze Age populations in Iberians from the Middle Neolithic to the Iron Age (table S15). Colors indicate the Y-chromosome haplogroup for each male (table S4). (C) Ancestry proportions for individuals from three sites in northeast Iberia dated between the 6th and 12th centuries CE; n is the number of individuals analyzed at each site. (D) Ancestry proportions for individuals from southeast Iberia from the 3rd to 16th centuries CE (tables S20 and S21). Each bar represents one individual, with the associated mtDNA (top) and Y-chromosome (bottom) haplogroups. Haplogroups with a likely recent nonlocal origin are shown in bold.

For the Neolithic and Copper Age, we model populations as mixtures of groups related to Anatolian Neolithic, El Mirón, and KO1 (Fig. 2A and table S8). We replicate previous findings of the arrival of Anatolian Neolithic–associated ancestry in multiple regions of Iberia in the Early Neolithic (7, 8, 12); however, sampling from this period remains limited, and studies with larger sample sizes and additional sites will be important to shed further light on the interaction between incoming farmers and indigenous hunter-gatherers. For the Middle Neolithic and Copper Age, we reproduce previous reports of an increase in hunter-gatherer–related ancestry after 4000 BCE (6, 7, 12, 13), with higher proportions in groups from the north and center. Using our observations about population substructure in the Mesolithic as a reference frame, we show that the hunter-gatherer–related ancestry during those periods was more closely related to later northwestern (Canes1-like) hunter-gatherers than to El Mirón–like hunter-gatherers (Fig. 2A), providing clues about the source of this ancestry.
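Modeling a population as a mixture of source groups rests on the linearity of f4-statistics: if a target's allele frequencies are a weighted average of its sources' frequencies (weights summing to 1), then f4(Target, S1; R0, Rj) = Σ_{i≥2} w_i f4(Si, S1; R0, Rj) for every outgroup ("right") population Rj. The Python sketch below solves these equations by ordinary least squares. It is a simplified illustration of the idea behind qpAdm, not its implementation: qpAdm also weights the fit by a block-jackknife covariance matrix and tests whether the model fits, both omitted here, and the allele frequencies are simulated.

```python
import numpy as np


def f4(a, b, c, d) -> float:
    """f4(A, B; C, D) from allele-frequency vectors: mean over SNPs of (a - b)(c - d)."""
    return float(np.mean((a - b) * (c - d)))


def mixture_weights(target, sources, rights) -> np.ndarray:
    """Estimate weights w (summing to 1) such that target ~ sum_i w_i * sources[i].

    Uses f4(T, S1; R0, Rj) = sum_{i>=2} w_i * f4(Si, S1; R0, Rj) for j = 1..m,
    solved by ordinary least squares; w_1 is then 1 minus the other weights.
    Needs at least two sources and at least as many right populations as sources.
    """
    s1, others = sources[0], sources[1:]
    r0, rest = rights[0], rights[1:]
    design = np.array([[f4(si, s1, r0, rj) for si in others] for rj in rest])
    response = np.array([f4(target, s1, r0, rj) for rj in rest])
    w_others, *_ = np.linalg.lstsq(design, response, rcond=None)
    return np.concatenate([[1.0 - w_others.sum()], w_others])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 20_000
    ancestral = rng.uniform(0.1, 0.9, n)
    # Simulated right (outgroup) populations drifted from a common ancestor.
    rights = [np.clip(ancestral + rng.normal(0, 0.1, n), 0, 1) for _ in range(5)]
    # Each simulated source shares drift with a different right population.
    sources = [np.clip(0.6 * rights[i + 1] + 0.4 * ancestral + rng.normal(0, 0.05, n), 0, 1)
               for i in range(3)]
    true_w = np.array([0.5, 0.3, 0.2])
    target = sum(w * s for w, s in zip(true_w, sources))
    print(mixture_weights(target, sources, rights).round(3))  # [0.5 0.3 0.2]
```

The right populations matter because they must be differentially related to the sources; in this toy setup each source shares drift with a different outgroup, which is what makes the weights identifiable.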

Our Copper Age dataset includes a newly reported male (I4246) from Camino de las Yeseras (14) in central Iberia, radiocarbon dated to 2473–2030 calibrated years BCE, who clusters with modern and ancient North Africans in the PCA (Fig. 1C and fig. S3) and, like ~3000 BCE Moroccans (8), can be well modeled as having ancestry from both Late Pleistocene North Africans (15) and Early Neolithic Europeans (tables S9 and S10). His genome-wide ancestry and uniparental markers (tables S1 and S4) are unique among Copper Age Iberians, including individuals from sites with many analyzed individuals such as Sima del Ángel, and point to a North African origin. Our genetic evidence of sporadic contacts with North Africa during the Copper Age fits with the presence of African ivory at Iberian sites (16) and is further supported by a Bronze Age individual (I7162) from Loma del Puerco in southern Iberia who had 25% ancestry related to individuals like I4246 (Fig. 1D and table S16). However, these early movements from North Africa had a limited impact on Copper and Bronze Age Iberians, as North African ancestry only became widespread in the past ~2000 years.

From the Bronze Age (~2200–900 BCE), we increase the available dataset (6, 7, 17) from 7 to 60 individuals and show how ancestry from the Pontic-Caspian steppe (Steppe ancestry) appeared throughout Iberia in this period (Fig. 1, C and D), albeit with less impact in the south (table S13). The earliest evidence is in 14 individuals dated to ~2500–2000 BCE who coexisted with local people without Steppe ancestry (Fig. 2B). These groups lived in close proximity and admixed to form the Bronze Age population after 2000 BCE with ~40% ancestry from incoming groups (Fig. 2B and fig. S6). Y-chromosome turnover was even more pronounced (Fig. 2B), as the lineages common in Copper Age Iberia (I2, G2, and H) were almost completely replaced by one lineage, R1b-M269. These patterns point to a higher contribution of incoming males than females, which is also supported by a lower proportion of nonlocal ancestry on the X-chromosome (table S14 and fig. S7). The pattern is exemplified by a Bronze Age tomb from Castillejo del Bonete containing a male with Steppe ancestry and a female with ancestry similar to Copper Age Iberians. Although ancient DNA can document that sex-biased admixture occurred, archaeological and anthropological research will be needed to understand the processes that generated it.
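The X-chromosome comparison works because females carry two X chromosomes and males carry one, so X-linked ancestry over-represents the female contribution. Under a deliberately simple single-pulse model (an assumption for illustration, not the model used in the article), with male and female contributions m and f from the incoming group, the autosomal proportion is A = (m + f)/2 and the X-chromosome proportion is X = (m + 2f)/3. Solving gives f = 3X − 2A and m = 4A − 3X, sketched below in Python with invented input values.

```python
def sex_specific_contributions(autosomal: float, x_chrom: float) -> tuple[float, float]:
    """Return (male, female) contributions of an incoming group under a single-pulse model.

    Solves A = (m + f) / 2 and X = (m + 2f) / 3 for m and f. Values outside [0, 1]
    signal that this simple model does not fit (or that the inputs are noisy).
    """
    female = 3 * x_chrom - 2 * autosomal
    male = 4 * autosomal - 3 * x_chrom
    return male, female


if __name__ == "__main__":
    # Hypothetical inputs: 40% incoming ancestry on autosomes, 30% on the X chromosome.
    male, female = sex_specific_contributions(0.40, 0.30)
    print(f"male {male:.2f}, female {female:.2f}")  # male 0.70, female 0.10
```

With equal inputs (A = X) the solution gives m = f, so a lower X proportion than autosomal proportion is the signature of a male-biased contribution.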

For the Iron Age, we document a consistent trend of increased ancestry related to Northern and Central European populations with respect to the preceding Bronze Age (Figs. 1, C and D, and 2B). The increase was 10 to 19% (95% confidence intervals given here and in the percentages that follow) in 15 individuals along the Mediterranean coast where non-Indo-European Iberian languages were spoken; 11 to 31% in two individuals at the Tartessian site of La Angorrilla in the southwest with uncertain language attribution; and 28 to 43% in three individuals at La Hoya in the north where Indo-European Celtiberian languages were likely spoken (fig. S6 and tables S11 and S12). This trend documents gene flow into Iberia during the Late Bronze Age or Early Iron Age, possibly associated with the introduction of the Urnfield tradition (18). Unlike in Central or Northern Europe, where Steppe ancestry likely marked the introduction of Indo-European languages (12), our results indicate that, in Iberia, increases in Steppe ancestry were not always accompanied by switches to Indo-European languages. This is consistent with the genetic profile of present-day Basques, who speak the only non-Indo-European language in Western Europe but overlap genetically with Iron Age populations showing substantial levels of Steppe ancestry (Fig. 1D).
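Confidence intervals like those quoted above are typically obtained in ancient DNA studies with a block jackknife over the genome, which accounts for correlation between nearby SNPs. The Python sketch below is a minimal version under stated assumptions: it splits SNPs into contiguous blocks of (nearly) equal size and uses an unweighted delete-one-block jackknife, whereas real analyses usually define blocks by genetic distance and use a weighted jackknife; the statistic is taken to be a simple mean of per-SNP values, such as an f4-statistic, and the 95% interval is the estimate ± 1.96 standard errors.

```python
import numpy as np


def jackknife_ci(per_snp_values, n_blocks: int = 100, z: float = 1.96):
    """Mean of per-SNP values with a delete-one-block jackknife confidence interval.

    Assumes values are ordered along the genome and split into `n_blocks`
    contiguous, nearly equal-size blocks; z = 1.96 gives an approximate 95% CI.
    Returns (estimate, lower, upper).
    """
    x = np.asarray(per_snp_values, dtype=float)
    blocks = np.array_split(x, n_blocks)
    estimate = x.mean()
    # Recompute the statistic with each block left out in turn.
    leave_one_out = np.array(
        [np.concatenate(blocks[:i] + blocks[i + 1:]).mean() for i in range(n_blocks)]
    )
    variance = (n_blocks - 1) / n_blocks * np.sum((leave_one_out - leave_one_out.mean()) ** 2)
    se = float(np.sqrt(variance))
    return float(estimate), float(estimate - z * se), float(estimate + z * se)
```

Resampling whole blocks rather than single SNPs is the key design choice: linked SNPs are not independent, and treating them as such would make the intervals too narrow.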

In the historical period, our transect begins with 24 individuals from the 5th century BCE to the 6th century CE from the Greek colony of Empúries in the northeast (19) who fall into two main ancestry groups (Fig. 1, C and D, and fig. S8): one similar to Bronze Age individuals from the Aegean, and the other similar to Iron Age Iberians such as those from the nearby non-Greek site of Ullastret, confirming historical sources indicating that this town was inhabited by a multiethnic population (19). The impact of mobility from the central/eastern Mediterranean during the Classical period is also evident in 10 individuals from the 7th to 8th century CE site of L'Esquerda in the northeast, who show a shift from the Iron Age population in the direction of present-day Italians and Greeks (Fig. 1D) that accounts for approximately one-quarter of their ancestry (Fig. 2C and table S17). The same shift is also observed in present-day Iberians outside the Basque area and is plausibly a consequence of the Roman presence in the peninsula, which had a profound cultural impact and, according to our data, a substantial genetic impact too.

In contrast to the demographic changes in the Classical period, movements into Iberia during the decline of the Roman Empire had less long-term demographic impact. Nevertheless, individual sites—for example, the 6th century site of Pla de l'Horta in the northeast—bear witness to events in this period. These individuals, archaeologically interpreted as Visigoths, are shifted from those at L'Esquerda in the direction of Northern and Central Europe (Figs. 1D and 2C and table S18), and we observe the Asian mitochondrial DNA (mtDNA) haplogroup C4a1a also found in Early Medieval Bavaria (20), supporting a recent link to groups with ancestry originally derived from Central and Eastern Europe.

In the southeast, we recovered genomic data from 45 individuals dated between the 3rd and 16th centuries CE. All analyzed individuals fell outside the genetic variation of preceding Iberian Iron Age populations (Fig. 1, C and D, and fig. S3) and harbored ancestry from both Southern European and North African populations (Fig. 2D), as well as additional Levantine-related ancestry that could potentially reflect ancestry from Jewish groups (21). These results demonstrate that by the Roman period, southern Iberia had experienced a major influx of North African ancestry, probably related to the well-known mobility patterns during the Roman Empire (22) or to the earlier Phoenician-Punic presence (23); the latter is also supported by the observation of the Phoenician-associated Y-chromosome haplogroup J2 (24). Gene flow from North Africa continued into the Muslim period, as is clear from Muslim burials with elevated North African and sub-Saharan African ancestry (Fig. 2D, fig. S4, and table S22) and from uniparental markers typical of North Africa not present among pre-Islamic individuals (Fig. 2D and fig. S11). Present-day populations from southern Iberia harbor less North African ancestry (25) than the ancient Muslim burials, plausibly reflecting the expulsion of moriscos (former Muslims converted to Christianity) and repopulation from the north, as supported by historical sources and genetic analysis of present-day groups (25). The impact of Muslim rule is also evident in northeast Iberia in seven individuals from Sant Julià de Ramis from the 8th to 12th centuries CE who, unlike previous ancient individuals from the same region, show North African–related ancestry (Fig. 2C and table S19) and a complete overlap in PCA with present-day Iberians (Fig. 1D).

Our time transect allowed us to track frequency changes of phenotypically important variants over the past 4000 years (fig. S9), a period that has been minimally sampled in the ancient DNA literature not just in Iberia but in Europe more generally. Before this work, it was known that the lactase persistence allele at rs4988235, which is present at moderate or high frequencies in most European populations today and is one of the strongest known signals of selection in Europeans (26), occurred at extremely low frequencies in Europe through the Bronze Age (2), raising the question of when it became common. Here we show that in Iberia, the allele continued to occur at low frequency in the Iron Age (fig. S9) and only approached present-day frequencies in the past 2000 years, pointing to recent strong selection.
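The inference of strong recent selection follows from how fast an allele frequency can rise. Under a simple deterministic model of genic selection with relative fitness 1 + s, the log-odds of the allele frequency increase by ln(1 + s) per generation, so s = exp((logit p_t − logit p_0)/t) − 1. The Python sketch below applies this formula and checks it by forward iteration. The starting and ending frequencies, the 2000-year interval, and the 28-year generation time are hypothetical inputs rather than values from this study, and the model ignores dominance (lactase persistence is a dominant trait), drift, and migration.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a frequency strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def genic_selection_coefficient(p_start: float, p_end: float, generations: float) -> float:
    """Per-generation s that moves frequency p_start to p_end under genic selection."""
    return math.exp((logit(p_end) - logit(p_start)) / generations) - 1.0


def simulate(p: float, s: float, generations: int) -> float:
    """Iterate the deterministic recursion p' = p(1 + s) / (1 + p s)."""
    for _ in range(generations):
        p = p * (1.0 + s) / (1.0 + p * s)
    return p


if __name__ == "__main__":
    gens = round(2000 / 28)  # hypothetical: 2000 years at 28 years per generation -> 71
    s = genic_selection_coefficient(0.1, 0.5, gens)  # hypothetical frequencies
    print(f"s per generation: {s:.3f}")
    print(f"check by simulation: {simulate(0.1, s, gens):.3f}")  # 0.500
```

Even under these simplifications, the calculation shows why a rise from low to near present-day frequency within about 70 generations implies selection coefficients of a few percent per generation, which is strong by the standards of human genetics.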

Beyond the specific insights about Iberia, this study serves as a model for how a high-resolution ancient DNA transect continuing into historical periods can be used to provide a detailed description of the formation of present-day populations (Fig. 1E); future application of similar strategies will provide equally valuable insights in other world regions.

Supplementary Materials

Supplementary Text

Figs. S1 to S11

Tables S1 to S22

References (27–189)

Genotype Dataset

References and Notes

Acknowledgments: We thank I. Mathieson, M. Lipson, I. Lazaridis, J. Sedig, and K. Sirak for discussions, and M. E. Allentoft, K.-G. Sjögren, K. Kristiansen, and E. Willerslev for facilitating sample collection. We thank M. Meyer for sharing the optimized oligo sequences for single-stranded library preparation. We thank the different museums (listed in the supplementary materials) for permission to study archaeological remains. Funding: J.M.F., F.J.L.-C., J.I.M., F.X.O., J.D., and M.S.B. were supported by HAR2017-86509-P, HAR2017-87695-P, and SGR2017-11 from the Generalitat de Catalunya, AGAUR agency. C.L.-F. was supported by Obra Social La Caixa and by FEDER-MINECO (BFU2015-64699-P). L.B.d.L.E. was supported by REDISCO-HAR2017-88035-P (Plan Nacional I+D+I, MINECO). C.L., P.R., and C.Bl. were supported by MINECO (HAR2016-77600-P). A.Esp., J.V.-V., G.D., and D.C.S.-G. were supported by MINECO (HAR2009-10105 and HAR2013-43851-P). D.J.K. and B.J.C. were supported by NSF BCS-1460367. K.T.L., A.W., and J.M. were supported by NSF BCS-1153568. J.F.-E. and J.A.M.-A. were supported by IT622-13 Gobierno Vasco, Diputación Foral de Álava, and Diputación Foral de Gipuzkoa. We acknowledge support from the Portuguese Foundation for Science and Technology (PTDC/EPH-ARQ/4164/2014) and the FEDER-COMPETE 2020 project 016899. P.S. was supported by the FCT Investigator Program (IF/01641/2013), FCT IP, and ERDF (COMPETE2020 – POCI). M.Si. and K.D. were supported by a Leverhulme Trust Doctoral Scholarship awarded to M.B.R. and M.P. D.R. was supported by an Allen Discovery Center grant from the Paul Allen Foundation, NIH grant GM100233, and the Howard Hughes Medical Institute. V.V.-M. and W.H. were supported by the Max Planck Society. Author contributions: N.R., N.A., N.B., O.C., B.J.C., D.F., A.M.L., M.M., J.O., K.S., Z.Z., M.Si., K.D., C.J.E., D.J.K., M.B.R., W.H., R.P., and D.R. performed or supervised laboratory work.
J.M.J.A., I.J.T.M., D.C.S.-G., P.C., M.Sa., J.T., M.L., J.F.-E., J.A.M.-A., C.Ba., F.J.B., J.B., N.C., E.V.M., D.V., A.C., J.M.F., O.G.-P., J.I.M., F.X.O., J.M.V., A.D.-C., I.O.-C., P.G.B., A.M.S., C.A.-F., J.J.E., A.M.-M., P.R.-G., J.R.M., E.V.V., K.T.L., J.M., A.W., G.D., B.A., F.C., A.Esp., G.d.P., A.Est., C.F., G.F., S.F., F.G.-G., T.M., A.R., J.V.-V., G.A.A., V.B.G., L.B.d.L.E., M.B.S., G.G.A., M.S.H.P., A.L., Y.C.M., I.C.B., A.F.F., D.L.-S., M.S.T., A.C.V., C.Bl., J.D., M.J.d.P.M., A.A.D.-C., R.F.F., J.F.F., R.G.-P., V.S.G., E.G.-D., A.M.H.-C., J.J.-C., C.L., F.J.L.-C., D.L.-R., S.B.M., M.M.P., A.O.F., G.P.B., P.R., M.S.B., A.C.S., J.M.V.E., M.Si., M.B.R., K.W.A., W.H., R.P., C.L.-F., and D.R. assembled archaeological material. I.O., S.M., N.P., M.F.-B., V.V.-M., M.Si., C.J.E., F.G., M.P., P.S., and D.R. analyzed data. I.O., C.L.-F., and D.R. wrote the manuscript. Competing interests: The authors declare no competing interests. Data and materials availability: Sequencing data are available from the European Nucleotide Archive, accession PRJEB30874; genotype dataset is available as supplementary material.
