Resurrecting Surviving Neandertal Lineages from Modern Human Genomes


Science  28 Feb 2014:
Vol. 343, Issue 6174, pp. 1017-1021
DOI: 10.1126/science.1245938

Neandertal Shadows in Us

Non-African modern humans carry a remnant of Neandertal DNA from interbreeding events that have been postulated to have occurred as humans migrated out of Africa. While the total amount of Neandertal sequence is estimated to be less than 3% of the modern genome, the specific retained sequences vary among individuals. Analyzing the genomes of more than 600 Europeans and East Asians, Vernot and Akey (p. 1017, published online 29 January) identified Neandertal sequences within modern humans that taken together span approximately 20% of the Neandertal genome. Some Neandertal-derived sequences appear to be under positive selection in humans, including several genes associated with skin phenotypes.


Anatomically modern humans overlapped and mated with Neandertals such that non-African humans inherit ~1 to 3% of their genomes from Neandertal ancestors. We identified Neandertal lineages that persist in the DNA of modern humans, in whole-genome sequences from 379 European and 286 East Asian individuals, recovering more than 15 gigabases of introgressed sequence that spans ~20% of the Neandertal genome (false discovery rate = 5%). Analyses of surviving archaic lineages suggest that there were fitness costs to hybridization, admixture occurred both before and after divergence of non-African modern humans, and Neandertals were a source of adaptive variation for loci involved in skin phenotypes. Our results provide a new avenue for paleogenomics studies, allowing substantial amounts of population-level DNA sequence information to be obtained from extinct groups, even in the absence of fossilized remains.

Hybridization between closely related species, and the concomitant transfer or introgression of DNA, is widespread in nature (1, 2). In hominin evolution, the sequencing of Neandertals (3) and their sister lineage, Denisovans (4, 5), provided evidence for introgression of these lineages into modern humans. Specifically, ~1 to 3% of each non-African human genome is estimated to have been inherited from Neandertals (3, 5). Although initial inferences of introgression between Neandertals and humans may not have been robust to alternative explanations, most notably archaic population structure (3, 6), subsequent analyses have provided evidence for gene flow (7–9).

We hypothesized that a substantial amount of the Neandertal genome may be recovered from the analysis of contemporary humans despite the limited amounts of admixture, as introgressed sequences may vary among individuals (Fig. 1A). Coalescent simulations for a broad range of admixture models suggest that 35 to 70% of the Neandertal genome persists in the DNA of present-day humans (figs. S1 and S2) (10). By identifying Neandertal sequences from a large sample of modern humans, we hope to discover surviving lineages that may come from multiple archaic ancestors (Fig. 1A), allowing for the recovery of population-level data.

Fig. 1 Recovering Neandertal lineages from the DNA of modern humans.

(A) Schematic representation illustrating that low levels of introgression may facilitate the recovery of substantial amounts of archaic sequence. Lines represent DNA from contemporary individuals, and colored boxes indicate archaic sequences. Different colored boxes represent sequences inherited from distinct archaic ancestors. (B) Genealogies of loci in Europeans and Africans in the presence of introgression. The expected signature of an introgressed lineage (blue) that our method exploits is high levels of divergence that persists over relatively long haplotype blocks. (C) Receiver operator curve (red) illustrating the performance of S* for detecting an introgressed sequence in simulated data (10). The black diagonal dashed line represents random predictions. (D) Distribution of P values testing for an enrichment of Neandertal variants for S* candidate and randomly selected regions. (E) Amount of Neandertal sequence recovered as a function of FDR. The inset Venn diagram shows the amount of sequence overlap between East Asians (ASN) and Europeans (EUR) at a FDR of 5%. (F) Violin plots showing the distribution of the amount of introgressed sequence identified per individual for East Asian and European populations (population abbreviations are described in table S1).

To identify surviving Neandertal lineages, we developed a two-stage computational strategy (fig. S3) (10). First, we identify candidate introgressed sequences by using an extension of a previously developed summary statistic referred to as S* (11), which is sensitive to the signatures of introgression (Fig. 1B) and is calculated without using the Neandertal reference genome. We performed coalescent simulations for a wide variety of demographic scenarios and found that our implementation of S* can distinguish introgressed from nonintrogressed sequences (Fig. 1C and fig. S4). Second, we refine the set of candidate introgressed sequences using an orthogonal approach by comparing them to the Neandertal reference genome and testing whether they match significantly more than expected by chance (10). We estimate that the use of S* alone, as compared to our two-staged approach, would recover ~30% of Neandertal lineages at a false discovery rate (FDR) = 20% (fig. S5) (10).

We applied this framework to whole-genome sequences from 379 Europeans and 286 East Asians from the 1000 Genomes Project (table S1) (12). Specifically, we calculated S* in 50-kb sliding windows (tables S2 to S8) (10) and used a computationally efficient approach to determine statistical significance through coalescent simulations (fig. S6) (10). At an S* threshold corresponding to P ≤ 0.01, we identified ~40 Gb of candidate introgressed sequence. Note that S* P values are robust to demographic uncertainty (fig. S7). The distribution of Neandertal-match P values for this set of candidate introgressed sequences (Fig. 1D) demonstrates a strong skew toward zero, consistent with the hypothesis that these sequences are strongly enriched for Neandertal lineages. The distribution of Neandertal-match P values for sequences that do not possess significant evidence of introgression, as revealed by S*, is approximately uniform (Fig. 1D) (10), indicating that our statistical approach is able to distinguish between introgressed and nonintrogressed lineages (fig. S8) (10).
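As an illustration of how simulations can supply a P value for a window statistic, the sketch below computes a simple empirical tail fraction. It is a generic stand-in, not the computationally efficient procedure described in the supplementary materials; the function name and the toy null scores are hypothetical.

```python
def empirical_p_value(observed, null_scores):
    """Fraction of null simulations scoring at least as high as `observed`.

    Adding 1 to numerator and denominator keeps the estimate above zero
    when no simulated score reaches the observed value.
    """
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


# Hypothetical scores standing in for a statistic computed on simulated
# windows without introgression (values 0.0 .. 999.0).
null_scores = [float(i) for i in range(1000)]

p = empirical_p_value(995.0, null_scores)
is_candidate = p <= 0.01  # same threshold as quoted in the text
```

With a cutoff of P ≤ 0.01, a window is retained as a candidate only when very few null simulations match or exceed its score.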

At FDR = 5%, we identified more than 15 Gb of introgressed sequence across all individuals, spanning ~20% (600 Mb) of the Neandertal genome (Fig. 1E and table S9). Of the 600 Mb of distinct sequence, ~25% (149 Mb) was shared between Europeans and East Asians. On average, we found 23 Mb of introgressed sequence per individual (Fig. 1F), with East Asian individuals inheriting 21% more Neandertal sequence than Europeans. Within subpopulations, we found small but statistically significant variation in the amount of introgressed sequence among Europeans (Kruskal-Wallis rank sum test, P = 4.2 × 10^-12), but not among East Asians (P = 0.43).
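The gap between the summed total (more than 15 Gb) and the distinct span (600 Mb) arises because many individuals carry overlapping segments. A minimal sketch of that bookkeeping follows, using hypothetical interval calls on a single chromosome; the final lines simply check the arithmetic of the figures quoted above, treating 15 Gb as a round lower bound.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def total_length(intervals):
    return sum(end - start for start, end in intervals)


# Hypothetical introgressed segments per individual (coordinates in bp).
calls = {
    "ind1": [(100, 200), (500, 650)],
    "ind2": [(150, 260), (900, 1000)],
    "ind3": [(120, 180)],
}

# Summed over individuals: overlapping segments are counted repeatedly.
summed = sum(total_length(ivs) for ivs in calls.values())
# Distinct span: each covered position is counted once.
distinct = total_length(merge_intervals([iv for ivs in calls.values() for iv in ivs]))

# Arithmetic check of the reported figures (values taken from the text).
per_individual_mb = 15_000 / (379 + 286)   # ~15 Gb over 665 individuals
shared_fraction = 149 / 600                # shared Mb over distinct Mb
```

Here `summed` exceeds `distinct` for the same reason the reported totals differ: the first counts every copy, the second counts each genomic position once.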

The average length of introgressed haplotypes was ~57 kb (Fig. 2A), and ~26% of all protein-coding genes had one or more exons that overlapped a Neandertal sequence (Fig. 2B). On a broad scale, the genomic distribution of Neandertal lineages exhibits marked heterogeneity, with particular chromosomal arms, such as 8q and 17q, depleted of Neandertal sequence (Fig. 2A). These qualitative patterns were confirmed by multiple logistic regression, which showed that chromosomal arm was a significant predictor (P < 10^-16) of the odds that a 50-kb window possessed introgressed sequence (10) (Fig. 2C and figs. S9 and S10). Furthermore, odds ratios were negatively correlated with fixed differences between modern humans and Neandertals (Fig. 2D) (Spearman’s ρ = –0.80, P < 5.8 × 10^-8). A strong depletion of Neandertal lineages spanning ~17 Mb on 7q encompasses the FOXP2 locus (Fig. 2A), a transcription factor that plays an important role in human speech and language (13). The observed negative correlation between odds ratio and divergence remained significant when East Asians and Europeans were analyzed separately (fig. S11) and when explicitly controlling for the presence of Neandertal lineages in modern humans (10) (figs. S12 and S13). These results suggest that sequence divergence between modern humans and Neandertals was a barrier to gene flow in some regions of the genome and was associated with deleterious fitness consequences (14).
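For readers unfamiliar with the rank correlation used above, the following sketch computes Spearman's ρ as the Pearson correlation of average ranks. The per-arm values are hypothetical and chosen only to show a negative trend; they are not the data behind Fig. 2D.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values, one entry per chromosomal arm.
fixed_diffs_per_mb = [310.0, 325.0, 340.0, 355.0, 370.0, 390.0]
odds_ratios = [1.30, 1.10, 1.15, 0.90, 0.70, 0.55]

rho = spearman_rho(fixed_diffs_per_mb, odds_ratios)
```

Because only ranks enter the calculation, ρ is insensitive to the scale of either variable, which suits a comparison between odds ratios and counts of fixed differences.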

Fig. 2 Genomic distribution of surviving Neandertal lineages.

(A) Neandertal lineages identified in East Asians (ASN, red) and Europeans (EUR, blue). Gray shading denotes regions that did not pass filtering criteria (10); black circles represent centromeres. (B) Visual genotype illustrations of introgressed sequences identified in the BNC2 and POU2F3 genes. Rows denote individuals, columns indicate variant sites, and rectangles are colored according to genotype (red, predicted Neandertal variant that matches the allele present in the Neandertal reference genome; blue, predicted Neandertal variant that does not match the allele present in the Neandertal reference genome; black, other variants). Introgressed variants that overlap a PhastCons conserved element, DNaseI hypersensitive site (DHS), or putative enhancer elements are shown as boxes (10). (C) Odds of finding an introgressed lineage on each chromosomal arm calculated from a logistic regression model (10). Odds ratios (ORs) are expressed using chromosome 1p as the baseline level. Horizontal bars represent 95% CIs. (D) Relation between the OR and the number of fixed differences per megabase between humans and Neandertals. ρ, Spearman’s rank correlation coefficient.

We next leveraged the catalog of introgressed sequences in East Asians and Europeans to refine admixture models and infer parameters of gene flow between modern humans and Neandertals (figs. S14 and S15). Specifically, with the use of an approximate Bayesian computation framework (10), we statistically tested a model with a single pulse of introgression into the common ancestor of Europeans and East Asians (3), as well as a second model with gene flow both in the common ancestor and a second, smaller pulse into East Asians shortly after the two populations split (Fig. 3A). Consistent with recent inferences (5, 9), observed patterns of introgression were incompatible with a one-pulse model (Fig. 3B), suggesting that gene flow between Neandertals and humans occurred multiple times. Although we varied many parameters of each model (10) (fig. S14), only the ratio of ancestral effective population size between Europeans and East Asians (NeEUR/NeASN) and the relative amount of introgression between the second and first pulse (m2/m1) had appreciable effects on model fit (Fig. 3B). We estimate that NeEUR/NeASN is 1.29 [95% confidence interval (CI) of 1.15 to 1.57] and that East Asians received 20.2% (95% CI of 13.4 to 27.1%) more Neandertal sequence in the second pulse (10). We note that additional unexplored models may provide a better fit to the data, and refining demographic models of hominin evolution is an important area for future work.
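The logic of approximate Bayesian computation can be sketched with a rejection sampler: draw a parameter from a prior, simulate a summary statistic, and keep the draw only if the simulated summary lands close to the observed one. The example below is a toy under loud assumptions. The simulator is a placeholder (parameter plus Gaussian noise) rather than a coalescent simulation, the prior range, noise level, and tolerance are arbitrary, and treating the 21% East Asian excess as a direct readout of m2/m1 is a simplification made only for illustration.

```python
import random


def toy_simulator(m2_over_m1, rng):
    """Placeholder for a coalescent simulation (assumption, not the paper's model).

    Returns a noisy summary statistic: the parameter plus Gaussian noise.
    """
    return m2_over_m1 + rng.gauss(0.0, 0.03)


def abc_rejection(observed, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary is within `tolerance` of `observed`."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, 0.5)  # flat prior (arbitrary range)
        if abs(toy_simulator(theta, rng) - observed) <= tolerance:
            accepted.append(theta)
    return accepted


rng = random.Random(0)  # fixed seed so the sketch is reproducible
posterior = sorted(abc_rejection(observed=0.21, n_draws=20000, tolerance=0.01, rng=rng))

posterior_mean = sum(posterior) / len(posterior)
lower = posterior[int(0.025 * len(posterior))]   # approximate 2.5% quantile
upper = posterior[int(0.975 * len(posterior))]   # approximate 97.5% quantile
```

In a real analysis the simulator would generate genomes under each admixture model, and the comparison would use several summary statistics rather than one.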

Fig. 3 Organization and characteristics of Neandertal sequence in Europeans and East Asians suggests at least two admixture events.

(A) Schematic diagrams of the one- and two-pulse admixture models. NeANC, NeASN, and NeEUR denote effective population sizes of the ancestral, East Asian, and European populations, respectively. In the one-pulse model, gene flow (m1) between Neandertals and the ancestors of Europeans and East Asians occurs at time TI. In the two-pulse model, a second pulse of gene flow (m2) occurs into East Asians shortly after the divergence of Europeans and East Asians at time TS. (B) Values of summary statistics calculated from 2000 simulations under each model (red, blue, and grey points; horizontal and vertical bars denote 95% CIs) show that a single-pulse model is incompatible with the observed data (white box, corrected for sample size differences between populations; limits of box denote 95% CI). Simulations that varied NeASN/NeEUR are shown in red, and those with variable m2/m1 are shown in blue (color bars indicate parameter values).

The collection of surviving Neandertal lineages that we identified allows us to search for signatures of adaptive introgression (15, 16). First, we used introgressed variants that exhibit large allele frequency differences between Europeans and East Asians (FST > 0.40, P < 0.001 by simulation) (10) to identify four significantly differentiated regions (Fig. 4 and table S10) (10). Introgressed haplotypes in two of these regions span genes that play important roles in the integumentary system: BNC2 on chromosome 9 and POU2F3 on chromosome 11. BNC2 encodes a zinc finger protein expressed in keratinocytes and other tissues (17) and has been associated with skin pigmentation levels in Europeans (18). The adaptive haplotype has a frequency of ~70% in Europeans and is completely absent in East Asians (Fig. 2B). POU2F3 is a homeobox transcription factor expressed in the epidermis and mediates keratinocyte proliferation and differentiation (19, 20). The adaptive haplotype in East Asians has a frequency of ~66% and is found at less than 1% frequency in Europeans (Fig. 2B). No coding introgressed variants were found in BNC2 or POU2F3, although several highly differentiated introgressed variants were located in functional noncoding elements (21) (Fig. 2B), suggesting that modern humans acquired adaptive regulatory sequences at these loci. We also searched for shared signatures of adaptive introgression between East Asians and Europeans, identifying six distinct regions that have introgressed haplotype frequencies greater than 40% in both populations (Fig. 4 and table S11) (P < 10^-4 by simulation) (10). One of these regions lies in the type II cluster of keratin genes on 12q13 (table S11), further suggesting that Neandertals provided modern humans with adaptive variation for skin phenotypes. In total, 8 of the 10 candidate introgressed regions overlap protein-coding genes (Fig. 4).
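To give a feel for the FST > 0.40 threshold, the sketch below applies Wright's basic definition, (H_T − H_S) / H_T, to a single biallelic site in two equally weighted populations. This is an assumption for illustration: the estimator actually used is not stated in this section, and the threshold in the text applies to introgressed variants, whereas the inputs here are the approximate haplotype frequencies quoted above (with 0.01 standing in for "less than 1%").

```python
def fst_two_populations(p1, p2):
    """Wright's F_ST for one biallelic site, two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)          # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0                              # site is monomorphic overall
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


# Frequencies given as (Europeans, East Asians), taken from the text.
bnc2 = fst_two_populations(0.70, 0.00)
pou2f3 = fst_two_populations(0.01, 0.66)

both_exceed_threshold = bnc2 > 0.40 and pou2f3 > 0.40
```

Under these simplifying assumptions both loci clear the 0.40 cutoff, which is consistent with their being reported among the significantly differentiated regions.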

Fig. 4 Signatures of adaptive introgression.

A scatter plot of introgressed haplotype frequency in Europeans and East Asians is shown. Significantly differentiated and common shared haplotypes are represented in magenta and blue, respectively. Protein-coding genes that overlap candidate adaptively introgressed loci are also shown.

This study shows that the fragmented remnants of the Neandertal genome carried in the DNA of modern humans can be robustly identified, allowing, in aggregate, substantial amounts of Neandertal sequence to be recovered. In principle, our approach can be used in the absence of an archaic reference sequence, potentially allowing the discovery and characterization of previously unknown hominins that interbred with modern humans (22–24). This fossil-free paradigm of sequencing archaic genomes holds considerable promise for revealing insights into hominin evolution, the population genetics characteristics of archaic hominins, how introgression has influenced extant patterns of human genomic diversity, and narrowing the search for genetic changes that endow distinctly human phenotypes.

Supplementary Materials

Materials and Methods

Figs. S1 to S15

Tables S1 to S11

References (25–45)

References and Notes

  1. Supplementary materials are available on Science Online.
  2. Acknowledgments: We thank members of the Akey laboratory, S. Browning, B. Browning, and J. Duffy for critical feedback related to this work; S. Pääbo for providing access to high-coverage Neandertal sequence data; and L. Jáuregui for help in figure preparation. A description of where the sequence data used in our analyses can be found is given in the supplementary materials. Introgressed regions and variants are available for download. J.M.A. is a paid consultant of Glenview Capital.