Research Article

A Draft Sequence of the Neandertal Genome

Science  07 May 2010:
Vol. 328, Issue 5979, pp. 710-722
DOI: 10.1126/science.1188021
  • Fig. 1

    Samples and sites from which DNA was retrieved. (A) The three bones from Vindija from which Neandertal DNA was sequenced. (B) Map showing the four archaeological sites from which bones were used and their approximate dates (years B.P.).

  • Fig. 2

    Nucleotide substitutions inferred to have occurred on the evolutionary lineages leading to the Neandertal, human, and chimpanzee genomes. In red are substitutions on the Neandertal lineage, in yellow the human lineage, and in pink the combined lineage from the common ancestor of these to the chimpanzee. For each lineage and each bone from Vindija, the distributions and numbers of substitutions are shown. The excess of C to T and G to A substitutions is due to deamination of cytosine residues in the Neandertal DNA.

  • Fig. 3

    Divergence of Neandertal and human genomes. Distributions of divergence from the human genome reference sequence among segments of 100 kb are shown for three Neandertals and the five present-day humans.

  • Fig. 4

    Selective sweep screen. (A) Schematic illustration of the rationale for the selective sweep screen. For many regions of the genome, the variation within current humans is old enough to include Neandertals (left). Thus, for SNPs in present-day humans, Neandertals often carry the derived allele (blue). However, in genomic regions where an advantageous mutation arises (right, red star) and sweeps to high frequency or fixation in present-day humans, Neandertals will be devoid of derived alleles. (B) Candidate regions of selective sweeps. All 4235 regions of at least 25 kb where S (see SOM Text 13) falls below two standard deviations of the mean are plotted by their S and genetic width. Regions on the autosomes are shown in orange and those on the X chromosome in blue. The top 5% by S are shadowed in light blue. (C) The top candidate region from the selective sweep screen contains two genes, ZFP36L2 and THADA. The red line shows the log-ratio of the number of observed Neandertal-derived alleles versus the number of expected Neandertal-derived alleles, within a 100 kilobase window. The blue dots above the panel indicate all SNP positions, and the green dots indicate SNPs where the Neandertal carries the derived allele.
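
The log-ratio plotted in panel (C) can be sketched as follows. This is only a minimal illustration: the function name `window_log_ratios`, the per-SNP expected probabilities, the natural-log base, and the non-overlapping window layout are all assumptions made here, and the statistic S itself is defined in SOM Text 13, not by this code.

```python
import math
from collections import defaultdict

def window_log_ratios(snps, window=100_000):
    """Log-ratio of observed to expected Neandertal-derived alleles per window.

    snps: iterable of (position, neandertal_is_derived, expected_prob), where
    expected_prob is a hypothetical per-SNP probability that the Neandertal
    carries the derived allele (an assumption of this sketch).
    """
    observed = defaultdict(int)
    expected = defaultdict(float)
    for pos, is_derived, p_exp in snps:
        start = (pos // window) * window  # non-overlapping windows (assumed)
        observed[start] += int(is_derived)
        expected[start] += p_exp

    ratios = {}
    for start, exp_count in expected.items():
        if exp_count <= 0:
            continue  # no expectation available for this window
        if observed[start] == 0:
            ratios[start] = float("-inf")  # Neandertal devoid of derived alleles
        else:
            ratios[start] = math.log(observed[start] / exp_count)
    return ratios
```

Windows with strongly negative values are those in which the Neandertals carry far fewer derived alleles than expected, which is the signal the screen looks for.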

  • Fig. 5

    Segments of Neandertal ancestry in the human reference genome. We examined 2825 segments in the human reference genome that are of African ancestry and 2797 that are of European ancestry. (A) European segments, with few differences from the Neandertals, tend to have many differences from other present-day humans, whereas African segments do not, as expected if the former are derived from Neandertals. (B) Scatter plot of the segments in (A) with respect to their divergence to the Neandertals and to Venter. In the top left quadrant, 94% of segments are of European ancestry, suggesting that many of them are due to gene flow from Neandertals.

  • Fig. 6

    Four possible scenarios of genetic mixture involving Neandertals. Scenario 1 represents gene flow into Neandertal from other archaic hominins, here collectively referred to as Homo erectus. This would manifest itself as segments of the Neandertal genome with unexpectedly high divergence from present-day humans. Scenario 2 represents gene flow between late Neandertals and early modern humans in Europe and/or western Asia. We see no evidence of this because Neandertals are equally distantly related to all non-Africans. However, such gene flow may have taken place without leaving traces in the present-day gene pool. Scenario 3 represents gene flow between Neandertals and the ancestors of all non-Africans. This is the most parsimonious explanation of our observation. Although we detect gene flow only from Neandertals into modern humans, gene flow in the reverse direction may also have occurred. Scenario 4 represents old substructure in Africa that persisted from the origin of Neandertals until the ancestors of non-Africans left Africa. This scenario is also compatible with the current data.

  • Table 1

    Estimates of human DNA contamination in the DNA sequences produced. Numbers in bold indicate summary contamination estimates over all Vindija data.

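    Column groups: mtDNA contamination (Human, Neandertal, Percent, 95% C.I.); Y chromosomal contamination (Observed, Expected, Percent, 95% C.I.); Neandertal diversity (1/2) plus contamination* (Percent, Upper 95% C.I.); Nuclear ML contamination (Percent, 95% C.I.).

    Bone | mtDNA Human | mtDNA Neandertal | mtDNA Percent | mtDNA 95% C.I. | Y Observed | Y Expected | Y Percent | Y 95% C.I. | Diversity Percent | Diversity Upper 95% C.I. | Nuclear ML Percent (95% C.I.)
    Vi33.16 | 56 | 20,456 | 0.27 | 0.21–0.35 | 4 | 255 | 1.57 | 0.43–3.97 | 1.4 | 2.2 | n/a
    Vi33.25 | 7 | 1,691 | 0.41 | 0.17–0.85 | 0 | 201 | 0.0 | 0.00–1.82 | 1.0 | 1.7 | n/a
    Vi33.26 | 10 | 4,810 | 0.21 | 0.10–0.38 | 0 | 210 | 0.0 | 0.00–1.74 | 1.1 | 1.9 | n/a
    All data | 73 | 26,957 | 0.27 | 0.21–0.34 | 4 | 666 | 0.60 | 0.16–1.53 | 1.2 | 1.6 | 0.7 (0.6–0.8)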

    *Assuming similar extents of contamination in the three bones and that individual heterozygosity and population nucleotide diversity are the same for this class of sites.
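
The point estimates in Table 1 can be reproduced from the counts with simple ratios. The sketch below assumes that the mtDNA percentage is the share of human fragments among all informative fragments and that the Y chromosomal percentage is observed over expected hits; the function names are hypothetical, and the method behind the 95% confidence intervals is not stated in the table, so none is computed here.

```python
def mtdna_contamination_pct(human, neandertal):
    """Percent of informative mtDNA fragments that look human (assumed definition)."""
    return 100.0 * human / (human + neandertal)

def y_contamination_pct(observed, expected):
    """Percent of expected Y chromosomal hits actually observed (assumed definition)."""
    return 100.0 * observed / expected

# Counts as read from the rows of Table 1: bone -> (human, Neandertal) fragments.
MTDNA_COUNTS = {
    "Vi33.16": (56, 20456),
    "Vi33.25": (7, 1691),
    "Vi33.26": (10, 4810),
}

# Pooling the three bones gives the "All data" row.
total_human = sum(h for h, _ in MTDNA_COUNTS.values())
total_neandertal = sum(n for _, n in MTDNA_COUNTS.values())

for bone, (human, neandertal) in MTDNA_COUNTS.items():
    print(bone, round(mtdna_contamination_pct(human, neandertal), 2))
print("All data", round(mtdna_contamination_pct(total_human, total_neandertal), 2))
```

The pooled counts (73 human, 26,957 Neandertal) match the "All data" row, which is a quick consistency check on the table.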

    • Table 2

      Amino acid changes that are fixed in present-day humans but ancestral in Neandertals. The table is sorted by Grantham score (GS). Based on the classification proposed by Li et al. in (87), 5 amino acid substitutions are radical (>150), 7 moderately radical (101 to 150), 33 moderately conservative (51 to 100), and 32 conservative (1 to 50). One substitution creates a stop codon. Genes with two fixed amino acid substitutions have bold SwissProt identifiers. (Table S15 shows the human and chimpanzee genome coordinates, additional database identifiers, and the respective bases.)

      ID | Pos | AA | GS | Description/function
      RPTN | 785 | */R | | Multifunctional epidermal matrix protein
      GREB1 | 1164 | R/C | 180 | Response gene in estrogen receptor–regulated pathway
      OR1K1 | 267 | R/C | 180 | Olfactory receptor, family 1, subfamily K, member 1
      SPAG17 | 431 | Y/D | 160 | Involved in structural integrity of sperm central apparatus axoneme
      NLRX1 | 330 | Y/D | 160 | Modulator of innate immune response
      NSUN3 | 78 | S/F | 155 | Protein with potential SAM-dependent methyl-transferase activity
      RGS16 | 197 | D/A | 126 | Retinally abundant regulator of G-protein signaling
      BOD1L | 2684 | G/R | 125 | Biorientation of chromosomes in cell division 1-like
      CF170 | 505 | S/C | 112 | Uncharacterized protein: C6orf170
      STEA1 | 336 | C/S | 112 | Metalloreductase, six transmembrane epithelial antigen of prostate 1
      F16A2 | 630 | R/S | 110 | Uncharacterized protein: family with sequence similarity 160, member A2
      LTK | 569 | R/S | 110 | Leukocyte receptor tyrosine kinase
      BEND2 | 261 | V/G | 109 | Uncharacterized protein: BEN domain-containing protein 2
      O52W1 | 51 | P/L | 98 | Olfactory receptor, family 52, subfamily W, member 1
      CAN15 | 427 | L/P | 98 | Small optic lobes homolog, linked to visual system development
      SCAP | 140 | I/T | 89 | Escort protein required for cholesterol as well as lipid homeostasis
      TTF1 | 474 | I/T | 89 | RNA polymerase I termination factor
      OR5K4 | 175 | H/D | 81 | Olfactory receptor, family 5, subfamily K, member 4
      SCML1 | 202 | T/M | 81 | Putative polycomb group (PcG) protein
      TTL10 | 394 | K/T | 78 | Probable tubulin polyglutamylase, forming polyglutamate side chains on tubulin
      AFF3 | 516 | S/P | 74 | Putative transcription activator, function in lymphoid development/oncogenesis
      EYA2 | 131 | S/P | 74 | Tyrosine phosphatase, dephosphorylating “Tyr-142” of histone H2AX
      NOP14 | 493 | T/R | 71 | Involved in nucleolar processing of pre-18S ribosomal RNA
      PRDM10 | 1129 | N/T | 65 | PR domain containing 10, may be involved in transcriptional regulation
      BTLA | 197 | N/T | 65 | B and T lymphocyte attenuator
      O2AT4 | 224 | V/A | 64 | Olfactory receptor, family 2, subfamily AT, member 4
      CAN15 | 356 | V/A | 64 | Small optic lobes homolog, linked to visual system development
      ACCN4 | 160 | V/A | 64 | Amiloride-sensitive cation channel 4, expressed in pituitary gland
      PUR8 | 429 | V/A | 64 | Adenylosuccinate lyase (purine synthesis)
      MCHR2 | 324 | A/V | 64 | Receptor for melanin-concentrating hormone, coupled to G proteins
      AHR | 381 | V/A | 64 | Aromatic hydrocarbon receptor, a ligand-activated transcriptional activator
      FAAH1 | 476 | A/G | 60 | Fatty acid amide hydrolase
      SPAG17 | 1415 | T/A | 58 | Involved in structural integrity of sperm central apparatus axoneme
      ZF106 | 697 | A/T | 58 | Zinc finger protein 106 homolog / SH3-domain binding protein 3
      CAD16 | 342 | T/A | 58 | Calcium-dependent, membrane-associated glycoprotein (cellular recognition)
      K1C16 | 306 | T/A | 58 | Keratin, type I cytoskeletal 16 (expressed in esophagus, tongue, hair follicles)
      LIMS2 | 360 | T/A | 58 | Focal adhesion protein, modulates cell spreading and migration
      ZN502 | 184 | T/A | 58 | Zinc finger protein 502, may be involved in transcriptional regulation
      MEPE | 391 | A/T | 58 | Matrix extracellular phosphoglycoprotein, putative role in mineralization
      FSTL4 | 791 | T/A | 58 | Follistatin-related protein 4 precursor
      SNTG1 | 241 | T/S | 58 | Syntrophin, gamma 1; binding/organizing subcellular localization of proteins
      RPTN | 735 | K/E | 56 | Multifunctional epidermal matrix protein
      BCL9L | 543 | S/G | 56 | Nuclear cofactor of beta-catenin signaling, role in tumorigenesis
      SSH2 | 1033 | S/G | 56 | Protein phosphatase regulating actin filament dynamics
      PEG3 | 1521 | S/G | 56 | Apoptosis induction in cooperation with SIAH1A
      DJC28 | 290 | K/Q | 53 | DnaJ (Hsp40) homolog, may have role in protein folding or as a chaperone
      CLTR2 | 50 | F/V | 50 | Receptor for cysteinyl leukotrienes, role in endocrine and cardiovascular systems
      KIF15 | 827 | N/S | 46 | Putative kinesin-like motor enzyme involved in mitotic spindle assembly
      SPOC1 | 355 | Q/R | 43 | Uncharacterized protein: SPOC domain containing 1
      TTF1 | 229 | R/Q | 43 | RNA polymerase I termination factor
      F166A | 134 | T/P | 38 | Uncharacterized protein: family with sequence similarity 166, member A
      CL066 | 426 | V/L | 32 | Uncharacterized protein: chromosome 12 open reading frame 66
      PCD16 | 763 | E/Q | 29 | Calcium-dependent cell-adhesion protein, fibroblasts expression
      TRPM5 | 1088 | I/V | 29 | Voltage-modulated cation channel (VCAM), central role in taste transduction
      S36A4 | 330 | H/R | 29 | Solute carrier family 36 (proton/amino acid symporter)
      GP132 | 328 | E/Q | 29 | High-affinity G-protein coupled receptor for lysophosphatidylcholine (LPC)
      ZFY26 | 237 | H/R | 29 | Zinc finger FYVE domain-containing, associated with spastic paraplegia-15
      CALD1 | 671 | I/V | 29 | Actin- and myosin-binding protein, regulation of smooth muscle contraction
      CDCA2 | 606 | I/V | 29 | Regulator of chromosome structure during mitosis
      GPAA1 | 275 | E/Q | 29 | Glycosylphosphatidylinositol anchor attachment protein
      ARSF | 200 | I/V | 29 | Arylsulfatase F precursor, relevant for composition of bone and cartilage matrix
      OR4D9 | 303 | R/K | 26 | Olfactory receptor, family 4, subfamily D, member 9
      EMIL2 | 155 | R/K | 26 | Elastin microfibril interface-located protein (smooth muscle anchoring)
      PHLP | 216 | K/R | 26 | Putative modulator of heterotrimeric G proteins
      TKTL1 | 317 | R/K | 26 | Transketolase-related protein
      MIIP | 280 | H/Q | 24 | Inhibits glioma cell invasion, down-regulates adhesion and motility genes
      SPTA1 | 265 | N/D | 23 | Constituent of cytoskeletal network of the erythrocyte plasma membrane
      PCD16 | 777 | D/N | 23 | Calcium-dependent cell-adhesion protein, fibroblasts expression
      CS028 | 326 | L/F | 22 | Uncharacterized protein: chromosome 19 open reading frame 28
      PIGZ | 425 | L/F | 22 | Mannosyltransferase for glycosylphosphatidylinositol-anchor biosynthesis
      DISP1 | 1079 | V/M | 21 | Segment-polarity gene required for normal Hedgehog (Hh) signaling
      RNAS7 | 44 | M/V | 21 | Protein with RNase activity for broad-spectrum of pathogenic microorganisms
      KR241 | 205 | V/M | 21 | Keratin-associated protein, formation of a rigid and resistant hair shaft
      SPLC3 | 108 | I/M | 10 | Short palate, lung, and nasal epithelium carcinoma-associated protein
      NCOA6 | 823 | I/M | 10 | Hormone-dependent coactivation of several receptors
      WWC2 | 479 | M/I | 10 | Uncharacterized protein: WW, C2, and coiled-coil domain containing 2
      ASCC1 | 301 | E/D | 0 | Enhancer of NF-kappa-B, SRF, and AP1 transactivation
      PROM2 | 458 | D/E | 0 | Plasma membrane protrusion in epithelial and nonepithelial cells
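
The Grantham score bins quoted in the Table 2 caption can be written as a small classifier. This is a sketch under stated assumptions: the function name `grantham_class` is hypothetical, and scores of 0 are placed in the conservative bin because the caption's count of 32 conservative substitutions only adds up if the two rows with a score of 0 are included.

```python
from collections import Counter

def grantham_class(score):
    """Map a Grantham score to the four classes named in the Table 2 caption."""
    if score > 150:
        return "radical"
    if score > 100:
        return "moderately radical"
    if score > 50:
        return "moderately conservative"
    return "conservative"  # 1 to 50 in the caption; 0 included here by assumption

# Scores of the twelve highest-scoring substitutions in Table 2
# (the stop-codon change in RPTN has no Grantham score and is left out).
top_scores = [180, 180, 160, 160, 155, 126, 125, 112, 112, 110, 110, 109]
print(Counter(grantham_class(s) for s in top_scores))
```

Applied to those twelve rows, the counts come out as 5 radical and 7 moderately radical, in agreement with the caption.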
    • Table 3

      Top 20 candidate selective sweep regions.

      Region (hg18) | S | Width (cM) | Gene(s)
      chr2:43265008-43601389 | -6.04 | 0.5726 | ZFP36L2;THADA
      chr11:95533088-95867597 | -4.78 | 0.5538 | JRKL;CCDC82;MAML2
      chr10:62343313-62655667 | -6.1 | 0.5167 | RHOBTB1
      chr21:37580123-37789088 | -4.5 | 0.4977 | DYRK1A
      chr10:83336607-83714543 | -6.13 | 0.4654 | NRG3
      chr14:100248177-100417724 | -4.84 | 0.4533 | MIR337;MIR665;DLK1;RTL1;MIR431;MIR493;MEG3;MIR770
      chr3:157244328-157597592 | -6 | 0.425 | KCNAB1
      chr11:30601000-30992792 | -5.29 | 0.3951 |
      chr2:176635412-176978762 | -5.86 | 0.3481 | HOXD11;HOXD8;EVX2;MTX2;HOXD1;HOXD10;HOXD13;HOXD4;HOXD12;HOXD9;MIR10B;HOXD3
      chr11:71572763-71914957 | -5.28 | 0.3402 | CLPB;FOLR1;PHOX2A;FOLR2;INPPL1
      chr7:41537742-41838097 | -6.62 | 0.3129 | INHBA
      chr10:60015775-60262822 | -4.66 | 0.3129 | BICC1
      chr6:45440283-45705503 | -4.74 | 0.3112 | RUNX2;SUPT3H
      chr1:149553200-149878507 | -5.69 | 0.3047 | SELENBP1;POGZ;MIR554;RFX5;SNX27;CGN;TUFT1;PI4KB;PSMB4
      chr7:121763417-122282663 | -6.35 | 0.2855 | RNF148;RNF133;CADPS2
      chr7:93597127-93823574 | -5.49 | 0.2769 |
      chr16:62369107-62675247 | -5.18 | 0.2728 |
      chr14:48931401-49095338 | -4.53 | 0.2582 |
      chr6:90762790-90903925 | -4.43 | 0.2502 | BACH2
      chr10:9650088-9786954 | -4.56 | 0.2475 |
    • Table 4

      Neandertals are more closely related to present-day non-Africans than to Africans. For each pair of modern humans H1 and H2 that we examined, we reported D (H1, H2, Neandertal, Chimpanzee): the difference in the percentage matching of Neandertal to two humans at sites where Neandertal does not match chimpanzee, with ±1 standard error. Values that deviate significantly from 0% after correcting for 38 hypotheses tested are highlighted in bold (|Z| > 2.8 SD). Neandertal is skewed toward matching non-Africans more than Africans for all pairwise comparisons. Comparisons within Africans or within non-Africans are all consistent with 0%.

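      Population comparison | H1 | H2 | % Neandertal matching to H2 – % Neandertal matching to H1 (±1 standard error)

      ABI3730 sequencing (~750 bp reads) used to discover H1-H2 differences
      African to African | NA18517 (Yoruba) | NA18507 (Yoruba) | -0.1 ± 0.6
      African to African | NA18517 (Yoruba) | NA19240 (Yoruba) | 1.5 ± 0.7
      African to African | NA18517 (Yoruba) | NA19129 (Yoruba) | -0.1 ± 0.7
      African to African | NA18507 (Yoruba) | NA19240 (Yoruba) | -0.5 ± 0.6
      African to African | NA18507 (Yoruba) | NA19129 (Yoruba) | 0.0 ± 0.5
      African to African | NA19240 (Yoruba) | NA19129 (Yoruba) | -0.6 ± 0.7
      African to Non-African | NA18517 (Yoruba) | NA12878 (European) | 4.1 ± 0.8
      African to Non-African | NA18517 (Yoruba) | NA12156 (European) | 5.1 ± 0.7
      African to Non-African | NA18517 (Yoruba) | NA18956 (Japanese) | 2.9 ± 0.8
      African to Non-African | NA18517 (Yoruba) | NA18555 (Chinese) | 3.9 ± 0.7
      African to Non-African | NA18507 (Yoruba) | NA12878 (European) | 4.2 ± 0.6
      African to Non-African | NA18507 (Yoruba) | NA12156 (European) | 5.5 ± 0.6
      African to Non-African | NA18507 (Yoruba) | NA18956 (Japanese) | 5.0 ± 0.7
      African to Non-African | NA18507 (Yoruba) | NA18555 (Chinese) | 5.8 ± 0.6
      African to Non-African | NA19240 (Yoruba) | NA12878 (European) | 3.5 ± 0.7
      African to Non-African | NA19240 (Yoruba) | NA12156 (European) | 3.1 ± 0.7
      African to Non-African | NA19240 (Yoruba) | NA18956 (Japanese) | 2.7 ± 0.7
      African to Non-African | NA19240 (Yoruba) | NA18555 (Chinese) | 5.4 ± 0.9
      African to Non-African | NA19129 (Yoruba) | NA12878 (European) | 3.9 ± 0.7
      African to Non-African | NA19129 (Yoruba) | NA12156 (European) | 4.9 ± 0.7
      African to Non-African | NA19129 (Yoruba) | NA18956 (Japanese) | 5.1 ± 0.8
      African to Non-African | NA19129 (Yoruba) | NA18555 (Chinese) | 4.7 ± 0.8
      Non-African to Non-African | NA12878 (European) | NA12156 (European) | -0.5 ± 0.8
      Non-African to Non-African | NA12878 (European) | NA18956 (Japanese) | 0.4 ± 0.8
      Non-African to Non-African | NA12878 (European) | NA18555 (Chinese) | 0.3 ± 0.8
      Non-African to Non-African | NA12156 (European) | NA18956 (Japanese) | -0.3 ± 0.8
      Non-African to Non-African | NA12156 (European) | NA18555 (Chinese) | 1.3 ± 0.7
      Non-African to Non-African | NA18956 (Japanese) | NA18555 (Chinese) | 2.5 ± 0.9

      Illumina GAII sequencing (~76 bp reads) used to discover H1-H2 differences
      African to African | HGDP01029 (San) | HGDP01029 (Yoruba) | -0.1 ± 0.4
      African to Non-African | HGDP01029 (San) | HGDP00521 (French) | 4.2 ± 0.4
      African to Non-African | HGDP01029 (San) | HGDP00542 (Papuan) | 3.9 ± 0.5
      African to Non-African | HGDP01029 (San) | HGDP00778 (Han) | 5.0 ± 0.5
      African to Non-African | HGDP01029 (Yoruba) | HGDP00521 (French) | 4.5 ± 0.4
      African to Non-African | HGDP01029 (Yoruba) | HGDP00542 (Papuan) | 4.4 ± 0.6
      African to Non-African | HGDP01029 (Yoruba) | HGDP00778 (Han) | 5.3 ± 0.5
      Non-African to Non-African | HGDP00521 (French) | HGDP00542 (Papuan) | 0.1 ± 0.5
      Non-African to Non-African | HGDP00521 (French) | HGDP00778 (Han) | 1.0 ± 0.6
      Non-African to Non-African | HGDP00542 (Papuan) | HGDP00778 (Han) | 0.7 ± 0.6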
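
The quantity reported in Table 4 can be sketched directly from its caption: among sites where H1 and H2 differ and the Neandertal does not match chimpanzee, take the percentage matching H2 minus the percentage matching H1. The code below is a minimal illustration, assuming biallelic sites given as (H1, H2, Neandertal, chimpanzee) allele tuples; the function names are hypothetical, and the standard errors in the table come from the authors' own procedure, which is not reproduced here.

```python
def neandertal_matching_difference(sites):
    """Percent of informative sites where Neandertal matches H2 minus percent matching H1.

    sites: iterable of (h1, h2, neandertal, chimp) alleles at biallelic positions.
    """
    match_h1 = match_h2 = 0
    for h1, h2, nea, chimp in sites:
        if h1 == h2 or nea == chimp:
            continue  # uninformative: humans agree, or Neandertal is ancestral
        if nea == h2 and chimp == h1:
            match_h2 += 1
        elif nea == h1 and chimp == h2:
            match_h1 += 1
    total = match_h1 + match_h2
    if total == 0:
        raise ValueError("no informative sites")
    return 100.0 * (match_h2 - match_h1) / total

def is_significant(diff, standard_error, z_threshold=2.8):
    """Apply the |Z| > 2.8 cutoff from the caption to a value and its standard error."""
    return abs(diff / standard_error) > z_threshold
```

For example, the Yoruba to European value 4.1 ± 0.8 gives Z of about 5.1 and passes the cutoff, whereas the Japanese to Chinese value 2.5 ± 0.9 gives Z of about 2.78 and does not.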
    • Table 5

      Non-African haplotypes match Neandertal at an unexpected rate. We identified 13 candidate gene flow regions by using 48 CEU+ASN to represent the OOA population, and 23 African Americans to represent the AFR population. We identified tag SNPs for each region that separate an out-of-Africa specific clade (OOA) from a cosmopolitan clade (COS) and then assessed the rate at which Neandertal matches each of these clades by further subdividing tag SNPs based on their ancestral and derived status in Neandertal and whether they match the OOA-specific clade or not. Thus, the categories are AN (Ancestral Nonmatch), DN (Derived Nonmatch), DM (Derived Match), and AM (Ancestral Match). We do not list the sites where matching is ambiguous.

      AM and DM: Neandertal (M)atches OOA-specific clade. AN and DN: Neandertal does (N)ot match OOA-specific clade.

      Chromosome | Start of candidate region in Build 36 | End of candidate region in Build 36 | Span (bp) | ST (estimated ratio of OOA/AFR gene tree depth) | Average frequency of tag in OOA clade | AM | DM | AN | DN | Qualitative assessment*
      1 | 168,110,000 | 168,220,000 | 110,000 | 2.9 | 6.3% | 5 | 10 | 1 | 0 | OOA
      1 | 223,760,000 | 223,910,000 | 150,000 | 2.8 | 6.3% | 1 | 4 | 0 | 0 | OOA
      4 | 171,180,000 | 171,280,000 | 100,000 | 1.9 | 5.2% | 1 | 2 | 0 | 0 | OOA
      5 | 28,950,000 | 29,070,000 | 120,000 | 3.8 | 3.1% | 16 | 16 | 6 | 0 | OOA
      6 | 66,160,000 | 66,260,000 | 100,000 | 5.7 | 28.1% | 6 | 6 | 0 | 0 | OOA
      9 | 32,940,000 | 33,040,000 | 100,000 | 2.8 | 4.2% | 7 | 14 | 0 | 0 | OOA
      10 | 4,820,000 | 4,920,000 | 100,000 | 2.6 | 9.4% | 9 | 5 | 0 | 0 | OOA
      10 | 38,000,000 | 38,160,000 | 160,000 | 3.5 | 8.3% | 5 | 9 | 2 | 0 | OOA
      10 | 69,630,000 | 69,740,000 | 110,000 | 4.2 | 19.8% | 2 | 2 | 0 | 1 | OOA
      15 | 45,250,000 | 45,350,000 | 100,000 | 2.5 | 1.1% | 5 | 6 | 1 | 0 | OOA
      17 | 35,500,000 | 35,600,000 | 100,000 | 2.9 | (no tags)
      20 | 20,030,000 | 20,140,000 | 110,000 | 5.1 | 64.6% | 0 | 0 | 10 | 5 | COS
      22 | 30,690,000 | 30,820,000 | 130,000 | 3.5 | 4.2% | 0 | 2 | 5 | 2 | COS

      Summary | AM | DM | AN | DN
      Relative tag SNP frequencies in actual data | 34% | 46% | 15% | 5%
      Relative tag SNP simulated under a demographic model without introgression | 34% | 5% | 33% | 27%
      Relative tag SNP simulated under a demographic model with introgression | 23% | 31% | 37% | 9%

      *To qualitatively assess the regions in terms of which clade the Neandertal matches, we asked whether the proportion matching the OOA-specific clade (AM and DM) is much more than 50%. If so, we classify it as an OOA region, and otherwise a COS region. One region is unclassified because no tag SNPs were found. We also compared to simulations with and without gene flow (SOM Text 17), which show that the rates of DM and DN tag SNPs (those where Neandertal is derived) are most informative for distinguishing gene flow from no gene flow.
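
The qualitative assessment and the "actual data" summary row of Table 5 can be recomputed from the per-region tag SNP counts. The sketch below uses the counts as read from the table rows; the names `classify_region` and `relative_frequencies` are hypothetical, and because the footnote says only "much more than 50%", the plain 0.5 cutoff used here is an assumption, not the authors' stated rule.

```python
# (chromosome, start, AM, DM, AN, DN), as read from Table 5.
# The chromosome 17 region is omitted because it has no tag SNPs.
TAG_COUNTS = [
    (1, 168_110_000, 5, 10, 1, 0),
    (1, 223_760_000, 1, 4, 0, 0),
    (4, 171_180_000, 1, 2, 0, 0),
    (5, 28_950_000, 16, 16, 6, 0),
    (6, 66_160_000, 6, 6, 0, 0),
    (9, 32_940_000, 7, 14, 0, 0),
    (10, 4_820_000, 9, 5, 0, 0),
    (10, 38_000_000, 5, 9, 2, 0),
    (10, 69_630_000, 2, 2, 0, 1),
    (15, 45_250_000, 5, 6, 1, 0),
    (20, 20_030_000, 0, 0, 10, 5),
    (22, 30_690_000, 0, 2, 5, 2),
]

def classify_region(am, dm, an, dn, threshold=0.5):
    """Return 'OOA' if Neandertal mostly matches the OOA-specific clade, else 'COS'."""
    total = am + dm + an + dn
    if total == 0:
        return None  # unclassified: no tag SNPs
    return "OOA" if (am + dm) / total > threshold else "COS"

def relative_frequencies(rows):
    """Pooled AM, DM, AN, DN shares across regions, as whole percentages."""
    totals = [sum(col) for col in zip(*(row[2:] for row in rows))]
    grand_total = sum(totals)
    return [round(100 * t / grand_total) for t in totals]

for chrom, start, *counts in TAG_COUNTS:
    print(chrom, start, classify_region(*counts))
print(relative_frequencies(TAG_COUNTS))
```

Pooling the 166 tag SNPs gives 34%, 46%, 15%, and 5% for AM, DM, AN, and DN, which agrees with the summary row for the actual data.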

      Additional Files


      • A Draft Sequence of the Neandertal Genome
        Richard E. Green, Johannes Krause, Adrian W. Briggs, Tomislav Maricic, Udo Stenzel, Martin Kircher, Nick Patterson, Heng Li, Weiwei Zhai, Markus Hsi-Yang Fritz, Nancy F. Hansen, Eric Y. Durand, Anna-Sapfo Malaspinas, Jeffrey D. Jensen, Tomas Marques-Bonet, Can Alkan, Kay Prüfer, Matthias Meyer, Hernán A. Burbano, Jeffrey M. Good, Rigo Schultz, Ayinuer Aximu-Petri, Anne Butthof, Barbara Höber, Barbara Höffner, Madlen Siegemund, Antje Weihmann, Chad Nusbaum, Eric S. Lander, Carsten Russ, Nathaniel Novod, Jason Affourtit, Michael Egholm, Christine Verna, Pavao Rudan, Dejana Brajkovic, Zeljko Kucan, Ivan Gusic, Vladimir B. Doronichev, Liubov V. Golovanova, Carles Lalueza-Fox, Marco de la Rasilla, Javier Fortea, Antonio Rosas, Ralf W. Schmitz, Philip L. F. Johnson, Evan E. Eichler, Daniel Falush, Ewan Birney, James C. Mullikin, Montgomery Slatkin, Rasmus Nielsen, Janet Kelso, Michael Lachmann, David Reich, Svante Pääbo

        Supporting Online Material

        This supplement contains:
        Materials and Methods
        SOM Text
        Figs. S1 to S51
        Tables S1 to S58
        References

