Whole-Genome Shotgun Sequencing of Mitochondria from Ancient Hair Shafts


Science  28 Sep 2007:
Vol. 317, Issue 5846, pp. 1927-1930
DOI: 10.1126/science.1146971


Although the application of sequencing-by-synthesis techniques to DNA extracted from bones has revolutionized the study of ancient DNA, it has been plagued by large fractions of contaminating environmental DNA. Genetic analysis of hair shafts could be a solution: We present 10 previously unexamined Siberian mammoth (Mammuthus primigenius) mitochondrial genomes, sequenced with up to 48-fold coverage. The observed levels of damage-derived sequencing errors were lower than those observed in previously published frozen bone samples, even though one of the specimens was more than 50,000 ¹⁴C years old and another had been stored for 200 years at room temperature. The method therefore sets the stage for molecular-genetic analysis of museum collections.

Short fragments of mitochondrial DNA (mtDNA) have been the predominant genetic marker applied to phylogenetic and population-genetic studies of ancient samples (1–3). Although complete mitochondrial genomes would provide greater analytical power, the degraded state of ancient DNA (aDNA) has prevented recovery and assembly of the full genome by conventional genetic methods. Even though aDNA has been applied to phylogenetic questions for more than 20 years (4), only six complete mitochondrial genomes from ancient samples have been explicitly published: four from extinct moa species (Emeus crassus [two genomes], Anomalopteryx didiformis, and Dinornis giganticus) (5, 6) and two from extinct woolly mammoth (Mammuthus primigenius) specimens (7, 8).

Despite the field's slow start, recent developments in DNA amplification, sequencing, and analysis technologies have begun to revolutionize aDNA research, enabling the application of whole-genome shotgun sequencing approaches to a variety of aDNA sources. Recent applications of such approaches have demonstrated that nuclear DNA (nuDNA) sequence, in addition to mtDNA, can be recovered and analyzed. For example, Noonan et al. (9) obtained more than 25,000 base pairs (bp) of nuDNA from a 40,000-¹⁴C-year-old cave bear (Ursus spelaeus) bone. Using the recently developed sequencing-by-synthesis (SBS) technology (10), Poinar et al. (11) determined 13 million bp (Mbp) of nuDNA from a 28,000-¹⁴C-year-old mammoth bone. The success of this study rapidly paved the way for the application of SBS to extinct hominid samples, yielding 1 Mbp of nuDNA from Neandertal bones (12, 13). These reports have set the stage for a new era in aDNA research, but difficult challenges remain. Notably, only one of these studies, the one that used exceptionally well-preserved frozen mammoth bone (11), yielded enough endogenous DNA (i.e., DNA derived from the host rather than from bacterial, human, or other external contaminants) to make it economical to sequence entire nuclear genomes of extinct species.

Hair shafts are a promising source of aDNA. Hair survives for long periods in a variety of natural environments, and large quantities are held in taxonomic collections representing most extant, and many recently extinct, mammalian taxa. Most hair-based genetic studies have used roots rather than shafts as a DNA source (14), primarily because hair shafts consist of dead keratinized cells that contain relatively little DNA. However, several studies have reported shafts to be a viable source of modern (15) and ancient (16) mtDNA. Furthermore, several properties of shafts make them an attractive DNA source for SBS. First, hair is often abundant when present, so sampling it is preferable to sampling bone, whose destructive sampling can lead to the loss of important morphological information. Second, turnover of keratinocytes in the hair bulb is exceedingly high, second only to that of the cells of the gut epithelium (17). Therefore, baseline mitochondrial levels in these cells (and thus in the precortical cells that develop into the bulk of the shaft) may be higher than those in other tissues commonly used for aDNA analyses. Third, even when degraded, shafts are resistant to contamination from exogenous DNA such as that of bacteria, blood, and skin cells (16, 18). We demonstrate here that, for use in SBS approaches, hair shafts surpass comparably stored bone as an aDNA source with respect to both the preservation and the concentration of mtDNA.

We successfully extracted sufficient DNA for SBS from 10 samples of mammoth coat-hair shafts collected from permafrost deposits across northern Siberia [Table 1, Fig. 1, and supporting online material (SOM) text]. Because this was a pilot study, we used as much hair as was readily available (0.2 to 5.2 g per extraction). Because aDNA degradation rates depend exponentially on temperature (19), DNA survival depends on both sample age and storage history (including how long, and at what temperature, the sample was stored before and after collection). Surprisingly, we successfully extracted DNA from the sample (M13) that had been at room temperature for the longest period and that had the least material available [0.2 g, compared with 0.75 and 1 g of bone (7, 11) and up to 0.4 g of frozen muscle (8) used in the previous studies]. Hair morphology varies significantly both between species (20) and among hair types on an individual, so the general applicability of this method remains to be shown; however, previous studies have successfully recovered DNA from a variety of modern hair types and species (SOM text). Thus, this method will likely be widely applicable.

Fig. 1.

Sites of recovery of the mammoth hair specimens whose mitochondrial genome sequences are reported here. The exact recovery sites of M1, M4, and M5 are not known, but these specimens most probably originate from northern Yakutia (about 66° to 76°N, 106° to 160°E). Blue squares labeled K, R, and P indicate the recovery sites of the other mitochondrial genomes used in this study: Krause (7), Rogaev (8), and Poinar (11, 21), respectively.

Table 1.

Description of mammoth mitochondrial sequences, including the year that the sample was discovered, where known; the ¹⁴C reference of specimens dated in this study; the percentage of mitochondrial sequences among SBS sequences; the number of contigs assembled from mitochondrial sequences; the average read length before trimming, based on the Krause (7) sequence; the average percentage identity with respect to the assembly after automatic computational quality processing (i.e., of the final reads used in the alignment); and the percentage difference from the M1 sequence. nd, not determined.


The combined use of hair shafts and SBS resulted in 10 full mitochondrial genome sequences, with 7.3- to 48.0-fold coverage (Table 1). The sequences are complete, except that we have not attempted to assemble the variable number of tandem repeats (VNTR) region, which is difficult to sequence [even with polymerase chain reaction (PCR) and sequencing (8)] and to align with any certainty. For example, this region of the mammoth mitochondrial genome is 320 bp in the sequence of Krause et al. (7) but 393 bp in that of Rogaev et al. (8), so comparison of these regions is essentially uninformative. Overall, the yield of mtDNA sequence was 5.75 to 26 times as high as that from the permafrost-preserved bone reported previously (11, 21), supporting earlier hypotheses that the ratio of mtDNA to nuDNA is higher in the hair shaft than in bone (16, 22).
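Fold coverage and the mitochondrial share of reads, as reported in Table 1, are simple ratios once reads have been mapped. The sketch below assumes reads are already aligned to a mitochondrial reference; the `AlignedRead` record, its fields, and the example numbers are hypothetical, indels are ignored, and the genome is treated as circular so that reads spanning the origin are counted correctly.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AlignedRead:
    """Hypothetical minimal record for a read already mapped to the mtDNA reference."""
    start: int   # 0-based start position on the reference
    length: int  # number of aligned bases (indels ignored in this sketch)


def per_base_depth(reads: List[AlignedRead], genome_length: int) -> List[int]:
    """Count reads covering each position, wrapping around the circular genome."""
    depth = [0] * genome_length
    for read in reads:
        for i in range(read.length):
            depth[(read.start + i) % genome_length] += 1
    return depth


def mean_fold_coverage(reads: List[AlignedRead], genome_length: int) -> float:
    """Mean depth, i.e. total aligned bases divided by genome length."""
    return sum(per_base_depth(reads, genome_length)) / genome_length


def mito_fraction(n_mito_reads: int, n_total_reads: int) -> float:
    """Share of all SBS reads that map to the mitochondrial genome."""
    return n_mito_reads / n_total_reads


reads = [AlignedRead(0, 10), AlignedRead(5, 10), AlignedRead(15, 10)]
print(mean_fold_coverage(reads, 20))  # 30 aligned bases over 20 positions -> 1.5
```

In practice a masked region such as the VNTR would be excluded from both the numerator and the genome length.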

Three widely recognized difficulties are associated with sequencing aDNA: DNA damage, sequencing errors, and numts. Numts are mitochondrial sequences that were duplicated and inserted into the nuclear genome during genome evolution; because they resemble genuine mtDNA, they can cause artifacts in PCR-based studies or in shotgun assemblies with low coverage. Our approach addresses all three problems through the high redundancy of our sequencing and the fact that SBS targets unique, individual DNA template molecules.
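Because each SBS read derives from a single template molecule, damage or a sequencing error on one molecule is unlikely to recur at the same position in many independent reads, so a majority vote at high coverage suppresses it; numt-derived reads are filtered the same way provided they are outnumbered by genuine mtDNA reads. The following is a generic majority-rule sketch of that reasoning, not the study's assembly pipeline; the `min_depth` and `min_fraction` thresholds are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, List


def consensus_base(bases: Iterable[str], min_depth: int = 3, min_fraction: float = 0.75) -> str:
    """Majority-rule call at one position, or 'N' if support is too thin.

    Each element of `bases` is the call from one independent read. The
    min_depth and min_fraction thresholds are illustrative, not the study's.
    """
    calls = [b for b in bases if b in "ACGT"]
    if len(calls) < min_depth:
        return "N"
    base, count = Counter(calls).most_common(1)[0]
    return base if count / len(calls) >= min_fraction else "N"


def consensus(columns: List[str], **thresholds) -> str:
    """Call every column of a pileup (one string of read bases per position)."""
    return "".join(consensus_base(col, **thresholds) for col in columns)


# A damaged T among four intact Cs is outvoted; a two-read column is left as N.
print(consensus(["CCCCT", "AAAA", "GG"]))  # -> CAN
```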

We assessed the state of DNA preservation through two parameters: untrimmed read length and DNA damage [cytosine-to-thymine (C→T) miscoding lesions, derived from the hydrolytic deamination of cytosine to uracil, observed in the pyrosequencing data] (21, 23). The lengths of unbroken aDNA fragments could be measured because the study was conducted on an SBS instrument (Roche GS FLX) that can generate reads up to 250 bp long. We observed average mitochondrial read lengths between 60.5 and 128.1 bp, depending on the sample. The previously reported average read length of 101 bp from a bone sample (11) was limited by the read length of the instrument used (Roche GS20), leaving open the possibility that the bone sample retained longer mtDNA fragments than those we observed. However, when individual reads are compared against positions in the assembly consensus sequence that contain C, the hair-derived data show a substantially lower C→T damage rate (0.24 to 0.9%) than bone (1.7%). In contrast to the bone, which was kept frozen for the entire period after excavation from the permafrost, most of the hair samples have been at room temperature for a number of years (Table 1).
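The C→T damage rate described above is the fraction of read bases aligned to a consensus C that the read reports as T. A minimal sketch, assuming gapless reads given as hypothetical `(offset, sequence)` pairs that are already in the consensus orientation; a real analysis would take reads from an alignment and would have to handle indels and reverse-strand reads (where the same lesion appears as G→A).

```python
from typing import List, Tuple


def c_to_t_rate(consensus: str, reads: List[Tuple[int, str]]) -> float:
    """Fraction of read bases opposite a consensus C that were read as T.

    Each read is an (offset, sequence) pair, assumed to be aligned to the
    consensus without gaps and already in the consensus orientation.
    """
    c_sites = 0   # read bases aligned to a C in the consensus
    c_to_t = 0    # of those, how many the read reports as T
    for offset, seq in reads:
        for i, base in enumerate(seq):
            if consensus[offset + i] == "C":
                c_sites += 1
                if base == "T":
                    c_to_t += 1
    return c_to_t / c_sites if c_sites else 0.0


consensus_seq = "ACGTCCA"
reads = [(0, "ACGTCC"), (0, "ATGTCT"), (3, "TCCA")]
print(round(c_to_t_rate(consensus_seq, reads), 3))  # 2 T calls over 8 C sites -> 0.25
```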

To investigate how this might affect DNA preservation, we calculated approximate thermal ages (19) for those specimens for which we knew or could estimate sufficient information [including, for comparison, the Poinar mammoth (11)]. The model incorporated temperature data from weather stations near the respective sites, with an altitude correction (lapse rate of +6.5°C km⁻¹) based on elevations estimated from the sample coordinates using GoogleEarth v.4.1. Furthermore, to control for differences in burial depth (and thus in burial temperature), the model considered two burial depths for each sample: shallow, where the sample temperature could be expected to fluctuate during the year, and deep, where the sample would experience a constant temperature. It also factored in the time elapsed since collection and the storage temperature during that period, conservatively assumed to be 10°C (19) (Fig. 2 and SOM text).
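A thermal age in the sense of Smith et al. (19) expresses a temperature history as the number of years at a constant 10°C that would cause the same amount of depurination under Arrhenius kinetics. The sketch below illustrates that idea only and is not the published implementation: the 127 kJ mol⁻¹ activation energy, the sinusoidal annual cycle used for shallow burial, and all example numbers are assumptions, and the lapse-rate correction assumes the conventional reading, with temperature falling by 6.5°C per km of elevation gain.

```python
import math
from typing import List, Tuple

R = 8.314           # gas constant, J mol^-1 K^-1
EA = 127_000.0      # assumed activation energy for depurination, J mol^-1
T_REF_C = 10.0      # reference temperature for expressing thermal age, deg C


def relative_rate(temp_c: float) -> float:
    """Arrhenius rate up to a constant factor, which cancels in the ratios below."""
    return math.exp(-EA / (R * (temp_c + 273.15)))


def lapse_corrected(station_c: float, station_elev_m: float, site_elev_m: float,
                    lapse_c_per_km: float = 6.5) -> float:
    """Move a weather-station temperature to the site elevation (higher is colder)."""
    return station_c - lapse_c_per_km * (site_elev_m - station_elev_m) / 1000.0


def mean_rate(mean_c: float, amplitude_c: float, steps: int = 365) -> float:
    """Year-averaged rate for a sinusoidal annual cycle.

    amplitude_c > 0 mimics shallow burial; amplitude_c = 0 gives deep burial.
    """
    total = 0.0
    for day in range(steps):
        total += relative_rate(mean_c + amplitude_c * math.sin(2 * math.pi * day / steps))
    return total / steps


def thermal_age(phases: List[Tuple[float, float, float]]) -> float:
    """Years at T_REF_C equivalent to a history of (years, mean_c, amplitude_c) phases."""
    k_ref = relative_rate(T_REF_C)
    return sum(years * mean_rate(m, a) / k_ref for years, m, a in phases)


# Hypothetical history: site 300 m above a station averaging -12 deg C, 40,000 years
# of shallow burial with a 20 deg C seasonal swing, then 100 years in storage at 10 deg C.
site_c = lapse_corrected(-12.0, 100.0, 400.0)
print(thermal_age([(40_000, site_c, 20.0), (100, 10.0, 0.0)]))
```

Because the Arrhenius rate is convex in temperature over this range, a fluctuating (shallow) burial yields a larger thermal age than a constant (deep) burial with the same mean, which is why modeling both depths brackets the estimate.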

Fig. 2.

Comparison of estimated thermal age of samples against percentage C→T damage using alternative temperature models for Siberia (reflecting the range of published estimates). Approximate thermal ages were calculated according to the methods of Smith et al. (19) for mammoths for which sufficient information was known, using two alternative burial models. The mean ¹⁴C age of each sample is also shown.

The data indicate that although the approximate thermal ages of several samples exceed that of the Poinar mammoth, these samples show lower levels of damage-derived miscoding lesions (Fig. 2). The explanation for this remains unclear. As hypothesized previously, keratinization of hair cells may protect the DNA within hair shafts from contact with free water, which is required for the hydrolytic deamination underlying C→T damage (16). DNA may also be conserved because hair, in contrast to porous bone, prevents bacteria from reaching the site of DNA storage, thereby restricting the breakdown of biopolymers. Alternatively, the observation may be explained by other, as yet untested, hypotheses. For example, special properties of hair shaft keratinocytes, such as an absence of postmortem cell autolysis, may confer advantages; other molecules within the hair shaft (e.g., melanin) may provide protection; or the relatively unusual conditions required for hair to survive in the archaeological record may themselves limit DNA degradation. Whatever the explanation, DNA degradation within hair shafts does not appear to conform to current hypotheses about DNA degradation, and by inference the limits within which usable levels of DNA can be recovered from ancient samples may be wider than conventionally believed. This is in many ways unsurprising: many models of DNA degradation are based on theoretical degradation rates that were originally calculated for DNA in free solution (19), so their applicability across biological tissues may not be straightforward.

Sequencing error (i.e., the difference between the possibly damaged template molecule and the machine output) was also lower with the GS FLX. In all cases, the sum of damage and sequencing error, measured as the difference between the consensus sequence and the individual reads, was between 0.14 and 0.4%. Note that a C→T damage rate of 0.8% contributes only about 0.2% to the overall error rate, because only about one-quarter of nucleotides are C. Furthermore, although numts are known to cause complications in mtDNA extracted from various mammalian tissues (including hair from some elephants) (24), a careful analysis (see SOM text for details) showed that contamination of our assemblies by numts was negligible.
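The 0.2% figure follows from scaling the per-C damage rate by the share of positions that are C. A minimal sketch of that arithmetic; the function names are hypothetical, `base_fraction` computes the C share from a sequence rather than assuming one-quarter, and `residual_error` simply applies the paragraph's own decomposition of total error into damage plus sequencing error.

```python
def base_fraction(seq: str, base: str = "C") -> float:
    """Share of positions in `seq` occupied by `base`."""
    seq = seq.upper()
    return seq.count(base) / len(seq)


def damage_error_component(ct_rate: float, c_fraction: float) -> float:
    """Per-base error contributed by C->T damage, averaged over all nucleotides."""
    return ct_rate * c_fraction


def residual_error(total_error: float, ct_rate: float, c_fraction: float) -> float:
    """Error left after removing the damage component (floored at zero)."""
    return max(0.0, total_error - damage_error_component(ct_rate, c_fraction))


# 0.8% C->T damage with one-quarter of nucleotides being C:
print(damage_error_component(0.008, 0.25))  # about 0.002, i.e. 0.2%
```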

Our findings have profound implications for the scope of future studies. Our data set includes recently discovered mammoth permafrost specimens such as the Jarkov (M2), the Fishhook (M3), and baby Dima (M8). Perhaps the best-known sample we analyzed is M13, known colloquially as the Adams mammoth. This was the first mammoth to be scientifically studied, and the resulting documentation showed beyond reasonable doubt that an animal species can go extinct. The almost perfectly preserved permafrost mummy was found in 1799 by a hunter of the Tungus tribe, who collected its tusks in the summer of 1804 and eventually helped the Russian botanist Michael Adams recover the remainder of the specimen in 1806. To this day, the Adams skeleton remains one of the most complete, and it has been continuously on display at the Zoological Museum in St. Petersburg (25). During recovery of the skeleton, large amounts of hair, 36 pounds (16.4 kg) in total, were taken to St. Petersburg and distributed to other institutions around the world for investigation. The hair specimens have been stored at room temperature for the past 200 years, like most other samples that might be available for future analysis. Notably, even though these storage conditions are not optimal for DNA preservation (19), we obtained a complete mitochondrial sequence from this specimen using our whole-genome shotgun method on no more than 0.2 g of hair shaft. The finding that aDNA can be extracted from a specimen kept at room temperature for two centuries brings a large number of natural history museum collections within reach of molecular genomic analysis and may allow molecular-genetic data to be added to the collections of Charles Darwin, Alexander von Humboldt, and Carl von Linné.

Supporting Online Material

SOM Text

Figs. S1 and S2

Tables S1 and S2


References and Notes
