Report

Genomic Sequencing of Pleistocene Cave Bears

See allHide authors and affiliations

Science  22 Jul 2005:
Vol. 309, Issue 5734, pp. 597-599
DOI: 10.1126/science.1113485

Abstract

Despite the greater information content of genomic DNA, ancient DNA studies have largely been limited to the amplification of mitochondrial sequences. Here we describe metagenomic libraries constructed with unamplified DNA extracted from skeletal remains of two 40,000-year-old extinct cave bears. Analysis of ∼1 megabase of sequence from each library showed that despite significant microbial contamination, 5.8 and 1.1% of clones contained cave bear inserts, yielding 26,861 base pairs of cave bear genome sequence. Comparison of cave bear and modern bear sequences revealed the evolutionary relationship of these lineages. The metagenomic approach used here establishes the feasibility of ancient DNA genome sequencing programs.

Genomic DNA sequences from extinct species can help reveal the process of molecular evolution that produced modern genomes. However, the recovery of ancient DNA is technologically challenging, because the molecules are degraded and mixed with microbial contaminants, and individual nucleotides are often chemically damaged (1, 2). In addition, ancient remains are invariably contaminated with modern DNA, which amplifies efficiently compared with ancient DNA, and therefore inhibits the detection of ancient genomic sequences (1, 2). These factors have limited most previous studies of ancient DNA sequences to polymerase chain reaction (PCR) amplification of mitochondrial DNA (38). In exceptional cases, small amounts of single-copy nuclear DNA have been recovered from ancient remains less than 20,000 years old obtained from permafrost or desert environments, which are well suited to preserving ancient DNA (912). However, the remains of most ancient animals, including hominids, have not been found in such environments.

To circumvent these challenges, we developed an amplification-independent direct-cloning approach to constructing metagenomic libraries from ancient DNA (Fig. 1). Ancient remains are obtained from natural environments in which they have resided for thousands of years, and their extracted DNA is a mixture of genome fragments from the ancient organism and sequences derived from other organisms in the environment. A metagenomic approach, in which all genome sequences in an environment are anonymously cloned into a single library, may therefore be a powerful alternative to the targeted PCR approaches that have been used to recover ancient DNA molecules. We chose to explore this strategy with the extinct cave bear instead of an extinct hominid, to unambiguously assess the issue of modern human contamination (1, 2). In addition, because of the close evolutionary relationship of bears and dogs, cave bear sequences in these libraries can be identified and classified by comparing them to the available annotated dog genome. The phylogenetic relationship of cave bears and modern bear species has also been inferred from mitochondrial sequences, providing the opportunity to compare the phylogenetic information content of cave bear mitochondrial and genomic DNA (13).

Fig. 1.

Schematic illustration of the ancient DNA extraction and library construction process. Extracts prepared from a cave bear tooth (library CB1) and a bone (library CB2) contain cave bear DNA (red) and a mixture of DNA from other organisms (gray).

We extracted DNA from a cave bear tooth recovered from Ochsenhalt Cave, Austria, and a cave bear bone from Gamssulzen Cave, Austria, dated at 42,290 (error +970/–870) and 44,160 (+1400/–1190) years before the present, respectively, by accelerator mass spectrometry radiocarbon dating (table S1). We used these ancient DNA molecules to construct two metagenomic libraries, designated CB1 and CB2 (Fig. 1) (14). These libraries were constructed in a laboratory into which modern carnivore DNA has never been introduced. Ancient DNA molecules were blunt end–repaired before ligation but were otherwise neither enzymatically treated nor amplified. We sequenced 9035 clones [1.06 megabases (Mb)] from library CB1 and 4992 clones (1.03 Mb) from library CB2. The average insert sizes for each library were 118 base pairs (bp) and 207 bp, respectively.

We compared each insert in these libraries to GenBank nucleotide, protein, and environmental sequences, and the July 2004 dog whole genome shotgun assembly, by using Basic Local Alignment Search Tool (BLAST) software with an expect value cutoff of 0.001 and a minimum hit size of 30 bp (1416). 1.1% of clones in library CB1 (Fig. 2A) and 5.8% of clones in library CB2 (Fig. 2B) had significant hits to dog genome or modern bear sequences. Our direct-cloning approach produces chimeric inserts, so we defined as candidate cave bear sequence only that part of the insert that had a hit to dog or bear sequence. The average hit for the 100 clones in library CB1 with significant hits to carnivore sequence was 68 bp long and covered 58% of the insert; whereas the average hit for the 289 clones in library CB2 with significant carnivore hits was 70 bp and covered 49% of the insert. None of these clones showed homology to cave bear mitochondrial DNA sequences, which is consistent with the expected ratio of nuclear versus mitochondrial clones, given the much greater size of the nuclear genome (14). Based on the amount of cave bear genomic DNA used to construct library CB2, we estimate that it could yield >10-fold coverage of the cave bear genome if sequenced completely (14).

Fig. 2.

Characterization of two independent cave bear genomic libraries. Predicted origin of 9035 clones from library CB1 (A) and 4992 clones from library CB2 (B) are shown, as determined by BLAST comparison to GenBank and environmental sequence databases. Other refers to viral or plasmid-derived DNAs. Distribution of sequence annotation features in 6,775 nucleotides of carnivore sequence from library CB1 (C) and 20,086 nucleotides of carnivore sequence from library CB2 (D) are shown as determined by alignment to the July 2004 dog genome assembly.

BLAST hits to the dog genome from libraries CB1 and CB2 were on average 92.4 and 92.3% identical to dog, respectively. To confirm that these sequences were indeed those of the cave bear, we designed primers against 124 putative cave bear sequences and successfully amplified and sequenced 116 orthologous sequences from the modern brown bear. All 116 of these modern bear sequences were at least 97% identical to their cave bear orthologs, verifying that all or nearly all of the carnivore sequences in both libraries are genuine cave bear genomic sequences. Only 6 of 14,027 sequenced clones had an insert that was identical to modern human genomic DNA (Fig. 2, A and B). The average BLAST hit length to the human genome for these clones was 116 bp, and the average hit covered 76% of the insert. Although we cannot establish a formal insert length or insert coverage threshold that differentiates between ancient and modern inserts because of the limited number of modern sequences we obtained, these values are significantly greater than the corresponding values for clones with cave bear sequences (P < 0.05 for the difference in both average clone length and average insert coverage calculated by two-tailed t test). This result suggests that it may be possible to discriminate between inserts derived from short ancient DNA molecules and inserts containing modern undamaged DNA in ancient DNA libraries. This may have relevance to the application of these methods to ancient hominids, in which the ability to distinguish ancient hominid DNA from modern contamination will be essential.

The remaining inserts with BLAST hits to sequences from known taxa were derived from other eukaryotic sources, such as plants or fungi, or from prokaryotic sources (bacteria and archaea), which provided the majority of known sequences in each library. The end-repair reaction performed on each ancient DNA extract is likely biased toward less-damaged ancient DNA fragments and modern DNA, which could contribute to the abundance of prokaryotic sequences relative to cave bear sequences in these libraries. The representation we observe in the libraries thus reflects the proportion of clonable sequences from each source, not the true abundances of such sequences in the original extracts. However, the results from both libraries demonstrate that substantial quantities of genuine cave bear genomic DNA are efficiently end-repaired and cloned, despite this possible bias. A considerable fraction of inserts in each library (17.3% in library CB1 and 11.2% in library CB2; Fig. 2) had hits only to uncharacterized environmental sequences. The majority of these clones had BLAST hits to GenBank sequences derived from a single soil sample (17), consistent with the contamination of each cave bear bone with soil bacteria from the recovery site. As in other metagenomic sequencing studies, most inserts in each library had no similarity to any sequences in the public databases.

To annotate cave bear genomic sequences, we aligned each cave bear sequence to the dog genome assembly using BLAST-like alignment tool (BLAT) (18). 6.1% of 6775 cave bear nucleotides from library CB1 and 4.1% of 20,086 cave bear nucleotides from library CB2 aligned to predicted dog RefSeq exons, in a total of 21 genes distributed throughout the dog genome (Fig. 2, C and D, and table S2). 4.1% and 6.2% of cave bear nucleotides, respectively, from library CB1 and library CB2 aligned to constrained nonexonic positions in the dog genome with phastCons conservation scores ≥ 0.8 (conserved noncoding, Fig. 2, C and D) (14). The majority of cave bear sequence in each library, however, aligned to dog repeats or regions of the dog genome with no annotated sequence features. These latter sequences are likely fragments of neutrally evolving, nonrepetitive sequence from the cave bear genome. Constrained sequences are slightly overrepresented in our set: Only 1.7% of bases in the dog genome assembly were annotated as RefSeq exons, and ∼10% of the cave bear sequences we obtained appear to be constrained overall, whereas 5 to 8% of positions in sequenced mammalian genomes are estimated to be under constraint (19). This discrepancy may be caused by our use of BLAST sequence similarity to identify cave bear sequences, an approach that is biased in favor of more-constrained sequences. Nevertheless, coding sequences, conserved noncoding sequences, and repeats appear in both cave bear genomic libraries at frequencies roughly proportional to what has been observed in modern mammalian genomes.

To determine whether the cave bear sequences we obtained contain sufficient information to reconstruct the phylogeny of cave bears and modern bears, we generated and aligned 3201 bp of orthologous sequences from cave bears and modern black, polar, and brown bears and estimated their phylogeny by maximum likelihood (Fig. 3A) (20). This phylogeny is topologically equivalent to phylogenies previously obtained using cave bear and modern bear mitochondrial DNA (13). This result further indicates that our libraries contain genuine cave bear sequences and demonstrates that we can obtain sufficient ancient sequences from those libraries to estimate the evolutionary relationships between ancient and modern lineages.

Fig. 3.

(A) Phylogenetic relationship of cave bear and modern bear sequences obtained by maximum-likelihood estimation using 3201 aligned sites. (B) Phylogeny obtained using all sites from (A), excluding cave bear and orthologous modern bear sites corresponding to two heavily damaged cave bear clones (table S3). Substitution rates, total substitutions (black), and GC–AT transitions (red) for each branch are shown. Orthologous dog sequence was used to root the trees. Percent support for internal nodes is also shown.

The substitution rate we estimated for cave bears is higher than that in any other bear lineage. On the basis of results from PCR-amplified ancient mitochondrial DNA, cytosines in ancient DNA can undergo deamination to uracil, which results in an excess of G-to-A and C-to-T (GC–AT) transitions (21). The inflated substitution rate in cave bears is likely caused by an excess of such events, because many of the substitutions assigned to the cave bear lineage are GC–AT transitions (Fig. 3A). These presumably damage-induced substitutions complicate phylogenetic reconstruction and the identification of functional sequence differences between extinct and modern species. However, these substitutions are not randomly distributed among all cave bear sequences but are clustered on a few clones from library CB2. Thus, two clones from library CB2 have three and four GC–AT transitions specific to cave bears, an observation that is extremely unlikely given that the occurrence of true randomly arising substitutions on cave bear clones should follow a Poisson distribution (table S3). When these two clones are excluded from the analysis (Fig. 3B), the apparent substitution rate and the excess of GC–AT transitions in cave bears are reduced, with little impact on estimates of substitution rates in the modern bear lineages. Although we cannot distinguish individual GC–AT substitutions from deamination-induced damage, these observations provide a quantitative means to identify cloned ancient DNA fragments with an excess of cytosine deamination events. It is also likely that heavily degraded ancient DNA fragments are poorly end-repaired and will therefore appear at reduced frequency in ancient DNA genomic libraries.

Although small amounts of genomic sequence have previously been obtained by amplification from <20,000-year-old remains (12), the direct cloning strategy employed here has yielded considerably more genomic DNA sequence from much older samples. Based on our results, ancient DNA sequencing programs for extinct Pleistocene species, including hominids, are feasible using a metagenomic approach. By revealing the phylogenomic terrain of recent mammalian evolution, these efforts should help identify the molecular events underlying adaptive differences among modern species.

Supporting Online Material

www.sciencemag.org/cgi/content/full/1113485/DC1

Materials and Methods

Tables S1 to S3

References

References and Notes

View Abstract

Navigate This Article