Perspective | Biochemistry

Unlocking Ancient Protein Palimpsests


Science  21 Mar 2014:
Vol. 343, Issue 6177, pp. 1320-1322
DOI: 10.1126/science.1249274

Over the past 30 years, ancient DNA studies have grown from a scientific curiosity into a powerful research tool with applications in molecular evolution, archaeology, paleontology, conservation genetics, and forensics. The analysis of ancient proteins has lagged behind that of ancient DNA, but recent developments in high-throughput, high-resolution mass spectrometry (MS) have the potential to provide the accuracy and robustness required for confident and reliable sequencing of ancient proteins. This methodology may allow biomolecular recovery to be extended further back in time, because DNA chains fall apart 10 times faster (1) than proteins (2). Furthermore, mass spectrometry enables investigation of protein expression specific for different tissues, developmental phases, or biological processes, including disease states.

Although the study of ancient proteins can trace its roots back to 1954 (see the figure), early analytical methods were unable to obtain sequence information. Bulk amino acid analysis and antibody-based immunoassays provided only tentative or indirect protein identification of previously defined targets. Edman degradation sequencing (by scission) requires high concentrations of a purified and undamaged target protein. Meeting this requirement is often difficult even in fresh samples, and the method is wholly unsuited to the analysis of complex mixtures of degraded ancient proteins.

First applied to ancient proteins in 2000 (3), mass spectrometry is particularly well suited to studying ancient proteins, because it characterizes (and quantifies) proteins by identifying multiple short peptides released from complex mixtures after enzymatic digestion. Initial studies focused on the most abundant proteins (osteocalcin and collagen) from the most prevalent archaeological fossil material: bone. For example, claims of collagen sequences recovered from dinosaur bones were used to provide molecular evidence to support their phylogenetic placement (4). This approach is now mainly used to identify bone fragments on the basis of collagen peptide fingerprints (5). More recently, attempts have been made to identify multiple proteins in complex mixtures, based on methodological improvements that enable the recovery of up to hundreds of different proteins from ancient samples (6).
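The peptide-centric logic described above can be illustrated with a minimal sketch. It assumes trypsin as the digestion enzyme (the text says only "enzymatic digestion"), uses the common simplified rule that trypsin cleaves after K or R unless the next residue is P, and works on made-up toy sequences rather than real collagen or osteocalcin entries. The function names are hypothetical, and real search engines match fragment spectra rather than exact peptide strings.

```python
import re
from collections import defaultdict


def tryptic_digest(sequence, min_length=5):
    """Split a protein sequence after K or R, except when followed by P.

    This is the simplified textbook trypsin rule; missed cleavages are ignored.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in pieces if len(p) >= min_length]


def peptides_per_protein(reference, observed):
    """Map observed peptides onto the reference proteins that can produce them."""
    index = defaultdict(set)
    for name, seq in reference.items():
        for pep in tryptic_digest(seq):
            index[pep].add(name)

    hits = defaultdict(set)
    for pep in observed:
        for name in index.get(pep, ()):
            hits[name].add(pep)
    return {name: sorted(peps) for name, peps in hits.items()}


# Toy reference database and toy "observed" peptides (all invented for illustration).
reference = {
    "toy_protein_A": "GPAGPKGEPGSRGLPGADGRPGAK",
    "toy_protein_B": "MKTAYIAKQRPLEVK",
}
observed = ["GEPGSR", "GLPGADGRPGAK", "TAYIAK", "AAAAAK"]

support = peptides_per_protein(reference, observed)
for protein, peptides in sorted(support.items()):
    print(protein, len(peptides), peptides)
```

A protein supported by several distinct peptides is a more convincing identification than one supported by a single peptide, which is why the sketch reports peptides per protein rather than a simple yes or no.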

For example, shotgun proteomics applied to medieval human dental calculus identified several highly antigenic bacterial virulence proteins, such as gingipains, known to provoke a strong immunological reaction. Inflammatory and anti-inflammatory host proteins involved in the innate immune system response, some of which have antimicrobial and bactericidal properties, were also identified (7). In a 500-year-old Inca mummy, a similar approach detected the presence of several proteins consistent with severe inflammation and neutrophil infiltration of the airways at the time of death, most probably as a host immune system response to severe bacterial infection (8).

A short history of ancient protein analysis.

Collagen-like amino acid profiles were first detected in ancient fossil bone in 1954. The detection of immunologically cross-reactive material in 1974 led to a number of antibody-based investigations of ancient proteins. Efforts to directly sequence proteins began with Edman degradation in 1980, but the methodology was not well suited to ancient proteins. In 2000, following the advent of soft-ionization mass spectrometry, the pace of research on ancient proteins increased (3), with an initial focus on the identification and sequencing of single target proteins. Advances in high-resolution instrumentation now allow researchers to explore ancient proteomes (6).

CREDIT: P. HUEY/SCIENCE

The key technological basis for these advances has been the development and availability of high-resolution mass spectrometers. The latest generation of these instruments can reliably and (relatively) rapidly detect proteins in complex mixtures, even at the limited quantities typical of ancient samples. Despite these developments, ancient protein research is still in its infancy, comparable to ancient DNA research at the turn of the millennium. Methods appropriate for typical modern samples must be modified to take into account the lower concentrations and higher levels of damage found in ancient samples. Such methodological tailoring is now being undertaken; for example, sample preparation has been substantially simplified to minimize losses (9).

As in the case of ancient DNA, it is important to analyze the target and identify or exclude secondary contaminants. DNA and protein contamination, though, are conceptually different. Unlike DNA, proteins are not amplified, and damage in the extracted sample (10) is the first evidence of authenticity. Furthermore, the tissue-specific nature of protein expression provides a powerful additional test of authenticity. As in any mass spectrometry experiment, extensive fragmentation and accurate measurements of precursor and fragment ions are critical for identification. Candidate lists of identified peptides or proteins should be validated on the basis of strict filtering statistics and manual inspection of the key evidence. Both the raw data and details of search criteria should be publicly available. Sensitive samples should be prepared in spaces that meet standards equivalent to those used for ancient DNA extraction. Last but not least, contamination from proteins routinely used as MS standards can occur in proteomics facilities; evidence for wool (sheep keratins), milk (bovine casein), and albumins should therefore be validated extremely carefully.

Peptide sequences are routinely identified by matching spectra against a protein reference database, usually derived from annotation of the genome sequence of the biological species under investigation. However, ancient genomic data are rarely of sufficient quality to enable confident gene prediction. Instead, MS data sets from extinct species need to be searched against a protein reference database of the nearest available extant relatives; for example, mammoth MS spectra must be searched against the elephant reference protein list (6). This approach will bias against novel sequences, which are the most interesting for pathophysiological and phylogenetic reconstructions. De novo or hybrid sequencing solutions, initially developed for antibody sequencing or to improve peptide identification in unsequenced living organisms, allow for identification of amino acid substitutions not previously reported (11). This approach, although still computationally challenging at the moment, has the potential to provide groundbreaking results.
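To make the bias against novel sequences concrete, the following sketch shows one simple way a substitution relative to a reference peptide could be hypothesized from a precursor mass difference. It is an illustration under stated assumptions, not the de novo or hybrid method of (11): the function names and tolerance are hypothetical, only single substitutions are considered, and the residue masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added for the peptide termini


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def single_substitution_candidates(ref_peptide, observed_mass, tol=0.01):
    """List (position, old, new) substitutions that explain the mass difference.

    Only one substitution per peptide is considered; tol is in Da.
    """
    delta = observed_mass - peptide_mass(ref_peptide)
    candidates = []
    for i, old in enumerate(ref_peptide):
        for new, mass in MONO.items():
            if new != old and abs((mass - MONO[old]) - delta) <= tol:
                candidates.append((i, old, new))
    return candidates


# Toy example: the reference relative has GAK, the ancient sample carries GSK.
observed = peptide_mass("GSK")
print(single_substitution_candidates("GAK", observed))
```

Mass alone is ambiguous: leucine and isoleucine are isobaric, and the A-to-S shift used here (about +15.995 Da) matches the mass of a single added oxygen, so it could equally reflect oxidation. Candidates like these would still need support from fragment ions, consistent with the emphasis on extensive fragmentation elsewhere in the text.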

We remain remarkably ignorant of the details of protein degradation. Adoption of tools to search the entire spectrum of known spontaneous chemical modifications affecting amino acids is beginning to uncover more details of decomposition pathways. This in turn will feed information back into studies of long-lived proteins and the processing and storage of proteinaceous materials. Massively parallel DNA sequencing-by-synthesis has revealed patterns of cytosine deamination, which are used to support the authenticity of sequences. Similarly, researchers are now exploring patterns of glutamine deamidation (10) as a marker of protein age.
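As a rough illustration of how glutamine deamidation might be summarized, the sketch below computes the fraction of glutamine sites observed in the deamidated form across a set of identified peptides. The lowercase "q" notation for a deamidated glutamine is a hypothetical convention chosen for this example, the peptides are invented, and the single pooled fraction is only one simple summary; it is not presented as the approach used in (10).

```python
# Deamidation converts glutamine (Q) to glutamic acid (E), so the expected
# mass shift is the difference between the two monoisotopic residue masses.
GLN_RESIDUE = 128.05858
GLU_RESIDUE = 129.04259
DEAMIDATION_SHIFT = GLU_RESIDUE - GLN_RESIDUE  # about +0.984 Da


def deamidated_fraction(peptides):
    """Fraction of glutamine sites found deamidated.

    Each peptide is a string in which 'Q' marks an intact glutamine and
    'q' marks a glutamine observed with the deamidation mass shift.
    Returns None when no glutamine site is present.
    """
    intact = sum(p.count("Q") for p in peptides)
    deamidated = sum(p.count("q") for p in peptides)
    total = intact + deamidated
    if total == 0:
        return None
    return deamidated / total


# Invented peptide identifications from a hypothetical ancient sample.
identified = ["GPqGK", "AQQR", "GLPGADGR"]
print(round(DEAMIDATION_SHIFT, 5), deamidated_fraction(identified))
```

Because a deamidated glutamine is chemically a glutamic acid, calling a site deamidated depends on knowing that the reference sequence has Q at that position, which is one reason the choice of reference database matters.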

The full potential of ancient proteomics will only be realized once quantification of ancient protein expression is achieved. Quantitative proteomics is a growing trend within biomedical research, underpinning the ability to understand cellular function and regulation. Applying quantitative proteomics methods to ancient samples could offer the opportunity to reconstruct pathophysiological phenotypes that are characterized by specific protein expression patterns and are not necessarily hard-coded at the DNA level.

Ongoing developments in the analysis of ancient biomolecules and integration of high-throughput methods to sequence multiple categories of ancient biomolecules have the potential to provide insights into biological processes in the distant past. Ancient DNA analysis remains at the forefront of ancient biomolecular studies (12, 13). Ancient proteomics has the potential to complement this exciting work—for example, to shed light on disease processes that cannot be captured by DNA studies.

References
