Perspective: Human Genetics

Primate Shadow Play

Science  28 Feb 2003:
Vol. 299, Issue 5611, pp. 1331-1333
DOI: 10.1126/science.1082931

Programs for the large-scale DNA sequencing of animal and plant genomes seem to be perpetually at a crossroads. With the genomes of human, mouse, rat, several fish, and smaller model species now sequenced, the question arises of which organisms should be analyzed next. Different characteristics (including experimental and economic relevance) make other creatures attractive candidates for genome sequencing, but even these criteria generate a rather short list. A more compelling argument is to distribute sequencing efforts around the tree of life in order to maximize the discovery of conserved coding sequences (exons) and regulatory elements. On page 1391 of this issue, Rubin and colleagues (1) present data from their sequencing of select genome regions of multiple primate species closely related to humans. They use these data in a method called “phylogenetic shadowing” that differs from previous cross-species genomic comparisons and works very effectively to reveal coding and regulatory regions in the human genome. Their work argues for giving priority to sequencing the genomes of animals that are closely related to us.

The premise of cross-species genomic discovery is that “what is important is conserved.” The basic techniques of cross-species genomic comparison (pioneered long before genome-scale DNA sequencing was possible) and the ability to cross-hybridize DNA probes among species have been widely used to demonstrate the presence of coding regions in the human genome. The emergence of larger amounts of DNA sequence information from distant species has dramatically advanced the value of these techniques, because in silico analyses can define conserved regulatory elements in the genome with high base specificity. Key studies have shown that gene sequences conserved between human and mouse retained their capacity for tissue-specific expression when reconstructed in appropriate cell types (2). Ansari-Lari et al. (3) found new genes and exons with this approach but also observed large numbers of DNA sequence alignments between mouse and human that were noncoding and apparently nonfunctional. The mouse draft genome sequence (4) reveals that these alignments sum to a total of about 40% of the mouse genome, and their ubiquity has unfortunate practical consequences. Even with the excellent software now available (5), the signal from regulatory regions of the genome can be masked by the noise from sequences that are shared but are of no apparent importance. Recent studies from Eric Green's group show that further calibration of the phylogenetic distance of pairs of species can improve cross-species comparisons, but even multiple pairs spaced far apart do not completely overcome the caveats described above (6).

Rubin and co-workers (1) now offer a refreshing variation on the basic principle: “What is important is conserved.” Their method of phylogenetic shadowing, which adds to the repertoire of methods for cross-species sequence comparisons, can be simply restated as “what is not critical can vary—at least some of the time.” This inverted view requires a very different set of data where many closely related species are sampled, rather than pairs of evolutionarily distant species. Their work describes the phylogenetic shadowing of 17 primate species closely related to Homo sapiens, spanning 40 million years of evolution. In this method, sequences of closely related species are compared taking into account the phylogenetic relationships of the species analyzed.

Close examination of the sequence differences among these primate species revealed that although similarity is the rule, unerring conservation is the exception. Summing these exceptions reveals that the coding exons (as expected) as well as multiple regions smaller than typical exons (which may be regulatory elements) are highly conserved (see the figure). To aid the analysis of their primate sequence collection, Rubin's group developed a probabilistic model based on alternative assumptions of evolutionary rates. This model identified the boundaries of those sequences that are most conserved most of the time (see the figure). The authors experimentally analyzed several of these candidate regulatory regions with protein binding tests and gene reporter assays. They found binding of the predicted DNA regulatory sequences to nuclear proteins and enhanced transcription in reporter constructs. These data validated their computational predictions and reinforced the underlying rationale for examining several close human relatives instead of just a few distant ones.
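The idea of "summing the exceptions" can be illustrated with a deliberately simplified sketch. It is not the probabilistic model developed by Rubin's group: it ignores the phylogenetic tree and evolutionary rates entirely and simply marks alignment columns where any sequence differs, then reports windows with no variation. The function names, the window size, and the toy alignment are all hypothetical and chosen only for illustration.

```python
from typing import List


def column_variation(alignment: List[str]) -> List[int]:
    """Return 1 for each column where at least one sequence differs, else 0."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [0 if len({seq[i] for seq in alignment}) == 1 else 1
            for i in range(length)]


def conserved_windows(alignment: List[str], window: int = 5,
                      max_variable: int = 0) -> List[int]:
    """Return start positions of windows with at most max_variable variable columns."""
    variation = column_variation(alignment)
    starts = []
    for start in range(len(variation) - window + 1):
        if sum(variation[start:start + window]) <= max_variable:
            starts.append(start)
    return starts


# Hypothetical toy alignment of four closely related sequences (not real data).
toy_alignment = [
    "ACGTACGTAAGGCT",
    "ACGTACGTTAGCCT",
    "ACGTACGTAAGGCA",
    "ACGTACGAAAGGCT",
]

print(column_variation(toy_alignment))
print(conserved_windows(toy_alignment, window=5))
```

With only a few sequences, many columns are invariant by chance. Adding more close relatives gives the unconstrained positions more opportunities to vary, which is why the approach depends on sampling many species.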

Figure. Primates in shadowland. Phylogenetic shadowing enables multiple comparisons among DNA sequences from closely related primate species including human (1, 7). In this way, the least variable regions of the genome, which should include exons and regulatory elements, can be identified.

At least part of the reason for the success of phylogenetic shadowing is that the sum of the evolutionary distances spanned by several close relatives is as great as that between two distant species. But does this mean that we need to completely sequence the genomes of 17 different primates to gain all this knowledge? The eventual answer may be yes, if we are to get the full benefit of this approach. In the meantime, happily, much of the benefit comes from four to six close relatives of humans. With chimpanzee DNA sequence data accumulating and the complete sequencing of other primate genomes under discussion, we may be close to generating a basic data set that complements the genome sequencing of our more evolutionarily distant relatives.
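The arithmetic behind the "sum of the evolutionary distances" can be sketched as the total branch length of a tree. The example below is a minimal illustration under stated assumptions: the tree shapes and branch lengths are arbitrary integer units invented for this sketch, not estimates of real divergence, and the helper names are hypothetical.

```python
# A node is (branch_length_to_parent, list_of_child_nodes); a leaf has no children.
def leaf(length):
    return (length, [])


def total_branch_length(node):
    """Sum the branch lengths of a node and all of its descendants."""
    length, children = node
    return length + sum(total_branch_length(child) for child in children)


# Hypothetical tree of six close relatives joined by short branches.
close_relatives = (0, [
    (2, [leaf(1), leaf(1)]),
    (2, [leaf(1), leaf(1)]),
    leaf(3),
    leaf(3),
])

# Hypothetical tree of two distant species joined by long branches.
distant_pair = (0, [leaf(7), leaf(7)])

print(total_branch_length(close_relatives))
print(total_branch_length(distant_pair))
```

In this toy case the six short-branched relatives add up to the same total as the two long branches, which is the sense in which many close species can stand in for a single distant comparison.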

Generating the data needed for human phylogenetic shadowing has other potential benefits. Sequencing a collection of closely related primate species could yield a better appreciation for the range of DNA sequence alterations that take place during speciation. These are expected to be a combination of regulatory, structural, and functional changes. Sorting out the contributions of each type of alteration will best be accomplished by both broad and deep comparisons of the genomes of numerous closely related species. Thus, Rubin and colleagues present us with a way forward for the large-scale sequencing projects that will enhance short-term goals to identify gene control elements, as well as long-term aims to understand overall differences among species.

References and Notes
