Report

Prospects for Building the Tree of Life from Large Sequence Databases

Science  12 Nov 2004:
Vol. 306, Issue 5699, pp. 1172-1174
DOI: 10.1126/science.1102036

Abstract

We assess the phylogenetic potential of ∼300,000 protein sequences sampled from Swiss-Prot and GenBank. Although only a small subset of these data was potentially phylogenetically informative, this subset retained a substantial fraction of the original taxonomic diversity. Sampling biases in the databases necessitate building phylogenetic data sets that have large numbers of missing entries. However, an analysis of two “supermatrices” suggests that even data sets with as much as 92% missing data can provide insights into broad sections of the tree of life.
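To make the "missing data" figure concrete, here is a minimal sketch of how the fraction of missing entries in a supermatrix could be computed. It assumes the matrix is stored as a taxon-by-protein presence table (1 where a taxon has a sequence for that protein, 0 where it does not). The function name `missing_fraction` and the toy matrix are hypothetical illustrations and are not taken from the paper.

```python
def missing_fraction(presence):
    """Return the fraction of empty (taxon, protein) cells in a presence matrix."""
    total = sum(len(row) for row in presence)
    filled = sum(1 for row in presence for cell in row if cell)
    return (total - filled) / total


# Hypothetical toy supermatrix: rows are taxa, columns are proteins.
toy = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
]

frac = missing_fraction(toy)
print(f"{frac:.1%} of cells are missing")
```

In this toy case 8 of 12 cells are empty. By the same measure, a supermatrix with 92% missing data would have a sequence in only about 1 cell in 12.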
