RT Journal Article
SR Electronic
T1 Prospects for Building the Tree of Life from Large Sequence Databases
JF Science
JO Science
FD American Association for the Advancement of Science
SP 1172
OP 1174
DO 10.1126/science.1102036
VO 306
IS 5699
A1 Driskell, Amy C.
A1 Ané, Cécile
A1 Burleigh, J. Gordon
A1 McMahon, Michelle M.
A1 O'Meara, Brian C.
A1 Sanderson, Michael J.
YR 2004
UL http://science.sciencemag.org/content/306/5699/1172.abstract
AB We assess the phylogenetic potential of ∼300,000 protein sequences sampled from Swiss-Prot and GenBank. Although only a small subset of these data was potentially phylogenetically informative, this subset retained a substantial fraction of the original taxonomic diversity. Sampling biases in the databases necessitate building phylogenetic data sets that have large numbers of missing entries. However, an analysis of two “supermatrices” suggests that even data sets with as much as 92% missing data can provide insights into broad sections of the tree of life.