News of the Week: Biodiversity

Biodiversity Databases Spread, Prompting Unification Call


Science  26 Jun 2009:
Vol. 324, Issue 5935, pp. 1632-1633
DOI: 10.1126/science.324_1632

LONDON—“For millennia, researchers have hoarded data, because essentially data [are] power,” says Norman MacLeod, keeper of paleontology at London's Natural History Museum. That attitude has faded in recent years, as scientists increasingly recognize the value of collaborative, open-access data sharing for understanding the world. But there's still a wide gap between wanting to share and figuring out how to do it right, discovered those who attended an international meeting on biodiversity here this month.

MacLeod was one of the organizers of e-Biosphere 09*, a meeting for creators and users of the Encyclopedia of Life (EOL), the Consortium for the Barcode of Life (CBOL), the Catalogue of Life (CoL), and other major efforts to build and manage open-access biodiversity databases. CoL now lists more than 1.1 million species, for example, and EOL has compiled more than 150,000 vetted species pages and 1.4 million short articles, called “stubs,” that will be expanded as information on each species is gathered. “At e-Biosphere, many groups demonstrated they are now providing actual services, around the clock, with interfaces that large numbers of people can use,” says Jesse Ausubel, program director of the Alfred P. Sloan Foundation in New York City. The goal of the conference was to figure out how to combine data from at least 100 systems into one gigantic, online, open-access database that will eventually cover all life on Earth, with lots of information, including primary research.


The bar codes for the Great Horned Owl (above, left) and the Barn Owl show great variation in the mitochondrial gene sequenced from these two birds. A global bird bar-code database is under construction.


But whether these researchers are ready to create one-stop shopping for biodiversity remains to be seen. They are behind in gathering data, some may soon be strapped for cash, and not everyone is eager to share information. “None of the groups has a permanent, sustainable business model,” says Rainer Froese, coordinator of FishBase, a comprehensive database on fish.

Many of the individual projects have run short of funding or underestimated how long it would take to meet their targets. Begun in 2007 with $12.5 million, EOL hoped to profile 700,000 to 1 million species by 2011 (Science, 11 May 2007, p. 818). But 2 years into the project, it has just 150,000 on its computers and will be happy to hit between 500,000 and 700,000 by that point. CoL, essentially a species list that provides a “taxonomic backbone” for many of the other databases, including EOL, has pushed back by 3 years its 2011 goal of covering all 1.75 million known species. It won't say how much it's spent so far but now needs new money to complete the job.

Funding has also been an issue for the International Barcode of Life (iBOL) project, a yet-to-launch international project based at the University of Guelph in Canada that is building a database of DNA bar codes—short sequences of DNA that can be used to “tag” species (Science, 18 February 2005, p. 1037). iBOL needs $60.7 million for the next 6 years. It has promises of $24 million from various Canadian agencies but this year got only $1.7 million of $21.7 million expected from the government's Genome Canada. Paul Hebert of iBOL is confident the other $20 million will be awarded by next February, but that still leaves the project searching for another $15 million or so. iBOL's barcoding centers around the world will have to come up with their own funding as well.

Pick a species.

Like baseball cards, species pages from the Encyclopedia of Life give the vital statistics on an organism.


Scientists have also discovered that making the databases comprehensive is a tough process. The Global Biodiversity Information Facility (GBIF) Web site, for example, allows users to generate biodiversity maps. “If you look at [maps on] the GBIF Web site, you'll see a lot of blank spots,” says CBOL's executive secretary, David Schindel. “Those are either countries that don't have a lot of digital records or for some reason have decided not to share their data.”

Converting scientists from data hoarders to data sharers can still be a problem, says MacLeod. Yet combining data across disciplines can be quite useful. For example, one can link a species' global-positioning data with its biological makeup, and then with climate-modeling data, to get a clearer picture of threats to its existence.

Because funding is often an issue, projects that have something to offer government agencies may have an easier time keeping afloat. The Fish Barcode of Life Initiative is developing bar codes for commercial fish for the Regulatory Fish Encyclopedia at the U.S. Food and Drug Administration and also for the National Oceanic and Atmospheric Administration. “In general, we are seeing more interest from government agencies [in bar coding],” says Schindel, who hopes these agencies will eventually provide funding for these efforts.

But even if the projects can meet their targets, how will they then ensure that they have sustainable funding to maintain the databases and continue to add new species and primary research data? At the London meeting, the participants agreed that some of their efforts may be short-lived. “A lot of competition is going on, and [some] people are creating similar sorts of tools,” says Schindel.

For projects that do survive, funding will remain a key concern. Dave Roberts of the European Distributed Institute of Taxonomy says that it's “unrealistic to give guarantees” for funding of many projects in the current economic climate—a sentiment echoed by many of the project leaders in attendance. However, many hope they can make a good case for sustained funding, citing the 27-year history of GenBank, a public archive of DNA sequence data. “We know GenBank will still be there 10 years from now,” says CoL's Frank Bisby. “I believe this will also be true of the CoL and several other key components of the biodiversity informatics community.”

Two in one.

They look alike, but their different bar codes say these two butterflies are separate species, an observation confirmed by watching what their caterpillars eat.


For some of the projects, public involvement and interest outside of the scientific community will be crucial to their sustainability. For EOL, allowing access to update pages and add pictures will help keep the project relevant to the wider world, says EOL's species page group director, Cynthia Sims Parr. But access will be controlled to some extent to avoid “the Wikipedia problem,” which arises when users introduce errors into articles and no one checks them. EOL is now recruiting site curators for various groups of species; they will monitor additions and verify that the information is correct.

But as researchers try to fix problems facing individual projects, they also need to figure out how to integrate these initiatives. Bisby notes, for example, that GBIF failed to include species added this year because it was still using last year's version of CoL's species checklist. In general, scientists must now navigate each database site separately to pull together the information they need on a species. But aligning software from all the databases to enable interoperability will be a huge challenge, involving reformatting large amounts of data into a standardized form.
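The checklist mismatch Bisby describes can be illustrated with a minimal Python sketch. It is not how GBIF or CoL actually exchange data; the function names, the two snapshot lists, and the placeholder species "Examplea nova" are all assumptions made for illustration. The sketch puts names into one standardized form, then reports which names in the current checklist are absent from a consumer's older copy.

```python
def normalize_name(raw: str) -> str:
    """Put a scientific name into one standardized form:
    single spaces, capitalized genus, lowercase remaining parts."""
    parts = raw.split()
    if not parts:
        return ""
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])


def missing_species(current: list[str], stale: list[str]) -> list[str]:
    """Return names in the current checklist that the stale copy lacks,
    comparing names only after normalization."""
    current_names = {normalize_name(n) for n in current}
    stale_names = {normalize_name(n) for n in stale}
    return sorted(current_names - stale_names)


# Hypothetical snapshots, not real CoL records. The two owls are the
# species from the bar-code figure; "Examplea nova" is an invented
# placeholder for a species added in the newer checklist.
checklist_last_year = ["Bubo virginianus", "tyto  ALBA"]
checklist_this_year = ["Bubo virginianus", "Tyto alba", "Examplea nova"]

for name in missing_species(checklist_this_year, checklist_last_year):
    print("Not in the older checklist:", name)
```

Normalizing before comparing matters here: without it, the differently formatted "tyto  ALBA" entry would be reported as missing even though both snapshots contain the Barn Owl.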

After the conference ended, a smaller working group from the major initiatives discussed a “digital road map” of how to connect the databases on the Web. The first steps they agreed upon included completing a list of global species names, on which many other databases can be built, and reaching out to potential collaborators in the computer science community to help construct the road map. Initial progress will be presented either at October's GBIF GB16 meeting in Copenhagen or the Taxonomic Database Working Group's November 2009 meeting in Montpellier, France. The e-Biosphere working group doesn't have any milestones set yet but hopes to present incremental progress and outline new action items toward creating the road map. Ballpark estimates are that the “virtual laboratory” envisioned will take 10 years to construct. But EOL Executive Director James Edwards says that the “guts” of the system, an integrated database incorporating the largest projects such as EOL, GBIF, and CoL, should be available within 2 years.

* The e-Biosphere 09 International Conference on Biodiversity Informatics, London, 3–5 June.
