Considerations in Creating Online Archives

Science  06 Apr 2001:
Vol. 292, Issue 5514, p. 51
DOI: 10.1126/science.292.5514.51b

In their discussion of a proposed online archive of published science, PubMed Central (PMC), Richard J. Roberts and co-authors and the editors of Science raise excellent points (Science's Compass, Viewpoint, 23 Mar., p. 2318). As Editor-in-Chief of The Journal of Cell Biology (JCB), I agree with much of what they have to say. Like all of the other academic scientist-editors responsible for running the JCB, I am deeply committed to enhancing the free exchange of scientific information. As a result, the JCB will make all of our back content free after 6 months. Although the material will remain on our servers, it will be posted without any password or entrance controls.

If other publishers took similar steps, the most important goal of Roberts et al. could be realized, and without the unavoidable risks that would accompany release of material to multiple servers. We believe that the interests of our authors, readers, and the community at large will be best served by this approach. Efforts would not be duplicated, and the quality of posted material would not be endangered.

Roberts et al. say that making existing electronic content available on other servers would be “natural and simple.” Those of us involved in the day-to-day practicalities of scientific publishing know that the process is anything but simple. The electronic posting and decoding of scientific text and symbols involve complex parsers that are custom-made for each journal. Every new symbol must be sent to and tagged by each server host, a process prone to error. Thus, hosting content with multiple providers would measurably increase this workload. Biologists may tolerate a certain number of errors (e.g., disappearing statistics symbols), but when “μg” is transformed into “mg” in a medical paper, there is cause for concern. The same considerations apply to the reproduction of complex digital images.

There is no need, however, to risk such a loss of quality control, because duplicating content on PMC is unnecessary. Roberts et al. assert that only a single comprehensive collection can be “efficiently indexed, searched, and linked to.” This, however, would be akin to AltaVista claiming that they can only index a Web site if the complete content of that site is sent to them and hosted on their server. Clearly, this is not the case. The ability to search across thousands of servers, as long as those servers do not have access controls, is the very reason that the Web is such a powerful tool. I believe that centralization of information is an outmoded concept. Roberts and co-authors argue that a central repository is necessary for full-text searching; this is also incorrect. PubMed is already developing methods for full-text searching of articles on other servers. And finally, PMC duplicates the archiving efforts of entities such as the journal site developed by HighWire Press (a department at Stanford University, in the University Libraries). If such efforts were redirected, PMC would have funds to help develop cross-server search capabilities and to electronically archive older material that as yet has no electronic presence.

I find it difficult to justify spending public funds that might otherwise be available for research and training to underwrite efforts to provide what already exists, especially when what exists is immensely successful. However, at the JCB we believe that a great deal of good can come from PMC if its supporters will abandon the idea of duplicative and error-prone release of content to multiple servers. Instead, they should focus their efforts on ensuring that all journals, both nonprofit and commercial, make their content freely available to those of us who have produced the work in the first place.
