Science and the Semantic Web


Science  24 Jan 2003:
Vol. 299, Issue 5606, pp. 520-521
DOI: 10.1126/science.1078874

Scientists have become increasingly reliant on the World Wide Web [HN1] for supporting their research endeavors. The Web is used for finding preprints and papers in online repositories, for participating in online discussions at sites such as Science Online, for accessing databases through specialized Web interfaces, and even for ordering scientific supplies. When searching [HN2] for a specific Web site or a paper on a particular topic, engines like Google can do a phenomenal job of sorting through billions of possibilities and identifying potentially useful candidates, often within the first few search results. On specialized Web sites, domain-specific search engines can do even better, for example, enabling the mathematician to easily find papers on “symplectic geometry” or the physicist to see preprints relating to “mesoscopic systems and the quantum Hall effect.” In fact, the Web has become indispensable for supporting the traditional communications within our disciplines and the needs of scientists within their disciplinary boundaries.

However, as modern science continues its exponential growth in complexity and scope, collaboration among scientists at different institutions, in different subareas, and across scientific disciplines is becoming increasingly important. Researchers working at one level of analysis may need to find and explore results from another level, from another part of the field, or from a completely different scientific field. On the Web, however, scientists looking for results in sites developed for different scientific communities are often at a loss. For example, a scientist searching for a technique to analyze some image-based data may not know to look for papers on Laplacean invariants (found under the symplectic geometry category in many math sites). A general search on image analysis will find thousands of possibilities but will provide little or no guidance as to which sites can explain how to use the techniques, as opposed to finding papers formalizing the mathematical background, sites for instructors teaching the topics, or reports describing a case where the technique was used. The Web is even more limited when it comes to integrating information from multiple sites or searching for nontextual information. Current Web technology thus falls short of the needs of collaborative and interdisciplinary “e-Science.” Fortunately, new Web technologies are emerging with the potential to revolutionize the ability of scientists to do collaborative work. However, to realize this potential, scientists and information technologists must forge new models of cooperation, and new thinking must go into the funding and dissemination of this next generation of scientific tools on the Web.


A new generation of Web technology, called the Semantic Web [HN3] (1), is designed to improve communications between people using differing terminologies, to extend the interoperability of databases, to provide tools for interacting with multimedia collections, and to provide new mechanisms for the support of “agent-based” computing [HN4] in which people and machines work more interactively.

Whereas the current Web provides links between pages that are designed for human consumption, the Semantic Web augments this with pages designed to contain machine-readable descriptions of Web pages and other Web resources. These documents can be linked together to provide information to the computer as to how the terms in one relate to those in another. To achieve this, the Semantic Web uses new Web languages based on RDF (the Resource Description Framework) [HN5] (2), which go beyond the presentation capabilities of HTML [HN6] (Hypertext Markup Language, which is used on most Web sites today) and the document-tagging capabilities of the Extensible Markup Language (XML) [HN7], a more recent innovation being used to allow parts of documents to be more precisely delineated.
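
The idea of machine-readable descriptions that relate the terms of one document to those of another can be illustrated with a minimal sketch. It assumes the simplest possible model, in which each statement is a (subject, predicate, object) triple, the basic shape of an RDF statement. The prefixes, property names, and the term img:InvariantMethods are hypothetical and are not taken from any real vocabulary; the Laplacean invariants example comes from the image-analysis scenario described earlier.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical machine-readable description published by a mathematics site.
math_doc: List[Triple] = [
    ("math:LaplaceanInvariants", "math:filedUnder", "math:SymplecticGeometry"),
]

# Hypothetical description published by an imaging site. The second statement
# links a term in this document to a term defined in the other document.
imaging_doc: List[Triple] = [
    ("img:ImageAnalysis", "img:hasTechnique", "img:InvariantMethods"),
    ("img:InvariantMethods", "link:sameTopicAs", "math:LaplaceanInvariants"),
]


def statements_about(term: str, *docs: List[Triple]) -> List[Triple]:
    """Return every statement, from any document, whose subject is `term`."""
    return [t for doc in docs for t in doc if t[0] == term]


def follow(term: str, predicate: str, *docs: List[Triple]) -> List[str]:
    """Return the objects reached from `term` along `predicate`."""
    return [o for _, p, o in statements_about(term, *docs) if p == predicate]
```

With both documents loaded, a program starting from the imaging term can reach the mathematical term, and from there the category under which the mathematics site files it, without the user knowing the mathematical vocabulary in advance.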

The Center for Bioinformatics of the U.S. National Cancer Institute (NCI), as part of the Metathesaurus [HN8] project (3), is turning a large vocabulary of cancer research terms into a machine-readable “ontology” [HN9]—essentially an expanded thesaurus that delineates precise relationships between the vocabulary items and that is available in the RDF-based Web Ontology Language, OWL [HN10] (4). For example, it can provide an expanded definition for an oncogene like the one shown in the center of the figure (left), with respect to organism, function, locus, and associated diseases. Specifically, MYC is found in humans, it has the functions of gene transcription and transcriptional regulation (each of which would be defined elsewhere in the ontology), its unique location is 8q24, and it is associated with Burkitt's lymphoma. In addition, the definition contains some restrictions on these properties, for example, that there can only be a single, unique value for the chromosomal location and that at least one of the diseases associated with an oncogene must be a cancer.
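
As a rough illustration of what such a definition and its restrictions amount to, the following sketch encodes the MYC example in plain Python rather than OWL. The property values come from the description above; the dictionary layout, the property names, and the CANCERS set are assumptions made for the example and do not reflect the actual structure of the NCI ontology.

```python
from typing import Dict, List

# Property values taken from the MYC example in the text. The layout and
# property names are hypothetical, not the NCI ontology's schema.
myc: Dict[str, List[str]] = {
    "organism": ["Human"],
    "function": ["Gene transcription", "Transcriptional regulation"],
    "locus": ["8q24"],
    "associated_disease": ["Burkitt's lymphoma"],
}

# Hypothetical stand-in for the part of the ontology that classifies diseases.
CANCERS = {"Burkitt's lymphoma"}


def violations(gene: Dict[str, List[str]]) -> List[str]:
    """Check the two restrictions mentioned in the text and list any failures."""
    problems = []
    # Restriction 1: the chromosomal location has a single, unique value.
    if len(gene.get("locus", [])) != 1:
        problems.append("locus must have exactly one value")
    # Restriction 2: at least one associated disease must be a cancer.
    if not any(d in CANCERS for d in gene.get("associated_disease", [])):
        problems.append("at least one associated disease must be a cancer")
    return problems
```

Calling violations(myc) returns an empty list, whereas a record with two loci or with no cancer among its associated diseases would be reported. In an ontology language these constraints are stated declaratively and enforced by general-purpose tools rather than by hand-written checks like these.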

These new documents provide a way to build a knowledge base that is not restricted to particular keywords and to situate this knowledge base in a distributed way among the documents and resources on the Web. Thus, the oncogene definition, by virtue of being made machine-readable, can be linked to by different Web sites, databases, devices, and programs. A Web page describing an ongoing research project in Burkitt's lymphoma could be linked to the definition of MYC by the fact that the disease is associated with that gene. A biotechnologist who has been sequencing chromosome 8 might link data about location 8q24 to this same definition and also link to the loci associated with it, such as PVT1. Other online resources, such as PubMed [HN11], could also link to these ontological terms, for example, by providing a link from PVT1 to a paper titled “Rearrangement of a DNA sequence homologous to a cell-virus junction fragment in several Moloney murine leukemia virus-induced rat thymomas” (5). Thus, the Semantic Web would contain the links needed to find this paper in response to a researcher's query for “chromosomal locus-associated Burkitt's lymphoma,” even though the paper does not specifically mention Burkitt's (or MYC) and thus could not be found with a current Web keyword search.
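
The chain of links in this example can be sketched in a few lines of Python. The graph below holds only the four links named in the paragraph; the predicate names (associatedWith, locatedAt, associatedLocus, describedIn) and the traversal order are assumptions made for illustration, not part of any published ontology or query language.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

PAPER = ("Rearrangement of a DNA sequence homologous to a cell-virus junction "
         "fragment in several Moloney murine leukemia virus-induced rat thymomas")

# The links described in the text, with hypothetical predicate names.
GRAPH: List[Triple] = [
    ("MYC", "associatedWith", "Burkitt's lymphoma"),
    ("MYC", "locatedAt", "8q24"),
    ("8q24", "associatedLocus", "PVT1"),
    ("PVT1", "describedIn", PAPER),
]


def objects(subject: str, predicate: str) -> List[str]:
    """Objects of all statements with the given subject and predicate."""
    return [o for s, p, o in GRAPH if s == subject and p == predicate]


def subjects(predicate: str, obj: str) -> List[str]:
    """Subjects of all statements with the given predicate and object."""
    return [s for s, p, o in GRAPH if p == predicate and o == obj]


def papers_for_disease(disease: str) -> List[str]:
    """Follow disease -> gene -> location -> associated locus -> paper."""
    found: List[str] = []
    for gene in subjects("associatedWith", disease):
        for location in objects(gene, "locatedAt"):
            for locus in objects(location, "associatedLocus"):
                found.extend(objects(locus, "describedIn"))
    return found


def keyword_search(term: str) -> List[str]:
    """Naive keyword match against the paper title, for comparison."""
    return [PAPER] if term.lower() in PAPER.lower() else []
```

Here papers_for_disease("Burkitt's lymphoma") reaches the paper by following links, while keyword_search("Burkitt") finds nothing because the title never mentions the disease.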

Currently, ontologies like the ones above are just starting to come to the Web, and the links between them are just beginning to be made. In the foreseeable future, the web of links between documents, databases, and programs, using definitions like the ones above, can provide a new level of interaction among scientific communities. For example, the World Health Organization Classification of Neoplastic Diseases (6, 7) could become a model, with links to other diseases, databases, and clinical trials. Those trials could be linked to research on epidemiology or causative factors or to ontologies from other fields. Recent workshops have focused on the use of the Semantic Web to support the biosciences (8) and environmental science (9). Other examples include the National Virtual Observatory [HN12] (10), which is exploring the use of Semantic Web technologies for linking numerous astronomical resources together, and the British MONET project [HN13] (11), which is exploring the use of the Semantic Web for making mathematical algorithms Web-accessible from a variety of software packages. Further, as these models will use the Semantic Web's common, machine-processible structure, it will become possible for computers to help us make links where relationships are currently unsuspected.

New software tools are being developed for mapping and linking the terms between different ontologies; for using ontologies in the markup of Web sites, scientific publications, and databases; and for capturing semantic metadata about images and other multimedia objects. New search technologies are under development to exploit ontological and other Semantic Web technologies, as well as to extend the capabilities of Semantic Web languages to allow more complex information to be expressed (for example, representing how a particular process might change over time, or how a set of Web-accessible programs could be automatically combined). Of particular note are some of the first demonstrations of Semantic Web “agents” that can integrate the information from Web pages and databases and can pass them to programs for analysis and query processing.

Unfortunately, most scientists are unaware of the Semantic Web effort, and most of the current development is going on separate from the scientific enterprise. This situation parallels that of the development of the original Web, where scientists largely served as customers and users of Web technology, rather than helping to evolve the technology toward the needs of their fields. In fact, much of the information technology research investment for science has gone into technologies that could not compete with the Web and that ended up less used than the commercially available Web technology. Scientific Web site development is often done by publishers or students in their spare time, and being good at bringing science to the Web is typically not seen as a major career enhancer.

There are many reasons for this, but one important one is that crosscutting efforts like this are hard to fund within traditional discipline-oriented review. The e-Science initiative in the UK [HN14] (12) is a good example of how research scientists and information technologists can work together for the betterment of science, and recent efforts to unite the Semantic Web and Grid computing [HN15] (13-15) show great promise. Scientists around the world should unite with their colleagues in the fields of computer science and information technology to push similar interdisciplinary programs. In addition, scientists and information technologists need to work together to make sure Semantic Web technologies are included in programs such as the U.S. National Science Foundation's proposed CyberInfrastructure [HN16] or the National Center for Research Resources' Biomedical Informatics Research Network [HN17] in the United States and similar scientific-infrastructure support programs internationally.

There is also another issue on which information technologists and scientists must start to speak with a single voice. The success of the Semantic Web will be significantly limited if content and tools are not widely shared, at least in the early period of Semantic Web exploration. Much as the original World Wide Web grew from an open-source, open-content model [HN18], so too must the Semantic Web. Although it is possible that, in the long run, methods may be developed to blend open and restricted access to Semantic Web content, in the short run, an atmosphere of exploration and cooperation must be fostered. Research scientists must team with their computer science brethren and fight against the intellectual property policies and runaway patent madness that make free dissemination of our products impossible. The original World Wide Web revolution was enabled by open-code, free software, and the wide dissemination of low-cost computing technology. The Semantic Web requires similar openness.

HyperNotes Related Resources on the World Wide Web

General Hypernotes

Dictionaries and Glossaries

WhatIs.com provides definitions of information technology terms.

The Webopedia is an online encyclopedia and glossary of computer and Internet technology.

A glossary with links is provided by T. Berners-Lee's Weaving the Web Web site.

Web Collections, References, and Resource Lists

The Virtual Library: Computing lists among its resources the Virtual Museum of Computing and the Virtual Library of Web Development.

The Yahoo! Directory provides links to online resources related to the Internet and the World Wide Web.

The Librarian's Index to the Internet includes listings for Internet guides, search tools, and Web design and Internet topics.

Internet links to Semantic Web resources are provided by J. Pan, Department of Computer Science, University of Manchester, UK.

SemanticWeb.org is a portal for the Semantic Web community.

Online Texts and Lecture Notes

The World-Wide Web Consortium (W3C) was created in 1994 “to lead the World Wide Web to its full potential by developing common protocols that promote its evolution and ensure its interoperability.” Information about W3C activities and Web technologies is provided.

R. Wyllys, School of Information, University of Texas, makes available lecture notes and other resources for a course on information technologies and the information professions.

A. Sheth, Department of Computer Science, University of Georgia, provides course materials and resources for a course on the Semantic Web.

J. Maluszynski and U. Aßmann, Department of Computer and Information Science, Linköping University, Sweden, make available student presentations, readings, and other resources for a course on the Semantic Web.

J. Hendler, Department of Computer Science, University of Maryland, provides lecture notes and Web resources for a course on artificial intelligence on the Web.

The History Department, Leiden University, makes available lecture notes by R. Griffiths for a course titled “History of the Internet, Internet for Historians (and just about everyone else).”

General Reports and Articles

The May 2001 issue of Scientific American had an article by T. Berners-Lee, J. Hendler, and O. Lassila titled “The Semantic Web.”

IEEE Spectrum Online makes available a September 2002 feature by S. Cherry titled “Weaving a web of ideas: Engines that search for meaning rather than words will make the Web more manageable.”

The January 2002 issue of the Library Association Record had an article by A. Gilchrist titled “From Aristotle to the 'Semantic Web'.”

The March-April 2001 issue of IEEE Intelligent Systems was a special issue on the Semantic Web. (The issue is available for download as a free trial.)

The October 2002 issue of ERCIM News (published by the European Research Consortium for Informatics and Mathematics) was a special issue on the Semantic Web.

The February 2002 issue of Physics Today had an article by I. Foster titled “The Grid: A new infrastructure for 21st century science.”

The Globus Project makes available (in PDF format) an article by I. Foster, C. Kesselman, and S. Tuecke titled “The anatomy of the Grid: Enabling scalable virtual organizations.”

First Monday is an online peer-reviewed journal of articles about the Internet.

Numbered Hypernotes

1. The World Wide Web and its history. Webopedia provides a definition of the World Wide Web (WWW) and an entry on the difference between the Internet and the WWW. CERN offers a presentation on the WWW and its history. The Living Internet, a Web resource about the features of the Internet, includes a resource page on the World Wide Web. R. Griffiths's Internet history course includes a presentation on the Web. W3C provides a page about the WWW and makes available a commemorative lecture by T. Berners-Lee titled “The World Wide Web - Past present and future.” Hobbes' Internet Timeline is maintained by R. Zakon. A history of the Internet and WWW is provided by G. Gromov. The Internet Society provides a collection of links to resources on Internet history. OCLC's Web Characterization Project provides usage statistics, publications, and links. The UCLA Center for Communication Policy is conducting a research project on the Internet; the 2001 report Surveying the Digital Future: Year Two is available in PDF format. M. Dodge, Centre for Advanced Spatial Analysis, University College London, makes available an Atlas of Cyberspaces, a collection of maps and graphic representations of the Internet and the World-Wide Web, as well as the Geography of Cyberspace Directory. CyberAtlas provides news about Web usage and demographics. The Complete Guide to Internet Statistics and Research provides links to Internet resources on usage and demographics. Internet History and WWW History: Internet Resources are provided by J. Vissing Laursen. The papers presented at the Eleventh International World Wide Web Conference (Honolulu, 7-11 May 2002) are available on the Web.

2. Searching the Web. R. Griffiths's Internet history course includes a section on search engines. D. Sullivan's Search Engine Watch makes available news, reviews, tips, and other resources related to Web search engines; an introduction to how search engines work is provided, as are lists of medical search engines, computer search engines, and science search engines. The University of California, Berkeley, Library offers a tutorial on finding information on the Internet. Searching the World Wide Web is a tutorial provided by the Tilburg University Library, Netherlands. The October 2001 issue of The Searcher had an article by G. Price titled “Web search engines FAQs: Questions, answers, and issues”; the October 2002 issue had an article by G. Price titled “Specialized search engine FAQs: More questions, answers and issues.” The 1998 7th International WWW Conference had a presentation by Google developers S. Brin and L. Page titled “The anatomy of a large-scale hypertextual Web search engine.”

3. The Semantic Web. An introduction to the Semantic Web is provided by S. Palmer. O'Reilly's XML.com offers a November 2000 primer on the Semantic Web by E. Dumbill. W3C provides a resource page on the Semantic Web; a slide presentation on the Semantic Web by T. Berners-Lee and the Webcast of it are among the presentations available. The Semantic Web Research Group (SWAP) at the Maryland Information and Network Dynamics Laboratory (MIND Lab), University of Maryland, makes available tools for download and a collection of papers, which includes an October 2002 meeting paper (in Word format) by J. Golbeck et al. titled “New tools for the Semantic Web.” J. Heflin, Department of Computer Science and Engineering, Lehigh University, provides links to readings for a course on the Semantic Web and also makes available (in PDF format) his 2001 PhD thesis Towards the Semantic Web: Knowledge Representation in a Dynamic, Distributed Environment. The June 2002 issue of New Architect had an article by U. Ogbuji titled “The languages of the Semantic Web.” The May 2002 issue of D-Lib Magazine had an article by R. Heery and H. Wagner titled “A metadata registry for the Semantic Web.”

4. Agent-based computing. What is an agent is a section of the FAQ provided by AgentLink, the European Commission's network for agent-based computing. AgentWeb is a resource provided by the Laboratory for Advanced Information Technology, University of Maryland, Baltimore County. The Intelligence, Agents, Multimedia Group at the Department of Electronics and Computer Science, University of Southampton, UK, provides an overview (in PDF format) of agent-based computing. Nature makes available an 11 March 1999 article by J. Hendler titled “Is there an intelligent agent in your future?” P. Maes, Software Agents Group at MIT Media Lab, offers a software agents tutorial. Agents Portal, an information resource about agent and multi-agent technology, is provided by S. Mellouli, Department of Computer Science and Software Engineering, University of Laval.

5. Resource Description Framework (RDF). W3C provides a resource page on RDF; among the introductions and overviews offered is a primer titled “Getting into RDF & Semantic Web using N3.” O'Reilly's XML.com offers an introduction to RDF by T. Bray and an article by E. Dumbill titled “Putting RDF to work.” D. Beckett, Institute for Learning and Research Technology, University of Bristol, UK, maintains an RDF Resource Guide. The May 1998 issue of D-Lib Magazine had an article by E. Miller titled “An introduction to the Resource Description Framework.”

6. An introduction to HTML and its history is provided by the HTML Source Web site. A HyperText Markup Language (HTML) Home Page is provided by W3C.

7. Extensible Markup Language (XML). What is XML is the first section of P. Flynn's XML FAQ. An XML Home Page is provided by W3C. An XML tutorial and XML links are provided by XML Files.com. O'Reilly's XML.com provides a technical introduction to XML and an XML resource guide. OASIS's Cover Pages offer an XML resources page; an information page about XML and the Semantic Web is also provided. A tutorial presentation titled “The XML revolution: Technologies for the future Web” is made available by A. Møller, Department of Computer Science, University of Aarhus, Denmark. T. Brooks, Information School, University of Washington, offers lecture notes on XML for a course on information needs, searching, and presentation. MINDSWAP makes available (in PDF format) an article by J. Hendler and B. Parsia titled “XML and the Semantic Web.” The May 1999 issue of Scientific American had an article by J. Bosak and T. Bray titled “XML and the second-generation Web.” The proceedings of the 2001-2002 Extreme Markup Languages Conferences are made available by IDEAlliance.

8. The Metathesaurus was developed by the Enterprise Vocabulary Service of the NCI Center for Bioinformatics.

9. Web ontologies. SemanticWeb.org provides an introduction to ontologies. Introduction to Ontologies on the Semantic Web is a resource page offered by J. Hendler. A guided tour of ontology is provided by J. Sowa. H. Kautz, Department of Computer Science and Engineering, University of Washington, provides lecture notes on ontologies for a course on artificial intelligence. O'Reilly's XML.com makes available a 6 November 2002 article by M. Denny titled “Ontology building: A survey of editing tools.” OntoWeb and OntoWeb Edu provide resources related to Web ontologies. S. Staab, Knowledge Management Group, Institute AIFB, Karlsruhe University, Germany, makes available a collection of papers on the standardization of the Web ontology language.

10. The OWL Web ontology language. W3C's Web Ontology Working Group provides a guide to OWL and other resources. A meeting presentation by J. Hendler titled “OWL: A Web ontology language” is made available by W3C. OASIS's Cover Pages offer an information page about the OWL Web Ontology Language.

11. PubMed is a service of the National Library of Medicine (NLM). PubMed Central is the NLM's digital archive of life sciences journal literature.

12. The National Virtual Observatory. The U.S. National Virtual Observatory (NVO) Web site provides an introduction to the project; a February 2002 article from Physics Today titled “Astronomers envision linking world data archives” is made available. The 21 September 2001 issue of Science had a Viewpoint article by A. Szalay and J. Gray titled “The world-wide telescope.” The Astronomical Data Analysis Software and Systems (ADASS) Conferences Web site makes available a November 2000 paper by A. Szalay (from a November 2002 conference) titled “The National Virtual Observatory.” The NVO Science Definition Team Web site provides publications and links about the scientific objectives of the project. NPACI & SDSC Online had a 31 October 2001 article about the NVO.

13. The MONET project is an investigation into mathematical Web services funded by the European Commission.

14. The UK e-Science initiative. The Research Councils UK Web site provides information about the e-Science core program. The National e-Science Centre in Edinburgh defines e-Science and provides publications and other resources; links to the regional e-Science centers are provided. BBC News had a 25 April 2002 article titled “Computing power brought online” and a 22 July 2002 article titled “Computer experts back the 'Grid'.” ZDNet UK offers a 26 April 2002 article titled “UK e-Science centre pushes grid computing.” The UK Particle Physics and Astronomy Research Council Web site offers a resource page on e-Science and the Grid.

15. Grid computing. The UK e-Science GRID Web site provides a Grid overview. Grid Today provides news for the Grid computing community; the 9 December 2002 issue had an article by A. Grimshaw titled “What is a Grid?” The Globus Project makes available a FAQ about the Grid. The Grid Computing Info Centre is maintained by R. Buyya, Department of Computer Science and Software Engineering, University of Melbourne, Australia. A resource on Grid computing (at IEEE Distributed Systems Online) is maintained by M. Baker, Distributed Systems Group, University of Portsmouth, UK. The Global Grid Forum is a community-initiated forum of individual researchers and practitioners working on Grid technologies. A Grid computing primer is provided by the Grid Research Integration Development and Support (GRIDS) Center Web site. I. Foster, Distributed Systems Lab, Argonne National Laboratory, provides a list of major Grid projects. The 17 August 2001 issue of Science had a News of the Week article by J. Mervis titled “NSF launches TeraGrid for academic research.” The vol. 2, no. 9, 2002 issue of Web Services Journal had an article by D. Hamstra titled “Grid computing: Electrifying Web services.” InfoWorld had a 3 August 2001 article by T. Sullivan and E. Scannell titled “Plugging into the Global Grid.”

16. NSF's proposed CyberInfrastructure. The NSF Directorate for Computer and Information Science and Engineering provides information about the Advisory Committee for Cyberinfrastructure. An April 2002 draft of the NSF Cyberinfrastructure Panel report is made available (in PDF format) on the Web site of a November 2002 workshop titled “Cyberinfrastructure for environmental research and education.” GridComputingPlanet.com had a 19 November 2002 article by P. Shread titled “Cyberinfrastructure will fuel scientific discovery, NSF chief says” about R. Colwell's presentation titled “Computing: Getting us on the path to wisdom.”

17. An introduction to the Biomedical Informatics Research Network (BIRN) is provided by NIH's National Center for Research Resources. The BIRN Web site provides links to news and Web sites of interest. The BIRN Coordinating Center at the National Biomedical Computation Resource, University of California, San Diego, provides an introduction to BIRN and project descriptions.

18. Open source. An extended definition of open source is provided by WhatIs.com. R. Wyllys provides lecture notes on the open-source movement for a course on information technologies and the information professions. The October 2001 issue of Information Today had an article by R. Poynder titled “The open source movement.” The Open Source Initiative Web site provides an open source definition and a FAQ, as well as case studies and press coverage. The Center for the Public Domain, a philanthropic foundation based in Durham, NC, provides information and presentations on the public domain and intellectual property. Open Sources: Voices from the Open Source Revolution is a 1999 book made available online by O'Reilly's Open Source Web site. E. Roberts, Computer Science Department, Stanford University, makes available a student project on the open source movement prepared for a course on computers, ethics, and social responsibility. The Globus Project provides a statement about its commitment to open source.

19. J. Hendler is in the Department of Computer Science, University of Maryland.
