Special Viewpoints

Informatics and Quantitative Analysis in Biological Imaging


Science  04 Apr 2003:
Vol. 300, Issue 5616, pp. 100-102
DOI: 10.1126/science.1082602

Abstract

Biological imaging is now a quantitative technique for probing cellular structure and dynamics and is increasingly used for cell-based screens. However, the bioinformatics tools required for hypothesis-driven analysis of digital images are still immature. We are developing the Open Microscopy Environment (OME) as an informatics solution for the storage and analysis of optical microscope image data. OME aims to automate image analysis, modeling, and mining of large sets of images and specifies a flexible data model, a relational database, and an XML-encoded file standard that is usable by potentially any software tool. With this design, OME provides a first step toward biological image informatics.

Recent excitement in optical microscopy centers on the extraction of quantitative numerical information from digital images to generate and test specific scientific hypotheses. For example, combining computer vision and speckle microscopy makes it possible to test specific mechanistic models of actin flow during cell movement (1). The potential for automation in digital imaging is also driving interest in the use of microscopy for large-scale “screening by imaging,” in which cells or organisms are treated with libraries of small molecules, banks of small inhibitory RNAs, etc., to identify chemicals or genes that affect a particular biological process by virtue of a change in cellular behavior or appearance (2, 3) (Fig. 1). However, the routine application of automated image analysis and large-scale screening is held back by substantial limitations in the software used to store, process, and analyze the large volumes of information generated by digital imaging. It is possible to interpret images only if we know the context in which they were acquired. Current software for microscopy automates image acquisition and provides hardware and software solutions for three-dimensional (3D) imaging (using deconvolution and confocal and other methods) but does not keep track of image and analytical data in a rigorous way. It is usually possible to specify file name, date, and experimenter, but few packages systematically record the identities of the genes being studied, the labels used, etc. (4). Interoperability between different software systems involves the exchange of TIFF files, which preserve none of the contextual information. In this Viewpoint, we describe the conceptual challenges faced by image informatics as applied to biological microscopy and describe some of the solutions incorporated in an open-source image informatics system currently under development in our laboratories: the Open Microscopy Environment (OME) (5).

Figure 1

Applications for quantitative imaging. The image shows an XlK2 cell during the process of cytokinesis stained for DNA (blue), microtubules (green), and the aurora-B protein kinase (red) (13). Although the image demonstrates the relative localization of different cellular components and structures, quantitative analysis reveals specific characteristics that can be used to assay the effects of inhibitors or expressed proteins. For example, integrating the signal from a DNA-specific fluorophore might reveal defects in chromosome segregation during mitosis. Measuring the overlap of microtubules and aurora-B [for example, using a cross-correlation analysis (14)] within a subregion of a dividing cell might be used to assess effectors of cytokinesis. The image is displayed within the Image Viewer that ships with OME. The Viewer includes support for displaying multidimensional image data (top left) and some of the associated metadata about each image (bottom right).

The primary goal of OME is to enable the automatic analysis, modeling, and mining of large image sets with reference to specific biological hypotheses. OME aims to manage images from all optical microscopes, including confocal, wide-field, and multiphoton systems, but other image types (such as computed tomography scans) are not necessarily supported. OME also aims to store, without loss or degradation, primary image data and the metadata that specify the context and meaning of an image. Some metadata are devoted to describing the optics of the microscope, some to the experimental setup and sample, and some to information derived by analysis. Finally, OME aims to provide a flexible mechanism for incorporating new and existing image analysis routines and storing the output of those routines in a self-consistent and accessible manner.

The OME Data Model and Database

The OME data model is a formal description of the structure, meaning, and behavior of data stored and manipulated by the system, and is instantiated via both a database and a file format (Fig. 2). The OME data model has three parts: binary image data, data type semantics for managing modular image analysis, and image metadata definitions for recording contextual information. Image data in OME are stored as time-lapse, 3D, multispectral files (“5D images”) (6, 7). Data type semantics for OME are designed to allow analytic modules to be strung together in a flexible and simple fashion and are described in detail below. Image metadata describe the optics of the microscope, the filter sets, the objective lens, etc. We hope that microscope manufacturers will agree (through OME or other projects) on a common format for metadata describing microscope hardware and image acquisition. Image metadata also describe the experimental setup, including genes under study, fluorophores, etc. Whenever possible, OME metadata definitions derive from preexisting ontologies such as the Medical Subject Headings (MeSH) and the Microarray Gene Expression Data Society (MGED) and those being developed by the Minimal Information About a Microarray Experiment (MIAME) effort (8).
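To make the idea of a “5D image” with attached metadata concrete, the following is a minimal Python sketch, not the OME implementation. The class name `Image5D`, the dimension order (x, y, z, channel, time), the flat pixel layout, and the metadata keys are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Image5D:
    """Hypothetical 5D image: x, y, z, channel (wavelength), and time point."""
    size_x: int
    size_y: int
    size_z: int
    size_c: int
    size_t: int
    # Contextual metadata (optics, sample, experiment); keys are illustrative.
    metadata: dict = field(default_factory=dict)
    pixels: list = field(init=False)

    def __post_init__(self):
        total = self.size_x * self.size_y * self.size_z * self.size_c * self.size_t
        self.pixels = [0] * total  # flat storage, x varies fastest

    def _index(self, x, y, z, c, t):
        sizes = (self.size_x, self.size_y, self.size_z, self.size_c, self.size_t)
        for value, size in zip((x, y, z, c, t), sizes):
            if not 0 <= value < size:
                raise IndexError("coordinate out of range")
        # Assumed ordering: x, then y, then z, then channel, then time.
        return x + self.size_x * (y + self.size_y * (z + self.size_z * (c + self.size_c * t)))

    def get(self, x, y, z, c, t):
        return self.pixels[self._index(x, y, z, c, t)]

    def set(self, x, y, z, c, t, value):
        self.pixels[self._index(x, y, z, c, t)] = value


image = Image5D(4, 3, 2, 2, 5, metadata={"objective": "60x", "fluorophore": "GFP"})
image.set(1, 2, 1, 1, 4, 255)
```

Keeping the metadata alongside the pixels, rather than in a file name or a separate note, is the point of the sketch: any consumer of the image can recover the acquisition context together with the data.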

Figure 2

The database is the interface: OME architecture. OME is constructed as a standard three-tier application with a relational database that stores information in a table-based structure (blue table), an application server that processes data, and a client that lives on the user's desktop and communicates via the Internet (that is, via IP). Multiple clients can communicate with OME, including Web browsers, commercial microscopy software, and data-mining applications. The OME data model is instantiated via a relational database (“OME database”; blue) in which metadata are stored in tables as specified by the schema, and binary image data are stored in a trusted file system (the “image repository”; red). When data are transported between databases or stored in a flat file, metadata and image data in the database are translated into XML (“OME XML File”; green). The OME database communicates with analysis modules via a subsystem (“analysis subsystem”) that ensures the consistent treatment of semantic data types. The analysis modules also calculate and store the history of the analysis chain (see supporting online material). When analysis modules are chained together, each communicates independently with the database (“actual data path”; yellow block), even though the conceptual path is from one module to the next (“conceptual data path”). Existing commercial or independent software tools can read OME data without substantial modification. OME itself is open-source and available through a Lesser General Public License, but applications that talk to it can be either open or proprietary.

OME is designed to connect a desktop computer to an Oracle or PostgreSQL relational database using a standard client-server paradigm (Fig. 2; blue cylinder). The relational structure of OME makes it easy to access images on the basis of content and meaning: “Find all images of HeLa cells recorded by Jason in 2002.” Queries of this type are accomplished via an application layer comprising import and export routines, interfaces for analytic and visualization tools, and ancillary software. As an aid to performance, binary image data are stored in a file system (a repository) accessible only to OME (Fig. 2; red box). Images from commercial file formats are imported into OME using a translator that reads the image data and converts them into a multidimensional image repository format. Any metadata stored with the image (usually in a “header” that precedes the pixel data) are extracted from an input file and stored in the appropriate database tables (Fig. 2; blue table). The net result is the conversion of a polyglot of commercial file formats into a single database representation. The OME file format is used when image data and metadata must be translated into a file for transport between OME databases or for storage outside of a database. In OME files, each piece of data is associated with a tag (such as <filter_wavelength>) that defines its meaning in extensible markup language (XML), providing a vendor-neutral file format that conforms to public Web-compliant standards (Fig. 2; green box). We anticipate that commercial software tools will eventually be able to interact with the OME database as clients or possibly even directly read and write OME files.
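A query such as “Find all images of HeLa cells recorded by Jason in 2002” might look roughly like the sketch below. This is an illustration only: it uses an in-memory SQLite database as a stand-in for Oracle or PostgreSQL, and the table names, column names, and sample rows are hypothetical rather than taken from the OME schema.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE experimenters (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE images (
            id              INTEGER PRIMARY KEY,
            name            TEXT NOT NULL,
            cell_line       TEXT,
            acquired        TEXT,  -- ISO 8601 date, e.g. 2002-03-14
            experimenter_id INTEGER REFERENCES experimenters(id)
        );
    """)
    conn.executemany("INSERT INTO experimenters VALUES (?, ?)",
                     [(1, "Jason"), (2, "Alex")])
    conn.executemany("INSERT INTO images VALUES (?, ?, ?, ?, ?)", [
        (1, "hela_mitosis_01", "HeLa", "2002-03-14", 1),
        (2, "hela_mitosis_02", "HeLa", "2003-01-09", 1),
        (3, "other_cells_01", "other", "2002-06-02", 1),
        (4, "hela_control_01", "HeLa", "2002-11-20", 2),
    ])
    return conn


def find_images(conn, cell_line, experimenter, year):
    """Return names of images matching cell line, experimenter, and year."""
    rows = conn.execute(
        """
        SELECT i.name
        FROM images AS i
        JOIN experimenters AS e ON e.id = i.experimenter_id
        WHERE i.cell_line = ?
          AND e.name = ?
          AND substr(i.acquired, 1, 4) = ?
        ORDER BY i.name
        """,
        (cell_line, experimenter, str(year)),
    ).fetchall()
    return [row[0] for row in rows]


matches = find_images(build_demo_db(), "HeLa", "Jason", 2002)
```

Because the sample, the experimenter, and the acquisition date are stored as structured fields rather than embedded in file names, the question reduces to a single join with three conditions.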

Data Semantics for Image Analysis

Searchable image archives are useful, but what we really require are informatics systems that can extract and store quantitative information derived from images. Typically, image analysis involves several processing steps, but the precise steps and their sequence necessarily depend on the properties of the image and on scientific goals. We therefore require an extensible toolbox of algorithms (including fourth-generation languages such as MATLAB) that can be applied in different combinations to different images. Consider the problem of tracking labeled vesicles in a time-lapse movie. A segmentation algorithm finds the vesicles and produces a list of centroids, volumes, signal intensities, etc.; a tracker then defines trajectories by linking centroids at different time points according to a predetermined set of rules; and finally a viewer displays the analytic results overlaid on the original movie. As designers of OME, we cannot anticipate exactly which tools work best for vesicle tracking. Instead, we must build general mechanisms for linking an analysis toolbox to images and storing analysis results.
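As one illustration of the tracking step, the sketch below links centroids across time points with a greedy nearest-neighbor rule and a maximum allowed displacement. This particular rule is an assumption chosen for simplicity; it is only one of many possible “predetermined sets of rules,” and the function name `link_centroids` and the input layout are hypothetical.

```python
import math


def link_centroids(frames, max_distance):
    """Link 2D centroids into trajectories.

    frames: list (one entry per time point) of lists of (x, y) centroids.
    Returns a list of trajectories, each a list of (time_index, (x, y)).
    A trajectory ends when no unclaimed centroid lies within max_distance.
    """
    trajectories = []
    active = []  # indices of trajectories extended at the previous time point
    for t, centroids in enumerate(frames):
        unclaimed = set(range(len(centroids)))
        next_active = []
        for traj_index in active:
            last_position = trajectories[traj_index][-1][1]
            best, best_distance = None, None
            for i in sorted(unclaimed):
                d = math.dist(last_position, centroids[i])
                if d <= max_distance and (best is None or d < best_distance):
                    best, best_distance = i, d
            if best is not None:
                unclaimed.remove(best)
                trajectories[traj_index].append((t, centroids[best]))
                next_active.append(traj_index)
        # Any centroid not claimed by an existing trajectory starts a new one.
        for i in sorted(unclaimed):
            trajectories.append([(t, centroids[i])])
            next_active.append(len(trajectories) - 1)
        active = next_active
    return trajectories


frames = [
    [(0, 0), (10, 10)],
    [(1, 0), (10, 11)],
    [(2, 1)],
]
tracks = link_centroids(frames, max_distance=2)
```

A greedy rule like this can mislink vesicles that pass close to one another, which is exactly why the choice of tracker is best left to a replaceable module rather than fixed by the system.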

Although we typically think of databases as storage systems, databases are also an ideal mechanism for linking independent pieces of software together in a modular fashion. Conceptually, the data path in OME is from one analytical module to the next (Fig. 2; dashed box), but in practice, each module communicates independently with the database. The advantage of this architecture is that the problem of building links between analysis modules written in different computer languages is simplified to the task of linking each module independently to the database using known methods. For this to work, however, the output of one module must match the input of the next module. The OME data model therefore includes a set of semantic data types that describe analytic results such as “centroid,” “trajectory,” “maximum signal,” etc. Semantic typing defines the types of relationships that a data type can participate in, and thus determines which analytic modules can use the data as inputs and outputs. The process of data analysis, however, is tightly tied to prior knowledge of the biological system, the experiment, and the properties of the analytic routines, so it is simply not possible to create a standards body that will rule on which definitions of “centroid” are valid and which are not. A database can solve this problem by linking each result to an operational record of the data-processing steps that produced it, including the algorithm used and the states of any settings or variables. Thus, semantic data types such as “centroid” can be defined broadly and then given specific meaning by the recorded history of their derivation. In this way, we can judge each result in light of the methods that generated it and determine the accuracy of measurements a posteriori, given the known operation of the analytic algorithms and characteristics of the data.
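The pattern described here, in which modules exchange data only through the database and every result carries a record of how it was produced, can be sketched as follows. This is a simplified illustration and not the OME analysis subsystem: the single `results` table, the JSON encoding, and the functions `run_module`, `find_spots`, and `chain_centroids` are all hypothetical, and the “segmentation” step returns hard-coded values.

```python
import json
import sqlite3


def open_store():
    """In-memory stand-in for the database shared by all analysis modules."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE results (
            id            INTEGER PRIMARY KEY,
            semantic_type TEXT NOT NULL,  -- e.g. 'centroid', 'trajectory'
            value         TEXT NOT NULL,  -- JSON-encoded result
            module        TEXT NOT NULL,  -- which module produced it
            settings      TEXT NOT NULL   -- JSON-encoded module settings
        )
    """)
    return conn


def run_module(conn, name, input_type, output_type, func, settings):
    """Read inputs by semantic type, run func, store outputs with provenance."""
    inputs = []
    if input_type is not None:
        rows = conn.execute(
            "SELECT value FROM results WHERE semantic_type = ? ORDER BY id",
            (input_type,),
        ).fetchall()
        if not rows:
            raise LookupError(f"no stored data of type {input_type!r}")
        inputs = [json.loads(row[0]) for row in rows]
    outputs = func(inputs, **settings)
    conn.executemany(
        "INSERT INTO results (semantic_type, value, module, settings) "
        "VALUES (?, ?, ?, ?)",
        [(output_type, json.dumps(v), name, json.dumps(settings, sort_keys=True))
         for v in outputs],
    )
    return len(outputs)


def find_spots(_inputs, threshold):
    """Placeholder segmentation: keeps hard-coded spots above a threshold."""
    spots = [{"x": 1.0, "y": 2.0, "signal": 50}, {"x": 4.0, "y": 6.0, "signal": 5}]
    return [{"x": s["x"], "y": s["y"]} for s in spots if s["signal"] >= threshold]


def chain_centroids(inputs):
    """Placeholder tracker: one trajectory through all stored centroids."""
    return [inputs]


store = open_store()
run_module(store, "find_spots", None, "centroid", find_spots, {"threshold": 10})
run_module(store, "chain_centroids", "centroid", "trajectory", chain_centroids, {})
```

The two modules never call each other. The second one finds its input by asking the store for data of type “centroid,” and the stored module name and settings are what give each “centroid” its specific meaning after the fact.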

A final challenge for OME is providing a mechanism for adding new analysis modules. In some cases, the inputs and outputs of the new module correspond to existing OME semantic data types (supporting online text). A more complex situation arises if OME lacks the necessary data types for a new module to interact with the system. In this case, the database must be augmented with tables to store the new data and present the changes to the user interface. This is a challenging problem in database design and represents a type of extensibility that is absolutely critical, but usually absent, in most bioinformatics software. Our solution is to specify the data requirements (the inputs and outputs) of an analytical module in an XML description written to the OME XML specification. OME is designed to then create the necessary tables on the fly. The net result is an analytic system that is extensible, modular, and language-independent.
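The idea of creating tables on the fly from an XML description of a module can be sketched as below. The XML vocabulary shown (`module`, `output`, `field`, and the `sql` attribute) is invented for this example and is not the OME XML specification. SQLite again stands in for the production database, and the `image_id` column is an assumed link back to the analyzed image.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical module description; element and attribute names are assumptions.
MODULE_XML = """
<module name="spot_finder">
  <output type="centroid">
    <field name="x" sql="REAL"/>
    <field name="y" sql="REAL"/>
    <field name="z" sql="REAL"/>
  </output>
  <output type="maximum_signal">
    <field name="value" sql="REAL"/>
  </output>
</module>
"""

ALLOWED_SQL_TYPES = {"INTEGER", "REAL", "TEXT"}
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _identifier(name):
    """Reject anything that is not a plain SQL identifier before using it in DDL."""
    if name is None or not IDENTIFIER.fullmatch(name):
        raise ValueError(f"invalid identifier: {name!r}")
    return name


def ensure_tables(conn, xml_text):
    """Create one table per declared output type; return the table names."""
    root = ET.fromstring(xml_text)
    tables = []
    for output in root.findall("output"):
        table = _identifier(output.get("type"))
        columns = ["id INTEGER PRIMARY KEY", "image_id INTEGER NOT NULL"]
        for field_element in output.findall("field"):
            sql_type = field_element.get("sql")
            if sql_type not in ALLOWED_SQL_TYPES:
                raise ValueError(f"unsupported column type: {sql_type!r}")
            columns.append(f"{_identifier(field_element.get('name'))} {sql_type}")
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(columns)})")
        tables.append(table)
    return tables


connection = sqlite3.connect(":memory:")
created = ensure_tables(connection, MODULE_XML)
```

Because table and column names come from the XML and cannot be passed as query parameters, the sketch validates them against a strict pattern before building the `CREATE TABLE` statement. A real system would need comparable care, plus a way to expose the new types in the user interface.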

Image Informatics in Practice

The OME system is being developed as an open-source collaboration between academic labs and commercial hardware and software imaging companies (9). OMEv1.0 (10) demonstrated the utility of general-purpose image informatics software (11). A tutorial demonstrating this system is available (12). A system with features necessary for general use is being developed as OMEv2.0 for release late in 2003 (10). Our software is open-source, but commercial code plays a vital role in modern digital microscopy and image analysis. OME is therefore designed to integrate effectively with commercial code. Like DNA and protein sequence analysis, biological imaging must develop features of an information science to meet the demands of screening and hypothesis-driven analysis. Macromolecular and imaging bioinformatics have many things in common, but the complexity and unstructured nature of biological images present a unique set of challenges in data analysis and interpretation. We are confident that if commercial and academic microscopists can solve the informatic problems that currently make quantitative analysis of microscope images difficult, quantitative image analysis will assume an important position in the future of bioinformatics.

Supporting Online Material

www.sciencemag.org/cgi/content/full/300/5616/100/DC1

SOM Text

Fig. S1

References

