Microarrays--Guilt by Association

Science  10 Oct 2003:
Vol. 302, Issue 5643, pp. 240-241
DOI: 10.1126/science.1090887

From their inception, DNA microarrays (1, 2) have been touted as having the potential to shed light on cellular processes by identifying groups of genes that appear to be coexpressed (3). Although promising, this “guilt by association” approach has fallen into disfavor because of its many perceived limitations. On page 249 of this issue, Stuart et al. (4) weigh in with their study of the coexpression of groups of genes across species, which suggests that there may be merit to the guilt-by-association approach provided it is applied in an evolutionary context.

Most microarray experiments focus on identifying patterns of gene expression in a particular system, for example by comparing tumor tissue with normal tissue, analyzing gene expression responses to a stress stimulus, or comparing expression patterns in a particular tissue over time. Conceptually, differences between the various states should be reflected in changes in particular cellular pathways. Microarray data should allow users to identify the products of new genes and their contributions to these pathways. It is usually possible to identify cellular processes where most of the genes associated with a particular biological function are up- or down-regulated in a similar way. However, some genes known to be involved in a particular pathway invariably are missed, whereas other apparently unrelated genes exhibit expression profiles that are strikingly similar to bona fide pathway components. There is no consensus about how to interpret the gene expression patterns of hypothetical genes, genes of unknown function, or transcripts identified only by expressed sequence tags.

This perceived failure of microarrays has led some to portray the technique as “noisy” or “unreliable.” Yet increasingly, when results from microarray studies are subjected to independent validation using other techniques (such as quantitative reverse transcription polymerase chain reaction), the confirmation rate—at least at the level of recapitulating the observed gene expression pattern if not the absolute magnitude—is well over 90%. So why aren't pathways emerging from microarray data that reveal groups of coexpressed genes?

One potential answer lies in the observation that many microarray studies fail to sufficiently sample the biological variability within a system. Increasingly persuasive arguments from statisticians (5-7), combined with improvements in the underlying protocols and technology used to collect expression data, have led to more sophisticated experimental designs encompassing increasingly broad surveys of diversity within a system. As a result, genes whose patterns of expression are identified as being statistically significant can be assigned a greater degree of confidence. Further, the validation rate, even among independent biological samples, is greater than in more naïve experimental designs. But has this resulted in the identification of new pathways? Not surprisingly, the answer is no. It seems that statistical significance is not always identical to biological significance.

Stuart et al. attempt to address this problem by taking the idea of sampling biological variability one step further. They devised a computational method to look for correlated patterns of gene expression in more than 3182 DNA microarrays of tissues from humans, fruit flies, worms, and yeast. They postulated that genes with conserved biological functions in different species (orthologs) would be likely to retain similar patterns of expression while other associations occurring by chance would be filtered out by analysis of such a large multispecies data set. Stuart and colleagues predicted that conserved functions of groups of genes should be reflected in similar patterns of gene expression among yeast, worms, fruit flies, and humans.
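The intuition that chance associations are filtered out by a multispecies comparison can be made concrete with a little arithmetic. The sketch below is only an illustration, not a calculation from Stuart et al.: it assumes that a spurious coexpression between two genes arises independently in each species with the same probability, and both the function name `prob_spurious_in_at_least` and the value of `P_SINGLE` are hypothetical choices made for the example.

```python
from math import comb


def prob_spurious_in_at_least(k, n_species, p_single):
    """Probability that a chance association appears in at least k of
    n_species, assuming it arises independently in each species with
    probability p_single (a simplifying assumption)."""
    return sum(
        comb(n_species, i) * p_single**i * (1 - p_single) ** (n_species - i)
        for i in range(k, n_species + 1)
    )


P_SINGLE = 0.05  # hypothetical per-species rate of chance associations
N_SPECIES = 4    # humans, fruit flies, worms, and yeast

# Print the probability for each required level of conservation.
for k in range(1, N_SPECIES + 1):
    print(k, prob_spurious_in_at_least(k, N_SPECIES, P_SINGLE))
```

Under these assumptions, a chance association would turn up in at least one of the four species nearly one time in five, but in all four only a few times in a million. Real expression data sets are unlikely to be fully independent, so the true reduction is presumably smaller.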

The starting point for their analysis was the use of sequence homology to identify likely orthologs—genes that have retained their functions through evolutionary history. For example, Stuart and co-workers grouped together the human gene Psmd4, the worm gene rpn-10, the fruit fly gene Pros54, and the yeast gene Rpn10, all of which encode a non-ATPase subunit of the 19S proteasome cap. They used the expression values for these “metagene” groups taken from the entire data set to construct gene expression vectors. The authors then calculated pairwise Pearson correlation coefficients between these expression vectors. The resulting correlation matrix served as the basis for constructing an interaction network of metagenes, subnets of which might be associated with particular biological functions. The results are both surprising and encouraging.
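The correlation step described above can be sketched in a few lines. This is a minimal illustration of computing pairwise Pearson correlations and turning them into a network. It is not the authors' implementation. The metagene names, the expression values, and the cutoff `THRESHOLD` are all invented for the example, and each metagene is assumed to be represented by a single expression vector.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Toy expression vectors, one per hypothetical metagene (invented values).
metagenes = {
    "MEG_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "MEG_B": [2.1, 3.9, 6.2, 8.0, 9.8],
    "MEG_C": [5.0, 3.0, 4.0, 1.0, 2.0],
}

THRESHOLD = 0.9  # arbitrary cutoff, chosen only for illustration


def build_network(vectors, threshold):
    """Return {(name_a, name_b): r} for every pair with r >= threshold."""
    edges = {}
    for a, b in combinations(sorted(vectors), 2):
        r = pearson(vectors[a], vectors[b])
        if r >= threshold:
            edges[(a, b)] = r
    return edges


network = build_network(metagenes, THRESHOLD)
print(network)
```

In this toy data only `MEG_A` and `MEG_B` rise and fall together, so they are the only pair joined by an edge. Connected groups of such edges correspond to the subnets that might be associated with a particular biological function.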

From the data, the investigators identified metagenes not previously associated with the cell cycle or proliferation. Yet these new metagenes exhibited significant association with these cellular processes. Such metagenes included several encoding proteins of unknown function as well as one encoding a small nuclear ribonucleoprotein (snRNP) involved in splicing and a second encoding a protein that interacts with nucleoporin. To test the hypothesis that these genes are in fact involved in the cell cycle, Stuart et al. first looked at expression of these metagenes in a previously published study comparing human pancreatic tumors with normal pancreatic tissue (8). As might be expected, the data showed that these genes were markedly up-regulated in rapidly dividing cancer cells compared with normal cells. Although this harks back to the guilt-by-association approach, the prediction of the association from an independent data set suggests that there might be real biology here. To lend further support to the involvement of these genes in the cell cycle and proliferation, the authors performed an RNA interference experiment in the worm to down-regulate the worm gene ZK652.1, which encodes the snRNP splicing protein. They analyzed the resulting loss-of-function phenotype and discovered that the germline cells of treated worms contained extra nuclei. This suggested that the product of the ZK652.1 gene normally suppresses proliferation of germline cells in the worm.

So have Stuart et al. provided evidence to vindicate the guilt-by-association strategy for analyzing microarrays? Not completely. Moving from associations to pathways will require additional work. And it remains unclear how this strategy can be applied to genes that are limited to a less evolutionarily diverse set of species. Yet this work represents an important conceptual step forward, extending comparative genomics to functional phylogenomics.


