Of Chips and ChIPs

Science  26 Apr 2002:
Vol. 296, Issue 5568, pp. 666-669
DOI: 10.1126/science.1062936

Expression profiling using cDNA or oligonucleotide microarrays allows a global description of all the genes expressed in a cell in response to specific signals or at different stages of development [reviewed in (1, 2)]. Hierarchical clustering methods then allow the allocation of genes, coregulated in time or in response to specific treatments, into expression groupings called regulons (1, 2). A host of recent papers illustrate how data obtained from microarray expression profiling, combined with technologies such as the chromatin immunoprecipitation assay (ChIP), can be harnessed to explore transcriptional regulatory networks in cells. We know that at the simplest level, transcription of genes into messenger RNAs (mRNAs) is governed by transcription factors, which bind to cis-regulatory regions of the DNA in the vicinity of the target gene. There are, however, many more complexities that now can be explored across the whole genome. Do large coregulated groups of genes share cis-regulatory elements that bind to common transcription factors? Are coregulated genes located close to each other along the linear DNA of the chromosome, or are they simply colocalized in the nucleus? Can we efficiently map global binding sites for transcription factors and chromatin proteins along the chromosomes in vivo?

Systematic nonbiased searching of upstream sequences of genes in a regulon for shared protein binding motifs can help to identify common control units. Tavazoie and colleagues (3) have applied this approach to microarray expression profiling data for yeast at different stages of the cell cycle. They discovered not only known protein binding motifs that regulate genes involved in the cell cycle, but also completely new cis-regulatory elements that are the binding sites for transcription factors (4). Many of these motifs were highly selective for the regulon in which they were identified and could also be assigned to groups of genes with related functions within that regulon. A similar approach has recently been applied to the identification of pharyngeal-specific genes in the nematode Caenorhabditis elegans (5). Expression profiling identified a large group of genes specifically expressed in the pharynx, a subset of which were further characterized and shown to contain binding sites for the transcription factor PHA-4. When PHA-4, a forkhead-type transcription factor, is deleted in worms, the pharynx does not form (6). The combination of experimental microarray screening with computational approaches enables the generation of meaningful biological hypotheses that can then be tested experimentally.
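The idea of scanning the upstream sequences of a regulon for shared motifs can be illustrated with a deliberately simple sketch. It ranks fixed-length words (k-mers) by how much more often they occur in the regulon's upstream sequences than in a background set. This is plain word counting chosen for illustration, not the method used in the cited studies; the function names, the toy sequences, and the choice of k = 6 are all assumptions.

```python
from collections import Counter


def kmers_present(seq, k):
    """Return the set of distinct k-mers that occur in one sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def enriched_kmers(regulon_seqs, background_seqs, k=6, top=3):
    """Rank k-mers by the fraction of regulon sequences containing them
    minus the fraction of background sequences containing them."""
    reg = Counter()
    for s in regulon_seqs:
        reg.update(kmers_present(s, k))
    bg = Counter()
    for s in background_seqs:
        bg.update(kmers_present(s, k))
    scores = {
        kmer: reg[kmer] / len(regulon_seqs) - bg[kmer] / len(background_seqs)
        for kmer in reg
    }
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


# Toy (invented) upstream sequences: every regulon member carries ACGCGT.
regulon = ["TTTACGCGTAAA", "GGACGCGTCC", "ACGCGTTTGA"]
background = ["TTTTTTTTTTTT", "GGGGCCCCGGGG", "ATATATATATAT"]
print(enriched_kmers(regulon, background, k=6, top=1))
```

A real analysis would need degenerate motifs, both DNA strands, and a statistical test against genome-wide background frequencies, all of which this sketch omits.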

Dipping into global control.

Expression profiling data obtained from cDNA or oligonucleotide arrays can be used to search for common regulatory elements as well as to generate chromosome correlation maps identifying the chromosomal locations of coregulated genes. Chromatin immunoprecipitation (ChIP) can be combined with microarrays to generate information on genomewide binding patterns for transcription factors (TFs). These types of data, when used together, generate information on active chromatin domains as well as defining global regulatory networks.

Analysis of yeast microarray data has also been used to investigate whether coregulated genes are located close to each other in the yeast genome. Cohen and co-workers, using the yeast cell cycle microarray expression data set, have developed chromosome correlation maps (7). These maps display the patterns of coregulated genes in linear space along the chromosomes. Such chromosome correlation maps may represent for other organisms what were visualized many years ago as chromosome puffs (regions of high gene expression) on the polytene chromosomes of Drosophila salivary glands. Several regions of yeast chromosomes, some as large as 20 to 30 kb, have been found to contain large groups of coregulated genes. One simple explanation for coregulation of these genes is the presence of similar cis-regulatory elements in the promoters of the coregulated genes. This turns out not to be the case, however, for certain groups of coregulated genes in yeast. Areas of the chromosome that contain groups of coregulated genes are likely to represent regions of open chromatin structure. Thus, information on changes in gene transcription through changes in chromatin structure has to be combined with knowledge of common cis-regulatory elements to generate a more global view of gene expression patterns. These findings are perhaps not surprising for those who have studied groups of coregulated and colocated genes such as the globin gene cluster (8) or the cytokine gene cluster on human chromosome 5 (9), but they do emphasize the general nature of clustering not only of functionally related but also of coregulated genes. Such chromosome correlation maps displaying gene expression domains will undoubtedly help us to unravel how active domains in chromatin are generated and altered during development or in response to environmental stimuli.
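One way to picture a chromosome correlation map is to correlate the expression profile of each gene with that of its neighbor along the chromosome and then look for runs of high correlation. The sketch below does this with a Pearson coefficient. It is an illustration under assumed inputs (profiles already ordered by chromosomal position, an arbitrary threshold of 0.8), and it is not claimed to reproduce the procedure of reference (7).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def neighbor_correlations(profiles):
    """Correlation between each gene and the next gene on the chromosome."""
    return [pearson(profiles[i], profiles[i + 1])
            for i in range(len(profiles) - 1)]


def coregulated_runs(profiles, threshold=0.8):
    """Return (first_gene, last_gene) index pairs for stretches in which
    every adjacent pair of genes correlates at or above the threshold."""
    runs, start = [], None
    for i, r in enumerate(neighbor_correlations(profiles)):
        if r >= threshold:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i))  # pair i failed, so the run ends at gene i
            start = None
    if start is not None:
        runs.append((start, len(profiles) - 1))
    return runs


# Invented profiles for five adjacent genes across four time points.
profiles = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 3, 5, 8], [4, 3, 2, 1], [1, 1, 2, 1]]
print(coregulated_runs(profiles))
```

In this toy input the first three genes rise together, so they form a single run, while the last two genes do not track their neighbors.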

One of the major issues in gene transcription is the in vivo relevance of transcription factor binding sites that have been identified in vitro. The ChIP assay is being successfully exploited to confirm in vivo binding sites of specific transcription factors [e.g., (10)]. In this assay, an antibody to a specific DNA binding protein is used to immunoprecipitate cross-linked protein-DNA complexes. Several groups have now combined the ChIP assay with microarray expression profiling to probe the genomewide distribution of DNA binding sites for specific yeast transcription factors (11-13). Microarrays are generated using all intergenic regions, that is, regions of the yeast genome that lie outside the coding sequences of genes. Then, immunoprecipitated DNA from ChIP assays is labeled and hybridized to the microarray together with a differentially labeled control DNA sample that has not been enriched by immunoprecipitation. The intergenic regions that show stronger hybridization with the immunoprecipitated DNA represent binding sites for the specific transcription factor. Using this approach, Ren et al. have mapped binding sites for the transcription factors Gal4 and Ste12 across the entire yeast genome (12). Likewise, Iyer et al. have mapped the genomic binding sites for the yeast cell cycle transcription factors SBF and MBF (11). Similar analyses have now been extended to nine yeast transcription factors known to be involved in the control of gene expression during the cell cycle (13). The serial nature of this regulatory pathway is revealed by the finding that transcription factors that operate during one stage of the cell cycle regulate the activators that operate during the next stage. It is also clear from these analyses that functionally related groups of genes are bound by, and presumably regulated by, the same transcription factors.
The identification of transcription factor binding sites per se cannot always be equated with the location of genes that they switch on. Thus, to precisely correlate binding with activity, it is essential to combine such genomewide location studies with expression profiling using cells that have mutant transcription factors.
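The two-color comparison described above, and the follow-up step of combining binding with expression in mutant cells, can be sketched in a few lines. The example scores each intergenic region by the log2 ratio of immunoprecipitated signal to control signal, calls regions above a cutoff as bound, and then keeps only the associated genes whose expression also changes in a transcription factor mutant. The region and gene identifiers, the pseudocount, and the cutoff of one log2 unit are hypothetical; published analyses rely on replicate-based error models rather than a fixed threshold.

```python
from math import log2


def enrichment(ip, control, pseudocount=1.0):
    """log2(IP / control) per region; the pseudocount avoids division by zero."""
    return {
        region: log2((ip[region] + pseudocount) / (control[region] + pseudocount))
        for region in ip
    }


def bound_regions(ip, control, min_log2=1.0):
    """Regions whose IP signal exceeds the control by at least min_log2."""
    scores = enrichment(ip, control)
    return sorted(r for r, s in scores.items() if s >= min_log2)


def direct_targets(bound, region_to_gene, changed_in_mutant):
    """Genes next to a bound region whose expression also changes in the mutant."""
    genes = {region_to_gene[r] for r in bound if r in region_to_gene}
    return sorted(genes & set(changed_in_mutant))


# Invented intensities for three intergenic regions (hypothetical IDs).
ip = {"iA": 799, "iB": 99, "iC": 49}
control = {"iA": 99, "iB": 99, "iC": 99}
region_to_gene = {"iA": "geneA", "iB": "geneB", "iC": "geneC"}

bound = bound_regions(ip, control)
print(bound)
print(direct_targets(bound, region_to_gene, {"geneA", "geneC"}))
```

Here geneC changes in the mutant but its upstream region is not bound, so it would be read as an indirect effect rather than a direct target.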

It is also clear from other experiments that sequence alone does not determine binding; access to genes, controlled by chromatin structure, is an important criterion. For example, an analysis of global Rap1 binding sites on a microarray containing both intergenic regions and open reading frames from the yeast genome revealed that despite the presence of Rap1 consensus sequences spread throughout the genome, Rap1 preferentially bound to intergenic regions and, in particular, to the promoter regions of genes (14). Such data again point to the fact that both chromatin structure and accessibility of transcription factor binding sites contribute to the control of gene transcription.

The great advantage of combining global transcription factor binding analysis with expression profiling is that the direct targets of the transcription factors can be distinguished from indirect downstream effects, all of which are observed if, for example, expression profiles alone are analyzed. Can these techniques that appear so easy to apply in yeast be applied to higher eukaryotic species? Generating chromosome correlation maps and searching for new regulatory motifs in regulons, at least in the proximal promoter regions of genes, will be possible. One caveat, however, is that higher eukaryotic genes are often controlled by composite regulatory elements binding two or more families of transcription factors, making computer comparisons more challenging. Moreover, the specific regulatory regions that govern gene transcription in higher eukaryotes can be spread over hundreds of kilobases, thus making assignment to specific genes more difficult. Just recently, two publications have appeared (15, 16) describing powerful search algorithms to identify complex regulatory modules in the Drosophila genome. This work is based on the knowledge that known regulatory regions contain clusters of transcription factor binding sites, either for the same protein or for a group of proteins that cooperate to drive transcription. The investigators searched for clustered binding sites of the fruit fly transcription factor Dorsal across the Drosophila genome. Surprisingly, they identified only 15 clusters containing three or more high-affinity binding sites within a 400-base pair region (16). Although five of these clusters are linked to genes whose expression patterns fit with the known activity of Dorsal during development, it is still unclear whether the other 10 clusters are targets of Dorsal. Some may be targets of related transcription factors, but more experimental analysis is required to verify the computational predictions.
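The cluster criterion mentioned above (three or more high-affinity sites within a 400-base pair region) lends itself to a small sliding-window sketch. The function below takes the coordinates of predicted sites on one chromosome and reports stretches that satisfy the criterion. It is a minimal illustration, not the search algorithm of references (15, 16): sites are reduced to single coordinates, site scoring is assumed to have been done already, and the example positions are invented.

```python
def site_clusters(positions, window=400, min_sites=3):
    """Return (start, end) coordinates of stretches in which at least
    min_sites binding sites fall within a span shorter than `window` bp.
    Qualifying windows that overlap are merged, so a reported stretch
    can be longer than one window."""
    pos = sorted(positions)
    clusters = []
    j = 0
    for i in range(len(pos)):
        j = max(j, i)
        # Advance j so that pos[i:j] holds every site within the window
        # that starts at pos[i].
        while j < len(pos) and pos[j] - pos[i] < window:
            j += 1
        if j - i >= min_sites:
            if clusters and pos[i] <= clusters[-1][1]:
                clusters[-1] = (clusters[-1][0], pos[j - 1])  # merge overlap
            else:
                clusters.append((pos[i], pos[j - 1]))
    return clusters


# Invented site coordinates on a single chromosome.
sites = [100, 250, 420, 5000, 9000, 9100, 9150, 9600]
print(site_clusters(sites))
```

Each reported stretch would still need the kind of experimental follow-up described above before it could be called a genuine regulatory module.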

In a similar manner, investigators searched for regulatory modules across the entire Drosophila genome using known binding site information for a group of five transcription factors active in early Drosophila embryos (15). Many genes known to be controlled by these factors were identified (showing the reliability of the algorithm) as well as many new regulatory clusters that lie close to genes with appropriate expression patterns. Although only one module was tested in functional assays, such large-scale identification of possible regulatory modules, combined with expression profiling and global binding site analysis, provides a powerful tool to examine the regulatory network of complex eukaryotic genomes.

Two exciting papers have applied the techniques of global transcription factor binding to mammalian cells (17, 18). Both papers examined the binding sites for the E2F4 transcription factor, which is thought to be a negative regulator of cell cycle events. One approach used a microarray containing genomic DNA fragments enriched for DNA with a high content of CG dinucleotides. Regions rich in CG dinucleotides, referred to as CpG islands, often correspond to promoter regions (18). The second approach used a microarray of 1500 promoter regions associated with cell cycle-regulated genes (17). Each experimental strategy identified groups of genes involved in cell cycle control whose promoters were bound by E2F4. One interesting outcome from both papers was that a minority of the target gene promoters did not have obvious E2F4 binding consensus sequences. Some of these targets were confirmed by conventional ChIP analysis, implying that they were not false positives. It is possible that E2F4 is recruited by other proteins, or alternatively is binding to another control region (such as an enhancer) that interacts with the promoter region being interrogated.

An alternative to ChIP has been developed by van Steensel et al. to examine the location of several chromatin-binding proteins (such as Sir2 and HP1) on 500 Drosophila genes (19). This methodology, unlike the ChIP assay, does not rely on the availability of a good antibody for the protein of interest. In this assay, tethering of the Escherichia coli enzyme DNA adenine methylase to specific transcription factors illuminates the chromosomal locations of binding sites for these regulatory proteins (19). This approach does, however, require the introduction of a fusion protein that might in some cases alter the DNA binding capabilities of the protein to be tested. Obviously, these studies only searched a fraction of the genome for transcription factor binding sites, but they do demonstrate the feasibility of this approach. Developments in microarray technology will soon allow the entire human genome to be displayed on one or a small number of chips.

These are exciting times for researchers in the gene transcription field. By developing and embracing new technologies, informed by the knowledge derived from the sequencing of complete genomes, researchers can now start to answer many difficult questions. These approaches will lead to enormous insights into the functional features of the genome and should prove to be a powerful tool for the discovery and mapping of global regulatory networks.

