Genome-Wide Location and Function of DNA Binding Proteins

Science  22 Dec 2000:
Vol. 290, Issue 5500, pp. 2306-2309
DOI: 10.1126/science.290.5500.2306


Understanding how DNA binding proteins control global gene expression and chromosomal maintenance requires knowledge of the chromosomal locations at which these proteins function in vivo. We developed a microarray method that reveals the genome-wide location of DNA-bound proteins and used this method to monitor binding of gene-specific transcription activators in yeast. A combination of location and expression profiles was used to identify genes whose expression is directly controlled by Gal4 and Ste12 as cells respond to changes in carbon source and mating pheromone, respectively. The results identify pathways that are coordinately regulated by each of the two activators and reveal previously unknown functions for Gal4 and Ste12. Genome-wide location analysis will facilitate investigation of gene regulatory networks, gene function, and genome maintenance.

Many proteins bind to specific sites in the genome to regulate genome expression and maintenance. Transcriptional activators, for example, bind to specific promoter sequences and recruit chromatin modifying complexes and the transcription apparatus to initiate RNA synthesis (1–3). The reprogramming of gene expression that occurs as cells move through the cell cycle, or when cells sense changes in their environment, is effected in part by changes in the DNA binding status of transcriptional activators. Distinct DNA binding proteins are also associated with origins of DNA replication, centromeres, telomeres, and other sites, where they regulate chromosome replication, condensation, cohesion, and other aspects of genome maintenance (4, 5). Our understanding of these proteins and their functions is limited by our knowledge of their binding sites in the genome.

The genome-wide location analysis method we have developed allows protein-DNA interactions to be monitored across the entire yeast genome (6). The method combines a modified chromatin immunoprecipitation (ChIP) procedure, which has been previously used to study protein-DNA interactions at a small number of specific DNA sites (7), with DNA microarray analysis. Briefly, cells were fixed with formaldehyde, harvested, and disrupted by sonication. The DNA fragments cross-linked to a protein of interest were enriched by immunoprecipitation with a specific antibody. After reversal of the cross-links, the enriched DNA was amplified and labeled with a fluorescent dye (Cy5) with the use of ligation-mediated–polymerase chain reaction (LM-PCR). A sample of DNA that was not enriched by immunoprecipitation was subjected to LM-PCR in the presence of a different fluorophore (Cy3), and both immunoprecipitation (IP)-enriched and -unenriched pools of labeled DNA were hybridized to a single DNA microarray containing all yeast intergenic sequences (Fig. 1). A single-array error model (8) was adopted to handle noise associated with low-intensity spots and to permit a confidence estimate for binding (P value). When independent samples of 1 ng of genomic DNA were amplified with the LM-PCR method, signals for greater than 99.8% of genes were essentially identical within the error range (P value ≤10⁻³). The IP-enriched/unenriched ratio of fluorescence intensity obtained from three independent experiments was used with a weighted average analysis method to calculate the relative binding of the protein of interest to each sequence represented on the array.
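The step of combining triplicate ratios can be illustrated with a small sketch. It assumes, and this is not stated in the text, that each replicate yields a log10 IP-enriched/unenriched ratio together with a standard error from the single-array error model, that replicates are combined by inverse-variance weighting, and that the P value is a one-sided normal tail probability. The function name and inputs are hypothetical; the procedure actually used is described in (6) and (8).

```python
import math


def combine_replicates(log_ratios, sigmas):
    """Inverse-variance weighted average of log10(IP/unenriched) ratios.

    log_ratios: one log10 ratio per replicate (hypothetical input).
    sigmas: matching standard errors, assumed to come from an error model.
    Returns (weighted_mean, combined_sigma, one_sided_p).
    """
    if not log_ratios or len(log_ratios) != len(sigmas):
        raise ValueError("need one sigma per log ratio, and at least one replicate")
    if any(s <= 0 for s in sigmas):
        raise ValueError("standard errors must be positive")

    # Replicates with smaller error get proportionally more weight.
    weights = [1.0 / (s * s) for s in sigmas]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, log_ratios)) / total
    sigma = math.sqrt(1.0 / total)

    # Assumption: under the null (no enrichment) mean/sigma is standard normal.
    z = mean / sigma
    p = 0.5 * math.erfc(z / math.sqrt(2.0))
    return mean, sigma, p
```

With three replicates of equal quality, the combined standard error shrinks by a factor of √3 relative to a single replicate, which is why a spot that is only moderately enriched in each experiment can still reach a small combined P value.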

Figure 1

The genome-wide location profiling method. (A) Close-up of a scanned image of a microarray containing DNA fragments representing 6361 intergenic regions of the yeast genome. The arrow points to a spot where the red intensity is over-represented, identifying a region bound in vivo by the protein under investigation. (B) Analysis of Cy3- and Cy5-labeled DNA amplified from 1 ng of yeast genomic DNA using a single-array error model (8). The error model cutoffs for P values equal to 10⁻³ and 10⁻⁵ are displayed. (C) Experimental design. For each factor, three independent experiments were performed and each of the three samples was analyzed individually using a single-array error model. The average binding ratio and associated P value from the triplicate experiments were calculated using a weighted average analysis method (6).

To investigate the accuracy of the genome-wide location analysis method, we used it to identify sites bound by the transcriptional activator Gal4 in the yeast genome. Gal4 activates genes necessary for galactose metabolism and is among the best characterized transcriptional activators (1, 9). We found 10 genes to be bound by Gal4 (P value ≤0.001) and induced in galactose using our analysis criteria (Fig. 2A). These included seven genes previously reported to be regulated by Gal4 (GAL1, GAL2, GAL3, GAL7, GAL10, GAL80, and GCY1). The MTH1, PCL10, and FUR4 genes were also bound by Gal4 and activated in galactose. Each of these results was confirmed by conventional ChIP analysis (Fig. 2B) (6), and MTH1, PCL10, and FUR4 activation in galactose was found to be dependent on Gal4 (Fig. 2C). Both microarray and conventional ChIP showed that Gal4 binds to GAL1, GAL2, GAL3, and GAL10 promoters under glucose and galactose conditions, but the binding was generally weaker in glucose (6). The consensus Gal4 binding sequence that occurs in the promoters of these genes (CGGN₁₁CCG) can also be found at many sites through the yeast genome where Gal4 binding is not detected; therefore, sequence alone is not sufficient to account for the specificity of Gal4 binding in vivo. Previous studies of Gal4-DNA binding have suggested that additional factors such as chromatin structure contribute to specificity in vivo (10, 11).
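The point that the consensus sequence alone over-predicts binding can be checked by scanning a sequence for every occurrence of CGG, eleven arbitrary bases, then CCG. The following is a minimal sketch; the function name is hypothetical and the scanner is not part of the published method. Because the reverse complement of this pattern is the pattern itself, scanning one strand is enough to find sites on both.

```python
import re

# CGG, any 11 bases, CCG. A lookahead is used so that overlapping
# occurrences are all reported rather than only the leftmost one.
GAL4_CONSENSUS = re.compile(r"(?=CGG[ACGT]{11}CCG)")


def find_consensus_sites(sequence):
    """Return 0-based start positions of every consensus match in sequence."""
    return [m.start() for m in GAL4_CONSENSUS.finditer(sequence.upper())]
```

Comparing the positions returned for all intergenic regions with the regions that pass the binding P value cutoff would give the set of consensus sites that are present in the sequence but not detectably occupied in vivo.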

Figure 2

Genome-wide location of Gal4 protein. (A) Genes whose promoter regions were bound by myc-tagged Gal4 (P value < 0.001) and whose expression levels were induced at least twofold by galactose are listed. The weight-averaged ratios and P values are shown for Gal4 binding in galactose and glucose. Binding ratios are also displayed using a blue and white color scheme and expression ratios of galactose/glucose are displayed using a red and green color scheme. (B) Confirmation of microarray data for each gene in panel A using conventional chromatin IP procedure. Strains with (+) or without (–) a myc-tagged Gal4 protein were grown in galactose. Amplification of the unenriched DNA (I) and IP-enriched DNA (P) is shown. ARN1 (control) was used as a negative control. (C) Galactose-induced expression of FUR4, MTH1, and PCL10 is Gal4-dependent. Samples from wild-type and gal4– strains were taken before and after addition of galactose. The expression of FUR4, MTH1, and PCL10 was monitored by quantitative reverse transcriptase–PCR (RT-PCR) and was quantified by phosphorimaging. (D) Model summarizing the role of Gal4 in galactose-dependent cellular regulation. The products of genes newly identified as directly regulated by Gal4 are shown as green circles; those previously identified are shown in blue.

The identification of MTH1, PCL10, and FUR4 as Gal4-regulated genes reveals previously unknown functions for Gal4 and explains how regulation of several different metabolic pathways can be coordinated (Fig. 2D). MTH1 encodes a transcriptional repressor of certain HXT genes involved in hexose transport (12). Our results suggest that the cell responds to galactose by increasing the concentration of its galactose transporter at the expense of other transporters. In other words, while Gal4 activates expression of the galactose transporter gene GAL2, Gal4 induction of the MTH1 repressor gene leads to reduced levels of glucose transporter expression. The Pcl10 cyclin associates with Pho85p and appears to repress the formation of glycogen (13). Thus, the observation that PCL10 is Gal4-activated suggests that reduced glycogenesis occurs to maximize the energy obtained from galactose metabolism. FUR4 encodes a uracil permease (14), and its induction by Gal4 may reflect a need to increase intracellular pools of pyrimidines to permit efficient uridine 5′-diphosphate (UDP) addition to galactose catalyzed by Gal7.

We next investigated the genome-wide binding profile of the transcription activator Ste12, which functions in the response of haploid yeast to mating pheromones (15). Activation of the pheromone-response pathway by mating pheromones causes cell cycle arrest and transcriptional activation of more than 200 genes in a Ste12-dependent fashion (8, 15). However, it is not clear which of these genes are directly regulated by Ste12 and which are regulated by other ancillary factors. The genome-wide binding profile of epitope-tagged Ste12, determined before and after pheromone treatment in three independent experiments, indicates that 29 pheromone-induced genes are regulated directly by Ste12. Figure 3A lists the yeast genes whose promoter regions are bound by Ste12 at the 99.5% confidence level (i.e., P value <0.005) and whose expression is induced by α factor. These 29 genes are likely to be directly regulated by Ste12 because (i) all have promoter regions bound by Ste12, (ii) exposure to pheromone causes an increase in their transcription, and (iii) pheromone induction of transcription is dependent on Ste12.
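The three criteria can be expressed as a simple filter over per-gene records. The sketch below uses the cutoffs given for Figure 3A (binding P value < 0.005; expression ratio > 1 with P < 0.001). How Ste12 dependence is scored is an assumption made here for illustration: a gene counts as dependent if it is not induced by α factor in cells lacking Ste12. The record type, field names, and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GeneRecord:
    name: str
    binding_p: float               # P value for Ste12 binding at the promoter
    induction_ratio: float         # wild type: treated / untreated mRNA level
    induction_p: float             # P value for that expression change
    ratio_without_ste12: float     # same ratio measured in cells lacking Ste12


def direct_targets(records, binding_cutoff=0.005, expression_cutoff=0.001):
    """Return names of genes that satisfy criteria (i), (ii), and (iii)."""
    targets = []
    for r in records:
        bound = r.binding_p < binding_cutoff                      # (i)
        induced = (r.induction_ratio > 1.0
                   and r.induction_p < expression_cutoff)         # (ii)
        # (iii) Assumed operational test: no induction without Ste12.
        dependent = r.ratio_without_ste12 <= 1.0
        if bound and induced and dependent:
            targets.append(r.name)
    return targets
```

Requiring all three conditions is what separates direct targets from genes that are induced by pheromone through other factors: such genes pass (ii), and possibly (iii), but fail the binding test in (i).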

Figure 3

Genome-wide location of the Ste12 protein. (A) Genes whose promoter regions were bound by Ste12 (P value < 0.005) and whose expression levels were induced by α factor (ratio > 1 and P < 0.001) are listed. The weight-averaged ratios and P values are shown for Ste12 binding before and 30 min after the addition of α factor. The binding ratios and the fold changes of gene expression are displayed as in Fig. 2A. The gene expression data, obtained from reference (8), represent changes in mRNA levels between wild-type cells treated with α factor for the specified period versus untreated cells. The α-0′ time point (where ′ indicates min) was obtained by comparing cells harvested immediately after α factor treatment to untreated cells. The Gal:Ste12 and Ste12Δ data were obtained by comparing Gal:Ste12 (over-expressing Ste12) and Ste12Δ cells to wild-type cells, respectively. Ste12Δ + α data were obtained by comparing cells lacking Ste12 before and 30 min after α factor treatment. (B) Model summarizing the role of Ste12 target genes in the yeast mating pathway. Gray boxes denote the cellular processes known to be involved in mating; yellow boxes denote cellular processes that may be associated with mating. Genes in black were previously reported to be associated with the mating process; genes in red are Ste12 targets that may play a role in mating.

Of the genes that are directly regulated by Ste12, 11 are already known to participate in various steps of the mating process (Fig. 3B). FUS3 and STE12 encode components of the signal transduction pathway involved in the response to pheromone (16); AFR1 and GIC2 are required for the formation of mating projections (17–19); FIG2, AGA1, FIG1, and FUS1 are involved in cell fusion (20–23); and CIK1 and KAR5 are required for nuclear fusion (24). Furthermore, FUS3 and FAR1 are required for pheromone-induced cell cycle arrest (25, 26) (Fig. 3B). Among the Ste12 target genes identified in this study that were not previously reported to be involved in mating, many are involved in processes likely to be relevant to mating. CHS1, PCL2, ERG24, SPC25, HYM1, and PGM1 encode proteins involved in cell wall biosynthesis, cell morphology, membrane biosynthesis, nuclear congression and regulation of gene expression (Fig. 3B). Furthermore, YER019W, YOR129C, and SCH9 are among the genes that are cell cycle regulated (27).

The genes that are regulated by Ste12 can be divided into two classes: those bound by Ste12 both before and after pheromone exposure (e.g., STE12, PCL2, FIG2, and FUS1) and those bound by Ste12 only after exposure to pheromone (e.g., CIK1 and CHS1) (Fig. 3A). The first class of genes is induced immediately after pheromone exposure, most likely by a mechanism that converts an inactive DNA-bound Ste12 protein to an active transcriptional activator. This could take place by removal of repressors of Ste12 such as Dig1/Rst1 and Dig2/Rst2 (28). In the second class of genes, induction of transcription is relatively slow. In this case, the binding of Ste12 appears to be limited before pheromone exposure. It is also possible that the epitope tag on Ste12 is masked at these promoters before pheromone treatment, perhaps due to the presence of additional regulatory proteins.
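The two-class split amounts to comparing the binding call before and after pheromone treatment. A minimal sketch follows, assuming the same P value cutoff (0.005) is applied at both time points; the function name and the returned labels are hypothetical.

```python
def classify_binding(p_before, p_after, cutoff=0.005):
    """Label a promoter by when Ste12 binding passes the P value cutoff."""
    bound_before = p_before < cutoff
    bound_after = p_after < cutoff
    if bound_before and bound_after:
        return "bound before and after"   # first class in the text
    if bound_after:
        return "bound only after"         # second class in the text
    if bound_before:
        return "bound only before"
    return "not bound"
```

A hard cutoff makes the labels sensitive to promoters whose P values sit near the threshold, so a gene in the second class may reflect weak rather than absent binding before treatment, consistent with the caveats raised above.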

We have shown that a combination of genome-wide location and expression analysis can identify the global set of genes whose expression is controlled directly by transcriptional activators in vivo. The application of location analysis to two yeast transcriptional activators revealed how multiple functional pathways are coordinately controlled in vivo during the response to specific changes in the extracellular environment. All of the known targets for these two activators were confirmed, and functional modules were discovered that are regulated directly by these factors.

Expression analysis with DNA microarrays allows investigators to identify changes in mRNA levels in living cells, but the inability to distinguish direct from indirect effects limits the interpretation of the data in terms of the genes that are controlled by specific regulatory factors. Genome-wide location analysis provides information on the binding sites at which proteins reside through the genome under various conditions in vivo, and will prove to be a powerful tool for further discovery of global regulatory networks.

  • * These authors contributed equally to this work.

  • To whom correspondence should be addressed. E-mail: young{at}

