Integrated Genomic and Proteomic Analyses of a Systematically Perturbed Metabolic Network


Science  04 May 2001:
Vol. 292, Issue 5518, pp. 929-934
DOI: 10.1126/science.292.5518.929


We demonstrate an integrated approach to build, test, and refine a model of a cellular pathway, in which perturbations to critical pathway components are analyzed using DNA microarrays, quantitative proteomics, and databases of known physical interactions. Using this approach, we identify 997 messenger RNAs responding to 20 systematic perturbations of the yeast galactose-utilization pathway, provide evidence that approximately 15 of 289 detected proteins are regulated posttranscriptionally, and identify explicit physical interactions governing the cellular response to each perturbation. We refine the model through further iterations of perturbation and global measurements, suggesting hypotheses about the regulation of galactose utilization and physical interactions between this and a variety of other metabolic pathways.

For organisms with fully sequenced genomes, DNA microarrays are an extremely powerful technology for measuring the mRNA expression responses of practically every gene (1). Technologies for globally and quantitatively measuring protein expression are also becoming feasible (2), and developments such as the two-hybrid system are enabling construction of a map of interactions among proteins (3). Although such large-scale data have proven invaluable for distinguishing cell types and biological states, new approaches are needed which, by integrating these diverse data types and assimilating them into biological models, can predict cellular behaviors that can be tested experimentally. We propose and apply one such strategy here, consisting of four distinct steps:

(i) Define all of the genes in the genome and the subset of genes, proteins, and other small molecules constituting the pathway of interest. If possible, define an initial model of the molecular interactions governing pathway function, drawn from previous genetic and biochemical research.

(ii) Perturb each pathway component through a series of genetic (e.g., gene deletions or overexpressions) or environmental (e.g., changes in growth conditions or temperature) manipulations. Detect and quantify the corresponding global cellular response to each perturbation with technologies for large-scale mRNA- and protein-expression measurement.

(iii) Integrate the observed mRNA and protein responses with the current, pathway-specific model and with the global network of protein-protein, protein-DNA, and other known physical interactions.

(iv) Formulate new hypotheses to explain observations not predicted by the model. Design additional perturbation experiments to test these, and iteratively repeat steps (ii), (iii), and (iv).

As proof-of-principle, we now implement this integrated approach to explore the process of galactose utilization (GAL) in the yeast Saccharomyces cerevisiae. The GAL pathway is a classic example of a genetic regulatory switch, in which enzymes required specifically for transport and catabolism of galactose are expressed only when galactose is present and repressing sugars such as glucose are absent. Extensive biochemical studies (4) and saturating mutant screens (5) have defined the genes, gene products, and metabolic substrates required for function of this process and have elucidated the key molecular interactions that lead to pathway activation or inhibition. Thus, by combining this prior work with the complete sequence of the yeast genome, step (i) above has in large part already been accomplished. In steps (ii) through (iv) that follow, we report the first large-scale comparison of mRNA and protein responses, describe an ongoing attempt to systematically explain these responses using existing databases of regulatory and other physical interactions, and explore a number of refinements to the GAL model suggested by these integrative studies.

Step (i): As shown in Fig. 1, galactose utilization consists of a biochemical pathway that converts galactose into glucose-6-phosphate and a regulatory mechanism that controls whether the pathway is on or off. This process has been reviewed extensively (4, 6) and involves at least three types of proteins. A transporter gene (GAL2) encodes a permease that transports galactose into the cell; several other hexose transporters (HXTs) may also have this ability (7). A group of enzymatic genes encodes the proteins required for conversion of intracellular galactose, including galactokinase (GAL1), uridylyltransferase (GAL7), epimerase (GAL10), and phosphoglucomutase (GAL5/PGM2). The regulatory genes GAL3, GAL4, and GAL80 exert tight transcriptional control over the transporter, the enzymes, and to a certain extent, each other. GAL4p is a DNA-binding factor that can strongly activate transcription, but in the absence of galactose, GAL80p binds GAL4p and inhibits its activity. When galactose is present in the cell, it causes GAL3p to associate with GAL80p. This association causes GAL80p to release its repression of GAL4p, so that the transporter and enzymes are expressed at a high level.
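The regulatory switch described above can be written down as a small Boolean function. The following is a minimal sketch for illustration only, not part of the published analysis: the function name `gal4_active` and the all-or-none treatment of each interaction are assumptions, and the sketch ignores galactose transport, GAL6, and glucose repression.

```python
def gal4_active(galactose, deleted=frozenset()):
    """Return True if Gal4p is free to activate transcription.

    Hypothetical Boolean simplification of the Fig. 1 model:
    Gal80p inhibits Gal4p unless Gal3p, in the presence of
    galactose, associates with Gal80p and relieves the inhibition.
    """
    if "GAL4" in deleted:
        return False  # no activator, so no induction
    if "GAL80" in deleted:
        return True   # no repressor, so Gal4p is constitutively active
    # Gal3p relieves Gal80p repression only when galactose is present.
    return galactose and "GAL3" not in deleted


# Tabulate the predicted on/off state of the GAL enzymes for a few perturbations.
for strain in [frozenset(), frozenset({"GAL3"}), frozenset({"GAL4"}), frozenset({"GAL80"})]:
    label = "wt" if not strain else "+".join(sorted(strain)) + " deleted"
    for galactose in (True, False):
        medium = "+gal" if galactose else "-gal"
        print(label, medium, "ON" if gal4_active(galactose, strain) else "OFF")
```

Under these assumptions the function reproduces the qualitative behavior stated in the text: expression is on in wild type only with galactose, off when GAL3 or GAL4 is deleted, and on regardless of galactose when GAL80 is deleted.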

Figure 1

Model of galactose utilization. Yeast metabolize galactose through a series of steps involving the GAL2 transporter and enzymes produced by GAL1, GAL7, GAL10, and GAL5. These genes are transcriptionally regulated by a mechanism consisting primarily of GAL4, GAL80, and GAL3. GAL6 produces another regulatory factor thought to repress the GAL enzymes in a manner similar to GAL80. Dotted interactions denote model refinements supported by this study.

Although these genes and interactions form the core of the GAL pathway, the complete regulatory mechanism is more complex (8–11) and involves genes whose roles in galactose utilization are not entirely clear (12, 13). For instance, the gene GAL6 (LAP3) functions predominantly in a drug-resistance pathway, but can suppress transcription of the GAL transporter and enzymes under certain conditions and may itself be transcriptionally controlled by GAL4 (14).

Step (ii): Guided by the current model, we applied 20 initial perturbations to the GAL pathway. Wild-type (wt) and nine genetically altered yeast strains were examined (15), each with a complete deletion of one of the nine GAL genes: transport (gal2Δ), enzymatic (gal1Δ, gal5Δ, gal7Δ, or gal10Δ), or regulatory (gal3Δ, gal4Δ, gal6Δ, or gal80Δ). These strains were perturbed environmentally by growth in the presence (+gal) or absence (–gal) of 2% galactose, with 2% raffinose provided in both media (16).

We examined global changes in mRNA expression resulting from each perturbation, with DNA microarrays of approximately 6200 nuclear yeast genes as described (17). In each experiment, fluorescently labeled cDNA from a perturbed strain was hybridized against labeled cDNA from a reference strain (wt, grown in +gal media). To obtain robust estimates of fluorescent intensity, four replicate hybridizations were performed for each perturbation. Using a statistical method based on maximum-likelihood estimation (18), we identified 997 genes whose mRNA levels differed significantly from reference under one or more perturbations. This set was then divided into 16 clusters using self-organizing maps (19), where each cluster contained genes with similar expression responses over all perturbations. Figure 2 displays a matrix summarizing the effects of perturbation on mRNA expression of the GAL genes and gene clusters [complete data provided in Web table 1 (20)].

Figure 2

Perturbation matrix. Microarrays were used to measure the mRNA expression profiles of yeast growing under each of 20 perturbations to the GAL pathway. (A) Each spot represents the change in expression of a GAL gene due to a particular perturbation (listed above each column); medium gray (i.e., the same level as figure background) represents no change, whereas darker or lighter shades represent increased or reduced expression, respectively. The left half of the matrix shows expression changes for each deletion strain as compared to wt, with both strains grown in the presence of galactose; the right half shows the same differential comparison, but with both strains grown in the absence of galactose (35). The wt+gal versus wt–gal perturbation (far left) isolates the effects of growth with and without galactose, whereas the wt+gal versus wt+gal perturbation (second from left) serves as a negative control. (B) Expression profiles as in (A), with significant changes (λ ≥ 45) circled in magenta. Also superimposed are the qualitative changes (+/−) that we expect using the Fig. 1 model [see Step (iv)]. (C) Average expression profiles for genes in each of 16 clusters. Clusters contained genes involved in a variety of metabolic processes, as well as genes of unknown function [Web table 1 (20)]; particular Biological Processes [Gene Ontology Database, March 2001 (36)] occurring at higher-than-expected frequencies within each cluster are annotated at right. (D) Cellular doubling time in each of the 20 conditions, measured before harvest.

Are the observed changes in mRNA expression also reflected at the level of protein abundance? To address this question, we examined differences in protein abundance between wt+gal and wt–gal conditions using isotope-coded affinity tag (ICAT) reagents and tandem mass spectrometry (MS/MS) (21). Equal amounts of protein extracts from wt+gal and wt–gal cultures were labeled with isotopically heavy and normal ICAT reagents, respectively, then combined and digested with trypsin. The resulting peptide mixture was fractionated by multidimensional chromatography and analyzed by MS/MS. Computational analysis of the tandem mass spectra was used to identify the proteins from which specific peptides originated and to indicate relative abundances of the heavy and normal ICAT isotopes of each of these peptides.

We obtained protein-abundance ratios for a total of 289 proteins [Web table 1 (20)], including all of the GAL enzymes and the transporter. Figure 3 shows protein-abundance ratios versus the corresponding mRNA-expression ratios obtained with DNA microarrays: as a whole, protein-abundance ratios were moderately correlated with their mRNA counterparts (r = 0.61, P < 1.3 × 10^−20). Although approximately 30 proteins displayed clear changes in abundance between the wt+gal and wt–gal conditions (|log10 ratio| > 0.25), mRNA levels for 15 of these did not change significantly in response to any perturbation, suggesting that these proteins may be regulated posttranscriptionally. In addition, many ribosomal-protein genes increased three- to fivefold in mRNA but not in protein abundance in response to galactose addition. These results underscore the importance of integrated mRNA- and protein-expression measurements for understanding biological systems.
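The comparison described above amounts to two simple operations: a Pearson correlation between log10 protein and mRNA ratios, and a filter for proteins that change in abundance while their mRNA does not. The sketch below illustrates both under stated assumptions. The function names and the record layout are hypothetical, the gene names and values in `toy_records` are invented placeholders rather than measured data, and the mRNA significance flag is taken as given instead of being derived from the maximum-likelihood test (18).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def posttranscriptional_candidates(records, threshold=0.25):
    """Names of proteins with |log10 protein ratio| > threshold
    whose mRNA was not flagged as significantly changed."""
    return [
        name
        for name, log_protein, _log_mrna, mrna_significant in records
        if abs(log_protein) > threshold and not mrna_significant
    ]


# Invented placeholder records:
# (name, log10 protein ratio, log10 mRNA ratio, mRNA significantly changed?)
toy_records = [
    ("geneA", 0.90, 1.10, True),
    ("geneB", 0.40, 0.02, False),
    ("geneC", 0.10, 0.05, False),
    ("geneD", -0.30, -0.01, False),
]

r = pearson([rec[1] for rec in toy_records], [rec[2] for rec in toy_records])
print("correlation:", round(r, 2))
print("candidates:", posttranscriptional_candidates(toy_records))
```

The 0.25 threshold is the one quoted in the text; everything else in the example is illustrative.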

Figure 3

Scatter plot of protein expression versus mRNA expression ratios. Ratios of wt+gal to wt−gal protein expression, measured for each of 289 genes using the ICAT technique, are plotted against the corresponding mRNA expression ratios measured by microarray. Many genes with elevated mRNA or protein expression in wt+gal were metabolic (▴) or ribosomal (⧫), whereas genes involved in respiration (▾) almost always had reduced expression levels. Names of genes that were indistinguishable in both mRNA and protein (due to high sequence similarity) are separated by a slash.

Step (iii): Can we attribute the observed mRNA and protein changes to underlying regulatory interactions in the cell? Although we already have a model of interactions among the GAL genes, it does not address changes in expression observed for the hundreds of other genes appearing in Figs. 2 and 3. To supplement this model, we assembled a catalog of previously observed physical interactions in yeast by combining a published list of 2709 protein-protein interactions (3) with 317 protein→DNA interactions recorded in the transcription-factor databases (22). A total of 348 genes associated with interactions in this catalog were affected in mRNA or protein expression by at least one perturbation or involved in two or more interactions with affected genes. Figure 4A displays these genes graphically, along with their 362 associated interactions, as a physical-interaction network.
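The selection rule for network genes can be sketched in a few lines. This is an illustrative reading of the criterion stated above, not the software used in the study: the function name is hypothetical, directed protein→DNA interactions are treated as undirected when counting partners, and "two or more interactions with affected genes" is interpreted as two or more distinct affected partners.

```python
from collections import defaultdict


def select_network_genes(interactions, affected):
    """Keep genes that appear in the interaction catalog and are either
    affected themselves or linked to at least two affected genes."""
    neighbors = defaultdict(set)
    for a, b in interactions:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {
        gene
        for gene, partners in neighbors.items()
        if gene in affected or len(partners & affected) >= 2
    }


# Invented toy catalog: B is unaffected but bridges two affected genes.
toy_interactions = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
toy_affected = {"A", "C"}
print(sorted(select_network_genes(toy_interactions, toy_affected)))
```

In the toy catalog, gene B is retained because it links two affected genes, whereas D is dropped because it touches only one.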

Figure 4

Integrated physical-interaction network. Nodes represent genes, a yellow arrow directed from one node to another signifies that the protein encoded by the first gene can influence the transcription of the second by DNA binding (protein→DNA), and a blue line between two nodes signifies that the corresponding proteins can physically interact (protein-protein). Highly interconnected groups of genes tend to have common biological function and are labeled accordingly. (A) Effects of the gal4Δ+gal perturbation are superimposed on the network, with GAL4 colored red and the gray scale intensity of other nodes representing changes in mRNA as in Fig. 2 (node diameter also scales with the magnitude of change). Regions corresponding to (B) galactose utilization and (C) amino acid synthesis are detailed at right. Graphical layout and network display were performed automatically using software based on the LEDA toolbox (37). An enlarged version of (A) is provided in (20).

Genes linked by physical interactions in the network tend to have more strongly correlated expression profiles than genes chosen at random (P < 0.001). We believe that these correlations identify network interactions that are likely to have transmitted a change in expression from one gene (or protein) to another over our 20 perturbations. Most straightforwardly, a protein→DNA interaction may be responsible for directly transmitting an expression change from a transcription factor to a highly correlated target gene (e.g., Mcm1→Far1 and Mig1→Fbp1; mRNA expression profile correlations are r(Mcm1,Far1) = 0.82 and r(Mig1,Fbp1) = 0.63). Alternatively, genes A and B may be under control of a common transcription factor C→(A,B): coexpression of A and B provides evidence that C transmits these changes, regardless of whether C itself changes detectably in expression. This is the case for the GAL enzymes regulated by Gal4 (Fig. 4B), amino acid synthesis genes regulated by Gcn4 (Fig. 4C), and a class of gluconeogenic genes controlled by Sip4 (Sip4→Fbp1, Pck1, Icl1). Finally, we may scan the network for indirect effects, such as a change in A transmitted to B through a protein-protein interaction with a signaling protein (e.g., Gcr2–Gcr1→Tpi1; r(Gcr2,Tpi1) = –0.86). Many other physically interacting, strongly correlated genes are listed in Web table 2 (20); each of these associates an observed change in gene expression with the regulatory interaction(s) likely to have caused it.
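One way to compare interacting gene pairs with randomly chosen pairs is a resampling test, sketched below. The text does not specify how the reported P value was computed, so this is an assumed procedure rather than the method used: the function names are hypothetical, the absolute value of the correlation is used so that strong negative correlations also count, and the profiles in `toy_profiles` are invented numbers.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def mean_abs_correlation(profiles, pairs):
    """Average |r| over the given gene pairs."""
    return sum(abs(pearson(profiles[a], profiles[b])) for a, b in pairs) / len(pairs)


def random_pair_pvalue(profiles, linked_pairs, n_draws=1000, seed=0):
    """Fraction of random pair sets whose mean |r| reaches the observed value."""
    rng = random.Random(seed)
    genes = sorted(profiles)
    observed = mean_abs_correlation(profiles, linked_pairs)
    hits = 0
    for _ in range(n_draws):
        drawn = [tuple(rng.sample(genes, 2)) for _ in linked_pairs]
        if mean_abs_correlation(profiles, drawn) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (hits + 1) / (n_draws + 1)


# Invented expression profiles over four perturbations.
toy_profiles = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.1, 6.0, 8.2],
    "C": [4.0, 1.0, 3.0, 2.0],
    "D": [3.9, 1.2, 3.1, 1.8],
    "E": [1.0, -1.0, 1.0, -1.0],
    "F": [0.5, 0.5, -0.5, -0.5],
}
toy_links = [("A", "B"), ("C", "D")]
observed, p_value = random_pair_pvalue(toy_profiles, toy_links, n_draws=200, seed=1)
print("mean |r| of linked pairs:", round(observed, 3))
print("empirical P:", p_value)
```

With only six toy genes the empirical P value is not meaningful; the example is intended only to show the shape of the calculation.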

Ultimately, we wish to determine paths through the network connecting perturbed GAL genes to every other affected gene. This is not always possible, because many of the required interactions linking galactose utilization to other metabolic processes are still unknown. However, analysis of our expression data suggests that Gal4p directly regulates genes in several of these processes through novel protein→DNA interactions. To identify putative interactions, we looked for the well-characterized Gal4p-binding site (23) upstream of genes in expression clusters 1, 2, and 3, which together contained all seven genes with established Gal4p-binding sites. Of the 87 remaining genes in these three clusters, nine had Gal4p-binding sites not previously identified [Web table 3 (20)], a significantly greater proportion than were found in clusters 4 through 16 (10.3% versus 2.8%; P < 0.002). This set of nine contained genes involved in glycogen accumulation and protein metabolism as well as several genes of unknown function (e.g., YMR318C, a gene shown in Fig. 3 to have strong mRNA and protein responses to galactose induction) (24). As shown in Fig. 1, we suggest that Gal4p may regulate these genes by direct binding.
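A search for candidate Gal4p-binding sites in upstream sequence can be sketched with a regular expression. The text does not state the site definition used (it cites reference 23), so the pattern below is an assumption: it encodes the commonly cited Gal4p consensus CGG-N11-CCG, and the function name and example sequences are hypothetical. Because this consensus is its own reverse complement, scanning one strand is sufficient.

```python
import re

# Assumed consensus: CGG, any 11 bases, CCG. The lookahead allows
# overlapping matches to be reported.
GAL4_SITE = re.compile(r"(?=(CGG[ACGT]{11}CCG))")


def find_gal4_sites(upstream_sequence):
    """Return 0-based start positions of candidate Gal4p sites."""
    sequence = upstream_sequence.upper()
    return [match.start() for match in GAL4_SITE.finditer(sequence)]


# Invented example sequences, not taken from the yeast genome.
with_site = "TTT" + "CGG" + "A" * 11 + "CCG" + "TTT"
without_site = "ACGT" * 5
print(find_gal4_sites(with_site))
print(find_gal4_sites(without_site))
```

A realistic scan would also need a definition of the upstream window for each gene and a background rate for chance matches, neither of which is specified here.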

Step (iv): Lastly, how do the observed responses of GAL genes compare to their predicted behavior? Figure 2B shows the qualitative changes (+ and –) in mRNA expression that we predicted based on the model shown in Fig. 1 and from current knowledge of galactose utilization as summarized in Step (i). In general, these were in good agreement with the observed changes. For example, growth of wild-type cells in +gal versus –gal media significantly induced GAL1, GAL2, GAL7, GAL10, and GAL80 as expected, while deleting the positive regulators GAL3 and GAL4 led to a significant expression decrease in many of these genes. In –gal media, deletion of the repressor GAL80 caused a dramatic increase in GAL-enzyme expression; in +gal, this deletion had little or no effect on these genes, presumably because they were already highly expressed.

A number of observations were not predicted by the model and are listed in Web table 4 (20); in many cases, these suggest new regulatory phenomena that may be tested by hypothesis-driven approaches. For example, in the presence of galactose, gal7 and gal10 deletions unexpectedly reduced the expression levels of other GAL enzymes. Because the metabolite Gal-1-P is known to accumulate in cells lacking functional Gal7 and is detrimental in large quantities (25), one hypothesis is that the observed expression-level changes are dependent on build-up of Gal-1-P or one of its metabolic derivatives. Under this model, the cell would limit metabolite accumulation by first sensing toxic levels through an unknown mechanism, then triggering a decrease in GAL-enzyme expression (Fig. 1). Alternative scenarios are also possible, such as a model in which GAL10 influences the expression of GAL7 and GAL1 through transcriptional interference within the GAL1-10-7 locus (9).

To test the hypothesis that the effects of gal7Δ and gal10Δ are dependent on increased levels of Gal-1-P or a derivative molecule, we examined the expression profile of a gal1Δgal10Δ double deletion growing in +gal conditions (relative to the wt+gal reference). We predicted that in this strain, the absence of GAL1 activity would prevent build-up of Gal-1-P and the changes in GAL gene expression would not occur. Conversely, if the expression changes did not depend on Gal-1-P (e.g., are caused by chromosomal interactions at the GAL1-10-7 locus), they would also be likely to occur in the gal1Δgal10Δ strain. Consistent with our initial hypothesis, GAL-enzyme expression was not significantly affected by this perturbation, and as shown in Fig. 5, the expression profile of gal1Δgal10Δ over all affected genes was more similar to the profile of gal1Δ+gal than to that of gal10Δ+gal or any other perturbation.

Figure 5

Tree comparing gene-expression changes resulting from different perturbations to the GAL pathway. We used the Neighbor and Drawtree programs (38) to construct a hierarchical-clustering tree (39) based on Euclidean distance between perturbation profiles, where each profile consists of log10 mRNA expression ratios over the set of 997 significantly affected genes. The closer two perturbations are to each other through the branches of the tree, the more similar their observed changes in gene expression. Leaves of the tree are labeled with the relevant genetic perturbation (wild-type or gene deletion) followed by the environmental perturbation (+/– gal). Twenty initial perturbations (solid branches) and three follow-up perturbations are shown (dotted branches). As in Fig. 2, profiles for all genetic perturbations are relative to that of the wild type, with both strains grown in identical media (+gal or –gal).
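The distance underlying the tree can be illustrated with a short sketch. It computes Euclidean distances between perturbation profiles and reports the nearest profile to a query; it does not reproduce the tree construction performed by the Neighbor and Drawtree programs. The function names, profile labels, and numbers are hypothetical placeholders, and each profile here has three entries rather than 997.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def distance_matrix(profiles):
    """Pairwise distances keyed by (name, name), including the zero diagonal."""
    names = sorted(profiles)
    return {(a, b): euclidean(profiles[a], profiles[b]) for a in names for b in names}


def nearest_profile(profiles, query):
    """Name of the profile closest to the query, excluding the query itself."""
    others = (name for name in profiles if name != query)
    return min(others, key=lambda name: euclidean(profiles[query], profiles[name]))


# Invented log10 expression-ratio profiles for three perturbations.
toy_profiles = {
    "double_deletion": [0.1, -0.2, 0.0],
    "single_deletion_1": [0.1, -0.1, 0.0],
    "single_deletion_2": [1.0, 0.8, -0.9],
}
print(nearest_profile(toy_profiles, "double_deletion"))
```

A nearest-profile lookup of this kind corresponds to the comparisons made in the text, in which a double-deletion profile is judged by which single-deletion profile it most resembles.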

Another unanticipated observation was the slow growth of the gal80Δ mutant in –gal conditions (Fig. 2D), the large number of gene clusters affected by this perturbation (Fig. 2C, compare rightmost column to the other eight columns in the –gal set), and the corresponding large distance between the gal80Δ–gal expression profile and every other profile in Fig. 5. Since this perturbation leads to constitutive expression of the GAL enzymes and transporter, we wished to determine whether the widespread expression changes were dependent on these genes. Accordingly, we measured the expression profile of a gal4Δgal80Δ–gal double deletion, in which the GAL enzymes and transporter are not expressed. Both the doubling time (144 min) and overall expression profile of this strain (Fig. 5) were more similar to those of gal4Δ (129 min) than gal80Δ (205 min), suggesting that the effects of the gal80Δ perturbation are indeed mediated by other GAL genes. To further determine which GAL genes were important for the effect, we measured the expression profile of a gal2Δgal80Δ–gal perturbation, in which the GAL transporter was absent. This profile was more similar to that of gal2Δ than gal80Δ, providing evidence that the transporter is necessary to produce the slow growth and expression changes seen for the gal80Δ perturbation.

We expect that more directed experimental approaches (i.e., biochemistry, genetics, cell biology) will be required to test these ideas and further deepen our understanding of galactose utilization and its interacting networks. Even so, global and integrated analyses are extremely powerful for suggesting new hypotheses, especially with regard to the regulation of a pathway and its interconnections with other pathways. As technologies for cellular perturbation and global measurement mature, these approaches will soon become feasible in higher eukaryotes.

  • * To whom correspondence should be addressed at The Institute for Systems Biology. E-mail: tideker{at}

