Global Analysis of Protein Activities Using Proteome Chips


Science  14 Sep 2001:
Vol. 293, Issue 5537, pp. 2101-2105
DOI: 10.1126/science.1062191


To facilitate studies of the yeast proteome, we cloned 5800 open reading frames and overexpressed and purified their corresponding proteins. The proteins were printed onto slides at high spatial density to form a yeast proteome microarray and screened for their ability to interact with proteins and phospholipids. We identified many new calmodulin- and phospholipid-interacting proteins; a common potential binding motif was identified for many of the calmodulin-binding proteins. Thus, microarrays of an entire eukaryotic proteome can be prepared and screened for diverse biochemical activities. The microarrays can also be used to screen protein-drug interactions and to detect posttranslational modifications.

A daunting task after a genome has been fully sequenced is to understand the functions, modification, and regulation of every encoded protein (1). Currently, much effort is devoted to studying gene, and hence protein, function and regulation by analyzing mRNA expression profiles, gene disruption phenotypes, two-hybrid interactions, and protein subcellular localization (2). Although these studies are useful, much information about protein function can be derived from the analysis of biochemical activities (3–7). In principle, the biochemical activities of proteins can be systematically probed by producing proteins in a high-throughput fashion and analyzing the functions of hundreds or thousands of protein samples in parallel using protein microarrays (5, 6, 8). The major hurdles in screening an entire proteome array have been generating the necessary expression clones and expressing and purifying the proteins in a high-throughput fashion.

We have constructed a yeast proteome microarray containing approximately 80% of the yeast proteins and screened it for a number of biochemical activities. We first built a high-quality collection of 5800 yeast open reading frames (ORFs) (93.5% of the total) cloned into a yeast high-copy expression vector using recombination cloning (9). The yeast proteins are fused to glutathione S-transferase–polyhistidine (GST-HisX6) at their NH2-termini and expressed in yeast using the inducible GAL1 promoter (5, 9). The yeast expression strains contain individual plasmids in which the correct yeast ORFs have been shown to be fused in-frame to GST by DNA sequencing. The proteins were expressed in yeast to help ensure that the proteins were modified and folded properly. Using a 96-well format, 1152 samples were purified at once from yeast extracts using glutathione-agarose beads (10). We included 0.1% Triton in the lysis buffer and washes to ensure that the purified proteins were free of lipids. The quality and quantity of the purified proteins were monitored using immunoblot analysis of 60 random samples (Fig. 1A). Greater than 80% of the strains produced detectable amounts of fusion proteins of the expected molecular weight.

Figure 1

GST::yeast proteins were purified in a 96-well format. (A) Sixty samples were examined by immunoblot analysis using anti-GST; 19 representative examples are shown. Greater than 80% of the preparations produce high yields of fusion protein. (B) 6566 protein samples representing 5800 unique proteins were spotted in duplicate on a single nickel-coated microscope slide. The slide was probed with anti-GST (10). (C) An enlarged image of one of the 48 blocks is depicted to the right of the proteome chip.

To prepare the proteome chips, we printed 6566 protein preparations, representing 5800 different yeast proteins, in duplicate onto glass slides using a commercially available microarrayer. Our initial experiments used aldehyde-treated microscope slides (6) in which fusion proteins attach to the surface through primary amines at their NH2-termini or other residues of the protein. In subsequent experiments, we spotted proteins onto nickel-coated slides, in which the fusion proteins attach through their HisX6 tags and presumably uniformly orient away from the surface. Although both slides were successful, the nickel-coated slides gave superior signals for our particular protein preparations (Fig. 1B).

To determine how much fusion protein was covalently attached to different glass surfaces and the reproducibility of the protein attachment, we probed the chips with antibodies to GST (anti-GST). More than 93.5% of the protein samples gave signals significantly above background (i.e., greater than 10 fg of protein), and 90% of the spots contained 10 to 950 fg of protein. Our results also demonstrate that it is feasible to spot, with excellent resolution, 13,000 protein samples in one-half the area of a standard microscope slide (Fig. 1C). To test the reproducibility of the protein spotting, we compared the signals from each pair of duplicated spots with one another; 95% of the signals were within 5% of the average (10).
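The duplicate-spot comparison described above can be illustrated with a minimal sketch. It assumes each protein yields a pair of background-corrected intensities and counts a pair as reproducible when each spot lies within 5% of the pair's average; the function name and the example intensities are hypothetical and are not taken from the supplementary material (10).

```python
def fraction_within_tolerance(pairs, tolerance=0.05):
    """Return the fraction of duplicate-spot pairs whose signals lie
    within `tolerance` (relative) of the pair's average."""
    if not pairs:
        raise ValueError("no duplicate pairs supplied")
    within = 0
    for a, b in pairs:
        mean = (a + b) / 2.0
        # Both spots deviate from their average by the same amount, |a - b| / 2,
        # so checking one of them is sufficient.
        if mean > 0 and abs(a - mean) <= tolerance * mean:
            within += 1
    return within / len(pairs)


# Made-up intensities (arbitrary units) for four duplicated spots.
example_pairs = [(100.0, 104.0), (250.0, 240.0), (80.0, 95.0), (500.0, 510.0)]
print(fraction_within_tolerance(example_pairs))
```

In this toy data set, three of the four pairs agree within 5% of their average; the third pair (80 and 95) does not.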

The proteome chips were tested by probing for several protein-protein interactions and protein-lipid interactions. To test for protein-protein interactions, the yeast proteome was probed with biotinylated calmodulin in the presence of calcium (11). Calmodulin is a highly conserved calcium-binding protein involved in many calcium-regulated cellular processes and has many known partners (12). The bound biotinylated protein was detected using Cy3-labeled streptavidin. As a control, we also probed with Cy3-labeled streptavidin alone. These studies identified six known calmodulin targets (Fig. 2A): Cmk1p and Cmk2p are the type I and II calcium/calmodulin-dependent serine/threonine protein kinases (12), Cmp2p is one of the two yeast calcineurins (13), Dst1p plays a role in transcription elongation (14), Myo4p is a class V myosin heavy chain (15), and Arc35p is a component of the Arp2/3 actin-organizing complex (16). Arc35p was recently shown to interact with calmodulin in a two-hybrid study (17); thus, our data confirm that Arc35p and calmodulin interact in vitro. Of the six known calmodulin targets that we did not detect, two are not in our collection and the rest were not detectable in the GST probing experiments. In addition to known partners, the calmodulin probe identified 33 additional potential partners. These include many different types of proteins [supplementary table 1 (10)], consistent with a role for calmodulin in many diverse cellular processes.

Figure 2

(A) Examples of different assays on the proteome chips. Proteome chips containing 6566 yeast proteins were spotted in duplicate and incubated with the biotinylated probes indicated. The positive signals in duplicate (green) are in the bottom row of each panel; the top row of each panel shows the same yeast protein preparations of a control proteome chip probed with anti-GST (red). The upper panel shows the amounts of GST fusion proteins as detected by the anti-GST (red). (B) A putative calmodulin-binding motif (32) is shown, which was identified by searching for amino acid sequences that are shared by the different calmodulin targets (10). Fourteen of 39 positive proteins share a motif whose consensus is (I/L)QXXK(K/X)GB, where X is any residue and B is a basic residue. The size of the letter indicates the relative frequency of the amino acid indicated.

Sequence searching (5) revealed that 14 of the 39 calmodulin-binding proteins contain a motif whose consensus is (I/L)QXXK(K/X)GB, where X is any residue and B is a basic residue (Fig. 2B). A related sequence in myosins, IQXXXXKXXXR, has been shown previously to bind calmodulin (18). Thus, we demonstrate that the domain is found in many calmodulin-binding proteins. Presumably the other targets that lack this motif have other calmodulin-binding sequences (10).
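As an illustration only, the consensus (I/L)QXXK(K/X)GB can be written as a regular expression and used to scan a protein sequence. This sketch is not the sequence-searching procedure used in the study. It assumes that the (K/X) position accepts any residue (K is simply the most frequent) and that "basic residue" means K, R, or H; the function name and the toy sequence are hypothetical.

```python
import re

# (I/L) Q X X K (K/X) G B
# X and (K/X) are treated as "any residue"; B is assumed to be K, R, or H.
# The lookahead wrapper lets overlapping occurrences be reported as well.
MOTIF = re.compile(r"(?=([IL]Q..K.G[KRH]))")


def find_motif(sequence):
    """Return (start_index, matched_text) for every occurrence of the motif."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(sequence.upper())]


# Toy sequence built by hand to contain one match; not a real yeast protein.
toy_sequence = "MSTLQAAKKGRDE"
print(find_motif(toy_sequence))
```

A strict pattern like this reports only exact matches to the consensus. Because the consensus summarizes residue frequencies (Fig. 2B), a position-weighted scoring scheme would be the more natural choice for ranking partial matches.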

In addition to the calmodulin-binding targets, we also identified one protein, Pyc1p, that bound Cy3-labeled streptavidin. Pyc1p encodes a pyruvate carboxylase 1 homolog that contains a highly conserved biotin attachment region (19). Thus, as predicted by its sequence, Pyc1p is biotinylated in vivo. With appropriate detection assays, we expect that proteome chips can identify many types of posttranslational modification of proteins.

To test whether proteome chips could be used to identify activities that might not be accessible by other approaches, such as protein-drug interactions and protein-lipid interactions, we screened for phosphoinositide (PI)–binding proteins. PIs are important constituents of cellular membranes and also serve as second messengers that regulate diverse cellular processes, including growth, differentiation, cytoskeletal rearrangements, and membrane trafficking (20). Because they are often present only transiently and in low abundance within cells, PIs have not been characterized extensively, and little is known about which proteins bind different phospholipids (20).

Five types of PI liposomes and one liposome lacking PIs were used to probe the proteome chips. Each contains phosphatidylcholine (PC) with 1% (w/w) N-(biotinoyl)-1,2-dihexadecanoyl-sn-glycero-3-phosphoethanolamine, triethylammonium salt (biotin DHPE); the biotinylated lipid serves as a label that can be detected by Cy3-streptavidin (21). In addition to PC, the five other liposomes contain either 5% (w/w) PI(3)P, PI(4)P, PI(3,4)P2, PI(4,5)P2, or PI(3,4,5)P3 (Fig. 2A). All of these phospholipids have been found in yeast except PI(3,4,5)P3 (20).

The six liposomes identified a total of 150 different protein targets that produced signals significantly higher than the background; an algorithm was devised to assist in the identification of positive signals (22). Fifty-two (35%) of the lipid-binding proteins correspond to uncharacterized proteins. Of the 98 known proteins, 45 proteins are membrane-associated and either have, or are predicted to have, membrane-spanning regions (23, 24). This includes integral membrane proteins, those with lipid modifications [e.g., the glycosylphosphatidylinositol (GPI) anchor proteins Tos6p and Sps2p (23) and prenylated proteins (Gpa2p and the mating pheromone a-factor) (25)], as well as peripherally associated proteins [e.g., Kcc4p and Myo4p (15,26)]. Eight others are involved in lipid metabolism (e.g., Bpl1p) or inositol ring phosphorylation (e.g., Kcs1p) or are predicted to be involved in membrane or lipid function (e.g., Ylr020cp has homology to triacylglycerol lipase). Of the 52 uncharacterized proteins, 13 (25%) are predicted to be associated with membranes (24) and others contain basic stretches, as might be expected for electrostatic interactions with negatively charged lipids. Surprisingly, 19 of the lipid-binding proteins are kinases, and 17 of these are protein kinases.

The phospholipid-binding proteins were sorted according to whether they bound lipids strongly or weakly, on the basis of the phospholipid-binding signal relative to the amount of GST (Fig. 3) (22). We found that more (72%) of the strong lipid-binding proteins (Fig. 3, A and B) were characterized relative to the weakly binding proteins (54%) (Fig. 3, C and D) and more strong lipid-binding proteins are known or predicted to be membrane-associated, relative to the weaker binding proteins (Fig. 3, “Membrane” column). Interestingly, 13 of 17 of the protein kinases bind very strongly to the PIs. We further grouped the proteins by whether they preferentially bound one or more PIs over PC. One hundred and one proteins bound to PC as well as or nearly as well as to the PIs (PI/PC < 1.3) (Fig. 3, B and D). However, 49 proteins bound to one or more PIs preferentially (PI/PC > 1.3) (Fig. 3, A and C). Analysis of the strong PI-binding proteins revealed that many of them specifically bound particular PIs. For example, Stp22p, which is required for vacuolar targeting of plasma membrane proteins such as Ste2p and Can1p, preferentially binds PI(3)P (27). Nine protein kinases specifically bind PI(4)P and PI(3,4)P2 strongly and one binds these lipids weakly. Atp1p, a subunit of the F1-ATP synthase of the mitochondrial inner membrane, preferentially binds PI(4,5)P2 (28). Sps2p, which is localized to the prospore membrane (29), also interacts specifically with PI(3,4)P2. Preferential binding of Myo4p to PI(3,4)P2 may be important for its interaction at the cell cortex and/or its regulation. No strong lipid-binding targets were found that specifically bound PI(3,4,5)P3, although some proteins bound both this lipid and others (Fig. 3). These results demonstrate that many membrane-associated proteins, including integral membrane proteins and peripherally associated proteins, preferentially bind specific phospholipids in vitro.
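The two-way grouping described above (strong or weak binding, specific or nonspecific for PIs) can be sketched as follows. Only the PI/PC cutoff of 1.3 comes from the text. The strong/weak threshold on the lipid signal relative to GST, the function name, and the example signals are all hypothetical; the actual criterion is described in note (22) and is not reproduced here. A ratio of exactly 1.3 is not covered by the text and is treated as nonspecific in this sketch.

```python
from collections import Counter

PI_PC_CUTOFF = 1.3  # specificity cutoff on the PI/PC ratio, as quoted in the text


def classify(max_pi, pc, gst, strong_cutoff):
    """Label one protein as strong/weak and specific/nonspecific.

    max_pi: strongest signal among the five PI liposomes
    pc: signal with the PC-only liposome
    gst: anti-GST signal (amount of protein on the spot)
    strong_cutoff: hypothetical threshold on max_pi / gst
    """
    strength = "strong" if max_pi / gst >= strong_cutoff else "weak"
    specificity = "specific" if max_pi / pc > PI_PC_CUTOFF else "nonspecific"
    return strength + " and " + specificity


# Made-up signals: (strongest PI signal, PC signal, anti-GST signal).
toy_signals = {
    "protA": (900.0, 300.0, 100.0),
    "protB": (600.0, 550.0, 100.0),
    "protC": (200.0, 100.0, 100.0),
    "protD": (150.0, 140.0, 100.0),
}
labels = {name: classify(*s, strong_cutoff=5.0) for name, s in toy_signals.items()}
print(Counter(labels.values()))

# Consistency check on the category sizes reported for Fig. 3.
fig3_counts = {"A": 30, "B": 43, "C": 19, "D": 58}
assert sum(fig3_counts.values()) == 150
assert fig3_counts["A"] + fig3_counts["C"] == 49   # PI-specific
assert fig3_counts["B"] + fig3_counts["D"] == 101  # nonspecific
```

The final assertions confirm that the four category sizes given for Fig. 3 add up to the 150 lipid-binding proteins and reproduce the 49 specific and 101 nonspecific binders quoted in the text.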

Figure 3

Analysis of the PI lipid-binding proteins. To determine the PI-binding specificity of 150 positive proteins, their binding signals were normalized against the corresponding binding signals of PC. On the basis of the ratios (PI/PC), the proteins were grouped into four categories: (A) 30 strong and specific, (B) 43 strong and nonspecific, (C) 19 weak and specific, and (D) 58 weak and nonspecific PI-binding proteins. The green color intensity represents the PI/PC signal ratio as shown by the scale in the figure. The first column to the right of the PI/PC binding ratios indicates the maximum binding signal intensity (open boxes) and its confidence interval (solid horizontal lines); the numbers indicate the log of the values. Blue, yellow, light-yellow, and red boxes in columns to the right of confidence interval column identify membrane-associated proteins, protein kinases, other kinases, and uncharacterized ORFs, respectively.

Several proteins involved in glucose metabolism were identified as phospholipid-binding proteins. These include (i) three enzymes involved in sequential glycolytic steps [phosphoglycerate mutase (Gpm3p), enolase (Eno2p), and pyruvate kinase (Cdc19p/Pyk1p)], (ii) hexokinase (Hxk1p), and (iii) two protein kinases (Snf1p and Rim15p). Although these findings were unexpected, previous studies indicate that some of these proteins might interact with lipids. Hxk1p binds zwitterion micelles, which stimulate its activity (30), and Eno2p is secreted, suggesting an interaction with membranes (31). We speculate that either phospholipids regulate steps involved in glucose metabolism or many steps of glucose metabolism occur on membrane surfaces. In the latter case, the phospholipids would serve as a scaffold to efficiently carry out glycolytic steps.

Six proteins not expected to be involved in membrane function or lipid signaling, Rim15p, Eno2p, Hxk1p, Sps1p, Ygl059wp, and Gcn2p, were further tested for PI binding using two types of standard assays (30). For three proteins, Rim15p, Eno2p, and Hxk1p, PI(4,5)P2 liposomes were first adhered to a nitrocellulose membrane; different amounts of the GST fusion proteins and a GST control were used to probe the membrane, and bound proteins were detected using anti-GST. As shown in Fig. 4A, each yeast fusion protein tightly bound PI(4,5)P2, whereas GST alone did not. We also carried out the reverse assay for GST fusion proteins of Rim15p, Sps1p, Ygl059wp, and Gcn2p (30). Different amounts of these purified proteins were spotted onto nitrocellulose and probed with the six different liposomes (Fig. 4B); the bound liposomes were detected using a horseradish peroxidase (HRP)–conjugated streptavidin. As with the microarrays, liposomes bound to each protein, but not to the bovine serum albumin control. Sps1p bound all five PI-containing liposomes nearly equally. Rim15p, Gcn2p, and Ygl059wp exhibited different affinities for different liposomes (see Fig. 4B for Rim15p); PI(3)P, PI(4)P, and PI(3,4)P2 bound most strongly. In each case, a linear correlation between the binding signal and the level of protein was revealed (10). In summary, these results demonstrate that PI-binding proteins identified in the proteome array also bind lipids in conventional assays.
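One simple way to quantify the linear relationship between binding signal and protein level in a dilution series is the Pearson correlation coefficient. The sketch below is a generic illustration, not the analysis reported in the supplementary material (10); the function name and the dilution-series values are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Made-up dilution series: relative protein amount vs. binding signal
# (both in arbitrary units).
protein_amount = [1.0, 2.0, 4.0, 8.0]
binding_signal = [10.0, 21.0, 39.0, 82.0]
print(round(pearson_r(protein_amount, binding_signal), 3))
```

A coefficient close to 1 indicates that the signal scales approximately linearly with the amount of spotted protein over the range tested.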

Figure 4

Conventional methods confirm protein-lipid interactions detected by the proteome microarrays (30). (A) PI(4,5)P2 liposomes were first adhered to a nitrocellulose membrane; a dilution series of Rim15p, Eno2p, and Hxk1p, and a GST control were used to probe the membrane. The bound proteins were detected using anti-GST and an ECL kit. (B) A reverse assay was carried out to test potential protein-lipid interactions. The proteins were prepared and spotted onto nitrocellulose filters in a dilution series and probed with the six different liposomes. As a control, the six liposomes were also added to the membrane. After extensive washing, the bound liposomes were detected using an HRP-conjugated streptavidin and an ECL kit.

One concern about our experiments is that because proteins are purified from yeast, we might detect indirect interactions through associated proteins. Most of the interactions that we detect are expected to be directly or at least tightly associated with the protein of interest, because proteins were prepared using stringent conditions, and for the seven samples examined, contaminating bands were not detected using Coomassie staining. Another limitation is that properly folded extracellular domains and secreted proteins are likely to be underrepresented in our collection, because GST and a HisX6 tag are fused at the NH2-terminus. Thus, proteins with a signal peptide may not be delivered to the secretory pathway and may not be folded or modified properly, although we did detect three signal peptide–containing proteins, suggesting that at least some are produced and contain functional domains. Another limitation is that not all interactions are detected because not all proteins are readily overproduced and purified in this high-throughput approach; we expect that 80% of the arrayed yeast proteins are full length and at reasonable levels for screening.

Regardless, the use of proteome chips has significant advantages over existing approaches. Random expression libraries are incomplete, the clones are often not full length, and the libraries are tedious to screen. A recent alternative approach is to generate defined arrays and screen them using a pooling strategy (4). The pooling strategy requires two steps, the actual number of proteins screened is not known, and the method does not work well when large numbers of reacting proteins exist, because each pool will test positive. Another method for detecting interactions is the two-hybrid approach (2), in which interactions are typically detected in the nucleus, thus limiting the types of interactions that can be detected. The advantage of the proteome chip approach is that a comprehensive set of individual proteins can be directly screened in vitro for a wide variety of activities, including protein-drug interactions, protein-lipid interactions, and enzymatic assays using a wide range of in vitro conditions. Furthermore, once the proteins are prepared, proteome screening is significantly faster and cheaper. Using similar procedures, it is clearly possible to prepare protein arrays of 10,000 to 100,000 proteins for global proteome analysis in humans and other eukaryotes.

  • * To whom correspondence should be addressed. E-mail: michael.snyder{at}

