A Biochemical Genomics Approach for Identifying Genes by the Activity of Their Products


Science  05 Nov 1999:
Vol. 286, Issue 5442, pp. 1153-1155
DOI: 10.1126/science.286.5442.1153


For the identification of yeast genes specifying biochemical activities, a genomic strategy that is rapid, sensitive, and widely applicable was developed with an array of 6144 individual yeast strains, each containing a different yeast open reading frame (ORF) fused to glutathione S-transferase (GST). For the identification of ORF-associated activities, strains were grown in defined pools, and GST-ORFs were purified. Then, pools were assayed for activities, and active pools were deconvoluted to identify the source strains. Three previously unknown ORF-associated activities were identified with this strategy: a cyclic phosphodiesterase that acts on adenosine diphosphate–ribose 1″-2″ cyclic phosphate (Appr>p), an Appr-1″-p–processing activity, and a cytochrome c methyltransferase.

A major task in the analysis of any biochemical activity is the purification and identification of the polypeptide or polypeptides responsible for that activity. Purification is often difficult, time consuming, and expensive, yet it is often a necessary prerequisite for cloning of the gene and subsequent detailed biochemical and genetic study. An alternative to purification is expression cloning: the introduction of cDNA pools into various host cells, followed by screening for activity and identifying the responsible cDNA (1). This method is inherently limited to those proteins that are easily detectable in the background of host cell proteins. Yet, given the accumulation of complete genome sequences, such as that of the yeast Saccharomyces cerevisiae, the sequences of genes encoding every biochemical activity of these organisms are already available. The challenge is how to use this information to connect biochemical activity with a specific gene.

We developed a rapid and sensitive genomic method for identifying yeast genes encoding biochemical activities, which is applicable to almost any detectable activity. We first constructed an array of 6144 strains, each of which bears a plasmid expressing a different GST-ORF fusion under control of the CUP1 promoter (PCUP1) (2). To identify genes encoding particular biochemical activities, we purified this genomic set of GST-ORFs in 64 pools of 96 fusions each (3). Then, we assayed the pools for a particular activity and deconvoluted active pools to identify the strain and ORF responsible for the activity. Assay of the GST-ORF pools demonstrates that each of two previously known tRNA splicing activities is detected only in the pool that contains its respective GST fusion: tRNA ligase (4) in pool 35 (Fig. 1A) and 2′-phosphotransferase (5) in pool 46 (Fig. 1B).
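
The pooling arithmetic described above (64 pools × 96 fusions = 6144 strains) can be illustrated with a short Python sketch. This is an illustration only, not part of the published method. It assumes that strains are numbered consecutively from 1 in row-major order across 96-well microtiter plates (rows A to H, columns 1 to 12) and that each pool corresponds to one plate. The function name `mrm_location` is hypothetical. The numbering scheme is an inference, although it does reproduce the positions reported in this article for MRM 319, MRM 546, and MRM 2122.

```python
ROWS = "ABCDEFGH"          # 8 rows of a 96-well plate
N_COLS = 12                # 12 columns of a 96-well plate
WELLS_PER_PLATE = len(ROWS) * N_COLS   # 96
N_PLATES = 64              # 64 pools, assumed to be one plate each


def mrm_location(mrm_number):
    """Map a 1-based strain number to (plate, well).

    Assumes row-major numbering within each plate; this layout is an
    assumption made for illustration, not a documented convention.
    """
    if not 1 <= mrm_number <= N_PLATES * WELLS_PER_PLATE:
        raise ValueError("strain number must be between 1 and 6144")
    plate, offset = divmod(mrm_number - 1, WELLS_PER_PLATE)
    row, col = divmod(offset, N_COLS)
    return plate + 1, f"{ROWS[row]}{col + 1}"


# Total array size: 64 pools of 96 fusions each.
print(N_PLATES * WELLS_PER_PLATE)  # 6144
print(mrm_location(319))           # (4, 'C7')
```

Under this assumed layout, a single integer division is enough to recover both the pool to assay and the well to retrieve, which is what makes a fixed, ordered array convenient for later deconvolution.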

Figure 1

Genomic assay of GST-ORF pools. (A) Transfer RNA ligase activity. [α-32P]adenosine triphosphate (ATP)–labeled precursor tRNA(Phe) transcript was incubated in 10-μl mixtures containing endonuclease (10), ATP, and 2 μl of each GST-ORF pool, and products were resolved by PAGE (4). Lane a, precursor tRNA (pre-tRNA); lane b, treated with endonuclease; lane c, treated with endonuclease and tRNA ligase. (B) 2′-phosphotransferase activity. [5′-32P]-ApAppA was incubated in 10-μl mixtures containing nicotinamide adenine dinucleotide and 0.1 μl of each GST-ORF pool, and products were resolved on polyethyleneimine-cellulose thin-layer plates (11). Asterisk indicates position of label. Lane a, substrate; lane b, with Tpt1 protein.

The GST-ORFs were used to identify three previously unknown genes by biochemical assay of their products. A highly specific cyclic phosphodiesterase (CPDase) (6) that could convert Appr>p, produced during tRNA splicing (7), to Appr-1″-p was localized to pool 4 (Fig. 2A), and an otherwise uncharacterized Appr-1″-p–processing activity was found in pool 6 (Fig. 2B). We further explored the usefulness of the pools by searching for a protein-modifying enzyme. Yeast cytochrome c is known to have a trimethyllysine (8), and pool 23 has a methyltransferase that is active with horse cytochrome c, but not with bovine serum albumin (Fig. 2C).

Figure 2

Genomic assays of previously unassigned activities. (A) Appr>p cyclic phosphodiesterase. Reaction mixtures (10 μl) contained buffer, Ap*pr>p, and 2 μl of each GST-ORF pool, and products were resolved on thin-layer plates (6). Lane a, substrate; lane b, with cyclic phosphodiesterase. (B) Appr-1″-p processing. Ap*pr>p (lane a) was treated with cyclic phosphodiesterase (6) to generate Ap*pr-1″-p (lane b) in 8-μl reaction mixtures, followed by the addition of 2 μl of each GST-ORF pool, further incubation, and resolution of products on thin-layer plates. (C) Cytochrome c methyltransferase. Reaction mixtures (100 μl) containing 0.1 M Hepes (pH 7.9), 2 mM EDTA, 4 mM MgCl2, 1 mM DTT, 1 μCi of [3H-methyl]AdoMet, 0.28 mg of horse cytochrome c, and 2 μl of each GST-ORF pool were incubated for 1 hour, precipitated with trichloroacetic acid, and counted to measure incorporation (12). cpm, counts per minute.

To determine the strain responsible for each activity, we prepared and assayed the GST-ORFs from each of the 8 rows and 12 columns of strains from the corresponding microtiter plates. In this way, CPDase activity was associated with strain MRM 319 (expressing YGR247w) in row C and column 7 (position C7) of plate 4 (Fig. 3A), Appr-1″-p processing was associated with MRM 546 (expressing YBR022w) at position F6 of plate 6, and cytochrome c methyltransferase activity was associated with MRM 2122 (expressing YHR109w) at position A10 of plate 23. In all three cases, we separately confirmed that the GST-ORF preparation from the individual strain had the expected activity (Fig. 3A) and that the plasmid DNA contained the expected ORF. No homologs of these ORFs were detected by routine BLAST searches.
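
The row-and-column deconvolution described above can be sketched as a small simulation. The Python below is a minimal illustration under stated assumptions, not the authors' protocol: the assay is modeled as a hypothetical yes/no callable (`pool_is_active`) that reports whether any well in a sub-pool is active, and the names `deconvolute` and `make_simulated_assay` are invented for this example. Real assays give graded signals and would need a threshold.

```python
from itertools import product

ROWS = "ABCDEFGH"      # 8 rows of a 96-well plate
COLS = range(1, 13)    # 12 columns of a 96-well plate


def deconvolute(pool_is_active):
    """Return candidate wells from 8 row assays plus 12 column assays.

    pool_is_active: callable taking a list of well names and returning
    True if the combined preparation shows activity (simulated here).
    """
    row_hits = [r for r in ROWS if pool_is_active([f"{r}{c}" for c in COLS])]
    col_hits = [c for c in COLS if pool_is_active([f"{r}{c}" for r in ROWS])]
    # Candidates are the intersections of active rows and active columns.
    return [f"{r}{c}" for r, c in product(row_hits, col_hits)]


def make_simulated_assay(active_wells):
    """Build a stand-in assay that is positive if any active well is pooled."""
    active = set(active_wells)
    return lambda wells: any(w in active for w in wells)


# One active strain on the plate: 20 assays pinpoint it directly.
print(deconvolute(make_simulated_assay({"C7"})))  # ['C7']
```

With a single active strain, 20 sub-pool assays replace 96 individual ones. If two strains on the same plate were active in different rows and columns, the intersection would yield four candidates, which is one reason to confirm activity in the individual strain afterward, as was done here.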

Figure 3

Association of YGR247w with CPDase activity. (A) Deconvolution of CPDase from pool 4 implicates YGR247w. GST-ORFs prepared from rows and columns of strains on microtiter plate 4 and from MRM 319 (position C7) were assayed for CPDase. (B) CPDase is associated with YGR247w during conventional purification. Portions from the penultimate hydroxyapatite peak (lane a) and fractions 11 through 17 from the final Orange A Sepharose column (lanes b through h), containing the indicated number of units of activity, were resolved by SDS-PAGE and stained with silver. Arrow indicates the 25-kD band judged to copurify with activity.

Although these GST-ORFs copurify with the corresponding activities, they may instead interact with, or be a subunit of, their respective enzymes. For example, conventional purification of CPDase from crude extracts of wild-type cells confirms that ORF YGR247w copurifies with activity (9). SDS–polyacrylamide gel electrophoresis (PAGE) analysis of fractions from the final column indicated that the upper band of the 25-kD doublet copurified with CPDase activity (Fig. 3B, arrow), and mass spectrometry indicated that this band is YGR247w. However, we infer that YGR247w requires some other limiting factor for CPDase activity because overexpression of the ORF (or the GST-ORF) in yeast or the His6-ORF in Escherichia coli does not result in increased activity in extracts. It is also conceivable that any of these three enzymes might use other substrates in vivo; more biochemical and in vivo analyses are required to fully assess their cellular roles.

In principle, this biochemical genomics approach can be used to identify the gene associated with any biochemical activity, provided that the GST-ORF is functional, is solubilized during extraction, and retains other required components when purified. A large number of NH2-terminal fusions are functional (for example, in two-hybrid screens), and simple modifications of our procedure could be used for the study of membrane proteins, protein complexes, and proteins that might be toxic when overproduced.

This biochemical genomics strategy has distinct advantages over conventional purification or expression cloning. First, it is rapid. The identification of an ORF-associated activity presently takes ∼2 weeks, starting with the 64 purified GST-ORF pools, and with appropriate pooling strategies, identification could take 1 day. In contrast, months or years are required for purification or expression cloning. Second, it is sensitive. Activities can be detected in the purified GST-ORF pools that simply cannot be detected in extracts or cells, the starting point of both conventional purification and expression cloning. Because the GST-ORFs are individually expressed at high levels and are largely free of extract proteins after purification, activities can be measured for hours without competing activities that destroy the substrate, the product, or the enzymes.

In addition to the conventional use demonstrated here, this array could be used in two other ways: (i) to determine the range of potential substrate proteins for any protein-modifying enzyme (such as a protein kinase) before genetic or biochemical tests to establish authentic substrates and (ii) to identify genes encoding proteins that bind any particular macromolecule, ligand, or drug. Thus, one could rapidly ascribe function to many presently unclassified yeast proteins, complementing other genomic approaches to deduce gene function from expression patterns, mutant phenotypes, localization of gene products, and identification of interacting partners.

  • * To whom correspondence should be addressed. E-mail: eric_phizicky{at}

