Report

Protein Interaction Mapping in C. elegans Using Proteins Involved in Vulval Development


Science  07 Jan 2000:
Vol. 287, Issue 5450, pp. 116-122
DOI: 10.1126/science.287.5450.116

Abstract

Protein interaction mapping using large-scale two-hybrid analysis has been proposed as a way to functionally annotate large numbers of uncharacterized proteins predicted by complete genome sequences. This approach was examined in Caenorhabditis elegans, starting with 27 proteins involved in vulval development. The resulting map reveals both known and new potential interactions and provides a functional annotation for approximately 100 uncharacterized gene products. A protein interaction mapping project is now feasible for C. elegans on a genome-wide scale and should contribute to the understanding of molecular mechanisms in this organism and in human diseases.

Complete genome sequences are available for Escherichia coli, Saccharomyces cerevisiae, and C. elegans, and are expected soon for other model organisms and humans (1). In addition to facilitating the identification and cloning of genes and providing valuable insights on evolution, this information is likely to change the way biological questions are addressed. It is becoming possible to study molecular mechanisms globally in the context of complete sets of genes, rather than analyzing genes individually. For example, DNA microarrays and chips can be used to monitor simultaneously the expression of nearly all genes of an organism (2). However, the function of most gene products predicted from sequencing projects is still completely uncharacterized, and it is widely accepted that this limitation needs to be overcome before full advantage can be taken of complete genome sequences. Functional assays aimed at characterizing the cellular localization of proteins, their spatial and temporal expression patterns, and their potential interacting partners should provide a backbone of functional annotations from which new biological questions can be formulated (3). Because the number of unannotated gene products in each model organism ranges from thousands to tens of thousands, it is important to develop standardized functional assays in which the same procedure can be applied to many proteins at a time, allowing utilization of high-throughput procedures.

Protein-protein interactions are crucial for many biological processes. Therefore, the knowledge of potential interactions involving otherwise uncharacterized proteins may provide insight into their function. The two-hybrid system, a standardized functional assay, facilitates the identification of potential protein-protein interactions and has been proposed as a method for the generation of protein interaction maps (4–7). Before the approach can be applied on a genome-wide scale, however, conceptual and technical issues need to be addressed. Conceptually, the biological information generated by two-hybrid analyses is often questioned because of the inherent artificial nature of the assay. Therefore, this method should be tested in a model organism using groups of proteins for which functional data are available. Technically, the cloning of open reading frames (ORFs) into appropriate expression vectors with the current techniques is laborious and expensive when dealing with hundreds or thousands of genes. Consequently, methods to standardize this process are required. Furthermore, it is necessary to decide upon a format in which the interaction data will be made available to the research community. Finally, it will be necessary to develop methods to determine the biological relevance of the potential interactions identified.

To address the issue of the potential biological relevance of protein interaction maps, we selected C. elegans as a model organism (8). The nearly complete C. elegans genome sequence led to the prediction of ∼20,000 gene products of which approximately 700 have been functionally characterized [for example, (9)]. As a starting point, we chose to focus on C. elegans genes involved in the regulation of vulval development. At least four different pathways function coordinately to form a single vulva in the adult hermaphrodite, including a receptor tyrosine kinase (RTK)/Ras pathway (RTK/Ras), a Notch pathway (Notch), and two functionally redundant synthetic multivulva pathways: synMuv class A (synMuv A) and B (synMuv B) (10). Because many protein-protein interactions have been reported to be important for this process, the rate of false negatives in two-hybrid analyses could be estimated. However, the relationships between the products of many other genes involved in vulval development still remain to be determined. For example, among 15 synMuv gene products (11), four have been characterized in more detail so far: LIN-35 [retinoblastoma protein (pRB)], its associated proteins LIN-53(RbAp48) and HDA-1 [histone deacetylase (HDAC)] (12), and LIN-36 (13). This provided an opportunity to determine whether protein interaction maps might be helpful to point to novel functional relationships between the products of less characterized (or uncharacterized) genes.

To address the technical problem of cloning multiple genes simultaneously, we took advantage of a novel method, “recombinational cloning” (RC) (Fig. 1A). RC is based on the recombination reactions involved in phage lambda integration into, and excision from, the E. coli genome. This method allows both the directional cloning of PCR products into a reference vector and the subsequent transfer of the resulting DNA inserts into many different expression vectors in vitro (14). Importantly, restriction enzymes and ligase are not required for any of these steps. The ORFs corresponding to the genes involved in vulval development (vORFs) were introduced into two-hybrid vectors using this technique (15) and the resulting clones were subsequently verified using PCR analysis (Fig. 1, B and C) (16).

Figure 1

Cloning of ORFs of genes involved in vulval development (vORFs). (A) Recombinational cloning (RC) (14). RC is based on the recombination reactions that mediate the integration and excision of phage λ into and from the E. coli genome, respectively. The integration involves recombination of the attP site of the phage DNA within the attB site located in the bacterial genome (BP reaction) and generates an integrated phage genome flanked by attL and attR sites. The excision recombines attL and attR sites back to attP and attB sites (LR reaction). The integration reaction requires two enzymes [the phage protein Integrase (Int) and the bacterial protein integration host factor (IHF)] (BP clonase). The excision reaction requires Int, IHF, and an additional phage enzyme, Excisionase (Xis) (LR clonase). Artificial derivatives of the 25-bp bacterial attB recombination site, referred to as B1 and B2, were added to the 5′ end of the primers used in PCR reactions to amplify the vORFs (Fig. 1B). The resulting products were BP cloned into a “Donor vector” containing complementary derivatives of the phage attP recombination site (P1 and P2) using BP clonase. The resulting “Entry clones” contain vORFs flanked by derivatives of the attL site (L1 and L2) and were subcloned into two-hybrid “destination vectors” which contain derivatives of the attL-compatible attR sites (R1 and R2) using LR clonase. This resulted in “expression clones” in which vORFs are flanked by B1 and B2 and fused in frame to the DNA-binding domain (DB) or the activation domain (AD) of Gal4p. To ensure that both NH2- and COOH-terminal fusion proteins can be generated, the B1 and B2 sequences were designed to be in frame with the vORF sequences. Note that different RC vectors harbor different selectable markers. In addition, both Entry and Destination vectors contain a toxic gene which prevents growth of most commonly used E. coli strains. This allows a genetic selection for the desired end products of each reaction.
In addition to R1 and R2 RC sites, DB-dest and AD-dest vectors contain yeast ARS and CEN sequences and the LEU2 or TRP1 selectable marker, respectively. Because protein immunoblotting techniques are not compatible with high-throughput experiments, full-length vORF expression was tested using COOH-terminal fusions to GFP. However, no pDB-GFP destination vector is available at this point. Thus, vORFs were shuffled by PCR-Gap repair (17). (B) PCR amplification of vORFs. A summary of the functional information available for the vORFs shown here can be found on WormPD (23). A high-quality poly-dT primed cDNA library (AD-wrmcDNA) generated using mRNAs derived from all stages of development (15) was used as template DNA and the resulting vORF PCR products were analyzed on an agarose gel. PCR reactions were considered successful when a single band of the expected size was observed (M = DNA size-markers). Twenty-nine vORFs were successfully cloned (16) (four are not shown here) by RC into an Entry vector and subsequently into DB-dest and AD-dest vectors. The design and sequence of the PCR primers will be described elsewhere (15). (C) Recombinational cloning of vORFs. The success rate of the first RC cloning step was measured by PCR (15). Briefly, the size of the insert of Entry clones was verified and in most cases more than 50% of the colonies contained a correct size insert. For a subset of clones, the fidelity of the second RC step was verified and in all cases an insert of the correct size was observed. Five vORFs could not be cloned because of unsuccessful PCR reactions. For lin-15B, a correct-size PCR product was obtained but could not be cloned. (I), intracellular domain; *, PCR product obtained from plasmid template DNA.

The vORF two-hybrid clones were used in two versions of the two-hybrid system. First, a matrix experiment was performed with 29 vORF-encoded proteins to determine the percentage of recovery of previously reported interactions. For each DB-vORF/AD-vORF pairwise combination a diploid yeast strain was generated by mating (17, 18) and tested for protein-protein interactions by scoring two-hybrid phenotypes (Fig. 2A). At least 50% (6 of 11) of the interactions reported in the literature, either in C. elegans or in other model organisms, were detected. In most cases, failure to detect interactions can be explained by the inherent restrictions of the two-hybrid assay or the physiology of yeast cells (19). Interestingly, two novel potential interactions, LIN-10/LIN-10 and LIN-53(RbAp48)/LIN-37, were identified using this approach.
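The recovery figure quoted above can be reproduced with a few lines of bookkeeping. The following is only a minimal sketch under stated assumptions: the function name `recovery_rate` and the toy protein names A to F are hypothetical and are not taken from the study. It counts a reported interaction as recovered if it scores positive in either the DB-X/AD-Y or the DB-Y/AD-X orientation.

```python
def recovery_rate(reported, detected):
    """Count reported interactions recovered in a two-hybrid matrix.

    reported: list of (X, Y) pairs described in the literature.
    detected: set of ordered (DB protein, AD protein) pairs scored positive.
    A pair counts as recovered if it is positive in either orientation.
    """
    found = sum(
        1 for x, y in reported
        if (x, y) in detected or (y, x) in detected
    )
    return found, len(reported)


# Hypothetical toy data, not the matrix from the study.
reported = [("A", "B"), ("C", "D"), ("E", "F")]
detected = {("A", "B"), ("D", "C")}  # C/D seen only in the opposite orientation

found, total = recovery_rate(reported, detected)
print(f"{found} of {total} reported interactions recovered ({found / total:.0%})")
```

With the counts given in the text, the same arithmetic gives 6 / 11, or roughly 55%, consistent with "at least 50%".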

Figure 2

Protein interaction mapping. (A) Matrix of two-hybrid interactions between vORF-encoded proteins. The 29 vORFs cloned into pDB-dest and pAD-dest (Fig. 1) were transformed into yeast cells of opposite mating types (MaV103 and MaV203, respectively) (17). Diploids for every pairwise combination were generated by mating and tested for two-hybrid phenotypes. Color coding is as follows. Dark gray squares: self-activation (SA) levels that are too high for two-hybrid screens (SA occurs from the ability of a DB-bait protein to up-regulate two-hybrid reporter gene expression in the absence of any AD interactor); light gray squares: intermediate SA levels which are compatible with two-hybrid screening using higher concentrations of 3-aminotriazole (3AT) (17); blue squares: interactions previously reported either in C. elegans or in other model organisms (potential interologs, Fig. 3A) and undetected in either the DB-X/AD-Y or the AD-X/DB-Y orientation (false negatives); red squares: interactions previously reported and detected in the Matrix assay; pink squares: interactions previously reported and detected in the Matrix assay in the opposite configuration only; orange square: interaction not found in the Matrix but uncovered in the screens described in Fig. 2B; yellow squares: novel potential interactions between the products of vORFs. (B) Exhaustive two-hybrid screens using 27 DB-vORFs as baits. DB-vORF baits were tested for SA on plates containing different concentrations of 3AT. SA ranged between levels for which concentrations of 100 mM 3AT were not sufficient to prevent growth (####) to levels only detectable on X-Gal (#). Different concentrations of 3AT were used (“3AT”) depending on the level of self-activation of the corresponding bait. After transformation of the AD-wrmcDNA library (15), yeast colonies expressing potential interactions were selected on appropriate 3AT plates. The number of colonies screened varied between 0.8 × 10^6 and 4.2 × 10^6 (# Colonies).
Other abbreviations are as follows: (3AT)R: number of 3AT-resistant colonies; (3AT)R*: number of (3AT)R clones that exhibited at least one additional two-hybrid phenotype [growth on plates lacking uracil or no growth on plates containing 5-fluoroorotic acid (5-FOA), or expression of β-Galactosidase activity on X-Gal-containing plates (17)]; ISTs (interaction sequence tags): number of distinct genes isolated as potential interaction partners (inserts of the AD-Y interacting clones were amplified by PCR directly from yeast colonies and subsequently sequenced using an ABI protocol). (C) ISTs in ACeDB. For each vORF that corresponds to a bait screened in this project, a window can be opened to retrieve ISTs. An example is shown for DB-LIN-53(RbAp48). Genomic and functional information on each IST can be retrieved. In the example shown, 13 independent cDNAs were selected for the LIN-53(RbAp48)/LIN-37 IST. The intron-exon junctions are shown in a format similar to ESTs currently available in ACeDB. If an interactor has itself been used as a bait, for example LIN-37 (thin, vertical red box in the center of the screen), a new window can be opened and the process reiterated.

Second, a more extensive protein interaction map was generated by exhaustive two-hybrid screens (20) of an AD-Y worm cDNA library with 27 DB-vORFs (two DB-vORFs were removed from the assay because they strongly activate reporter-gene expression in the absence of any interacting protein) (Fig. 2B). These two-hybrid selections identified 992 AD-Y encoding sequences that were subsequently amplified by PCR directly from yeast colonies and sequenced to generate “interaction sequence tags” (ISTs). Such ISTs corresponded to a total of 148 interactions involving 124 different potential interactors, of which 15 have been previously identified genetically and 109 were predicted from the C. elegans genome sequence. The number of ISTs identified for each DB-vORF bait varied between 0 and 29 (Fig. 2B). We systematically verified that the interacting sequence expressed in frame with AD corresponded to ORFs predicted by the genome sequencing project, rather than out-of-frame sequences encoding short irrelevant peptides (6).

To make the IST data publicly available, we took advantage of the ACeDB (a C. elegans database) database management system (21). ACeDB stores the genetic and physical maps and the nearly complete genome sequence along with its predicted ORFs. In addition, for many of the predicted ORFs, expressed sequence tags (ESTs) are available. The IST information was introduced into a local version of ACeDB (22) with the goal of connecting ORFs on the basis of a functional parameter rather than a genetic or physical link. Through ISTs, several ORFs are now linked by virtue of the ability of their products to interact in the context of a yeast two-hybrid assay. The information can be found by querying ACeDB for vORFs (Fig. 2C). In addition, we have introduced the IST data on a Web page (22) with hyperlinks to “WormPD” (23). WormPD, a recently released database, outlines published functional information on C. elegans genes in a standard format similar to that of YPD, a database of yeast functional annotation. In the future, the IST hyperlinks to WormPD should allow the integration of the protein interaction map with other worm functional genomics projects [for example, (24)].

Because the two-hybrid system is an artificial assay, the IST data should first be integrated with other information to evaluate the likelihood of biological relevance of each potential interaction. This in turn should allow the formulation of meaningful hypotheses. Therefore, we classified the potential interactions according to two-hybrid criteria and/or known biological information. We explored the possibility that the knowledge of interactions conserved in other organisms might represent useful biological information (X/Y conserved interactions are referred to here as worm “interologs” of X′/Y′ interactions in other species if X′ and Y′ are orthologs of X and Y, respectively). Hence, for each partner of worm DB-X/AD-Y potential interactions, we performed BLAST searches to identify X′ and Y′ orthologs and concentrated on those that have been reported to interact in other species. The first class observed consisted of interactions previously reported both in C. elegans and in at least one other model organism (Fig. 3A, class I). For example, the LET-60(Ras)/SUR-8 interaction is an interolog of the human Ras/hsSUR-8 interaction (25). The second class represents novel potential interologs. These interactions have been shown in other model organisms but have not been reported previously in C. elegans (Fig. 3A, class II). We propose that such potential interologs point to new hypotheses of function for the corresponding C. elegans proteins. For example, the SEL-10(Cdc4p)/SKP-1(Skp1p) potential interaction is a probable interolog of the yeast Cdc4p/Skp1p interaction (26). Because Cdc4p/Skp1p plays a role in protein degradation in yeast and SEL-10 interacts physically with LIN-12(Notch) and SEL-12(Presenilin) (Fig. 2A), it is possible that worm SKP-1 is involved in the degradation of components of the Notch pathway.
Similarly, in the Ras pathway, LET-60(Ras)/F28B4.2(RalGDS) is a potential interolog of the human Ras/RalGDS interaction (27), suggesting that F28B4.2(RalGDS) might modulate LET-60(Ras) activity in C. elegans. Finally, LIN-53(RbAp48)/EGR-1[Metastasis Associated Protein (MTA1)] (28) might be an interolog of a human interaction because human MTA1 was found in the NURD complex that also contains RbAp48 (29).
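The interolog classification described above lends itself to a simple lookup. The sketch below is illustrative only and is not the procedure used in the study, where orthologs were identified by BLAST searches. Here the ortholog table is assumed to be supplied as a plain dictionary, the function name `classify_interologs` is hypothetical, and the pair GENE-A/GENE-B is an invented placeholder for an interaction with no known counterpart. The other three pairs are the examples named in the text.

```python
def classify_interologs(candidates, orthologs, reported_elsewhere, reported_in_worm):
    """Label each worm X/Y pair as 'class I', 'class II', or None.

    class I : the X'/Y' interaction is reported in another species and
              X/Y is already reported in C. elegans (known interolog).
    class II: X'/Y' is reported in another species but X/Y is new in
              C. elegans (potential interolog).
    None    : no conserved interaction found in the supplied tables.
    """
    labels = {}
    for x, y in candidates:
        conserved = any(
            frozenset((xo, yo)) in reported_elsewhere
            for xo in orthologs.get(x, ())
            for yo in orthologs.get(y, ())
        )
        if not conserved:
            labels[(x, y)] = None
        elif frozenset((x, y)) in reported_in_worm:
            labels[(x, y)] = "class I"
        else:
            labels[(x, y)] = "class II"
    return labels


# Ortholog assignments follow the examples in the text; the table layout is assumed.
orthologs = {
    "LET-60": {"Ras"}, "SUR-8": {"hsSUR-8"},
    "SEL-10": {"Cdc4p"}, "SKP-1": {"Skp1p"},
    "F28B4.2": {"RalGDS"},
}
reported_elsewhere = {
    frozenset(("Ras", "hsSUR-8")),   # human
    frozenset(("Cdc4p", "Skp1p")),   # yeast
    frozenset(("Ras", "RalGDS")),    # human
}
reported_in_worm = {frozenset(("LET-60", "SUR-8"))}
candidates = [
    ("LET-60", "SUR-8"),
    ("SEL-10", "SKP-1"),
    ("LET-60", "F28B4.2"),
    ("GENE-A", "GENE-B"),  # hypothetical pair with no orthologs in the table
]

labels = classify_interologs(candidates, orthologs, reported_elsewhere, reported_in_worm)
for pair, label in labels.items():
    print(pair, label)
```

Storing each interaction as a frozenset makes the lookup independent of which partner served as bait, which matches the way interologs are defined in the text.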

Figure 3

Classification and validation of potential interactions. (A) Interologs. Class I: known interologs. Class II: potential interologs. (B) IST clustering. Clusters are shown for several synMuv gene products and their potential interactors. Gene products previously characterized are indicated by circles. Shaded circles indicate that loss-of-function mutations confer similar phenotypes. Adjacent circles indicate that physical interactions have been demonstrated biochemically. Arrows point directionally from the baits to potential interactors. (C) Correlation between loss of function and loss of interaction for the LIN-53(RbAp48)/LIN-37 potential interaction. The missense allele lin-53(n833), which contains a single amino acid substitution (L292F), was cloned by RC into an Entry vector and subsequently recombined into the pDB-dest vector. After transformation into MaV103, binding to two LIN-53(RbAp48) interactors was tested by scoring for growth on a selective plate. Negative controls include DB and AD-LIN-53(RbAp48). Sc-L-T: synthetic complete medium lacking tryptophan and leucine (permissive plate); Sc+3AT20: synthetic complete medium lacking tryptophan, leucine, and histidine and containing 20 mM 3AT (selective plate). The four yeast patches at the bottom of each panel are controls for growth conditions. From left to right: 1st patch is a negative control (DB/AD), 2nd patch is a weak positive control for interaction (DB-pRB/AD-E2F1), 3rd patch is a strong positive control for interaction (DB-Fos/AD-Jun), 4th patch is the Gal4p positive control (DB-AD/AD) (17).

Although potentially powerful, applying the interolog concept alone in estimating the significance of potential interactions precludes the finding of novel connections previously unidentified in other model organisms. As an alternative method to classify the IST information, we used a systematic clustering analysis (Fig. 3B). This approach is modeled after a method developed by Lipman and colleagues (30). We established contiguous connections between vORF-encoded proteins as follows: X interacts with Y, which interacts with Z, which interacts with W, and so on (X/Y/Z/W/…). Because two-hybrid screens are not random, we reasoned that clusters formed by contiguous connections that form closed loops (such as X/Y/Z/W/X) might increase the likelihood of biological relevance for the corresponding potential interactions. Such two-hybrid clusters have been identified for known proteins in both macromolecular complexes and signal transduction pathways (7, 31). After removal of promiscuous interactors (32), we searched the ISTs of each vORF bait (X) for potential interactors that are identical to another bait (Y), or an interactor of another bait (Y′). For each hit, we then searched the IST list of Y or Y′ and reiterated the process for several (n) cycles until the starting bait (X) was recovered, leading to clusters of contiguous connections in closed loops (for example, X/Y/Z/…n…/X).
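The closed-loop search described above can be sketched as a depth-limited walk over an interaction graph. This is a minimal illustration under several assumptions and not the implementation used in the study: connections are treated as undirected (so a shared interactor links two baits), the function names `build_graph` and `closed_loops` and the `max_len` limit are hypothetical, and the proteins X, Y, Z, W, V, and the promiscuous interactor P are toy placeholders.

```python
from collections import defaultdict


def build_graph(ists, promiscuous=frozenset()):
    """Build an undirected graph from a bait -> interactors table.

    Promiscuous interactors are dropped before any loop search.
    """
    graph = defaultdict(set)
    for bait, interactors in ists.items():
        for partner in interactors:
            if partner in promiscuous or partner == bait:
                continue
            graph[bait].add(partner)
            graph[partner].add(bait)
    return graph


def closed_loops(graph, start, max_len=6):
    """Return closed loops through `start` with 3 to max_len members.

    Each loop is a tuple beginning at `start`; the two traversal
    directions of the same loop are reported once.
    """
    loops = set()

    def walk(node, path):
        for nxt in graph[node]:
            if nxt == start and len(path) >= 3:
                forward = tuple(path)
                backward = (start,) + tuple(reversed(path[1:]))
                loops.add(min(forward, backward))
            elif nxt not in path and len(path) < max_len:
                walk(nxt, path + [nxt])

    walk(start, [start])
    return sorted(loops)


# Hypothetical toy screen results: bait -> set of interactors.
ists = {
    "X": {"Y", "P"},
    "Y": {"Z", "P"},
    "W": {"Z", "X", "P"},
    "V": {"P"},
}

graph = build_graph(ists, promiscuous={"P"})
loops = closed_loops(graph, "X")
print(loops)
```

In this toy example the only loop through X is the four-member cluster X/Y/Z/W/X. If P is left in the graph, it creates additional loops through every bait, which illustrates why promiscuous interactors are removed before clustering.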

The clustering analysis was most informative in the case of ISTs identified for synMuv gene products (Fig. 3B). A cluster centered around LIN-35(pRB) has already been reported [LIN-35(pRB)/LIN-53(RbAp48)/HDA-1 (HDAC)/LIN-35(pRB)] (12). We identified several additional synMuv IST clusters: for example, LIN-53(RbAp48)/LIN-37/F10G8.8/LIN-36/EGR-1(MTA1)/LIN-53(RbAp48), and LIN-36/T05E7.5/LIN-15A/Y54E2A.3/LIN-36 (Fig. 3B). Many ISTs belong to more than one cluster, and thus, most clusters could be organized as overlapping sets: for example, LIN-53(RbAp48)/LIN-37/F10G8.8/LIN-36/EGR-1(MTA1)/LIN-53(RbAp48), and LIN-37/F10G8.8/LIN-36/Y54E2A.3/LIN-37. It is also possible that a higher likelihood of biological significance can be assigned to ISTs that correspond to interologs and belong to clusters. For example, the potential DB-LIN-53(RbAp48)/EGR-1(MTA1) interaction is both a potential interolog of a human interaction in the NURD complex (Fig. 3A) and a partner in IST clusters related to LIN-35(pRB) and HDA-1(HDAC) (Fig. 3B).

Hypotheses resulting from protein interaction maps are most useful in the context of other functional data. At this point, such information is not available on a genome-wide scale. However, in addition to LIN-35(pRB), LIN-53(RbAp48), and HDA-1(HDAC), such data are available for LIN-15A, LIN-36, and LIN-37, which are involved in the IST clusters described above (12, 13, 33). Most importantly, loss-of-function mutations in lin-36, lin-37, lin-53, and lin-35 confer identical phenotypes in a synMuv class A background. In addition, the LIN-36 and LIN-37 gene products have been detected in the nuclei of vulval precursor cells (VPCs) at developmental stages consistent with a role in vulval development. This expression pattern overlaps with that of LIN-35(pRB) and LIN-53(RbAp48). Finally, although previous mosaic analysis suggested that LIN-15A functions in hyp7, this analysis did not exclude the possibility that this protein also acts in VPCs (34). Thus, synMuv IST clusters, together with the existence of potential interologs and available functional data, suggest that LIN-37, LIN-36, EGR-1(MTA1), and LIN-15A may interact physically and perhaps belong to a single complex. Nevertheless, it is important to note that subsets of potential interactions might take place in different cells or at different times.

A powerful approach to directly test the biological relevance of potential interactions is to correlate loss-of-function mutations with loss of interaction. Single amino acid changes that confer a synMuv phenotype in vivo without grossly affecting the structure of the corresponding protein may provide useful tools to address whether loss-of-function correlates with loss of interaction. We initiated such analysis with LIN-53(RbAp48)/LIN-37 since the likelihood of biological relevance for this interaction is relatively high (35). The previously described lin-53(n833) allele is an excellent candidate for this approach because it confers a dominant negative phenotype and contains a single amino acid change (L292F) (12). The lin-53(n833) ORF was cloned by RC to test the ability of its encoded protein to interact with the potential partners of wild-type LIN-53(RbAp48), particularly LIN-37 (Fig. 3C). The interactions of DB-LIN-53(n833) with AD-M03C11.4(HAT) (Fig. 3C) and AD-EGR-1(MTA1) (36) were readily detected in the two-hybrid assay, suggesting that the structure and expression levels of the LIN-53 mutant protein are not grossly affected. Furthermore, LIN-53(n833) can also bind to LIN-35(pRB) in an in vitro assay (12). However, no interaction of DB-LIN-53(n833) with AD-LIN-37 was detected in the two-hybrid assay (Fig. 3C). Thus, it is tempting to speculate that LIN-53(n833) fails to interact with LIN-37 in vivo while retaining the ability to interact with other partners such as LIN-35(pRB). This might explain the dominant negative nature of lin-53(n833). Taken together, these observations suggest that the LIN-53(RbAp48)/LIN-37 interaction is important for synMuv function in vivo.

We have addressed the feasibility of generating a genome-wide protein interaction map for C. elegans, the first animal model for which a complete genome sequence is available. According to our current throughput, we estimate that the scale of a C. elegans protein interaction mapping project should be on the order of one-tenth that of the genome sequencing project. We are currently generating a nearly complete set of cloned C. elegans ORFs using an automated version of RC (15). This should be useful for screening DB-ORFs against near complete AD-ORF arrays (Fig. 2A). By eliminating the need for sequencing ISTs, such arrays will increase the throughput substantially. We show that such genome-wide protein interaction maps can be interpreted at the biological level. In our hands, the two-hybrid method detected approximately 50% of reported interactions, which should allow a useful coverage of biologically important interactions. Alternative functional genomics projects based on other standardized assays will be useful to identify additional interactions. The inherent versatility of RC (14) should be of great value for such alternative approaches. On the other hand, although it is difficult to estimate the rate of false positives until more interpretation of the current IST data is performed, the data shown here suggest that false positives will not preclude the identification of relevant interactions. For example, three synMuv gene products (LIN-36, LIN-37, and LIN-15A), which had not been previously assigned to any particular step in the pathway, are now linked to each other and to LIN-35(pRB), LIN-53(RbAp48), and HDA-1(HDAC) by potential interactions. Perhaps classifications such as interologs and IST clustering will be useful for the interpretation of potential interactions. Importantly, this classification system is amenable to computation and thus should be applicable on a genome-wide scale.
We also propose that relatively high-throughput genetic approaches can be used to directly test hypotheses resulting from IST information. For example, we show that a functionally defective allele of LIN-53 is specifically affected in its ability to bind LIN-37, a potential interactor of wild-type LIN-53. Finally, it is tempting to speculate that potential interactions identified here may help in understanding the molecular mechanisms involved in human tumorigenesis. Particularly, it is possible that human orthologs of LIN-36 and LIN-37 act in a pRB repressor complex and conversely, EGR-1, a worm MTA1 ortholog, might genetically interact with LIN-35(pRB).

  • * Present address: Dana Farber Cancer Institute, Department of Genetics, Harvard Medical School, Boston, MA 02115, U.S.A.

  • Present address: Department of Anatomy, University of California San Francisco, San Francisco, CA 94143, USA.

  • To whom correspondence should be addressed. E-mail: Marc_Vidal{at}DFCI.Harvard.edu

