Report

Evolutionary Rate in the Protein Interaction Network


Science  26 Apr 2002:
Vol. 296, Issue 5568, pp. 750-752
DOI: 10.1126/science.1068696

Abstract

High-throughput screens have begun to reveal the protein interaction network that underpins most cellular functions in the yeast Saccharomyces cerevisiae. How the organization of this network affects the evolution of the proteins that compose it is a fundamental question in molecular evolution. We show that the connectivity of well-conserved proteins in the network is negatively correlated with their rate of evolution. Proteins with more interactors evolve more slowly not because they are more important to the organism, but because a greater proportion of the protein is directly involved in its function. At sites important for interaction between proteins, evolutionary changes may occur largely by coevolution, in which substitutions in one protein result in selection pressure for reciprocal changes in interacting partners. We confirm one predicted outcome of this process, namely, that interacting proteins evolve at similar rates.

A protein's rate of evolution is thought to depend both on its dispensability to the organism and on the proportion of potential amino acid changes that are compatible with proper protein function (1). We recently analyzed functional genomic data (2) in conjunction with genomic comparisons (3) to confirm and further characterize the relation between protein dispensability and evolutionary rate (4). Here we apply a similar approach to investigate how protein function constrains evolution. Early studies of the structure and function of individual proteins suggested that, because molecular interactions require precisely specified structures, they impose constraints on sequence evolution (5, 6). Recent advances in the rapid detection of protein-protein interactions (7–9), as well as in the sequencing of complete genomes, allow us to expand the scale on which the evolutionary effects of molecular interactions are investigated and shift from a focus on individual proteins to a broad survey of the proteome and characterization of the general relation between protein interaction and evolution.

We compiled a list of 3541 interactions between 2445 different yeast proteins (10). To estimate the evolutionary rates of these proteins, we compared putatively orthologous sequences between Saccharomyces cerevisiae and the nematode Caenorhabditis elegans (11). A subclass of putative orthologs, which we called “well-conserved orthologs,” exhibited >50% amino acid identity over aligned regions; 1531 sequence pairs met our criteria for putative orthologs, and 309 of these were in the well-conserved class. For each pair of orthologs, we estimated the evolutionary distance (K) that separates the two sequences, where K is defined as the number of substitutions per amino acid site that have taken place since the fungi-animal split (12). There were 164 yeast proteins for which we had both an estimate of the number of interactors and a well-conserved ortholog in the nematode. Among these proteins, there is a significant negative correlation between each protein's number of interactors I and protein evolutionary rate, as estimated by distance K [Fig. 1; linear regression: $K = -0.0175I + 0.8995$, Pearson's $r_{IK} = -0.24$, $P = 0.002$; Spearman's rank correlation $r_{IK} = -0.21$, $P = 0.007$ (13)]. We have corroborated this relation between protein interaction and rate of evolution with data from two recent studies (14, 15) that were not considered in our initial compilation of protein interactions [supplemental fig. 1 (16)].
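The two correlation measures reported above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' analysis code: the function names and the toy values of I and K are hypothetical, chosen only to show a negative association, and significance testing is omitted.

```python
import math

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical toy data (NOT from the paper): number of interactors I
# and evolutionary distance K for eight imaginary proteins.
interactors = [1, 1, 2, 3, 5, 8, 12, 20]
distance_k = [0.95, 0.80, 0.90, 0.70, 0.85, 0.60, 0.65, 0.40]

r_pearson = pearson(interactors, distance_k)
r_spearman = spearman(interactors, distance_k)
print(f"Pearson r_IK = {r_pearson:.2f}, Spearman r_IK = {r_spearman:.2f}")
```

The rank-based measure is reported alongside Pearson's r presumably because the number of interactors is a skewed count, so a few highly connected proteins could otherwise dominate the linear fit.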

Figure 1

The relation between the number of protein-protein interactions (I) in which a yeast protein participates and that protein's evolutionary rate, as estimated by the evolutionary distance (K) to the protein's well-conserved ortholog in the nematode C. elegans.

Interactions could reduce evolutionary rate in two distinct ways (Fig. 2). First, if different interactions depend on different sites, proteins with more interactors could evolve more slowly because a greater proportion of the protein is involved in protein functions (Fig. 2, arrow a). Alternatively, if proteins with many interactors have a greater effect on organism fitness, they could evolve more slowly, not because a greater proportion of the sequence is required for proper function, but because the entire sequence is subject to stronger selection against slightly deleterious mutations (4). Under this hypothesis, the correlation shown in Fig. 1 emerges because a protein's number of interactors is correlated with its effect on organism fitness, which in turn affects rate of evolution (Fig. 2, arrows b and c). To determine which of these two hypotheses provides a more likely explanation for the correlation between number of interactors and evolutionary rate, we analyzed our data on interactions and evolutionary rate in conjunction with results from genetic footprinting (17) and parallel analysis (2), high-throughput methods for estimating the growth rates of yeast strains in which a single gene has been disrupted or deleted. As expected in view of the recent demonstration that highly interactive proteins are more likely to be required for viability (18), we found that a protein's fitness effect F, estimated as the reduction in relative growth rate due to deleting or disrupting the gene that encodes the protein, is positively correlated with that protein's number of interactors I; with fitness effects measured by parallel analysis for 2235 proteins for which interaction data were available, $r_{IF} = 0.15$, $P = 3.4 \times 10^{-13}$. In addition, among all putative orthologs, evolutionary rate is negatively correlated with fitness effect (4); with parallel analysis data for 1484 yeast proteins with putative orthologs, $r_{FK} = -0.13$, $P = 4.3 \times 10^{-7}$ (19). Thus, among all putative orthologs, both correlations required by our second hypothesis are present: Number of interactors is correlated with fitness effect (Fig. 2, arrow b), which is correlated with evolutionary rate (Fig. 2, arrow c).

Figure 2

The causal model for alternative hypotheses to explain the correlation between number of interactors and evolutionary rate. One hypothesis, represented by arrow a, is that protein interactions impose structural constraints, which limit the number of substitutions that are compatible with proper protein function. A second hypothesis, represented by arrows b and c, is that proteins with more interactions have a greater effect on organism fitness and are therefore subject to stronger purifying selection. The second hypothesis can be rejected because the effect of protein interactions on evolutionary rate is not mediated by protein fitness effect.

However, when we consider only well-conserved orthologs, for which the correlation between protein interaction and evolutionary rate is strongest (Fig. 1), no relation between fitness effect and evolutionary rate (Fig. 2, arrow c) is detected. Therefore, protein fitness effect is very unlikely to mediate the correlation between protein interaction and evolutionary rate. We can confirm this conclusion statistically by using parametric (20) and nonparametric (21) partial correlation to estimate the correlation between number of interactors and evolutionary rate while fitness effect is held constant. The parametric path coefficient ($p_{IK} = -0.25$, $P = 0.001$) and nonparametric partial measure of association (Kendall's partial $\tau_{IK} = -0.15$, $P = 0.002$) indicate a significant correlation between number of interactions and evolutionary rate that does not depend on overall protein fitness effect.
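The logic of holding fitness effect constant can be sketched with the standard first-order formulas for a partial correlation and for a standardized path coefficient. The sketch below is illustrative only: the function names are hypothetical, and the inputs are not the values used in the authors' calculation. In particular, the values of `r_if` and `r_fk` used here are assumptions chosen to mimic the case in which arrow c is absent.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def path_coefficient(r_xy, r_xz, r_yz):
    """Standardized weight of x when y is regressed on both x and z."""
    return (r_xy - r_xz * r_yz) / (1 - r_xz ** 2)

# Illustrative inputs (assumptions, not the paper's computation):
r_ik = -0.24  # interactors vs. evolutionary rate
r_if = 0.15   # interactors vs. fitness effect (assumed for this example)
r_fk = 0.0    # fitness effect vs. evolutionary rate (assumed absent)

print(f"partial r_IK.F = {partial_correlation(r_ik, r_if, r_fk):.3f}")
print(f"path p_IK      = {path_coefficient(r_ik, r_if, r_fk):.3f}")
```

When the correlation between fitness effect and evolutionary rate is zero, the numerator reduces to the raw correlation between interactors and rate, so controlling for fitness effect cannot weaken the association. This is the sense in which fitness effect does not mediate the relation.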

Protein sites may be involved in interactions directly, through participation in intermolecular contacts; or indirectly, through effects on overall protein conformation. In either category of sites, substitutions would be likely to perturb proper interaction and would often be removed by selection. However, removal might not occur if a substitution in one protein were followed by a complementary change in its interacting partner. In this case, the pair of substitutions might be fixed by drift or positive selection (22). If such coevolution is indeed an important mode of change in proteins constrained by interactions, then interacting proteins should evolve at similar rates. We tested this prediction by examining all 411 protein interactions in which each protein had a putative ortholog in C. elegans and showed no significant sequence similarity with its interacting partner. For each interaction, we calculated ΔK, the difference between the evolutionary distances separating the yeast proteins from their respective orthologs in the nematode. We then averaged these differences across all 411 interactions to find the mean difference in evolutionary rate between interacting proteins, Δ* = 1.3 substitutions per site. To assess the significance of this difference, we repeatedly permuted our list of 411 interactions into random protein pairs and calculated the mean difference in evolutionary rate between arbitrarily paired proteins: 10,000 permutations yielded the distribution of Δ values shown in Fig. 3A. In all but 44 of the 10,000 permutations, our observed Δ* < Δ, indicating that interacting proteins evolve at rates significantly closer than is expected to occur by chance (P = 0.0044).
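The permutation test described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' procedure: it assumes that ΔK is an absolute difference and that random pairs are formed by shuffling one member of each pair against the other. The function names and the toy K values are hypothetical.

```python
import random

def mean_abs_difference(pairs):
    """Mean absolute difference between the two values in each pair."""
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

def permutation_p_value(pairs, n_permutations=10_000, seed=0):
    """Return (observed mean difference, fraction of random pairings
    whose mean difference is at least as small as the observed one)."""
    rng = random.Random(seed)
    observed = mean_abs_difference(pairs)
    firsts = [a for a, _ in pairs]
    seconds = [b for _, b in pairs]
    hits = 0
    for _ in range(n_permutations):
        # Assumption: re-pair proteins by shuffling the second column.
        rng.shuffle(seconds)
        if mean_abs_difference(list(zip(firsts, seconds))) <= observed:
            hits += 1
    return observed, hits / n_permutations

# Hypothetical toy data (NOT from the paper): evolutionary distances K
# for the two members of eight imaginary interacting pairs.
toy_pairs = [
    (0.20, 0.25), (0.90, 0.85), (1.50, 1.60), (2.40, 2.30),
    (3.10, 3.00), (0.50, 0.55), (1.90, 2.00), (2.80, 2.70),
]

observed_delta, p_value = permutation_p_value(toy_pairs, n_permutations=2000)
print(f"observed mean difference = {observed_delta:.3f}, P = {p_value:.4f}")
```

With the counts reported in the text, 44 of 10,000 permutations gave a mean difference at least as small as the observed one, so the same ratio yields P = 44/10,000 = 0.0044.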

Figure 3

Interacting proteins have similar fitness effects, but this cannot explain the similarity in their rates of evolution. (A) The distribution of mean difference in evolutionary rate (Δ) between yeast proteins randomly chosen from the list of all 411 interacting protein pairs in which both members had an ortholog in C. elegans. The mean difference in evolutionary rate between proteins that interact (Δ* = 1.3 substitutions per site) is indicated by an arrow. (B) The distribution of mean difference in fitness effect (Δ) between yeast proteins randomly chosen from the list of all 2821 interactions in which the effect on growth rate of deleting each protein was estimated by parallel analysis (2). The mean difference in fitness effect between proteins that interact (Δ* = 0.41) is indicated by an arrow. (C) The causal model for path analysis to determine whether similarity in fitness effects between interacting proteins explains the similarity in their evolutionary rates. The correlation between evolutionary rates of interacting proteins that is expected to result from observed correlations between fitness effects ($r_{F_1F_2}$), and between fitness effect and evolutionary rate ($p_{FK}$), can be estimated as $\hat{r}_{K_1K_2} \sim (p_{FK})^2 \, r_{F_1F_2}$. The observed correlation between evolutionary rates is much larger than that expected to result from fitness effects ($r_{K_1K_2} \gg \hat{r}_{K_1K_2}$), indicating that one or more additional factors must contribute to the similarity of evolutionary rates of interacting proteins. (Observed correlation coefficients, including essential proteins: $r_{F_1F_2} = 0.16$, $P < 10^{-15}$; $p_{FK} = -0.06$, $P = 0.02$; $r_{K_1K_2} = 0.11$, $P = 0.03$. Excluding essential proteins: $r_{F_1F_2} = 0.07$, $P = 0.01$; $p_{FK} = -0.14$, $P = 2 \times 10^{-5}$.)

Although coevolution provides an appealing explanation for the similarity in the evolutionary rates of interacting proteins, alternative hypotheses must be considered. The proteins in an interacting pair presumably act in the same functional pathway and therefore are likely to have similar effects on organism fitness. Because the dispensability of a protein influences its rate of evolution (4), the similarity in the evolutionary rates of interacting proteins could be a consequence of similarity in their fitness effects. Our test of this hypothesis involved two steps.

First, we tested whether proteins that interact do indeed have similar effects on organism fitness. A randomization test showed that the mean difference in fitness effects between interacting proteins, Δ* = 0.41, was significantly smaller than the mean difference between arbitrarily paired proteins, Δ ($P < 10^{-5}$; Fig. 3B). Thus, interacting proteins do have similar effects on organism fitness.

Second, we determined whether the observed similarity in fitness effects of interacting proteins was sufficient to explain the similarity in their rates of evolution. Path analysis based on the causal model shown in Fig. 3C indicated that the correlation between the fitness effects of interacting proteins contributes only slightly to the correlation between their evolutionary rates. Thus, similarity in fitness effects is not sufficient to explain the observed similarity in the evolutionary rates of interacting proteins.
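The size of the contribution from fitness effects can be checked by substituting the coefficients listed in the caption of Fig. 3 into the path-tracing estimate given there. In the causal model, the only route from one protein's rate to its partner's runs from the rate through its own fitness effect, across the correlation between the partners' fitness effects, and on to the partner's rate, so the path coefficient enters twice. The arithmetic below is a worked illustration using only the reported values:

```latex
\begin{align*}
\hat{r}_{K_1K_2} &\sim (p_{FK})^2 \, r_{F_1F_2} \\
\text{including essential proteins:}\quad
\hat{r}_{K_1K_2} &\sim (-0.06)^2 \times 0.16 = 0.000576 \\
\text{excluding essential proteins:}\quad
\hat{r}_{K_1K_2} &\sim (-0.14)^2 \times 0.07 = 0.001372
\end{align*}
```

Both estimates are far below the observed $r_{K_1K_2} = 0.11$ that the caption reports for the set including essential proteins, which is the basis for the statement that fitness effects contribute only slightly.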

We also considered two other alternatives to the coevolutionary hypothesis. First, interacting proteins might evolve at similar rates simply because they have similar numbers of interactors, and, as shown in Fig. 1, the number of interactors influences the rate of evolution. However, we found that proteins that interact do not have similar numbers of interactors ($r_{I_1I_2} = 0.02$, $P = 0.26$). A second possibility is that interacting proteins evolve at similar rates because they exhibit structural homology and therefore have similar distributions of constrained sites. The most likely origin of structural homology between interacting proteins is duplication of the gene that encodes a homodimeric protein, followed by evolution of one copy of the gene. This process would result in homology not only between the structures, but also between the sequences, of interacting proteins. Hence, we have ensured that none of the interactions in our data set occur between proteins that exhibit detectable sequence similarity. Thus, to account for the similarity in evolutionary rates that we observe, structural similarity would have to be independent of sequence, which would be difficult to explain evolutionarily. In sum, having considered a number of alternative hypotheses, we conclude that the coevolution of interacting proteins may be largely responsible for the observed similarity in their rates of evolution.

Beyond describing the relation between a protein's interactions and its rate of evolution, the correlations presented here could find application in the rapid assessment of functional genomic data. Much as gene expression levels have recently been used to assess protein-protein interaction data sets (23), the correlation between protein interaction and evolutionary rate may allow one to use simple genomic sequence comparisons to statistically assess the quality of large interaction data sets. More generally, correlations between protein interaction, fitness effect, and evolutionary rate may provide a means by which multiple bioinformatic data sets can be quickly cross-referenced to assess the reliability of any single method or data set.

* These authors contributed equally to this work.

To whom correspondence should be addressed. E-mail: hunter@ocf.berkeley.edu

