Report

Relating Three-Dimensional Structures to Protein Networks Provides Evolutionary Insights

Science  22 Dec 2006:
Vol. 314, Issue 5807, pp. 1938-1941
DOI: 10.1126/science.1136174

Abstract

Most studies of protein networks operate on a high level of abstraction, neglecting structural and chemical aspects of each interaction. Here, we characterize interactions by using atomic-resolution information from three-dimensional protein structures. We find that some previously recognized relationships between network topology and genomic features (e.g., hubs tending to be essential proteins) are actually more reflective of a structural quantity, the number of distinct binding interfaces. Subdividing hubs with respect to this quantity provides insight into their evolutionary rate and indicates that additional mechanisms of network growth are active in evolution (beyond effective preferential attachment through gene duplication).

Protein interaction networks are principal components of a systems-level description of the cell (1–4). Many previous studies have explored global aspects of network topology, clearly linking it to protein function, expression dynamics, and other genomic features (5–9). In particular, a protein's degree (number of interaction partners) is an important factor, and proteins with high degree (hubs) have been found more likely to be essential (3, 7). However, most network studies have not considered the structural and chemical aspects of interactions; only recently have there been proposals to use structural information for systems biology (10). One specific problem with the current treatment is that protein interaction networks do not differentiate between many types of relationships (e.g., high-affinity and direct versus loose and transient). Sometimes, in fact, interactions are reported that connect two proteins that never touch each other physically but are only linked through a third protein (11, 12).

Here we address this problem by combining structural modeling with network analysis. In particular, we compiled a consensus yeast interaction network from various sources (13), filtering out low-confidence interactions by using statistical methodologies (4). We then annotated many of the edges in this network structurally on the basis of sequence similarity to known complexes (Fig. 1). We used simple three-dimensional (3D) structural exclusion to distinguish the interfaces of each interaction. Consider two or more proteins interacting with a common partner protein. If they use the same interface on the partner (as known from the structures), the interactions are classified as mutually exclusive. Conversely, if they use different interfaces, the interactions are simultaneously possible (Fig. 1). The network resulting from this analysis (the structural interaction network, SIN) contains 873 nodes (proteins) and 1269 edges (interactions), 438 of which are mutually exclusive (fig. S1). It contains parts of 147 complexes, suggesting that it covers a representative range of interactions. [For the SIN data set and further discussion, see SOM Text, tables S1 and S3, and (13).]

Fig. 1.

The creation of the structural interaction network (SIN) data set. All interactions from the filtered protein interaction data set are mapped to Pfam domains (30). The Pfam domains are mapped to known structures of protein interactions by means of iPfam (31). Only those interactions in which both interaction partners (or a homologous domain of either) can be found in a 3D structure of a protein complex are kept. All interactions are then classified into mutually exclusive and simultaneously possible by 3D structural exclusion. When a protein has more than one simultaneously possible interaction, the number of interaction interfaces is counted.
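
The 3D structural exclusion step can be sketched in a few lines. This is only an illustration of the rule stated in the text, not the authors' pipeline: it assumes interface assignment from known structures has already been done, and the input format (a mapping from each partner to an interface label on the shared protein) and all names are hypothetical.

```python
from itertools import combinations

def classify_partner_pairs(partner_interface):
    """Classify every pair of partners of one shared protein.

    partner_interface: dict mapping partner name -> label of the interface
    it uses on the shared protein (hypothetical input format).
    Returns a dict mapping (partner_a, partner_b) -> classification.
    """
    result = {}
    for a, b in combinations(sorted(partner_interface), 2):
        if partner_interface[a] == partner_interface[b]:
            # Same interface: the two interactions cannot occur together.
            result[(a, b)] = "mutually exclusive"
        else:
            # Different interfaces: both partners can bind at once.
            result[(a, b)] = "simultaneously possible"
    return result

def count_interfaces(partner_interface):
    """Number of distinct interfaces used on the shared protein."""
    return len(set(partner_interface.values()))

# Hypothetical example: three partners of one protein, two interfaces.
example = {"P1": "A", "P2": "A", "P3": "B"}
pairs = classify_partner_pairs(example)
n_interfaces = count_interfaces(example)
```

In this toy case P1 and P2 compete for interface A, whereas P3 binds elsewhere, so the shared protein is counted as having two interfaces.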

After building the SIN, we examined its two different kinds of interactions with respect to the properties of the linked proteins. As shown in Table 1, proteins connected by simultaneously possible interactions are more likely to share the same function than are those connected by mutually exclusive ones [in terms of Gene Ontology (GO) cellular component, molecular function, and biological process designations]; also, they are more likely to be expressed at the same time. Consequently, we expect most of the mutually exclusive interactions to be temporary or transient, because they cannot occur at the same time. Likewise, the simultaneously possible interactions are enriched in permanent associations, connecting members of the same complex (table S4).

Table 1.

Differences of simultaneously possible versus mutually exclusive interactions with respect to GO annotations (shared by the interacting proteins) and coexpression correlation coefficients. GO biological process, molecular function, and cellular component are taken from SGD lite (26) and coexpression correlation from the compendium expression data set (27). All differences are significant, with P ≪ 0.01.

Measure | Simultaneously possible interactions | Mutually exclusive interactions
Fraction with same biological process | 14% | 24%
Fraction with same molecular function | 18% | 33%
Fraction with same cellular component | 12% | 27%
Coexpression correlation | 0.17 | 0.23

Turning to global statistics of the SIN, we find that it has a degree distribution with a notably shorter tail than either the complete yeast interactome or a core, filtered subset of this (fig. S2). In particular, hubs in the SIN have a maximum of 14 interaction partners. This is similar to the number of close-packed neighbors in crystal lattices (12 in hexagonal packing) and reflects the direct, physical constraints on interactions in the SIN. In contrast, in early yeast interactomes some hubs had >200 interaction partners, and even in newer data sets, >30 partners are noted for some proteins (7, 14).

Within hubs in the SIN, we compared those with many physical interfaces (as detected by our approach) to those with a few—to uncouple degree from interface number (which are correlated). We defined hubs by setting an arbitrary cutoff of ≥5 interaction partners; variations in this cutoff did not affect our results (table S5). We detected differences in the properties between multi- and single-interface hubs. However, more statistically significant differences were evident if we distinguished between hubs with one or two interfaces (singlish-interface) and those with more than two interfaces (multi-interface). First, we examined the essentiality of both kinds of hubs. Although hubs in general are more likely than other proteins to be essential for cellular viability (3), as shown in Table 2, multi-interface hubs are twice as likely to be essential as singlish-interface ones, which, in turn, are no more likely to be essential than the average protein in the SIN. This result suggests the notion of hubs having a higher essentiality due to their network centrality is somewhat incomplete: It is the number of interaction interfaces that leads to higher essentiality.
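
The hub definitions used above reduce to two thresholds. The sketch below encodes them directly from the text (at least five partners for a hub; one or two interfaces for singlish, more than two for multi); the function and constant names are hypothetical.

```python
HUB_MIN_PARTNERS = 5         # cutoff stated in the text (>= 5 partners)
SINGLISH_MAX_INTERFACES = 2  # one or two interfaces counts as singlish

def classify_protein(n_partners, n_interfaces):
    """Label a protein from its degree and its number of distinct interfaces."""
    if n_partners < HUB_MIN_PARTNERS:
        return "non-hub"
    if n_interfaces <= SINGLISH_MAX_INTERFACES:
        return "singlish-interface hub"
    return "multi-interface hub"

# Hypothetical (partners, interfaces) combinations.
labels = [
    classify_protein(3, 2),
    classify_protein(6, 1),
    classify_protein(6, 2),
    classify_protein(8, 4),
]
```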

Table 2.

Correlation of genomic features with singlish and multi-interface hubs. The fraction of proteins that are products of essential genes (28), the average expression correlation with their neighbors (27), and the evolutionary rate [dN/dS ratio, from (29)] were calculated for the entire proteome, the entire SIN, singlish-interface protein hubs, and multi-interface protein hubs. The P-values of the differences between the whole data set and the singlish-interface hubs (all-singlish) and between the singlish- and multi-interface hubs (singlish-multi) were calculated with the Wilcoxon rank-sum test (see Methods in the SOM).

Genomic feature | Entire proteome | All in data set | P-value (all-singlish) | Singlish-interface hubs only | P-value (singlish-multi) | Multi-interface hubs only
Protein essentiality | 18.6% | 32.3% | 0.9 | 31.8% | <0.01 | 64.9%
Expression correlation |  | 0.20 | 0.3 | 0.17 | <0.05 | 0.25
Evolutionary rate | 0.077 | 0.047 | 0.5 | 0.051 | <0.01 | 0.029
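
For readers unfamiliar with the Wilcoxon rank-sum test named in the caption, a minimal pure-Python sketch follows. It uses the large-sample normal approximation without tie or continuity correction, which is an assumption made here for brevity; the SOM Methods are not reproduced in this text, so the exact variant the authors used is not known. The input values are hypothetical, not data from the study.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (z, p). No tie correction to the variance and no continuity
    correction are applied.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                        # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2            # expected rank sum under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))       # two-sided normal tail area
    return z, p

# Hypothetical evolutionary rates for two groups of hubs.
multi = [0.01, 0.02, 0.03, 0.04, 0.05]
singlish = [0.06, 0.07, 0.08, 0.09, 0.10]
z, p = rank_sum_test(multi, singlish)
z_same, p_same = rank_sum_test([1, 2, 3], [1, 2, 3])
```

Because the test works on ranks rather than raw values, it makes no assumption that essentiality, coexpression, or dN/dS values are normally distributed.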

Furthermore, Table 2 shows that multi-interface hubs are more likely to be coexpressed with their neighbors than are singlish-interface ones. This provides a straightforward structural explanation for the existence of two types of expression dynamics for hubs (10), date and party hubs (7). In particular, singlish-interface hubs seem to correspond to date hubs (which are expressed at different times than their interaction partners; table S7), and multi-interface hubs correspond to party hubs (which are expressed at the same times as their interaction partners). It is quite reasonable that the interaction partners of singlish-interface hubs are not coexpressed, because they would compete for the same binding interface. On the other hand, for the partners of multi-interface hubs, it makes sense to be expressed simultaneously, because they bind to different interfaces. Multi-interface hubs, in fact, correspond to central members of protein complexes, as is evident from cross-referencing them with known complexes (table S8). A representative multi-interface hub, for example, is Arp2p, a member of the Arp2/3 complex. Conversely, a good example for a singlish-interface hub is Snf1p, a central protein kinase (see SOM Text and table S6).

There has been some controversy over whether hubs are slower-evolving than other proteins (15–18). A commonly used measure of evolutionary rate is the dN/dS ratio (the ratio of nonsynonymous to synonymous substitutions, also referred to as Ka/Ks ratio). Table 2 shows that it is significantly lower for multi-interface hubs than for the average SIN protein, but not so for singlish-interface hubs. Although the dependence of evolutionary rate on protein degree has been attributed to an underlying effect of expression level (18), we find that the relationship of evolutionary rate to the number of interfaces is independent of expression level (whereas that to the degree is not) (fig. S4). The aforementioned controversy may have arisen because previous studies did not differentiate between singlish and multi-interface hubs. A larger number of interfaces may give rise to a lower evolutionary rate because a larger fraction of residues participate in interactions. Indeed, Fig. 2 shows that the variation in a protein's evolutionary rate can be accounted for better by changes in the fraction of its accessible surface area involved in interactions than by degree. This result can be explained simply from a structural point of view: The average mutational rate for exposed surface residues is more than twice as high as for those at an interface, which, in turn, is slightly higher than the one for buried residues (table S9) (19). Thus, as suggested previously (20) and shown in our analysis, the proportion of a protein's available surface area involved in interactions should correlate inversely with evolutionary rate.

Fig. 2.

Dependence of the average evolutionary rate (dN/dS ratio) of a protein on its degree and on its interacting accessible surface area (adjusted by protein size, as estimated from molecular weight). For the degree, the squared correlation coefficient is r² = 0.05, and for the adjusted interface surface area, r² = 0.12, suggesting that more than twice as much of the variation in dN/dS is accounted for by adjusted interface surface area (12%) as by the degree (5%).
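
The "fraction of variation accounted for" reading of the squared correlation coefficient can be made concrete with a short sketch. The helper below computes the squared Pearson correlation from scratch; its name and the toy inputs are hypothetical, and only the two values 0.05 and 0.12 come from the caption.

```python
def r_squared(x, y):
    """Squared Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

# Toy checks: a perfect linear relation and an uncorrelated one.
perfect = r_squared([1, 2, 3, 4], [2, 4, 6, 8])
none = r_squared([1, 2, 3, 4], [1, -1, -1, 1])

# Values reported in the caption: share of dN/dS variation accounted for.
r2_degree = 0.05
r2_surface = 0.12
ratio = r2_surface / r2_degree  # how many times more variation is explained
```

The ratio of the two reported values is about 2.4, which is the basis for the statement that interface surface area accounts for more than twice as much of the variation as degree does.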

Finally, we examined network evolution from a structural perspective. The existing scale-free network topology (the dominance of hubs) may have evolved through preferential attachment (21). Gene duplication is one possible cause of such an evolutionary process (22, 23), but other factors, such as preferential rewiring, could contribute as well (23, 24). As depicted in Fig. 3, if a hub evolves by duplication, its interaction partners are expected to be enriched in paralogs (products of homologous genes originating from within-genome duplications). As expected (25), we found that two proteins are significantly more likely to be paralogs if they share a common partner (Table 3). However, this is true only if they also share an interaction interface. We did not find enrichment for paralogs among interaction partners binding to different interfaces (Table 3). That is, our analysis is consistent with the evolution of singlish-interface hubs through duplication-mutation, whereas it does not support such an evolution of multi-interface hubs (Fig. 3). Because multi-interface hubs are often parts of larger protein complexes, it appears that protein-complex evolution could follow a different mechanism.

Fig. 3.

The concept of network evolution by gene duplication. A given protein may acquire a new interaction by duplication of an existing one. Given equal likelihood of any gene to be duplicated, a protein with many partners is more likely to get a new partner than one with few—hence, there is effective preferential attachment. For singlish-interface hubs, this mechanism is straightforward. However, for multi-interface hubs, it would then require coevolution of the hub and the duplicated gene to form a new interface.
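
The "effective preferential attachment" argument can be checked with a small calculation. The sketch assumes the simplest duplication model, in which one gene is chosen uniformly at random and its copy keeps all of the parent's interactions; that model, the function name, and the toy network are assumptions for illustration. Under it, a protein gains a partner whenever one of its neighbors is duplicated, so its chance of gaining a partner is its degree divided by the number of genes.

```python
def gain_probability(adjacency):
    """Probability that each protein gains a partner in one duplication event.

    adjacency: dict mapping protein -> set of its interaction partners
    (symmetric). One gene is duplicated uniformly at random, and the copy
    is assumed to inherit every interaction of its parent.
    """
    n = len(adjacency)
    return {node: len(neighbors) / n for node, neighbors in adjacency.items()}

# Hypothetical toy network: H is the best-connected protein.
toy = {
    "H": {"A", "B", "C"},
    "A": {"H"},
    "B": {"H"},
    "C": {"H", "D"},
    "D": {"C"},
}
probs = gain_probability(toy)
```

Here H (three partners) gains a new partner with probability 3/5, C (two partners) with 2/5, and the remaining proteins with 1/5, so new edges attach preferentially to well-connected proteins even though every gene is equally likely to be duplicated.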

Table 3.

Fraction of protein pairs that are paralogs of each other. Random pair: randomly chosen protein pair from our data set (average); Same partner: fraction of pairs with the same interaction partner that are paralogs; Same partner, same interface: Fraction of pairs that bind to the same interface that are paralogs; Same partner, different interface: fraction of pairs with the same interaction partner, but different interacting interface that are paralogs (calculated from the platinum standard set only; see Methods in the SOM).

Measure | Random pair | Same partner | Same partner, same interface | Same partner, different interface
Fraction paralogs | 0.23% | 4.10% | 8.10% | 0.00%
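
The comparison in Table 3 amounts to counting paralogous pairs among the partners of a shared protein, split by whether the pair binds the same interface. The sketch below shows that bookkeeping for a single shared protein; the input formats, names, and example data are hypothetical and do not reproduce the platinum standard set.

```python
from itertools import combinations

def paralog_fractions(partner_interface, paralog_pairs):
    """Fraction of partner pairs that are paralogs, split by interface sharing.

    partner_interface: dict mapping partner -> interface label on the shared
    protein. paralog_pairs: set of frozenset({a, b}) for known paralogs.
    Returns a dict with a fraction (or None if no such pairs exist) per class.
    """
    counts = {"same interface": [0, 0], "different interface": [0, 0]}
    for a, b in combinations(sorted(partner_interface), 2):
        same = partner_interface[a] == partner_interface[b]
        key = "same interface" if same else "different interface"
        counts[key][1] += 1                      # total pairs in this class
        if frozenset((a, b)) in paralog_pairs:
            counts[key][0] += 1                  # paralogous pairs
    return {k: (p / t if t else None) for k, (p, t) in counts.items()}

# Hypothetical hub with four partners; P1 and P2 are paralogs.
partners = {"P1": "A", "P2": "A", "P3": "A", "P4": "B"}
paralogs = {frozenset(("P1", "P2"))}
fractions = paralog_fractions(partners, paralogs)
```

In this toy case one of the three same-interface pairs is paralogous and none of the three different-interface pairs is, mirroring the direction of the contrast reported in Table 3.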

From our 3D structural analysis of protein interaction networks, we find that we can distinguish two fundamentally different types of network edges. On the one hand, we find a group of interactions that are simultaneously possible and a set of multi-interface hubs associated with these. Multi-interface hubs correspond, in many respects, to our “classic” notion of network hubs. They are more likely to be essential and more conserved. They are most likely members of large and stable complexes. However, they do not follow canonical models of network evolution, growing through gene duplication. On the other hand, we find a second class of interactions, mutually exclusive ones, which have a transient character and occur in singlish-interface hubs. Singlish-interface hubs are distinctly “nonclassical”: They are neither especially likely to be essential nor especially conserved. However, with respect to network growth they do follow the canonical preferential gene duplication model.

Supporting Online Material

www.sciencemag.org/cgi/content/full/314/5807/1938/DC1

Materials and Methods

SOM Text

Figs. S1 to S4

Tables S1 to S9

