Research Article

Uncovering disease-disease relationships through the incomplete interactome


Science  20 Feb 2015:
Vol. 347, Issue 6224, 1257601
DOI: 10.1126/science.1257601

A network approach to finding disease modules

Shared genes provide a powerful but limited representation of the mechanistic relationship between two diseases. Protein-protein interactions could reveal more, but their analysis has been hampered by the incompleteness of interactome maps. Menche et al. formulated the mathematical conditions needed for a disease module (a localized region of connections between disease-related proteins) to be observed. Only diseases with data coverage exceeding a specific threshold have identifiable disease modules. The network-based distance between two disease modules revealed that disease pairs predicted to have overlapping modules show statistically significant molecular similarity, encompassing their protein components, gene expression, symptoms, and comorbidity. Molecular-level links could also be identified between diseases that share no disease genes.

Science, this issue 10.1126/science.1257601

Structured Abstract

Introduction

A disease is rarely a straightforward consequence of an abnormality in a single gene, but rather reflects the interplay of multiple molecular processes. The relationships among these processes are encoded in the interactome, a network that integrates all physical interactions within a cell, from protein-protein to regulatory protein–DNA and metabolic interactions. The documented propensity of disease-associated proteins to interact with each other suggests that they tend to cluster in the same neighborhood of the interactome, forming a disease module, a connected subgraph that contains all molecular determinants of a disease. The accurate identification of the corresponding disease module represents the first step toward a systematic understanding of the molecular mechanisms underlying a complex disease. Here, we present a network-based framework to identify the location of disease modules within the interactome and use the overlap between the modules to predict disease-disease relationships.
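The module notion above can be made concrete with a small sketch. It assumes, for illustration only, that the interactome is an undirected edge list and that the observable part of a disease module is the largest connected component of the subgraph induced by the known disease genes. This is one common operationalization, not necessarily the exact procedure used in the study, and all gene names and edges are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build undirected adjacency sets from (u, v) interaction pairs."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def largest_module(graph, disease_genes):
    """Return the largest connected subgraph induced by the disease genes.

    Only edges whose endpoints are both disease genes are followed, so the
    result is the observable module under this (assumed) definition.
    """
    genes = set(disease_genes)
    seen = set()
    best = set()
    for start in sorted(genes):  # sorted for a deterministic traversal order
        if start in seen:
            continue
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in graph.get(node, set()) & genes:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        if len(component) > len(best):
            best = component
    return best


# Hypothetical toy interactome and disease gene set, for illustration only.
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"),
         ("E", "F"), ("B", "G"), ("X", "Y")]
DISEASE_GENES = {"A", "B", "C", "G", "F", "X"}

if __name__ == "__main__":
    module = largest_module(build_graph(EDGES), DISEASE_GENES)
    print(sorted(module))  # ['A', 'B', 'C', 'G']
```

Here F and X are disease genes whose interaction partners (E and Y) are not, so they remain isolated and fall outside the observed module.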

Rationale

Despite impressive advances in high-throughput interactome mapping and disease gene identification, both the interactome and our knowledge of disease-associated genes remain incomplete. This incompleteness prompts us to ask to what extent the current data are sufficient to map out the disease modules, the first step toward an integrated approach to human disease. To make progress, we must mathematically formulate the impact of network incompleteness on the identifiability of disease modules, quantifying both the predictive power and the limitations of the current interactome.
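To illustrate how incompleteness limits what can be observed, the sketch below keeps each true interaction independently with probability `coverage` and reports the size of the largest connected module that remains. The toy "true" module, the independent-edge sampling model, and the helper names are assumptions made for illustration; they are not the mathematical formulation derived in the study.

```python
import random


def module_size(edges, genes):
    """Size of the largest connected component induced by `genes` (union-find)."""
    parent = {g: g for g in genes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v in edges:
        if u in parent and v in parent:  # ignore edges leaving the gene set
            parent[find(u)] = find(v)
    counts = {}
    for g in genes:
        root = find(g)
        counts[root] = counts.get(root, 0) + 1
    return max(counts.values(), default=0)


def observed_module_size(true_edges, genes, coverage, seed=0):
    """Module size seen when each true edge is mapped independently with
    probability `coverage` (a hypothetical incompleteness model)."""
    rng = random.Random(seed)
    mapped = [e for e in true_edges if rng.random() < coverage]
    return module_size(mapped, genes)


# Hypothetical "true" module: a ring of 30 disease genes plus a few chords.
GENES = [f"g{i}" for i in range(30)]
TRUE_EDGES = ([(f"g{i}", f"g{(i + 1) % 30}") for i in range(30)]
              + [(f"g{i}", f"g{(i + 7) % 30}") for i in range(0, 30, 3)])

if __name__ == "__main__":
    for coverage in (1.0, 0.8, 0.6, 0.4, 0.2):
        print(coverage, observed_module_size(TRUE_EDGES, GENES, coverage))
```

Because every call reuses the same seed, the edge set kept at a lower coverage is a subset of the one kept at a higher coverage, so the reported module size can only shrink as coverage drops: the full module is present in every case, but less of it is visible.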

Results

Using the tools of network science, we show that disease modules can be uncovered only for diseases whose number of associated genes exceeds a critical threshold determined by the network incompleteness. We find that the disease proteins associated with 226 diseases cluster in the same network neighborhood, displaying a statistically significant tendency to form identifiable disease modules. The higher the degree of agglomeration of the disease proteins within the interactome, the higher the biological and functional similarity of the corresponding genes. These findings indicate that many local neighborhoods of the interactome represent the observable part of the true, larger, and denser disease modules.
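Whether disease proteins cluster more than chance would predict can be checked by comparing the observed module size with the module sizes of randomly chosen gene sets of the same size. The sketch assumes a uniform random-gene null model summarized by a z-score; degree-aware sampling is a common refinement, and the toy ring network and gene names are hypothetical.

```python
import math
import random
import statistics
from collections import deque


def lcc_size(adj, genes):
    """Size of the largest connected component induced by `genes` (BFS)."""
    genes, seen, best = set(genes), set(), 0
    for start in genes:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adj.get(node, ()):
                if neighbor in genes and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best


def z_score(observed, null_sizes):
    """Standard score of `observed` against a sample of null module sizes."""
    mu = statistics.mean(null_sizes)
    sigma = statistics.stdev(null_sizes)
    if sigma == 0:
        return 0.0 if observed == mu else math.copysign(math.inf, observed - mu)
    return (observed - mu) / sigma


def module_significance(adj, disease_genes, trials=1000, seed=0):
    """Compare the observed module with random gene sets of equal size.

    Uses a uniform random-gene null model (an assumption; degree-preserving
    sampling is a common alternative).
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    k = len(disease_genes)
    null = [lcc_size(adj, rng.sample(nodes, k)) for _ in range(trials)]
    observed = lcc_size(adj, disease_genes)
    return observed, z_score(observed, null)


# Hypothetical toy interactome: a ring of 40 proteins; the "disease" genes
# are 8 consecutive ring members, so they form one connected module.
ADJ = {f"n{i}": {f"n{(i - 1) % 40}", f"n{(i + 1) % 40}"} for i in range(40)}
DISEASE = {f"n{i}" for i in range(8)}

if __name__ == "__main__":
    observed, z = module_significance(ADJ, DISEASE)
    print(observed, round(z, 2))
```

A large positive z-score indicates that the disease genes are more agglomerated than random gene sets of the same size on the same network.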

If two disease modules overlap, local perturbations causing one disease can also disrupt pathways of the other disease module, resulting in shared clinical and pathobiological characteristics. To test this hypothesis, we measure the network-based separation of each disease pair and observe a direct relation between the pathobiological similarity of diseases and their relative distance in the interactome. We find that disease pairs with overlapping disease modules display significant molecular similarity, elevated coexpression of their associated genes, similar symptoms, and high comorbidity. In contrast, non-overlapping disease pairs lack any detectable pathobiological relationship. The proposed network-based distance allows us to predict pathobiological relationships even between diseases that do not share genes.
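The network-based separation of a disease pair can be sketched as follows. We assume a measure of the form s_AB = <d_AB> - (<d_AA> + <d_BB>)/2, where <d_AA> averages, over the genes of A, the shortest-path distance to the nearest other gene of A, and <d_AB> averages the distance from each gene of one set to the closest gene of the other. Under this assumed definition, negative values suggest overlapping modules and positive values separated ones. The formula, function names, and the path-shaped toy network are illustrative assumptions, not a restatement of the authors' code.

```python
from collections import deque
from statistics import mean


def bfs_distances(adj, source):
    """Shortest-path (hop) distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def nearest(adj, gene, targets, exclude_self=False):
    """Distance from `gene` to the closest member of `targets` (inf if unreachable)."""
    dist = bfs_distances(adj, gene)
    hits = [d for node, d in dist.items()
            if node in targets and not (exclude_self and node == gene)]
    return min(hits) if hits else float("inf")


def mean_within(adj, genes):
    """<d_AA>: mean distance from each gene to its nearest other gene in the same set."""
    return mean([nearest(adj, g, genes, exclude_self=True) for g in genes])


def mean_between(adj, a, b):
    """<d_AB>: mean distance from each gene of one set to the closest gene of the other."""
    return mean([nearest(adj, g, b) for g in a] + [nearest(adj, g, a) for g in b])


def separation(adj, a, b):
    """Assumed separation s_AB = <d_AB> - (<d_AA> + <d_BB>) / 2; negative suggests overlap."""
    return mean_between(adj, a, b) - (mean_within(adj, a) + mean_within(adj, b)) / 2


# Hypothetical path-shaped interactome p0 - p1 - ... - p9 and three gene sets.
ADJ = {f"p{i}": {f"p{j}" for j in (i - 1, i + 1) if 0 <= j < 10} for i in range(10)}
A = {"p0", "p1", "p2"}
B = {"p1", "p2", "p3"}  # overlaps A
C = {"p7", "p8", "p9"}  # far from A

if __name__ == "__main__":
    print(round(separation(ADJ, A, B), 3))  # -0.667
    print(separation(ADJ, A, C))            # 5.0
```

In this toy case the overlapping pair (A, B) gets a negative separation while the distant pair (A, C) gets a positive one, matching the intended reading of the sign.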

Conclusion

Despite its incompleteness, the interactome has reached sufficient coverage to allow the systematic investigation of disease mechanisms and to help uncover the molecular origins of the pathobiological relationships between diseases. The introduced network-based framework can be extended to address numerous questions at the forefront of network medicine, from interpreting genome-wide association study data to drug target identification and repurposing.

Diseases within the interactome.

The interactome collects all physical interactions between a cell’s molecular components. Proteins associated with the same disease form connected subgraphs, called disease modules, shown for multiple sclerosis (MS), peroxisomal disorders (PD), and rheumatoid arthritis (RA). Disease pairs with overlapping modules (MS and RA) have some phenotypic similarities and high comorbidity. Non-overlapping diseases, like MS and PD, lack detectable clinical relationships.


According to the disease module hypothesis, the cellular components associated with a disease segregate in the same neighborhood of the human interactome, the map of biologically relevant molecular interactions. Yet, given the incompleteness of the interactome and the limited knowledge of disease-associated genes, it is not obvious whether the available data have sufficient coverage to map out modules associated with each disease. Here we derive mathematical conditions for the identifiability of disease modules and show that the network-based location of each disease module determines its pathobiological relationship to other diseases. For example, diseases with overlapping network modules show significant coexpression patterns, symptom similarity, and comorbidity, whereas diseases residing in separated network neighborhoods are phenotypically distinct. These tools represent an interactome-based platform to predict molecular commonalities between phenotypically related diseases, even if they do not share primary disease genes.

