Network analytics in the age of big data


Science 08 Jul 2016:
Vol. 353, Issue 6295, pp. 123-124
DOI: 10.1126/science.aah3449

We live in a complex world of interconnected entities. In all areas of human endeavor, from biology to medicine, economics, and climate science, we are flooded with large-scale data sets. These data sets describe intricate real-world systems from different and complementary viewpoints: entities are modeled as nodes and their connections as edges, together forming large networks. Such networked data are a new and rich source of domain-specific information, but that information is currently largely hidden within complicated wiring patterns. Deciphering these patterns is paramount, but computational analyses of large networks are often intractable, so many questions we ask about the world cannot be answered exactly in practice, even with enormous computer power and time (1). Hence, the only hope is to answer these questions approximately (that is, heuristically) and to prove how far, in the worst case, the approximate answer can be from the exact, unknown one. On page 163 of this issue, Benson et al. (2) take an important step in that direction by providing a scalable heuristic framework that groups entities based on their wiring patterns and uses the discovered patterns to reveal the higher-order organizational principles of several real-world networked systems.
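The strategy described above, an approximate answer with a proven worst-case gap, can be illustrated with a textbook example that comes from neither this article nor Benson et al.: minimum vertex cover, an NP-hard network problem. The sketch below compares an exponential-time exact search with a greedy matching heuristic whose cover is provably at most twice the optimal size. The toy network and all function names are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def exact_min_vertex_cover(nodes, edges):
    """Exact answer by brute force: tries all node subsets by size (exponential time)."""
    for k in range(len(nodes) + 1):
        for subset in combinations(nodes, k):
            chosen = set(subset)
            if all(u in chosen or v in chosen for u, v in edges):
                return chosen
    return set(nodes)


def matching_vertex_cover(edges):
    """Heuristic answer in linear time: both endpoints of a greedy maximal matching.

    Matched edges share no endpoints, and any cover must contain at least one
    endpoint of each matched edge, so this cover is at most 2x the optimum.
    """
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover


# Hypothetical toy network: a hub (node 0) plus a short tail 3-4-5.
nodes = [0, 1, 2, 3, 4, 5]
edges = [(0, 1), (0, 2), (0, 3), (3, 4), (4, 5)]

exact = exact_min_vertex_cover(nodes, edges)
approx = matching_vertex_cover(edges)
print(len(exact), len(approx))  # 2 4: the heuristic hits its worst-case factor here
```

The exact search stops being usable after a few dozen nodes, while the heuristic scales to very large networks and still carries a guarantee on how wrong it can be.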

To mine the wiring patterns of networked data and uncover their functional organization, it is not enough to consider only simple descriptors, such as the number of interactions that each entity (node) has with other entities (its node degree), because two networks can be identical in such simple descriptors but have very different connectivity structures (see the figure). Instead, Benson et al. use higher-order descriptors called graphlets (e.g., a triangle), which are small subnetworks induced on a subset of nodes, that is, containing all of the interactions that appear in the data among those nodes (3). They identify network regions that are rich in instances of a particular graphlet type, while few instances of that graphlet cross the boundaries of the regions. When the graphlet type is specified in advance, the method uncovers the nodes it interconnects; in this way, Benson et al. grouped together 20 neurons of the nematode worm's neuronal network that are known to control a particular type of movement. The method thus unifies local wiring patterns with the higher-order structural modularity they impose, uncovering higher-order functional regions in networked data.
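A minimal sketch of the criterion just described, for the triangle graphlet: score a candidate region by how many triangles cross its boundary, normalized by how many triangle memberships fall on the smaller side. It is meant only to resemble, in simplified form, the kind of objective such a method optimizes; it scores a region you supply and does not reproduce Benson et al.'s scalable framework for finding regions. The network and function names are hypothetical.

```python
from itertools import combinations


def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangles(adj):
    """Each triangle (fully connected 3-node graphlet) once, as a sorted triple."""
    found = set()
    for u in adj:
        for v, w in combinations(sorted(adj[u]), 2):
            if w in adj[v]:
                found.add(tuple(sorted((u, v, w))))
    return found


def triangle_conductance(adj, region):
    """Boundary-crossing triangles relative to triangle 'volume' (simplified score).

    cut: triangles with nodes both inside and outside the region.
    volume: number of (node, triangle) memberships on each side.
    Lower values mean the region keeps its triangles to itself.
    """
    region = set(region)
    tris = triangles(adj)
    inside = [sum(n in region for n in t) for t in tris]
    cut = sum(1 for k in inside if 0 < k < 3)
    vol_in = sum(inside)
    vol_out = 3 * len(tris) - vol_in
    denom = min(vol_in, vol_out)
    return cut / denom if denom else float("inf")


# Hypothetical network: two 4-node cliques joined through nodes 2, 3 and 4.
clique_a = list(combinations([0, 1, 2, 3], 2))
clique_b = list(combinations([4, 5, 6, 7], 2))
adj = build_adj(clique_a + clique_b + [(3, 4), (2, 4)])

good = triangle_conductance(adj, {0, 1, 2, 3})  # 1 crossing triangle: 1/13
bad = triangle_conductance(adj, {0, 1, 4, 5})   # all 9 triangles cross: 9/13
print(round(good, 3), round(bad, 3))
```

The region that follows the clique boundary scores far lower than an arbitrary split, which is the "rich inside, few crossing" property the paragraph describes.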

Network structures

The four networks shown have exactly the same size (the same number of nodes and edges), and each node in each network has the same degree (the number of interactions with other nodes), but each network has a very different structure.
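The point of the figure can be checked on a smaller scale. The sketch below uses two standard six-node graphs chosen for illustration (they are not the networks in the figure): a triangular prism and the complete bipartite graph K3,3. Both have 6 nodes, 9 edges, and degree 3 at every node, yet the simplest nontrivial graphlet, the triangle, tells them apart. Function names are hypothetical.

```python
def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_sequence(adj):
    """Sorted node degrees: the 'simple descriptor' from the text."""
    return sorted(len(nbrs) for nbrs in adj.values())


def count_triangles(adj):
    """Common neighbours of each edge's endpoints; each triangle is seen once per edge."""
    per_edge = sum(len(adj[u] & adj[v]) for u in adj for v in adj[u] if u < v)
    return per_edge // 3


# Triangular prism: two triangles joined by a perfect matching.
prism = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (0, 3), (1, 4), (2, 5)]
# K3,3: every node in {0, 1, 2} linked to every node in {3, 4, 5}; no triangles possible.
k33 = [(u, v) for u in (0, 1, 2) for v in (3, 4, 5)]

prism_adj, k33_adj = build_adj(prism), build_adj(k33)
print(degree_sequence(prism_adj) == degree_sequence(k33_adj))  # True
print(count_triangles(prism_adj), count_triangles(k33_adj))    # 2 0
```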


The importance of this result lies in its applicability to the broad range of networked data that we must understand to answer fundamental questions facing humanity today: from climate change and the impacts of genetically modified organisms on the environment (4), to food security, human migrations, and economic and societal crises (3, 5), to understanding diseases and aging and personalizing medical treatments (6–13). For example, the cell is a complex system of interacting molecules, in which genes are transcribed into RNAs and translated into proteins, which adopt various three-dimensional structures to carry out particular cellular functions. Molecular interactions are captured by different high-throughput biotechnologies and modeled with different types of networks. Individual analyses of molecular networks have revealed that molecules involved in similar functions tend to group together in a network and are similarly wired (13), leading to better understanding of gene functions (6) and of the molecular organization of the cell (7), and to improved therapeutics (8–12).
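"Similarly wired" can be made concrete by counting how often a node occupies each position in small graphlets. The sketch below is a truncated, hypothetical node signature restricted to 2- and 3-node graphlets (graphlet-based node signatures in the literature also use larger graphlets), computed on a made-up network. Node 3 has the same degree as nodes 0, 1, 5, and 6, but its signature differs, while mirror-image nodes get identical signatures.

```python
from itertools import combinations
from math import comb


def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def wiring_signature(adj, v):
    """Counts of v's positions in induced 2- and 3-node graphlets.

    Returns (degree, end of a 3-node path, middle of a 3-node path, triangle).
    """
    nbrs = adj[v]
    degree = len(nbrs)
    # Linked neighbour pairs close a triangle through v.
    triangle = sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
    # Unlinked neighbour pairs make v the middle of an induced path.
    middle = comb(degree, 2) - triangle
    # Paths v-w-x with x != v and x not adjacent to v make v an end point.
    end = sum(len(adj[w]) - 1 - len(adj[w] & nbrs) for w in nbrs)
    return (degree, end, middle, triangle)


# Hypothetical network: triangles (0,1,2) and (4,5,6) joined by the path 2-3-4.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
adj = build_adj(edges)
sig = {v: wiring_signature(adj, v) for v in sorted(adj)}
print(sig[0], sig[5], sig[3])  # (2, 1, 0, 1) (2, 1, 0, 1) (2, 4, 1, 0)
```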

However, each network type provides limited information about the phenomenon under study. For example, a disease is rarely the consequence of a single mutated gene or of a single broken molecular interaction. Rather, it is the product of multiple perturbations of complex interactions within and across cells. Network medicine couples network analytics with data integration to mine the wealth of complementary data and reveal molecular mechanisms shared by seemingly unrelated diseases (8–11). Conversely, patients with seemingly the same disease may have very different molecular mechanisms of disease and reactions to treatment (e.g., cancer heterogeneity) (8–11). Therefore, personalized medicine aims to deliver individualized therapies based on the genetic and molecular profiles of individual patients; these therapies may involve repurposing known drugs for different patient groups, which helps to ease pharmaceutical industry bottlenecks related to the cost and time required to develop new drugs (11, 12). Methods for network data analytics and integration will be fundamental to these nascent areas, because full understanding can only come from holistically mining all available genetic, molecular, and clinical data (11).

Holistic analyses of our interconnected world call for conceptual and methodological paradigm shifts. Aligning genetic sequences, an analysis of a single data source in isolation, has already revolutionized our understanding of biology (14); further insights will come from aligning all types of data within a single framework: “the data alignment.” For example, all genetic and molecular interaction data about a cell can be integrated into the same computational framework, and methods need to be developed for aligning these “integrated cells” within a new paradigm of “the cell alignment.” Similarly, the world's economic system includes networks of trade, financial exchanges, and investments, which thus far have been studied individually (3, 5). But a complete understanding of the origins of wealth, crises, and economic recoveries can only come from aligning and collectively analyzing all of these layers of networked economic and geopolitical data. Likewise, climatic measurements are captured by various network types encoding the relationships between climatic elements (e.g., wind speed, atmospheric pressure, and temperature) across geographical regions (4), and holistic, data-aligned analyses may help to explain this complex, dynamic system and better predict the effects of human-caused alterations. Mathematical formalisms capable of capturing the intricacies of the higher-order organization of the data, along with algorithms to compute and extract information from those formalisms, should be developed and applied (15). Extending the framework of Benson et al. to finding higher-order structures within these integrated and aligned data systems may be a way forward. Computational challenges remain, arising from the large size, complexity, heterogeneity, and noisiness of the data and from their different time and space scales.
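At its simplest, aligning two networks means mapping the nodes of one onto the nodes of the other so that many interactions are conserved. The sketch below scores a mapping by edge correctness (the fraction of edges preserved), a simple measure used in the network alignment literature, and finds the best mapping by exhaustive search. The networks and function names are hypothetical, the brute-force search is only feasible for toy sizes (practical aligners rely on heuristics), and it aligns two single-layer networks rather than the integrated, multi-type "data alignment" the paragraph calls for.

```python
from itertools import permutations


def edge_correctness(edges1, edges2, mapping):
    """Fraction of G1's edges whose mapped endpoints are also linked in G2."""
    targets = {frozenset(e) for e in edges2}
    kept = sum(1 for u, v in edges1 if frozenset((mapping[u], mapping[v])) in targets)
    return kept / len(edges1)


def best_alignment(nodes1, nodes2, edges1, edges2):
    """Exhaustive search over one-to-one node mappings (exponential; toy sizes only)."""
    best_score, best_map = -1.0, None
    for image in permutations(nodes2, len(nodes1)):
        mapping = dict(zip(nodes1, image))
        score = edge_correctness(edges1, edges2, mapping)
        if score > best_score:
            best_score, best_map = score, mapping
    return best_score, best_map


# Hypothetical networks: G1 is a triangle b-c-d with a pendant node a; G2 is a 4-cycle.
nodes1, edges1 = ["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d"), ("b", "d")]
nodes2, edges2 = [1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1)]

score, mapping = best_alignment(nodes1, nodes2, edges1, edges2)
print(score)  # 0.75: a 4-cycle has no triangle, so one G1 edge is always lost
```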

References and Notes

Acknowledgments: This work was supported by the European Research Council Starting Independent Researcher Grant (278212), the National Science Foundation Cyber-Enabled Discovery and Innovation Program (OIA-1028394), the Slovenian Research Agency (ARRS) Project (J1-5454), and the Serbian Ministry of Education and Science Project (III44006).

