Report

Hierarchical Organization of Modularity in Metabolic Networks

Science  30 Aug 2002:
Vol. 297, Issue 5586, pp. 1551-1555
DOI: 10.1126/science.1073374

Abstract

Spatially or chemically isolated functional modules composed of several cellular components and carrying discrete functions are considered fundamental building blocks of cellular organization, but their presence in highly integrated biochemical networks lacks quantitative support. Here, we show that the metabolic networks of 43 distinct organisms are organized into many small, highly connected topologic modules that combine in a hierarchical manner into larger, less cohesive units, with their number and degree of clustering following a power law. Within Escherichia coli, the uncovered hierarchical modularity closely overlaps with known metabolic functions. The identified network architecture may be generic to system-level cellular organization.

The identification and characterization of system-level features of biological organization is a key issue of postgenomic biology (1–3). The concept of modularity assumes that cellular functionality can be seamlessly partitioned into a collection of modules. Each module is a discrete entity of several elementary components and performs an identifiable task, separable from the functions of other modules (1,4–8). Spatially and chemically isolated molecular machines or protein complexes (such as ribosomes and flagella) are prominent examples of such functional units, but more extended modules, such as those achieving their isolation through the initial binding of a signaling molecule (9), are also apparent.

Simultaneously, it is recognized that the thousands of components of a living cell are dynamically interconnected, so that the cell's functional properties are ultimately encoded into a complex intracellular web of molecular interactions (2–6, 8). This is perhaps most evident with cellular metabolism, a fully connected biochemical network in which hundreds of metabolic substrates are densely integrated through biochemical reactions. Within this network, however, modular organization (i.e., clear boundaries between subnetworks) is not immediately apparent. Indeed, recent studies have demonstrated that the probability that a substrate can react with k other substrates [the degree distribution P(k) of a metabolic network] decays as a power law $P(k) \sim k^{-\gamma}$ with γ ≅ 2.2 in all organisms (10, 11), suggesting that metabolic networks have a scale-free topology (12). A distinguishing feature of such scale-free networks is the existence of a few highly connected nodes (e.g., pyruvate or coenzyme A), which participate in a very large number of metabolic reactions. With a large number of links, these hubs integrate all substrates into a single, integrated web in which the existence of fully separated modules is prohibited by definition (Fig. 1A).

Figure 1

Complex network models. (A) A schematic illustration (left) of a scale-free network, whose degree distribution follows a power law. In such a network, a few highly connected nodes, or hubs (blue circles), play an important role in keeping the whole network together. A typical configuration (right) of a scale-free network with 256 nodes is also shown, obtained using the scale-free model, which requires the addition of a new node at each time such that existing nodes with higher degrees of connectivity have a higher chance of being linked to the new nodes (12). The nodes are arranged in space with a standard clustering algorithm (30) to illustrate the absence of an underlying modularity. (B) Schematic illustration (left) of a manifestly modular network made of four highly interlinked modules connected to each other by a few links. This intuitive topology does not have a scale-free degree distribution, as most of its nodes have a similar number of links, and hubs are absent. A standard clustering algorithm uncovers the network's inherent modularity (right) by partitioning a modular network of N = 256 nodes into the four isolated structures built into the system. (C) The hierarchical network (left) has a scale-free topology with embedded modularity. The hierarchical levels are represented in increasing order from blue to green to red. Standard clustering algorithms (right) are less successful in uncovering the network's underlying modularity. A detailed quantitative characterization of the three network models is available in (16).

Yet, the dilemma of a modular versus a highly integrated module-free metabolic network organization remains. A number of approaches for analyzing the functional capabilities of metabolic networks indicate the existence of separable functional elements (13,14). Also, from a purely topologic perspective, the metabolic network of Escherichia coli is known to possess a high clustering coefficient (11), a property that is suggestive of a modular organization. In itself, this implies that the metabolism of E. coli has a modular topology, potentially comprising several densely interconnected functional modules of varying sizes that are connected by few intermodule links (Fig. 1B). However, such clear-cut modularity imposes severe restrictions on the degree distribution, implying that most nodes have approximately the same number of links, which contrasts with the metabolic network's scale-free nature (10, 11).

To determine whether such a dichotomy is indeed a generic property of all metabolic networks, we first calculated the average clustering coefficient for 43 different organisms (10, 15, 16) as a function of the number of distinct substrates N present in their metabolism. The clustering coefficient, defined as $C_i = 2n/[k_i(k_i - 1)]$, where n denotes the number of direct links connecting the $k_i$ nearest neighbors of node i (17), is equal to 1 for a node at the center of a fully interlinked cluster, and it is 0 for a metabolite that is part of a loosely connected group (Fig. 2A). Therefore, $C_i$ averaged over all nodes i of a metabolic network is a measure of the network's potential modularity. We found that, for all 43 organisms, the average clustering coefficient is about an order of magnitude larger than that expected for a scale-free network of similar size (Fig. 2B), suggesting that metabolic networks in all organisms are characterized by a high intrinsic potential modularity. We also observed that, in contrast with the prediction of the scale-free model, for which the clustering coefficient decreases as $N^{-0.75}$ (18), the clustering coefficient of metabolic networks is independent of their size (Fig. 2B).
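The clustering coefficient defined above can be illustrated with a minimal sketch. The function names, the adjacency-set representation, and the two toy graphs are assumptions made for illustration; they are not the data or code used in the study. Nodes with fewer than two neighbors are assigned C = 0 here, which is a convention chosen for this sketch.

```python
from itertools import combinations

def make_graph(edges):
    """Build an undirected graph as a dict mapping node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def clustering_coefficient(adj, i):
    """C_i = 2n / (k_i (k_i - 1)), with n the number of links among i's neighbors."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention: undefined cases count as 0
    n = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * n / (k * (k - 1))

def average_clustering(adj):
    """Mean of C_i over all nodes of the graph."""
    return sum(clustering_coefficient(adj, i) for i in adj) / len(adj)

# Hypothetical toy graphs: a fully interlinked cluster and a star.
clique = make_graph([("a", "b"), ("a", "c"), ("a", "d"),
                     ("b", "c"), ("b", "d"), ("c", "d")])
star = make_graph([("a", "b"), ("a", "c"), ("a", "d")])

print(clustering_coefficient(clique, "a"))  # all neighbors linked to each other
print(clustering_coefficient(star, "a"))    # no links among the neighbors
```

In the fully interlinked cluster every node has C = 1, whereas the center of the star has C = 0, matching the two limiting cases described in the text.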

Figure 2

Evidence of hierarchical modularity in metabolic networks. (A) The clustering coefficient offers a measure of the degree of interconnectivity in the neighborhood of a node (17). For example, a node whose neighbors are all connected to each other has C = 1 (left), whereas a node with no links between its neighbors has C = 0 (right). (B) The average clustering coefficient C(N) for 43 organisms (10) is shown as a function of the number of substrates N present in each of them. Species belonging to archaea (purple), bacteria (green), and eukaryotes (blue) are shown. The dashed line indicates the dependence of the clustering coefficient on the network size for a module-free scale-free network, and the diamonds denote C for a scale-free network with the same parameters (N and number of links) as observed in the 43 organisms. (C through E) The dependence of the clustering coefficient on the node's degree in three organisms: Aquifex aeolicus (archaea) (C), Escherichia coli (bacterium) (D), and Saccharomyces cerevisiae (eukaryote) (E). (F) The C(k) curve averaged over all 43 organisms is shown, and the inset displays all 43 species together. The data points are color coded as in Fig. 2B. In (C through F), the dashed lines correspond to $C(k) \sim k^{-1}$, and in (C through E), the diamonds represent the C(k) value expected for a scale-free network (Fig. 1A) of similar size, indicating the absence of scaling. The wide fluctuations are due to the small size of the network.

These results demonstrate a fundamental conflict between the predictions of the current models of metabolic organization. The high, size-independent clustering coefficient offers strong evidence for modularity, whereas the power law degree distribution of all metabolic networks (10, 11) strongly supports the scale-free model and rules out a manifestly modular topology. To resolve this apparent contradiction, we propose a simple heuristic model of metabolic organization, which we refer to as a “hierarchical” network (Fig. 1C) (19). In such a model network, our starting point is a small cluster of four densely linked nodes. Next, we generate three replicas of this hypothetical module and connect the three external nodes of the replicated clusters to the central node of the old cluster, obtaining a large 16-node module. Subsequently, we again generate three replicas of this 16-node module and connect the peripheral nodes to the central node of the old module (Fig. 1C). These replication and connection steps can be repeated indefinitely, in each step quadrupling the number of nodes in the system. The architecture of such a network integrates a scale-free topology with an inherent modular structure. It has a power law degree distribution with degree exponent γ = 1 + (ln 4)/(ln 3) = 2.26, in agreement with γ = 2.2 observed in metabolic networks. Its clustering coefficient C ≅ 0.6 is also comparable with that observed for metabolic networks. Most important, the clustering coefficient of the model is independent of the size of the network, in agreement with the results of Fig. 2B.
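The replication rule described above can be sketched as follows. This is one literal reading of the construction (the external nodes of each replica are wired to the central node of the original cluster, and those replica nodes become the peripheral nodes of the next step), not the authors' implementation; the function name and node numbering are assumptions. Exact hub degrees obtained under this reading may differ slightly from values quoted elsewhere in the text, since they depend on wiring details that are not spelled out here.

```python
import math

def hierarchical_network(iterations):
    """Return (adj, hub, peripheral) after the given number of replication steps.

    adj maps node -> set of neighbors; node 0 is the central node throughout.
    """
    # Seed module: four fully interlinked nodes; 0 is the center, 1-3 are external.
    adj = {i: {j for j in range(4) if j != i} for i in range(4)}
    peripheral = [1, 2, 3]
    for _ in range(iterations):
        size = len(adj)
        snapshot = {u: set(vs) for u, vs in adj.items()}  # module before wiring
        new_peripheral = []
        for copy in (1, 2, 3):
            offset = copy * size
            # Replicate the current module with shifted node labels.
            for u, vs in snapshot.items():
                adj[u + offset] = {v + offset for v in vs}
            # Connect the replica's peripheral nodes to the original center.
            for p in peripheral:
                adj[0].add(p + offset)
                adj[p + offset].add(0)
                new_peripheral.append(p + offset)
        peripheral = new_peripheral
    return adj, 0, peripheral

adj, hub, peripheral = hierarchical_network(2)
print(len(adj))       # 64: the node count quadruples at each step
print(len(adj[hub]))  # degree of the central hub under this reading

# Degree exponent quoted in the text: 1 + ln 4 / ln 3
gamma = 1 + math.log(4) / math.log(3)
print(round(gamma, 2))  # 2.26
```

Each replication step multiplies the number of nodes by four while the hub gains three times as many new links as in the previous step, which is the origin of the ln 4 / ln 3 ratio in the degree exponent.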

A unique feature of the proposed network model, not shared by either the scale-free (Fig. 1A) or modular (Fig. 1B) models, is its hierarchical architecture. This hierarchy, which is evident from a visual inspection, is intrinsic to the assembly by repeated quadrupling of the system. The hierarchy can be characterized quantitatively by using the recent observation (20) that, in deterministic scale-free networks, the clustering coefficient of a node with k links follows the scaling law $C(k) \sim k^{-1}$. This scaling law quantifies the coexistence of a hierarchy of nodes with different degrees of modularity, as measured by the clustering coefficient, and is directly relevant to our model (Fig. 1C). Indeed, the nodes at the center of the numerous 4-node modules have a clustering coefficient C = 3/4, those at the center of a 16-node module have k = 13 and C = 2/13, and those at the center of the 64-node modules have k = 40 and C = 2/40, indicating that the higher a node's connectivity, the smaller its clustering coefficient, asymptotically following the 1/k law.

To investigate whether such hierarchical organization is present in cellular metabolism, we measured the C(k) function for the metabolic networks of all 43 organisms. As shown in Fig. 2, C through F, for each organism, C(k) is well approximated by $C(k) \sim k^{-1}$, in contrast to the k-independent C(k) predicted by both the scale-free and modular networks. This provides direct evidence for an inherently hierarchical organization. Such hierarchical modularity reconciles within a single framework all the observed properties of metabolic networks: their scale-free topology; high, system size–independent clustering coefficient; and the power law scaling of C(k).
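As a rough sketch of how a C(k) curve and its scaling exponent might be estimated, the code below averages the clustering coefficient over nodes of equal degree and fits a straight line in log-log coordinates. The function names, the least-squares fit, and the toy graph (a hub shared by five triangles) are assumptions for illustration; the fitting procedure used in the study is not described in this section.

```python
import math
from collections import defaultdict
from itertools import combinations

def clustering_by_degree(adj):
    """Average clustering coefficient C(k) over all nodes with degree k >= 2."""
    buckets = defaultdict(list)
    for i, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        n = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        buckets[k].append(2.0 * n / (k * (k - 1)))
    return {k: sum(cs) / len(cs) for k, cs in sorted(buckets.items())}

def loglog_slope(ck):
    """Least-squares slope of log C(k) versus log k, skipping C(k) = 0."""
    pts = [(math.log(k), math.log(c)) for k, c in ck.items() if c > 0]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return sxy / sxx

# Hypothetical toy graph: hub "h" shared by five otherwise separate triangles.
adj = {"h": set()}
for j in range(5):
    a, b = f"a{j}", f"b{j}"
    adj[a], adj[b] = {"h", b}, {"h", a}
    adj["h"].update((a, b))

ck = clustering_by_degree(adj)
print(ck)  # low-degree nodes are fully clustered, the hub much less so

# Sanity check on synthetic values that follow C(k) = 1/k exactly.
print(loglog_slope({k: 1.0 / k for k in (2, 4, 8, 16)}))  # close to -1
```

A slope near -1 corresponds to the 1/k scaling discussed in the text, whereas a slope near 0 would indicate a degree-independent C(k).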

A key issue from a biological perspective is whether the identified hierarchical architecture reflects the true functional organization of cellular metabolism. To uncover potential relations between topological modularity and the functional classification of different metabolites, we concentrated on the metabolic network of E. coli, whose metabolic reactions have been exhaustively studied, both biochemically and genetically (21). Using a previously established graph-theoretical representation (10), we first subjected E. coli's metabolic organization to a three-step reduction process, replacing nonbranching pathways with equivalent links, allowing us to decrease its complexity without altering the network topology (16). Next, we calculated the topological overlap matrix $O_T(i, j)$ of the condensed metabolic network (Fig. 3A). A topological overlap of 1 between substrates i and j implies that they are connected to the same substrates, whereas a 0 value indicates that i and j do not share links to common substrates among the metabolites they react with. The metabolites that are part of highly integrated modules have a high topological overlap with their neighbors, and we found that the larger the overlap between two substrates within the E. coli metabolic network, the more likely it is that they belong to the same functional class.

Figure 3

Uncovering the underlying modularity of a complex network. (A) Topological overlap illustrated on a small hypothetical network. For each pair of nodes, i and j, we define the topological overlap $O_T(i, j) = J_n(i, j)/\min(k_i, k_j)$, where $J_n(i, j)$ denotes the number of nodes to which both i and j are linked (plus 1 if there is a direct link between i and j) and $\min(k_i, k_j)$ is the smaller of the $k_i$ and $k_j$ degrees. On each link, we indicate the topological overlap for the connected nodes, and in parentheses next to each node, we indicate the node's clustering coefficient. (B) The topological overlap matrix corresponding to the small network shown in (A). The rows and columns of the matrix were reordered by the application of an average linkage clustering method (22) to its elements, allowing us to identify and place close to each other those nodes that have high topological overlap. The color code denotes the degree of topological overlap between the nodes. The associated tree reflects the three distinct modules built into the model of Fig. 3A, as well as the fact that the EFG and HIJK modules are closer to each other in the topological sense than to the ABC module.
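The topological overlap defined in the caption above can be computed directly from an adjacency structure. The sketch below is a minimal illustration; the function names and the toy network (two triangles joined by a single link) are assumptions and do not reproduce the network of Fig. 3A.

```python
def make_graph(edges):
    """Build an undirected graph as a dict mapping node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def topological_overlap(adj, i, j):
    """O_T(i, j) = J_n(i, j) / min(k_i, k_j).

    J_n counts the common neighbors of i and j, plus 1 if i and j are linked.
    """
    k_min = min(len(adj[i]), len(adj[j]))
    if k_min == 0:
        return 0.0  # assumed convention for isolated nodes
    common = len(adj[i] & adj[j])
    direct = 1 if j in adj[i] else 0
    return (common + direct) / k_min

def overlap_matrix(adj):
    """Overlap for every ordered pair of distinct nodes, keyed by (i, j)."""
    nodes = sorted(adj)
    return {(i, j): topological_overlap(adj, i, j)
            for i in nodes for j in nodes if i != j}

# Hypothetical toy network: triangles A-B-C and D-E-F joined by the link C-D.
adj = make_graph([("A", "B"), ("A", "C"), ("B", "C"),
                  ("D", "E"), ("D", "F"), ("E", "F"),
                  ("C", "D")])

print(topological_overlap(adj, "A", "B"))  # same triangle: high overlap
print(topological_overlap(adj, "C", "D"))  # bridge between the triangles
print(topological_overlap(adj, "A", "E"))  # different triangles, nothing shared
```

Pairs inside a triangle share all their links and reach the maximum overlap, whereas pairs from different triangles score much lower, which is the property the clustering step relies on.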

As the topological overlap matrix is expected to encode the comprehensive enzyme-catalyzed functional relatedness of the substrates forming the metabolic network, we investigated whether potential functional modules encoded in the network topology can be uncovered automatically. Initial application of an average-linkage hierarchical clustering algorithm (22) to the overlap matrix of the small hypothetical network shown in Fig. 3A placed those nodes that have a high topological overlap close to each other (Fig. 3B). Also, the method identified the three distinct modules built into the model of Fig. 3A, as illustrated by the fact that the EFG and HIJK modules are closer to each other in a topological sense, with the ABC module being farther from both (Fig. 3B). Application of the same technique on the E. coli overlap matrix $O_T(i, j)$ provides a global topologic representation of E. coli metabolism (Fig. 4A). Groups of metabolites forming tightly interconnected clusters are visually apparent, and on closer inspection, the hierarchy of nested topologic modules of increasing sizes and decreasing interconnectedness is also evident. To visualize the relation between topological modules and the known functional properties of the metabolites, we color-coded the branches of the derived hierarchical tree according to the predominant biochemical class of the substrates it produces, using the classification of metabolism based on standard, small molecule biochemistry (15). As shown in Fig. 4A, and in the three-dimensional representation in Fig. 4B, most substrates of a given small molecule class are distributed on the same branch of the tree (Fig. 4A) and correspond to relatively well delimited regions of the metabolic network (Fig. 4B). Therefore, there are strong correlations between shared biochemical classification of metabolites and the global topological organization of E. coli metabolism (Fig. 4A, bottom) (16).
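The clustering step can be sketched with a naive average-linkage agglomerative procedure. This is an illustrative stand-in, not the algorithm of reference (22): the function names are hypothetical, the dissimilarity is assumed to be 1 minus the topological overlap, and the toy network (two triangles joined by one link) is not the network of Fig. 3A.

```python
def make_graph(edges):
    """Build an undirected graph as a dict mapping node -> set of neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def topological_overlap(adj, i, j):
    """Common neighbors (plus 1 if i and j are linked) over the smaller degree."""
    common = len(adj[i] & adj[j]) + (1 if j in adj[i] else 0)
    return common / min(len(adj[i]), len(adj[j]))

def average_linkage(labels, dist):
    """Repeatedly merge the two clusters with the smallest mean pairwise distance.

    Returns the merge history as (cluster_a, cluster_b, mean_distance) tuples,
    where each cluster is a tuple of labels.
    """
    clusters = [(x,) for x in labels]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = clusters[a], clusters[b]
                d = sum(dist(x, y) for x in ca for y in cb) / (len(ca) * len(cb))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return merges

# Hypothetical toy network: triangles A-B-C and D-E-F joined by the link C-D.
adj = make_graph([("A", "B"), ("A", "C"), ("B", "C"),
                  ("D", "E"), ("D", "F"), ("E", "F"),
                  ("C", "D")])

# Assumed dissimilarity: 1 minus the topological overlap.
merges = average_linkage(sorted(adj),
                         lambda x, y: 1.0 - topological_overlap(adj, x, y))
for left, right, d in merges:
    print(left, right, round(d, 3))
```

The merge history plays the role of the hierarchical tree: nodes within each triangle are joined first at zero distance, and the two triangles are joined only in the final, most distant merge.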

Figure 4

Identifying the topological modules in E. coli metabolism. (A) The topologic overlap matrix corresponding to E. coli metabolism, together with the corresponding hierarchical tree (top) that quantifies the relation between the different modules. The branches of the tree are color coded to reflect the predominant biochemical classification of their substrates. The biochemical classes that we used to group the metabolites represent carbohydrate metabolism (blue); nucleotide and nucleic acid metabolism (red); protein, peptide, and amino acid metabolism (green); lipid metabolism (cyan); aromatic compound metabolism (dark pink); monocarbon compound metabolism (yellow); and coenzyme metabolism (light orange) (15). The color code of the matrix denotes the degree of topological overlap shown in the matrix. The large-scale functional map of the metabolism, as suggested by the hierarchical tree, is also shown (bottom). (B) Three-dimensional representation of the reduced E. coli metabolic network. Each node is color coded by the predominant biochemical class to which it belongs and is identical to the color code applied to the branches of the tree shown in (A). The different functional classes are visibly segregated into topologically distinct regions of metabolism. The blue-shaded region denotes the nodes belonging to pyrimidine metabolism. (C) Enlarged view of the substrate module of pyrimidine metabolism. The colored boxes denote the first two levels of the three levels of nested modularity suggested by the hierarchical tree. CDP, cytidine 5′-diphosphate; CMP, cytidine 5′-monophosphate; CTP, cytidine 5′-triphosphate; dCDP, deoxycytidine 5′-diphosphate; dCMP, deoxycytidine 5′-monophosphate; dCTP, deoxycytidine 5′-triphosphate; dUDP, deoxyuridine 5′-diphosphate; dUMP, deoxyuridine 5′-monophosphate; dUTP, deoxyuridine 5′-triphosphate; UTP, uridine 5′-triphosphate. (D) A detailed diagram of the metabolic reactions that surround and incorporate the pyrimidine metabolic module. Red-outlined boxes denote the substrates directly appearing in the reduced metabolism and the tree shown in (C). Substrates in green-outlined boxes are internal to pyrimidine metabolism but represent members of nonbranching pathways or end pathways branching from a metabolite with multiple connections (16). Blue- and black-outlined boxes show the connections of pyrimidine metabolites to other parts of the metabolic network. Black-outlined boxes denote core substrates belonging to other branches of the metabolic tree (A), and blue-outlined boxes denote nonbranching pathways (if present) leading to those substrates. With the exception of carbamoyl phosphate and S-dihydroorotate, all pyrimidine metabolites are connected with a single biochemical reaction. The shaded boxes around the reactions highlight the modules suggested by the hierarchical tree. The shaded blue boxes along the links display the enzymes catalyzing the corresponding reactions, and the arrows show the direction of the reactions according to the WIT metabolic maps (15). cCMP, cyclic cytidine 5′-monophosphate; cUMP, cyclic uridine 5′-monophosphate; dTDP, deoxythymidine 5′-diphosphate; dTMP, deoxythymidine 5′-monophosphate; dTTP, deoxythymidine 5′-triphosphate; TDP, thymidine diphosphate; TMP, thymidine monophosphate; TTP, thymidine triphosphate.

To correlate the putative modules obtained from our graph theory–based analysis to actual biochemical pathways, we concentrated on the pathways involving the pyrimidine metabolites. Our method divided these pathways into four putative modules (Fig. 4C), which represent a topologically well-limited area of E. coli metabolism (Fig. 4B, blue-shaded region). As shown in Fig. 4D, all highly connected metabolites (Fig. 4D, red-outlined boxes) correspond to their respective biochemical reactions within pyrimidine metabolism, together with those substrates that were removed during the original network reduction procedure, and then added again (Fig. 4D, green-outlined boxes). However, it is also apparent that putative module boundaries do not always overlap with intuitive “biochemistry-based” boundaries. For instance, the synthesis of uridine 5′-monophosphate (UMP) from L-glutamine is expected to fall within a single module based on a linear set of biochemical reactions, whereas the synthesis of uridine 5′-diphosphate from UMP leaps putative module boundaries. Thus, further experimental and theoretical analyses will be needed to understand the relation between the decomposition of E. coli metabolism offered by our topology-based approach and the biologically relevant subnetworks.

The organization of metabolic networks is likely to combine a capacity for rapid flux reorganization with a dynamic integration with all other cellular functions (11). Here we show that the system-level structure of cellular metabolism is best approximated by a hierarchical network organization with seamlessly embedded modularity. In contrast to current, intuitive views of modularity (Fig. 1B), which assume the existence of a set of modules with a nonuniform size potentially separated from other modules, we find that the metabolic network has an inherent self-similar property: There are many highly integrated small modules, which group into a few larger modules, which in turn can be integrated into even larger modules. This is supported by visual inspection of the derived hierarchical tree (Fig. 4A), which offers a natural breakdown of metabolism into several large modules, which are further partitioned into smaller, but more integrated submodules.

The mathematical framework proposed here to uncover the presence or absence of such hierarchical modularity and to delineate the modules based on the network topology could apply to other cellular and complex networks as well. As scale-free topology has been found at many different organizational levels, ranging from genetic (23) to protein interaction and protein domain (24) networks, it is possible that biological networks are always accompanied by a hierarchical modularity. Some nonbiological networks, ranging from the World Wide Web to the Internet, often combine a scale-free topology with a community structure (i.e., modularity) (25–27); therefore, these networks are also potential candidates for hierarchical modularity. For biological systems, hierarchical modularity is consistent with the notion that evolution may act at many organizational levels simultaneously: The accumulation of many local changes, which affect the small, highly integrated modules, could slowly impact the properties of the larger, less integrated modules. The emergence of the hierarchical topology through copying and reusing existing modules (1) and motifs (8), a process reminiscent of the results of gene duplication (28, 29), offers a special role to the modules that appeared first in the network. Although the model of Fig. 1C reproduces the large-scale features of the metabolism, understanding the evolutionary mechanism that explains the simultaneous emergence of the observed hierarchical and scale-free topology of the metabolism, as well as its generality to cellular organization, is now a prime challenge.

Supporting Online Material

www.sciencemag.org/cgi/content/full/297/5586/1551/DC1

Materials and Methods

SOM Text

Figs. S1 to S15

* To whom correspondence should be addressed. E-mail: zno008{at}northwestern.edu (Z.N.O.) and alb{at}nd.edu (A.-L.B.)
