## Abstract

Spatially or chemically isolated functional modules composed of several cellular components and carrying discrete functions are considered fundamental building blocks of cellular organization, but their presence in highly integrated biochemical networks lacks quantitative support. Here, we show that the metabolic networks of 43 distinct organisms are organized into many small, highly connected topologic modules that combine in a hierarchical manner into larger, less cohesive units, with their number and degree of clustering following a power law. Within *Escherichia coli*, the uncovered hierarchical modularity closely overlaps with known metabolic functions. The identified network architecture may be generic to system-level cellular organization.

The identification and characterization of system-level features of biological organization is a key issue of postgenomic biology (1–3). The concept of modularity assumes that cellular functionality can be seamlessly partitioned into a collection of modules. Each module is a discrete entity of several elementary components and performs an identifiable task, separable from the functions of other modules (1,4–8). Spatially and chemically isolated molecular machines or protein complexes (such as ribosomes and flagella) are prominent examples of such functional units, but more extended modules, such as those achieving their isolation through the initial binding of a signaling molecule (9), are also apparent.

Simultaneously, it is recognized that the thousands of components of a living cell are dynamically interconnected, so that the cell's functional properties are ultimately encoded into a complex intracellular web of molecular interactions (2–6, 8). This is perhaps most evident with cellular metabolism, a fully connected biochemical network in which hundreds of metabolic substrates are densely integrated through biochemical reactions. Within this network, however, modular organization (i.e., clear boundaries between subnetworks) is not immediately apparent. Indeed, recent studies have demonstrated that the probability that a substrate can react with *k* other substrates [the degree distribution *P*(*k*) of a metabolic network] decays as a power law $P(k) \sim k^{-\gamma}$ with γ ≅ 2.2 in all organisms (10, 11), suggesting that metabolic networks have a scale-free topology (12). A distinguishing feature of such scale-free networks is the existence of a few highly connected nodes (e.g., pyruvate or coenzyme A), which participate in a very large number of metabolic reactions. With a large number of links, these hubs integrate all substrates into a single, integrated web in which the existence of fully separated modules is prohibited by definition (Fig. 1A).

Yet, the dilemma of a modular versus a highly integrated module-free metabolic network organization remains. A number of approaches for analyzing the functional capabilities of metabolic networks indicate the existence of separable functional elements (13,14). Also, from a purely topologic perspective, the metabolic network of *Escherichia coli* is known to possess a high clustering coefficient (11), a property that is suggestive of a modular organization. In itself, this implies that the metabolism of *E. coli* has a modular topology, potentially comprising several densely interconnected functional modules of varying sizes that are connected by few intermodule links (Fig. 1B). However, such clear-cut modularity imposes severe restrictions on the degree distribution, implying that most nodes have approximately the same number of links, which contrasts with the metabolic network's scale-free nature (10, 11).

To determine whether such a dichotomy is indeed a generic property of all metabolic networks, we first calculated the average clustering coefficient for 43 different organisms (10, 15, 16) as a function of the number of distinct substrates *N* present in their metabolism. The clustering coefficient, defined as $C_i = 2n/[k_i(k_i - 1)]$, where *n* denotes the number of direct links connecting the $k_i$ nearest neighbors of node *i* (17), is equal to 1 for a node at the center of a fully interlinked cluster, and it is 0 for a metabolite that is part of a loosely connected group (Fig. 2A). Therefore, $C_i$ averaged over all nodes *i* of a metabolic network is a measure of the network's potential modularity. We found that, for all 43 organisms, the average clustering coefficient is about an order of magnitude larger than that expected for a scale-free network of similar size (Fig. 2B), suggesting that metabolic networks in all organisms are characterized by a high intrinsic potential modularity. We also observed that, in contrast with the prediction of the scale-free model, for which the clustering coefficient decreases as $N^{-0.75}$ (18), the clustering coefficient of metabolic networks is independent of their size (Fig. 2B).
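The clustering coefficient defined above can be computed directly from a neighbor list. The following is a minimal sketch, not the authors' code: the adjacency-dictionary input format, the function names, and the toy graph are all assumptions made for illustration, as is the convention of returning 0 for nodes with fewer than two neighbors.

```python
from itertools import combinations


def clustering_coefficient(adj, i):
    """Return C_i = 2n / (k_i (k_i - 1)) for node i of an undirected graph.

    adj maps each node to the set of its neighbors (hypothetical input format).
    """
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention: C_i is taken as 0 when k_i < 2
    # n = number of direct links among the k_i nearest neighbors of i
    n = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * n / (k * (k - 1))


def average_clustering(adj):
    """Average C_i over all nodes of the graph."""
    return sum(clustering_coefficient(adj, i) for i in adj) / len(adj)


# Toy graph (illustrative only): a fully interlinked 4-node cluster A-D
# plus a pendant node E attached to A.
toy = {
    "A": {"B", "C", "D", "E"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"A", "B", "C"},
    "E": {"A"},
}
```

In this toy graph, node B sits inside a fully interlinked cluster and has a clustering coefficient of 1, whereas node A, which also carries the pendant link to E, has a lower value.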

These results demonstrate a fundamental conflict between the predictions of the current models of metabolic organization. The high, size-independent clustering coefficient offers strong evidence for modularity, whereas the power law degree distribution of all metabolic networks (10, 11) strongly supports the scale-free model and rules out a manifestly modular topology. To resolve this apparent contradiction, we propose a simple heuristic model of metabolic organization, which we refer to as a “hierarchical” network (Fig. 1C) (19). In such a model network, our starting point is a small cluster of four densely linked nodes. Next, we generate three replicas of this hypothetical module and connect the three external nodes of the replicated clusters to the central node of the old cluster, obtaining a large 16-node module. Subsequently, we again generate three replicas of this 16-node module and connect the peripheral nodes to the central node of the old module (Fig. 1C). These replication and connection steps can be repeated indefinitely, in each step quadrupling the number of nodes in the system. The architecture of such a network integrates a scale-free topology with an inherent modular structure. It has a power law degree distribution with degree exponent γ = 1 + (ln 4)/(ln 3) = 2.26, in agreement with γ = 2.2 observed in metabolic networks. Its clustering coefficient C ≅ 0.6 is also comparable with that observed for metabolic networks. Most important, the clustering coefficient of the model is independent of the size of the network, in agreement with the results of Fig. 2B.
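The replicate-and-connect construction can be sketched in a few lines. This is a literal reading of the verbal description only, under stated assumptions: node 0 is taken as the central node, the "external" nodes of a module are the non-central nodes of its most recent replicas, and no further links (such as links among replica centers) are added. The published figure may contain links that this reading omits, so hub degrees obtained from the sketch need not match figure-derived values exactly. The function names are hypothetical.

```python
import math


def hierarchical_model(iterations):
    """Build the replicate-and-connect model; return (node_count, edges).

    Node 0 is the central node. Edges are stored as (a, b) with a < b.
    """
    # Seed: four densely (here: fully) linked nodes.
    edges = {(a, b) for a in range(4) for b in range(a + 1, 4)}
    peripheral = [1, 2, 3]  # external nodes of the current module
    n = 4
    for _ in range(iterations):
        grown = set(edges)
        new_peripheral = []
        for replica in (1, 2, 3):
            offset = replica * n
            # Copy the current module with shifted node labels.
            grown.update((a + offset, b + offset) for a, b in edges)
            # Link the replica's external nodes to the original central node.
            for p in peripheral:
                grown.add((0, p + offset))
                new_peripheral.append(p + offset)
        edges, peripheral, n = grown, new_peripheral, 4 * n
    return n, edges


def degree(edges, node):
    """Number of links attached to a node."""
    return sum(1 for a, b in edges if node in (a, b))


# Degree exponent quoted in the text: gamma = 1 + ln 4 / ln 3.
gamma = 1 + math.log(4) / math.log(3)
```

Each iteration quadruples the node count (4, 16, 64, ...), and `gamma` evaluates to approximately 2.26, the value stated in the text.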

A unique feature of the proposed network model, not shared by either the scale-free (Fig. 1A) or modular (Fig. 1B) models, is its hierarchical architecture. This hierarchy, which is evident from a visual inspection, is intrinsic to the assembly by repeated quadrupling of the system. The hierarchy can be characterized quantitatively by using the recent observation (20) that, in deterministic scale-free networks, the clustering coefficient of a node with *k* links follows the scaling law $C(k) \sim k^{-1}$. This scaling law quantifies the coexistence of a hierarchy of nodes with different degrees of modularity, as measured by the clustering coefficient, and is directly relevant to our model (Fig. 1C). Indeed, the nodes at the center of the numerous 4-node modules have a clustering coefficient *C* = 3/4, those at the center of a 16-node module have *k* = 13 and *C* = 2/13, and those at the center of the 64-node modules have *k* = 40 and *C* = 2/40, indicating that the higher a node's connectivity, the smaller its clustering coefficient, asymptotically following the 1/*k* law.

To investigate whether such hierarchical organization is present in cellular metabolism, we measured the *C*(*k*) function for the metabolic networks of all 43 organisms. As shown in Fig. 2, C through F, for each organism, *C*(*k*) is well approximated by $C(k) \sim k^{-1}$, in contrast to the *k*-independent *C*(*k*) predicted by both the scale-free and modular networks. This provides direct evidence for an inherently hierarchical organization. Such hierarchical modularity reconciles within a single framework all the observed properties of metabolic networks: their scale-free topology; high, system size–independent clustering coefficient; and the power law scaling of *C*(*k*).
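One way to check for the *C*(*k*) scaling is to average the clustering coefficient over nodes of equal degree and fit a straight line on log-log axes; a slope near −1 would be consistent with the 1/*k* law. The sketch below assumes this simple least-squares approach, which is not necessarily the fitting procedure used for Fig. 2. The function names and the synthetic input values are hypothetical and are not data from the study.

```python
import math
from collections import defaultdict


def mean_clustering_by_degree(degrees, clustering):
    """Average C over nodes sharing the same degree k; returns {k: C(k)}."""
    buckets = defaultdict(list)
    for k, c in zip(degrees, clustering):
        buckets[k].append(c)
    return {k: sum(cs) / len(cs) for k, cs in buckets.items()}


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) versus log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


# Synthetic illustration: nodes whose clustering coefficient equals 2/k.
degrees = [4, 4, 8, 16, 16, 32]
clustering = [2 / k for k in degrees]

c_of_k = mean_clustering_by_degree(degrees, clustering)
ks = sorted(c_of_k)
slope = loglog_slope(ks, [c_of_k[k] for k in ks])
```

For the synthetic input, `slope` is −1 up to floating-point error. For real networks, the degrees and clustering coefficients would come from the measured graph, and the fit quality should be inspected rather than assumed.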

A key issue from a biological perspective is whether the identified hierarchical architecture reflects the true functional organization of cellular metabolism. To uncover potential relations between topological modularity and the functional classification of different metabolites, we concentrated on the metabolic network of *E. coli*, whose metabolic reactions have been exhaustively studied, both biochemically and genetically (21). Using a previously established graph-theoretical representation (10), we first subjected *E. coli*'s metabolic organization to a three-step reduction process, replacing nonbranching pathways with equivalent links, allowing us to decrease its complexity without altering the network topology (16). Next, we calculated the topological overlap matrix $O_T(i, j)$ of the condensed metabolic network (Fig. 3A). A topological overlap of 1 between substrates *i* and *j* implies that they are connected to the same substrates, whereas a 0 value indicates that *i* and *j* do not share links to common substrates among the metabolites they react with. The metabolites that are part of highly integrated modules have a high topological overlap with their neighbors, and we found that the larger the overlap between two substrates within the *E. coli* metabolic network, the more likely it is that they belong to the same functional class.
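The text does not spell out the formula for the topological overlap, so the sketch below assumes one common definition: the number of neighbors shared by *i* and *j* (plus 1 if *i* and *j* are directly linked), divided by the smaller of the two degrees. This choice reproduces the limiting values described above (1 for identical neighborhoods, 0 for no shared links) but should be treated as an assumption, as should the function name and the toy graph.

```python
def topological_overlap(adj, i, j):
    """Assumed definition: shared neighbors (+1 if i-j linked) / min(k_i, k_j).

    adj maps each node to the set of its neighbors (hypothetical input format).
    """
    if i == j:
        return 1.0
    smaller_degree = min(len(adj[i]), len(adj[j]))
    if smaller_degree == 0:
        return 0.0  # an isolated node shares nothing with any other node
    shared = len(adj[i] & adj[j]) + (1 if j in adj[i] else 0)
    return shared / smaller_degree


# Toy graph (illustrative only): two triangles, A-B-C and D-E-F,
# joined by a single intermodule link C-D.
toy = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E", "F"},
    "E": {"D", "F"},
    "F": {"D", "E"},
}

nodes = sorted(toy)
overlap_matrix = [[topological_overlap(toy, i, j) for j in nodes] for i in nodes]
```

Pairs inside the same triangle (for example A and B) reach an overlap of 1, pairs from different triangles with no common neighbor (A and E) score 0, and the bridging pair C and D falls in between.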

As the topological overlap matrix is expected to encode the comprehensive enzyme-catalyzed functional relatedness of the substrates forming the metabolic network, we investigated whether potential functional modules encoded in the network topology can be uncovered automatically. Initial application of an average-linkage hierarchical clustering algorithm (22) to the overlap matrix of the small hypothetical network shown in Fig. 3A placed those nodes that have a high topological overlap close to each other (Fig. 3B). Also, the method identified the three distinct modules built into the model of Fig. 3A, as illustrated by the fact that the *EFG* and *HIJK* modules are closer to each other in a topological sense, with the *ABC* module being farther from both (Fig. 3B). Application of the same technique on the *E. coli* overlap matrix $O_T(i, j)$ provides a global topologic representation of *E. coli* metabolism (Fig. 4A). Groups of metabolites forming tightly interconnected clusters are visually apparent, and on closer inspection, the hierarchy of nested topologic modules of increasing sizes and decreasing interconnectedness is also evident. To visualize the relation between topological modules and the known functional properties of the metabolites, we color-coded the branches of the derived hierarchical tree according to the predominant biochemical class of the substrates it produces, using the classification of metabolism based on standard, small molecule biochemistry (15). As shown in Fig. 4A, and in the three-dimensional representation in Fig. 4B, most substrates of a given small molecule class are distributed on the same branch of the tree (Fig. 4A) and correspond to relatively well delimited regions of the metabolic network (Fig. 4B). Therefore, there are strong correlations between shared biochemical classification of metabolites and the global topological organization of *E. coli* metabolism (Fig. 4A, bottom) (16).
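The average-linkage clustering step applied to the overlap matrix in this paragraph can be illustrated with a generic agglomerative procedure. The sketch below is a plain textbook version written for clarity, not the specific algorithm of reference 22. The conversion of overlap to distance as 1 − overlap, the function names, and the overlap values for the five hypothetical nodes are all assumptions.

```python
from itertools import combinations


def average_distance(a, b, dist):
    """Mean pairwise distance between the members of clusters a and b."""
    return sum(dist(x, y) for x in a for y in b) / (len(a) * len(b))


def average_linkage(labels, dist):
    """Agglomerative clustering with average linkage.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of labels.
    """
    clusters = [(label,) for label in labels]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average distance.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_distance(pair[0], pair[1], dist),
        )
        merges.append((a, b, average_distance(a, b, dist)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical overlap values: A, B, C overlap strongly, as do D and E.
overlap = {
    frozenset("AB"): 0.90,
    frozenset("AC"): 0.80,
    frozenset("BC"): 0.85,
    frozenset("DE"): 0.95,
}


def overlap_distance(x, y):
    """Distance taken as 1 - overlap; unlisted pairs default to overlap 0.1."""
    return 1.0 - overlap.get(frozenset((x, y)), 0.1)


merges = average_linkage(["A", "B", "C", "D", "E"], overlap_distance)
```

Reading `merges` from first to last gives the tree from the leaves upward: tightly overlapping pairs merge early at small distances, and the two groups join only in the final, most distant merge, mirroring the nested modules described for Fig. 4A.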

To correlate the putative modules obtained from our graph theory–based analysis to actual biochemical pathways, we concentrated on the pathways involving the pyrimidine metabolites. Our method divided these pathways into four putative modules (Fig. 4C), which represent a topologically well-limited area of *E. coli* metabolism (Fig. 4B, blue-shaded region). As shown in Fig. 4D, all highly connected metabolites (Fig. 4D, red-outlined boxes) correspond to their respective biochemical reactions within pyrimidine metabolism, together with those substrates that were removed during the original network reduction procedure, and then added again (Fig. 4D, green-outlined boxes). However, it is also apparent that putative module boundaries do not always overlap with intuitive “biochemistry-based” boundaries. For instance, the synthesis of uridine 5′-monophosphate (UMP) from L-glutamine is expected to fall within a single module based on a linear set of biochemical reactions, whereas the synthesis of uridine 5′-diphosphate from UMP leaps putative module boundaries. Thus, further experimental and theoretical analyses will be needed to understand the relation between the decomposition of *E. coli* metabolism offered by our topology-based approach and the biologically relevant subnetworks.

The organization of metabolic networks is likely to combine a capacity for rapid flux reorganization with a dynamic integration with all other cellular functions (11). Here we show that the system-level structure of cellular metabolism is best approximated by a hierarchical network organization with seamlessly embedded modularity. In contrast to current, intuitive views of modularity (Fig. 1B), which assume the existence of a set of modules with a nonuniform size potentially separated from other modules, we find that the metabolic network has an inherent self-similar property: There are many highly integrated small modules, which group into a few larger modules, which in turn can be integrated into even larger modules. This is supported by visual inspection of the derived hierarchical tree (Fig. 4A), which offers a natural breakdown of metabolism into several large modules, which are further partitioned into smaller, but more integrated submodules.

The mathematical framework proposed here to uncover the presence or absence of such hierarchical modularity and to delineate the modules based on the network topology could apply to other cellular and complex networks as well. As scale-free topology has been found at many different organizational levels, ranging from genetic (23) to protein interaction and protein domain (24) networks, it is possible that biological networks are always accompanied by a hierarchical modularity. Some nonbiological networks, ranging from the World Wide Web to the Internet, often combine a scale-free topology with a community structure (i.e., modularity) (25–27); therefore, these networks are also potential candidates for hierarchical modularity. For biological systems, hierarchical modularity is consistent with the notion that evolution may act at many organizational levels simultaneously: The accumulation of many local changes, which affect the small, highly integrated modules, could slowly impact the properties of the larger, less integrated modules. The emergence of the hierarchical topology through copying and reusing existing modules (1) and motifs (8), a process reminiscent of the results of gene duplication (28, 29), offers a special role to the modules that appeared first in the network. Although the model of Fig. 1C reproduces the large-scale features of the metabolism, understanding the evolutionary mechanism that explains the simultaneous emergence of the observed hierarchical and scale-free topology of the metabolism, as well as its generality to cellular organization, is now a prime challenge.

## Supporting Online Material

www.sciencemag.org/cgi/content/full/297/5586/1551/DC1

Materials and Methods

SOM Text

Figs. S1 to S15

\* To whom correspondence should be addressed. E-mail: zno008{at}northwestern.edu (Z.N.O.) and alb{at}nd.edu (A.-L.B.)