Report

Community Structure in Time-Dependent, Multiscale, and Multiplex Networks

See allHide authors and affiliations

Science  14 May 2010:
Vol. 328, Issue 5980, pp. 876-878
DOI: 10.1126/science.1184819

This article has a correction. Please see:

Network Notation

Networks are often characterized by clusters of constituents that interact more closely with each other and have more connections to one another than they do with the rest of the components of the network. However, systematically identifying and studying such community structure in complicated networks is not easy, especially when the network interactions change over time or contain multiple types of connections, as seen in many biological regulatory networks or social networks. Mucha et al. (p. 876) developed a mathematical method to allow detection of communities that may be critical functional units of such networks. Application to real-world tasks—like making sense of the voting record in the U.S. Senate—demonstrated the promise of the method.

Abstract

Network science is an interdisciplinary endeavor, with methods and applications drawn from across the natural, social, and information sciences. A prominent problem in network science is the algorithmic detection of tightly connected groups of nodes known as communities. We developed a generalized framework of network quality functions that allowed us to study the community structure of arbitrary multislice networks, which are combinations of individual networks coupled through links that connect each node in one network slice to itself in other slices. This framework allows studies of community structure in a general setting encompassing networks that evolve over time, have multiple types of links (multiplexity), and have multiple scales.

The study of graphs, or networks, has a long tradition in fields such as sociology and mathematics, and it is now ubiquitous in academic and everyday settings. An important tool in network analysis is the detection of mesoscopic structures known as communities (or cohesive groups), which are defined intuitively as groups of nodes that are more tightly connected to each other than they are to the rest of the network (13). One way to quantify communities is by a quality function that compares the number of intracommunity edges to what one would expect at random. Given the network adjacency matrix A, where the element Aij details a direct connection between nodes i and j, one can construct a quality function Q (4, 5) for the partitioning of nodes into communities as Q = ∑ij(AijPij)δ(gi, gj), where δ(gi, gj) = 1 if the community assignments gi and gj of nodes i and j are the same and 0 otherwise, and Pij is the expected weight of the edge between i and j under a specified null model.

The choice of null model is a crucial consideration in studying network community structure (2). After selecting a null model appropriate to the network and application at hand, one can use a variety of computational heuristics to assign nodes to communities to optimize the quality Q(2, 3). However, such null models have not been available for time-dependent networks; analyses have instead depended on ad hoc methods to piece together the structures obtained at different times (69) or have abandoned quality functions in favor of such alternatives as the Minimum Description Length principle (10). Although tensor decompositions (11) have been used to cluster network data with different types of connections, no quality-function method has been developed for such multiplex networks.

We developed a methodology to remove these limits, generalizing the determination of community structure via quality functions to multislice networks that are defined by coupling multiple adjacency matrices (Fig. 1). The connections encoded by the network slices are flexible; they can represent variations across time, variations across different types of connections, or even community detection of the same network at different scales. However, the usual procedure for establishing a quality function as a direct count of the intracommunity edge weight minus that expected at random fails to provide any contribution from these interslice couplings. Because they are specified by common identifications of nodes across slices, interslice couplings are either present or absent by definition, so when they do fall inside communities, their contribution in the count of intracommunity edges exactly cancels that expected at random. In contrast, by formulating a null model in terms of stability of communities under Laplacian dynamics, we have derived a principled generalization of community detection to multislice networks, with a single parameter controlling the interslice correspondence of communities.

Fig. 1

Schematic of a multislice network. Four slices s = {1, 2, 3, 4} represented by adjacencies Aijs encode intraslice connections (solid lines). Interslice connections (dashed lines) are encoded by Cjrs, specifying the coupling of node j to itself between slices r and s. For clarity, interslice couplings are shown for only two nodes and depict two different types of couplings: (i) coupling between neighboring slices, appropriate for ordered slices; and (ii) all-to-all interslice coupling, appropriate for categorical slices.

Important to our method is the equivalence between the modularity quality function (12) [with a resolution parameter (5)] and stability of communities under Laplacian dynamics (13), which we have generalized to recover the null models for bipartite, directed, and signed networks (14). First, we obtained the resolution-parameter generalization of Barber’s null model for bipartite networks (15) by requiring the independent joint probability contribution to stability in (13) to be conditional on the type of connection necessary to step between two nodes. Second, we recovered the standard null model for directed networks (16, 17) (again with a resolution parameter) by generalizing the Laplacian dynamics to include motion along different kinds of connections—in this case, both with and against the direction of a link. By this generalization, we similarly recovered a null model for signed networks (18). Third, we interpreted the stability under Laplacian dynamics flexibly to permit different spreading weights on the different types of links, giving multiple resolution parameters to recover a general null model for signed networks (19).

We applied these generalizations to derive null models for multislice networks that extend the existing quality-function methodology, including an additional parameter ω to control the coupling between slices. Representing each network slice s by adjacencies Aijs between nodes i and j, with interslice couplings Cjrs that connect node j in slice r to itself in slice s (Fig. 1), we have restricted our attention to unipartite, undirected network slices (Aijs = Ajis) and couplings (Cjrs = Cjsr), but we can incorporate additional structure in the slices and couplings in the same manner as demonstrated for single-slice null models. Notating the strengths of each node individually in each slice by kjs = ∑iAijs and across slices by cjs = ∑rCjsr, we define the multislice strength by κjs = kjs + cjs. The continuous-time Laplacian dynamics given byEmbedded Image (1)respects the intraslice nature of Aijs and the interslice couplings of Cjsr. Using the steady-state probability distribution Embedded Image, where 2μ = ∑jrκjr, we obtained the multislice null model in terms of the probability ρis|jr of sampling node i in slice s conditional on whether the multislice structure allows one to step from (j, r) to (i, s), accounting for intra- and interslice steps separately asEmbedded Image (2)where ms = ∑jkjs. The second term in parentheses, which describes the conditional probability of motion between two slices, leverages the definition of the Cjsr coupling. That is, the conditional probability of stepping from (j, r) to (i, s) along an interslice coupling is nonzero if and only if i = j, and it is proportional to the probability Cjsrjr of selecting the precise interslice link that connects to slice s. Subtracting this conditional joint probability from the linear (in time) approximation of the exponential describing the Laplacian dynamics, we obtained a multislice generalization of modularity (14):Embedded Image (3)where we have used reweighting of the conditional probabilities, which allows a different resolution γs in each slice. We have absorbed the resolution parameter for the interslice couplings into the magnitude of the elements of Cjsr, which, for simplicity, we presume to take binary values {0,ω} indicating the absence (0) or presence (ω) of interslice links.

Community detection in multislice networks can then proceed using many of the same computational heuristics that are currently available for single-slice networks [although, as with the standard definition of modularity, one must be cautious about the resolution of communities (20) and the likelihood of complex quality landscapes that necessitate caution in interpreting results on real networks (21)]. We studied examples that have multiple resolutions [Zachary Karate Club (22)], vary over time [voting similarities in the U.S. Senate (23)], or are multiplex [the “Tastes, Ties, and Time” cohort of university students (24)]. We provide additional details for each example in (14).

We performed simultaneous community detection across multiple resolutions (scales) in the well-known Zachary Karate Club network, which encodes the friendships between 34 members of a 1970s university karate club (22). Keeping the same unweighted adjacency matrix across slices (Aijs = Aij for all s), the resolution associated with each slice is dictated by a specified sequence of γs parameters, which we chose to be the 16 values γs = {0.25, 0.5, 0.75, …, 4}. In Fig. 2, we depict the community assignments obtained for coupling strengths ω = {0, 0.1, 1} between each neighboring pair of the 16 ordered slices. These results simultaneously probe all scales, including the partition of the Karate Club into four communities at the default resolution of modularity (3, 25). Additionally, we identified nodes that have an especially strong tendency to break off from larger communities (e.g., nodes 24 to 29 in Fig. 2).

Fig. 2

Multislice community detection of the Zachary Karate Club network (22) across multiple resolutions. Colors depict community assignments of the 34 nodes (renumbered vertically to group similarly assigned nodes) in each of the 16 slices (with resolution parameters γs = {0.25, 0.5, …, 4}), for ω = 0 (top), ω = 0.1 (middle), and ω = 1 (bottom). Dashed lines bound the communities obtained using the default resolution (γ = 1).

We also considered roll call voting in the U.S. Senate across time, from the 1st Congress to the 110th, covering the years 1789 to 2008 and including 1884 distinct senator IDs (26). We defined weighted connections between each pair of senators by a similarity between their voting, specified independently for each 2-year Congress (23). We studied the multislice collection of these 110 networks, with each individual senator coupled to himself or herself when appearing in consecutive Congresses. Multislice community detection uncovered interesting details about the continuity of individual and group voting trends over time that are not captured by the union of the 110 independent partitions of the separate Congresses. Figure 3 depicts a partition into nine communities that we obtained using coupling ω = 0.5. The Congresses in which three communities appeared simultaneously are each historically noteworthy: The 4th and 5th Congresses were the first with political parties; the 10th and 11th Congresses occurred during the political drama of former Vice President Aaron Burr’s indictment for treason; the 14th and 15th Congresses witnessed the beginning of changing group structures in the Democratic-Republican party amidst the dying Federalist party (23); the 31st Congress included the Compromise of 1850; the 37th Congress occurred during the beginning of the American Civil War; the 73rd and 74th Congresses followed the landslide 1932 election (during the Great Depression); and the 85th to 88th Congresses brought the major American civil rights acts, including the congressional fights over the Civil Rights Acts of 1957, 1960, and 1964.

Fig. 3

Multislice community detection of U.S. Senate roll call vote similarities (23) with ω = 0.5 coupling of 110 slices (i.e., the number of 2-year Congresses from 1789 to 2008) across time. (A) Colors indicate assignments to nine communities of the 1884 unique senators (sorted vertically and connected across Congresses by dashed lines) in each Congress in which they appear. The dark blue and red communities correspond closely to the modern Democratic and Republican parties, respectively. Horizontal bars indicate the historical period of each community, with accompanying text enumerating nominal party affiliations of the single-slice nodes (each representing a senator in a Congress): PA, pro-administration; AA, anti-administration; F, Federalist; DR, Democratic-Republican; W, Whig; AJ, anti-Jackson; A, Adams; J, Jackson; D, Democratic; R, Republican. Vertical gray bars indicate Congresses in which three communities appeared simultaneously. (B) The same assignments according to state affiliations.

Finally, we applied multislice community detection to a multiplex network of 1640 college students at a northeastern American university (24), including symmetrized connections from the first wave of this data representing (i) Facebook friendships, (ii) picture friendships, (iii) roommates, and (iv) student housing-group preferences. Because the different connection types are categorical, the natural interslice couplings connect an individual in a slice to himself or herself in each of the other three network slices. This coupling between categorical slices thus differs from that above, which connected only neighboring (ordered) slices. Table 1 indicates the numbers of communities and the percentages of individuals assigned to one, two, three, or four communities across the four types of connections for different values of ω, as a first investigation of the relative redundancy across the connection types.

Table 1

Communities in the first wave of the multiplex “Tastes, Ties, and Time” network (24), using the default resolution (γ = 1) in each of the four slices of data (Facebook friendships, picture friendships, roommates, and housing groups) under various couplings ω across slices, which changed the number of communities and percentages of individuals assigned on a per-slice basis to one, two, three, or four communities.

View this table:

Our multislice framework makes it possible to study community structure in a much broader class of networks than was previously possible. Instead of detecting communities in one static network at a time, our formulation generalizing the Laplacian dynamics approach of (13) permits the simultaneous quality-function study of community structure across multiple times, multiple resolution parameter values, and multiple types of links. We used this method to demonstrate insights in real-world networks that would have been difficult or impossible to obtain without the simultaneous consideration of multiple network slices. Although our examples included only one kind of variation at a time, our framework applies equally well to networks that have multiple such features (e.g., time-dependent multiplex networks). We expect multislice community detection to become a powerful tool for studying such systems.

Supporting Online Material

References and Notes

  1. See supporting material on Science Online.
  2. We thank N. A. Christakis, L. Meneades, and K. Lewis for access to and helping with the “Tastes, Ties, and Time” data; S. Reid and A. L. Traud for help developing code; and A. Clauset, J.-C. Delvenne, S. Fortunato, M. Gould, and V. Traag for discussions. Congressional roll call data are from (voteview.com/CSRI/Proceedings)(26). Supported by NSF grant DMS-0645369 (P.J.M.), James S. McDonnell Foundation grant 220020177 (M.A.P.), and the Fulbright Program (J.-P.O.).
View Abstract

Navigate This Article