## Network Notation

Networks are often characterized by clusters of constituents that interact more closely with each other and have more connections to one another than they do with the rest of the components of the network. However, systematically identifying and studying such community structure in complicated networks is not easy, especially when the network interactions change over time or contain multiple types of connections, as seen in many biological regulatory networks or social networks. **Mucha et al.** (p. 876) developed a mathematical method to allow detection of communities that may be critical functional units of such networks. Application to real-world tasks—like making sense of the voting record in the U.S. Senate—demonstrated the promise of the method.

## Abstract

Network science is an interdisciplinary endeavor, with methods and applications drawn from across the natural, social, and information sciences. A prominent problem in network science is the algorithmic detection of tightly connected groups of nodes known as communities. We developed a generalized framework of network quality functions that allowed us to study the community structure of arbitrary multislice networks, which are combinations of individual networks coupled through links that connect each node in one network slice to itself in other slices. This framework allows studies of community structure in a general setting encompassing networks that evolve over time, have multiple types of links (multiplexity), and have multiple scales.

The study of graphs, or networks, has a long tradition in fields such as sociology and mathematics, and it is now ubiquitous in academic and everyday settings. An important tool in network analysis is the detection of mesoscopic structures known as communities (or cohesive groups), which are defined intuitively as groups of nodes that are more tightly connected to each other than they are to the rest of the network (*1*–*3*). One way to quantify communities is by a quality function that compares the number of intracommunity edges to what one would expect at random. Given the network adjacency matrix **A**, where the element *A _{ij}* details a direct connection between nodes

*i*and

*j*, one can construct a quality function

*Q*(

*4*,

*5*) for the partitioning of nodes into communities as

*Q*= ∑

*(*

_{ij}*A*−

_{ij}*P*)δ(

_{ij}*g*,

_{i}*g*), where δ(

_{j}*g*,

_{i}*g*) = 1 if the community assignments

_{j}*g*and

_{i}*g*of nodes

_{j}*i*and

*j*are the same and 0 otherwise, and

*P*is the expected weight of the edge between

_{ij}*i*and

*j*under a specified null model.

The choice of null model is a crucial consideration in studying network community structure (*2*). After selecting a null model appropriate to the network and application at hand, one can use a variety of computational heuristics to assign nodes to communities to optimize the quality *Q*(*2*, *3*). However, such null models have not been available for time-dependent networks; analyses have instead depended on ad hoc methods to piece together the structures obtained at different times (*6*–*9*) or have abandoned quality functions in favor of such alternatives as the Minimum Description Length principle (*10*). Although tensor decompositions (*11*) have been used to cluster network data with different types of connections, no quality-function method has been developed for such multiplex networks.

We developed a methodology to remove these limits, generalizing the determination of community structure via quality functions to multislice networks that are defined by coupling multiple adjacency matrices (Fig. 1). The connections encoded by the network slices are flexible; they can represent variations across time, variations across different types of connections, or even community detection of the same network at different scales. However, the usual procedure for establishing a quality function as a direct count of the intracommunity edge weight minus that expected at random fails to provide any contribution from these interslice couplings. Because they are specified by common identifications of nodes across slices, interslice couplings are either present or absent by definition, so when they do fall inside communities, their contribution in the count of intracommunity edges exactly cancels that expected at random. In contrast, by formulating a null model in terms of stability of communities under Laplacian dynamics, we have derived a principled generalization of community detection to multislice networks, with a single parameter controlling the interslice correspondence of communities.

Important to our method is the equivalence between the modularity quality function (*12*) [with a resolution parameter (*5*)] and stability of communities under Laplacian dynamics (*13*), which we have generalized to recover the null models for bipartite, directed, and signed networks (*14*). First, we obtained the resolution-parameter generalization of Barber’s null model for bipartite networks (*15*) by requiring the independent joint probability contribution to stability in (*13*) to be conditional on the type of connection necessary to step between two nodes. Second, we recovered the standard null model for directed networks (*16*, *17*) (again with a resolution parameter) by generalizing the Laplacian dynamics to include motion along different kinds of connections—in this case, both with and against the direction of a link. By this generalization, we similarly recovered a null model for signed networks (*18*). Third, we interpreted the stability under Laplacian dynamics flexibly to permit different spreading weights on the different types of links, giving multiple resolution parameters to recover a general null model for signed networks (*19*).

We applied these generalizations to derive null models for multislice networks that extend the existing quality-function methodology, including an additional parameter ω to control the coupling between slices. Representing each network slice *s* by adjacencies *A _{ijs}* between nodes

*i*and

*j*, with interslice couplings

*C*that connect node

_{jrs}*j*in slice

*r*to itself in slice

*s*(Fig. 1), we have restricted our attention to unipartite, undirected network slices (

*A*=

_{ijs}*A*) and couplings (

_{jis}*C*=

_{jrs}*C*), but we can incorporate additional structure in the slices and couplings in the same manner as demonstrated for single-slice null models. Notating the strengths of each node individually in each slice by

_{jsr}*k*= ∑

_{js}*and across slices by*

_{i}A_{ijs}*c*= ∑

_{js}*, we define the multislice strength by κ*

_{r}C_{jsr}*=*

_{js}*k*+

_{js}*c*. The continuous-time Laplacian dynamics given by (1)respects the intraslice nature of

_{js}*A*and the interslice couplings of

_{ijs}*C*. Using the steady-state probability distribution , where 2μ = ∑

_{jsr}*κ*

_{jr}*, we obtained the multislice null model in terms of the probability ρ*

_{jr}

_{is}_{|}

*of sampling node*

_{jr}*i*in slice

*s*conditional on whether the multislice structure allows one to step from (

*j*,

*r*) to (

*i*,

*s*), accounting for intra- and interslice steps separately as (2)where

*m*= ∑

_{s}*. The second term in parentheses, which describes the conditional probability of motion between two slices, leverages the definition of the*

_{j}k_{js}*C*coupling. That is, the conditional probability of stepping from (

_{jsr}*j*,

*r*) to (

*i*,

*s*) along an interslice coupling is nonzero if and only if

*i*=

*j*, and it is proportional to the probability

*C*/κ

_{jsr}*of selecting the precise interslice link that connects to slice*

_{jr}*s*. Subtracting this conditional joint probability from the linear (in time) approximation of the exponential describing the Laplacian dynamics, we obtained a multislice generalization of modularity (

*14*): (3)where we have used reweighting of the conditional probabilities, which allows a different resolution γ

*in each slice. We have absorbed the resolution parameter for the interslice couplings into the magnitude of the elements of*

_{s}*C*, which, for simplicity, we presume to take binary values {0,ω} indicating the absence (0) or presence (ω) of interslice links.

_{jsr}Community detection in multislice networks can then proceed using many of the same computational heuristics that are currently available for single-slice networks [although, as with the standard definition of modularity, one must be cautious about the resolution of communities (*20*) and the likelihood of complex quality landscapes that necessitate caution in interpreting results on real networks (*21*)]. We studied examples that have multiple resolutions [Zachary Karate Club (*22*)], vary over time [voting similarities in the U.S. Senate (*23*)], or are multiplex [the “Tastes, Ties, and Time” cohort of university students (*24*)]. We provide additional details for each example in (*14*).

We performed simultaneous community detection across multiple resolutions (scales) in the well-known Zachary Karate Club network, which encodes the friendships between 34 members of a 1970s university karate club (*22*). Keeping the same unweighted adjacency matrix across slices (*A _{ijs}* =

*A*for all

_{ij}*s*), the resolution associated with each slice is dictated by a specified sequence of γ

*parameters, which we chose to be the 16 values γ*

_{s}*= {0.25, 0.5, 0.75, …, 4}. In Fig. 2, we depict the community assignments obtained for coupling strengths ω = {0, 0.1, 1} between each neighboring pair of the 16 ordered slices. These results simultaneously probe all scales, including the partition of the Karate Club into four communities at the default resolution of modularity (*

_{s}*3*,

*25*). Additionally, we identified nodes that have an especially strong tendency to break off from larger communities (e.g., nodes 24 to 29 in Fig. 2).

We also considered roll call voting in the U.S. Senate across time, from the 1st Congress to the 110th, covering the years 1789 to 2008 and including 1884 distinct senator IDs (*26*). We defined weighted connections between each pair of senators by a similarity between their voting, specified independently for each 2-year Congress (*23*). We studied the multislice collection of these 110 networks, with each individual senator coupled to himself or herself when appearing in consecutive Congresses. Multislice community detection uncovered interesting details about the continuity of individual and group voting trends over time that are not captured by the union of the 110 independent partitions of the separate Congresses. Figure 3 depicts a partition into nine communities that we obtained using coupling ω = 0.5. The Congresses in which three communities appeared simultaneously are each historically noteworthy: The 4th and 5th Congresses were the first with political parties; the 10th and 11th Congresses occurred during the political drama of former Vice President Aaron Burr’s indictment for treason; the 14th and 15th Congresses witnessed the beginning of changing group structures in the Democratic-Republican party amidst the dying Federalist party (*23*); the 31st Congress included the Compromise of 1850; the 37th Congress occurred during the beginning of the American Civil War; the 73rd and 74th Congresses followed the landslide 1932 election (during the Great Depression); and the 85th to 88th Congresses brought the major American civil rights acts, including the congressional fights over the Civil Rights Acts of 1957, 1960, and 1964.

Finally, we applied multislice community detection to a multiplex network of 1640 college students at a northeastern American university (*24*), including symmetrized connections from the first wave of this data representing (i) Facebook friendships, (ii) picture friendships, (iii) roommates, and (iv) student housing-group preferences. Because the different connection types are categorical, the natural interslice couplings connect an individual in a slice to himself or herself in each of the other three network slices. This coupling between categorical slices thus differs from that above, which connected only neighboring (ordered) slices. Table 1 indicates the numbers of communities and the percentages of individuals assigned to one, two, three, or four communities across the four types of connections for different values of ω, as a first investigation of the relative redundancy across the connection types.

Our multislice framework makes it possible to study community structure in a much broader class of networks than was previously possible. Instead of detecting communities in one static network at a time, our formulation generalizing the Laplacian dynamics approach of (*13*) permits the simultaneous quality-function study of community structure across multiple times, multiple resolution parameter values, and multiple types of links. We used this method to demonstrate insights in real-world networks that would have been difficult or impossible to obtain without the simultaneous consideration of multiple network slices. Although our examples included only one kind of variation at a time, our framework applies equally well to networks that have multiple such features (e.g., time-dependent multiplex networks). We expect multislice community detection to become a powerful tool for studying such systems.

## Supporting Online Material

## References and Notes

- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵See supporting material on
*Science*Online. - ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- ↵
- We thank N. A. Christakis, L. Meneades, and K. Lewis for access to and helping with the “Tastes, Ties, and Time” data; S. Reid and A. L. Traud for help developing code; and A. Clauset, J.-C. Delvenne, S. Fortunato, M. Gould, and V. Traag for discussions. Congressional roll call data are from (voteview.com/CSRI/Proceedings)(
*26*). Supported by NSF grant DMS-0645369 (P.J.M.), James S. McDonnell Foundation grant 220020177 (M.A.P.), and the Fulbright Program (J.-P.O.).