Research Article

The Hidden Geometry of Complex, Network-Driven Contagion Phenomena

See allHide authors and affiliations

Science  13 Dec 2013:
Vol. 342, Issue 6164, pp. 1337-1342
DOI: 10.1126/science.1245200

This article has a correction. Please see:

Predicting Disease Dissemination

In combating the global spread of an emerging infectious disease, answers must be obtained to three crucial questions: Where did the disease emerge? Where will it go next? When will it arrive? Brockmann and Helbing (p. 1337; see the Perspective by McLean) analyzed disease spread via the “effective distance” rather than geographical distance, wherein two locations that are connected by a strong link are effectively close. The approach was successfully applied to predict disease arrival times or disease source using data from the the 2003 SARS viral epidemic, 2009 H1N1 influenza pandemic, and the 2011 foodborne enterohaemorrhagic Escherichia coli outbreak in Germany.


The global spread of epidemics, rumors, opinions, and innovations are complex, network-driven dynamic processes. The combined multiscale nature and intrinsic heterogeneity of the underlying networks make it difficult to develop an intuitive understanding of these processes, to distinguish relevant from peripheral factors, to predict their time course, and to locate their origin. However, we show that complex spatiotemporal patterns can be reduced to surprisingly simple, homogeneous wave propagation patterns, if conventional geographic distance is replaced by a probabilistically motivated effective distance. In the context of global, air-traffic–mediated epidemics, we show that effective distance reliably predicts disease arrival times. Even if epidemiological parameters are unknown, the method can still deliver relative arrival times. The approach can also identify the spatial origin of spreading processes and successfully be applied to data of the worldwide 2009 H1N1 influenza pandemic and 2003 SARS epidemic.

Movie S1

Watch Video

The geographic spread of emergent infectious diseases affects the lives of tens of thousands or even millions of people (1, 2). Recent examples of emergent diseases are the SARS epidemic of 2003, the 2009 H1N1 influenza pandemic, and most recently a new strain (H7N9) of avian influenza virus (3, 4). Progressing worldwide urbanization, combined with growing connectivity among metropolitan centers, has increased the risk that highly virulent emergent pathogens will spread (58). The complexity of global human mobility, particularly air traffic (Fig. 1A), makes it increasingly difficult to develop effective containment and mitigation strategies on the time scale imposed by the speed at which modern diseases can spread (911). Because timely, accurate, and focused action can potentially save the lives of many people and reduce the socio-economic impact of infectious diseases (12, 13), understanding global disease dynamics has become a major 21st-century challenge. Unraveling the core mechanisms that underlie these phenomena and being able to distinguish key factors from peripheral ones are required to develop quantitative, efficient, and predictive models that public health authorities can employ to assess situations quickly, make informed decisions, and optimize vaccination and drug delivery plans. After the initial outbreak of an epidemic, the key questions are as follows: (i) Where did the novel pathogen emerge? (ii) Where are new cases to be expected? (iii) When is an epidemic going to arrive at distant locations? (iv) How many cases are to be expected?

Fig. 1 Complexity in global, network-driven contagion phenomena.

(A) The global mobility network (GMN). Gray lines represent passenger flows along direct connections between 4069 airports worldwide. Geographic regions are distinguished by color [classified according to network modularity maximization (39)]. (B) Temporal snapshot of a simulated global pandemic with initial outbreak location (OL) in Hong Kong (HKG). The simulation is based on the metapopulation model defined by Eq. 3 with parameters R0 = 1.5, β = 0.285 day–1, γ = 2.8 × 10–3 day–1, ε = 10–6. Red symbols depict locations with epidemic arrival times in the time window 105 days ≤ Ta ≤ 110 days. Because of the multiscale structure of the underlying network, the spatial distribution of disease prevalence (i.e., the fraction of infected individuals) lacks geometric coherence. No clear wavefront is visible, and based on this dynamic state, the OL cannot be easily deduced. (C) For the same simulation as in (B), the panel depicts arrival times Ta as a function of geographic distance Dg from the OL [nodes are colored according to geographic region as in (A)] for each of the 4069 nodes in the network. On a global scale, Ta weakly correlates with geographic distance Dg (R2 = 0.34). A linear fit yields an average global spreading speed of vg = 331 km/day (see also fig. S7). Using Dg and vg to estimate arrival times for specific locations, however, does not work well owing to the strong variability of the arrival times for a given geographic distance. The red horizontal bar corresponds to the arrival time window shown in (B). (D) Arrival times versus geographic distance from the source (Mexico) for the 2009 H1N1 pandemic. Symbols represent 140 affected countries, and symbol size quantifies total traffic per country. Arrival times are defined as the date of the first confirmed case in a given country after the initial outbreak on 17 March 2009. As in the simulated scenario, arrival time and geographic distance are only weakly correlated (R2 = 0.0394). (E) In analogy to (D), the panel depicts the arrival times versus geographic distance from the source (China) of the 2003 SARS epidemic for 29 affected countries worldwide. Arrival times are taken from WHO published data (2). As in (C) and (D), arrival time correlates weakly with geographic distance.

Historically, for cases like the spread of the Black Death in Europe, reaction-diffusion models have been quite useful in addressing these questions (14, 15). Despite their high level of abstraction, these models provide a solid intuition and understanding of spreading processes. Their mathematical simplicity permits the assessment of key properties, e.g., spreading speed, arrival times, and how pattern geometry depends on system parameters (16). However, because of long-distance travel, simple reaction-diffusion models are inadequate for the description of today’s complex, spatially incoherent spreading patterns that generically bear no metric regularity, that depend sensitively on model parameters and initial conditions (1720) (Fig. 1, B to E, and fig. S2).

Consequently, scientists have been developing powerful, large-scale computational models and sophisticated, parameter-rich epidemic simulators that tackle the above key questions in detailed ways. These consider demographics, mobility, and epidemiological data, as well as disease-specific mechanisms, all of which are believed to play a role (2123). Models range from high-level stochastic metapopulation models (5, 20, 24) to agent-based computer simulations that account for the behavior and interactions of millions of individuals in large populations (25). These approaches have become remarkably successful in reproducing observed patterns and predicting the temporal evolution of ongoing epidemics (26). Many such models reproduce similar dynamic features despite major differences in their underlying assumptions and data (27). The abundance of different, often mutually incompatible, models suggests that we still lack a fundamental understanding of the key factors that determine the observed spatiotemporal dynamics. It is unclear how the multitude of factors shape the dynamics and how much detail is required to achieve a certain level of predictive fidelity. Moreover, detailed computational models that incorporate all potentially relevant factors ab initio do not inform which factors are actually relevant and which ones are not (28). They are also hard to calibrate and of limited use when the knowledge of epidemiological parameters is uncertain.

Here, we propose an intuitive and efficient approach that remedies the situation by connecting the conceptual power of simple reaction-diffusion systems with the predictive power of high-level, computational models. Our approach is based on the idea of replacing conventional geographic distance by a measure of effective distance derived from the underlying mobility network. Based on this novel notion of distance, patterns that exhibit complex spatiotemporal structure in the conventional geographic perspective turn into regular, wavelike solutions reminiscent of simple reaction-diffusion systems. This permits the definition of effective epidemic wavefronts, propagation speeds, and the reliable estimation of epidemic arrival times, based on the knowledge of the underlying mobility network. The method, however, goes beyond remapping data. It provides two key insights. First, epidemiological parameters enter the spreading dynamics separately from the transport parameters, and second, the dynamics is dominated by only a small percentage of transport connections. Furthermore, our approach can quickly identify the geographic origin of emergent diseases, using temporal snapshots of the spatial disease distribution. This detection of the origin of complex, multiscale dynamical spreading patterns is important for three reasons: (i) to determine what has caused the disease, (ii) to develop timely mitigation strategies, and (iii) to predict its further spread (the arrival times in remote locations and the expected prevalence).

Modeling Network-Driven Contagion Phenomena

For illustration, we consider a complex network of coupled populations (a metapopulation) in which the local disease time course is described by a conventional susceptible-infected-recovered (SIR) dynamics (1):

Embedded Image (1)

where Nn is the population size of population n, M is the number of populations, and Sn, In, Embedded Image are absolute numbers of susceptible, infected, and recovered individuals, respectively. Parameter β is the mean recovery rate of individuals (for influenza-like diseases β–1 = 3 to 5 days), and R0 = α/β is the basic reproduction ratio (for which we assume typical values in the range 1.4 to 2.9). (The focus on SIR kinetics is not essential, as the following results are also valid for other types of local dynamics.) Each local population represents a node n in the global mobility network (GMN), depicted in Fig. 1A. In addition to the local dynamics, individuals travel between nodes according to the rate equation

Embedded Image (2)

where Un is a placeholder for the classes Sn, In, and Rn. The quantities wnm = Fnm/Nm represent the per-capita traffic flux from location m to n. Weighted links Fnm quantify direct air traffic (passengers per day) from node m to node n. The GMN is constructed from the worldwide air traffic between 4069 airports with 25,453 direct connections. Details on the data and network construction are provided in the supplementary materials (e.g., fig. S1 and table S1) (5, 13, 20, 29). The total network traffic is approximately Embedded Image passengers per day. Assuming that the total traffic in and out of a node is proportional to its population size, Eqs. 1 and 2 can be rewritten as

Embedded Image(3)

with sn = Sn/Nn, jn = In/Nn, and rn = 1 – snjn. A detailed derivation is provided in the supplementary text. The mobility parameter γ is the average mobility rate, i.e., Embedded Image, where Embedded Image is the total population in the system. This yields numerical values in the range γ = 0.0013 – 0.0178 day–1. The matrix Embedded Image with 0 ≤ Pmn ≤ 1 quantifies the fraction of the passenger flux with destination m emanating from node n, i.e., Pmn =Fmn/Fn, where Embedded Image. The additional sigmoid function Embedded Image with gain parameter η >> 0 accounts for the local invasion threshold ε and fluctuation effects for jn < ε (3032). Typical parameter choices for ε and η are Embedded Image and Embedded Image. Our results are robust with respect to changes in these parameters (e.g., figs. S5 and S13).

Figure 1B shows a temporal snapshot of the dynamical system defined by Eq. 3 for a hypothetical pandemic with initial outbreak location (OL) in Hong Kong (HKG) (see also Fig. 2B and fig. S2 for temporal sequences of the dynamical system for various other OLs). Generally, the metapopulation model above and related models used in the past generate solutions that are characterized by similar qualitative features. First, only during the early stage of the process does the prevalence jn(t) (i.e., the fraction of infected individuals) correlate significantly with geographic distance from the OL. Second, at intermediate and later stages, the multiscale structure of the GMN induces a spatial decoherence of the spreading pattern. Third, despite the global connectivity, the spatiotemporal patterns do not converge to the same pattern, i.e., spatiotemporal differences are not a transient effect (figs. S3 to S6 and movies S1 to S3). This type of complexity sharply contrasts the generic behavior of ordinary reaction-diffusion systems, which typically exhibit spatially coherent wavefronts.

Fig. 2 Understanding global contagion phenomena using effective distance.

(A) The structure of the shortest path tree (in gray) from Hong Kong (central node). Radial distance represents effective distance Deff as defined by Eqs. 4 and 5. Nodes are colored according to the same scheme as in Fig. 1A. (B) The sequence (from left to right) of panels depicts the time course of a simulated model disease with initial outbreak in Hong Kong (HKG), for the same parameter set as used in Fig. 1B. Prevalence is reflected by the redness of the symbols. Each panel compares the state of the system in the conventional geographic representation (bottom) with the effective distance representation (top). The complex spatial pattern in the conventional view is equivalent to a homogeneous wave that propagates outwards at constant effective speed in the effective distance representation. (C) Epidemic arrival time Ta versus effective distance Deff for the same simulated epidemic as in (B). In contrast to geographic distance (Fig. 1C), effective distance correlates strongly with arrival time (R2 = 0.973), i.e., effective distance is an excellent predictor of arrival times. (D and E) Linear relationship between effective distance and arrival time for the 2009 H1N1 pandemic (D) and the 2003 SARS epidemic (E). The arrival time data are the same as in Fig. 1, D and E. The effective distance was computed from the projected global mobility network between countries. As in the model system, we observe a strong correlation between arrival time and effective distance.

Most Probable Paths and Effective Distance

The key idea we pursue here is that, despite the structural complexity of the underlying network, the redundancy of connections, and the multiplicity of paths a contagion phenomenon can take, the dynamic process is dominated by a set of most probable trajectories that can be derived from the connectivity matrix P. This hypothesis is analogous to the dominance of the smallest resistor in a strongly heterogeneous electrical network with parallel conducting lines. Given the flux-fraction 0 ≤ Pmn ≤ 1, i.e., the fraction of travelers that leave node n and arrive at node m, we define the effective distance dnm from a node n to a connected node m as

Embedded Image (4)

This concept of effective distance reflects the idea that a small fraction of traffic nm is effectively equivalent to a large distance, and vice versa. As explained in more detail in the supplementary text, the logarithm is a consequence of the requirement that effective lengths are additive, whereas probabilities along multistep paths are multiplicative. Eq. 4 defines a quasi-distance, which is generally asymmetric, i.e., Embedded Image. The lack of symmetry is analogous to a road network of one-way streets, where the shortest distance from A to B may differ from the one from B to A. This asymmetry captures the effect that a randomly seeded disease in a peripheral node of the network has a higher probability of being transmitted to a well-connected hub than vice versa (figs. S8 to S10). More properties of effective distance as defined by Eq. 4 are discussed in the supplementary text. On the basis of effective distance, we can define the directed length Embedded Image of an ordered path Embedded Image as the sum of effective lengths along the legs of the path. Moreover, we define the effective distance Dmn from an arbitrary reference node n to another node m in the network by the length of the shortest path from n to m:

Embedded Image (5)

Again, we typically have Embedded Image. From the perspective of a chosen origin node n, the set of shortest paths to all other nodes constitutes a shortest path tree Ψn (Fig. 2A), illustrating the most probable sequence of paths from the root node n to the other nodes.

Effective Distance Perspective Reveals Hidden Pattern Geometry

The key question is how, compared to the conventional geographic representation, the same spreading process evolves on the shortest path tree. Figure 2B portrays this comparison. We see that the effective distance representation has notable advantages: It reveals simple coherent wave fronts, whereas spatiotemporal patterns in geographical space are complex, incoherent, and hard to understand. This is a generic feature that is robust against variations in epidemic parameters and true for any choice of the OL (figs. S11 and S12). Using effective distance, one can thus calculate the spreading speed and arrival times of a disease, and determine functional relationships between epidemiological and mobility parameters. The dynamic simplicity in the new representation is much more than just a trivial visual rearrangement of the spatiotemporal pattern. Simple propagating waves in the new perspective imply that the contagion process is dominated by most probable paths, as this is the underlying assumption in the derivation of Eq. 5. Also, effective distance and the shortest path trees only depend on the static mobility matrix P. This implies that, on a spatial scale described by the metapopulation model (Eq. 3), the complexity of the spatiotemporal pattern is largely determined by the structure of the mobility component in Eq. 3 and not by the nonlinearities or the disease-specific, epidemiological rate parameters of the model.

Figure 2C presents the correlation of arrival times Ta with effective distances Deff for the simulation shown in Fig. 2B. Compared to Fig. 1C, this demonstrates that effective distance generates a much higher correlation than geographic distance (Embedded Image compared to Embedded Image; see tables S2 and S3 and fig. S12 for more examples). Furthermore, the relationship of Ta and Deff is linear, which means that the effective speed veff = Deff/Ta of the wavefront is a well-defined constant. To compare the regression quality, we computed the distribution of relative residuals r = δTa/Ta, using effective or geographic distance as a regressor. The ratio of residual variances implies a more than 50-fold higher prediction quality (table S3 and fig. S13).

Although we have demonstrated the clear linear functional relationship for simulated, hypothetical scenarios of global disease spread, it is crucial to test the validity and usefulness of the effective distance approach on empirical data. Figure 2, D and E, depict arrival time versus effective distance on the basis of data for the 2009 H1N1 pandemic and the global 2003 SARS epidemic, respectively (figs. S14 to S16 and table S4). Arrival times are the same as in Fig. 1, D and E, but shown across effective rather than geographic distances. As the empirical data are available on a country resolution, we determined the traffic between countries by aggregation to specify a coarse-grained network (GMNc) (189 nodes, 5004 links) and effective distances from the origin location in each case (see supplementary text for details). Both the H1N1 and SARS data exhibit a clear linear relationship between arrival time and effective distance from the source, even though additional factors complicate the spreading of real diseases. Fluctuations, effects due to coarse graining, and errors in arrival-time measurements can add noise to the system, which increases the scatter in the linear relationship. To address the general validity of the observed effects, we also analyzed data generated by the global epidemic and mobility model (GLEAM) (, a sophisticated epidemic simulation framework (21). GLEAM incorporates air transportation and local commuter traffic on a global scale, is fully stochastic, and permits the simulation of infectious state–dependent mobility behavior, clinical states, antiviral statement, and more. The results of this analysis are shown in figs. S17 to S19 and are consistent with our claims.

Relative Arrival Times Are Independent of Epidemic Parameters

Our results reveal an important, approximate relationship between the system parameters, which can be summarized as follows:

Embedded Image(6)

This equation states that arrival times can be computed with high fidelity based on the effective distances Deff and effective spreading speed veff, and that each factor depends on different parameters of the dynamical system. The epidemiological parameters determine the effective speed, whereas effective distance depends only on the topological features of the static underlying network, i.e., the matrix P. When confronted with the outbreak of an emergent infectious disease, one of the key problems is that the disease-specific parameters are typically unknown in the beginning, and simulations based on plausible parameter ranges typically exhibit substantial variability in predicted outcomes. However, Eq. 6 allows us to compute relative arrival times without knowledge of these parameters. If, for example, the outbreak node is labeled k, while n and m are arbitrary nodes, then Ta(n|k)/Ta(m|k) = Deff(n|k)/Deff(m|k). Equation 6 states that the effective speed veff is a global property, independent of the mobility network and the outbreak location. Thus, irrespective of mobility and OL, one can investigate how the effective speed depends on rate parameters of the system.

Origin of Outbreak Reconstruction Based on Effective Distance

The concept of effective distance is particularly valuable for solving the aforementioned inverse problem: Given a spatially distributed prevalence pattern that was generated by an underlying, and potentially hidden, spreading mechanism, how can we determine the most likely initial outbreak location (33, 34)? A crucial property of the effective distance perspective is that spreading concentricity (Fig. 2B) is only observed from the perspective of the actual OL. Therefore, given a spatiotemporal snapshot of the spreading dynamics (Fig. 3A), we can represent the spreading process from the perspective of candidate OLs (Fig. 3C) and determine the degree of concentricity of the pattern in effective distance. Compared to alternative OL candidates, the representation for the actual OL exhibits the most concentric shape, identifying this node as the correct outbreak location. This type of qualitative analysis can be made more systematic by introducing a measure for concentricity. We investigated two conceptually different approaches. First, for every one of the 4069 potential outbreak locations n, we computed the shortest path tree Ψn, the effective distance to all other locations Deff(m|n), and arrival times Ta(m|n). For each candidate node, we computed the correlation coefficient c(Ta, Deff) of effective distance and arrival time for candidate location n. This approach should yield the highest correlation when n is the actual outbreak location. As expected, this is the case for the H1N1 and SARS data sets (Fig. 4, A and B). However, this approach requires knowledge of the entire time course of the epidemic, e.g., arrival times at all locations, which is typically not available in real situations. Therefore, we used an alternative approach, mathematically similar to surface roughness characterization (35), that only requires dynamic information in a small time window, e.g., one snapshot of the spreading pattern. For each of the potential candidate outbreak locations, we computed the effective distance to the subset of nodes with prevalence above a certain threshold, e.g., the red symbols in the patterns of Fig. 3A or Fig. 1B. On the basis of this set of effective distances (denoted by F), we compute the mean μF(Deff) and standard deviation σF(Deff). Concentricity increases with a combined minimization of mean and standard deviation (supplementary text). Figure 4C depicts the distribution of ensemble-normalized pairs [μF(Deff), σF(Deff)] for a simulated scenario at four different times. For all time points, the actual outbreak location is well separated from the remaining point cloud and closest to the origin. This shows that the effective distance perspective is unique from the actual outbreak location and that knowledge of a temporal snapshot of the spreading state combined with knowledge of the underlying mobility network is a powerful tool for outbreak reconstruction.

Fig. 3 Qualitative outbreak reconstruction based on effective distance.

(A) Spatial distribution of prevalence jn(t) at time T = 81 days for OL Chicago (parameters β = 0.28 day–1, R0 = 1.9, γ = 2.8 × 10–3 day–1, and ε = 10–6). After this time, it is difficult, if not impossible, to determine the correct OL from snapshots of the dynamics. (B) Candidate OLs chosen from different geographic regions. (C) Panels depict the state of the system shown in (A) from the perspective of each candidate OL, using each OL’s shortest path tree representation. Only the actual OL (ORD, circled in blue) produces a circular wavefront. Even for comparable North American airports [Atlanta (ATL), Toronto (YYZ), and Mexico City (MEX)], the wavefronts are not nearly as concentric. Effective distances thus permit the extraction of the correct OL, based on information on the mobility network and a single snapshot of the dynamics.

Fig. 4 Quantitative outbreak reconstruction based on effective distance.

(A) Correlation-based outbreak reconstruction using arrival-time data for the 2009 H1N1 pandemic. For each country n as a potential source, we computed the correlation coefficient c(Ta, Deff) of arrival time Ta(m|n) and effective distance Deff(m|n). According to our analysis, high correlation implies a concentric pattern from the chosen node. The panel depicts c(Ta, Deff) as a function of rank. Mexico (followed by the United States) exhibits the largest correlation, consistent with the known outbreak scenario. (B) Correlation rank diagram for the 2003 SARS epidemic analogous to (A). Here, China is correctly identified as the source. (C) Sequence of panels depicting an alternative outbreak reconstruction technique based on temporal snapshots in the scenario of a simulated pandemic with outbreak in Chicago (ORD) with parameters identical to those used in Fig. 3. At each time, we computed the mean µF and variance σF of effective distance (see text) from each potential OL to the wavefront. A combined low mean and variance are equivalent to a high degree of concentricity. The red symbol corresponds to the actual OL, which is clearly separated from the point cloud. Symbol size quantifies total traffic per node. (D) Along the same lines as in (C), the panels depict data pairs (µF, σF) for all countries as outbreak candidates for the H1N1 pandemic during four different weeks of the pandemic. As described in the text, we used a course-grained, country-resolution global mobility network (GMNc). Although the actual outbreak location does not separate from the main point cloud as much as in the simulated scenario, the actual outbreak location is nevertheless identified as the point with minimum combined (µF, σF), except for the last time frame where the approach would identify the United States as the source. (E) Outbreak reconstruction for the 2011 EHEC-HUS outbreak, using a model and data for food distribution among 412 districts in Germany as the underlying network. The actual district Uelzen is correctly identified in all but the last time windows. For all times, however, some other locations possess comparatively low values of (µF, σF) as well. Although this makes definite source identification difficult, it substantially reduces the number of potential outbreak locations.

Although our method works well for simulation-generated data (i.e., disease dynamics generated by Eq. 3), real data pose additional challenges: (i) Data are subject to inaccuracies and incompleteness in prevalence counts; (ii) fluctuations, not captured explicitly by our model, may play a particular role during the onset of an epidemic; and (iii) response and mitigation measures that can change the time course of disease dynamics are not accounted for by our model. Therefore, to assess the applicability of our approach in a realistic context, we validated the effective distance method using data on the 2009 H1N1 pandemic and the 2011 outbreak of food-borne enterohemorrhagic Escherichia coli (EHEC) O104:H4/HUS in Germany with ~4000 cases and 53 deaths. Although the application to the H1N1 pandemic is a proof-of-concept application, as finding the spatial origin on a country resolution was actually not the problem that we investigated here, reconstructing the spatial outbreak origin during the EHEC-HUS epidemic (district Uelzen in Northern Germany) was notoriously difficult because of the spatial incoherence of reported cases. For the application to H1N1, we used data of the worldwide prevalence count by country in weeks 14 to 30 of 2009 (36). For EHEC-HUS, we constructed a network of food distribution in Germany using a gravity model for transportation networks (37). For the spatial prevalence, we used data on case counts per district in Germany (38).

Figure 4D illustrates the results for H1N1. In analogy to Fig. 4C, we use four distinct time windows (weeks 24, 26, 27, and 29). The method successfully identifies Mexico as the source of this event, even though the time windows cover a 2-month period when the pandemic’s peak prevalence had already reached a broad geographical distribution (fig. S16). Only as late as week 29, another country (the United States) is incorrectly identified as the likely outbreak location.

Figure 4E depicts the analogous results for the 2011 EHEC-HUS epidemic, where disease spreading was promoted not by air transportation, but by food transport. The nodes in the network are 412 administrative districts in Germany, coupled by the food supply network of the country. As time windows, we chose weeks 3 to 6 after onset. For this epidemic, a local farm in Bienenbüttel, district Uelzen, was later identified as the source of contaminated sprouts (38). On the basis of prevalence distribution in the entire country, the effective distance method correctly identifies district Uelzen as the most likely geographic source. However, in this case, the separation in the mean/standard deviation diagram is not as pronounced as for disease spread by air passenger flows. Nevertheless, although the method cannot identify the OL with full reliability here, it dramatically reduces the set of potential origin locations.

In both real-world scenarios—the 2011 EHEC/HUS epidemic and the 2009 H1N1 pandemic—the OL reconstruction works surprisingly well, despite the intrinsic fluctuations and the low-incidence regime. The unexpected degree of predictability indicates that the set of links in the network contributing the shortest paths accumulates a substantial fraction of the overall transmission probability (and that this set is almost identical from the perspective of all nodes; fig. S20).


In summary, the analysis of global disease dynamics in the framework of effective distances enables researchers to understand complex contagion dynamics in multiscale networks with simple reaction-diffusion models. Given fixed values for epidemic parameters, our analysis shows that network and flux information are sufficient to predict the dynamics and arrival times. The method is a promising starting point for more detailed investigations, including the functional dependencies of key epidemic variables such as the spreading speed and related macroscopic quantities on epidemiological parameters. The successful application to real epidemic data suggests that our method is also of practical use. Finally, it seems promising to generalize the effective distance method to other contagion phenomena, such as human-mediated bioinvasion and the spread of rumors or violence, a subject of ever-more importance in an increasingly connected society.

Supplementary Materials

Supplementary Text

Figs. S1 to S20

Tables S1 to S4

Movies S1 to S3

References (4045)

References and Notes

  1. Acknowledgments: D.B. thanks W. Kath, D. Grady, W. H. Grond, O. Woolley-Meza, and A. Bentley for fruitful discussions and comments and W. Moers, B. May, and FuturICT for inspiration. We thank R. Brune for contributions during the early phase of the project and for work on the origin reconstruction and C. Thiemann for the development of the SPaTo network visualization tool ( Global mobility data was provided by OAG (, prevalence data of H1N1 and SARS by the WHO (, and EHEC data by the RKI-Survstat ( This work was supported by the Volkswagen Foundation (project: “Bioinvasion and epidemic spread in complex transportation networks”) and partially supported by the ETH project “Systemic Risks, Systemic Solutions” (CHIRP II project ETH 48 12-1).

Stay Connected to Science

Navigate This Article