## Abstract

Quantifying long-range dissemination of infectious diseases is a key issue in their dynamics and control. Here, we use influenza-related mortality data to analyze the between-state progression of interpandemic influenza in the United States over the past 30 years. Outbreaks show hierarchical spatial spread evidenced by higher pairwise synchrony between more populous states. Seasons with higher influenza mortality are associated with higher disease transmission and more rapid spread than are mild ones. The regional spread of infection correlates more closely with rates of movement of people to and from their workplaces (workflows) than with geographical distance. Workflows are described in turn by a gravity model, with a rapid decay of commuting up to around 100 km and a long tail of rare longer range flow. A simple epidemiological model, based on the gravity formulation, captures the observed increase of influenza spatial synchrony with transmissibility; high transmission allows influenza to spread rapidly beyond local spatial constraints.

Influenza epidemics occur every year during the winter season in temperate areas of the world and result in considerable morbidity, mortality, and economic burden (*1*). The virological basis for these recurrent epidemics is a continual process of small changes in influenza surface antigens to escape host immunity (antigenic drift) (*2*). A/H3N2 viruses have the highest rate of evolution among the three influenza subtypes currently circulating (*3*), with antigenically distinct strains emerging on average every 2 to 5 years (*2*); influenza-related mortality and severe morbidity are highest in seasons where A/H3N2 viruses predominate, as compared with A/H1N1 or B (*4*). Infrequently, viruses with novel surface antigens (so that all hosts are effectively susceptible) appear through antigenic shifts that lead to pandemics (*1*).

The spatial spread of influenza has been much studied, particularly with respect to pandemic invasion waves (*5*–*11*). Simulation models incorporating air and surface transportation have generated important insights into the spread of influenza (*6*, *7*, *10*, *11*); however, the key underlying relationship between human movement and disease spread has not been verified across wide spatial scales nor contrasted among multiple interpandemic seasons of varying severity and viral subtype dominance.

We address this gap by analyzing the spatial dynamics of interpandemic influenza epidemics between 1972 and 2002, using vital statistics for the lower 49 contiguous “states” in the United States (48 continental U.S. states and the District of Columbia) (Fig. 1) (*12*). To study these influenza epidemics, we use weekly state-specific “excess” mortality rates from pneumonia and influenza (P&I) (Fig. 1 and fig. S1) (*4*, *13*). [See (*12*) for a discussion of calibration against influenza morbidity data.] The weekly time series of excess mortalities appear to be useful indicators of (i) timing of epidemics, (ii) spatial spread and, with caution, (iii) incidence within each state (fig. S2).

The relative timing and correlation of epidemic trajectories can provide critical insights into the tempo and mode of spatial spread (*9*, *14*, *15*). On average, interpandemic influenza took 5.2 weeks to spread across the lower United States during 1972 to 2002 (range 2.7 to 8.4 weeks) (table S2). We use wavelet phase analysis to study the spatial synchrony in timing of epidemics through the spatial phase coherence function (*12*, *14*). This analysis reveals pronounced synchrony in the timing of epidemics across the continental United States, superimposed on a spatial correlation signature (Fig. 2A). This strong epidemiological “regionalization” is also reflected by marked correlation in the amplitude of excess mortality at the weekly and seasonal time scale (Fig. 2B and fig. S3A). The synchronous timing of epidemics is in part due to the strong winter seasonality of influenza; however, across seasons, the statewise U.S. epidemics are more synchronized than expected based on seasonality alone (table S1) (*12*).

The decay in epidemic synchrony with distance is broadly consistent with diffusive disease spread (*14*, *16*). However, pairwise synchrony and phase coherence are also significantly associated with state population sizes. To visualize this, we ranked the states by size and computed correlations in amplitude and phase within and among size quartiles (ignoring distance) (Fig. 2, C and D). The hierarchy of spread is immediately apparent: The most populous states exhibit synchronized epidemics, whereas less populated states exhibit more erratic patterns, both relative to each other and to the continental norm.

In addition to the spatial and hierarchical within-season patterns, interpandemic spread also varies with disease prevalence and predominant virus subtype. The severe epidemics, dominated by A/H3N2, are more synchronous than milder, generally A/H1N1 and B, seasons (Fig. 2E) (*17*). The median difference in the date of the local and national peak is 3.8 weeks in A/H3N2 seasons versus 6.3 weeks in A/H1N1 and B seasons (Wilcoxon test, *P* < 0.01). Interestingly, the relationship between severity and synchrony holds within as well as between subtypes (Fig. 2E). Additional analyses indicate that the slower spread in A/H1N1 and B seasons cannot be explained by the inevitably more error-prone “sampling” resulting from the lower casefatality of A/H1N1 or B infection (*12*). Furthermore, the apparent relationship between epidemic synchrony and mortality impact is apparent in the more limited flu morbidity data available for the United States: Epidemic synchrony increases with influenza virus prevalence (*P* < 0.01) (fig. S3B) and is generally higher in seasons where the prevalence of A/H3N2 is high. These patterns echo the general theoretical principle that enhanced disease transmission facilitates spatial spread (*14*, *18*).

To understand the relationship between disease spread and severity, distance, and population size, we need to compare the key epidemic metrics (timing and incidence) to relevant measures of human movement. From the U.S. Census Bureau and Department of Transportation, we compiled a variety of potential a priori predictors of influenza spread. In particular, we compared the matrix of relative timing and amplitude of epidemics among the states with Euclidian distance between the population centers of each state, as well as various measures of domestic transportation such as workflows (rates of movement of people to and from their workplaces), long-distance travel, and rates of air transportation (*6*, *10*–*12*, *19*). Mantel tests (*20*) based on rank correlation reveal a significant association between disease dynamics and all the metrics (Table 1). Partial Mantel tests (*20*), however, show that workflow is the key predictor of influenza spread among those tested (Table 1). In absolute (nonrank) terms, the relation to U.S. workflows is nonlinear and conspicuously sigmoidal (Fig. 3A).

Distance or movement matrix | Correlation with epidemic synchrony based on | |
---|---|---|

P&I weekly phases | P&I weekly death rates | |

Mantel correlations
| ||

Workflows | 0.61View inline | 0.67View inline |

Long-distance trips | 0.55View inline | 0.60View inline |

Air travel | 0.37View inline | 0.42View inline |

Euclidian distanceView inline | −0.31View inline | −0.26View inline |

Partial Mantel correlations
| ||

Workflows, adjusting for: | ||

Long-distance trips | 0.31View inline | 0.39View inline |

Air travel | 0.52View inline | 0.54View inline |

Euclidian distance | 0.55View inline | 0.65View inline |

Long-distance trips, adjusting for: | ||

Workflows | 0.004 (P = 0.47) | −0.03 (P = 0.30) |

Air travel, adjusting for: | ||

Workflows | 0.024 (P = 0.39) | 0.17 (P = 0.07) |

Euclidian distance, View inline adjusting for: | ||

Workflows | 0.003 (P = 0.51) | 0.13 (P = 0.08) |

*P* values for Mantel test given by 10,000 permutations of the original matrices:

View inline* <0.01,

View inline** < 0.001,

View inline*** < 0.0001.

View inline† Euclidian distance between state population centers. The Euclidian distance matrix is a “dissimilarity” matrix, whereas all other matrices are “similarity” matrices, hence the negative sign for the correlation with the epidemic data. Only the absolute value of the correlation coefficient is tested.

Because excess-mortality proxies for influenza incidence are only available at the statewise scale for the whole United States, a comparison with movement metrics can only be made at this scale. However, we can develop a more spatially and demographically refined model for the best fit workflow data, using county-level information (Fig. 3B and fig. S4). As a starting point, we use a general gravity model from transportation theory; this allows flexible dependence of flow on distance and the sizes of “donor” and “recipient” communities [location of residence and workplace, respectively (*15*)]. A gravity model for workflow (or disease spread) (*C*_{ij}) between community *i* (of size *P*_{i}) and *j* (of size *P*_{j}) takes the form (*21*) where θ is a proportionality constant and the exponents τ_{1}, τ_{2} and ρ, respectively, tune the dependence of dispersal on donor and recipient sizes, and the distance between the two communities, *d*_{ij}.

We found that a thresholded version of this gravity formulation captures key features of workflows in the United States. There is a rapid decline (distance exponent > 3) in movement to work up to 119 km; beyond this threshold, a small flux of movements is nearly independent of distance (distance exponent small but > 0) [Table 2 and Fig. 3B; see also (*19*) for Thailand]. The gravity formulation also allows us to explore the scaling of workflows with donor and recipient population sizes (fig. S4). The two population exponents are less than unity, indicating that smaller populations are more important per capita, both in donating and receiving workers (Table 2). Below 119 km, the population exponents are relatively high, and larger for the “recipient” of workers.

Parameter | Point estimate (standard error) | |
---|---|---|

Distance < 119 kmView inline | Distances ≥ 119 kmView inline | |

τ_{1}: population of residence county (donor) | 0.30View inline (0.004) | 0.24View inline (0.001) |

τ_{2}: population of work county (recipient) | 0.64View inline (0.004) | 0.14View inline (0.001) |

ρ: distance (km) | 3.05 (0.012) | 0.29 (0.003) |

View inline* Cut-off at 119 km chosen as the distance that minimized the residual sum of square of a piecewise (log) linear gravity model.

View inline† τ_{1} ≠ τ_{2} in both models; *P* < 0.01.

As a crude validation of the gravity like spread of influenza among states, we superimpose a model of between-state workflow movements (Table 2) on a set of standard stochastic susceptible-infected-recovered (SIR) models at the state level (*12*). To calibrate transmission in the model for interpandemic years, we estimate the effective reproduction ratio of infection, *R*, from observed epidemics, based on the weekly increase in the cumulative number of P&I excess deaths (*12*, *22*). We find a mean *R* of 1.35 in A/H3N2 epidemics (95% confidence interval 1.10 to 1.60). Figure 4A uses this model to illustrate how increasing disease transmission [reflected in increased prevalence, for example, A/H3N2 epidemics versus A/H1N1 or B (Fig. 2E)] will raise the probability of a synchronized and widespread epidemic. Here, we manipulate transmission by varying *R*; this formulation is dynamically equivalent to using a fixed basic reproduction number, *R*_{0}, and raising average susceptibility [for example, through the faster evolution rate of A/H3N2 (*3*)]. Both approaches account for a partially immune population and give similar model results (*23*).

Figure 4A explores the predicted spatiotemporal spread of typical epidemics starting in a populous, highly connected state (California), compared with a smaller, more isolated one (Wyoming). Because of the topology of gravity-like spread, epidemics beginning in California are both more spatially synchronized and widespread; infection disseminates through strong long-range connections. For the mean observed value of *R* (1.35), the duration of spread across the United States matches that seen in real epidemics (predicted mean 4.7 weeks, range 1.0 to 9.0 weeks) (Fig. 4B and table S2). [See (*12*) for further evaluations of the gravity model predictions.] By contrast, if epidemics ever were to arise from a less populous and very isolated state like Wyoming, spread is predicted to be slower and more local (Fig. 4C) [mean time to epidemic onset = 6.9 weeks (range 1.0 to 14.0 weeks)]; the national outbreak does not gain momentum until the epidemic hits a better connected state.

Turning to the onset of the national epidemic, there is a tendency for the influenza season to start in California more often than in any other state (with an average lead of 1 week for California, *P* < 0.01). This is consistent with California's being the most populous state; however, additional analyses indicate that population factors alone cannot explain the early epidemic onset in California (*12*). Teasing out the causes of this interesting geographical trend in the initial epidemic focus—in terms of population size, international connectivity, and preferred destinations (in particular Asia and Australia)—may have important implications for understanding the intercontinental spread of influenza.

We can also use the model to explore how the higher reproduction ratios of infection seen during pandemics might affect the speed of spread across the United States. Figure 4D illustrates this for a reproduction ratio of 1.89, corresponding to the reported estimate for the 1968 A/H3N2 pandemic (*6*). Spread of pandemic influenza is predicted to be more rapid than in interpandemic seasons (2.2 weeks rather than 4.7, starting from California) (Fig. 4D and table S2). Interestingly, the projected spread is also faster than the reports from the 1968 pandemic [6 to 7 weeks (*24*)]. The discrepancy may be due to an increase in the flow of movements since 1968, although there is no obvious corresponding trend of increasing synchrony in observed epidemics between 1972 and 2002. Differences in the contact network of pandemic and interpandemic influenza might also play a role, because the relative importance of adults versus school-age children may differ (*12*). Interestingly, the observed interpandemic data (on which our model was calibrated) indicate that the spread of influenza was substantially slower in the 1968 pandemic than in several subsequent epidemics. Comparing the spread of influenza in pandemic and interpandemic seasons is an important topic for future studies.

An essential step in generating realistic epidemic models is calibrating them against historical spread of outbreaks and underlying patterns of host movement. Here, we investigate statistical and dynamic models of large-scale spread of influenza at the state level for the United States; these models explicitly incorporate data on human movements and are calibrated against long-term disease statistics. We find that workflows capture the spread of influenza better than simple Euclidean distance or other movement metrics. Workflow data have been used previously to simulate disease spread (*19*), but here, long-term historical disease data have allowed us to calibrate flows of infection against flows of people. A gravity model (*21*) demonstrates a strong decay in county-level workflows (and therefore potentially the flux of infection) with distance up to ∼100 km. The population exponents for the gravity model are less than unity, suggesting that cases (and susceptibles) in small populations may be proportionately more important for disease spread.

A recurring problem for influenza epidemiology is a lack of detailed spatiotemporal incidence or prevalence data, exacerbated by the poor specificity of clinical diagnosis and relative scarcity of laboratory testing (*1*). Here, good agreement between influenza-related mortality and indicators of the prevalence of infection allows us to use mortality to study disease spread (fig. S2) (*12*). There are still caveats; for example, A/H1N1 and B viruses cause less mortality and severe morbidity than A/H3N2 viruses (*4*), so our estimates may be less precise for the milder A/H1N1-B seasons. However, other studies confirm that A/H1N1 and B virus prevalence do contribute to P&I mortality (*25*), corroborating the observed quantitative relationship between disease dispersal and mortality, for all three subtypes (Fig. 2E). In addition, detailed analyses confirm that subtype differences are not simply due to sampling biases and indicate a true difference in spread between A/H1N1-B and A/H3N2 epidemics (*12*).

We illustrate the epidemiological implications of the gravity framework with the simplest dynamical description, a statewise SIR model. Despite its simplicity and spatial coarseness, this model reproduces the observed timing and spread of seasonal influenza, capturing observed changes in synchrony with transmission, population size, and distance. As more spatially disaggregated mortality and morbidity data become available, more detailed models may help elucidate within-state spread and consequences of demographic heterogeneities. Interestingly, we did not find any association between local transmission (measured by the estimated effective reproduction ratio at the state level) and population density or population size (*12*). This lack of association echoes earlier work on the dynamics of the 1918 pandemic across U.S. cities (*26*)—as well as patterns in other disease (*27*)—that for some infections, reproduction ratios can be surprisingly invariant across a wide range of population sizes.

Our study indicates a likely increase of influenza spread and synchrony with increasing transmission. A key area for future research is how the combination of quicker evolution of the virus and intrinsic differences in viral transmissibility drives the faster geographical spread of A/H3N2 epidemics. The data-based models described here should also help us understand in a more mechanistic way the well-known spread of viral novelty (antigenic drift) that drives the ongoing pattern of partial immunity against interpandemic influenza (*2*). Understanding local spatial spread is also an important step in refining existing phylodynamic models for influenza evolution (*3*, *28*).

More generally, our results have a number of implications for the comparative dynamics of acute infections. The strong dependence of interpandemic influenza spread on workflows, coupled with a steep decline in workflows with distance [exponent >3; see also (*19*)], implies a key role for adults in the regional dissemination of infection. This is not necessarily contrary to the current consensus that children drive the local spread of influenza [within school, households, and cities in general (*29*, *30*)]. At the same time, the long-distance dissemination of influenza, between cities or states, is captured by movements linked to adults. By contrast, the spread of prevaccination measles was more local, with a distance exponent of unity (*21*); this child-focused infection was presumably entirely driven by contacts between schools. Investigating the specific role of children and adults in the spread of influenza would require not only age-specific disease data but also data on movements of children in the United States. This is a crucial area for future data collection, as pandemic influenza would probably be spread through both child and adult social networks. In essence, disease dissemination is a complex function of host movement, filtered by the age structure of susceptibility.

**Supporting Online Material**

www.sciencemag.org/cgi/content/full/1125237/DC1

Methods

SOM Text

Figs. S1 to S4

Tables S1 to S3

References