Modeling infectious disease dynamics in the complex landscape of global health


Science  13 Mar 2015:
Vol. 347, Issue 6227, aaa4339
DOI: 10.1126/science.aaa4339

Mathematical modeling of infectious diseases

The spread of infectious diseases can be unpredictable. With the emergence of antibiotic resistance and worrying new viruses, and with ambitious plans for global eradication of polio and the elimination of malaria, the stakes have never been higher. Anticipation and measurement of the multiple factors involved in infectious disease can be greatly assisted by mathematical methods. In particular, modeling techniques can help to compensate for imperfect knowledge, gathered from large populations and under difficult prevailing circumstances. Heesterbeek et al. review the development of mathematical models used in epidemiology and how these can be harnessed to develop successful control strategies and inform public health policy.

Science, this issue 10.1126/science.aaa4339

Structured Abstract


Despite many notable successes in prevention and control, infectious diseases remain an enormous threat to human and animal health. The ecological and evolutionary dynamics of pathogens play out on a wide range of interconnected temporal, organizational, and spatial scales that span hours to months, cells to ecosystems, and local to global spread. Some pathogens are directly transmitted between individuals of a single species, whereas others circulate among multiple hosts, need arthropod vectors, or persist in environmental reservoirs. Many factors, including increasing antimicrobial resistance, human connectivity, population growth, urbanization, environmental and land-use change, as well as changing human behavior, present global challenges for prevention and control. Faced with this complexity, mathematical models offer valuable tools for understanding epidemiological patterns and for developing and evaluating evidence for decision-making in global health.


During the past 50 years, the study of infectious disease dynamics has matured into a rich interdisciplinary field at the intersection of mathematics, epidemiology, ecology, evolutionary biology, immunology, sociology, and public health. The practical challenges range from establishing appropriate data collection to managing increasingly large volumes of information. The theoretical challenges require fundamental study of many-layered, nonlinear systems in which infections evolve and spread and where key events can be governed by unpredictable pathogen biology or human behavior. In this Review, we start with an examination of real-time outbreak response using the West African Ebola epidemic as an example. Here, the challenges range from underreporting of cases and deaths, and missing information on the impact of control measures to understanding human responses. The possibility of future zoonoses tests our ability to detect anomalous outbreaks and to estimate human-to-human transmissibility against a backdrop of ongoing zoonotic spillover while also assessing the risk of more dangerous strains evolving. Increased understanding of the dynamics of infections in food webs and ecosystems where host and nonhost species interact is key. Simultaneous multispecies infections are increasingly recognized as a notable public health burden, yet our understanding of how different species of pathogens interact within hosts is rudimentary. Pathogen genomics has become an essential tool for drawing inferences about evolution and transmission and, here but also in general, heterogeneity is the major challenge. Methods that depart from simplistic assumptions about random mixing are yielding new insights into the dynamics of transmission and control. There is rapid growth in estimation of model parameters from mismatched or incomplete data, and in contrasting model output with real-world observations. 
New data streams on social connectivity and behavior are being used, and combining data collected from very different sources and scales presents important challenges.

All these mathematical endeavors have the potential to feed into public health policy and, indeed, an increasingly wide range of models is being used to support infectious disease control, elimination, and eradication efforts.


Mathematical modeling has the potential to probe the apparently intractable complexity of infectious disease dynamics. Coupled to continuous dialogue between decision-makers and the multidisciplinary infectious disease community, and by drawing on new data streams, mathematical models can lay bare mechanisms of transmission and indicate new approaches to prevention and control that help to shape national and international public health policy.

Modeling for public health.

Policy questions define the model’s purpose. Initial model design is based on current scientific understanding and the available relevant data. Model validation and fit to disease data may require further adaptation; sensitivity and uncertainty analysis can point to requirements for collection of additional specific data. Cycles of model testing and analysis thus lead to policy advice and improved scientific understanding.


Despite some notable successes in the control of infectious diseases, transmissible pathogens still pose an enormous threat to human and animal health. The ecological and evolutionary dynamics of infections play out on a wide range of interconnected temporal, organizational, and spatial scales, which span hours to months, cells to ecosystems, and local to global spread. Moreover, some pathogens are directly transmitted between individuals of a single species, whereas others circulate among multiple hosts, need arthropod vectors, or can survive in environmental reservoirs. Many factors, including increasing antimicrobial resistance, increased human connectivity and changeable human behavior, elevate prevention and control from matters of national policy to international challenge. In the face of this complexity, mathematical models offer valuable tools for synthesizing information to understand epidemiological patterns, and for developing quantitative evidence for decision-making in global health.

Thirty-five years ago, it was believed that the health burden of infectious diseases was close to becoming insignificant as hygiene, improved nutrition, drugs, and vaccines brought about a steady decline in overall mortality (1). In recent decades, however, it has become clear that the threat from serious infectious diseases will persist, and human mortality attributed to infection is projected to remain at current levels of 13 to 15 million deaths annually until at least 2030 (2). Successes in eradicating smallpox and rinderpest have been isolated events in a landscape of endemic and epidemic infections (3). Newly emerging infectious agents represent a continuing challenge: for example, HIV in the 20th century; more recently, severe acute respiratory syndrome (SARS) and Middle Eastern respiratory syndrome (MERS) coronaviruses; West Nile virus; Nipah virus; drug-resistant pathogens; novel influenza A strains; and a major Ebola virus outbreak in 2014–2015. Most new infections enter the human population from wildlife or livestock, and the possibilities for emergence and spread in the coming decades are likely to increase as a result of population growth, increased urbanization and land-use change, greater travel, and increased livestock production to meet demands from the world’s expanding population (4–8). In our modern world of instant communication, the changing behavior of individuals in response to publicity about epidemics can have profound effects on the course of an outbreak (9, 10). Phylogenetic data shed light on an additional layer of complexity (11), as will increased understanding of the human genome in relation to susceptibility, infectiousness, and its duration. At the same time, the development of effective new vaccines remains a difficult challenge, especially for antigenically very variable pathogens (e.g., HIV or falciparum malaria) and for pathogens that stimulate immunity that is only partly protective (e.g., Mycobacterium tuberculosis) or temporary (e.g., Vibrio cholerae).

In the face of this complexity, computational tools (Box 1) are essential for synthesizing information to understand epidemiological patterns and for developing and weighing the evidence base for decision-making. Here, we review the contribution of these tools to our understanding of infectious disease dynamics for public health by using representative examples and by ranging into current developments. We argue that to improve decision-making for human health and for sustaining the health of our food systems, experts on infectious disease dynamics and experts on prevention and control need to collaborate on a global scale. To succeed, quantitative analysis needs to lie at the heart of public health policy formulation.

Box 1

Quantitative tools in infectious disease dynamics.

Here, we use the words “computational tools” loosely. In infectious disease dynamics, there is a broad range of relevant quantitative tools, and we refer to the entire collection. It comprises statistical methods for inference directly from data, including methods to analyze sequencing and other genetic data. This leads to estimates of important epidemiological information such as length of latency, incubation and infectious periods and their statistical distributions, inferred transmission chains and trees early in outbreaks, the risks related to various transmission routes, or estimates of rates of evolution. Mathematical models in the strict sense refer to mathematical descriptions of processes thought to be associated with the dynamics of infection—for example, in a population or within an individual. Such models take many forms, depending on the level of biological knowledge of processes involved and data available, and depending on the purpose.

Several classes of model are used, spanning the spectrum of information available. At one end of the range are detailed individual-based simulation models, where large numbers of distinct individual entities (with their own characteristic traits such as age, spatial location, sex, immune status, risk profile, or behavior pattern) are described in interaction with each other, possibly in a contact network, and with the infectious agent. At the other end are compartmental models where no individuals are recognized, but only states for individuals (for example: susceptible, infectious, immune) aggregated into compartments where everyone has the same average characteristics and where interaction is typically uniform (everybody interacts with everybody else). Such models do not describe the disease history of single individuals, but rather the time evolution of aggregated variables, such as the number of individuals that are currently susceptible.

Mathematical models can have both mechanistic parts in their description, based on assumptions about biological mechanisms involved, and more phenomenological parts, where there is a statistical or presumed relation between variables, without clear assumptions from which this relation can be derived. An example of the former is the assumption of mass action to describe interaction between individual hosts; an example of the latter is an empirical relation between the length of an infectious period in a mosquito and environmental temperature.

For infectious disease dynamics, our world is clearly stochastic, in that chance events play a role in many of the processes involved. Certainly at lower levels of biological aggregation, chance dominates—for example, in infection of individual cells or in contacts individual hosts make. At higher aggregation levels, many cells or individuals interact, and chance effects may average out to allow deterministic descriptions. There are purely stochastic models, purely deterministic models, and models that are mixed. It is important to point out that, even though the world is stochastic, stochastic descriptions are not by definition better than deterministic descriptions. Both are still models of reality, and the fact that chance plays a role may have a far less significant influence on model outcome and prediction than choices made in the relations between ingredients and variables.
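The role of chance described above can be made concrete with a small sketch. The code below is an illustration only, not a method taken from this Review: it simulates the sequence of infection and recovery events in a stochastic SIR epidemic (see Box 4) in a closed, randomly mixing population. The function names, the population size, the value of R0, and the 10% cutoff used to label an outbreak as "minor" are all assumptions chosen for illustration.

```python
import random


def stochastic_final_size(n, r0, rng, initial_infected=1):
    """Simulate the event sequence of a stochastic SIR epidemic.

    Only the order of events matters for the total number infected,
    so time is not tracked. Returns the number ever infected.
    """
    s = n - initial_infected
    i = initial_infected
    while i > 0:
        # Per-infective infection rate in units of the recovery rate.
        infection_rate = r0 * s / n
        # Probability that the next event is an infection rather than a recovery.
        if rng.random() < infection_rate / (infection_rate + 1.0):
            s -= 1
            i += 1
        else:
            i -= 1
    return n - s


def fadeout_fraction(n=1000, r0=2.0, runs=500, seed=1, minor_cutoff=0.1):
    """Run many epidemics; return all sizes and the fraction that stay minor."""
    rng = random.Random(seed)
    sizes = [stochastic_final_size(n, r0, rng) for _ in range(runs)]
    minor = sum(1 for size in sizes if size < minor_cutoff * n)
    return sizes, minor / runs
```

With R0 = 2 and a single initial case, a deterministic model always predicts a large epidemic, whereas a substantial share of the simulated outbreaks die out after a handful of cases. Calling `fadeout_fraction()` and inspecting the returned fraction shows this difference directly.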

Areas of rapid growth are statistical and numerical methods and tools to estimate model parameters from, often scarce, mismatched or incomplete data, and to contrast model output with real-world observations.

Models and public health policy formulation

The value of mathematical models to investigate public health policy questions was recognized at least 250 years ago when, in 1766, Daniel Bernoulli published a mathematical analysis of the benefits of smallpox inoculation (then called variolation) (12). In the past 50 years, the study of infectious disease dynamics has grown into a rich interdisciplinary field. For example, decision-making for vaccination strategies increasingly depends on model analyses in which infection dynamics are combined with cost data (Box 2, Influenza: prevention and control). In recent decades, responses to major infectious disease outbreaks, including HIV, bovine spongiform encephalopathy (BSE), foot-and-mouth disease (FMD), SARS, and pandemic and avian influenza, have shown both the need for and capabilities of models (Box 3, HIV: Test and treat strategy). Model-based analysis of such outbreaks also continually brings improvements in methodology and data, emerging from the comparison of model prediction with observed patterns.

Box 2

Influenza: prevention and control.

Human influenza, both pandemic and seasonal, remains a major issue in public health owing to the continued emergence of novel genetic strains, and it is an area where models have successfully addressed questions from basic biology to epidemiology and health policy. In recent years, modeling and other quantitative analyses have been used to study at least three major issues: pandemic preparedness and mitigation strategies (84–89), rethinking vaccination strategies for seasonal influenza (70), and improved methods in phylodynamics and influenza strain evolution (11). Recent models of influenza fitness have also been developed to predict viral evolution from one year to the next, providing a principled and more precise method for the vaccine selection required every year (90).

For seasonal influenza, models have played a key role in providing the scientific evidence base for vaccination policy, making use of the information in multiple, often unavoidably biased, data sources such as syndromic time series, vaccine coverage and efficacy, economic costs, and contact patterns in the population. For example, a combined epidemic and economic model was fitted to fine-grained data from many sources to describe the dynamics of influenza in the United Kingdom, and the influence of previous vaccination programs (70). With confidence in the model’s predictions based on its ability to capture past patterns, it was used to look at alternative vaccination strategies and led to a new national policy to vaccinate school-age children (91). Targeting those individuals most likely to spread the virus, rather than only those most likely to suffer the largest morbidity, is a marked departure from established practice in the UK and is currently under consideration elsewhere (92).

Box 3

HIV: test and treat.

Mathematical modeling has played a central role in our understanding of the HIV epidemic, and in informing policy from the outset of our recognition of the pandemic (93). Some of the many insights include a model-based analysis of viral load data from inhibition experiments, which revealed the rapid and ongoing turnover of the within-host viral population (94), and the use of phylogenetic models to show that the HIV pandemic did not emerge in the 1980s, but had its roots in the early 20th century (95).

A key contribution of mathematical modeling has been to identify when viral transmission occurs over the course of infection, which determines the potential to halt spread by various measures. Models have shown that transmission of HIV depends on the epidemic phase and the sexual behavior of the population, and a large proportion of transmissions may occur late in infection (96). Model-based inference in the Netherlands also suggested that the effective reproduction number (Box 4) had fallen below 1 due to a combination of low-risk behavior and a very effective diagnosis and treatment program (97). The debate was transformed in the mid-2000s, when eradication of HIV through a “test and treat” strategy was hypothesized (98, 99). Subsequent trial results showing that antiretroviral treatment (ART) of HIV-positive individuals could practically eliminate transmission within sexual partnerships when the index case is treated (100) have further supported the role of treatment as prevention. Although these findings have not dispelled concerns about transmission early in infection (93), or about extra-couple transmission (101), it is suggested that high population coverage of ART may have reduced the incidence of HIV infection in rural KwaZulu-Natal, South Africa (102).

These findings, combined with the prospect of cheaper, more effective drugs and delivery structures, underpin UNAIDS’ goal of “zero new infections” for HIV and the initiation of a multimillion dollar cluster-randomized trial (103), which will have its outcome assessed against model predictions. In the meantime, the scientific discussion of the effectiveness of ART in preventing transmission continues, sparked by studies that fail to show a decline in incidence after increased treatment (104). Such debates are essential to elucidate areas for improvement of the models used and data needs for such improvement, and to highlight methodological limitations (105).

For infectious agents important to public health, a series of principles has emerged for modeling infection dynamics (Table 1 and Box 4). The basic reproduction number R0, for example, is a central concept characterizing the average number of secondary cases generated by one primary case in a susceptible population. This concept highlights what must be measured to interpret observed disease patterns and to quantify the impact of selected control strategies (Fig. 1).

Table 1 Principles for modeling infection dynamics.

As different infections have become the focus of public health attention, the modeling community has responded by developing improved concepts and methods. The table concentrates on the period since 1950. The first column lists the classes of infection, and the second column lists factors whose importance to infection dynamics became particularly clear in relation to those infections; the third and fourth columns highlight concepts and methods that were developed in response. For each row, only a few typical references are given. Many factors, concepts, and methods are relevant, in current use, and in continual development for much larger classes of infectious agents.

Box 4

Some fundamental terms and concepts.

susceptible: individual who is at risk of becoming infected if exposed to an infectious agent.

basic reproduction number, R0: average number of infections caused by a typical infected individual in a population consisting only of susceptibles; if R0 > 1, the infectious agent can start to spread.

effective reproduction number, Re: average number of infections caused by a typical infected individual when only part of the population is susceptible; as long as Re > 1, the agent can continue to spread.

herd immunity: state of the population where the fraction protected is just sufficient to prevent outbreaks (Re < 1).

critical elimination threshold, pc: proportion of the susceptible population that needs to be successfully protected (for example, by vaccination) to achieve herd immunity; pc = 1 − 1/R0 is a rule of thumb from models in which hosts are assumed to mix randomly.

force of infection: per capita rate at which susceptible individuals acquire infection.

final size: fraction of the initial susceptible population that eventually becomes infected during an outbreak.

prevalence: proportion of the population with infection or disease at a given time point.

superspreader/supershedder: infected individual that produces substantially more new cases than the average because of greater infectiousness, longer duration of infectiousness, many more transmission opportunities and contacts, or combinations of these. Even when the average R0 is relatively small, these individuals have large effects on outbreaks.

metapopulation: collection of populations, separated in space, but connected through movement of individuals.

critical community size: minimum number of individuals in a population that allow an infectious agent to persist without importation of cases.

case fatality ratio: proportion of symptomatic infections that result in death.

SIR model: most basic model metaphor for immunizing infections where each living individual is assumed to be in one of three epidemiological states at any given time: susceptible, infected and infectious, and recovered and immune. The model specifies the rates at which individuals change their state. Individuals progress from S to I when infected, then from I to R upon recovery. Many variants exist—for example, recognizing different classes of S, I, and R individuals, depending on individual traits such as age.
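Several of the terms above can be tied together in a short sketch. The code below is a minimal illustration under the random-mixing assumption, not an implementation used in the work reviewed here. It computes the critical elimination threshold pc = 1 − 1/R0, solves the standard final-size relation z = 1 − exp(−R0 z) for a closed population, and integrates the basic SIR equations with a simple Euler scheme. All function names, the step size, and the initial infected fraction are assumptions made for the example.

```python
import math


def critical_coverage(r0):
    """Rule-of-thumb critical elimination threshold pc = 1 - 1/R0."""
    return max(0.0, 1.0 - 1.0 / r0)


def final_size(r0, tol=1e-12, max_iter=10000):
    """Solve z = 1 - exp(-R0 * z) for the final size z by fixed-point iteration."""
    if r0 <= 1.0:
        return 0.0  # no major outbreak below the threshold
    z = 1.0
    for _ in range(max_iter):
        z_new = 1.0 - math.exp(-r0 * z)
        if abs(z_new - z) < tol:
            return z_new
        z = z_new
    return z


def simulate_sir(r0, infectious_period, days, dt=0.01, initial_infected=1e-4):
    """Euler integration of the SIR model in population fractions.

    Returns the (S, I, R) fractions at the end of the simulated period.
    """
    gamma = 1.0 / infectious_period  # recovery rate
    beta = r0 * gamma                # transmission rate
    s, i, r = 1.0 - initial_infected, initial_infected, 0.0
    for _ in range(int(days / dt)):
        new_infections = beta * s * i * dt  # flow from S to I
        new_recoveries = gamma * i * dt     # flow from I to R
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
    return s, i, r
```

For example, `critical_coverage(4.0)` gives 0.75, and the recovered fraction returned by `simulate_sir(2.0, 5.0, 400)` can be compared with `final_size(2.0)`; the two should agree closely once the simulated outbreak has run its course.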

Fig. 1 Modeling for public health exemplified by rubella.

(A to F) Policy questions are formulated; available data are brought to bear on the question. In this example, the incidence of rubella is shown following the introduction of vaccination in individuals under 15 or 15+ years of age in Costa Rica (127). Application of a nonlinear age-structured SIR model (see Box 4) to these circumstances led to the collection of key missing data. In the bottom right-hand plot, each square depicts a combination of birth rate and infant vaccine coverage reflecting different countries (e.g., Somalia depicted by a diamond and Nepal by a circle), colored by expected effect on congenital rubella syndrome (CRS) in newborns, related to local R0 (128). This translates into confidence that routine vaccination is likely to reduce the public health burden caused by CRS in Nepal (green), but not in Somalia (red).

Two fundamental properties of the world that shape infectious disease dynamics make computational tools key for understanding reality. The world is essentially a stochastic and highly nonlinear system. The nonlinearity derives not only from the complex interaction between factors involved in transmission, but also from the influence that the infection process has on the distribution of important characteristics at various temporal and spatial scales. This effect is seen in the age-related nature of infection and mortality in HIV changing the age distribution of the population, and in previous exposure to strains of influenza altering the distribution of influenza susceptibility. Such feedback mechanisms contribute to the nonlinearity of infection processes. Nonlinearity also leads to counterintuitive phenomena (Fig. 2) and prevents simple extrapolation of experience from one situation to another, such as when deciding whether to implement a vaccination policy in different countries (Fig. 1). Mathematical tools, relating to data and processes on a large range of interacting scales, have become essential to explore, anticipate, understand, and predict the effects of feedbacks within such complex systems, including changes caused by intervention.

Fig. 2 Examples of counterintuitive effects of nonlinear infection dynamics.

(A) Nonlinear interaction between prevalence of a helminth infection and infection pressure (as measured by the mean intensity of existing infections) means that control measures must have a disproportionately large impact on intensity before prevalence is reduced. This effect is predicted by a mathematical model (solid line) and corroborated by field data (crosses) (129). (B) Nonlinear relation between total number of cases of congenital rubella syndrome (CRS) and rubella vaccine coverage, showing that suboptimal levels of vaccine coverage cause worse health outcomes than no vaccination [adapted from (130)]. The line shows model predictions; similar effects have been documented for real rubella control situations (131). (C and D) Modeling results of rebound of gonorrhea transmission with different treatment strategies without (C) and with (D) antimicrobial resistance developing [adapted from (132)]. In the presence of resistance, focusing treatment on the high-risk core group leads to an increase in prevalence approaching that of untreated baseline prevalence, after an initially strong decline for more than a decade. (E) Box plot from field data of a nonlinear relation between R0 for dengue transmission and average dengue hemorrhagic fever (DHF) incidence across Thailand, showing that control measures that reduce R0 may paradoxically increase cases of DHF [adapted from (133)].

Current and future opportunities for models in public health

Over the past decade, key public health questions, ranging from emergence to elimination, have posed a range of challenges for modeling infectious disease dynamics, many of which rely on leveraging disparate data sources and integrating data from a range of scales from genomics to global circulation. Given commonalities in processes across pathogens, progress made in one area can lead to advances in another. Advances in the areas described below build on and inform one another, making this a dynamic time for research in the discipline (13). A few themes are chosen to illustrate current trends in model development and public health application.

Real-time outbreak modeling: The Ebola 2014–2015 outbreak

The 2014–2015 outbreak of Ebola in West Africa serves to highlight both opportunities and challenges in modeling for public health. In the initial phase of this outbreak, real-time estimates of the reproduction number or simple exponential extrapolation (14) allowed short-term predictions of epidemic growth that were used, for example, to plan for necessary bed capacity. Quantitative phylogenetic tools applied to samples from initial victims provided important estimates of the origin of the outbreak (15). Early mechanistic models that explicitly took into account the roles played by different transmission routes or settings were informed by analysis of earlier outbreaks (16, 17). When the failure to contain the epidemic with methods successful in previous outbreaks led to a scale-up of capacity driven by international aid, such models were used to assess the impact of, for example, reducing transmission at funerals (17) and whether the construction of novel types of treatment centers could end up doing more harm than good. Ensuring that the most effective combinations of interventions were implemented required close and fast interaction between modelers and policy-makers (18). Looking forward, models are now used to help clinical trial design and inform a debate on the optimal deployment of initially scarce Ebola vaccines, once such vaccines become available.
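The simple exponential extrapolation mentioned above can be sketched as follows. This is an illustrative log-linear least-squares fit, not the procedure of the cited analyses, and any case counts passed to it are hypothetical. The conversion from growth rate to reproduction number uses R ≈ 1 + rT, which holds only under the assumption of exponentially distributed generation intervals with mean T, as in the basic SIR model.

```python
import math


def fit_exponential(counts):
    """Least-squares fit of log(count) against time index.

    Returns (growth_rate, intercept) for log(count) = intercept + growth_rate * t.
    Counts must be positive.
    """
    n = len(counts)
    xs = list(range(n))
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    growth_rate = sxy / sxx
    return growth_rate, mean_y - growth_rate * mean_x


def doubling_time(growth_rate):
    """Time for incidence to double at a constant exponential growth rate."""
    return math.log(2.0) / growth_rate


def project(counts, steps_ahead):
    """Extrapolate the fitted curve steps_ahead time units past the last observation."""
    growth_rate, intercept = fit_exponential(counts)
    t = len(counts) - 1 + steps_ahead
    return math.exp(intercept + growth_rate * t)


def reproduction_number_from_growth(growth_rate, mean_generation_time):
    """R = 1 + r * T, assuming exponentially distributed generation intervals."""
    return 1.0 + growth_rate * mean_generation_time
```

Such projections are only meaningful over short horizons, because they assume that growth continues unaltered by interventions, behavior change, or depletion of susceptibles.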

With the opportunities of real-time modeling for public health come specific challenges. The imperative to produce reliable and meaningful analysis for those treating infected people has to be balanced against the pressures and delays of scientific publication. In an ongoing outbreak, data can be patchy and reporting delayed, and different data sources are not always synthesized. When the Ebola outbreak expanded explosively in the summer of 2014, data were often lacking on the effect on transmission dynamics of the various control measures that operated simultaneously in the hectic circumstances of the most severely hit areas. In any emerging epidemic, underreporting is a critical challenge for ongoing assessment; in the Ebola epidemic it has had an enormous impact on predictions of outbreak size and also of outbreak impact, for example, in terms of the case-fatality ratio (the proportion of cases that lead to death). Early in any outbreak, this estimate of severity can suffer from imprecise information on both the numerator (if not all deaths due to the infection are identified as such; for example, because health services are overwhelmed caring for the sick) and the denominator (if cases are not reported or, conversely, noncases get reported as cases if they are not laboratory-confirmed). This caused problems early in the H1N1 influenza outbreak first reported in Mexico in 2009, as well as in the current Ebola outbreak. Although the level of underreporting can be estimated from retrospective serological studies, it is usually not identifiable in real-time data.
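The numerator and denominator problems described above can be illustrated with simple arithmetic. The sketch below is a deliberately simplified assumption, not a method from the cited work: it treats case reporting and death reporting as two fixed, independent fractions and ignores delays between onset and death. The function name and all parameter values are hypothetical.

```python
def observed_cfr(true_cases, true_cfr, case_reporting, death_reporting):
    """Case-fatality ratio computed from reported counts only.

    case_reporting and death_reporting are the assumed fractions of true
    cases and true deaths that appear in the surveillance data.
    """
    reported_cases = true_cases * case_reporting
    reported_deaths = true_cases * true_cfr * death_reporting
    return reported_deaths / reported_cases
```

For instance, with a true ratio of 0.4, reporting every death but only half of the cases yields an apparent ratio of 0.8, whereas reporting every case but only half of the deaths yields 0.2. The same underlying severity can therefore look very different depending on which side of the ratio is under-ascertained.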

These limitations make it almost impossible to make reliable long-term predictions. Thus, modeling results are often based on scenarios in which a pathogen spreads unaltered by behavioral changes or the public health response. This rarely reflects reality, especially in such a devastating outbreak as Ebola, where the situation constantly changes owing to growing awareness in the community, as well as national and international intervention. Careful communication of findings is key, and data and methods of analysis (including code) must be made freely available to the wider research community. Only in this way can reproducibility of analyses and an open exchange of methods and results be ensured for maximal transparency and benefit to public health.

Emergence of novel human pathogens

There is an ever-present hazard that novel human pathogens emerge from livestock and wild mammal and bird reservoirs. Research on potential emerging zoonoses draws on concepts from across the spectrum of infectious disease dynamics, disease ecology, microbiology, and phylogenetic analysis. Particular challenges include estimating human-to-human transmissibility against a backdrop of ongoing zoonotic spillover, detecting anomalous outbreaks, and assessing the risk that more dangerous strains may arise through pathogen evolution.

The recently identified gap in methodology for zoonoses with weak human-to-human transmission (6) is being filled with new approaches for estimating R0 and other transmission-related quantities from subcritical outbreak data (19–21). These studies address key public health concerns, but rely on strong assumptions regarding the quality and completeness of case observations. Better information on surveillance program efficacy could be gained through serological surveys (where blood and saliva samples reveal evidence of past and present infections) or sociological study, and modeling studies can help to design and characterize efficient surveillance programs (22). Given the predominance of zoonotic pathogens among emerging infections, models for transmission dynamics and evolution in multispecies ecosystems and food webs (consisting of host species and nonhost species interacting ecologically and epidemiologically) are a crucial area for future development (6, 23). The greatest challenge, and the greatest prize, in modeling emerging zoonoses is to assess which diseases pose the most risk to humans and how these might change over time and in different localities (24). Such tasks, which will join molecular studies to experimental infections to epidemiological and ecological surveys, will drive empirical and theoretical efforts for decades to come.
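To give a sense of how transmissibility can be inferred from subcritical outbreak data, the sketch below uses a textbook branching-process result: when R < 1 and each cluster starts from a single spillover case, the mean cluster size is 1/(1 − R), so R can be estimated as 1 − 1/(mean cluster size). This is an illustration, not necessarily the approach of the cited studies, and it assumes that every cluster is completely observed, which is exactly the kind of strong assumption about case observation noted above. The Poisson offspring distribution and all function names are assumptions made for the example.

```python
import math
import random


def poisson_sample(mean, rng):
    """Draw a Poisson variate using Knuth's multiplication method."""
    threshold = math.exp(-mean)
    k, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return k
        k += 1


def simulate_cluster(r, rng, cap=10000):
    """Total size of one transmission chain started by a single spillover case."""
    size = active = 1
    while active > 0 and size < cap:
        # Each active case generates a Poisson number of secondary cases.
        new_cases = sum(poisson_sample(r, rng) for _ in range(active))
        size += new_cases
        active = new_cases
    return size


def estimate_r_from_cluster_sizes(sizes):
    """Estimate R as 1 - 1/(mean cluster size); valid only for R < 1."""
    mean_size = sum(sizes) / len(sizes)
    return 1.0 - 1.0 / mean_size
```

Simulating a few thousand clusters with a known R, for example `[simulate_cluster(0.5, random.Random(3)) ...]` with a shared generator, and passing the sizes to `estimate_r_from_cluster_sizes` is one way to check that the estimator recovers the value used in the simulation. Missing small clusters in real surveillance data would bias the estimate upward.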

The rising availability of pathogen genome sequence data, coupled with new computational methods, presents opportunities to identify with precision “who infects whom” and the networks of infection between humans and reservoirs (25). Full realization of this potential, though, will require denser and more systematic whole-genome sampling of pathogens coupled with associated epidemiological data, as well as baseline information on genetic diversity and evolutionary rates, especially in animal hosts (26).

Pathogen evolution and phylodynamics

As pathogen genetic data become increasingly available, modelers are finding ways to synthesize these new data streams with more traditional epidemiological information in phylodynamic tools (27, 28). However, current frameworks employ compartmental epidemiological models, which do not make efficient use of individual-level epidemiological data. Although sampling theory is well developed for standard surveillance data, the relationship between a set of pathogen sequences and the phylogeny inferred from a population sample is more complex (11). Many-to-one mapping possibilities between, on the one hand, combinations of epidemiological, immunological, and evolutionary processes shaping sequences and, on the other hand, the inferred phylogeny, demand the integration of diverse data sources and an increased focus on systematic sampling.

Phylodynamic studies to date have largely focused on fast-evolving RNA viruses, driven by the large amount of data generated for clinical [e.g., hepatitis C virus (HCV) or HIV] or surveillance (e.g., influenza) purposes (11). Replicating these efforts on an expanded array of pathogens, including DNA viruses, bacteria, fungi, protozoa (e.g., malaria), and helminths, is a promising avenue for future research (29). It is of particular importance in the context of the evolution and spread of drug-resistant variants and vaccine escape mutants. However, genome-wide pathogen data also present challenges, in particular in relation to accommodating recombination, reassortment, and mobile genetic elements. Analysis of bacterial genomes usually considers only those genes that are shared across taxa, but there are good reasons to believe that noncore genes play an important role in bacterial evolution, including the evolution of antibiotic resistance (30).

Although sequence data are extremely valuable, to link these data fully to disease dynamics, it will be important to determine how sequence changes affect functions related to pathogen fitness, such as replication rate, transmissibility, and immune recognition. Molecular epidemiological studies often treat pathogen genetic variation as simply reflecting the underlying transmission process, whereas in reality such variation may play an important role in determining transmission dynamics, as exemplified by escape from herd immunity by influenza A virus (31).

“Deep” sequencing of pathogens within individual hosts generates information on within-host diversity, resulting from evolution within the host (often in response to drug treatment), or multiple infections. To tackle within-host diversity, models that embed pathogen evolution within a transmission tree are needed. Such models, which cross the within- and between-host scales, are only just becoming analytically and computationally feasible despite being proposed several years ago (32). Similarly, although progress has been made in scaling inference from genes to genomes (33), scaling inference to large numbers of sequences is lagging far behind.

Multiple infections

Infectious disease epidemiology evolved by focusing on interactions between a single host species and a single infectious agent. It is becoming increasingly clear that multiple agents simultaneously infecting the same host populations and individuals appreciably add to the public health burden and complicate prevention and control. Coinfections in relation to HIV (for example, tuberculosis and HCV) or coinfection of different strains of influenza A virus raise important public health and evolutionary issues. Multiple agents infecting the same host individual have been shown to influence each other by increasing or decreasing susceptibility and/or infectivity of that individual, thereby influencing the population dynamics of these agents in ways that we have yet to explore and understand (34, 35).

Multiple infections of the same individual with closely related pathogens occur when infection elicits no immunity, or only a partial immune response. Macroparasites, including many of the important human helminth infections, are good examples of pathogens that evade human immune responses and cause repeated infection of the same host (36). Biological mechanisms giving rise to such multiple infections include sequential reinfections caused by antigenic drift in influenza, antigenic variation in respiratory syncytial virus (RSV), and waning (slow loss of) immunity in pertussis, while lack of cross-protection in many colonizing microparasites (for example, pneumococcus and human papilloma virus (HPV)) allows for multiple concurrent infections. Although the existence of reinfections is a clinical fact, population-level data are scarce as reinfections are often subclinical and individual-based longitudinal infection histories are often only anecdotal. Results from new analytical approaches relating to deep sequencing and neutralization tests covering multiple antigens are being utilized (37).

The immunodynamics of influenza have clear policy implications for the identification of high-risk groups in connection with pandemic planning (38), while the dynamics of waning immunity are key to the current concerns about immunization level for pertussis (39). Multivalent vaccines covering only a targeted subset from the circulating strains of pneumococcus and HPV pose important new applied problems (40). The spread of recombinant viruses implies the existence of multiple infections. One example is the Sydney 2012 strain of norovirus, but how this can occur in such an acute infection remains to be understood, as the time window for multiple exposures is limited, unless subclinical or environmental reservoirs of infection are important. Mathematical models could help to explore how, for example, such subpopulations may contribute to the dynamics of multiple infections.

Behavior of hosts

Human behavior is a fundamental determinant of infectious disease dynamics, whether by affecting how people come in contact with each other, vaccination coverage, reporting biases, or adherence to treatment. Traditional epidemic models have tended to ignore heterogeneity in contact behavior [although early HIV models addressed heterogeneity in sexual behavior by necessity (41)]. Increasing sophistication of contact network models (42), together with data on epidemiological contacts, creates opportunities for understanding and controlling transmission at a fundamental level (43) and opens up the possibility of independent study of relevant social factors (10). Recent years have seen exciting developments in the measurement of contact patterns and “who might infect whom” through advances in individual electronic identification technology. This is a promising avenue for linking pathogen genetic data and human behavior.

Contact patterns are not static and can shift during outbreaks as individuals change their behavior in response to perceived risk and public health interventions (44). Modeling has illuminated this process—for example, by the incorporation of peer influence on vaccination behavior into models of infectious disease dynamics (45, 46). Analysis of data from online social networks has also created promising opportunities to validate such approaches with empirical observations (47, 48).

Movement and travel are tightly linked to the spread of infection and have been explored through models to highlight commuting and agricultural migration driving local disease transmission (49) and global disease patterns through air travel (50). These processes are now being investigated to gain insights into the more complex case of vector-borne diseases, such as malaria and dengue, where both host and vector movement can interact to drive local (51) and large-scale dynamics (52).

Elimination and eradication

Modeling has long provided support for elimination efforts: Vector control (53), critical community size (54), herd immunity, and critical vaccination threshold (55, 56) were all powerful insights from models framed in relatively simple and homogeneous terms. Subtleties and complexities in many current eradication programs, as well as the availability of novel data sources, have called for a range of extensions in the theory. As we approach elimination targets, disease dynamics have changed in ways that were largely predicted by models, but also in unanticipated ways as a result of ignorance about key epidemiological processes (3).
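The critical vaccination threshold mentioned above can be written down directly in the simplest setting. The sketch below assumes a homogeneously mixing population and a vaccine that fully protects a fixed fraction of recipients. Under those assumptions, the coverage needed to push the effective reproduction number below 1 is (1 - 1/R0) / efficacy. The function name and the `efficacy` parameter are illustrative choices and are not taken from the cited work.

```python
def critical_vaccination_coverage(r0: float, efficacy: float = 1.0) -> float:
    """Coverage needed to bring the effective reproduction number below 1.

    Classic homogeneous-mixing result: p_c = (1 - 1/R0) / efficacy.
    A return value above 1 means that vaccination alone cannot reach the
    threshold under these simplifying assumptions.
    """
    if not 0.0 < efficacy <= 1.0:
        raise ValueError("efficacy must lie in (0, 1]")
    if r0 <= 1.0:
        # The infection cannot sustain itself, so no coverage is required.
        return 0.0
    return (1.0 - 1.0 / r0) / efficacy


# Illustrative values only: R0 = 4 with a fully protective vaccine.
print(critical_vaccination_coverage(4.0))
```

Real programs depart from these assumptions through heterogeneous contact patterns, waning immunity, and spatial structure, which is part of why the extensions of the theory discussed above are needed.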

Incentives for control efforts also change, both at the individual level [passive or active refusal to participate can develop (57)] and at the country level (58). This reinforces the call for development of models of human behavior and its interaction with infectious disease dynamics (9) potentially drawing on new data sources from social media (59, 60), as well as for models that can capture national and nongovernmental motivations, interactions, and competition, economical or otherwise. Long-term control puts pathogens under strong selection for resistance, calling for evolution-proof control methods (61) and novel vaccine technologies and their optimized delivery (62).

Finally, since the era of smallpox eradication, patterns of global disease circulation have changed radically. Human mobility and migration are increasing global connectivity, strengthening the need for cooperation and international synchronization of efforts (as illustrated by polio). Techniques for analysis of novel data sources are again key here; e.g., mobile phone records provide unique opportunities to understand disease source-sink dynamics (52).

Computational statistics, model fitting, and big data

By definition and design, models are not reality. The properties of stochasticity and nonlinearity strongly influence the accuracy of absolute predictions over long time horizons. Even if the mechanisms involved are broadly understood and relevant data are available, predicting the exact future course of an outbreak is impossible owing to changes in conditions in response to the outbreak itself, and because of the many chance effects in play. These stochastic effects dominate developments in situations with relatively few infected individuals, such as at emergence, when approaching the threshold for sustained host-to-host spread, or when approaching elimination and eradication. This makes it virtually impossible to predict which infectious disease agent is going to emerge and evolve next and where, or to predict when and where the next or last case in an outbreak will occur. There is, as is typical in complex systems, a fundamental horizon beyond which accurate prediction is impossible. The field has yet to explore where that horizon is and whether computational tools and additional data (and if so which data) can stretch predictions to this limit. In contrast, “what-if” scenarios for public health intervention can provide qualitative (and increasingly semiquantitative) insight into their population consequences.
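The dominance of chance when few individuals are infected can be illustrated with a small simulation. This is a deliberately simplified sketch: it tracks only the early phase of an outbreak, ignores depletion of susceptibles, and assumes that each event is either a transmission (with probability R0 / (1 + R0)) or a recovery. An outbreak that reaches an arbitrary cap is counted as having taken off. The function name, the cap, and the seed are hypothetical choices made for illustration.

```python
import random


def extinction_fraction(r0: float, n_runs: int = 1000, cap: int = 200,
                        seed: int = 1) -> float:
    """Fraction of single-case introductions that die out by chance.

    Early-phase approximation: the number infected moves up by one on a
    transmission event and down by one on a recovery event. A run stops
    when no one is infected (extinction) or when `cap` infected
    individuals are reached (treated as a major outbreak).
    """
    rng = random.Random(seed)
    p_up = r0 / (1.0 + r0)  # chance that the next event is a transmission
    died_out = 0
    for _ in range(n_runs):
        infected = 1
        while 0 < infected < cap:
            infected += 1 if rng.random() < p_up else -1
        if infected == 0:
            died_out += 1
    return died_out / n_runs


# Even with R0 = 2, a large share of introductions fade out on their own.
print(extinction_fraction(2.0))
```

For this simplified model, theory gives an extinction probability of about 1/R0 when R0 exceeds 1, so identical introductions can end very differently, which is why the timing and location of individual cases resist prediction.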

With growing applications in public health, there is an increasing demand to validate models by making model predictions consistent with observed data. The development of ever-more-powerful computers is accompanied by new techniques utilizing this power, notably for statistically rigorous parameter estimation and model comparison. Techniques such as Markov chain Monte Carlo (MCMC) have become firmly established tools for parameter estimation from data in infectious disease models [e.g., (63)], and Monte Carlo based methods will play a pivotal role in addressing the challenges that lie in reconciling predictions and observations (64, 65). Other techniques, such as so-called particle filters, approximate Bayesian computation, emulation, and their combinations with MCMC [e.g., (66)], are rapidly developing and allow stochastic models that explicitly account for incomplete observations to be matched to time series of cases, giving insights into scenarios as diverse as cholera in Bangladesh (67) and influenza (68, 69). The need to integrate multiple data sources (70, 71), as well as to include uncertainty in model parameters and/or structure, has driven the use of Bayesian approaches.
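As a minimal illustration of MCMC-based parameter estimation, the sketch below runs a random-walk Metropolis sampler for a single parameter: the mean number of secondary cases per case. It assumes Poisson-distributed secondary case counts and a flat prior on positive values. The data, function names, step size, and burn-in length are all hypothetical, and real applications involve far richer models and observation processes than this.

```python
import math
import random
from typing import List, Sequence


def log_posterior(r: float, counts: Sequence[int]) -> float:
    """Unnormalized log posterior for a Poisson mean r with a flat prior."""
    if r <= 0.0:
        return -math.inf
    return sum(counts) * math.log(r) - len(counts) * r


def metropolis(counts: Sequence[int], n_iter: int = 20000, step: float = 0.3,
               start: float = 1.0, seed: int = 42) -> List[float]:
    """Random-walk Metropolis sampler returning the full chain for r."""
    rng = random.Random(seed)
    current = start
    current_lp = log_posterior(current, counts)
    chain = []
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, counts)
        # Accept with probability min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        chain.append(current)
    return chain


# Hypothetical secondary-case counts for 20 traced cases.
counts = [0, 1, 2, 0, 3, 1, 0, 2, 1, 0, 4, 1, 0, 2, 1, 0, 1, 3, 0, 2]
chain = metropolis(counts)
posterior = chain[2000:]  # discard an arbitrary burn-in period
posterior_mean = sum(posterior) / len(posterior)
print(f"posterior mean of r = {posterior_mean:.2f}")
```

In this toy case the posterior is available in closed form (a gamma distribution), which makes it a convenient check on the sampler. The value of MCMC lies in models where no such closed form exists.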

Although the rapid expansion of infectious disease models and their application over the past decade has coincided with an increase in open access data sets available from a variety of sources, progress in data capture needs to be accelerated. Although some of these technologically advanced data streams have been incorporated into models—for example, to track the incidence of influenza in the United States (72), to elucidate the spatial dynamics of measles and malaria in Africa (53, 73), and to chart the spread of dengue globally (74)—much more remains to be done to leverage data collected from different sources (e.g., demographic, genetic, epidemiological, treatment, and travel patterns) and at different temporal and spatial scales.

Concluding remarks

Infectious diseases are an important frontier in public health, and their prevention and control call for global, rather than national or regional, coordinated efforts (75–78). The success of smallpox and rinderpest eradication campaigns shows the possibilities; the global spread of newly emerged pathogens (recently avian influenza strains and MERS coronavirus), the difficulties in curbing the spread of antibiotic resistance, the upsurge of polio toward the “end-phase” of its eradication, and the recent unprecedented spread of Ebola virus, are examples that show the need for international coordination and collaboration. Nonlinearity in infectious disease dynamics and global connectivity cause suboptimal national decisions on control and prevention to have regional and even global repercussions.

Given the mismatch with regions where most expertise on infectious disease dynamics is concentrated, it is important to empower local scientists and policy-makers, in regions where the burden of disease is heaviest, about the problems facing their own countries and the consequences of local actions. It is essential to make expertise, data, models, statistical methods, and software widely available by open access. There are several initiatives (e.g., the Humanitarian Data Exchange (HDX) and the Malaria Atlas Project), but more needs to be done. Modeling tools and software for data analysis are beginning to become open source, such that findings can be replicated, additional scenarios can be evaluated, and others can incorporate methods for data analysis or simulation. Ultimately, sharing models guarantees more reproducible results, while maximizing model transparency.

Making data sets widely available is also crucial, for example, to support replication of findings and broader comparative analyses (79). As models become open access, so should much of the data collected by governments, international agencies, and epidemiology research groups. Two outbreaks never occur in exactly matching circumstances, even for the same infectious agent, so there is potential to study many outbreaks in parallel to gain insight into the determinants of outbreak pattern and severity. Looking forward, there is a major opportunity to design experiments, clinical trials [for example for vaccines (80)], and surveillance protocols to test model predictions or assumptions, and to help reduce or better target the enormous costs involved. By integrating modeling approaches throughout the full life cycle of infectious disease policies, including economic considerations (58, 70, 81), health outcomes can be improved and scientific understanding can be advanced.

At present, the evidence provided by infectious disease models is not considered by the Grading of Recommendations Assessment, Development and Evaluation (GRADE) working group alongside that of conventional studies such as clinical trials. Regardless, models are essential when diverse sources of data (including GRADE-scale evidence) need to be combined and weighed to assess quality of evidence and recommendations made in health care scenarios. In many cases, the definitive trial cannot be performed, and in such circumstances models can offer insight and extract maximum value from data that are available. In recent years, uniformity of practice and quality control for models has received more attention, resulting in initial attempts to characterize good modeling practice for infectious diseases (82, 83).

The optimal use of models to inform policy decisions requires a continuous dialogue between the multidisciplinary infectious disease dynamics community and decision-makers. This is increasingly understood by governments in developed countries, in nongovernmental agencies and by large funding bodies. This dialogue will help to reduce the burden from infectious diseases by providing better-informed control strategies. Mathematical models will allow us to capitalize on new data streams and lead to an ever-greater ability to generate robust insight and collectively shape successful local and global public health policy.

  • Authors, apart from first and last author, are in alphabetical order.

  • All authors are members of this collaboration.

  • § In addition to the authors listed above, this collaboration includes: Nimalan Arinaminpathy,1 Frank Ball,2 Tiffany Bogich,3 Julia Gog,4 Bryan Grenfell,3 Alun L. Lloyd,5 Angela Mclean,6 Philip O’Neill,2 Carl Pearson,11 Steven Riley,1 Gianpaolo Scalia Tomba,12 Pieter Trapman,13 James Wood7. Affiliations: 1Imperial College, London, UK. 2University of Nottingham, Nottingham, UK. 3Princeton University, Princeton, NJ, USA. 4University of Cambridge, Cambridge, UK. 5North Carolina State University, Raleigh, NC, USA. 6University of Oxford, Oxford, UK. 11University of Florida, Gainesville, FL, USA. 12University of Rome “La Sapienza,” Rome, Italy. 13University of Stockholm, Stockholm, Sweden.

References and Notes

  1. Acknowledgments: H.H. conceived and wrote the paper; R.M.A., V.A., S.B., D.D.A., C.D., K.T.D.E., W.J.E., S.D.W.F., S.F., T.D.H., T.H., V.I., P.K., J.L., J.O.L.-S., C.J.E.M., D.M., J.R.C.P., L.P., M.G.R., and C.V. provided text and edited the manuscript; J.L., C.J.E.M. and T.D.H. produced figures; the Isaac Newton Institute IDD Collaboration jointly produced and discussed ideas for the outline and content (all of the above plus N.A., F.B., T.B., J.G., B.G., A.L.L., A.M., P.O.N., C.P., S.R., G.S.T., P.T., and J.W.). We gratefully acknowledge help by K. Koelle and D. Fisman in producing adapted versions of their figures for Fig. 2 (panels C, D, and E). This paper was conceived and developed at a program on Infectious Disease Dynamics at the Isaac Newton Institute for Mathematical Sciences, Cambridge, UK, 19 August to 13 September 2013 and 19 May to 6 June 2014. We gratefully acknowledge financial and infrastructural support from the Isaac Newton Institute for Mathematical Sciences which is fundamental to the success of this program. We are also grateful for the financial support the program received from the Research and Policy for Infectious Disease Dynamics (RAPIDD) program of the Science and Technology Directorate, U.S. Department of Homeland Security, and the Fogarty International Center, NIH. R.M.A. is a non–executive director of GlaxoSmithKline. C.D. acts as an advisor for the Wellcome Trust, for which he receives financial remuneration.