Unifying the Epidemiological and Evolutionary Dynamics of Pathogens


Science  16 Jan 2004:
Vol. 303, Issue 5656, pp. 327-332
DOI: 10.1126/science.1090727


A key priority for infectious disease research is to clarify how pathogen genetic variation, modulated by host immunity, transmission bottlenecks, and epidemic dynamics, determines the wide variety of pathogen phylogenies observed at scales that range from individual host to population. We call the melding of immunodynamics, epidemiology, and evolutionary biology required to achieve this synthesis pathogen “phylodynamics.” We introduce a phylodynamic framework for the dissection of dynamic forces that determine the diversity of epidemiological and phylogenetic patterns observed in RNA viruses of vertebrates. A central pillar of this model is the Evolutionary Infectivity Profile, which captures the relationship between immune selection and pathogen transmission.

The population dynamics of many host-pathogen interactions are well characterized (1, 2). However, the link between epidemic processes and pathogen evolution, within and among hosts, is not so well understood. This connection is central to many applied issues, from the evolution of drug resistance and virulence, to vaccine design and the emergence of new diseases. The current revolution in host and pathogen genomics underlines the timeliness of this issue.

Research on the dynamics of pathogen strains has illuminated the issues involved (3–5). However, linking pathogen dynamics and genetic diversity quantitatively at within-host and population levels is a more formidable problem, because it requires empirical information about the interplay of dynamics and genetics at both levels.

Here we seek to unify the interacting epidemiological and evolutionary processes that drive spatiotemporal incidence and phylogenetic patterns at different scales. Although our present understanding of these patterns is incomplete, outlining current knowledge allows qualitative inferences to be made. In our survey of pathogen “phylodynamics,” we focus on RNA viruses, where high mutation rates, large population sizes, and short generation times mean that epidemiological and population genetic processes occur on a similar time scale.

Observed Phylodynamic Patterns

A major determinant of epidemic (and therefore phylodynamic) behavior is the relative time scale of infection dynamics and replenishment of susceptible hosts after epidemics (6). The main epidemiological distinction is therefore between fast (acute) infections, in which the infectious period is measured in days or weeks, and slower (persistent) infections, which can last years. In contrast, phylogenetic patterns are primarily affected by natural selection that arises from cross-immunity (i.e., the differential effect of immune responses on genetically variable strains), and secondarily by neutral epidemiological processes, such as spatial population separation. Ultimately, pathogen strains and phylogenetic lineages are produced by mutation, and their survival depends on the prevailing epidemiological and immunological forces. The main phylodynamic categories of RNA viruses are summarized below; the relevant biological characteristics of example viruses are set out in table S1.

Short infections with strong cross-immunity. Measles (and other morbilliviruses) illustrate the acute extreme of the phylodynamic spectrum. Measles population dynamics show marked epidemic behavior (Fig. 1A), with complex recurrent epidemics (7, 8). Modeling shows that epidemic cycles arise from repeated exhaustion of susceptible host numbers, driven by short incubation and infectious periods, combined with strong, lifelong immunity elicited by primary infection (1, 7, 8). Before the vaccine era, these forces interacted with seasonality in transmission to produce explosive, generally biennial, measles dynamics in large cities (7).
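The mechanism described above can be illustrated with a minimal sketch, assuming a seasonally forced susceptible-infected-recovered (SIR) model with host births and deaths. The function name and all parameter values (a basic reproductive number of 17, a 13-day infectious period, a 50-year host life span, and 20% sinusoidal forcing of transmission) are illustrative assumptions, not values taken from the cited studies.

```python
import math

def simulate_forced_sir(years=30, dt=0.1, r0=17.0, infectious_days=13.0,
                        lifespan_years=50.0, forcing=0.2):
    """Euler integration of a seasonally forced SIR model (fractions of hosts).

    Returns a list of (day, susceptible, infected, recovered) tuples.
    All default parameter values are illustrative assumptions.
    """
    gamma = 1.0 / infectious_days          # recovery rate per day
    mu = 1.0 / (lifespan_years * 365.0)    # birth rate = death rate per day
    beta0 = r0 * (gamma + mu)              # mean transmission rate per day
    s, i, r = 0.06, 0.001, 0.939           # assumed starting fractions
    trajectory = []
    steps = round(years * 365 / dt)
    for step in range(steps):
        day = step * dt
        # Transmission is higher in one half of the year than the other.
        beta = beta0 * (1.0 + forcing * math.cos(2.0 * math.pi * day / 365.0))
        new_infections = beta * s * i
        ds = mu - new_infections - mu * s      # births replenish susceptibles
        di = new_infections - (gamma + mu) * i
        dr = gamma * i - mu * r                # recovery confers lifelong immunity
        s, i, r = s + dt * ds, i + dt * di, r + dt * dr
        trajectory.append((day, s, i, r))
    return trajectory
```

In this sketch, each epidemic depletes the susceptible fraction below the threshold for spread, and births then slowly rebuild it; the infected fraction therefore alternates between peaks and deep troughs rather than settling at a constant level.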

Fig. 1.

(A) Prevaccination measles dynamics: weekly case reports for Leeds, UK (7). (B) Weekly reports of influenza-like illness for France (44). (C) Annual diagnosed cases of HIV in the United Kingdom (45). (D) Measles phylogeny: the measles virus nucleocapsid gene [63 sequences, 1575 base pairs (bp)]. (E) Influenza phylogeny: the human influenza A virus (subtype H3N2) hemagglutinin (HA1) gene longitudinally sampled over a period of 32 years (50 sequences, 1080 bp). (F) Dengue phylogeny: the dengue virus envelope gene from all four serotypes (DENV-1 to DENV-4, 120 sequences, 1485 bp). (G) HIV-1 population phylogeny: the subtype B envelope (E) gene sampled from different patients (39 sequences, 2979 bp). (H) HCV population phylogeny: the virus genotype 1b E1E2 gene sampled from different patients (65 sequences, 1677 bp). (I) HIV-1 within-host phylogeny: the partial envelope (E) gene longitudinally sampled from a single patient over 5.8 years [58 sequences, 627 bp; patient 6 from (26)]. All sequences were collected from GenBank and trees were constructed with maximum likelihood in PAUP* (46). Horizontal branch lengths are proportional to substitutions per site. Further details are available from the authors on request.

However, the powerful, strain-transcending immunity driving these oscillations is not reflected directly in the measles virus phylogeny (Fig. 1D), because an immune response that is equally potent against all strains will not generate selection. Therefore, many strains coexist, with relative frequencies determined predominantly by nonselective epidemiological processes. This does not exclude the sporadic occurrence of selection, immunologically mediated or otherwise. Rather, the measles phylogeny indicates that selection is not operating sufficiently consistently to leave its imprint. Instead, the phylogeny is determined by global spatiotemporal strain dynamics: Some lineages persist in regions with low vaccination coverage, whereas others are globally distributed and represent localized outbreaks initiated by imported strains in regions with higher coverage. The high infectiousness of measles permits the rapid geographic spread of these strains, and their phylogenetic lineages reveal substantial spatial mixing.

Short infections with partial cross-immunity. The best documented example of this category is influenza A virus in humans and other mammals (9–11). This is an acute respiratory infection with a short infectious period, after which humoral and cellular immunity eliminate infection (11). As with measles, primary infection generally occurs in the young; however, there is only partial subsequent cross-immunity against viral variants.

Naïve influenza susceptibles ultimately arise from the birth of new hosts, although susceptibility to novel viral variants is also regained by mutation at key antigenic sites in the viral hemagglutinin (HA) and neuraminidase (NA) genes (11). This alternative supply of susceptibles means that, in contrast to measles, influenza appears in annual patterns, ranging from highly seasonal winter incidence at high latitudes to more endemic behavior in the tropics (Fig. 1B) (11). Because of its short infectious period, influenza epidemics can be prone to local extinction (5). This dynamic fragility, coupled with viral genetic variation, generates complex transhemispheric waves of infection each year (11). Thus, strain structure in influenza has a powerful impact on epidemiological dynamics by modulating the supply of susceptibles. This feedback, along with the effect of pathogen epidemic dynamics on genetics in other systems, is illustrated schematically in fig. S1A.

Partial immunity to influenza A virus also generates strong fitness differences among strains, leading to rapid strain turnover. Such continual immune selection determines the shape of phylogenies of the HA (Fig. 1E) and NA genes; these are strongly temporal in structure with high rates of lineage extinction, so that genetic diversity at any time is limited. The central trunk depicts the ancestry of the successful lineages and has the highest rate of amino acid replacement at key antigenic sites (9), suggesting that immunological distance from previous strains determines viral fitness. Although substantial progress has recently been made in integrating the individual- and population-level dynamics of influenza (5), the role of within-host dynamics remains to be added to the picture. Influenza B, and influenza A in other mammals, generally shows more complex patterns of antigenic drift (fig. S1B). In addition to antigenic drift, influenza pandemics can be caused by novel HA and NA combinations (antigenic shift). Aquatic birds are the natural reservoirs of influenza A viruses and harbor a variety of antigenic types, thereby providing an environment in which new recombinant subtypes can arise and transmit to mammals.

This phylodynamic category also includes foot and mouth disease virus (FMDV), which causes a highly infectious acute epidemic disease of livestock. Primary infection or vaccination gives imperfect protection against other variants of the virus, and there is evidence for antigenic selection in the FMDV phylogeny (12).

Infections with immune enhancement. Host immunity (13) and ecological interference (14) generally induce competitive interactions among strains. However, in the case of dengue—the most common vector-borne viral disease of humans—there is the possibility that antibody-dependent enhancement (ADE) generates positive reinforcement between strains (15). Dengue virus comprises four serotypes that cocirculate in tropical regions. ADE means that severe symptoms (such as dengue hemorrhagic fever and dengue shock syndrome) are more common in secondary infections with multiple serotypes than in primary infections. Models show that positive feedback between strains could lead to complex, even chaotic, dynamics (16). It is also possible that ADE allows low-frequency serotypes to persist through enhancement by more abundant strains (15, 17). ADE may explain the shape of the dengue virus phylogeny, in which the four serotypes are phylogenetically equidistant (Fig. 1F). Natural selection may favor this level of antigenic dissimilarity, as cross-protective antibodies would neutralize more similar strains, whereas more divergent strains would not stimulate ADE. The phylogeny can alternatively be explained by independent cross-species (monkey-human) transmission if the serotypes predominate in different geographic areas, followed by later mixing (15). Development of more mechanistic dynamic models of ADE is clearly a priority.

Persistent infections. Pathogens can persist in their hosts through many mechanisms; we focus here on viruses that constantly evolve during infection. The two most important such viruses are human immunodeficiency virus (HIV) and hepatitis C virus (HCV). Their long infectious periods, on the order of years, mean that we can observe both intra- and inter-host phylodynamic patterns.

For HIV and HCV, inter-host dynamics are relatively slow, because the time between transmission events is generally months or even years. The long infection period sustains infected numbers during an epidemic, and susceptibles are replenished by recruitment into risk groups. We therefore see simple, relatively slowly changing epidemiological trends (Fig. 1C), in place of the deep troughs and recurrent epidemics of acute infections (7). For both viruses, the level of cross-immunity among strains appears to be low, as individual hosts can be infected with multiple viral genotypes (18, 19). Although widespread intra-host adaptation to human leukocyte antigen (HLA) types has been observed (20), there is little evidence for differential intrinsic transmissibility among strains. Consequently, the phylogenetic structure of HIV and HCV at the population level is not determined by immune selection; instead, it reflects the demographic and spatial history of transmission (Fig. 1, G and H).

In contrast, intra-host evolution in both viruses appears to be driven by continual and strong immune selection pressure, from either neutralizing antibodies or cytotoxic T lymphocytes (21–23) (Fig. 1I). This intra-host evolutionary process is in effect an example of fast dynamics, because the infection of new cells takes place on a time scale of days (24). In HIV, intra-host phylogenies of the highly immunogenic envelope (env) gene resemble those obtained from influenza A virus at the population level; they are strongly temporal with limited diversity at any one time (25, 26). Although there are fewer data sets showing long-term intra-host HCV evolution, it is likely that phylogenies of immunogenic HCV proteins will show an equivalent pattern. The dynamics of intra-host HIV and HCV infection are still poorly understood; however, the availability of uninfected host cells, the strength and breadth of the immune response (itself a function of host genetics), and the feedback that arises from antigen stimulation of the immune system are all expected to be important (27).

A Phylodynamic Framework

RNA virus variability originates with high viral mutation and replication rates. This variation is then modulated by two processes: the host's immune response to infection, which exerts a selective pressure on the pathogen, and the bottleneck at transmission, which shapes the resultant viral diversity transmitted to other hosts. If the bottleneck is nonselective, it will only amplify the action of genetic drift. At the inter-host level, population dynamics, including bottlenecks arising from seasonality or spatial heterogeneity, will further modulate the distribution of genetic variation. In some RNA viruses, notably HIV, recombination also shapes the diversity produced by mutation. The interaction of these processes for equine influenza (28) is illustrated in fig. S1B.

Static patterns. We will first consider the relationship between pathogen adaptation and average immune pressure in individual hosts (Fig. 2A). Pathogen adaptation to host immunity can be measured as the rate of fixation of advantageous mutations in viral epitopes. This rate increases with the strength of natural selection for variants that can evade immunity, which will generally be positively related to the potency of the immune response. Conversely, there is a decreasing relationship between immune response and viral population size. A simple population genetic model (29) demonstrates how these effects combine to give the net adaptation rate of the viral population to a given level of host immunity (Fig. 2A). Counterintuitively, the highest rate of pathogen adaptation occurs at intermediate levels of immunity. We can use this framework to interpret the phylodynamic patterns of various RNA viruses by identifying where they fall on this curve:

Fig. 2.

(A) Schematic diagram of a static phylodynamic model for virus adaptation as a function of average immune pressure. Numbers correspond to phylodynamic patterns: 1) no effective response and no adaptation; 2) low immune pressure and low adaptation; 3) medium immune pressure and high adaptation; 4) high immune pressure and low adaptation; and 5) overwhelming immune pressure and no adaptation. (B) A phylodynamic framework allowing for within-host viral and immune kinetics. Time is measured in days after infection. Top: Schematic viral (red) and immunological (green) trajectories in individual hosts, based on experimental infection of horses with equine influenza virus (28, 35). Bottom: The corresponding EIPs (34). Left, center, and right columns respectively reflect infection in naïve, intermediately, and solidly immune individuals. In naïve hosts, virus shedding generally peaks ∼2 days after infection, declining to negligible levels by day 5. The humoral response rises by ∼day 6, underlining the idea that innate immunity, loss of susceptible cells, or other mechanisms play the major role in initially limiting infection (11). The EIP for naïve hosts is relatively low, because little viral replication coincides with selective immunity, so these hosts are unlikely to be a major source of host-selected variants. The EIP for highly immune hosts is also very low, because adaptive immunity generally prevents substantial virus excretion, other than rare immune escape variants. For intermediately immune hosts, existing immunity limits viral excretion compared with the naïve case, and the immune response rises earlier and more rapidly. The EIP shows a high potential for the transmission of selected viral variants, as substantial viral replication occurs during a time of substantial immune selection.

  1. No effective immune response and no adaptation—This extreme includes HCV infection in immunocompromised hosts (30), influenza A virus immediately after an antigenic shift, and the initial phase of any infection in an immunologically naïve host. No effective adaptive immune response occurs, so there is little potential for viral mutants to be advantageous and the fastest replicating variant dominates the virus population.

  2. Low immune pressure and low adaptation—A large viral population will accrue some beneficial mutations to a weak immune response, leading to an intermediate rate of adaptation. Possible examples are rapidly progressing chronic HCV and HIV (21, 31) and the asymptomatic simian immunodeficiency virus infections that occur in nonhuman primates (32).

  3. Medium immune pressure and high adaptation—This is the point of fastest viral adaptation, as substantial immune selection coincides with appreciable virus replication. This position probably captures the dynamics of antigenic drift in influenza A virus and intra-host HIV infections (33), where the coevolutionary “arms race” between host and pathogen is most intense.

  4. High immune pressure and low adaptation—Here, a stronger immune response greatly reduces viral population size, limiting the number of virus escape mutants. In HIV, this position may be represented by long-term nonprogressive hosts, who can exhibit a lower rate of amino acid change than in rapid progressors (31).

  5. Overwhelming immune pressure and no adaptation—Infection will be rapidly cleared, so there is little chance of an adaptive response from the virus. Repeat exposure to measles and other morbilliviruses, which elicit solid lifetime immunity, exemplifies this position (7).
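The hump-shaped relationship behind these five positions can be sketched numerically. The following is a minimal illustration, assuming (hypothetically) that viral population size N falls linearly, and the selective advantage of escape variants ϕ rises linearly, as immune pressure increases from 0 to 1. Neither functional form comes from the cited population genetic model; the only property borrowed from it is that the adaptation rate scales with the product Nϕ.

```python
def adaptation_rate(immune_pressure, max_population=1.0):
    """Relative adaptation rate N * phi at an immune pressure in [0, 1].

    Assumed (hypothetical) forms:
      N   = max_population * (1 - immune_pressure)   # immunity shrinks the viral population
      phi = immune_pressure                          # immunity strengthens selection for escape
    """
    if not 0.0 <= immune_pressure <= 1.0:
        raise ValueError("immune_pressure must lie in [0, 1]")
    population = max_population * (1.0 - immune_pressure)
    phi = immune_pressure
    return population * phi

# Positions 1 to 5 on the curve, at assumed immune pressures.
positions = {1: 0.0, 2: 0.1, 3: 0.5, 4: 0.9, 5: 1.0}
rates = {label: adaptation_rate(p) for label, p in positions.items()}
```

Under these assumptions the rate is zero at both extremes and greatest at intermediate immune pressure, because selection is too weak at one end and the viral population too small at the other.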

Dynamic patterns. A more refined phylodynamic picture is possible if we consider how viral and immune dynamics within hosts determine the trajectory of viral adaptation (34). The potential complexities are illustrated by equine influenza, an acute virus with imperfect immunity, for which experimental challenge studies have provided a rich database of longitudinal kinetics in hosts of different immunological experience (fig. S1B) (28, 35). Figure 2B shows a schematic summary of these studies' results, with the lower panels depicting the net transmission rate of immunologically selected mutations. We call this quantity, which reflects the interaction between immune history and viral adaptation and abundance (34), the Evolutionary Infectivity Profile (EIP).
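A minimal numerical sketch of the EIP follows, assuming that the EIP at time x after infection is proportional to N(x) multiplied by the integral of N(t)I(t) from 0 to x, where N(t) is viral load and I(t) is the strength of the immune response. This form is one reading of the description in the accompanying note (34), not a confirmed transcription of it. The bell-shaped viral load, the sigmoidal immune response, and the three sets of host parameters are hypothetical choices made only to mimic the qualitative trajectories described for Fig. 2B.

```python
import math

def viral_load(t, peak_height, peak_day=2.0, width=1.0):
    """Assumed bell-shaped viral load N(t), peaking around day 2."""
    return peak_height * math.exp(-0.5 * ((t - peak_day) / width) ** 2)

def immune_response(t, baseline, onset_day, steepness=1.5):
    """Assumed sigmoidal immune response I(t), rising from a pre-existing baseline."""
    return baseline + (1.0 - baseline) / (1.0 + math.exp(-steepness * (t - onset_day)))

def eip_curve(peak_height, baseline, onset_day, days=10.0, dt=0.01):
    """Return (times, eip) with eip[k] = N(x) * integral from 0 to x of N(t) * I(t) dt."""
    times, eip = [], []
    accumulated = 0.0  # running trapezoidal integral of N(t) * I(t)
    previous = viral_load(0.0, peak_height) * immune_response(0.0, baseline, onset_day)
    steps = round(days / dt)
    for k in range(steps + 1):
        x = k * dt
        current = viral_load(x, peak_height) * immune_response(x, baseline, onset_day)
        if k > 0:
            accumulated += 0.5 * (previous + current) * dt
        previous = current
        times.append(x)
        eip.append(viral_load(x, peak_height) * accumulated)
    return times, eip

# Hypothetical host classes: (peak viral load, baseline immunity, day of immune onset)
HOSTS = {
    "naive": (1.0, 0.0, 6.0),
    "intermediate": (0.5, 0.3, 3.0),
    "immune": (0.02, 0.9, 1.0),
}

def peak_eip(host):
    """Maximum of the EIP curve for one of the hypothetical host classes."""
    _, eip = eip_curve(*HOSTS[host])
    return max(eip)
```

With these assumed trajectories, the intermediately immune host has the largest peak EIP: the naïve host sheds most of its virus before immunity rises, and the solidly immune host sheds too little virus for selected variants to be transmitted.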

Other immunological and dynamic factors modify this picture. First, we have assumed that adaptive immunity is specific to the prevailing viral genotype. This will rarely be the case for viruses with imperfect immunity. In addition, dynamics will be affected in experienced hosts by “original antigenic sin” (11), by which new antigenic variants induce a recall of antibodies against previous strains. The realization of the EIP at the epidemiological level also depends on prevailing host-pathogen population dynamics. In particular, selected variants must be successfully transmitted for there to be population-level adaptation. More subtly, the temporal dynamics of epidemics could influence evolution. For example, in influenza, two forces could operate during different phases of the epidemic. During the upswing of cases, when there are large numbers of available susceptibles, a given viral lineage will switch more rapidly between hosts; any adaptations transmitted will therefore reflect relatively few generations of selection on the pathogen within a given host. In contrast, during epidemic troughs, the full infectious period of the virus is more important in maintaining the chain of transmission (2, 36). Thus, viral lineages will move more slowly between hosts, providing more opportunity for immune selection per transmission than in the early epidemic. The relative timing of peak virus excretion and immunity (Fig. 2B) is also an important variable here. To understand such emergent dynamics, and their contribution to the modulation of pathogen diversity, we must aim to couple experimental work with models and phylogenetic inference.

Phylodynamic Inferences from Phylogenies

The phylodynamic framework (Fig. 2) and empirical data (Fig. 1) suggest a provisional classification of virus phylogenies. Pathogen phylogenies are determined by a combination of immune selection, changes in viral population size, and spatial dynamics. Figure 3 shows idealized versions of these different phylogenetic signatures. Although there are example pathogens for each idealized tree, in many cases RNA viruses will show a combination of signatures.

Fig. 3.

Idealized tree shapes under different phylodynamic processes. The main division is between those viruses subject to continual immune-driven selection (such as human influenza A virus and intra-host HIV), in which trees have a strong temporal structure, and viruses where immune selection is absent or weak (such as many RNA viruses), in which the trees depict population size and spatial dynamics. The types of evolutionary inference that can be made from the various phylogenies are also indicated. (A, B, and C represent three subpopulations from which viruses have been sampled.)

Pathogens under strong continual selection, such as human influenza A and intra-host HIV, show high adaptation rates (Fig. 2A) and will tend to exhibit temporal phylogenetic structure, reflecting the continual appearance and extinction of strains through time (Fig. 1). It is important to distinguish between continual and sporadic selection (e.g., changes in receptor binding sites that allow infection of new hosts or cell types), as the latter is unlikely to exert the continual pressure required to produce sustained strain turnover. In viruses with no recombination, such as measles, inter-virus competition (“clonal interference”) may limit the fixation rate of advantageous mutations (37), also reducing strain turnover.

When selection is weak or absent [either because cross-protective immunity is very strong or very weak (Fig. 2A)], pathogen phylogenies harbor information about nonselective epidemiological processes such as population dynamics or spatial change. Viral diversity dynamics can be investigated through the coalescent, a statistical framework that infers rates of population growth and decline from the temporal distribution of nodes in a phylogeny (38, 39). Certain processes, particularly population bottlenecks or selective sweeps, may greatly reduce genetic variation in viral populations. In these cases, there will be less power for population dynamic inferences, but the phylogeny may still contain information about spatial strain dynamics. Future methods should aim to integrate phylogenetic information with empirical longitudinal data on epidemiological (or intra-patient) population dynamics.
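How node times carry information about population size can be shown with a minimal sketch based on a classic skyline-style estimator; this is offered as an illustration of coalescent reasoning, not necessarily the method used in the cited work. It assumes that all sequences are sampled at the same time, that time is measured in generations backward from sampling, and that the population is haploid, so that while k lineages remain the expected waiting time to the next coalescence is N divided by k(k − 1)/2. The function name and input format are hypothetical.

```python
def skyline_estimates(node_times):
    """Estimate population size for each interval between coalescent events.

    node_times: times of the internal nodes of a tree, measured backward
    from the sampling time (n - 1 values for n sampled sequences).
    Returns a list of (number_of_lineages, estimated_population_size).
    """
    times = sorted(node_times)
    lineages = len(times) + 1      # every coalescence removes one lineage
    estimates = []
    previous = 0.0
    for t in times:
        interval = t - previous
        pairs = lineages * (lineages - 1) / 2.0
        # Under the stated assumptions E[interval] = N / pairs,
        # so interval * pairs is a moment estimate of N.
        estimates.append((lineages, interval * pairs))
        previous = t
        lineages -= 1
    return estimates
```

Short intervals near the tips relative to those near the root suggest a smaller population in the recent past, and the reverse suggests growth. Estimates from single intervals are very noisy, which is one reason why phylogenies with little genetic variation give limited power for such inferences.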

When present, selection can be detected and measured from viral gene sequences. For example, the ratio of nonsynonymous to synonymous substitution rates (the dN/dS ratio) gives a measure of continual selection in the viral genome and is a useful tool that can be related to the phylodynamic model in Fig. 2A (40).


The phylodynamic framework provides a template for integrating the comparative population dynamics, population genetics, and phylogenetics of microparasites. In particular, pathogen epidemic dynamics and genetics can each potentially influence the other (fig. S1, A and B), depending on the biology of the host-parasite interaction. The framework is also relevant to key aspects of pathogen evolution not covered here (41), including the evolution of virulence and drug resistance. However, a fully quantitative understanding of phylodynamics will require empirical and theoretical work focused on three issues.

First, we must determine the immunological implications of genetic change in the virus; this is a powerful force affecting broadscale phylodynamic patterns. A promising avenue is provided by antigenic “shape space” calculations, which interpret the immunological distance between viruses measured with panels of host sera (42). The persistence and extent of cross-immunity is also a potential modulator of pathogen diversity (5).

Second, we need to elucidate the quantitative interaction between the strength of immune response, the kinetics of viral adaptation, and the timing of transmission; together these determine the EIP. Although this information is often available for persistent infections like HIV, little is known about acute infections. Replicated experimental infections and transmission experiments are key; influenza A provides a fruitful case study, because contrasting dynamical patterns in different host species can be explored. Likewise, a greater understanding of how clonal interference and epistasis mediate immune selection is of fundamental importance.

Finally, to understand host-pathogen evolution in the long-term, we need to establish how epidemic and metapopulation disease dynamics modulate selective forces, summarized in the EIP, to drive long-term phylogenetic patterns. Again, ideally, this requires both well-sampled incidence and genetic data at both individual host and population scales. Currently, such data are only episodically collected. As genotyping becomes more routine, systematic collection of this information should be a priority from both public health and basic research perspectives.

Supporting Online Material


Table S1

References and Notes

  29. The fixation rate of advantageous mutations in viral epitopes is denoted $\mu_a$. $m$ is the viral mutation rate (per capita per unit time), $N$ is the viral population size, and $p(s)$ is the fixation probability of mutations with selection coefficient $s$. If $f(s)$ is the distribution of selection coefficients, then $\mu_a = mN \int_0^\infty f(s)\,p(s)\,ds$. The fixation probability of beneficial mutations in a constant haploid population of size $N$ is approximately $p(s) = (1 - e^{-2s})/(1 - e^{-2Ns})$ (43). To model $f(s)$, let $p_a$ be the proportion of mutations that are advantageous, and let the selection coefficients of advantageous mutations be exponentially distributed (the exact distributional form is relatively unimportant). Hence, $f(s) = (p_a/a)\,e^{-s/a}$ for $s > 0$, where $a$ is the mean $s$ of advantageous mutations. If we make the reasonable assumption that $Ns > 1$, then the adaptation rate is $\mu_a \approx mN p_a \left( \frac{2a}{1 + 2a} \right) = mN\phi$. This produces the model in Fig. 2A. $\phi$ is a measure of the rightward skew of $f(s)$. The expression in parentheses is the proportion of advantageous mutations that become fixed and is ∼1 when $a$ is large. This simple model ignores many factors, such as varying population size, clonal interference, and epistatic interactions.
  34. Let $N(t)$ be the viral population size through time (the viral load) (Fig. 2B, red curves) and let $I(t)$ be the strength of the immune response through time (the antibody titer) (Fig. 2B, green curves). As in the static model, $\phi$ increases with the strength of the immune response; here we suppose that $\phi$ is proportional to $I(t)$. If a host is infected at time $t = 0$, then the number of advantageous mutations that have accumulated by time $x$ is $m \int_0^x N(t)\,\phi(t)\,dt \propto \int_0^x N(t)\,I(t)\,dt$. The potential for transmission of these will depend on viral infectivity, which often correlates with viral load. Under the simplest model, the transmission probability at time $x$ is linearly proportional to $N(x)$, hence $\mathrm{EIP}(x) \propto N(x) \int_0^x N(t)\,I(t)\,dt$. Figure 2B was constructed using this equation. $\mathrm{EIP}(x)$ represents the average amount of viral adaptation transmitted to a susceptible contacted at time $x$ after initial infection. $N(t)$ and $I(t)$ in Fig. 2B were obtained with a standard susceptible-infected-recovered–type model, with a time delay between viral appearance and the onset of the immune response. This initial model is simplistic; future refinements will require more realistic relationships between viral load, infectivity, the immune response, and the distribution of selection coefficients.
  40. $d_N$ and $d_S$ are the fixation rates of nonsynonymous and synonymous mutations, respectively. The fixation rate of neutral mutations ($s = 0$) in a constant haploid population equals $m$. If (i) all synonymous mutations are neutral, (ii) a proportion $x_1$ of nonsynonymous mutations are neutral, (iii) a proportion $x_2$ of nonsynonymous mutations are advantageous, and (iv) deleterious mutations are not fixed, then $d_N/d_S = x_1 + N\phi$ (with $p_a = x_2$) in a constant haploid population. $x_1$ reflects selective constraint on the gene concerned; if constraint is strong, then $x_1$ will be small. $N\phi$ is the shape of the black adaptation rate curve in Fig. 2A.
