The Epidemic Behavior of the Hepatitis C Virus


Science  22 Jun 2001:
Vol. 292, Issue 5525, pp. 2323-2325
DOI: 10.1126/science.1058321


Hepatitis C virus (HCV) is a leading worldwide cause of liver disease. Here, we use a new model of HCV spread to investigate the epidemic behavior of the virus and to estimate its basic reproductive number from gene sequence data. We find significant differences in epidemic behavior among HCV subtypes and suggest that these differences are largely the result of subtype-specific transmission patterns. Our model builds a bridge between the disciplines of population genetics and mathematical epidemiology by using pathogen gene sequences to infer the population dynamic history of an infectious disease.

An estimated 170 million people worldwide are at risk of liver cirrhosis and liver cancer due to chronic infection with HCV (1). The virus is responsible for 10,000 deaths per year in the United States, and this rate is expected to increase substantially in the next two decades (2). HCV is a rapidly evolving single-stranded positive-sense RNA virus that exhibits enormous genetic diversity. It is classified into six types (labeled 1 through 6) and numerous subtypes (labeled 1a, 1b, etc.), which differ in diversity, geographical distribution, and transmission route (3). Subtypes appear to differ in treatment response, although their role in variation of disease progression is unclear (2, 4). Any successful HCV vaccination or control strategy, therefore, requires an understanding of the nature and variability of epidemic behavior among subtypes.

HCV was first isolated in 1989, and knowledge of its long-term epidemiology before that date is limited. Highly divergent strains have been found in restricted geographic areas such as West Africa and Southeast Asia, suggesting a long period of infection in these regions. In contrast, several globally prevalent subtypes are much less divergent, indicating a recent worldwide spread of these strains (5–7).

We investigate HCV epidemiology using coalescent theory, a population genetic model that describes how the demographic history of a population determines the ancestral relationships of individuals sampled from it (8, 9). Phylogenies reconstructed from contemporary HCV gene sequences contain information about past population dynamics and can, therefore, be used to infer viral epidemic behavior (10). We also demonstrate one way in which the fundamental epidemiological quantity R0 (the basic reproductive number of a pathogen) can be estimated from gene sequences. R0 represents the average number of secondary infections generated by one primary case in a susceptible population and can be used to estimate the level of immunization or behavioral change required to control an epidemic (11).

The framework of coalescent theory allows us to estimate N(t), a continuous function that represents the effective number of infections at time t. Time t is zero at the present and increases into the past; hence N(0) is the effective number of infections at the present. N(t) can be considered as the inbreeding effective population size of the viral epidemic (12). Previous viral coalescent studies have used simple models for N(t), specifically, constant population size and exponential growth (13, 14). A more appropriate approach, which we use here, is to develop a basic epidemiological model from which a suitable form for N(t) is obtained. Because there is little protection against HCV reinfection (15) and vertical transmission is rare, its epidemic spread can be represented by

$$\frac{dy}{dt} = B\,y\,(1 - y) - \frac{y}{D} \qquad (1)$$

where y is the proportion of the at-risk population that is infected and D is the average duration of infectiousness. B is a combination of parameters relating the force of infection (the per capita rate of acquisition of infection) to the prevalence of infection. In this model, R0 = BD, and setting dy/dt = 0 gives an equilibrium prevalence of 1 − (1/R0). A time-reversed version of Eq. 1 was solved for y and then transformed into effective population size using the relation N(t) = N(0)[y(t)/y(0)]. The resulting demographic model is

$$N(t) = N(0)\,\frac{1 + c}{1 + c\,e^{rt}} \qquad (2)$$

where r = B − 1/D is the growth rate achieved in a wholly susceptible population, and c is a logistic shape parameter that absorbs k, the constant of integration. Note that B, D, and k cannot be separated.
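The behavior of Eq. 1 and of its time-reversed solution can be checked numerically. The Python sketch below is illustrative only: the values B = 0.2 and D = 20 years are hypothetical, the Euler integrator is our choice, and fixing c through c = (1 − 1/R0)/y(0) − 1 (with y(0) the present prevalence) is one way of expressing the integration constant for this sketch, not necessarily the parameterization used in the original analysis. It checks that prevalence settles at 1 − 1/R0 and that Eq. 2 reproduces N(t)/N(0) = y(t)/y(0).

```python
import math


def simulate_prevalence(B, D, y0, years, dt=0.001):
    """Integrate Eq. 1, dy/dt = B*y*(1 - y) - y/D, forward in time with Euler steps."""
    y = y0
    for _ in range(int(round(years / dt))):
        y += dt * (B * y * (1.0 - y) - y / D)
    return y


def logistic_N(t, N0, r, c):
    """Eq. 2: effective number of infections, with t measured backwards from the present."""
    return N0 * (1.0 + c) / (1.0 + c * math.exp(r * t))


if __name__ == "__main__":
    B, D = 0.2, 20.0        # hypothetical parameters, so R0 = B*D = 4
    R0 = B * D
    r = B - 1.0 / D         # growth rate in a wholly susceptible population
    K = 1.0 - 1.0 / R0      # equilibrium prevalence

    # Long-run prevalence approaches 1 - 1/R0.
    print(simulate_prevalence(B, D, 0.001, 500, dt=0.01), K)

    # Run the epidemic for 30 years, then check that Eq. 2 (with c chosen so
    # that the curve passes through today's prevalence) recovers the ratio
    # N(30)/N(0) = y(30 years ago) / y(today).
    y_past, T = 0.01, 30.0
    y_now = simulate_prevalence(B, D, y_past, T)
    c = K / y_now - 1.0
    print(logistic_N(T, 1.0, r, c), y_past / y_now)
```

The two printed pairs should agree closely; the small residual difference in the second pair comes from the Euler step size.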

Given a molecular phylogeny reconstructed from contemporary viral gene sequences (16), it is possible to estimate N(0), r, and c within a maximum likelihood (ML) framework (17). Because reconstructed phylogenies represent time in units of nucleotide substitutions per site, some parameters are estimated as functions of the substitution rate (18). These parameters can be transformed back into their natural units using the substitution rate of the viral gene concerned. We estimated HCV substitution rates by reanalyzing gene sequences sampled in 1995 from individuals who had been infected 17 years earlier by a single contaminated batch of antibody to rhesus D (19–21).
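A minimal sketch of the likelihood that an ML fit of N(0), r, and c would maximize, assuming the standard coalescent density for a genealogy whose tips are all sampled at the present (consistent with the constant-rate trees used here). The function names are hypothetical, and the optimizer used in the original analysis (17) is not specified in the text. Under Eq. 2 the integral of 1/N(t) has a closed form, so no numerical quadrature is needed.

```python
import math


def logistic_N(t, N0, r, c):
    """Eq. 2: N(t) = N0 (1 + c) / (1 + c e^{rt}), t measured backwards from the present."""
    return N0 * (1.0 + c) / (1.0 + c * math.exp(r * t))


def inv_N_integral(a, b, N0, r, c):
    """Closed-form integral of 1/N(t) over [a, b] for the logistic model (requires r != 0)."""
    return ((b - a) + (c / r) * (math.exp(r * b) - math.exp(r * a))) / (N0 * (1.0 + c))


def coalescent_loglik(node_heights, N0, r, c):
    """Log-likelihood of an ultrametric genealogy under the coalescent with size N(t).

    node_heights holds the n - 1 coalescence times (all tips sit at t = 0).
    While k lineages remain, coalescences occur at rate C(k, 2) / N(t).
    """
    times = sorted(node_heights)
    n = len(times) + 1
    ll, prev = 0.0, 0.0
    for i, t in enumerate(times):
        k = n - i                      # lineages present during (prev, t]
        pairs = k * (k - 1) / 2.0
        # Density of a coalescence exactly at t ...
        ll += math.log(pairs / logistic_N(t, N0, r, c))
        # ... times the probability of none earlier in the interval.
        ll -= pairs * inv_N_integral(prev, t, N0, r, c)
        prev = t
    return ll
```

In practice the negative of `coalescent_loglik` would be handed to a numerical optimizer over (N0, r, c), with node heights in substitutions per site. Setting c = 0 collapses Eq. 2 to a constant population size, which is a convenient null model.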

We applied these methods to four HCV strains. HCV types 6 and 4 are genetically diverse but geographically constrained: type 6 is restricted to Southeast Asia, and type 4 is found predominantly in Africa and the Middle East. In contrast, subtypes 1a and 1b are less divergent but are distributed globally (4). For each strain, E1 and NS5 gene sequences were collated from GenBank and aligned by hand (Table 1) (22). Phylogenies were estimated from the alignments using an ML approach under the assumption of a constant rate of nucleotide substitution (23). In each case, the hypothesis of rate constancy was tested (24).
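The text does not name the test of rate constancy (24). One standard option, assumed here, is a likelihood-ratio test comparing the ML tree with and without a clock constraint: the unconstrained unrooted tree has 2n − 3 branch lengths and the clock tree has n − 1 node heights, so twice the log-likelihood difference is compared with a chi-square distribution on n − 2 degrees of freedom. The sketch below uses only the standard library; the log-likelihoods would come from whatever phylogenetics program fitted the two trees, and the function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variate with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma function,
    which is adequate for the moderate statistics met in clock tests.
    """
    if x <= 0.0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    term = 1.0 / s
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (s + n)
        total += term
        n += 1
    lower = math.exp(s * math.log(z) - z - math.lgamma(s)) * total
    return max(0.0, 1.0 - lower)


def clock_lrt(lnL_clock, lnL_free, n_taxa, alpha=0.05):
    """Likelihood-ratio test of a molecular clock on a tree with n_taxa tips.

    The unconstrained (unrooted) tree has 2n - 3 branch lengths and the
    clock-constrained tree has n - 1 node heights, so df = n - 2.
    Returns (statistic, p_value, clock_rejected).
    """
    stat = 2.0 * (lnL_free - lnL_clock)
    p = chi2_sf(stat, n_taxa - 2)
    return stat, p, p < alpha
```

For example, with hypothetical log-likelihoods, `clock_lrt(-1000.0, -990.0, 12)` gives a statistic of 20 on 10 degrees of freedom (p ≈ 0.029), so the clock would be rejected at the 5% level.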

Table 1

Maximum likelihood parameter estimates for each HCV type or subtype. Seq., number of sequences.


Table 1 reports the ML estimates of N(0), r, and c, with approximate confidence intervals (CIs), obtained from each HCV data set. CIs for estimates of r are considerably narrower than those for N(0) and c (25). Figure 1 represents these estimates graphically and compares them with a nonparametric estimate of N(t) (26). For each strain, the E1 and NS5 results are similar, and our model appears to fit the demographic signal in the data well.
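One widely used nonparametric estimator of N(t) is the classic skyline plot, which turns each inter-coalescence interval into a step. Under a constant population size, the expected length of an interval with k lineages is N/C(k, 2), so multiplying the observed length by C(k, 2) gives a simple estimate of N for that interval. Whether reference 26 uses exactly this variant is an assumption; the sketch only illustrates how a stepwise curve can be built from node heights, and the function name is hypothetical.

```python
def classic_skyline(node_heights):
    """Classic skyline plot: a stepwise, nonparametric estimate of N(t).

    For the interval between consecutive coalescences with k lineages and
    length g, the estimate is N = g * k * (k - 1) / 2. Returns a list of
    (start, end, N_hat) tuples, with time measured backwards from the present.
    """
    times = sorted(node_heights)
    n = len(times) + 1
    steps, prev = [], 0.0
    for i, t in enumerate(times):
        k = n - i  # lineages present during (prev, t]
        steps.append((prev, t, (t - prev) * k * (k - 1) / 2.0))
        prev = t
    return steps
```

Because each step rests on a single interval, the estimate is noisy; that is one reason a parametric curve such as Eq. 2 is useful alongside it.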

Figure 1

Maximum likelihood estimates of N(t), the effective number of infections through time, for each HCV data set (black curves) (17). The gray, stepwise plots represent corresponding nonparametric estimates of N(t) (26). Genetic distances were transformed into a time scale of years using estimates of E1 and NS5 nucleotide substitution rates (20). These plots are point estimates of N(t) and, thus, contain no information about uncertainty in N(0), r, and c (Table 1) or in the substitution rate μ (20).
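Because the trees are scaled in substitutions per site, the fitted parameters must be rescaled by the substitution rate μ before they can be plotted on a calendar time axis. A sketch of that rescaling, assuming (as is usual for coalescent fits on a genetic-distance scale) that the fitted size parameter is the product of N(0) and μ, with any generation-time factor absorbed into N(0). The rate used below, μ = 10⁻³ substitutions/site/year, is hypothetical and is not one of the E1 or NS5 estimates.

```python
def to_natural_units(N0_subs, r_subs, c, mu):
    """Rescale parameters fitted on a substitutions/site time axis into years.

    mu is the substitution rate in substitutions/site/year (hypothetical here).
    Time in years equals genetic distance / mu, so a rate per unit distance
    becomes a rate per year by multiplying by mu, and the size parameter
    (assumed to be N(0) * mu) is divided by mu. c is dimensionless.
    """
    return N0_subs / mu, r_subs * mu, c


def distance_to_years(distance, mu):
    """Convert a node height in substitutions/site into years before sampling."""
    return distance / mu
```

With μ = 10⁻³, a node 0.1 substitutions/site from the tips dates to 100 years before sampling, and a growth rate of 100 per unit distance becomes 0.1 per year.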

There are significant differences in epidemic history among the HCV strains. Subtypes 1a and 1b seem to have originated about 100 years ago, whereas types 4 and 6 appear to be much older, having arisen about 350 and 700 years ago, respectively (Fig. 1). The growth rates of subtypes 1a and 1b during the last 100 years are considerably greater than those of types 4 and 6, confirming a recent and rapid spread of subtypes 1a and 1b, in contrast to a long period of localized endemic infection for types 4 and 6 (5, 6).

Types 4 and 6 appear to have reached equilibrium prevalence some time in the past, whereas the growth rate of subtype 1b decreased only very recently and subtype 1a is still growing exponentially at the present (Fig. 1). These observations reflect the different modes of transmission that characterize the four strains. The recent and swift global dissemination of subtypes 1a and 1b is largely the result of their effective transmission through modern contact networks, specifically, injecting drug use (IDU) and infected blood products. Subtype 1b transmission is more commonly associated with blood transfusion and hemodialysis, suggesting that improved blood screening caused its recent decrease in growth rate (2), whereas subtype 1a is most strongly linked to IDU (27–32). These results corroborate epidemiological surveys, which indicate a decrease in the prevalence of subtype 1b relative to subtype 1a through time (27–32). In contrast, type 4 and type 6 HCV infections in less developed regions are often "community acquired" by a variety of undefined social and domestic routes (2, 33, 34), explaining the earlier spread and lower growth rates observed for these strains.

Estimates of R0 can be obtained straightforwardly using the relation R0 = rD + 1, which follows from R0 = BD and r = B − 1/D (Eq. 2). Figure 2 displays estimates of R0 for each subtype under a range of plausible values of D (2, 35). For each D, the R0 values of subtypes 1a and 1b are significantly higher than those of types 4 and 6. Integrating uniformly across this range and averaging the E1 and NS5 results, we obtain the following point estimates of R0: 2.93 for subtype 1a, 2.67 for subtype 1b, 1.68 for type 4, and 1.21 for type 6. Because there is little reason to believe that D varies substantially among strains, the observed differences in R0 probably result from differences in the transmission parameters that collectively define B. These differences most likely arise from the association of subtypes with specific transmission routes. This conclusion is strengthened by two observations: (i) types 4 and 6 can spread quickly if they enter efficient contact networks (36–39), and (ii) in the absence of such networks, HCV type 1 in West Africa shows evidence of long-term endemic infection (7). However, the possibility of viral genetic variability in infectiousness among subtypes should not yet be discounted entirely (40).
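Because R0 = rD + 1 is linear in D, integrating uniformly over a range of D is equivalent to evaluating R0 at the midpoint of that range. A sketch of the calculation, assuming the range runs from 10 to 30 years (the values shown in Figure 2) and using hypothetical growth rates per year rather than the Table 1 estimates; the function names are ours.

```python
def r0(r, D):
    """R0 = rD + 1 (r per year, D in years), from r = B - 1/D and R0 = BD."""
    return r * D + 1.0


def r0_uniform_D(r, d_min, d_max):
    """Average of R0 = rD + 1 with D uniform on [d_min, d_max].

    Because R0 is linear in D, the average equals R0 at the midpoint of the range.
    """
    return r * (d_min + d_max) / 2.0 + 1.0


def r0_point_estimate(r_by_gene, d_min, d_max):
    """Average the D-integrated R0 over genes (e.g. separate E1 and NS5 estimates of r)."""
    vals = [r0_uniform_D(r, d_min, d_max) for r in r_by_gene]
    return sum(vals) / len(vals)


if __name__ == "__main__":
    # Hypothetical E1 and NS5 growth rates (per year) for one strain.
    print(r0_point_estimate([0.09, 0.11], 10.0, 30.0))
```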

Figure 2

Estimates of R0 for each HCV data set, when the average duration of infectiousness, D, is 10, 20, and 30 years.

Extrapolating our estimates into the near future suggests that, in terms of new infections, subtype 1a poses the greatest threat to public health. Subtype 3a, which was not included in our analysis, is also strongly linked to IDU and may pose a similar risk (27–32). Furthermore, because equilibrium prevalence is 1 − (1/R0), our estimate of R0 for subtype 1a predicts an eventual equilibrium prevalence of ∼65%. This value is not unrealistically high because it represents prevalence within the subtype 1a risk group rather than within the general population. HCV prevalence among injecting drug users is already at this level (70 to 80%) (2), and most new HCV infections occur soon after initiation of IDU (41), suggesting that the sustained exponential increase in subtype 1a infections (Fig. 1) is at least partly driven by non-IDU transmission and continual recruitment into the IDU risk group.
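The equilibrium prevalence quoted here follows directly from Eq. 1 as 1 − 1/R0. A short sketch applying it to the four point estimates of R0 reported in the text; the dictionary and function names are ours.

```python
# Point estimates of R0 reported in the text (E1 and NS5 averaged).
R0_POINT_ESTIMATES = {"1a": 2.93, "1b": 2.67, "4": 1.68, "6": 1.21}


def equilibrium_prevalence(R0):
    """Equilibrium prevalence within the at-risk group from Eq. 1: y* = 1 - 1/R0."""
    if R0 <= 1.0:
        return 0.0  # the only equilibrium is y = 0 when R0 <= 1
    return 1.0 - 1.0 / R0


for strain, R0 in R0_POINT_ESTIMATES.items():
    print(f"{strain}: R0 = {R0:.2f}, equilibrium prevalence = {equilibrium_prevalence(R0):.1%}")
```

For subtype 1a (R0 = 2.93) this gives about 0.66, in line with the ∼65% quoted above.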

Our analysis has been aided by three factors: the abundance and variability of HCV sequence data, the absence of observable recombination in HCV (42), and the existence of independent substitution rate estimates. However, we recognize that selection, uncertainty in phylogeny estimation, and variable substitution rates are probably present in our data and may confound the interpretation of our results. Yet the consistency of our results among structural and nonstructural genes that are under different selective constraints (43, 44) and their concordance with current epidemiological data (27–32) suggest that our conclusions are at least qualitatively robust. Importantly, no confounding factor appears to vary in a subtype-specific manner so as to produce the results we observe.

The methods introduced here demonstrate that viral gene sequences constitute a potentially significant source of information about epidemiological processes. These methods are especially suitable for rapidly evolving viruses that do not induce lifelong immunity, because the R0 values of such infections cannot be estimated from the average age at first infection (45). We hope that other HCV subtypes will be investigated similarly as more sequence data become available. However, analysis of other viruses may require more complex epidemiological models than the one used here, and coalescent-based approaches may be less effective when applied to pathogens, such as influenza, that exhibit strong cyclical population dynamics, because genetic information is lost during population bottlenecks.

* To whom correspondence should be addressed. E-mail: oliver.pybus{at}

