Research Article

Predicting the Evolution of Human Influenza A


Science  03 Dec 1999:
Vol. 286, Issue 5446, pp. 1921-1925
DOI: 10.1126/science.286.5446.1921

Abstract

Eighteen codons in the HA1 domain of the hemagglutinin genes of human influenza A subtype H3 appear to be under positive selection to change the amino acid they encode. Retrospective tests show that viral lineages undergoing the greatest number of mutations in the positively selected codons were the progenitors of future H3 lineages in 9 of 11 recent influenza seasons. Codons under positive selection were associated with antibody combining site A or B or the sialic acid receptor binding site. However, not all codons in these sites had predictive value. Monitoring new H3 isolates for additional changes in positively selected codons might help identify the most fit extant viral strains that arise during antigenic drift.

Antigenic drift due to mutations in the hemagglutinin gene necessitates frequent replacement of influenza A strains in the human vaccine. Antibodies against hemagglutinin are a primary determinant of susceptibility to infection. However, the effects on antigenicity of specific mutations in the hemagglutinin gene are not well understood because multiple amino acid changes often occur together in important antigenic variants. Changes in antigenicity depend not only on the nature and position of the amino acid replacements but also on the amino acids currently encoded at other key positions in the HA1. We have developed a method for predicting the evolution of the virus that makes use of genetic data in the absence of specific knowledge about the antigenic properties of the viruses in question.

The fixation rate in the HA1 domain of hemagglutinin (H3 subtype) is high, about 5.7 × 10⁻³ nucleotide substitutions per site per year (1). At least 18 of the 329 H3 HA1 codons have been under positive selection to change in the past (2, 3) (Table 1). Positive selection is defined as a significant excess of nucleotide substitutions that result in amino acid replacements. If this selection reflects pressure to evade the host immune response, then viruses sustaining mutations at these codons in the past should have been more fit than other coexisting viruses. If this hypothesis is correct, then screening extant strains for additional mutations in these codons might help to identify the most fit viral strains in circulation.
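The definition of positive selection above can be illustrated with a simple per-codon test. The sketch below is a generic one-sided binomial test and is not necessarily the procedure used in (2, 3); the expected fraction of replacement substitutions under neutrality, `p_replacement`, is an assumed input that in practice would depend on the codon and the substitution model.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def excess_replacement_pvalue(replacements: int, silent: int, p_replacement: float) -> float:
    """One-sided p-value for an excess of replacement substitutions at one codon.

    p_replacement is an ASSUMED neutral expectation for the fraction of
    substitutions at this codon that change the encoded amino acid.
    """
    n = replacements + silent
    return binomial_upper_tail(replacements, n, p_replacement)


# Illustrative numbers only: 10 replacements, 0 silent changes, p = 0.5.
print(excess_replacement_pvalue(replacements=10, silent=0, p_replacement=0.5))
```

A small p-value flags a codon where replacements are more frequent than the assumed neutral expectation would allow.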

Table 1

The 18 positively selected codons in the HA1 hemagglutinin gene and their membership in alternative codon sets. R, codons associated with the sialic acid receptor binding site; A or B, codons in or near antibody combining site A or B, respectively; F, codons with rapid rates of amino acid replacement.


We tested our prediction method retrospectively by determining whether influenza isolates on phylogenetic lineages that underwent the greatest number of changes in the positively selected codons in past influenza seasons were more fit than other isolates. We define fitness as follows: if one viral strain is more closely related to future lineages than another strain, it is more fit, regardless of the amount or severity of disease either strain causes. Thus, our goal of predicting the course of evolution is not equivalent to predicting the epidemic strain for the next year, and our method cannot be used to predict new pandemic strains that may emerge from avian or swine reservoirs.

Retrospective Tests

Our prediction method takes advantage of the fact that phylogenetic trees constructed from the hemagglutinin genes of human influenza type A viruses show, over time, a single successful lineage, which we call the “trunk lineage.” The trunk lineage is the only lineage from which strains in all subsequent years arise (Fig. 1). All trees in this study were constructed from sequences 987 nucleotides long (4) with the maximum parsimony routine of PAUP (5), using a heuristic search with tree bisection-reconnection branch swapping. Each retrospective test data set contains sequences from isolates collected from 1983 through the end of one of 11 consecutive influenza seasons (Table 2); a season is defined here as 1 October through 30 September. We illustrate our method with the 1993–1994 test, which uses isolates collected from 1983 through 30 September 1994. The tree representing the evolutionary history of these isolates is shown in Fig. 1A. All trees used in this study were rooted on the ancestor of A/Oita/3/83, the oldest isolate in the data set.

Figure 1

Predicting the evolution of influenza A hemagglutinin. (A) The 1993–1994 test tree shows the evolution of the HA1 domain of the hemagglutinin gene of human influenza A subtype H3 from 1983 through the 1993–1994 influenza season. The single evolutionarily successful lineage, or trunk lineage, is shown as a bold line. The question mark indicates the point near the top of the 1993–1994 tree where the trunk can no longer be discerned visually. The asterisk marks a section of the 1993–1994 test tree where two lineages cocirculated for 5 years. (B) The 1997 bootstrap tree shows the evolution of hemagglutinin from 1983 through the 1996–1997 influenza season. The trunk has been extended through the top of the tree based on subsequently obtained data. Taken together, these trees portray a successful prediction test: the predictive isolate (A/Shangdong/5/94), found by counting replacements in the positively selected codons along all lineages on the 1993–1994 test tree, is located on a lineage descending from node 12, the uppermost trunk node from which isolates in the 1993–1994 data set descend. By our definition of fitness, A/Shangdong/5/94 is relatively more fit than the other isolates because it is more closely related to the path of the trunk lineage in future influenza seasons.

Table 2

Sample sizes for retrospective test data sets, and comparison of prediction results with random expectations.


Considerable genetic heterogeneity may occur during a given epidemic season or calendar year. The most recently collected sequences typically appear in a region of relatively undifferentiated clusters in the top of the tree (question mark in Fig. 1A). Because of this lack of differentiation, and because the average life-span of a nontrunk lineage is 1.5 years (2), the course of the trunk lineage within the upper portions of a phylogenetic tree constructed with the hemagglutinin gene is identifiable only in retrospect. Usually it becomes discernible within a few years, although one recent nontrunk lineage persisted for 5 years (asterisk in Fig. 1A).

Our prediction method posits that the lineage undergoing the most amino acid replacements in the 18 positively selected codons (Table 1), counted along the path of branches joining the root of the tree to each of the 173 terminal nodes, should identify the section of the 1993–1994 tree from which the future trunk lineage will emerge. The isolate at the tip of that lineage, A/Shangdong/5/94, is called the “predictive isolate” for the 1993–1994 test (Fig. 1A).
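The counting rule can be sketched as follows, assuming the parsimony analysis has already assigned each amino acid replacement to a branch. The tree is encoded as a hypothetical child-to-parent map, the codon numbers in the toy example are placeholders rather than the codons of Table 1, and returning all tied tips is our own assumption about how ties are handled.

```python
def replacements_to_root(tip, parent, branch_changes, codon_set):
    """Count replacements at codons in codon_set on every branch from tip back to the root.

    parent maps each node to its parent (the root has no entry);
    branch_changes maps a node to the codons replaced on the branch leading to it.
    """
    total = 0
    node = tip
    while node in parent:
        total += sum(1 for codon in branch_changes.get(node, ()) if codon in codon_set)
        node = parent[node]
    return total


def predictive_isolates(tips, parent, branch_changes, codon_set):
    """Tips whose root-to-tip paths carry the most replacements in codon_set."""
    scores = {t: replacements_to_root(t, parent, branch_changes, codon_set) for t in tips}
    best = max(scores.values())
    return sorted(t for t, s in scores.items() if s == best)


# Toy tree: root -> n1 -> {tipA, tipB}; root -> tipC. Codon numbers are hypothetical.
parent = {"n1": "root", "tipA": "n1", "tipB": "n1", "tipC": "root"}
branch_changes = {"n1": [145, 50], "tipA": [156, 158], "tipB": [10], "tipC": [145, 189]}
codon_set = {145, 156, 158, 189}
print(predictive_isolates(["tipA", "tipB", "tipC"], parent, branch_changes, codon_set))  # ['tipA']
```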

To discover whether our prediction was correct, we must determine whether A/Shangdong/5/94 is located in the part of the 1993–1994 test tree from which the trunk eventually emerged. We know the eventual course of the trunk lineage from subsequently collected data used to construct the “1997 bootstrap tree” (6) (Fig. 1B). All nodes on this tree with less than 50% bootstrap support were collapsed to ensure that we evaluated the success of a prediction test (the relative distance of isolates up the trunk lineage) as conservatively as possible (7). It should not be inferred that the 14 trunk nodes divide the nontrunk lineages into classes based on antigenic characteristics, isolation date, or geographic point of sampling.
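Collapsing poorly supported nodes amounts to removing each internal node whose bootstrap support is below the threshold and attaching its children to its parent. A minimal sketch, assuming a hypothetical parent-to-children map and a dictionary of bootstrap percentages; nodes with no recorded support are treated as fully supported, which is an assumption.

```python
def collapse_weak_nodes(children, support, root, threshold=50.0):
    """Return a new child map in which internal nodes with bootstrap support
    below threshold are removed and their children attached to the parent."""
    collapsed = {}

    def resolved_children(node):
        out = []
        for child in children.get(node, []):
            is_internal = bool(children.get(child))
            if is_internal and support.get(child, 100.0) < threshold:
                out.extend(resolved_children(child))  # splice weak node's children upward
            else:
                out.append(child)
        return out

    def visit(node):
        kids = resolved_children(node)
        if kids:
            collapsed[node] = kids
        for kid in kids:
            visit(kid)

    visit(root)
    return collapsed


# Hypothetical tree: n1 has 40% support and is collapsed; n2 (95%) is kept.
children = {"root": ["a", "n1"], "n1": ["n2", "d"], "n2": ["b", "c"]}
support = {"n1": 40.0, "n2": 95.0}
print(collapse_weak_nodes(children, support, "root"))
```

Collapsing makes the tree less resolved, so isolates that were separated only by weakly supported nodes end up descending from the same trunk node, which is what makes the later scoring conservative.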

For the 1993–1994 prediction test to be successful by our criterion, A/Shangdong/5/94 must be at least as fit as any other isolate in the 1993–1994 data set. By our definition of fitness, this means that A/Shangdong/5/94 must be located as far up the trunk lineage of the 1997 bootstrap tree as any other isolate in the 1993–1994 data set. The 173 isolates from the 1993–1994 test tree are located in lineages descending from trunk nodes 1 through 12 on the 1997 bootstrap tree. The predictive isolate, A/Shangdong/5/94, descends from trunk node 12. Thus, according to our criterion, the prediction test for 1993–1994 is successful. We again note that our method predicts only that future lineages will be more closely related to A/Shangdong/5/94 than to other isolates from the 1993–1994 test tree, not that this strain has the antigenic characteristics required to cause the next epidemic.
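The success criterion reduces to comparing trunk nodes: find, for each isolate, the first trunk node on its path toward the root of the bootstrap tree, and ask whether the predictive isolate's trunk node is at least as high as every other isolate's. The node labels, the child-to-parent map, and the toy tree below are hypothetical.

```python
def trunk_node_of(isolate, parent, trunk_index):
    """Walk toward the root until reaching a trunk node; return its index (1 = lowest)."""
    node = isolate
    while node not in trunk_index:
        node = parent[node]
    return trunk_index[node]


def is_successful(predictive, test_isolates, parent, trunk_index):
    """True if the predictive isolate descends from a trunk node at least as high
    as that of every isolate in the test data set."""
    nodes = {iso: trunk_node_of(iso, parent, trunk_index) for iso in test_isolates}
    return nodes[predictive] >= max(nodes.values())


# Toy bootstrap tree with trunk T1 -> T2 -> T3; i2 hangs off T3 via nontrunk node m.
trunk_index = {"T1": 1, "T2": 2, "T3": 3}
parent = {"T2": "T1", "T3": "T2", "i1": "T1", "m": "T3", "i2": "m", "i3": "T2"}
print(is_successful("i2", ["i1", "i2", "i3"], parent, trunk_index))  # True
```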

To ensure that this result is robust with respect to uncertainty in the structure of the test tree, we replicated the 1993–1994 prediction test by using 10 different but equally parsimonious 1993–1994 test trees generated by randomizing the order in which sequences were entered into PAUP. A/Shangdong/5/94 was the predictive isolate for all 10 of the 1993–1994 replicate trees. Table 3 describes how tests were scored in years when replicate test trees produced different predictive isolates.
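Following the Table 3 caption, results from the 10 replicate trees can be summarized as a single trunk node when all replicates agree and as a mean trunk node otherwise. This is a small sketch; the function name and the rounding to one decimal place are assumptions.

```python
from statistics import mean


def summarize_replicates(trunk_nodes):
    """Summarize the trunk nodes reached by the predictive isolates of replicate trees:
    the shared node when all replicates agree, otherwise their mean."""
    if len(set(trunk_nodes)) == 1:
        return trunk_nodes[0]
    return round(mean(trunk_nodes), 1)  # ASSUMED: one decimal place


print(summarize_replicates([12] * 10))            # 12
print(summarize_replicates([12] * 5 + [11] * 5))  # 11.5
```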

Table 3

Results of retrospective prediction tests for 11 recent influenza seasons using codons under positive selection and seven alternative codon sets. Cells indicate the trunk nodes on the 1997 bootstrap tree (Fig. 1B) from which the predictive isolates resulting from each test descended. Successful tests (bold underlined numbers) are those in which the predictive isolate descended from the uppermost possible node on the 1997 tree (bottom row). Right-hand column shows the total number of seasons in which each codon set produced a successful test. Cell entries with a decimal place are mean trunk nodes (across the 10 replicate test trees) when these tests produced predictive isolates descending from different trunk nodes. Positively selected, the set of 18 codons under positive selection; AB, codons in or near antibody combining sites A and B; CDE, codons in or near antibody combining sites C, D, and E; RBS, codons associated with the sialic acid receptor binding site; Fast, codons undergoing the largest number of amino acid replacements.


We performed retrospective tests for 10 other influenza seasons in the same manner as for the 1993–1994 test. The prediction tests were successful according to our criterion in 9 of 11 influenza seasons (Table 3). The prediction method was not successful in the 1987–1988 and 1988–1989 tests, when the method chose the same predictive isolate as for the 1986–1987 test. Analysis of these failures suggests that we might improve the prediction method by developing criteria with which to decide when lineages have become extinct and rejecting potential predictive isolates that occur on those lineages.

Comparison with Random Expectations

A comparison of our results with random expectations supports the validity of our predictive results. First, we determined the probability of success by using randomly chosen predictive isolates. The 173 isolates in the 1993–1994 test data set are distributed on the lineages descending from trunk nodes 1 through 12 on the 1997 bootstrap tree (Fig. 1B). Only one of the 173 isolates, the predictive isolate A/Shangdong/5/94, is located on the lineages descending from the uppermost possible trunk node, trunk node 12. The probability of obtaining a prediction test that was at least as successful as ours for the 1993–1994 influenza season, assuming that our method actually contained no predictive information, was thus only 1 of 173, or 0.6%. In other years this probability is much higher, largely because greater proportions of the isolates from the test data set happened to descend from the uppermost trunk lineage. Across influenza seasons, the probability of obtaining results at least as successful as ours by randomly choosing predictive isolates ranged from 0.6% to 42.6%, with an average of 13.9% (Table 2).
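The chance of success with a randomly chosen predictive isolate is the fraction of test isolates that descend from the uppermost trunk node reached by the data set. In the sketch, only the counts for 1993–1994 (one of 173 isolates at trunk node 12) come from the text. Placing the other 172 isolates at node 11 is a placeholder, since their exact distribution over nodes 1 through 11 does not affect this probability.

```python
def chance_of_success(trunk_nodes_of_test_isolates):
    """Probability that a randomly chosen isolate descends from the uppermost
    trunk node reached by any isolate in the test data set."""
    highest = max(trunk_nodes_of_test_isolates)
    hits = sum(1 for n in trunk_nodes_of_test_isolates if n == highest)
    return hits / len(trunk_nodes_of_test_isolates)


# 1993-1994: 1 of 173 isolates at node 12; the others' nodes are placeholders.
p = chance_of_success([11] * 172 + [12])
print(f"{100 * p:.1f}%")  # 0.6%
```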

Next, we tested the hypothesis that change itself is adaptive. If it were, counting mutations at randomly chosen codon sets should produce a similar rate of successful tests. We repeated each of the 11 prediction tests with 1000 sets of 18 codons sampled at random from the 177 codons that encoded amino acid replacements between 1983 and 1997 but were not under positive selection. In the 1993–1994 example, counting replacements at randomly chosen codons resulted in a successful prediction test only twice (0.2% of the time).
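The random-codon comparison is a Monte Carlo test. The sketch below assumes each isolate's root-to-tip replacements have already been collected into a list of codon numbers, as in the counting rule described for the 1993–1994 test. It counts a trial as successful only if every tied predictive isolate descends from the uppermost trunk node; that tie rule is a conservative choice of our own, not the scoring used in Table 3. The codon numbers and isolates are placeholders.

```python
import random


def success_rate_random_codons(path_changes, trunk_node, pool, set_size=18, trials=1000, seed=0):
    """Fraction of random codon sets whose predictive isolate(s) all descend from
    the uppermost trunk node.

    path_changes: isolate -> codons replaced on its root-to-tip path.
    trunk_node:   isolate -> trunk node on the bootstrap tree.
    pool:         codons eligible for sampling (e.g. those not under positive selection).
    """
    rng = random.Random(seed)
    pool = sorted(pool)
    top = max(trunk_node.values())
    successes = 0
    for _ in range(trials):
        codons = set(rng.sample(pool, set_size))
        scores = {iso: sum(c in codons for c in changes) for iso, changes in path_changes.items()}
        best = max(scores.values())
        chosen = [iso for iso, s in scores.items() if s == best]
        if all(trunk_node[iso] == top for iso in chosen):
            successes += 1
    return successes / trials


# Toy case: the top isolate's changes (500, 501) lie outside the 177-codon pool.
rate = success_rate_random_codons({"top": [500, 501], "low": [1, 2, 3]},
                                  {"top": 12, "low": 5}, range(1, 178))
print(rate)  # 0.0
```

The toy case mirrors the situation in which every change leading to the uppermost isolates falls in codons excluded from the sampling pool: no random set can then favor those isolates.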

Across 11 seasons, the probability of obtaining results at least as successful as ours, had we used randomly chosen codons, averaged 10.6% (Table 2). This probability varied widely among seasons, again in part because of variation in the distribution of isolates from the test data sets on the 1997 bootstrap tree. However, some of the differences must be attributed to the fact that we were counting mutations in a different set of codons. For the 1994–1995 test, none of the random codon sets resulted in a successful prediction test even though there were four isolates from this data set in lineages descending from the uppermost possible trunk node. This is because all eight mutations on the lineages joining these isolates to the trunk lineage were in the positively selected codons, which were excluded from the pool of randomly sampled codons.

Search for a Causal Explanation

We sought a causal explanation for our results by first determining whether they reflected selection to alter the antibody combining sites in the hemagglutinin. Of the 329 amino acids in HA1 of hemagglutinin, 131 are thought to lie in or near the five overlapping antibody combining sites, labeled A through E (8–10). Changes in 41 codons in or near antibody combining sites A and B have been associated with antigenic drift (11). We repeated the prediction tests, counting changes in just those 41 codons, and obtained seven successful tests across 11 influenza seasons compared with nine successes when the positively selected codons were used (Table 3). This suggests that the selective pressure on the positively selected codons may have been to alter antibody binding.

To determine whether association with antibody combining sites A and B was sufficient to explain the successful prediction tests obtained using the positively selected codons, we repeated the tests with the 28 codons associated with antibody combining sites A and B that were not under positive selection (9). Only 1 of 11 tests was successful (Table 3). Thus, although selection for antibody binding may be the force causing change in the positively selected codons, association with antibody combining sites A and B is not itself sufficient for a codon to have predictive value using our method. We achieved a successful prediction test in only three seasons (Table 3) when we counted changes in the 90 codons collectively associated with antibody combining sites C, D, and E (9). Only 5 of the 90 codons in this set were also under positive selection, so we did not repeat the test with these codons excluded.

We next examined whether the selective pressure on the positively selected codons reflected selection to alter the sialic acid receptor binding site of the hemagglutinin. Six highly conserved residues that form the surface of the pocket (Tyr98, Trp153, Glu190, Leu194, His183, and Thr155), along with residues 134 to 138 and 224 to 228, which form the right and left sides of the pocket, respectively, are involved in receptor binding. Five of the positively selected codons and some of the codons associated with antibody combining sites are among this set (9, 12) (Table 1). Selective conservation of the receptor binding site might be expected to preserve function and receptor specificity; however, it is possible that mutations might be adaptive if antibody binding were reduced or if better binding to respiratory epithelium resulted. Prediction tests counting changes in the receptor binding site codons were successful in 5 of 11 seasons (Table 3). Prediction tests using only the 11 receptor binding site codons that were not also under positive selection (9) failed to produce a single successful test because they underwent very few changes (Table 3). Thus, change appears to be adaptive in the codons within the receptor binding site that are also under positive selection but not for the codon set as a whole.
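The receptor binding site set can be assembled directly from the residues listed above: the six conserved pocket residues plus residues 134 to 138 and 224 to 228, 16 codons in all. Since 11 of these are stated not to be under positive selection, 5 must overlap the positively selected set. The `split_by_selection` helper is hypothetical, and the positively selected codons in the usage example are placeholders, not the contents of Table 1.

```python
# Receptor binding site codons as listed in the text.
POCKET = {98: "Tyr", 153: "Trp", 155: "Thr", 183: "His", 190: "Glu", 194: "Leu"}
RBS = set(POCKET) | set(range(134, 139)) | set(range(224, 229))


def split_by_selection(codon_set, positively_selected):
    """Return (codons also under positive selection, codons not under positive selection)."""
    return codon_set & positively_selected, codon_set - positively_selected


print(len(RBS))  # 16
overlap, rest = split_by_selection(RBS, {136, 225, 999})  # placeholder codon set
print(sorted(overlap), len(rest))  # [136, 225] 14
```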

Of the 20 codons with relatively high rates of amino acid replacement compared with other HA1 codons, 12 were also in the set of positively selected codons (9) (Table 1). This association led us to test whether a rapid rate of amino acid replacement is selectively advantageous regardless of the position of the codon in the hemagglutinin gene. Tests using this alternative codon set were successful in 6 of 11 seasons (Table 3). We repeated the prediction tests with the set of 18 codons with the highest rates of amino acid replacement excluding those under positive selection (9). This resulted in a successful prediction test in only two seasons (Table 3). Thus, a rapid rate of change alone does not explain why change at the set of positively selected codons appears to be adaptive.
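Choosing a "fast" codon set comes down to ranking codons by their number of amino acid replacements and, for the control set, skipping the positively selected codons before taking the top 18. A minimal sketch; the replacement counts are hypothetical, and breaking ties by codon number is an assumption.

```python
def fastest_codons(replacement_counts, k, exclude=frozenset()):
    """Return the k codons with the most amino acid replacements, skipping excluded codons.

    Ties are broken by codon number (an ASSUMED, arbitrary rule).
    """
    ranked = sorted(
        (c for c in replacement_counts if c not in exclude),
        key=lambda c: (-replacement_counts[c], c),
    )
    return ranked[:k]


# Hypothetical replacement counts per codon.
counts = {145: 9, 156: 7, 50: 7, 10: 1, 226: 8}
print(fastest_codons(counts, 2))                       # [145, 226]
print(fastest_codons(counts, 2, exclude={145, 226}))  # [50, 156]
```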

Conclusion

We have identified a small set of rapidly evolving codons in the HA1 domain of the hemagglutinin gene of human influenza A subtype H3 in which replacement substitutions in the past appear to have been selectively advantageous. Strains with more mutations in these codons were more likely to be the progenitors of successful new lineages in 9 of 11 influenza seasons. The probability of obtaining a successful test by chance varied across years. However, our method produced successful tests in many years, such as 1993–1994, when it was highly unlikely to do so. A causal explanation for our results is suggested by the significant overlap between the positively selected codons and the codons in or near antibody combining sites A or B and, to a lesser extent, codons associated with the sialic acid receptor binding site. However, codons associated with these sites of known function that are not under positive selection perform poorly in the prediction tests. The positively selected codons are among the most quickly evolving in the HA1 domain of hemagglutinin, but this characteristic also is not a sufficient explanation for our results. It appears that we have identified a subset of rapidly evolving codons in known functional sites for which change in the past has been associated with a clear selective advantage. Whether additional changes in these codons will confer a selective advantage in the future remains to be seen. Our retrospective tests indicate that, had these data been available in the past, monitoring change in the positively selected codons might have provided potentially useful information about the course of influenza evolution. These methods can be used to examine the evolution of the hemagglutinins of influenza B and influenza A H1 viruses circulating in humans to identify sets of positively selected codons that may have predictive value for these important pathogens.

* To whom correspondence should be addressed. E-mail: rmbush{at}uci.edu

REFERENCES AND NOTES

