Structure of the Uncleaved Human H1 Hemagglutinin from the Extinct 1918 Influenza Virus


Science  19 Mar 2004:
Vol. 303, Issue 5665, pp. 1866-1870
DOI: 10.1126/science.1093373


The 1918 “Spanish” influenza pandemic represents the largest recorded outbreak of any infectious disease. The crystal structure of the uncleaved precursor of the major surface antigen of the extinct 1918 virus was determined at 3.0 angstrom resolution after reassembly of the hemagglutinin gene from viral RNA fragments preserved in 1918 formalin-fixed lung tissues. A narrow avian-like receptor-binding site, two previously unobserved histidine patches, and a less exposed surface loop at the cleavage site that activates viral membrane fusion reveal structural features primarily found in avian viruses, which may have contributed to the extraordinarily high infectivity and mortality rates observed during 1918.

Influenza is a viral infection of the respiratory tract that affects millions of people annually. Combined with subsequent infection from bacterial pneumonia, influenza remains one of the leading causes of death in the United States, killing on average more than 50,000 people per year. However, the 1918 pandemic killed over 500,000 people in the United States and more than 20 million worldwide (1), making it the largest and most destructive outbreak of any infectious disease in recorded history (2, 3). Why the 1918 virus was so devastating is still a mystery. The pandemic struck before viruses were known as the causative agent and, consequently, no intact virus survived. Nevertheless, fragments of the viral genome did survive in Alaskan victims buried in the permafrost and in fixed and archived autopsy material, which recently enabled gene reassembly (4–7).

There are three types of influenza virus (A, B, and C), and the 1918 virus is a member of type A, which accounts for all known major epidemics and pandemics. Hemagglutinin (HA) is the surface glycoprotein responsible for virus binding to the host receptor, internalization of the virus, and subsequent membrane-fusion events within the endosomal pathway in the infected cell. HA is also the most abundant antigen on the viral surface and harbors the primary neutralizing epitopes for antibodies. Fifteen avian and mammalian serotypes of HA have been identified, but only three have become adapted to humans in the last century, resulting in the emergence of pandemic strains H1 in 1918, H2 in 1957, and H3 in 1968 (see fig. S1 for sequences). Recently, three small outbreaks arose from avian subtypes (H5, H7, and H9) that managed to make a direct leap to humans, but their low transmissibility prevented major new epidemics (8–10). However, the emergence of future influenza virus pandemic strains is likely (11), and their severity will depend on the ability to contain and combat infection by timely development of an appropriate vaccine.

The mature HA forms homotrimers of ∼220 kD, with multiple glycosylation sites. Each monomer is synthesized as a single polypeptide precursor (HA0) that is subsequently cleaved into HA1 and HA2 subunits (12) by a candidate trypsin-type endoprotease, “tryptase Clara,” that has been isolated from rat bronchiolar epithelial Clara cells (13). Structural information is available only for influenza A HAs of the human H3 (14), swine H9 (15), and avian H5 subtypes (15), and for an influenza C HA esterase fusion (HEF) protein (16). Twenty-two years after the first structural characterization of the HA from the 1968 H3 human pandemic (14), we now present the HA crystal structure from a second human subtype (H1) derived from reassembly of the extinct 1918 influenza virus (4).

The ectodomain of the HA gene (fig. S1) from the 1918 influenza virus A/South Carolina/1/18 (18HA0) was cloned and expressed (fig. S2) in a baculovirus expression system (17, 18). 18HA0 crystallized at pH 5.5 (table S1) (19), and its structure was determined by molecular replacement (MR) to 3.0 Å resolution (table S1) (20). 18HA0 is ∼135 Å in length with two distinct domains (Fig. 1A). The cylindrical trimer has a tightly intertwined “stem” domain at its membrane-proximal base, which is composed of HA1 residues 11 to 51 and 276 to 329 and HA2 1 to 176 (Fig. 1A). The dominant feature of this stalk region is the three long parallel α helices (∼50 amino acids in length), one from each monomer, that associate to form a triple-stranded coiled coil. This region also contains the cleavage site where host enzymes normally cut HA0. The membrane-distal domain consists of a globular “head,” which can be further subdivided (Fig. 1B) into the R region, containing the receptor-binding site and major epitopes for neutralizing antibodies, and the E region, with close structural homology to the esterase domain of influenza C HEF (16).

Fig. 1.

Crystal structure of 1918 HA0 and comparison to other human, avian, and swine HAs. (A) Overview of the 18HA0 trimer, represented as a ribbon diagram. For clarity, each monomer has been colored differently [A (HA1), red; B (HA2), pink; C, dark gray; D, light gray; E, dark green; F, light green]. Carbohydrates observed in the electron-density maps are colored orange and labeled with the asparagine to which they are attached. E95 is not labeled because it is positioned immediately behind C95. The locations of the three receptor-binding and the cleavage sites are indicated on only one monomer. The basic patch is indicated in the light blue ellipse and consists of HA1 residues HisC298, HisC285, HisC47, LysC50, and HisC275 (shown from left to right). This figure was generated with Deepview (48) and rendered with Pov-Ray 3.5. (B) Structural comparison of the 18HA0 monomer (red) with human H3 (green), avian H5 (orange), and swine H9 (blue) HAs. Structures were first superimposed on the HA2 domain of 18HA0 through the following residues: 18HA0: A11 to A51, A276 to A324, and B1 to B160; H3 (PDB ID code: 2hmg): A11 to A51, A276 to A324, and B1 to B160; H5 (PDB ID code: 1jsm): A1 to A41, A276 to A324, and B1 to B160; and H9 (PDB ID code: 1jsd): A1 to A41, A267 to A315, and B1 to B160. Figure B was generated with VMD (47) and rendered with Tachyon (49).

The superimposition of other published HAs onto the 18HA0 monomer (Fig. 1B) by means of their HA2 domains [root mean square deviations (RMSDs) in table S2], indicates that 18HA0 is most closely related to the avian H5 subtype (RMSD 2.3 Å), whereas the human H3 subtype is the most divergent (RMSD 4.1 Å). The HA1 receptor (R) region of the human H3 subtype is displaced most (RMSD ∼7.4 Å) from its equivalent 18HA0 domain (Fig. 1B). This conformational variability in the HA globular heads stems primarily from a rigid-body rotation of the HA1 receptor domains relative to the HA2 stem domains about the central threefold axis, as described previously for H3, H5, and H9 (21). 18HA0 is rotated to the same extent (∼17°) as avian H5 relative to H3, whereas swine H9 is intermediate (∼11°) (fig. S3, A and B) (22). The individual HA1 subdomains (R and E regions in Fig. 1B) superimpose with low RMSDs (1.1 to 2.6 Å) (table S3).
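The reasoning in the preceding paragraph, that a rigid-body rotation of the head about the threefold axis can produce a large RMSD between whole structures even though each subdomain superimposes well on its own, can be illustrated with a minimal Python sketch. The coordinates below are hypothetical points, not atoms from any deposited model, and the function names `rmsd` and `rotate_about_z` are illustrative assumptions rather than part of the analysis reported here. The sketch also assumes the two coordinate sets are already paired atom for atom and expressed in a common frame with the threefold axis along z.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two paired lists of (x, y, z) points.

    Assumes both sets are already matched atom for atom and share one frame.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

def rotate_about_z(coords, angle_deg):
    """Rigid-body rotation of all points about the z axis (the assumed threefold axis)."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]

# Hypothetical "head" points placed 20 to 30 Å from the axis (illustrative only).
head = [(20.0, 0.0, 100.0), (0.0, 25.0, 110.0), (-30.0, 0.0, 120.0), (0.0, -25.0, 105.0)]

# Rotate the head by 17 degrees while leaving its internal geometry untouched.
rotated_head = rotate_about_z(head, 17.0)
head_rmsd = rmsd(head, rotated_head)
print(f"RMSD after a 17 degree rigid-body rotation: {head_rmsd:.2f} Å")
```

Each point moves by 2r sin(θ/2), where r is its distance from the axis, so the displacement grows with distance from the threefold axis while distances within the rotated domain are unchanged. This is why a per-domain superposition can give a low RMSD at the same time as a whole-molecule comparison gives a high one.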

The 18HA0 structure reveals a substantially different conformation for the cleavage site loop compared with the uncleaved H3 and cleaved H5 subtypes (Fig. 2, A to D). The H3 cleavage loop projects out from the glycoprotein surface, exposing it to potential proteases (Fig. 2A) (23), whereas the 18HA0 cleavage site abuts the HA surface (Fig. 2B) (24). From ProA324, the 18HA0 cleavage loop extends toward the trimer interface so that ArgA329 now covers the electronegative cavity that is normally occupied by the HA2 fusion peptide after cleavage activation (Fig. 2, E and F). The ArgA329 side chain points toward solvent as a result of repulsion from LysF39 and LysF121 of the adjacent subunit. Thus, ArgA329 is substantially less exposed than the equivalent Gln329 (∼15 Å farther from the HA surface) that was mutated from Arg to determine the crystal structure of an uncleaved H3 subtype (Fig. 2, A and D) (23). In cleaved HA structures at neutral pH, the N-terminal HA2 fusion peptide inserts into a negatively charged cavity and makes up to five hydrogen bonds from its backbone amide groups to conserved HA2 ionizable residues (AspB109 and AspB112). The 18HA0 cleavage loop does not penetrate as far into this cavity (Fig. 2D, left) and makes only one hydrogen bond, SerA325 to AspB112, between the loop and the conserved acidic residues.

Fig. 2.

Structural comparison of the 18HA0 cleavage site with other HAs. HA2 domains for human H3 HA0 (PDB ID code: 1ha0) and cleaved avian H5 HA1/HA2 (PDB ID code: 1jsm) (50) were aligned with 18HA0. The cleavage sites are colored (A) green for human H3 HA0, (B) red for 18HA0 and (C) orange for H5 HA1/HA2. RA329Q, ArgA329→GlnA329. (D) Overlay of cleavage loops of H3 HA0, H1 HA0, and avian H5 HA1/HA2. The two views differ by a rotation of 90° about the threefold vertical axis. (E) Surface views showing the trimer interface and the position of the cleavage loop. (F) Removal of the cleavage loop reveals the electronegative cavity that it masks. Arg329 is colored blue and N-acetyl-glucosamines, indicating the nearby glycosylation sites, are colored gold. (A) to (D) were generated as in Fig. 1, and (E) and (F) were generated with MSMS (46) through the program VMD (47).

These different cleavage loop conformations in the H1 and H3 structures may be influenced in part by nearby glycosylation sites. In 18HA0, AsnA20 and AsnA34 are positioned above and to the side of the cleavage loop (upper right, Fig. 2B), creating a cavity in which a section of the HA0 loop (IleB10 to TrpB21) can be accommodated. Equivalent glycosylation sites (AsnA22 and AsnA38) in uncleaved H3 are farther from the cleavage site loop (Fig. 2A and fig. S1) and, thus, may exert less influence on its conformation. Our attempts to cleave the 18HA0 trimer with tryptase [molecular weight (MW) 135 kD] from human lung failed, even after 20 hours of incubation at 28°C, yet cleavage with trypsin (MW 43 kD) was complete after 20 min [at neutral pH (25)]. Newly synthesized viral proteins are exported to the cell surface by way of the Golgi complex, where the pH becomes more acidic during progression through the secretory pathway (26–28). During viral assembly, the 18HA0 cleavage loop could adopt this less exposed conformation to protect from premature cleavage (and membrane-fusion activation) by intracellular proteases.

From the cleavage site, the HA0 main chain traverses the surface below the glycosylation site at AsnA20, where it then forms another previously unobserved miniloop structure with MetB17 at its tip. In H3 and H5, the HA2 TrpB21 indole points up toward the distal end of the HA, allowing the main chain to loop around toward the trimer interface. However, in 18HA0, this miniloop alters the TrpB21 indole direction so that it now faces the HA membrane proximal end (Fig. 2D, right). Interestingly, the H5 cleaved structure (21) is more similar to the cleaved H3 subtype (29) (RMSD 0.7 Å), even though its nearby glycosylation sites map onto the H1 subtype reported here, whereas in the vicinity of TrpB21 (HA2), the 1918 H1 and avian H5 residues are virtually identical (Fig. 3, A and B). Thus, local sequence differences do not easily explain the different cleavage loop conformations. However, a major difference is the HA0 crystallization conditions (pH 5.5 for H1 and pH 7.5 for H5). At low pH, HA0 is reported to be stable (30), and the 18HA0 structure confirms this assertion. Once cleaved, HA is metastable at low pH and undergoes irreversible conformational changes, a prerequisite for membrane fusion and infection (31). What exactly triggers this change is still not clear because only the pre- and postacidification conformations have been determined (14, 32).

Fig. 3.

Structural comparisons of the environment around HA2 Trp21 in 18HA0 and H5 HA1/HA2. The avian H5 structure (PDB ID code: 1jsm) was aligned with the 18HA0 model for comparison, as in Fig. 2. In the avian structure (A), HisA18 and HisA38 are ∼3.7 Å apart, whereas in 18HA0 (B), the same residues are ∼13.5 Å apart. The TrpB21 “flip” in 18HA0 is stabilized by close proximity to the side chains of TrpB14 and AlaB36. This figure was generated in the same way as Fig. 1A.

However, the differences between H1 and H5 structures around HA2 TrpB21 and the cleavage site may hint at a possible mechanism for fusion that until now was not apparent from other structures crystallized at pH 7.5. In H1 and H5 structures, HA2 TrpB21 is surrounded by three pH-sensitive histidines (HisA18, HisA38, and HisB111) that form a largely uncharged pocket at neutral pH (Fig. 3A). These histidines are conserved in human H1, H2, and H5 sequences, but only at two positions in 1999 H9 sequences (HisA38 and HisB111), and at one position (HisA18) in human H3. Below pH 6.0, this pocket becomes positively charged, and may account for the conformation differences observed between 18HA0 and H5 (Fig. 3B). In 18HA0, HisA38 points toward the tip of the novel loop (IleB10 to TrpB21) and the HisA18 backbone hydrogen bonds to the main chain at TrpB21. Such differences may reveal a mechanism (yet to be tested experimentally) for destabilization or even expulsion of the cleaved fusion peptide from the electronegative cavity in H1 HAs. In this uncleaved structure, the glycoprotein may not undergo its full rearrangement because of the physical connection of the HA2 fusion peptide to HA1.
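The statement that the histidine pocket is largely uncharged at neutral pH but becomes positively charged below pH 6.0 follows from the Henderson-Hasselbalch relation. The short Python sketch below estimates the protonated fraction of a histidine side chain at the two crystallization conditions discussed here (pH 5.5 and pH 7.5). It assumes a side-chain pKa of 6.0, a textbook approximation for a free imidazole group. The actual pKa of each histidine in HA depends on its local environment and is not reported in this work, and the function name `fraction_protonated` is illustrative.

```python
def fraction_protonated(ph, pka=6.0):
    """Henderson-Hasselbalch estimate of the protonated (charged) fraction.

    pka=6.0 is an assumed generic value for a histidine side chain,
    not a measured value for any residue in HA.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))

# Crystallization conditions mentioned in the text: pH 5.5 (H1) and pH 7.5 (H5).
for ph in (5.5, 7.5):
    print(f"pH {ph}: protonated fraction = {fraction_protonated(ph):.2f}")
```

Under this assumed pKa, roughly three-quarters of each histidine population would carry a positive charge at pH 5.5, compared with only a few percent at pH 7.5. That is consistent with the qualitative argument above, but it is an estimate and not a substitute for measured pKa values.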

The primary event in influenza infection is the binding of the virus to the host receptor. The HA receptor-binding site is situated in a shallow pocket in the membrane-distal HA1 domain. The nature of the receptor sialic acid linkage to the vicinal galactose is the primary determinant in lung epithelial cells that differentiates avian viruses from mammals (species barrier). Avian viruses preferentially bind to receptors with an α2,3 linkage, whereas human-adapted viruses are specific for the α2,6 linkage (33–35). In particular, residues 226 and 228 have been linked to receptor specificity (36, 37) and are Gln226 and Gly228 in avian viruses but Leu226 and Ser228 in human-adapted H3 viruses (Fig. 4). On the contrary, in 18HA and other human H1 viruses, avian-type residues predominate. Despite these binding-sequence correlations, human H1s, such as A/PR/8/34 and A/FM/1/47, can bind sialic acid receptors with both α2,3 and α2,6 linkages, albeit with reduced affinity for the latter (38, 39). The only difference between swine- and swine-avian–adapted H1 viruses is a Glu190→Asp190 mutation (4), that, although subtle, leads to a slight increase in the pocket size (upper left side in Fig. 4) that could perhaps increase affinity for α2,6 linkages.

Fig. 4.

Structural comparison of HA receptor-binding sites. (left) 18HA0 receptor-binding site showing key conserved HA1 residues that determine receptor specificity. The H3 and H5 subtypes are shown for comparison. (right) Corresponding solvent-excluded surfaces [probe 1.4 Å, calculated with the program MSMS (46)] of the receptor-binding sites showing the surface cavity for binding the host receptor sialic acid (51). Clearly, the narrower 1918 HA0 binding site (top) resembles the avian H5 structure (bottom), rather than the more open human H3 (middle) or swine H9 binding site (25). The Glu190→Asp190 (E190D) mutation slightly increases the width of the 1918 H1 binding site compared with avian H5. Such a small change may allow accommodation of different conformations of α2,6- versus α2,3-linked sugars. This figure was generated with VMD (47) and rendered with Tachyon (49).

Comparison of the 18HA0 structure with other subtypes reveals that the receptor-binding site is more akin to avian than to human HAs (Fig. 4). The 18HA0 pocket is narrower than those of human H3 and swine H9 HAs, consistent with previous reports that a reduced width of the avian receptor-binding site enhances interaction of Gln226 and Ala138 with α2,3-linked disaccharides (15, 40). The question of how 18HA so efficiently infected humans remains open and will require crystal structures with bound ligands. Other, as yet unidentified, 18HA properties may also facilitate infection. A noteworthy feature is a second patch of exposed ionizable histidines on the HA1 chain adjacent to the vestigial esterase domain (E region in Fig. 1B). Four HA1 histidines (HisA47, HisA275, HisA285, and HisA298) and a lysine (LysA50) (Fig. 1A and fig. S4) contribute to a very basic patch, not observed in other HA structures (fig. S5) (41, 42). For example, H3 subtypes have a glycosylation site at position 285 that masks this region. Other viruses, such as the vesicular stomatitis virus, are reported to depend on histidine protonation for membrane fusion (43). Thus, the pH-sensitive electrostatic properties of this region in 18HA may also assist in the membrane-fusion event, giving the virus a selective advantage during infection. Clearly, experimental testing of such a proposal is required.

Four antigenic sites for H1 HAs, including 18HA, have been identified (Ca, Cb, Sa, and Sb) (4, 44). In 18HA0, with the exception of Ca, all are exposed for antibody recognition (fig. S6). The Ca site is proximal to the oligosaccharide at HA1 Asn95, which may interfere with antibody recognition of both subregions, Ca1 and Ca2. In other reported HA structures, only H9 has a glycosylation site at this position (21). Otherwise, the closest relative (94% identity) to the 1918 HA is the 1930 swine virus (A/swine/Iowa/15/30) that is believed to have evolved from the 1918 virus (45). Because the life-span of swine is short, immunological drift is much slower, and as a result, any differences between these 1918 and 1930 viruses are minimized. Unfortunately, because no virus samples exist for comparison from before 1918, it is difficult to reconstruct the data necessary to fully explain the pathogenicity of this virus. However, statistics reveal that people over 65 years old in 1918 were no more at risk than for a normal pandemic (2), which suggests that people born before ∼1855 may have acquired some resistance to a related H1 virus or other cross-reactive subtype (44).

The publication of the first sequences (4) of the 1918 HA did not reveal any characteristics that were obviously responsible for the extreme pathogenicity of the 1918 pandemic, such as the polybasic residues that make avian viruses so lethal. Notwithstanding, recent data suggest that, when expressed on a mouse-adapted WSN viral backbone (A/WSN/33, H1N1 virus), 18HA is more virulent than a control H1 (A/New Caledonia/20/99) (17). The structural analysis here reveals a viral antigen with a number of previously unobserved features that may have contributed to altered cleavage properties and/or fusion propensity. Such characteristics may have endowed the virus with unusual mechanisms, which have not been seen in subsequent infections, that enhanced host-cell infection, particularly in those individuals with no previous exposure to an antigenically similar virus, which could have provided some antibody protection. Finally, previously unobserved aspects of the expression system used here provide important methodological advances for future production of unprocessed HAs and other homotrimeric viral coat proteins, such as human immunodeficiency virus–1 gp41, for which additional structural information is urgently needed.

Supporting Online Material

Materials and Methods

Figs. S1 to S6

Tables S1 to S3

References and Notes
