Structure of the HIV-1 Nucleocapsid Protein Bound to the SL3 Ψ-RNA Recognition Element

Science  16 Jan 1998:
Vol. 279, Issue 5349, pp. 384-388
DOI: 10.1126/science.279.5349.384


The three-dimensional structure of the human immunodeficiency virus–type 1 (HIV-1) nucleocapsid protein (NC) bound to the SL3 stem-loop recognition element of the genomic Ψ RNA packaging signal has been determined by heteronuclear magnetic resonance spectroscopy. Tight binding (dissociation constant, ∼100 nM) is mediated by specific interactions between the amino- and carboxyl-terminal CCHC-type zinc knuckles of the NC protein and the G9 and G7 nucleotide bases, respectively, of the G6-G7-A8-G9 RNA tetraloop. A8 packs against the amino-terminal knuckle and forms a hydrogen bond with conserved Arg32, and residues Lys3 to Arg10 of NC form a 3₁₀ helix that binds to the major groove of the RNA stem and also packs against the amino-terminal zinc knuckle. The structure provides insights into the mechanism of viral genome recognition, explains extensive amino acid conservation within NC, and serves as a basis for the development of inhibitors designed to interfere with genome encapsidation.

All retroviruses encode a gag polyprotein that is produced in the host cell during the late stages of the infectious cycle and directs the encapsidation of two copies of the unspliced viral genome during virus assembly and budding (1). Concomitant with budding, the gag polyproteins are cleaved by the viral protease into the matrix (MA), capsid (CA), and nucleocapsid (NC) proteins, which rearrange during maturation to form infectious particles (2). Except for the spumaviruses, all retroviral NC proteins contain one or two CCHC-type zinc knuckle domains (Cys-X2-Cys-X4-His-X4-Cys, where X = variable amino acid) (3) (Fig. 1A). These domains are critical for viral replication and participate directly in genome recognition and encapsidation (4, 5). Mutations that abolish zinc binding lead to noninfectious virions that lack their genomes (4, 6), and mutations of conservatively substituted hydrophobic residues within the CCHC arrays can alter RNA packaging specificity (5). In addition, entire NC domains of HIV-1 and Moloney murine leukemia virus (MoMuLV) have been swapped, resulting in the specific packaging of the non-native genomes (6).
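
The CCHC spacing quoted above (Cys-X2-Cys-X4-His-X4-Cys) maps directly onto a regular expression. The sketch below is illustrative only. The 55-residue sequence string is an assumption (the commonly cited HIV-1 NL4-3 NC sequence, which is not reproduced in this text), and the function name `find_cchc_knuckles` is hypothetical.

```python
import re

# ASSUMPTION: commonly cited 55-residue HIV-1 NL4-3 NC sequence; it is not
# given in the text and should be checked against Fig. 1A of the paper.
NC_NL43 = "MQKGNFRNQRKTVKCFNCGKEGHIAKNCRAPRKKGCWKCGKEGHQMKDCTERQAN"

# Cys-X2-Cys-X4-His-X4-Cys, where X is any residue.
CCHC = re.compile(r"C.{2}C.{4}H.{4}C")


def find_cchc_knuckles(seq):
    """Return (first, last, motif) tuples with 1-based, inclusive residue numbers."""
    return [(m.start() + 1, m.end(), m.group()) for m in CCHC.finditer(seq)]


for first, last, motif in find_cchc_knuckles(NC_NL43):
    print(f"Cys{first} to Cys{last}: {motif}")
```

Under that assumed sequence the pattern matches twice, once inside each of the two knuckle domains.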

Figure 1

(A) Amino acid sequence of the HIV-1 NL4-3 NC protein showing the zinc-binding modes of the two CCHC-type zinc knuckles. Residues that contact the RNA in the NC-SL3 complex are denoted by open letters; asterisks denote residues involved in intermolecular hydrogen bonding. Abbreviations for the amino acid residues are as follows: A, Ala; C, Cys; D, Asp; E, Glu; F, Phe; G, Gly; H, His; I, Ile; K, Lys; M, Met; N, Asn; P, Pro; Q, Gln; R, Arg; T, Thr; V, Val; and W, Trp. (B) Nucleotide sequence and secondary structure of the HIV-1 NL4-3 Ψ-sequence (12). The dimer initiation and major splice donor sites are labeled DIS and SD, respectively, and the gag initiation codon (AUG) is given in open letters. (C) Sequence of the RNA construct used in our studies (15).

Recognition of the HIV-1 genome occurs by means of interactions between NC and a ∼120-nucleotide region of the unspliced viral RNA known as the Ψ-site, which is located between the 5′ long terminal repeat and the gag initiation codon (7). Extensive site-directed mutagenesis, chemical modification, nuclease accessibility mapping, and free energy computational studies indicate that the HIV-1 Ψ-site contains four stem-loop structures, denoted SL1 through SL4 (Fig. 1B) (8-13). Although mutagenesis experiments indicate that all four of these structures are important for efficient encapsidation (13, 14), SL3 is of particular interest because its sequence is highly conserved among different strains of HIV-1 (10) despite heterogeneity at adjacent positions, and because linkage of SL3 to heterologous RNAs is sufficient to direct their recognition and packaging into virus-like particles (9).

We prepared a 20-nucleotide RNA containing the sequence of SL3 (Fig. 1C) for NC-binding and structural studies by nuclear magnetic resonance (NMR) (15). RNA samples at natural isotopic abundance and enriched in ¹⁵N and ¹³C were prepared with T7 RNA polymerase (15), and recombinant HIV-1 NC protein (strain NL4-3; unlabeled, ¹⁵N-labeled, and ¹⁵N,¹³C-labeled) was expressed in Escherichia coli and purified under nondenaturing conditions (16). Samples of RNA were titrated with protein to equimolar concentrations, affording a 1:1 NC-SL3 complex with an apparent molecular weight of 13.7 kD (17). Tight binding (dissociation constant Kd, ∼100 nM) (18) occurs in the slow exchange regime of the NMR chemical shift time scale. Analysis of the homo- and heteronuclear correlated NMR data enabled complete assignment of the protein and RNA signals and the identification of 59 direct intermolecular nuclear Overhauser effects (NOEs) (19). A portion of the two-dimensional (2D) NOE spectrum (Fig. 2A) shows intermolecular NOEs from the aromatic protons of Phe16 and Trp37 to A8 and G7, respectively, as well as strong intramolecular NOEs. All intermolecular NOEs were assigned unambiguously in 3D pulsed-field gradient–edited ¹³C-filtered, ¹²C-detected NOE data. For example, Ala25-CH3 exhibits NOEs with A8-H1′, A8-H2, A8-H8 (spin diffusion), and G9-H8; Ile24-δCH3 interacts with G9-H1′ and G9-H8; and Lys26-Hα interacts with A8-H2 (Fig. 2B).
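
To see what a dissociation constant of about 100 nM implies for an equimolar titration, the following minimal sketch solves the standard 1:1 binding equilibrium for the concentration of complex. The sample concentration used here (1 mM of each component) is an assumption chosen only to represent a typical NMR sample; it is not stated in the text, and the function name is hypothetical.

```python
import math


def complex_concentration(p_total, l_total, kd):
    """Equilibrium [PL] for P + L <-> PL; all arguments share one unit."""
    s = p_total + l_total + kd
    # Physically meaningful root of [PL]^2 - s*[PL] + p_total*l_total = 0.
    return (s - math.sqrt(s * s - 4.0 * p_total * l_total)) / 2.0


KD_NM = 100.0      # approximate value reported in the text
TOTAL_NM = 1.0e6   # ASSUMPTION: 1 mM each of NC and SL3 RNA (not from the text)

bound = complex_concentration(TOTAL_NM, TOTAL_NM, KD_NM)
fraction_bound = bound / TOTAL_NM
print(f"fraction of RNA in the 1:1 complex: {fraction_bound:.3f}")
```

When the total concentrations are far above Kd, as assumed here, nearly all of the RNA is bound at a 1:1 ratio, which is in keeping with a single, well-defined complex at the equimolar point.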

Figure 2

(A) Portion of the 800-MHz 2D NOE spectrum obtained for the NC-SL3 complex in D₂O solution. Intermolecular cross-peaks involving the aromatic protons of Phe16 (F1 knuckle) and Trp37 (F2 knuckle) are labeled. Strong intramolecular Trp37-Phe16 cross-peaks (labeled) are indicative of interknuckle packing in the complex. (B) Selected strips from the 800-MHz 3D ¹³C-filtered, ¹²C-detected HMQC-NOESY data obtained for the NC-SL3 complex in D₂O showing unambiguously assigned intermolecular NOE cross-peaks associated with the Ile24-δCH3, Ala25-CH3, and Lys26-Hα protons.

A total of 719 experimental distance restraints (average of 19.4 restraints per refined residue) identified from the NOE data were used to generate an ensemble of 25 distance geometry structures with the program DYANA (20). Stereo views of the best fit superpositions and statistical information for the structure calculations (Fig. 3) demonstrate that the calculations led to good convergence. The stem nucleotides (through the C5-G10 base pair) form an A helix (Fig. 4A). The G6 nucleobase of the G6-G7-A8-G9 RNA tetraloop stacks on the C5 base and forms a G6-O6–G9-NH2 hydrogen bond. The remaining tetraloop bases project away from the stem and interact directly with the NC protein (Fig. 4B).

Figure 3

Stereo view of the best fit superposition of the HIV-1 NC-SL3 RNA complex (backbone C, Cα, and N atoms of residues Lys3 to Glu51 of NC, and all heavy atoms of SL3 RNA nucleotides C1 to G14; NC in red and RNA in white). Distance restraints: total, 719; average number of restraints per refined residue, 19.4; intraresidue, 100; sequential, 162; medium and long range, 158; intermolecular, 59; hydrogen bond (four per hydrogen bond), 240. Target function: mean, 0.37 ± 0.05 Å²; maximum, 0.42 Å²; minimum, 0.25 Å². Individual violations: average maximum, 0.11 ± 0.02 Å; maximum, 0.17 Å; average number of violations (>0.1 Å) per structure, 2.8 ± 0.3. Pairwise root-mean-square deviations relative to mean atom positions: RNA residues C1 to G14 (all heavy atoms), 0.59 ± 0.10 Å; NC residues Lys3 to Glu51 (backbone heavy atoms), 0.36 ± 0.11 Å; backbone atoms of NC (Lys3 to Glu51) plus all heavy atoms of RNA (C1 to G14), 0.63 ± 0.11 Å; all heavy atoms of residues Lys3 to Glu51 plus C1 to G14, 0.93 ± 0.12 Å.
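
The restraint categories listed in this caption can be tallied as a quick bookkeeping check. The short sketch below only sums the numbers quoted above and derives the number of hydrogen bonds implied by the "four per hydrogen bond" convention; the variable names are illustrative.

```python
# Restraint counts as listed in the Figure 3 caption.
restraints = {
    "intraresidue": 100,
    "sequential": 162,
    "medium_and_long_range": 158,
    "intermolecular": 59,
    "hydrogen_bond": 240,
}
REPORTED_TOTAL = 719
RESTRAINTS_PER_HBOND = 4  # "four per hydrogen bond" in the caption

total = sum(restraints.values())
n_hbonds = restraints["hydrogen_bond"] // RESTRAINTS_PER_HBOND

print(f"sum of categories: {total} (reported total: {REPORTED_TOTAL})")
print(f"hydrogen bonds implied by the restraint count: {n_hbonds}")
```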

Figure 4

(A) Ribbon diagram of the HIV-1 NC-SL3 Ψ-RNA complex. Color code: 3₁₀ helix, purple; F1 knuckle, blue; linker segment, yellow; F2 knuckle, green; zinc atoms, white spheres; RNA, gray, except for the G6 (light green), G7 (pink), A8 (violet), and G9 (orange) nucleobases. (B) Space-filling image of the NC protein [rotated ∼90° relative to (A)] showing the G9-F1, A8-F1, and G7-F2 interactions, the orientation of the 3₁₀ helix in the RNA major groove, and the extensive intra-NC interactions that occur upon RNA binding [colors as in (A)]. (C) GRASP image showing the nature of G9 nucleobase binding to the hydrophobic cleft of the F1 knuckle (intermolecular hydrogen bonds are shown in green). G7 binds the F2 knuckle in a similar manner. (D) Space-filling representation of the SL3 RNA in the NC-SL3 complex [same orientation as in (A)] showing relative proximities of conserved NC basic residues (blue) to the RNA phosphodiesters (red). Conserved residue Asn5, which forms hydrogen bonds with C11-NH2, G10-N7, and G9-O2′, is shown in purple, and the hydrogen bond between conserved Arg32 and A8 is also shown.

The NC protein consists of two zinc knuckle domains (F1 = Val13 to Ala30; F2 = Gly35 to Glu51) separated by a basic linker segment (Pro31-Arg32-Lys33-Lys34) and flanked by NH2- and COOH-terminal tails (Met1 to Thr12 and Arg52 to Asn55, respectively). Residues Lys3 to Arg10 of the NH2-terminal tail form a 3₁₀ helix that binds within the RNA major groove, and the zinc knuckles interact with the exposed bases of the RNA tetraloop (Fig. 4A). G9 interacts specifically with the F1 knuckle by binding to a hydrophobic cleft formed by the side chains of conservatively substituted residues Val13, Phe16, Ile24, and Ala25. The Phe16 and Ala25 backbone NH groups located at the bottom of the cleft form hydrogen bonds with G9-O6, and the Lys14-CO backbone oxygen forms a hydrogen bond with G9-H1 (Fig. 4C). G7 interacts in a very similar manner with the F2 knuckle, with the nucleotide base packing in a hydrophobic cleft formed by conservatively substituted Trp37, Gln45, and Met46 side chains; the exocyclic G7-O6 oxygen forms hydrogen bonds with the backbone NH groups of Trp37 and Met46, and the G7-H1 proton forms a hydrogen bond with Gly35-CO. Thus, both zinc knuckles bind specifically to exposed guanosines and form hydrogen bonds to groups that normally engage in Watson-Crick hydrogen bonding in A helices, and this may serve as the primary mode by which CCHC zinc knuckles contribute to sequence-specific RNA binding (21). This binding mechanism is substantially different from that of CCHH-type zinc fingers from eukaryotic transcription factors; the latter mechanism mainly involves interactions between side chains of α-helical residues and base pairs in the DNA major groove (22, 23).
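
The domain boundaries given at the start of this paragraph can be written as a small lookup table, which makes it easy to see which segment any numbered residue belongs to. This is only a sketch built from the residue ranges stated in the text; the names `NC_SEGMENTS` and `segment_of` are hypothetical.

```python
# Segment boundaries as stated in the text (1-based, inclusive residue numbers).
NC_SEGMENTS = [
    ("NH2-terminal tail", 1, 12),
    ("F1 knuckle", 13, 30),
    ("linker", 31, 34),
    ("F2 knuckle", 35, 51),
    ("COOH-terminal tail", 52, 55),
]


def segment_of(residue):
    """Return the name of the NC segment containing the given residue number."""
    for name, first, last in NC_SEGMENTS:
        if first <= residue <= last:
            return name
    raise ValueError(f"residue {residue} is outside NC (1 to 55)")


# Residues named in the text as contacting the RNA or the other knuckle.
for label, number in [("Asn5", 5), ("Phe16", 16), ("Ala25", 25),
                      ("Arg32", 32), ("Trp37", 37), ("Met46", 46)]:
    print(f"{label}: {segment_of(number)}")
```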

The base of the remaining tetraloop nucleotide, A8, makes hydrophobic contacts with the Ala25-CH3, Phe16-CβH2, and Asn17-CβH2 groups of the F1 knuckle (Fig. 4B) and forms a hydrogen bond with the side-chain Nɛ-H proton of Arg32 (Fig. 4D). This arginine is highly conserved among the known strains of HIV-1 (substituted by Lys in only three of 94 published sequences) (24), whereas most of the other basic sites in NC do not substantially discriminate between Arg and Lys residues. Mutation of Arg32 to Gly results in a reduction in genome packaging (to 10% of that found in the wild type) and abolishes infectivity (25). Thus, the Arg32-A8 hydrogen bond provides a rationale for the high conservation of Arg32 and its extreme sensitivity to site-directed mutagenesis.

The HIV-1 NC protein contains 14 additional basic sites with conserved Arg or Lys residues (24), 10 of which participate in intra- or intermolecular interactions in the NC-SL3 complex. Salt bridges involving the Lys38-Glu51 and Lys14-Glu21 pairs appear to stabilize the folding of the F2 and F1 domains, respectively, and a salt bridge between Lys33 and Glu42 appears to stabilize F2 knuckle-linker interactions. The side-chain NH₃⁺ group of Lys47 is located between the 3′- and 5′-phosphodiesters of G7 in a manner that neutralizes repulsions and anchors the F2 knuckle to RNA. Similarly, Lys26 anchors the F1 knuckle to RNA through electrostatic interactions with the 3′-phosphodiester of G9. The side chains of Lys20, Arg29, Lys34, and Lys41 project into solution and do not form salt bridges in the complex.

Conserved basic sites also exist at positions 3, 7, 10, and 11 of the 3₁₀ helix (24), and in the NC-SL3 complex these residues make the following electrostatic contacts with phosphodiester groups of the RNA stem: Lys3–G10 (3′-P), Arg7–C1 (3′- and 5′-P), Arg10–U2 (3′-P, 5′-P, or both), and Lys11–G4 (3′-P) (Fig. 4D). In addition to these nonspecific interactions, the side-chain carbonyl of Asn5 forms a hydrogen bond with the exposed exocyclic NH2 proton of C11, and the Asn5-NH2 group is poised to interact with the 2′-hydroxyl of G9 and the N7 atom of G10. Interestingly, Asn5 is also highly conserved among the known strains of HIV-1 (94% Asn and 3% Gln, compared with 33% Asn and 62% Gly for Asn8), and Asn5 is the only NC residue that makes specific hydrogen-bonding contacts with the RNA stem. Although site-directed mutagenesis of Asn5 has not been performed, its high conservation is consistent with the structural implication that this residue is important for RNA recognition.

Strong-intensity interfinger NOE cross-peaks observed in the complex, but not for the free protein (26), indicate that binding is also promoted by the formation of extensive intraprotein interactions (Fig. 4B). The 3₁₀ helix packs tightly against the F1 knuckle by hydrophobic interactions involving conservatively substituted Phe6 of the 3₁₀ helix and Val13 and Ile24 of the F1 knuckle. In addition, the zinc knuckles pack tightly together as a result of hydrophobic interactions between Trp37 of F2 and Phe16, Asn17, and Gly19 of F1, and by a hydrogen bond from Trp37-Hɛ1 to the backbone carbonyl of Phe16. Finally, residues that link the two zinc knuckles (Pro31-Lys34) adopt a single conformation that is stabilized by extensive hydrogen bonding (Asn17-NHE to Cys28-Sγ, Asn17-NHZ to Pro31-CO, and Asn17-Oδ1 to Lys33-NH). Mutations likely to destabilize the linker structure, such as Ala30 → Pro and Pro31 → Leu, lead to poorly infectious and noninfectious particles, respectively (25), and thus it appears that high-affinity binding to SL3 is mediated by the formation of extensive intramolecular interactions, in addition to the specific protein–nucleic acid interactions described above. These findings are analogous to those observed recently in a DNA complex with a three–zinc finger domain of transcription factor IIIA, where flexible linker segments become structured and numerous finger-finger contacts are made upon binding to DNA (23).

The SL3 RNA-NC complex differs from other structurally characterized protein-RNA complexes (27), most of which are characterized by purine-purine base pairs that widen the major groove and allow penetration of α-helical (28) or β-sheet (29) segments, or by combinations of interactions (30). In the NC-SL3 complex, non–A-helical torsion angles associated with the G9 phosphodiesters lead to a kink in the RNA backbone and a widening of the major groove, allowing penetration of the smaller 3₁₀ helix. The structure of the tetraloop differs markedly from those of the GNRA class (31), in which three of the bases are stacked and involved in intramolecular hydrogen bonding. In this respect, the NC-SL3 complex is similar to that of the bacteriophage MS2 coat protein–operator stem-loop structure in which exposed tetraloop bases participate in specific intermolecular hydrogen bonding (32). In general, protein-RNA interactions occur by means of an adaptive binding mechanism in which flexible RNA nucleotides become ordered upon binding (27), and this also appears to be the case for the SL3 RNA (33).

Retroviral genome recognition occurs in the cytosol before budding, and its mechanism is difficult to study and appears complex. Although SL3 alone is sufficient to direct packaging of heterologous RNAs (9), its deletion from the native genome does not fully abrogate packaging (13). Deletion of stem loops SL1, SL3, and SL1+SL3 reduces packaging to 19%, 12%, and 5%, respectively, of that found in the wild type (13). Also, isolated SL1, SL3, and SL4 RNAs have affinities of ∼100 to 200 nM for the NC protein (compared with ∼50 nM for the intact Ψ-site) (12). Thus, it is likely that in vivo packaging involves more than one gag polyprotein, and in this regard, the inherent flexibility of NC may permit binding of different gag proteins to the other stem loops through different subsets of inter- and intramolecular interactions.
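
To put the affinities quoted above on an energetic scale, the sketch below converts each dissociation constant into a standard binding free energy using ΔG° = RT ln(Kd / 1 M). The temperature (298.15 K) is an assumption, since the measurement temperature is not given here, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)
TEMP_K = 298.15       # ASSUMPTION: 25 degrees C; not stated in the text


def binding_free_energy(kd_molar, temp_k=TEMP_K):
    """Standard free energy of binding, RT*ln(Kd / 1 M), in kcal/mol."""
    return R_KCAL * temp_k * math.log(kd_molar)


# Approximate Kd values quoted in the text, in nM.
for label, kd_nm in [("intact Psi-site", 50),
                     ("isolated stem loop (tighter end)", 100),
                     ("isolated stem loop (weaker end)", 200)]:
    dg = binding_free_energy(kd_nm * 1e-9)
    print(f"{label}: Kd ~{kd_nm} nM, dG = {dg:.2f} kcal/mol")
```

Each twofold change in Kd corresponds to RT ln 2, about 0.41 kcal/mol at this assumed temperature, so the intact Ψ-site binds only modestly more tightly than an isolated stem loop in free-energy terms.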

In summary, the NC-SL3 structure provides a rationale for the high conservation of more than 50% of the amino acids in NC, explains the available mutagenesis data, and reveals molecular-level details associated with HIV-1 genome recognition. The NC protein plays essential roles in both early and late stages of the viral replication cycle and is thus an attractive antiviral target. The mutationally intolerant CCHC domains of the HIV-1 NC protein are susceptible to attack by antiviral agents that eject zinc from the zinc knuckles (34), at least two of which are undergoing clinical trials for the treatment of acquired immunodeficiency syndrome (35). Our studies provide the basis for an alternative rational drug design strategy that involves the development of inhibitors that interfere with genome recognition and packaging by competing with the NC or RNA binding sites.

