Structural basis for transcriptional start site control of HIV-1 RNA fate


Science  24 Apr 2020:
Vol. 368, Issue 6489, pp. 413-417
DOI: 10.1126/science.aaz7959

One guanosine determines transcript fate

Transcripts of the HIV-1 RNA genome can be either spliced and translated into viral proteins or packaged into new virions as a progeny genome. The path taken depends on whether the transcript contains one guanosine at the 5′ terminus (1G) rather than two or three (2G or 3G). Brown et al. used nuclear magnetic resonance spectroscopy to show that 1G transcripts adopt a dimeric structure that sequesters a terminal cap required for translation and splicing but exposes sites that bind to the HIV-1 Gag protein, which recruits the genome during viral assembly. Conversely, 2G or 3G transcripts have the cap accessible, but Gag-binding sites are sequestered. Therefore, a single guanosine acts as a conformational switch to determine the fate of HIV-1 transcripts.

Science, this issue p. 413


Heterogeneous transcriptional start site usage by HIV-1 produces 5′-capped RNAs beginning with one, two, or three 5′-guanosines (Cap1G, Cap2G, or Cap3G, respectively) that are either selected for packaging as genomes (Cap1G) or retained in cells as translatable messenger RNAs (mRNAs) (Cap2G and Cap3G). To understand how 5′-guanosine number influences fate, we probed the structures of capped HIV-1 leader RNAs by deuterium-edited nuclear magnetic resonance. The Cap1G transcript adopts a dimeric multihairpin structure that sequesters the cap, inhibits interactions with eukaryotic translation initiation factor 4E, and resists decapping. The Cap2G and Cap3G transcripts adopt an alternate structure with an elongated central helix, exposed splice donor residues, and an accessible cap. Extensive remodeling, achieved at the energetic cost of a G-C base pair, explains how a single 5′-guanosine modifies the function of a ~9-kilobase HIV-1 transcript.

All viral constituents required for HIV-1 replication are encoded within a single integrated proviral DNA and expressed using a single promoter (1). Diversification of transcript function is achieved primarily by splicing, which produces mRNAs encoding the viral envelope and accessory proteins, and by regulated frameshifting during translation of unspliced transcripts to produce the Gag and Gag-Pol polyproteins. Some unspliced transcripts do not function as mRNAs but are instead selected for packaging into assembling virions as progeny genomes (gRNA). Genomes are packaged as dimers (2–4), a requirement for strand transfer–mediated recombination during reverse transcription (5). Dimerization, packaging, and other RNA-dependent functions required for viral replication are mediated by conserved elements within the HIV-1 5′-leader (1, 6–8), and there is considerable evidence that transcript structure and function are established by the dimerization state of the leader (2–4). Although dimerization could be modulated by a riboswitch-like mechanism (8–10), recent studies indicate that dimerization and function are instead controlled at the level of transcription by heterogeneous start site usage (11, 12).

The HIV-1 promoter contains three sequential guanosines that can function as the transcription initiation site (U3-R junction) (Fig. 1A). Cells infected with the laboratory-adapted NL4-3 strain of HIV-1 (subtype B; HIV-1NL4-3) utilize all three start sites to express transcripts containing one, two, or three 5′-guanosines (1G, 2G, or 3G, respectively). Most of these RNAs (~90%) are transcribed with 1G or 3G 5′-ends (11, 12), consistent with a predominant “twinning” transcription initiation mechanism (12). Like eukaryotic mRNAs, HIV-1 transcripts are cotranscriptionally capped by a 5′,5′-triphosphate–linked 7-methylguanosine (Fig. 1A) (13–16). Capping is important for RNA splicing, nuclear export, translation, and metabolic stability (17). 5′-Capped 1G transcripts (Cap1G) preferentially form dimers in vitro (12) and are selectively packaged into assembling virions in infection assays (11, 12), whereas those containing two or three guanosines (Cap2G and Cap3G) preferentially form monomers and are retained in cells and enriched on polysomes (12).

Fig. 1 Heterogeneous transcriptional start site usage modulates HIV-1MAL RNA dimerization and function.

(A) Three guanines (red) can serve as alternative transcription start sites. Transcripts are cotranscriptionally capped by 7-methylguanosine. (B) 5′ ends of cellular and virion RNAs expressed from transiently transfected HIV-1 MAL-GPP-pA analyzed by RNase protection. Lane 1: molecular size standards. Lanes 2 to 4: HIV-1MAL RNA standards; 2G, 3G and 4G protected products served as mobility standards for Cap1G, Cap2G, and Cap3G ends, respectively. Lanes 5 and 6: protected fragments from RNA samples harvested from 293T cells transfected with the HIV-1 derivative MAL-GPP-pA (lane 6) and from virus produced by these cells (lane 5). Lanes 7 and 8: protected fragments from RNA samples harvested from mock-transfected 293T cells (lane 8) and from media produced by these cells (lane 7). (C) Full-length leader RNAs (L371) that begin with a single capped guanosine or two noncapped guanosines favor the dimer, whereas RNAs with an additional 5′-guanosine or cap favor the monomer. nt, nucleotides.

To understand how transcriptional addition of as few as one or two 5′-guanosines modulates RNA dimerization and fate, we probed the structures of Cap1G, Cap2G, and Cap3G HIV-1 leader RNAs by deuterium (2H)–edited nuclear magnetic resonance (NMR) and examined their abilities to interact with cellular proteins important for RNA processing and metabolic stability. Studies focused on the MAL strain of HIV-1, which is widely distributed among humans (M group subtype A; HIV-1MAL) (18). The HIV-1MAL leader contains a dimer-promoting GUGCAC palindrome and adopts a monomer-dimer equilibrium insensitive to the presence of the cognate NC protein (19) (fig. S1). Cultured 293T cells were transiently transfected with an HIV-1 vector containing the first 368 nucleotides of HIV-1MAL followed by the NL4-3 strain gag/pol sequence (MAL-GPP-pA), and 5′-end sequences of cellular and virion-associated RNA transcripts were determined by RNase protection assays (12). Of the three potential 5′-ends encoded by the provirus, only Cap1G and Cap3G RNAs were detected (Fig. 1, A and B). Cap1G transcripts were enriched in virions produced from MAL-GPP-pA–transfected cells (>95%), whereas Cap3G transcripts were retained in cells during virus replication (Fig. 1B). As observed for HIV-1NL4-3 RNAs, in vitro–transcribed HIV-1MAL Cap1G 5′-leader RNAs (Cap1G-L) preferentially formed dimers in vitro under physiological-like conditions [PI buffer (10 mM phosphate, 1 mM Mg2+, 122 mM K+; pH 7.4)], whereas Cap3G-L (and Cap2G-L) preferentially formed monomers (Fig. 1C). These findings confirm that HIV-1 subtypes A and B encode 5′-leaders with similar start site–dependent dimerization propensities and gRNA versus mRNA control.

The secondary structure of the dimeric HIV-1MAL Cap1G leader (capped residues G3 through G359; [Cap1G-L359]2; 232 kDa) was probed by 2H-edited NMR. Sequential and long-range adenosine-H2–detected nuclear Overhauser effects (NOEs), which are diagnostic of RNA secondary structure (20) and can be used for larger RNAs (8, 21), were detected for leader constructs prepared with the following nucleotide-specific 2H labeling schemes (superscripts denote sites of protonation; all other sites are deuterated, e.g., A2r indicates adenosines protonated at C2 and ribose carbons): A2r, A2rGr, A2rUr, A2Gr, and A2Ur. NMR assignments were corroborated by comparisons with spectra obtained for fragment RNAs (fig. S2) and by database 1H-NMR chemical shift analyses (22). Adenosine-H2 NOEs were assigned for stretches of residues within most of the expected secondary structures of [Cap1G-L359]2, including those in the transcriptional activation (TAR), primer binding (PBS), dimerization (DIS), packaging (Ψ), and cleavage and polyadenylation site [poly(A)] elements (Fig. 2, A and B) (8, 21). Well-resolved NOEs for A111 and A351 confirmed the presence of the U5:AUG helix that pairs upstream sequences with those flanking the gag start codon (Fig. 2B and fig. S3) (23).

Fig. 2 NMR and structural findings for the dimeric Cap1G form of the HIV-1MAL leader.

(A) Portions of 2D NOE spectra for 2H-labeled [Cap1G-L359]2 samples (A2r, black; A2Ur, red; A2Gr, blue). (B) Assigned A-H2 NOEs and deduced secondary structure; discrete functional elements differentiated by color and intermolecular “kissing” interactions are denoted as shaded residues. (C) Portions of 2D NOE spectra showing similarities of Cap-CH3 to Cap-H8, G3, and G103 NOEs observed for G8-[Cap1G-L359]2 (blue), and truncated leader fragment Cap1G-LTPUA (black). (D) (dashed black lines). (E) Portion of the Cap1G-LTPUA NMR structure showing Cap NOEs (dashed lines) indicative of end-to-end stacking of the TAR (brown) and poly(A) (cyan) helices.

NOEs between the Cap methyl group and protons of G3 and G103 were detected for the intact dimeric leader, suggesting that the Cap is sandwiched between these residues (Fig. 2C). Similar spectra were obtained for a truncated portion of the leader comprising the TAR and poly(A) hairpins and the U5:AUG helix (Cap1G-LTPUA) (Fig. 2, C and D, and fig. S4). The improved sensitivity and spectral resolution obtainable for the smaller Cap1G-LTPUA construct (42 kDa) were sufficient for three-dimensional (3D) structural studies (table S1 and fig. S4). Residues G103, Cap, and G3 are sequentially stacked, as are residues C55, C56, and C57, leading to an overall end-to-end stacking of the TAR and poly(A) hairpins (Fig. 2E and fig. S4). The structure juxtaposes the Cap and C56 bases in a manner consistent with Cap:C56 base pairing.

NMR studies were also conducted with monomeric 2G and 3G leader RNAs. Nuclear Overhauser effect spectroscopy (NOESY) spectra obtained for noncapped 3G-L371 (Fig. 3A) and capped Cap2G-L371 and Cap3G-L371 (Fig. 3, B to E) RNAs exhibited similar cross-peak patterns indicative of a common structure, with residues of AUG forming a hairpin rather than the U5:AUG helix observed in the dimer (8). NOE patterns for TAR, Ψ, and a portion of PBS were similar to those observed for [Cap1G-L359]2, indicating that these substructures exist in both the monomeric and dimeric forms of the leader (figs. S5 and S6). Long-range A58-H2 NOEs to G1 (but not G103) ribose protons were observed [confirmed by using a sample in which only the G1 guanosine and adenosines were protonated (Fig. 3A)], indicating that the lower portion of the poly(A) hairpin was remodeled (12). In addition, signals diagnostic of the DIS hairpin in the [Cap1G-L359]2 RNA were absent in the 3G-L371 NOESY spectra. 1H-NMR chemical shifts of the H2 and H8 protons of adenosines A65, A66, A72, A73, and A75 to A77 were also different from those observed for [Cap1G-L359]2, and none of these adenosines exhibited long-range NOEs (fig. S6A). However, adenosine residues in the downstream portion of poly(A) exhibited long-range NOEs and chemical shifts indicative of base pairing with residues of DIS (Fig. 3B). Similar NMR results were obtained for Cap3G-L371 (Fig. 3, C and E) and Cap2G-L371 (Fig. 3D). The capped RNAs exhibited additional NOEs between A58-H2 and the Cap methyl and ribose protons (Fig. 3, D and E). The NMR data are consistent with a secondary structure that is substantially remodeled relative to that of the dimeric [Cap1G-L359]2 leader, with residues of TAR, SD, Ψ, and AUG adopting independently folded hairpin structures (Fig. 3F) and residues of poly(A), U5, and DIS forming an elongated helix (Fig. 3F).

Fig. 3 NMR and structural findings for the monomeric Cap3G, Cap2G, and 3G forms of the HIV-1MAL leader.

(A to F) Portions of 2D NOE spectra [(A) to (E)] used to make the secondary-structure assignments shown in (F). (A) Noncapped 3G-L371 spectra (AH, black; A2rGr, green; A2rCr, blue; G1HA2r, red); G1 is the only protonated guanosine in the G1HA2r-labeled sample, enabling unambiguous assignment of A58-H2 NOEs to G1. (B) NOE spectra for A2-Cap3G-L371 (black) showing that A-H2 to A-H2 NOEs of the extended poly(A)-DIS helix match those of a noncapped analog lacking the PBS loop (A2rGr-4G-L371-ΔPBS, blue; see fig. S5). (C) Cap3G-L371 spectra (AH, black; A2rGr, green) showing that the Cap is in close proximity to A58. (G) Comparison of 2D NOE spectra for A2-Cap2G-L371 (black) (D) and TAR fragment Cap2G-TARm (G) (red), showing NOEs between the Cap and A58. (E) Similar Cap-to-A58 NOEs were observed for Cap3G-L371 (black) and a Cap3G-TARm RNA (red). (H) Portion of the NMR structure of Cap3G-TARm, showing the disordered cap residue.

No sequential NOEs between the Cap and G1 residues were detected in spectra obtained for the intact Cap3G-L371 or Cap2G-L371 leader RNAs. Spectra with improved sensitivity and resolution were obtained for constructs corresponding to the lower portion of the capped TAR hairpin (Cap2G-TARm in Fig. 3D and Cap3G-TARm in Fig. 3, E and G). NMR chemical shifts and NOE patterns were similar to those observed for analogous residues in Cap3G-L371 and indicated that the Cap residue does not stack with G1 or G2 and is disordered (Fig. 3H and fig. S7).

The NMR data suggest that structural remodeling of the capped 2G and 3G RNAs relative to the capped 1G leader is a consequence of a single additional base pair (Cap:C57 in Cap2G-L or G1:C57 in Cap3G-L). Consistent with this hypothesis, replacement of C57 by G in Cap1G-L371 to ablate the base pair at the terminus of the poly(A) helix (C57:G103) (Fig. 2B) shifted the monomer-dimer equilibrium to the monomer (Fig. 4A), and compensatory substitution of G103 to C reverted the equilibrium toward dimer (Fig. 4A). This indicates that structural remodeling is achieved at an energetic cost equivalent to a single G-C base pair (~3 to 5 kcal/mol) (24). The compensatory mutant did not fully recapitulate the dimerization properties of the wild-type sequence, suggesting that the C57:G103 base pair, which is conserved in 99% of deposited sequences with reported full-length 5′ untranslated regions (see the supplementary materials), is important both for stabilizing the dimeric form of the Cap1G transcripts and for enabling remodeling through C57:Cap or C57:G1 base pairing in the Cap2G and Cap3G transcripts, respectively.
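To give a sense of scale for a free-energy difference of ~3 to 5 kcal/mol, the following minimal sketch converts a free-energy shift into a fold change in an equilibrium constant using the standard relation K2/K1 = exp(ΔΔG/RT). It is an illustration only, not an analysis from this study: it assumes a simple two-state equilibrium at 37 °C, whereas the leader's monomer-dimer equilibrium is bimolecular and concentration dependent. The function name `equilibrium_fold_change` and the chosen temperature are assumptions introduced here.

```python
import math

# Molar gas constant in kcal/(mol*K)
R_KCAL = 1.987204e-3


def equilibrium_fold_change(ddg_kcal_per_mol, temp_kelvin=310.15):
    """Fold change in an equilibrium constant for a free-energy shift.

    Assumes a simple two-state equilibrium (an illustrative simplification;
    the temperature default of 310.15 K corresponds to 37 degrees C).
    """
    return math.exp(ddg_kcal_per_mol / (R_KCAL * temp_kelvin))


if __name__ == "__main__":
    # Bounds of the single G-C base pair cost quoted in the text
    for ddg in (3.0, 5.0):
        fold = equilibrium_fold_change(ddg)
        print(f"ddG = {ddg:.1f} kcal/mol -> ~{fold:.0f}-fold shift")
```

Under these assumptions, a shift of 3 to 5 kcal/mol corresponds to a change of roughly two to three and a half orders of magnitude in the equilibrium constant, which is consistent with the idea that a single base pair can tip the balance between two leader conformations.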

Fig. 4 Influence of 5′-guanosine number on RNA function.

(A) Disruption of a single base pair (C57-G103) by C57→G mutagenesis disrupts Cap1G-L371 dimerization. Compensatory G103C substitution substantially restores dimerization. (B) eIF4E binds Cap3G-L371 (C denotes the eIF4E:RNA complex), but not the noncapped RNA. (C) At low ionic strength, eIF4E binds the M conformer of Cap1G-L359, but not M* or the Cap1G-LLock construct. (D) Similar results were obtained in PI buffer. (E and F) The 5′-RNA exonuclease (XRN-1) and decapping enzyme (hDcp2) are independently unable to degrade Cap1G or Cap3G leader RNAs. In the presence of both enzymes, the Cap1G-LLock resists degradation over time (E) and with increasing hDcp2 (F) compared to the cap-exposed Cap3G-L371 leader. (G) Mechanism for transcriptional control of HIV-1 RNA function. Capped RNAs containing two or three 5′-guanosines (green dots) adopt a monomeric structure that exposes the cap (red dot) and enables RNA processing and metabolism, whereas those with a single capped G adopt a cap-sequestered conformation that promotes dimerization and packaging.

Because the cap is exposed in monomeric Cap2G and Cap3G leader RNAs and sequestered between the TAR and poly(A) helices in the dimeric Cap1G leader, we examined the abilities of these RNAs to interact with two cellular cap binding proteins: the eukaryotic translation initiation factor 4E (eIF4E) and the human decapping enzyme hDcp2. eIF4E initiates recruitment and assembly of the eukaryotic translation machinery (25), and cap recognition and removal by hDcp2 is required for 5′-exonucleolytic mRNA turnover (26). Native agarose gel shift experiments revealed that eIF4E binds the Cap2G and Cap3G leader RNAs with affinities (Kd ~ 0.7 μM) similar to that of a single capped guanosine (Kd = 1.44 μM) (25) (results for noncapped 3G-L371 and Cap3G-L371 are shown in Fig. 4B). Cap1G-L titrations were also conducted under nonphysiological low–ionic-strength conditions that favor the monomer (10 mM NaCl, no Mg2+). Under these conditions, Cap1G-L359 adopts two monomeric conformations that are resolvable on Tris-borate gels, M and M* (Fig. 4C and fig. S8). The M conformer exhibits gel mobility similar to that of the cap-exposed Cap3G-L monomer, whereas M* exhibits the mobility of a cap-sequestered Cap1G-L359 mutant engineered to form a monomer while retaining the secondary structure of the dimer [DIS residues A273 to A281 mutated to GAGA to prevent dimerization (27); Cap1G-LLock] (Fig. 4C and figs. S2 and S8). Titration of Cap1G-L359 with eIF4E resulted in a mobility shift for the cap-exposed M conformer, but not for the cap-sequestered M* species, even at (twofold) excess molar ratios of eIF4E (Fig. 4C). Cap1G-LLock was likewise unable to bind eIF4E (Fig. 4C). Differential eIF4E binding between Cap3G-L371 and Cap1G-L was also observed in PI buffer (Fig. 4D). The capped 1G leader also exhibited reduced sensitivity to hDcp2-dependent 5′-exonuclease digestion compared to the capped 3G leader RNA (Fig. 4, E and F). These findings indicate that cap-binding proteins important for mRNA translation and processing bind efficiently to monomeric, cap-exposed forms of the leader but not to the cap-sequestered Cap1G dimer.
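The dissociation constants quoted above can be related to fractional occupancy with a simple 1:1 binding isotherm. The sketch below is illustrative only and is not the fitting procedure used for the gel shift data: it assumes a single noncooperative binding site and that the free eIF4E concentration is approximately equal to the total added (protein in excess over RNA). The function name `fraction_bound` and the example concentrations are assumptions introduced here; only the two Kd values come from the text.

```python
def fraction_bound(free_protein_um, kd_um):
    """Fraction of RNA bound under a 1:1 binding model.

    Both arguments are in micromolar. Assumes one noncooperative site and
    that free protein concentration approximates the total added protein.
    """
    if free_protein_um < 0 or kd_um <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return free_protein_um / (kd_um + free_protein_um)


if __name__ == "__main__":
    # Kd values quoted in the text; protein concentrations are hypothetical
    kd_values = {"capped leader RNA": 0.7, "capped guanosine": 1.44}
    for label, kd in kd_values.items():
        for protein in (0.5, 1.0, 2.0):
            theta = fraction_bound(protein, kd)
            print(f"{label}: {protein:.1f} uM eIF4E -> {theta:.2f} bound")
```

In this model, half of the RNA is bound when the free protein concentration equals Kd, so a cap that is accessible with a submicromolar to low-micromolar Kd would be expected to show a clear shift at micromolar eIF4E, whereas the absence of a shift for M* and Cap1G-LLock points to a much weaker interaction.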

Our findings support a structure-based mechanism for diversification of HIV-1 transcript function by heterogeneous transcriptional start site usage (Fig. 4G). Analogous to riboswitches, which undergo structural remodeling and functional activation upon binding of small exogenous ligands (28), the structure and function of HIV-1 transcripts are controlled by transcriptional addition of one or two 5′-guanosines. Transcripts that begin with a single 5′-capped guanosine adopt a dimeric branched multihelical structure that promotes dimerization and exposes Gag binding sites while simultaneously sequestering the 5′ cap, the major splice donor site, and the translational start site. Cap sequestration is likely to inhibit both translation and splicing, as both processes depend on initial interactions with cap-binding proteins (29), and may also inhibit decapping-dependent 5′-exonucleolytic degradation of the gRNA during cytoplasmic transport and particle assembly. Subgenomic flaviviral RNAs are similarly protected from exonuclease digestion by structural sequestration of 5′-nucleotides (30). HIV-1 transcripts that contain additional 5′-guanosines adopt an alternate structure that inhibits dimerization (12), sequesters Gag-binding sites (8), and exposes the cap, the major splice donor site, the gag start codon, and unstructured residues immediately downstream of the TAR hairpin. Cap exposure enables eIF4E binding, and the unstructured poly(A) residues immediately downstream of the TAR hairpin could facilitate eIF4E-dependent association of additional factors required for splicing and translation (31). A genome-wide study of mammalian promoter architecture by cap analysis of gene expression revealed that twinned transcriptional start sites comprise a substantial subset of mammalian promoters (32, 33). Start site–dependent modulation of transcript structure and cap exposure could serve as a general mechanism for expanding cellular RNA function.

Supplementary Materials

Materials and Methods

Figs. S1 to S8

Table S1

References (34–54)

References and Notes

Acknowledgments: We thank HHMI staff at UMBC for technical assistance, C. Burnett (University of Michigan Medical School) for help with manuscript preparation, and R. Sprangers (University of Regensburg, Germany) for helpful suggestions. Funding: This research was supported by research grants from the National Institutes of Health (NIAID 8R01 AI50498 to M.F.S. and A.T., NIAID U54 AI150470 to A.T. and D.A.C.). J.D.B. was supported by NIH predoctoral fellowship F31 GM123803; M.L., K. Singh, M.O., T.R., and F.G.G. were supported by an NIGMS grant for enhancing minority access to research careers (MARC U*STAR 2T34 GM008663); M.O. and T.R. were supported by an HHMI undergraduate education grant; A.S.I., M.L., K. Singh, M.O., T.R., and F.G.G. were supported by the Meyerhoff Scholars Program at UMBC. Author contributions: M.F.S. and A.T. supervised and raised financial support for the studies. M.F.S. and J.D.B. conceived the study and designed the NMR and in vitro experiments. J.D.B., A.S.I., H.C., Y.D., L.G., S.H.C., M.W.L., I.C., K. Singh, M.O., T.R., U.O., J.H., F.G.G., K. Stewart, G.B., D.F., B.E., and P.C. prepared RNA samples, conducted in vitro and NMR experiments, and helped with NMR data analysis; A.T. and S.K. designed and conducted the virology experiments; D.A.C. developed the AMBER force field for the RNA Cap and advised on AMBER calculations; M.F.S., J.D.B., A.T., and S.K. wrote the manuscript, with contributions from all coauthors. Competing interests: The authors declare no competing interests. Data and materials availability: Depositions for Cap1G-LTPUA and Cap3G-TARm structures include atomic coordinates (PDB ID 6VU1 and 6VVJ, respectively) and NMR chemical shifts and restraints for structure calculations (BMRB ID 30723 and 30724, respectively).
