Structure of an active human histone pre-mRNA 3′-end processing machinery


Science  07 Feb 2020:
Vol. 367, Issue 6478, pp. 700-703
DOI: 10.1126/science.aaz7758

Architecture of an mRNA processor

The 3′-end processing of the three major classes of RNA polymerase II transcripts in metazoan cells—polyadenylated messenger RNAs (mRNAs), histone mRNAs, and small nuclear RNAs (snRNAs)—requires three distinct machineries that share common features. Sun et al. reconstituted the active human histone pre-mRNA 3′-end processing machinery and solved its structure at near-atomic resolution by cryo–electron microscopy. This structure provides a basis for understanding the mechanism of the shared catalytic reactions between histone pre-mRNA and canonical pre-mRNA and snRNA 3′-end processing machineries.

Science, this issue p. 700


The 3′-end processing machinery for metazoan replication-dependent histone precursor messenger RNAs (pre-mRNAs) contains the U7 small nuclear ribonucleoprotein and shares the key cleavage module with the canonical cleavage and polyadenylation machinery. We reconstituted an active human histone pre-mRNA processing machinery using 13 recombinant proteins and two RNAs and determined its structure by cryo–electron microscopy. The overall structure is highly asymmetrical and resembles an amphora with one long handle. We captured the pre-mRNA in the active site of the endonuclease, the 73-kilodalton subunit of the cleavage and polyadenylation specificity factor, poised for cleavage. The endonuclease and the entire cleavage module undergo extensive rearrangements for activation, triggered through the recognition of the duplex between the authentic pre-mRNA and U7 small nuclear RNA (snRNA). Our study also has notable implications for understanding canonical and snRNA 3′-end processing.

The 3′-end processing machineries for polyadenylated (1, 2) and histone precursor mRNAs (pre-mRNAs) (3, 4) both use the 73-kDa subunit of the cleavage and polyadenylation specificity factor (CPSF73) to cleave pre-mRNA (5, 6), but the molecular mechanism for their functions is still poorly understood. CPSF73, CPSF100, symplekin, and the 64-kDa subunit of the cleavage stimulation factor (CstF64) compose the histone pre-mRNA cleavage complex (HCC) (Fig. 1, A and B, and table S1), which is equivalent to the mammalian cleavage factor (mCF) for polyadenylated pre-mRNAs (7, 8). The cleavage site in histone pre-mRNAs is located between a conserved stem-loop (SL) that is recognized by SL binding protein (SLBP) and a histone downstream element (HDE) that forms base pairs with the 5′ end of U7 small nuclear RNA (snRNA), forming an HDE-U7 duplex (Fig. 1A). The U7 small nuclear ribonucleoprotein (snRNP) is critical for this processing, and the Lsm11-FLASH complex recruits the HCC to the machinery (9–12) (see supplementary text in the supplementary materials).

Fig. 1 Overall structure of the human histone pre-mRNA 3′-end processing machinery.

(A) Schematic drawing of the histone pre-mRNA 3′-end processing machinery. F, SmF subunit; E, SmE; G, SmG; D3, SmD3; B, SmB. (B) Domain organizations of Lsm10, Lsm11, and the subunits of HCC. The domains in CPSF100 are shown in slightly darker colors compared with their homologs in CPSF73. The vertical line in the symplekin CTD marks its N-terminal half that interacts with CPSF73. MβL, metallo-β-lactamase; RRM, RNA recognition module. (C) Cryo-EM density at 3.2-Å resolution for the core of the machinery. (D) Schematic drawing of the structure of the core of the machinery, viewed after a 150° rotation around the vertical axis from (C). The proteins are colored as in (A) and (B). The U7 snRNA is dark green, and H2a* is orange. (E) Cryo-EM density for the entire machinery (gray), low-pass filtered to 8-Å resolution to show the density of FLASH and SLBP. The possible density for CTD3 of CPSF73 is indicated with an asterisk. Sympk, symplekin. Structure figures were produced with PyMOL unless otherwise noted. (C) and (E) were produced with ChimeraX and Chimera (30).

To prepare a fully recombinant machinery, we reconstituted human U7 snRNP (13) and mixed it with purified human HCC, FLASH (14), and SLBP (15). Using a modified mouse histone H2a pre-mRNA (H2a*) (fig. S1) as substrate, we observed robust cleavage activity generating the authentic product (supplementary text and figs. S2 and S3). Notably, the N-terminal domain (NTD) of symplekin was essential for processing, and its binding partner Ssu72 (16) inhibited the cleavage reaction. A mutation in the active site of CPSF73 abolished the cleavage.

We purified the active machinery (fig. S3F) and obtained a cryo–electron microscopy (cryo-EM) reconstruction at 3.2-Å resolution for its core (Fig. 1, C and D) and a reconstruction at 4.1-Å resolution for the entire machinery (tables S2 and S3 and figs. S4 to S6). The overall structure of the machinery resembles an amphora with one long handle (Fig. 1E, fig. S6B, and movie S1). The machinery core constitutes the body of the amphora, with the U7 snRNA 3′-end SL and the Sm ring at the base and the CTDs of CPSF73 and CPSF100 and the first few helical repeats of the symplekin CTD forming the mouth. CPSF73 and symplekin NTD are positioned opposite each other on the Sm ring (Fig. 1D and fig. S6A). CPSF100 interacts with both CPSF73 and symplekin but does not directly contact the Sm ring (Fig. 1C). The symplekin CTD, FLASH dimer (14), SLBP, pre-mRNA SL, and residues 20 to 65 of Lsm11 form the handle of the amphora (Fig. 1E). The FLASH dimer makes an 80-Å-long connection from the symplekin CTD to the SLBP-SL complex. CstF64 was not observed in the EM density and is not required for cleavage in vitro (supplementary text and fig. S2F).

Twelve consecutive Watson-Crick base pairs in the HDE-U7 duplex were observed in the center of the amphora (Fig. 1D and fig. S1). The metallo-β-lactamase domain of CPSF73, the β-CASP domain of CPSF100, and the concave face of the symplekin NTD (fig. S6C) surround the duplex on three sides (Figs. 1, D and E, and 2A). The interactions are ionic and hydrophilic in nature but involve none of the bases in the duplex (Fig. 2B), which explains earlier observations that base pairing rather than sequence is important for processing (3, 4, 13). The structure revealed an extra, U-U base pair at the bottom of the duplex (Fig. 2C and fig. S1), and analysis of histone pre-mRNA sequences suggested that U-U base pairs are common in HDE-U7 duplexes (fig. S7).
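As a purely illustrative aid, the notion of "consecutive Watson-Crick base pairs plus a U-U pair at one end" can be sketched in a few lines of Python. The two sequences below are made-up placeholders chosen to give twelve Watson-Crick pairs followed by one U-U pair; they are not the H2a* HDE or U7 snRNA sequences, and the function names are hypothetical. G-U wobble pairs are deliberately counted as mismatches in this simple sketch.

```python
# Illustrative only: placeholder sequences, NOT the real H2a* HDE or U7 snRNA.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def classify_pairs(strand_5to3, partner_5to3):
    """Pair each base of one strand with the antiparallel partner and label the pair."""
    if len(strand_5to3) != len(partner_5to3):
        raise ValueError("this simple sketch expects strands of equal length")
    partner_3to5 = partner_5to3[::-1]  # antiparallel alignment
    labels = []
    for a, b in zip(strand_5to3, partner_3to5):
        if (a, b) in WATSON_CRICK:
            labels.append("WC")
        elif a == "U" and b == "U":
            labels.append("U-U")
        else:
            labels.append("mismatch")  # includes G-U wobble in this sketch
    return labels


def longest_wc_run(labels):
    """Length of the longest uninterrupted run of Watson-Crick pairs."""
    best = run = 0
    for label in labels:
        run = run + 1 if label == "WC" else 0
        best = max(best, run)
    return best


# Hypothetical 13-nt strands: 12 Watson-Crick pairs, then a terminal U-U pair.
hde_like = "GGAAAGAGCUGUU"
u7_like = "UACAGCUCUUUCC"

labels = classify_pairs(hde_like, u7_like)
print(longest_wc_run(labels), labels[-1])
```

Because the check depends only on whether positions pair and not on which bases they are, the same functions accept any sequence pair, in the spirit of the observation that base pairing rather than sequence matters for processing.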

Fig. 2 Recognition of the HDE-U7 duplex and the U7 Sm site.

(A) The HDE-U7 duplex is surrounded by CPSF73, CPSF100, and symplekin NTD, shown as a transparent surface. Lsm11 has interactions with the bottom of the duplex. (B) Electrostatic surface of the proteins in the duplex binding site, showing charged interactions with the backbone of the duplex. (C) A U-U base pair at the bottom of the duplex, flanked on the other face by A19 of U7 snRNA. (D) A C-G base pair in the 3′ CUAG sequence of the U7 Sm site. The base pair is flanked on one side by Arg34 of Lsm10 and on the other by Arg174 of Lsm11. Single-letter abbreviations for the amino acid residues are as follows: F, Phe; K, Lys; R, Arg.

The structure also revealed a Watson-Crick base pair between C28 and G31 of the CUAG sequence at the 3′ end of the U7 Sm site (Fig. 2D and fig. S1). It is flanked by residues from Lsm10 and Lsm11 and assumes a different backbone conformation compared with other Sm sites (Fig. 2D and fig. S8A). In addition, G26 is hydrogen-bonded with C33 of H2a*, providing a direct connection between the Sm site and the pre-mRNA (figs. S1 and S8B). The recognition of the first five Sm site nucleotides (21-AAUUU-25) and U27 is similar to that in spliceosomal Sm rings (figs. S1 and S8B) (17, 18), although there are substantial differences in the extensions of the Sm proteins and the positions of the RNA outside the Sm ring (fig. S8, C to E).

The pre-mRNA substrate (Fig. 3A) is bound in the active site of CPSF73. The correct scissile phosphate, after A26 (fig. S1), is coordinated to the two zinc ions in the active site (Fig. 3B). The A26 base has hydrogen-bonding interactions to its N1 and N6 atoms, which is consistent with the preference for an adenine at the cleavage site (3, 4) (fig. S9, A to C). C25 has weak density (Fig. 3A) and is not recognized by CPSF73. This binding mode of the pre-mRNA clearly illuminates the molecular mechanism for the cleavage reaction. The hydroxide ion that is a bridging ligand between the two zinc ions (6) is the nucleophile that initiates the cleavage reaction (Fig. 3B), and the 3′ oxyanion of A26, the leaving group, is protonated by His396, which is activated by Glu204. Glu204, His396, and the ligands to the zinc ions are conserved among CPSF73 homologs (19, 20), including integrator complex subunit 11 (IntS11), the endonuclease for snRNA cleavage (21). Therefore, the conformation of the machinery observed here is likely poised for the cleavage reaction. Except for the brief moment during EM grid preparation, the sample was kept at 4°C or on ice, which slowed the reaction (12) and allowed us to observe the pre-mRNA in the CPSF73 active site. There are substantial differences in the orientation of the β-CASP domain compared with the orientation in ribonuclease J (22, 23) (fig. S9D) and especially in the binding modes of the RNA substrate (fig. S9E).

Fig. 3 CPSF73 is in an active state, poised for the cleavage reaction.

(A) Cryo-EM density for H2a* nucleotides bound in the CPSF73 active site. The scissile phosphate is indicated with a black arrow. (B) The endonuclease mechanism of CPSF73. The positions of the zinc ions (gray spheres) and the bridging hydroxide (red sphere) are based on the crystal structure of CPSF73 alone (6) (Protein Data Bank ID 2I7V), and EM density is observed for the two zinc ions. The position of the sulfate ion observed in the earlier structure is shown using thin sticks. D, Asp; E, Glu; H, His. (C) Overlay of the structure of CPSF73 in the active state observed here (in color) with the inactive, closed state reported earlier (gray) (6). The metallo-β-lactamase domain was used for the overlay. The rearrangement of the β-CASP domain is indicated with a red arrow, corresponding to a rotation of 17°. (D) Molecular surface of the active site region of CPSF73, colored according to the domains. Lsm10 is located at the rim of the canyon, contacting nucleotides downstream of the cleavage site.

The reported structures of CPSF73 (6) and its yeast homolog Ysh1 (24) are in a closed, inactive conformation. We observed in this study an open, active conformation of CPSF73. A large rearrangement of its β-CASP domain relative to the metallo-β-lactamase domain, corresponding to a rotation of ~17° (Fig. 3C and fig. S9F), is necessary to create a narrow, deep canyon that is only large enough to accommodate single-stranded RNA (Fig. 3D and fig. S9, A, F, and G).
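For readers unfamiliar with how a domain rotation such as the ~17° quoted above is expressed, the angle can be recovered from the 3×3 rotation matrix R that relates the two domain orientations, using trace(R) = 1 + 2 cos θ. The following is a minimal sketch under stated assumptions, not the authors' procedure: the rotation axis is arbitrary, the matrix is generated synthetically rather than derived from the CPSF73 coordinates, and the function names are hypothetical.

```python
import math


def rotation_about_axis(axis, angle_deg):
    """Rodrigues' formula: rotation matrix for a given axis and angle (degrees)."""
    x, y, z = axis
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm  # normalise the axis
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    v = 1.0 - c
    return [
        [c + x * x * v, x * y * v - z * s, x * z * v + y * s],
        [y * x * v + z * s, c + y * y * v, y * z * v - x * s],
        [z * x * v - y * s, z * y * v + x * s, c + z * z * v],
    ]


def rotation_angle_deg(R):
    """Rotation angle from the trace of a proper rotation matrix."""
    trace = R[0][0] + R[1][1] + R[2][2]
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # guard against rounding
    return math.degrees(math.acos(cos_theta))


# Synthetic example: an arbitrary (hypothetical) axis and a 17-degree rotation.
R = rotation_about_axis((1.0, 2.0, 3.0), 17.0)
print(round(rotation_angle_deg(R), 3))
```

In practice R would come from superposing one domain (here, the metallo-β-lactamase domain) and then fitting the second domain between the two states; that superposition step is outside the scope of this sketch.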

The N- and C-terminal extensions of Lsm10, highly conserved among vertebrate homologs (fig. S10A), play a crucial role in this conformational change for CPSF73. These extensions are placed directly against the β-CASP domain (fig. S11A) and have extensive steric clashes with its closed conformation (Fig. 3C and fig. S9F), likely helping to trigger the activation of CPSF73. In addition, a segment in the C-terminal extension of Lsm10 (residues 107 to 110) is positioned at the rim of the canyon (fig. S9A) and forms a part of the binding site for the 3′ portion of the substrate (Fig. 3D).

The recognition of the HDE-U7 duplex may be the critical event to initiate the conformational rearrangement in CPSF73, which is consistent with the requirement of symplekin NTD for cleavage. On the other hand, the NTD-Ssu72 complex is incompatible with the structure observed here, as Ssu72 would clash with the duplex as well as with CPSF73 (fig. S6D), explaining the inhibitory effect of Ssu72 (supplementary text).

Besides the rearrangement in CPSF73, an extensive change in the architecture of HCC is required for activation. We recently showed that mCF (or HCC) in an inactive state has a trilobal structure and is highly dynamic (25). In contrast, the HCC structure observed in this study in the active state shows drastic differences compared with the inactive state (Fig. 4A). There are intimate contacts between CPSF73 and CPSF100 in the current structure, and in fact they form a pseudo-dimer (fig. S11, B and C). These may be hallmarks of the active state for HCC (or mCF).

Fig. 4 Schematic of histone pre-mRNA 3′-end processing cycle.

(A) Notable structural differences of HCC in an active state compared with an inactive state. The structure of HCC observed here is docked into the EM density for mCF (gray surface) (25), using the symplekin CTD as the reference. (B) Schematic drawing of the CTD2 domain complex of CPSF73 (light green) and CPSF100 (darker green) and the N-terminal segment of the symplekin CTD (magenta). The CTD complex of IntS9 and IntS11 (28) was docked into the EM density at 4.1-Å resolution (transparent surface) using Chimera. (A) and (B) were produced with Chimera. (C) A putative model for histone pre-mRNA 3′-end processing cycle. The machinery is assembled from the U7 snRNP (state I) with the recruitment of the FLASH dimer (state II) and HCC (state III), followed by the recognition of the pre-mRNA for CPSF73 and HCC activation and pre-mRNA cleavage (state IV). The machinery is likely highly dynamic before the binding of the authentic pre-mRNA, and the possible flexible regions are indicated with curved arrows and dashed lines. After cleavage (state V), the downstream product is degraded by an exonuclease activity, and the machinery can be recycled directly (solid arrow), or possibly disassembled and then reassembled (dashed arrows). State IV corresponds to the structure reported here, with the scissors indicating cleavage by CPSF73, and the other states are models.

The conformational dynamics of HCC (mCF) is due to flexibility in its core, formed by the CTDs of CPSF73, CPSF100, and symplekin (10–12, 26, 27) (Figs. 1E and 4B). The CTD of CPSF73 likely has three subdomains (CTD1 to 3), and that of CPSF100 has two subdomains (Fig. 1, B and E, and fig. S12A). The CTD1 subdomains of CPSF73 and CPSF100 form a six-stranded β barrel–like structure (fig. S12B). The CTD2 subdomains form a separate complex, which makes only limited contact with the CTD1 complex, contributing to the flexibility in HCC (mCF). The overall structure of the CTD2 complex is similar to that of IntS11 and IntS9 (28). The first two helices of the symplekin CTD pack against the helices in the CTD2 complex (Fig. 4B) in the core of HCC (mCF).

The structure showed that HCC is recruited to the machinery directly by both FLASH and Lsm11 through two tethering contacts. Residues in FLASH N-terminal to the coiled-coil domain, including the LDLY motif (10), interact with the symplekin CTD (fig. S10C), and residues 107 to 118 of Lsm11 interact with CPSF73 (figs. S10B and S11D and supplementary text). These observations explain earlier data showing the importance of the LDLY motif in FLASH and residues 65 to 130 in Lsm11 for HCC recruitment (10, 11, 29).

Mutagenesis and biochemical experiments supported the structural observations (supplementary text). HCC recruitment was abolished by mutating the FLASH LDLY motif or symplekin CTD (fig. S13A). Removing the N- and C-terminal extensions of Lsm10 greatly reduced the cleavage activity without affecting U7 snRNP or machinery assembly (fig. S13, B to D). Moreover, the Lsm10 mutants showed misprocessing of the pre-mRNA. Therefore, these extensions may also play a crucial role in correctly positioning CPSF73 for the cleavage reaction. Mutating as few as two symplekin NTD residues that interact with the HDE-U7 duplex (fig. S6C) greatly reduced the cleavage activity (fig. S13E). Finally, the experiments also provided evidence for an Lsm11-FLASH-SLBP-SL quaternary complex (Fig. 1E and fig. S13F).

The structure of the machinery suggests how it may be assembled for processing (Fig. 4C, movie S2, and supplementary text) and provides a molecular foundation to understand and explain the large body of biochemical and functional data on histone pre-mRNA 3′-end processing (3, 4). The structure also has important implications for understanding canonical pre-mRNA and snRNA 3′-end processing. The binding mode of the histone pre-mRNA in CPSF73 is likely similar for canonical pre-mRNAs and snRNAs, and the active conformation of mCF for canonical pre-mRNAs is likely to be the same as that of HCC observed in this study. The comparison to the structure of mCF in an inactive state suggests that the correct architecture of this cleavage module is another critical requirement for the activation of the processing machineries.

Supplementary Materials

Materials and Methods

Supplementary Text

Figs. S1 to S13

Tables S1 to S3

References (31–73)

Movies S1 and S2

References and Notes

Acknowledgments: We thank L. Yen, D. Bobe, E. Eng, and R. Grassucci for data collection at the New York Structural Biology Center; M. Ebrahim and J. Sotiris for grids screening at the Evelyn Gruss Lipper Cryo-Electron Microscopy Resource Center at The Rockefeller University; and K. Xiang and D. Tan for initial studies for this project. Funding: This research was supported by NIH grants R35GM118093 (to L.T.) and R01GM029832 (to W.F.M. and Z.D.). W.S.A. was also supported by a fellowship from the Raymond and Beverley Sackler Center for Research at Convergence of Disciplines at Columbia University Medical Center. The Simons Electron Microscopy Center at the New York Structural Biology Center is supported by grants from the Simons Foundation (349247), NYSTAR, NIH (GM103310, S10 OD019994), and Agouron Institute (F00316). Author contributions: Y.S. produced the HCC and prepared all the samples for the EM analysis, carried out the mixing experiments, and performed model building and structure refinement. Y.Z. carried out EM data collection and analysis, EM reconstruction, and model building and refinement. W.S.A. developed the protocols for reconstituting the U7 snRNP and its complex with FLASH-SLBP-H2a*. Z.D. and X.-C.Y. carried out the cleavage assays. L.T., Z.D., T.W., and W.F.M. supervised the research and analyzed the data. L.T. wrote the paper, with substantial contributions from Z.D., Y.S., Y.Z., and W.F.M. All authors commented on the paper. Competing interests: The authors declare no competing interests. Data and materials availability: The atomic coordinates and the EM maps can be accessed in the Protein Data Bank (ID 6V4X).
