Research Article

Structure of a human catalytic step I spliceosome

Science  02 Feb 2018:
Vol. 359, Issue 6375, pp. 537-545
DOI: 10.1126/science.aar6401

Structure of the human spliceosome

Catalyzed by the spliceosome, precursor mRNA splicing proceeds in two steps: branching and exon ligation. Transition from the C (catalytic post-branching spliceosome) to the C* (catalytic pre-exon ligation spliceosome) complex is driven by the adenosine triphosphatase/helicase Prp16. Zhan et al. report the cryo-electron microscopy structure of the human C complex, showing that two step I splicing factors stabilize the active site and link it to Prp16.

Science, this issue p. 537

Abstract

Splicing by the spliceosome involves branching and exon ligation. The branching reaction leads to the formation of the catalytic step I spliceosome (C complex). Here we report the cryo–electron microscopy structure of the human C complex at an average resolution of 4.1 angstroms. Compared with the Saccharomyces cerevisiae C complex, the human complex contains 11 additional proteins. The step I splicing factors CCDC49 and CCDC94 (Cwc25 and Yju2 in S. cerevisiae, respectively) closely interact with the DEAH-family adenosine triphosphatase/helicase Prp16 and bridge the gap between Prp16 and the active-site RNA elements. These features, together with structural comparison of the human C and C* complexes, provide mechanistic insights into ribonucleoprotein remodeling and allow the proposition of a working mechanism for the C-to-C* transition.

Each cycle of precursor messenger RNA (pre-mRNA) splicing executed by the spliceosome proceeds in two sequential steps of transesterification: branching and exon ligation (1, 2). The products of the branching reaction are an intron lariat–3′ exon intermediate and a 5′ exon, both of which remain bound to the catalytic step I spliceosome (C complex). The ribonucleoprotein remodeling of the C-to-C* transition is driven by the RNA-dependent DEAH-family adenosine triphosphatase (ATPase)/helicase Prp16 (3), which is thought to bind and pull the single-stranded RNA sequences in a 3′-to-5′ direction (4).

The first atomic model of an intact spliceosome was generated in 2015 from the cryo–electron microscopy (cryo-EM) structure of the Schizosaccharomyces pombe intron lariat spliceosome (ILS) at 3.6-Å resolution (5, 6). Since 2016, a burst of structural advances has allowed atomic visualization of the Saccharomyces cerevisiae and human spliceosomes (7–20). Here we report the cryo-EM structure of the human C complex at an average resolution of 4.1 Å. Compared with the yeast C complex (10, 11), the structure of the human C complex contains 11 additional proteins, including four peptidyl prolyl isomerases (PPIs), the exon junction complex (EJC), U5-40K, RBM22, and Aquarius. The structural features and comparison with the human C* complex (19, 20) offer mechanistic insights into the transition from the C to the C* complex.

Cryo-EM analysis

The human spliceosomes were assembled using HeLa S3 nuclear extract and a synthetic pre-mRNA. After a pilot experiment (fig. S1A), we chose to use the small molecule BN82685, which is known to inhibit exon ligation (21), at a final concentration of 250 μM in the splicing assay. The spliceosomes were purified and chemically cross-linked. The spliceosome-containing fractions were pooled on the basis of RNA analysis (fig. S1B), and the sample was examined by EM under negative staining (fig. S1C) and cryogenic conditions (fig. S1D). Micrographs were collected on a Titan Krios electron microscope, yielding 1,464,033 auto-picked particles.

A subset of 157,388 particles was used to generate the initial references for the human spliceosomal complexes (fig. S2), which were applied to the entire data set. After three-dimensional classifications, 95,064 particles gave a reconstruction of the human C complex at 4.5-Å resolution (fig. S3). Further selection of 53,633 particles yielded a reconstruction at an average resolution of 4.1 Å (figs. S3 to S7 and tables S1 and S2). The local resolutions varied greatly in the human C complex (fig. S4), reaching 3.8 to 4.2 Å in the core but considerably lower in the peripheral regions (table S2). A soft mask on the Brr2 region improved the local EM map to an average resolution of 6.5 Å (figs. S3 and S7 and table S2).
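The particle counts above imply a steep selection funnel. As a minimal sketch, the fraction retained at each stage can be computed directly from the numbers quoted in the text. The stage labels and the function name are illustrative, not the authors'; the 157,388-particle subset is left out because it served to generate initial references rather than as a selection stage.

```python
# Particle counts quoted in the text; the stage labels are illustrative only.
stages = [
    ("auto-picked", 1_464_033),
    ("4.5-A reconstruction", 95_064),
    ("4.1-A reconstruction", 53_633),
]

def retained_fractions(stages):
    """Return (label, count, fraction of the first stage) for each stage."""
    total = stages[0][1]
    return [(label, count, count / total) for label, count in stages]

for label, count, frac in retained_fractions(stages):
    print(f"{label:>22}: {count:>9,} particles ({frac:.2%} of auto-picked)")
```

Under these numbers, only a few percent of the auto-picked particles contribute to the final 4.1-Å map.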

Overall structure

The refined model of the human C complex contains 15,479 amino acids from 47 proteins and 414 nucleotides from three small nuclear RNAs (snRNAs; U2, U5, and U6) and the pre-mRNA (Fig. 1A and tables S1 and S2), with a combined molecular mass of about 1.8 MDa. About two-thirds of the 15,479 amino acids have side chains, and the rest were built as Ala. The poly-Ala sequences are mostly assigned to proteins in the peripheral regions, including Prp16 and a few components (Prp19 tetramer, Spf27, Syf1, and part of Cdc5) of the nineteen complex (NTC, also known as the Prp19-CDC5L complex). The 47 proteins in the atomic model include 11 from U5 snRNP, nine from U2 snRNP, seven from the NTC, six from the NTC-related (NTR) complex, four from the EJC, five splicing factors (SRm300, Cwc22, CCDC49, CCDC94, and Prp17), four PPIs (tentatively assigned as PPIL1, CypE, PPIG, and PPWD1), and Prp16 (Fig. 1A).
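The composition stated above can be checked with a simple tally. The sketch below uses only counts given in the text; the dictionary keys are shorthand labels chosen for illustration. It confirms that the listed groups sum to 47 proteins and that the human-specific components named in the introduction sum to 11.

```python
# Protein counts per group as listed in the text (keys are shorthand labels).
protein_groups = {
    "U5 snRNP": 11,
    "U2 snRNP": 9,
    "NTC": 7,
    "NTR": 6,
    "EJC": 4,
    "splicing factors": 5,
    "PPIs": 4,
    "Prp16": 1,
}

# Components reported as additional relative to the yeast C complex.
human_specific = {
    "PPIs": 4,
    "EJC": 4,
    "U5-40K": 1,
    "RBM22": 1,
    "Aquarius": 1,
}

total_proteins = sum(protein_groups.values())
total_additional = sum(human_specific.values())
print(total_proteins, total_additional)
```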

Fig. 1 Cryo-EM structure of the human catalytic step I spliceosome (C complex) at 4.1-Å resolution.

(A) Overall structure of the human C complex. Two perpendicular views are shown. The atomic model includes 47 proteins, three snRNAs (U2, U5, and U6), a 5′ exon, and a 44-nucleotide intron lariat, with a combined molecular mass of ~1.8 MDa. Among the modeled 15,479 amino acids, about two-thirds have side chains. All structural images in the figures were created using PyMol (40). (B) An overall structural comparison of the human (left) and S. cerevisiae (right) C complexes. In both structures, the U2, U5, and U6 snRNAs are colored blue, orange, and green, respectively; the pre-mRNA is highlighted in red; and the shared protein components are in gray. Protein components that are specific to either structure are color-coded. Compared with the yeast C complex, the human C complex has 11 additional protein components. These include four peptidyl prolyl isomerases (PPIs), four proteins of the exon junction complex (EJC; eIF4AIII, MAGOH, Y14, and MLN51), U5-40K, RBM22, and Aquarius. The EM density map also allows unambiguous identification of the ATPase/helicases Prp16 and Brr2, which were absent in our own preparation of the yeast C complex (10). Notably, the human protein RBM22 appears to be derived from the yeast proteins Ecm2 and Cwc2 (22). PDB, Protein Data Bank.

Compared with the yeast C complex (10, 11), the human C complex adopts a similarly extended, highly asymmetric organization but is considerably larger, with 11 additional protein components that are scattered in discrete regions (Fig. 1B). These 11 proteins include the EJC (eIF4AIII, MAGOH, Y14, and MLN51), the U5 snRNP component U5-40K, the NTR proteins Aquarius and RBM22, and four PPIs. In contrast to the yeast C complex structure (10), Prp16 and Brr2 are unambiguously identified in the human C complex. The human protein RBM22 bears significant sequence homology to the yeast NTR proteins Cwc2 and Ecm2 (22), consistent with RBM22 being derived from these two proteins. Other than Cwc2 and Ecm2, all proteins in the structure of the yeast C complex have corresponding functional orthologs in the human C complex (Fig. 1B).

We observed previously unknown structural features for a number of the human proteins that have functional orthologs in S. cerevisiae. Most notably, we identified structural features of CCDC49 and CCDC94 that have not been recognized in their S. cerevisiae counterparts Cwc25 and Yju2. These structural elements play a crucial role in connecting the active-site RNA elements to Prp16. Because the human and yeast C complexes share nearly identical structural features in the core of the spliceosome, we focus our discussion on the previously unknown structural elements and the human-specific proteins.

The RNA map and the active site

The RNA sequences in the core regions of the human C complex display clear features in the EM density map (fig. S5). Of the 414 modeled nucleotides, 174, 84, and 97 are assigned to U2, U5, and U6 snRNA, respectively (table S2). The remaining 59 nucleotides belong to pre-mRNA, with 13 nucleotides in the 5′ exon and 46 in the intron lariat–3′ exon intermediate. The overall structure of the three snRNAs, the 5′ exon, and the intron lariat–3′ exon intermediate in the human C complex is similar to that in the yeast C complex (10, 11) (Fig. 2, A and B).

Fig. 2 The RNA elements and the active-site metals in the human C complex.

(A) Overall structure of the core RNA elements. The color-coded RNA structure of the human C complex is shown alone in the left panel and in comparison with that of the S. cerevisiae C complex (colored gray) in the right panel. In the C complex, the 5′ exon remains anchored to loop I of U5 snRNA. The nucleophile-containing adenine nucleotide from the branch point sequence (BPS) is covalently joined to the guanine nucleotide at the 5′ end of the 5′SS. The disordered RNA sequences of the intron lariat are indicated by the dotted line. (B) Coordination of the metal ions in the active site of the human C complex (left) and comparison with that in the S. cerevisiae C complex (right). All structural elements of the S. cerevisiae C complex are shown in gray. Both the location and the coordination of the three structural metals (magenta) in the active site of the human C complex are nearly identical to those in the S. cerevisiae C complex. However, despite a similar location for the catalytic M1 metal (red) in the human C complex, its coordination is different than in the yeast C complex. In addition, compared with the yeast C complex, the catalytic M2 in the human complex is translocated by about 3 Å. (C) A close-up view of the coordination of the M1 metal. Owing to a positional shift of the ribose of the guanine nucleotide (G-1) at the 3′ end of the 5′ exon, the M1 metal is coordinated by the 3′-OH of G-1 only in the S. cerevisiae C complex, not in the human C complex.

The active site of the human C complex comprises the intramolecular stem loop (ISL) of U6 snRNA, the catalytic triplex between U2 and U6, loop I of U5 snRNA, and five metal ions. Three of the five metals, presumably Mg2+, appear to stabilize the ISL by neutralizing the negative charges of the RNA backbone phosphates (Fig. 2B, left panel). All three metals are identically coordinated in the yeast C complex (10, 11) (Fig. 2B, right panel).

Two catalytic metal ions known as M1 and M2 are thought to play indispensable but reciprocal roles in the branching reaction and exon ligation (23, 24). Compared with the yeast C complex (10), the position of M2 is shifted by ~3 Å in the human C complex (Fig. 2B, right panel). The coordination of M1 in the human C complex differs from that in the yeast complex, where M1 is specifically bound by the 3′-OH at the 3′ end of the 5′ exon (10, 11). In the human C complex, the ribose at the 3′ end of the 5′ exon is flipped by ~10 Å away from M1 (Fig. 2C). Consequently, M1 is coordinated by only three ligands: the phosphate of the nucleophile-containing adenine nucleotide in the branch point sequence (BPS) and the phosphates of G72 and U74 of U6 snRNA (Fig. 2B).

The step I splicing factors and Prp16

As in the yeast C complex (10, 11), the conformations of the active-site RNA elements in the human C complex are sustained by 15 surrounding protein components, particularly CCDC49 (Cwc25 in S. cerevisiae) and CCDC94 (Yju2 in S. cerevisiae), the NTC components Isy1 and the N-terminal fragment of Cdc5, and the ribonuclease H (RNaseH)–like domain of Prp8 (Fig. 3, A and B, and fig. S6). The ATPase/helicases Brr2 and Prp16 in the peripheral regions are connected to the core of the spliceosome mainly through CCDC49 and CCDC94.

Fig. 3 Structure and functional implications of the step I splicing factor CCDC49 in the human C complex.

(A) An overall view of the protein components (color-coded) surrounding the active site. The ATPase/helicase Prp16 is placed between Brr2 and the core of the spliceosome. The step I splicing factors CCDC49 and CCDC94 correspond to the yeast proteins Cwc25 and Yju2, respectively. (B) A surface view of the key proteins that stabilize the active-site conformation. The N termini of both CCDC49 and CCDC94 insert deeply into the active site to interact with the RNA elements. The RNaseH-like domain of Prp8, the NTC component Isy1, and the N-terminal portion of Cdc5 also stabilize the RNA elements at the active site. (C) Domain structure of CCDC49 and its sequence alignment with Cwf25 (S. pombe) and Cwc25 (S. cerevisiae). Three invariant Trp residues of CCDC49—Trp12 in the N-terminal plug, Trp24 in the N-helix, and Trp72 in the C-terminal hook—interact with CCDC94, the 1585-loop of Prp8, and the RNaseH-like domain of Prp8, respectively. (D) A close-up view of CCDC49. The three invariant Trp residues are indicated. (E) A close-up view of the N-terminal plug of CCDC49 and its interactions with surrounding components. The N-terminal Gly2-Gly3-Gly4 motif is inserted deeply between the U2/BPS duplex and the U2/U6 duplex at the active site. Trp12 of CCDC49 interacts with hydrophobic amino acids from CCDC94. (F) A close-up view of the N-terminal portion of the N-helix. Trp24 of CCDC49 directly contacts the 1585-loop of Prp8. (G) A close-up view of the C-terminal hook of CCDC49. The EM density map for this region is shown. The C-terminal portion of the N-helix interacts with Prp16, whereas the hook binds to the RNaseH-like domain of Prp8, with Trp72 of CCDC49 playing a critical role. Single-letter abbreviations for the amino acid residues are as follows: A, Ala; C, Cys; D, Asp; E, Glu; F, Phe; G, Gly; H, His; I, Ile; K, Lys; L, Leu; M, Met; N, Asn; P, Pro; Q, Gln; R, Arg; S, Ser; T, Thr; V, Val; W, Trp; and Y, Tyr.

The structurally identified fragment of CCDC49 comprises an extended α-helix (N-helix; residues 17 to 61), which is capped by an N-terminal plug (residues 2 to 16) and a C-terminal hook (residues 62 to 80) (Fig. 3C). Three Trp residues (Trp12, Trp24, and Trp72) are invariant among the yeast orthologs. The N-terminal plug is placed into the active site (Fig. 3, B and D), with a conserved Gly-rich motif (Gly2-Gly3-Gly4 in CCDC49) penetrating a small crevice generated by the U2/BPS duplex and helix Ia of the U2/U6 duplex (Fig. 3E and fig. S6). Residues 5 to 16 of the N-terminal plug interact with the major groove of the U2/BPS duplex and the zinc-binding domain of CCDC94. Trp12 stacks against Phe76 of CCDC94. The N- and C-terminal ends of the N-helix interact with the 1585-loop of Prp8 and Prp16, respectively (Fig. 3, D and F). Trp24 directly contacts the tip of the 1585-loop. The hook associates with the RNaseH-like domain (Fig. 3G). Notably, the interactions of CCDC49 with Prp16 and the RNaseH-like domain are unknown in the S. cerevisiae C complex owing to the lack of the C-terminal hook and half of the N-helix in the structurally identified portion of Cwc25 (10).
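The segment boundaries given above can be written down as a small lookup, which makes it easy to confirm where each invariant Trp residue falls. This is only an illustrative sketch: the residue ranges come from the text, whereas the names `CCDC49_SEGMENTS` and `segment_of` are hypothetical.

```python
from typing import Optional

# Inclusive residue ranges of the modeled CCDC49 fragment, as stated in the text.
CCDC49_SEGMENTS = {
    "N-terminal plug": (2, 16),
    "N-helix": (17, 61),
    "C-terminal hook": (62, 80),
}

def segment_of(residue: int) -> Optional[str]:
    """Return the segment containing the residue, or None if it lies outside 2-80."""
    for name, (start, end) in CCDC49_SEGMENTS.items():
        if start <= residue <= end:
            return name
    return None

# The three invariant Trp residues named in the text.
for trp in (12, 24, 72):
    print(f"Trp{trp}: {segment_of(trp)}")
```

The mapping reproduces the assignment in Fig. 3C: Trp12 in the plug, Trp24 in the N-helix, and Trp72 in the hook.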

Although both CCDC94 and CCDC130 share sequence homology with the S. cerevisiae Yju2 (Fig. 4A), the fine features of the EM density map are only consistent with CCDC94, not CCDC130. The structure of CCDC94 consists of an extended N terminus, a Cys4-type zinc-binding motif, and three α-helices (H1, H2, and H3) at the C terminus (Fig. 4B). As in CCDC49, the N terminus of CCDC94 is inserted deeply into the active site, interacting with the 5′ exon, the U2/BPS duplex, Isy1, and Prp8 (Fig. 4C and fig. S6). The zinc-binding motif contacts the ISL, the U2/BPS duplex, the N-helix of CCDC49, and the N-terminal portion of Cdc5 (Fig. 4D and fig. S6). The α-helices H1, H2, and H3 interact with Prp16, Prp8, the N-terminal portion of Cdc5, and the C-terminal HAT repeats of Syf1 (Fig. 4E). Notably, the interactions involving the CCDC94 helices H1, H2, and H3 have not been characterized in the S. cerevisiae C complex because these helices remain structurally unidentified in Yju2 (10, 11).

Fig. 4 The step I splicing factor CCDC94 and the ATPase/helicase Prp16 in the human C complex.

(A) Sequence alignment involving the N-terminal fragment and the zinc-binding domain of CCDC94. The corresponding sequences from CCDC94 and CCDC130 (human), Cwf16 and Saf4 (S. pombe), and Yju2 (S. cerevisiae) are aligned. Compared with CCDC130 and Saf4, CCDC94 and Cwf16 share a higher degree of sequence similarity with Yju2. (B) The structure of CCDC94 is shown below its schematic domain representation. CCDC94 sequentially consists of an N-terminal fragment, a Cys4-type zinc-binding motif (ZF), and three α-helices known as H1, H2, and H3. The protein and RNA components that interact with these domains are indicated below the schematic domain representation. (C) The N terminus of CCDC94 inserts deeply into the active site, making direct interactions with Isy1, Prp8, the U2/BPS duplex, and the 5′SS. (D) The zinc-binding motif binds the N terminus of CCDC49, U6 snRNA, and the U2/BPS duplex. (E) The helix H2 interacts with Prp16, Prp8, and the N-terminal domain of Cdc5, whereas the helix H3 binds to the N-terminal domain of Cdc5 and the C-terminal region of Syf1. (F) Prp16 is connected to the active-site RNA elements through its interactions with the two step I factors CCDC49 and CCDC94. Two perpendicular views are shown. Prp16 is thought to pull the 3′-exon sequences in a 3′-to-5′ direction, dissociating CCDC49 and CCDC94 and eventually forming the step II activated spliceosome (the C* complex). The red dotted line indicates the direction of the RNA sequence at the 3′ end of the intron lariat.

The elongated helices of CCDC49 and CCDC94 directly associate with Prp16 (Fig. 4F). Specifically, the N-helix of CCDC49 binds the RecA2 domain of Prp16, whereas the helices H2 and H3 of CCDC94 interact with the OB and Ratchet-like domains. Prp16, in turn, interacts with the Jab1/MPN domain of Prp8 and the N-terminal plug (residues 107 to 179) and the PWI-like domain (residues 1030 to 1181) of Brr2 (fig. S7A). The Jab1/MPN domain also stably associates with Brr2 (25). The RNA-binding site of Prp16 remains unoccupied (fig. S7, B and C), consistent with the structural comparison between Prp16 of the human C complex and Prp43 bound to RNA (26) (fig. S7D). Prp16 is thought to remodel the C complex by pulling the 3′-end sequences of the intron lariat–3′ exon intermediate (3, 4, 27, 28). In the human C complex, the putative RNA-binding site of Prp16 is located ~60 Å away from the last ordered nucleotide at the 3′ end of the intron lariat–3′ exon intermediate (Fig. 4F). Nine or 10 RNA nucleotides are minimally required to cover the distance, consistent with the previous observation that shortening of the 3′ exon led to the inhibition of exon ligation (4, 29).
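The estimate of nine or 10 nucleotides follows from dividing the ~60-Å gap by the spacing between adjacent nucleotides in an extended single strand. The sketch below makes that arithmetic explicit. The per-nucleotide spacing of 6 to 7 Å is an assumption introduced here for illustration (a commonly used range for extended single-stranded RNA); the article itself does not state the value used.

```python
import math

def min_nucleotides(distance_A: float, rise_per_nt_A: float) -> int:
    """Smallest number of nucleotides whose summed spacing covers the distance."""
    return math.ceil(distance_A / rise_per_nt_A)

# ~60 A is taken from the text; the 6-7 A spacing per nucleotide is an
# assumed range for extended single-stranded RNA, not a value from the article.
for rise in (7.0, 6.0):
    print(f"{rise} A per nt -> {min_nucleotides(60.0, rise)} nucleotides")
```

With these assumed spacings the lower bound spans the same nine-to-10 range quoted above; a more compact RNA conformation would require more nucleotides.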

PPIs

PPIs are absent in the structure of the S. cerevisiae C complex (10, 11). Eight PPIs are known to be present in the human spliceosome (30) (fig. S8 and table S3), of which four have been identified in our structure of the human C complex. CypE contains an RNA recognition motif (RRM; residues 6 to 99) and a canonical PPI domain (residues 108 to 266) (fig. S8). In the human C complex, both the RRM and the PPI of CypE associate with the intron lariat (Fig. 5A). The RRM also binds the U2 snRNP component U2-A′ (Lea1 in yeast) and the superhelical protein Syf1; the PPI interacts with U2-A′, Isy1, and the heptameric U2 Sm ring. The C-terminal sequences of SmB contain a conserved motif GLXGPVRGVGGP (where X is any amino acid residue) (Fig. 5A), which is spatially close to the PPI domain of CypE and may serve as a substrate.

Fig. 5 The location and putative mechanism of the PPIs in the human C complex.

(A) CypE associates with the intron lariat through both the N-terminal RNA recognition motif (RRM) domain and the C-terminal PPI domain. The PPI domain of CypE may catalyze proline isomerization in the C-terminal fragment of the SmB protein. Sequence alignment of SmB from five organisms is shown below the structure. Conserved residues are boxed, and invariant residues are shaded red. (B) PPIL1 associates with SKIP, Prp17, and RBM22. Sequence alignment of Prp17 from five organisms is shown below the structure. Prp17 may serve as the substrate for PPIL1. (C) A PPI domain associates with the RNaseH-like domain of Prp8. Sequence alignment of CCDC49 from five organisms is shown below the structure. This PPI might come from PPWD1. (D) A PPI domain interacts with the RT Finger/Palm domain of Prp8. The C-terminal sequences of Ad-002 may serve as the substrate for this PPI, which is tentatively assigned to PPIG. Green stars indicate the potential substrate residues of the PPIs. The red and purple dotted lines indicate the directions of the RNA sequences.

PPIL1 (Cyp1 in S. pombe) closely interacts with the extended sequences of both SKIP (Prp45 in yeast) (31, 32) and the step II factor Prp17 (Fig. 5B). PPIL1 also binds both the N- and C-terminal domains of RBM22. The PPIL1-bound sequences of Prp17 contain the motif NPX7PXXGP, which is likely the substrate. The remaining two PPIs associate with the RNaseH-like domain or the RT Finger/Palm domain of Prp8. One PPI is located close to the C-terminal sequences of CCDC49, which harbor the sequence motif FAPX15DP but lack a conserved Gly residue, which is usually required for substrate specificity (33, 34) (Fig. 5C). The other PPI may catalyze proline isomerization of Ad-002 (Cwc15 in yeast) (Fig. 5D). These two PPIs may be PPWD1 and PPIG, although the current EM density map does not allow conclusive assignment.
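The candidate substrate motifs are written in a compact notation in which X is any residue and Xn denotes n consecutive arbitrary residues. As a hedged sketch of how such motifs could be searched for in a protein sequence, the notation can be translated into regular expressions. The helper name `motif_to_regex` and the test string are hypothetical; the test string is made up and is not a real Prp17 sequence.

```python
import re

# Motifs quoted in the text, keyed by the protein in which they were noted.
MOTIFS = {
    "SmB": "GLXGPVRGVGGP",
    "Prp17": "NPX7PXXGP",
    "CCDC49": "FAPX15DP",
}

def motif_to_regex(motif: str) -> re.Pattern:
    """Translate X / Xn motif notation into a compiled regular expression."""
    pattern = re.sub(
        r"X(\d*)",
        # A bare X becomes "."; Xn becomes ".{n}".
        lambda m: "." if not m.group(1) else f".{{{m.group(1)}}}",
        motif,
    )
    return re.compile(pattern)

for protein, motif in MOTIFS.items():
    print(protein, motif, "->", motif_to_regex(motif).pattern)

# Made-up sequence that satisfies the Prp17-type motif (illustration only).
synthetic = "MKNP" + "A" * 7 + "PAAGPK"
print(bool(motif_to_regex(MOTIFS["Prp17"]).search(synthetic)))
```

A match found this way only flags a sequence that fits the pattern; it does not indicate that the site is an actual PPI substrate.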

All four PPIs bind the extended sequence elements of the protein components in the human spliceosome. The switch of the proline cis-trans configuration usually results in alteration of the local structure, which is known to trigger signaling and affect a range of important cellular processes (34, 35). These PPIs are thought to join spliceosomal complexes at distinct stages of the splicing reaction to regulate pre-mRNA splicing (30, 36). The structural changes caused by proline isomerization likely lead to alteration of the interactions between the substrate protein and surrounding components of the spliceosome.

Structural comparison with the human C* complex

The structural resolution of the human C complex allows comparison with the human C* complex (19, 20) and visualization of the compositional and conformational changes. These changes fall into two main categories.

First, the protein components at the peripheral regions on one side of the spliceosome have undergone major reshuffling, exemplified by the large-scale movement of the U2 snRNP (Fig. 6A, left panels). The overall translocation of the U2 snRNP exceeds 100 Å (fig. S9 and movies S1 and S2). The RRM domain of CypE binds the intron sequences upstream of the BPS in the human C complex but interacts with helix IIa/b of U2 snRNA in the human C* complex (19) (fig. S9). In contrast, the RRM domain binds Syf1 identically in both the C and C* complexes. Such changes may be facilitated by proline isomerization in the U2 snRNP component SmB. During the C-to-C* transition, the protein PRKRIP1 is recruited into the C* complex to stabilize the new position of the U2 snRNP. Brr2 is translocated by about 150 Å in the C-to-C* transition (Fig. 6A, middle panels, and fig. S10). Prp16 is dissociated in the transition, and Prp22 is recruited into the C* complex (19).

Fig. 6 A working model of the transition of the human spliceosome from the C complex to the C* complex.

(A) Structural comparison of the human C and C* complexes. Three surface views of the human C and C* complexes are shown in the upper and lower panels, respectively. The protein components that are specific to either complex or undergo marked positional movement are color-coded. (B) A working model of the transition from the C to the C* complex. An overall structural comparison between the human C and C* complexes is shown in the upper panels. The protein components that remain unchanged in the transition are removed to better display the key players. A cartoon diagram of the working model is shown in the lower panels. In this model, the transition is triggered by Prp16, which, through ATP binding and hydrolysis, pulls the 3′-end single-stranded sequences of the intron lariat–3′ exon intermediate. This leads to the dissociation of the step I factors CCDC49 and CCDC94 and recruitment of the step II factors Prp18 and Slu7 and the protein PRKRIP1. Prp16 is dissociated, presumably owing to the loss of interactions with the step I factors, and Prp22 is recruited into the C* complex.

Second, the protein components at or near the active site exhibit major changes. During the transition, CCDC49 and CCDC94, along with Isy1, are dissociated (Fig. 6A, right panels). The step II splicing factor Slu7 is recruited into the C* complex; Prp17 and the RNaseH-like domain of Prp8 undergo major positional changes (movie S3). During the C-to-C* transition, the WD40 domain of Prp17 is translocated by ~70 Å, the RNaseH-like domain of Prp8 is rotated by about 70°, and the β-finger is inserted into the gap between the U6/5′ splice site (5′SS) duplex and the U2/BPS duplex (fig. S10).

Discussion

Our structural analysis enables us to posit a working model for the C-to-C* transition (Fig. 6B). In the C complex, the active-site RNA elements are stabilized by CCDC49 and CCDC94 together with Isy1, Prp17, and the RNaseH-like domain of Prp8. The ATPase/helicase Prp16 is framed adjacent to the active site mainly through interactions with the two step I factors, and Brr2 binds to the other side of Prp16 from the periphery of the spliceosome. Given their direct interactions with the U2/BPS duplex at the active site, CCDC49 and CCDC94 may be immediately dissociated by the action of Prp16 (27). This is likely followed by the dissociation of Prp16 because it is bound to the spliceosome mainly through the two step I factors. These changes allow the recruitment of Slu7, PRKRIP1, and the ATPase/helicase Prp22, ultimately forming the C* complex.

The step I factors CCDC49 and CCDC94 stabilize the active site RNA conformation to prime the branching reaction. This structural finding is fully consistent with biochemical and genetic observations (37–39). The structure of the human C complex also reveals notable features of CCDC49 and CCDC94 that were not previously known from the structure of the S. cerevisiae C complex. These structural features (the elongated N-helix and the hook in CCDC49, and the three α-helices H1, H2, and H3 in CCDC94) directly interact with the ATPase/helicase Prp16 and the central protein Prp8 (Figs. 3 and 4), linking Prp16 to the active-site RNA elements.

Structural analysis suggests a key role for the 1585-loop in the uploading of the 3′ exon into the active site (fig. S11). In the human C-to-C* transition, the location of the 1585-loop flips around the sequence element at the 3′ side of the lariat junction. In the yeast P complex (14–16), the 1585-loop is located in a position that resembles that in the human C* complex (19) but is sandwiched by the 3′SS of the intron lariat and the ligated exon (fig. S11). During these processes, the conformation of the 1585-loop undergoes dramatic changes.

In this study, the splicing inhibitor BN82685 was used to enrich the human C complex. The EM density map in the core of the spliceosome does not support the presence of the inhibitor. Although direct binding of BN82685 cannot be ruled out, a more likely scenario is that the inhibitor regulates the kinase or phosphatase activity that is essential for pre-mRNA splicing (21). The structure of the human C complex shows that it contains 11 additional proteins compared with the S. cerevisiae C complex (10, 11). The majority of these additional proteins in the human spliceosome play a regulatory role and are a testimony to the more sophisticated nature of pre-mRNA splicing in mammals.

Supplementary Materials

www.sciencemag.org/content/359/6375/537/suppl/DC1

Materials and Methods

Figs. S1 to S11

Tables S1 to S3

References (41–67)

Movies S1 to S3

References and Notes

Acknowledgments: We thank L. Zu for synthesizing the compound BN82685, the Tsinghua University Branch of the China National Center for Protein Sciences (Beijing) for the cryo-EM facility, and the Bio Computing Platform for computational support. This work was supported by funds from the National Natural Science Foundation of China (31621092 and 31430020) and the Ministry of Science and Technology (2016YFA0501100 to J.L.). The atomic coordinates have been deposited in the Protein Data Bank with the accession code 5YZG, and the EM maps have been deposited in the Electron Microscopy Data Bank with the accession code EMD-6864.