Research Article

Structure of a yeast catalytic step I spliceosome at 3.4 Å resolution


Science  26 Aug 2016:
Vol. 353, Issue 6302, pp. 895-904
DOI: 10.1126/science.aag2235

How spliceosomes make the first cut

In eukaryotes, transcribed precursor mRNA includes noncoding sequences that must be spliced out. This is done by the spliceosome, a dynamic complex in which five small nuclear RNAs and several proteins go through a series of ordered interactions and conformational rearrangements to achieve splicing. Two protein structures provide a look at the first catalytic step in the pathway. Yan et al. report the structure of the activated spliceosome (the Bact complex) at 3.5 Å resolution, revealing how latency is maintained even though the complex is mostly primed for catalysis. Wan et al. report the structure of the catalytic step 1 spliceosome (the C complex) at 3.4 Å resolution; this complex forms after the first step of the splicing reaction.

Science, this issue pp. 904 and 895

Abstract

Each cycle of pre–messenger RNA splicing, carried out by the spliceosome, comprises two sequential transesterification reactions, which result in the removal of an intron and the joining of two exons. Here we report an atomic structure of a catalytic step I spliceosome (known as the C complex) from Saccharomyces cerevisiae, as determined by cryo–electron microscopy at an average resolution of 3.4 angstroms. In the structure, the 2′-OH of the invariant adenine nucleotide in the branch point sequence (BPS) is covalently joined to the phosphate at the 5′ end of the 5′ splice site (5′SS), forming an intron lariat. The freed 5′ exon remains anchored to loop I of U5 small nuclear RNA (snRNA), and the 5′SS and BPS of the intron form duplexes with conserved U6 and U2 snRNA sequences, respectively. Specific placement of these RNA elements at the catalytic cavity of Prp8 is stabilized by 15 protein components, including Snu114 and the splicing factors Cwc21, Cwc22, Cwc25, and Yju2. These features, representing the conformation of the spliceosome after the first-step reaction, predict structural changes that are needed for the execution of the second-step transesterification reaction.

Each cycle of pre-mRNA splicing results in the removal of an intron and the joining of two exons, through two sequential, SN2-type transesterification reactions (1–3). In the first-step reaction, the 2′-OH of an invariant adenine nucleotide in the branch point sequence (BPS) of an intron serves as a nucleophile to attack the phosphorus atom of the guanine nucleotide at the 5′ end of the 5′ splice site (5′SS), forming an intron lariat–3′-exon intermediate and freeing the 5′ exon. In the second-step reaction, the 3′-OH of the RNA nucleotide at the 3′ end of the 5′ exon serves as a nucleophile to attack the phosphorus atom of the nucleotide at the 5′ end of the 3′ exon, joining the two exons and releasing the intron lariat (2). These two reactions are executed by a highly dynamic spliceosome that assumes at least six distinct states known as the B, Bact, B*, C, P, and ILS complexes (4).

The precatalytic spliceosome (B complex) contains all five small nuclear ribonucleoprotein particles (snRNPs): U1, U2, U4, U5, and U6. Dissociation of U1 and U4 snRNPs and recruitment of the nineteen complex (NTC) and NTC-related complex (NTR) trigger formation of the activated spliceosome (Bact complex). The Bact complex is converted to the catalytically activated spliceosome (B* complex), which executes the first-step reaction. The catalytic step I spliceosome, also known as the C complex, catalyzes the second-step transesterification with the help of a few splicing factors (5). The resulting P complex contains an intron lariat and a ligated exon, which is released in the ILS complex.

The spliceosome is a metalloribozyme (6–8), and conserved nucleotides in the intramolecular stem loop (ISL) of U6 snRNA coordinate the catalytic magnesium (Mg2+) ions (1, 9–12). During the first-step reaction, nucleotides at the 3′ end of the 5′ exon are anchored by loop I of U5 snRNA, whereas the 5′SS and BPS are recognized by U6 and U2 snRNA, respectively. The splicing active site, located in a catalytic cavity on the central spliceosomal component Prp8 (13), comprises the ISL of U6 snRNA, helix I of the U2/U6 duplex, loop I of U5 snRNA, and at least two Mg2+ ions (11, 12).

Structures of individual spliceosomal components have been elucidated, primarily through x-ray crystallography (14–21). Investigations of the intact spliceosome, which is known for its conformational and compositional variability (3, 22), have relied on electron microscopy (EM). Structures of various spliceosomal complexes over a wide range of resolution limits have been obtained (11, 12, 23–42). The 3.6 Å structure of the ILS complex from Schizosaccharomyces pombe unveils a detailed arrangement of U2, U5, and U6 snRNAs and specific interactions at the active site (11, 12). More recently, the 3.5 Å structure of the Bact complex from Saccharomyces cerevisiae reveals how catalytic latency is maintained by the protein components surrounding the active site (43). Here we report the 3.4 Å structure of a spliceosomal C complex, which reveals the inner workings of the RNA elements together with their protein cofactors after the first-step transesterification reaction.

Cryo-EM analysis

Using the NTC component Cef1 as an affinity-tagged protein, we purified a mixture of different spliceosomal complexes and used two-dimensional (2D) and 3D classifications to separate these distinct structural entities (43). Among the original set of 761,767 particles, 84,486 were used for reconstruction of the activated Bact complex at 3.52 Å resolution (43). The strategy of applying multiple simultaneous 3D classifications and merging all relevant classes proved to be important for the maximal inclusion of particles that represent the Bact complex. Starting from the same set of 761,767 particles, we applied the same strategy to identify those that represent the C complex (figs. S1 and S2A). After two rounds of 3D classification, 161,066 particles yielded a reconstruction at an average resolution of 3.95 Å, which, through particle polishing and autorefinement, was improved to 3.41 Å on the basis of the gold-standard Fourier shell correlation criteria (fig. S2B and tables S1 to S4).
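As a rough illustration of the particle bookkeeping described above, the following minimal sketch computes what share of the original particle set went into each reconstruction. Only the three particle counts come from the text; the function name and variable names are hypothetical, and the text does not say whether the two subsets overlap, so the percentages are simply each subset relative to the starting pool.

```python
# Particle counts quoted in the text; everything else is illustrative.
TOTAL_PARTICLES = 761_767   # original particle set
BACT_PARTICLES = 84_486     # used for the Bact reconstruction (3.52 Å)
C_PARTICLES = 161_066       # used for the C complex reconstruction (3.41 Å)


def percent_of_total(subset: int, total: int) -> float:
    """Return the subset size as a percentage of the total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * subset / total


bact_pct = percent_of_total(BACT_PARTICLES, TOTAL_PARTICLES)
c_pct = percent_of_total(C_PARTICLES, TOTAL_PARTICLES)
print(f"Bact complex: {bact_pct:.1f}% of particles")
print(f"C complex:    {c_pct:.1f}% of particles")
```

Under these numbers, roughly one particle in nine contributed to the Bact map and roughly one in five to the C complex map.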

The local resolutions vary greatly in the C complex (fig. S2C). The actual resolution in the central regions of the spliceosome reaches 2.9 to 3.5 Å, allowing atomic modeling. At the periphery, however, the EM density becomes contiguous only after being low-pass filtered to 10 Å (fig. S2D). To facilitate model building in the peripheral regions, we performed two more rounds of 3D classification, focusing on only the class that displays structural features in these regions (fig. S3). This effort generated two distinct reconstructions at 3.65 and 4.6 Å (fig. S4 and tables S1 to S4). In the central regions of the 3.41 Å density maps, most secondary structural elements are well defined, and a large proportion of amino acid side chains are identifiable (figs. S5 to S7). The RNA elements at the catalytic center and the surrounding protein components are marked by distinguishable features in the density maps (figs. S8 to S11), allowing atomic modeling of RNA nucleotides.

Overall structure

The refined model of the C complex from S. cerevisiae contains 8587 amino acids from 35 proteins, 377 nucleotides from three snRNAs, and 57 nucleotides from two pieces of the pre-mRNA molecule (Fig. 1A and tables S1 to S4), with a combined molecular mass of ~1.1 MDa. Among the modeled amino acids, 5040 have side chains and the remaining 3547 residues were built as poly-Ala. The poly-Ala sequences are mostly assigned to the NTC core (Prp19 tetramer, Snt309, and part of Cef1), U5 Sm ring, and the U2 snRNP components (Msl1, Lea1, and U2 Sm ring). The 35 protein components in the atomic model include nine from U5 snRNP, nine from U2 snRNP, seven from NTC, six from NTR, and four splicing factors (Cwc21, Cwc22, Cwc25, and Yju2). Notably, the adenosine triphosphatase (ATPase)/helicase Brr2 displays no discernable EM density, likely reflecting its dynamic nature in the C complex.
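The composition figures in this paragraph can be cross-checked with simple arithmetic. The sketch below is only a consistency check on the numbers quoted in the text; the dictionary layout and variable names are assumptions made for illustration.

```python
# Protein counts per group, as listed in the text.
protein_groups = {
    "U5 snRNP": 9,
    "U2 snRNP": 9,
    "NTC": 7,
    "NTR": 6,
    "splicing factors": 4,  # Cwc21, Cwc22, Cwc25, Yju2
}
total_proteins = sum(protein_groups.values())

# Residue counts quoted in the text.
modeled_residues = 8587
residues_with_side_chains = 5040
poly_ala_residues = modeled_residues - residues_with_side_chains

# Pre-mRNA pieces: free 5' exon plus intron lariat.
pre_mrna_nucleotides = 13 + 44

print(total_proteins, poly_ala_residues, pre_mrna_nucleotides)
```

The group counts sum to the 35 proteins stated, the poly-Ala remainder matches the 3547 residues stated, and the two pre-mRNA pieces sum to 57 nucleotides.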

Fig. 1 Cryo-EM structure of the S. cerevisiae catalytic step I spliceosome (C complex) at 3.4 Å resolution.

(A) EM density map of the C complex at an average resolution of 3.4 Å shown in two perpendicular views. The color-coded protein and RNA components are tabulated on the right. (B) Structure of the C complex. The cartoon representation shown in two views includes 35 proteins, three snRNAs, a free 13-nucleotide (nt) 5′ exon, and a 44-nt intron lariat, with a combined molecular weight of ~1.1 MDa. Among the modeled 8587 amino acids, 5040 have side chains. Figure 1A was prepared using CHIMERA (72). All other structural images were created using PyMol (73).

The C complex has an extended, triangular appearance, with the NTC core located on the back of the assembly (Fig. 1B). The three corners of the assembly are separated from one another by ~300 Å (fig. S12A). The center and bottom side of the triangular-shaped C complex are constituted by Prp8 and Snu114 of U5 snRNP, portions of the three snRNAs, pre-mRNA, at least 10 protein components from NTC and NTR, and four splicing factors (Fig. 1B). The splicing active site, located in the center of the C complex, and its adjacent RNA elements are sandwiched by two layers of protein components. One layer consists of the anchoring component Prp8, the guanosine triphosphatase Snu114, the NTR protein Bud31, and two splicing factors (Cwc21 and Cwc22) (fig. S12B); the other layer comprises four NTC components (Cef1, Isy1, Syf2, and a portion of Clf1), six NTR proteins (Bud31, Cwc2, Cwc15, Ecm2, Prp45, and Prp46), and the splicing factors Cwc25 and Yju2. Intermolecular interactions within these two layers are stabilized by the intrinsically disordered protein Prp45 and, to a lesser extent, by Cwc15.

The RNA map

The RNA elements display clear features in the density maps (fig. S8A). We assigned nucleotides 28 to 55 and 60 to 127 of U5 snRNA (fig. S8, B to D). In addition, 20 contiguous nucleotides at the 3′ end of U5 snRNA were docked into the density maps along with the heptameric Sm ring. Excluding nine nucleotides at the 3′ end, nucleotides 1 to 103 of U6 snRNA were identified in the maps (fig. S9). U2 snRNA in S. cerevisiae contains 1175 nucleotides, most of which are dispensable for pre-mRNA splicing (44). Virtually all functionally important sequences of U2 snRNA are visible in the maps, including 48 nucleotides at the 5′ end, which are responsible for forming duplexes with U6 snRNA (helix I and helix II) and with the BPS. The modeled U2 snRNA also includes the stem loop sequences and the binding sites for the U2 Sm ring and Msl1. Fifty-seven nucleotides are assigned to pre-mRNA, which consists of a free 5′ exon (13 nucleotides) and an intron lariat (44 nucleotides) (figs. S9C and S10).

The three snRNA elements are organized into an extended structure, and the pre-mRNA molecule is placed at the center of the RNA map through extensive base-pairing interactions with the snRNAs (Fig. 2A). Consistent with published evidence (45–48), the freed 5′ exon remains bound to loop I of U5 snRNA and is located close to the T-shaped junction of the intron lariat. Five contiguous nucleotides (UGUAU) of the intron, including three at the 3′ half of the 5′SS, are recognized by the U6 snRNA sequences (AUACA) through Watson-Crick base-pairing interactions (Fig. 2B). Nineteen contiguous nucleotides of the intron, including the BPS (UACUAAC), form a duplex with 17 conserved nucleotides of U2 snRNA, producing two single-nucleotide bulges in the pre-mRNA. One bulge is formed by the nucleophile-containing adenine nucleotide in the BPS. The base-pairing interactions between U2 and U6 snRNAs are consistent with published observations (49, 50) and are nearly identical to those observed previously in the S. cerevisiae Bact complex (43) or the S. pombe ILS complex (11).
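The Watson-Crick pairing between the intron sequence UGUAU and the U6 sequence AUACA can be illustrated with a short check. This sketch assumes both sequences are written 5′ to 3′ and that the strands pair in the usual antiparallel orientation; the helper name is hypothetical, and only canonical A-U and G-C pairs are considered.

```python
# Canonical Watson-Crick partners for RNA bases.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the antiparallel Watson-Crick partner of an RNA sequence (5'->3')."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq.upper()))


intron_segment = "UGUAU"  # intron nucleotides quoted in the text
u6_segment = "AUACA"      # U6 snRNA nucleotides quoted in the text

fully_paired = reverse_complement(intron_segment) == u6_segment
print(fully_paired)

# Bulge count in the intron-U2 duplex: 19 intron nucleotides vs 17 U2 nucleotides.
bulged_nucleotides = 19 - 17
```

Under these assumptions the two pentamers are exact reverse complements, and the difference of two nucleotides in the intron-U2 duplex corresponds to the two single-nucleotide bulges described.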

Fig. 2 Arrangement of the RNA elements in the S. cerevisiae C complex.

(A) Overall cartoon representation of the RNA map displayed in two perpendicular views. The catalytic center comprises the ISL of U6 snRNA, helix I of the U2/U6 duplex, loop I of U5 snRNA, and the Mg2+ ions. After the first-step reaction, the freed 5′ exon remains anchored to loop I. The invariant adenine nucleotide from the BPS is covalently linked to the guanine nucleotide at the 5′ end of the 5′SS. The disordered RNA sequences are indicated by dotted lines. (B) Overall base-pairing interactions among U2 snRNA, U5 snRNA, U6 snRNA, the 5′ exon, and the intron lariat. Canonical Watson-Crick and noncanonical base-pairing interactions are identified by solid lines and dots, respectively.

The active site

The active site of the C complex comprises the ISL of U6 snRNA, helix I of the U2/U6 duplex, loop I of U5 snRNA, and five metal ions that are probably magnesium (Mg2+) (Fig. 3A). Three consecutive nucleotides at the 3′ end of the 5′ exon, modeled as AAG, form a short duplex with the nucleotides U96-U97-U98 in loop I of U5 snRNA. The phosphate group of the guanine nucleotide at the 5′ end of the 5′SS is already covalently bonded to the nucleophile (the 2′-oxygen atom of the invariant adenine nucleotide in the BPS). Three of the five putative Mg2+ ions are located away from the reaction center and likely stabilize the delicate fold of ISL by neutralizing the negative charges of the RNA backbone phosphates.

Fig. 3 Catalytic center and active site of the S. cerevisiae C complex.

(A) Structure of the catalytic center is shown in two perpendicular views. Following the first transesterification reaction, the 2′-oxygen atom of the invariant adenine nucleotide in the BPS is covalently joined to the phosphorus atom of the guanine nucleotide at the 5′ end of the 5′SS. (B) Two close-up views of the active site. Among the two putative Mg2+ ions, M1 is coordinated by phosphate groups from G78 and U80 of U6 snRNA, and M2 is bound by phosphates from A59 and U80. In addition, M1 is bound to the phosphate of the guanine nucleotide at the 5′ end of the 5′SS and the 3′-OH of the nucleotide at the 3′ end of the 5′ exon. The M2 ion, which activates the nucleophile before the first transesterification reaction, is separated from the nucleophile (i.e., the 2′-OH of the invariant adenine nucleotide in the BPS) by ~6 Å.

The other two Mg2+ ions may directly catalyze the two sequential transesterification reactions. The putative M2 ion, which is thought to activate the nucleophile during the first-step reaction (1, 6), is coordinated by the phosphates of A59 and U80 of U6 snRNA (Fig. 3B). M2 is located ~6 Å away from the nucleophile, which likely reflects the postreaction state. During the second-step reaction, M2 is thought to stabilize the leaving group (the 3′-OH of the guanine nucleotide at the 3′ end of the intron) (1, 6). The M1 ion, which stabilizes the leaving group during the first-step reaction (1), appears to be coordinated by four ligands in a planar fashion, with two ligands from the pre-mRNA and two from U6 snRNA (Fig. 3B). The coordinating ligands include the phosphate of the invariant adenine nucleotide in the BPS, the 3′-OH of the nucleotide at the 3′ end of 5′ exon, and the phosphates of G78 and U80 of U6 snRNA.

Notably, the nucleophile for the second-step reaction—the 3′-OH of the nucleotide at the 3′ end of 5′ exon—is already coordinated and activated by M1, which is known to be responsible for the second-step nucleophile activation (1). Although the M1-activated nucleophile is poised to initiate nucleophilic attack, the scissile phosphodiester bond between the intron and 3′ exon is yet to be loaded into the active site. The placement of M2 away from the lariat junction in the C complex makes the reversal of the first-step reaction highly unlikely. Both steps of the pre-mRNA splicing reactions in vitro were shown to be reversible through alteration of the experimental conditions, particularly the identity and concentration of the cations (51).

Comparison of the snRNA elements

Structural analysis of the S. cerevisiae Bact complex supported the prediction that the overall conformation of the snRNA elements in the catalytic center is highly conserved in various spliceosomal complexes (12, 43). Structural resolution of the S. cerevisiae C complex provides another opportunity to scrutinize this prediction. The U5 snRNA of the Bact complex aligns well with that of the C complex, with near-perfect registry for both the phosphodiester backbone and the base-pairing interactions (Fig. 4A and fig. S13A). Applying the same alignment matrix to the entire RNA map, the overall structures of U6 snRNA, the 5′ exon, and a portion of the intron at the 5′ end in the Bact complex superimpose well with those of the corresponding elements in the C complex (Fig. 4A and fig. S13B). A closer examination reveals minor shifts, mostly within ~3 Å, for nucleotides in the ISL of U6 snRNA (Fig. 4A, inset). Despite large conformational changes for the majority of the U2 snRNA sequences, nucleotides 1 to 30 adopt a nearly identical structure between these two complexes (Fig. 4A and fig. S13C). These sequences include helices I and II; the former contributes to formation of the active site. Notably, the U2 snRNA sequences (nucleotides 32 to 47) that form a duplex with BPS and the surrounding intron sequences are translocated by distances of 25 to 95 Å from the Bact to the C complex. This shift presumably occurs in the B* complex, which brings the nucleophile in the BPS in close proximity to the scissile phosphodiester bond for the first-step reaction (52).

Fig. 4 Structural comparison of the RNA elements among the Bact and C complexes from S. cerevisiae and the ILS complex from S. pombe.

(A) Structural comparison of the overall RNA maps between the Bact (43) and C complexes from S. cerevisiae. Comparison of the ISL is shown in the inset. U5, U6, and the 5′ portion of U2 snRNA remain largely the same between the two complexes. (B) Structural comparison of the overall RNA maps between the C complex from S. cerevisiae and the ILS complex from S. pombe (12). The ISL structure is similar between the two complexes (inset). (C) Structural comparison of the pre-mRNA molecules from the three complexes. (D) Close-up comparison of the lariat junction between the C complex from S. cerevisiae and the ILS complex from S. pombe (12). The T-shaped lariat junction in the ILS complex is separated from that in the C complex by 20 to 25 Å. The lariat junction in the C complex must be moved away before the second transesterification reaction can occur.

Analogous to the comparison between the Bact and C complexes, U5 snRNA, U6 snRNA, and the 5′-end portion of U2 snRNA remain structurally similar between the S. cerevisiae C complex and the S. pombe ILS complex (11, 12) (Fig. 4B and fig. S13, A to C). Unlike that in the S. cerevisiae Bact or C complex, the 5′ exon has been released from the ILS complex. The T-shaped junction of the intron lariat in the ILS complex is located in a different position compared with that in the C complex (Fig. 4C and fig. S13D). The guanine nucleotide at the 5′ end of the 5′SS and the invariant adenine nucleotide of the BPS in the ILS complex are separated from the corresponding nucleotides of the C complex by ~25 Å (Fig. 4D). Such translocation likely occurs before the second-step reaction so as to vacate the space for accommodation of the 3′ exon and the preceding intron sequences.

Prp8 and the RNaseH-like domain

Prp8 displays clear EM density for most sequences (fig. S5, A to G). The Jab1/MPN domain exhibits no density and is not modeled in our structure. Both the N domain and the core of Prp8 in the C complex align well with those in the Bact complex (43) (Fig. 5A). Consistent with a role in stabilizing the bound 5′ exon (43), the switch loops (residues 1402 to 1439) in these two complexes adopt an identical conformation. The ribonuclease H (RNaseH)–like domains, however, exhibit a large positional shift of up to 99 Å between the Bact and C complexes. Prp8 of the S. cerevisiae C complex and Spp42 of the S. pombe ILS complex (11) also exhibit very similar conformations for their N domains and the cores (Fig. 5B). Conversely, the switch loop in Spp42 points in the opposite direction of that in Prp8 of the C complex, reflecting the 5′-exon released state in the ILS complex. The RNaseH-like domains in these two complexes adopt very different positions (Fig. 5B). Similar to that between the U4/U6.U5 tri-snRNP and the Bact complex (43), only the core of Prp8 from the tri-snRNP aligns well with that of the C complex (Fig. 5C). The N domains, switch loops, and RNaseH-like domains all display marked positional variations.

Fig. 5 Structural comparison of the central component Prp8 (Spp42 in S. pombe) among three spliceosomal complexes and the U4/U6.U5 tri-snRNP.

(A) Structural comparison of Prp8 between the Bact (43) and C complexes from S. cerevisiae. Both the N domain and the core of Prp8 align well, including the switch loop (colored orange and magenta in the Bact and C complexes, respectively). In contrast, the RNaseH-like domain adopts two markedly different locations in the two complexes. (B) Structural comparison of Prp8 from the S. cerevisiae C complex and Spp42 from the S. pombe ILS complex (11). Compared with that in the C complex, the switch loop in the ILS complex (colored red) is flipped 180°. The RNaseH-like domains exhibit pronounced positional differences in the two complexes. (C) Structural comparison of Prp8 between the C complex and the U4/U6.U5 tri-snRNP from S. cerevisiae (41). The N domains, switch loops, and RNaseH-like domains all exhibit large differences. Similar to that in the ILS complex, the switch loop in the tri-snRNP points in the opposite direction of that in the C complex. (D) Three close-up views of the RNaseH-like domains. A pairwise comparison of the RNaseH-like domains is shown between the Bact (43) and C complexes from S. cerevisiae (left), between the S. cerevisiae C complex and the S. pombe ILS complex (11) (middle), and between the C complex and the tri-snRNP from S. cerevisiae (41) (right). (E) Superposition of the RNaseH-like domains from the four complexes. The smallest difference has a root mean square deviation (RMSD) of only 0.62 Å for 244 aligned Cα atoms between the Bact and C complexes. The largest variation has an RMSD of 1.52 Å for 228 aligned Cα atoms between the S. cerevisiae Bact complex and the S. pombe ILS complex.

Despite their very different positions (Fig. 5D), the RNaseH-like domains in the three spliceosomal complexes align with each other to near-perfect registry (Fig. 5E). Thus, the RNaseH-like domain has a rigid conformation but serves as a highly mobile element in the various spliceosomal complexes. Structural analysis reveals a surprising role for the RNaseH-like domain in stabilizing a mobile RNA element during pre-mRNA splicing (fig. S14). In the U4/U6.U5 tri-snRNP (41, 42), the RNaseH-like domain simultaneously interacts with Prp3, Prp6, and Prp31, whereas Prp3 and Prp31 directly recognize the U4/U6 duplex (fig. S14A). The RNaseH-like domain must be dislocated before Brr2 unwinds the U4/U6 duplex. In the Bact complex (43), the RNaseH-like domain associates with the scaffold protein Hsh155 and the splicing factor Cwc22 (fig. S14B). Because Hsh155 plays a major role in binding the BPS and intron sequences toward the 3′ end (53, 54), the RNaseH-like domain would have to dissociate for the BPS to move to the catalytic center of the spliceosome. In the C complex, the RNaseH-like domain directly recognizes the intron-U2 snRNA duplex while interacting with the U2 Sm ring and the splicing factor Cwc25 (fig. S14C). In the S. pombe ILS complex (11), the RNaseH-like domain of Spp42 mainly interacts with Cwf19, which binds to the intron lariat and ISL of U6 snRNA while making close contacts with the core of Spp42 (fig. S14D). The dissociation of the intron lariat is likely preceded by the dislocation of the RNaseH-like domain.
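The RMSD values quoted for the aligned RNaseH-like domains measure how far equivalent Cα atoms lie from each other after superposition. The sketch below shows only the final RMSD step and assumes the two coordinate sets have already been superposed and matched atom for atom; the superposition itself (for example by a least-squares fit) is omitted, and the toy coordinates are hypothetical, not taken from the structures.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """Root mean square deviation between two pre-superposed, equal-length coordinate sets."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared_sum = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical toy example: every atom displaced by 1 Å along z.
domain_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
domain_b = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
toy_rmsd = rmsd(domain_a, domain_b)
print(f"{toy_rmsd:.2f} Å")
```

A low RMSD after superposition, as reported for these domains, indicates that the internal fold is essentially unchanged even when the domain as a whole occupies very different positions in the complexes.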

Protein components at the center of the C complex

The RNA elements at the catalytic center are specifically recognized by a number of protein components (Fig. 6A). As previously observed (11), the splicing active site is anchored in a positively charged catalytic cavity formed between the N domain and the core of Prp8 (Fig. 6B). In addition to Prp8, at least 15 other proteins directly contact the RNA elements at the center of the C complex (Fig. 6C). These proteins and Prp8 also closely interact with one another to stabilize the conformation of the RNA elements.

Fig. 6 Protein-protein and protein-RNA interactions at the center of the C complex.

(A) Overall view of the center of the C complex. At least 16 protein components directly interact with the RNA elements at the catalytic center. (B) Prp8 anchors the RNA elements at the catalytic center. (C) Locations of 14 protein components relative to the central RNA elements. Prp8 and Snu114 are removed to allow improved viewing of the other proteins. (D) Close-up view of the NTC components Cef1 and Syf2. (E) Close-up view of the NTC component Isy1 and the NTR components Cwc2 and Bud31. The interactions and the compact fold of Cwc2 are stabilized by a zinc ion that is bound to Cys73/81/87 and His91. Bud31 contains a metal cluster of three zinc ions, each of which is tetrahedrally coordinated by four Cys residues (Cys104/105/108/148, Cys104/122/150/153, and Cys108/120/122/145). (F) Close-up view of the NTR component Ecm2. Two zinc ions in the N-terminal domain of Ecm2 are coordinated by Cys13/71/73/74 and Cys34/37/61/64. (G) Close-up view of the splicing factors Cwc21 and Cwc22. (H) Close-up view of the splicing factors Yju2 and Cwc25. A zinc ion is coordinated by Cys51/54/88/91 at the edge of the β sheet in Yju2. (I) Close-up view of the NTR component Prp46.

The NTC components Cef1, Isy1, and Syf2

Cef1, the S. cerevisiae ortholog of Cdc5 in S. pombe, is essential for pre-mRNA splicing (55, 56). The N-terminal sequences (residues 9 to 111) constitute a Myb domain of six α helices, followed by extended sequences that form four additional α helices (Fig. 6D). The N-terminal residues of helix α8 (residues 165 to 194) reach into the active site to contact the ISL, whereas the middle portion of α8 binds the NTC component Isy1. The C-terminal half of α8 and helices α9 and α10 of Cef1 interact with Ecm2 and the N-terminal helices of Clf1, and the intervening loop between α8 and α9 also binds the 5′ intron just downstream of the 5′SS. The N-terminal half of the Myb domain (α1 to α3) closely interacts with α7 to form a globular domain, which associates with helix I of the U2/U6 duplex, Prp45, Syf2, and the core of Prp8 (Fig. 6D).

Isy1 is thought to act together with U6 snRNA to promote a spliceosomal conformation favorable for the first-step reaction and also to interact with Prp16 to regulate the fidelity of pre-mRNA splicing (57). Isy1 (residues 2 to 96) consists of four α helices. Helices α1 and α2 reach into the lariat junction and interact with the surrounding intron sequences, the BPS-U2 duplex, one side of the 5′SS-U6 duplex, Cwc2, and the splicing factor Yju2 (Fig. 6E). A pair of antiparallel helices (α3 and α4) associates with the other side of the 5′SS-U6 duplex and helix α8 of Cef1. The NTC component Syf2 (residues 92 to 211), which is thought to modulate Syf1 function (58), wraps around helix II of the U2/U6 duplex while interacting with Prp45 and Clf1 (Fig. 6D).

The NTR components Cwc2, Ecm2, and Bud31

As an essential NTR component that contains an RNA recognition motif (RRM) and a zinc finger, Cwc2 is known to directly cross-link to U6 snRNA and pre-mRNA (59, 60). In our structure, Cwc2 recognizes the intron sequences just downstream of the 5′SS and 11 nucleotides of U6 snRNA preceding the U6 sequences that base-pair with the 5′SS (Fig. 6E). Cwc2 also interacts with Ecm2, Bud31, and Isy1. Ecm2 is thought to facilitate the cooperative formation of helix II in the activation of the yeast spliceosome (61). In our structure, Ecm2 consists of an N-terminal metal-binding domain (residues 3 to 125) and a C-terminal globular domain (residues 210 to 288) that comprises a four-stranded β sheet stacked against two α helices (Fig. 6F). These two domains are separated by and closely interact with Cwc2. The N-terminal domain of Ecm2 also binds Cef1, Prp45, and U6 nucleotides 29 to 32. The nonessential NTR component Bud31 is thought to stabilize the pre-mRNA–protein interactions (62). Bud31 associates with U6 nucleotides 25 to 29, stem II of U5 snRNA, and the N domain of Prp8 (Fig. 6E).

The splicing factors Cwc21, Cwc22, Cwc25, and Yju2

Cwc21 is the functional ortholog of SRm300, which is the only SR-related protein known to be located at the catalytic center of the human spliceosomes (63, 64). In our structure, only residues 2 to 28 of Cwc21 are well characterized by the EM maps; these sequences are embedded in the center of the spliceosome, closely interacting with the 5′ exon (Fig. 6G). The N-terminal sequences of Cwc21 bind the Prp8 N domain, and a lone β strand of Cwc21 pairs with a short β strand from the switch loop. The sequences following the β strand of Cwc21 interact with Cwc22, whereas Cwc22 also associates with the core of Prp8 and directly contacts the switch loop.

Yju2 is thought to associate with the NTC and promote the first-step reaction after the action of the ATPase Prp2 (65). The evolutionarily conserved N-terminal domain of Yju2 (residues 1 to 130) promoted the first transesterification reaction to ~75% of that by the full-length protein (66). The extended N-terminal sequences of Yju2 (residues 2 to 16) reach into a deep cleft at the active site, interacting with the 5′ exon, both the N domain and the core of Prp8, and a portion of U2 snRNA that recognizes the BPS. An ensuing α helix (residues 17 to 31) intercalates between the BPS-U2 duplex and the α1 and α2 helices of Isy1 (Fig. 6H). A five-stranded β-sheet domain (residues 38 to 116) associates with the ISL, the phosphate backbone of the intron, and the splicing factor Cwc25. Cwc25 is a heat-stable step I factor containing a short coiled-coil motif and is required after Prp2 and Yju2 to facilitate the first-step reaction (52, 67). The extended N-terminal sequences of Cwc25 (residues 2 to 16) are inserted deeply into the active site, interacting with the ISL, the BPS-U2 duplex, helix I, and the β-sheet domain of Yju2 (Fig. 6H). An α helix (residues 17 to 42) of Cwc25 extends from the active site by ~37 Å to interact with the RNaseH-like domain of Prp8.

The NTR components Prp46, Cwc15, and Prp45

The WD40 protein Prp46 is a seven-bladed β propeller (68). The top face (69) of the Prp46 propeller interacts with Prp8, whereas the bottom face binds extended sequences of Prp45 (residues 51 to 99) (Fig. 6I). Prp46 also contacts Cwc15, Clf1, and stem I of U5 snRNA. The intrinsically disordered proteins Cwc15 and Prp45 appear to stabilize the catalytic center by simultaneously interacting with multiple components at the center of the spliceosome. Both Cwc15 and Prp45 directly contact all three snRNA elements.

RNA recognition by the splicing factors

The identified 16 protein components at the center of the spliceosome interact closely with one another and make numerous contacts to the four RNA elements. A detailed description of these interactions goes beyond the scope of this manuscript. Nonetheless, we wish to exemplify such interactions by focusing on the splicing factor Yju2. Yju2 and the surrounding proteins Cwc25, Isy1, and Cef1 together form a scaffold onto which the RNA elements are placed (Fig. 7A). The RNA elements closely follow the positively charged surface of the protein scaffold, where the basic amino acids play a key role in RNA recognition (Fig. 7B). Lys63 and Lys68 of Yju2 directly contact the phosphate groups of the BPS and U6 snRNA, respectively; Arg42 of Yju2 may recognize A51 of U6 snRNA through a base-specific hydrogen bond (H-bond) (Fig. 7C). Asn65 of Yju2 may also specifically recognize G50 of U6 snRNA. The N-terminal residue Arg3 of Isy1 likely donates a pair of H-bonds to the backbone phosphates of the BPS, whereas the N-terminal residue Gly2 of Cwc25 may make two H-bonds with the BPS and U6 snRNA. In addition, the side chain of Lys10 from Cwc25 donates a candidate H-bond to the phosphate backbone of the BPS (Fig. 7C).

Fig. 7 RNA recognition at the catalytic center by the splicing factors Yju2, Cwc25, Cwc21, and Cwc22.

(A) Recognition of the RNA elements at the catalytic center by Cef1, Cwc25, Isy1, and Yju2. Prp8 and other protein components are removed. (B) RNA elements mainly interact with the positively charged surface areas in the four proteins (Cef1, Cwc25, Isy1, and Yju2). Two views of the electrostatic surface potential are shown here. Proteins in the left panel are displayed in exactly the same orientation as those in (A). (C) Close-up view of the detailed interactions involving the N terminus of Cwc25 and Yju2. A hydrogen bond (H-bond) is tentatively assigned when an H-donor and an H-acceptor are located within ~3.5 Å of each other. G, Gly; C, Cys; R, Arg; A, Ala; N, Asn; K, Lys. (D) Close-up view of Cwc21 and its interactions with the 5′ exon and the switch loop (green). (E) Close-up view of the detailed interactions between Cwc21 and the 5′-exon sequences. T, Thr; H, His; V, Val; S, Ser.
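The distance criterion stated in the legend for tentative H-bond assignment can be sketched in a few lines. This is a minimal illustration and not the procedure used by the authors. The function name `is_candidate_hbond` and the coordinates are hypothetical, and only the ~3.5 Å cutoff comes from the legend. Angular geometry, which a stricter analysis would also consider, is not checked here.

```python
import math

HBOND_CUTOFF = 3.5  # Å, the approximate cutoff quoted in the Fig. 7 legend


def is_candidate_hbond(donor, acceptor, cutoff=HBOND_CUTOFF):
    """Return True when the donor and acceptor atoms lie within the cutoff.

    Only the donor-acceptor distance is tested; angles are ignored.
    """
    return math.dist(donor, acceptor) <= cutoff


# Hypothetical coordinates (Å) for illustration only, not from the deposited model
lys_side_chain_n = (10.0, 4.0, 7.0)
phosphate_o = (12.0, 5.0, 9.0)

print(is_candidate_hbond(lys_side_chain_n, phosphate_o))
```

With these made-up coordinates the two atoms are 3.0 Å apart, so the pair would be flagged as a candidate H-bond.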

The other two splicing factors Cwc21 and Cwc22 cooperate to stabilize the 5′ exon (Fig. 7D). The extended sequences of Cwc21 are oriented by the switch loop of Prp8 and Cwc22 to recognize the 5′ exon. Four residues of Cwc21 (Ser2, Lys12, His19, and Arg22) may make direct H-bonds to nucleotides of the 5′ exon (Fig. 7E). His19 of Cwc21 may contact a base of the 5′ exon through cation-π interactions. The majority of these interactions are directed to the phosphates and the ribose of the 5′ exon, consistent with the highly variable nature of the 5′-exon sequences. In addition, the interactions between Cwc21 and the 5′ exon are of low-to-moderate intensity, which presumably would not impede dissociation of the joined exons after the second-step reaction.

Discussion

We previously reported atomic structures of the spliceosome, representing the beginning and ending states of the two transesterification reactions: the S. cerevisiae activated Bact complex (43) and the S. pombe intron-lariat ILS complex (11). In this manuscript, we report the cryo-EM structure of a crucial intermediate spliceosomal complex: the catalytic step I spliceosome. These three structures reveal detailed arrangements of the RNA map that begin to recapitulate the process of the pre-mRNA splicing reaction (Fig. 8A).

Fig. 8 Working model of pre-mRNA splicing at the level of RNA conformation.

(A) Conformations of the RNA elements during the two transesterification reactions. In the Bact complex (43), the nucleophile-containing adenine nucleotide in the BPS is located ~50 Å away from the nucleotide at the 5′ end of the 5′SS. In the C complex, the 5′ exon is severed from the intron, but the lariat junction and the surrounding intron sequences occupy the same general location as that required for the 3′ exon and the preceding intron sequences. In the ILS complex (11, 12), the lariat junction and the surrounding intron sequences are located more than 20 Å away from the catalytic Mg2+ ions. These structural observations allow us to model the yet-to-be-captured conformations of the RNA elements in the B* and P complexes. The movement and placement of the RNA elements are driven by the protein components, the splicing factors, and the RNA-dependent ATPases/helicases. (B) A schematic representation of the pre-mRNA splicing pathway as proposed in (A). In the proposed C* complex (step II catalytically activated spliceosome), the lariat junction has moved away, and the 3′ exon and the preceding intron sequences are delivered into the active site.

The structures of U5 and U6 snRNAs, along with their relative positions in the RNA map, remain largely unchanged in all three spliceosomal complexes (Fig. 8, A and B). Consequently, the U2 sequences that form duplexes with U6 snRNA (helices I and II) also remain relatively static among these spliceosomal complexes. Therefore, the 3′ half of the 5′SS and a few ensuing nucleotides of the intron, which are recognized by U6 snRNA, and the 5′ exon, which is anchored to loop I of U5 snRNA, should remain largely static throughout the two reactions. This conclusion has been corroborated by structures of the Bact, C, and ILS complexes (11, 12, 43). The mobile elements are the 5′ half of the 5′SS; the RNA sequences far downstream of the 5′SS, including the BPS and the 3′ exon; and some of the U2 snRNA sequences, particularly those that form duplexes with the BPS.

In the Bact complex, the nucleophile (i.e., the 2′-OH of the invariant adenine nucleotide of the BPS) is located ~50 Å away from the phosphorus atom of the guanine nucleotide at the 5′ end of the 5′SS (43). During the transition from the Bact to the B* complex, the BPS-U2 duplex must be moved into the active site to initiate the first transesterification reaction. Thus, the general features of the RNA map in the B* complex should be very similar to those of the C complex, except that the covalent linkage between the 5′SS and the 5′ exon in the B* complex is broken in the C complex and replaced by that between the 5′SS and the BPS (Fig. 8). In the C complex, the location to be occupied by the 3′ exon and its preceding intron sequences (the 3′SS, for example) is occupied by the T-shaped lariat junction and surrounding intron sequences. These structural elements must be moved away before the onset of the second transesterification reaction (70). In the P complex, the two ligated exons (the 5′ and 3′ exons) should remain bound at the catalytic center as a single chain, with the 5′ exon anchored to loop I of U5 snRNA (45, 48) (Fig. 8). We speculate that the intron lariat in the P complex is bound at a position similar to that in the ILS complex (11, 12), which leaves ample space for the accommodation of the 3′-exon sequences (Fig. 8). The spatial requirement for accommodation of the 3′ exon in the P complex is suggested by the structure of the ILS complex (11).

Of the six distinct spliceosomal complexes, the three that have been structurally characterized (Bact, C, and ILS) are separated by two missing conformations, B* and P, that are likely to be more transient. However, available information on the three structurally characterized complexes allows us to model the RNA maps of the B* and P complexes (Fig. 8A). The transition from the C to the P complex likely comprises two distinct steps. During the first step, the T-shaped lariat junction in the C complex is moved away from the active site, and the 3′ exon and its preceding intron sequences are translocated into the active site. The resulting complex, which perhaps should be named the step II catalytically activated spliceosome (or C* complex), is likely to be transient (Fig. 8B). The proposed C* complex just before the step II transesterification corresponds to the B* complex for the step I transesterification. During the second step, the step II transesterification occurs, and the released intron sequences preceding the 3′ exon are moved away from the active site, resulting in the P complex.

Because the conformation of the RNA elements in each spliceosomal complex is stabilized by Prp8 and a distinct set of protein components, movement of the BPS-U2 duplex should be accompanied by dissociation of many proteins and association of many others. Such remodeling processes are driven by the highly conserved ATPase/helicases Prp2, Prp16, and Prp22. This has been experimentally observed for the transition from the Bact to the B* complex and then to the C complex, from the C to the P complex, and from the P to the ILS complex (3). Analysis of the RNA maps and associated protein components suggests that the remodeling processes may be particularly drastic for the transitions from Bact to B* and from C to C* (Fig. 8). For example, at least 12 structurally identified proteins in the Bact complex (43)—including three in the RES complex (Bud13, Pml1, and Snu17), seven in the SF3a/b complex (Rse1, Hsh155, Cus1, Hsh49, Rds3, Ysf3, and Prp11), and two splicing factors (Cwc24 and Cwc27)—are dissociated in the C complex. Simultaneously, at least three proteins that were absent in the catalytic center of the Bact complex now appear in the catalytic center of the C complex, including two splicing factors (Cwc25 and Yju2) and the NTC protein Isy1.
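The compositional change described above can be tallied with a short bookkeeping sketch. The protein names and groupings are taken directly from the paragraph. The dictionary layout and the helper `count_proteins` are illustrative assumptions. The lists are not a complete inventory of either complex.

```python
# Proteins named in the text as present in Bact but dissociated in the C complex
released_from_bact = {
    "RES complex": ["Bud13", "Pml1", "Snu17"],
    "SF3a/b complex": ["Rse1", "Hsh155", "Cus1", "Hsh49", "Rds3", "Ysf3", "Prp11"],
    "splicing factors": ["Cwc24", "Cwc27"],
}

# Proteins named in the text as newly present at the catalytic center of the C complex
recruited_to_c = {
    "splicing factors": ["Cwc25", "Yju2"],
    "NTC": ["Isy1"],
}


def count_proteins(groups):
    """Total number of proteins across all named groups."""
    return sum(len(members) for members in groups.values())


print("Released from Bact:", count_proteins(released_from_bact))
print("Recruited to C:", count_proteins(recruited_to_c))
```

The totals reproduce the "at least 12" and "at least three" figures quoted in the text. Both are lower bounds, because peripheral density remains unassigned.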

Limited by both the local resolution and the highly heterogeneous nature of the spliceosome, some of the cryo-EM density maps at the peripheral regions are poorly defined and the constituent proteins are yet to be identified. This is true for all three spliceosomal complexes (Bact, C, and ILS) and for the U4/U6.U5 tri-snRNP. Enhancement of the density maps, perhaps through acquisition of more spliceosomal particles and application of improved analysis software, will allow identification of more protein components and assignment of more RNA sequences. This practice will lead to more precise description of the splicing active site and the coordination of catalytic metal ions. Nonetheless, the local resolutions already reach 2.8 to 3.2 Å in the center of the three spliceosomal complexes, allowing unambiguous assignment of amino acid side chains. Such resolutions may facilitate identification of chemical components that modulate the splicing reaction. After all, a sizable fraction of genetic disorders are caused by defects in pre-mRNA splicing (71).

Supplementary Materials

www.sciencemag.org/content/353/6302/895/suppl/DC1

Materials and Methods

Figs. S1 to S14

Tables S1 to S4

References (74–93)

References and Notes

  1. Acknowledgments: We thank the Tsinghua University Branch of China National Center for Protein Sciences (Beijing) for providing facility support. The computation was completed on the “Explorer 100” cluster system of Tsinghua National Laboratory for Information Science and Technology. This work was supported by funds from the Ministry of Science and Technology (grant 2014ZX09507003006) and the National Natural Science Foundation of China (grants 31130002 and 31321062). For the C complex structure, the atomic coordinates have been deposited in the Protein Data Bank with accession code 5GMK, and the EM maps have been deposited in the Electron Microscopy Database with accession codes EMD-9525, EMD-9526, and EMD-9527. We declare no competing financial interests. Correspondence and requests for materials should be addressed to Y.S.