Research Article

Structural Basis of Transcription: An RNA Polymerase II Elongation Complex at 3.3 Å Resolution


Science  08 Jun 2001:
Vol. 292, Issue 5523, pp. 1876-1882
DOI: 10.1126/science.1059495


The crystal structure of RNA polymerase II in the act of transcription was determined at 3.3 Å resolution. Duplex DNA is seen entering the main cleft of the enzyme and unwinding before the active site. Nine base pairs of DNA-RNA hybrid extend from the active center at nearly right angles to the entering DNA, with the 3′ end of the RNA in the nucleotide addition site. The 3′ end is positioned above a pore, through which nucleotides may enter and through which RNA may be extruded during back-tracking. The 5′-most residue of the RNA is close to the point of entry to an exit groove. Changes in protein structure between the transcribing complex and free enzyme include closure of a clamp over the DNA and RNA and ordering of a series of “switches” at the base of the clamp to create a binding site complementary to the DNA-RNA hybrid. Protein–nucleic acid contacts help explain DNA and RNA strand separation, the specificity of RNA synthesis, “abortive cycling” during transcription initiation, and RNA and DNA translocation during transcription elongation.

The recent structure determination of yeast RNA polymerase II at 2.8 Å resolution and of bacterial RNA polymerase at 3.3 Å resolution has led to proposals for polymerase-DNA and -RNA interactions (1–3). A DNA duplex was suggested to enter a positively charged cleft between the two largest subunits and to make a right angle bend at the active center, where the DNA strands are separated and from which a DNA-RNA hybrid emerges. Avenues for entry of substrate nucleoside triphosphates and for exit of RNA could also be surmised. Although consistent with results of cross-linking experiments (4–8), these general proposals for polymerase–nucleic acid interaction have not been proven, and they do not address key questions about the transcription process: How is an unwound “bubble” of DNA established and maintained in the active center? Why does the enzyme initiate repeatedly, generating many short transcripts, before a transition is made to a stable elongating complex? What is the nature of the presumptive DNA-RNA hybrid duplex? How are DNA and RNA translocated across the surface of the enzyme, forward and backward, during RNA synthesis and back-tracking? Here we report the crystal structure determination of yeast RNA polymerase II in the form of an actively transcribing complex, from which answers to these questions and additional insights into the transcription mechanism are derived.

The main technical challenge of this work was the isolation and crystallization of a transcribing complex. Initiation at an RNA polymerase II promoter requires a complex set of general transcription factors and is poorly efficient in reconstituted systems (9,10). Moreover, most preparations contain many inactive polymerases, and the transcribing complexes obtained would have to be purified by mild methods to preserve their integrity (11). The initiation problem was overcome with the use of a DNA duplex bearing a single-stranded “tail” at one 3′-end (Fig. 1A) (12, 13). Pol II starts transcription in the tail, two to three nucleotides from the junction with duplex DNA, with no requirement for general transcription factors. All active polymerase molecules are converted to transcribing complexes, which pause at a specific site when one of the four nucleoside triphosphates is withheld. The problem of contamination by inactive polymerases was solved by passage through a heparin column (13); inactive molecules were adsorbed, whereas transcribing complexes flowed through, presumably because heparin binds in the positively charged cleft of the enzyme, which is occupied by DNA and RNA in transcribing complexes. The purified complexes formed crystals diffracting anisotropically to 3.1 Å resolution (14).

Figure 1

Nucleic acids in the transcribing complex and their interactions with pol II. (A) DNA (“tailed template”) and RNA sequences. DNA template and nontemplate strands are in blue and green, respectively, and RNA is in red. This color scheme is used throughout. (B) Ordering of nucleic acids in the transcribing complex structure. Nucleotides in the solid box are well ordered. Nucleotides in the dashed box are partially ordered, whereas those outside the boxes are disordered. Three protein regions that abut the downstream DNA are indicated. (C) Protein contacts to the ordered nucleotides boxed in (B). Amino acid residues within 4 Å of the DNA are indicated, colored according to the scheme for domain or domainlike regions of Rpb1 or Rpb2 (3). Ribose sugars are shown as pentagons, phosphates as dots, and bases as single letters. Amino acid residues listed beside phosphates contact only this nucleotide. Amino acid residues listed beside riboses contact this nucleotide and its 3′-neighbor. Single-letter abbreviations for the amino acid residues are as follows: A, Ala; D, Asp; E, Glu; G, Gly; H, His; K, Lys; L, Leu; M, Met; N, Asn; Q, Gln; R, Arg; S, Ser; T, Thr; V, Val; and Y, Tyr. (D) Schematic representation of protein features participating in the detailed interactions shown in (C). Same notation as in (C), except that bases are shown as thick bars.

Structure of a pol II transcribing complex. Diffraction data complete to 3.3 Å resolution were used for structure determination by molecular replacement with the 2.8 Å pol II structure (15). A native zinc anomalous difference Fourier map showed peaks coinciding with five of the eight zinc ions of the pol II structure, confirming the molecular replacement solution (16). The remaining three zinc ions were located in the clamp, a region shown previously to undergo a large conformational change between different pol II crystal forms (3). The locations of the three zinc ions served as a guide for manual repositioning of the clamp in the transcribing complex structure. An initial electron density map revealed nucleic acids in the vicinity of the active center. After adjustment of the protein model, the nucleic acid density improved and nine base pairs of DNA-RNA hybrid could be built (17). Additional density along the DNA template strand allowed another three nucleotides downstream and one nucleotide upstream to be built. Modeling of the nucleic acids assumed the 3′-end of the RNA at the biochemically defined pause site (Fig. 1A), because the nucleic acid sequences could not be inferred from the crystallographic data (18).

The final model contains 3521 amino acid residues, 22 nucleotides, eight Zn²⁺ ions, and one Mg²⁺ ion and has a free R factor of 29.8% (R factor 25.0%, 40 to 3.3 Å) (Fig. 2). A simulated-annealing omit map computed from a model of the protein alone revealed the phosphate groups and most bases in the DNA-RNA hybrid region, confirming the modeling of the nucleic acids (Fig. 2A). Density for DNA in the downstream region was very weak and discontinuous but revealed the major groove, allowing a canonical B-DNA duplex to be approximately placed [not included in the model (19)]. Numbering of nucleotides in the DNA begins with +1 immediately downstream and –1 upstream of the Mg²⁺ ion (Fig. 1A).
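For readers unfamiliar with the statistics quoted above, the conventional crystallographic R factor compares observed and calculated structure-factor amplitudes, R = Σ||Fobs| − |Fcalc|| / Σ|Fobs|, and the free R factor is the same quantity evaluated on a test set of reflections excluded from refinement. The following minimal Python sketch illustrates the definition only. The function name and the toy amplitudes are assumptions made for illustration and are not data from this study, and scaling between observed and calculated amplitudes is ignored.

```python
def r_factor(f_obs, f_calc):
    """Conventional R factor: sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|)."""
    if len(f_obs) != len(f_calc):
        raise ValueError("amplitude lists must have equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Toy amplitudes (hypothetical values, not taken from the paper).
work_obs, work_calc = [100.0, 50.0, 25.0, 25.0], [90.0, 60.0, 20.0, 30.0]
test_obs, test_calc = [80.0, 20.0], [60.0, 30.0]

r_work = r_factor(work_obs, work_calc)  # reflections used in refinement
r_free = r_factor(test_obs, test_calc)  # reflections held out of refinement
print(f"R = {r_work:.3f}, free R = {r_free:.3f}")
```

A free R factor higher than the working R factor, as in the values reported above, is the expected pattern, because the held-out reflections do not influence the refined model.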

Figure 2

Crystal structure of the pol II transcribing complex. (A) Electron density for the nucleic acids. On the left, the final sigma-weighted 2mFobs − DFcalc electron density for the downstream DNA duplex (dashed box in Fig. 1B) is contoured at 0.8σ (green). At this contour level, the surrounding solvent region shows only scattered noise peaks. A canonical 16–base pair B-DNA duplex was placed into the density. On the right, the final model of the DNA-RNA hybrid and flanking nucleotides (boxed in Fig. 1B) is superimposed on a simulated-annealing Fobs − Fcalc omit map, calculated from the protein model alone with CNS (45) (green, contoured at 2.6σ). The location of the active site metal A is indicated. (B) Comparison of structures of free pol II (top) and the pol II transcribing complex (bottom). The clamp (yellow) closes on DNA and RNA, which are bound in the cleft above the active center. The remainder of the protein is in gray. (C) Structure of the pol II transcribing complex. Portions of Rpb2 that form one side of the cleft are omitted to reveal the nucleic acids. Bases of ordered nucleotides (boxed in Fig. 1B) are depicted as cylinders protruding from the backbone ribbons. The Rpb1 bridge helix traversing the cleft is highlighted in green. The active site metal A is shown as a pink sphere.

Closure of the clamp. The structures of free and transcribing pol II differ mainly in the position of the clamp (Fig. 2B). As previously suggested (1), and now demonstrated, the clamp swings over the cleft during formation of the transcribing complex, trapping the template and transcript. The clamp rotates by about 30°, with a maximum displacement of over 30 Å at external sites (at the Rpb1 “zipper”). Although most of the clamp moves as a rigid body, five “switch” regions undergo conformational changes and folding transitions (Table 1). Switches 1, 2, 4, and 5 form the base of the clamp (Fig. 3). Switches 1 and 2 are poorly ordered and switch 3 is disordered in free pol II; all three switches become well ordered in the transcribing complex. Ordering is likely induced by binding of the switches to DNA downstream and within the DNA-RNA hybrid (see below). Binding to the hybrid may help couple clamp closure to the presence of RNA. The conformational changes of the switch regions may be concerted, because the switches interact with one another. The conformational changes are accompanied by changes in a network of salt linkages to the “bridge” helix across the cleft (Rpb1 residues Arg839, Arg840, and Lys843).
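As a rough consistency check on the two figures quoted above, a point at distance r from the axis of a rigid-body rotation through angle θ is displaced by the chord length d = 2r sin(θ/2). The sketch below assumes a pure rotation about a fixed axis, which is an idealization and not a result taken from the structure, and estimates how far from the axis a site must lie to move 30 Å under a 30° rotation. The function names are hypothetical.

```python
import math


def chord_displacement(radius, angle_deg):
    """Straight-line displacement of a point rotated by angle_deg at a given radius."""
    return 2.0 * radius * math.sin(math.radians(angle_deg) / 2.0)


def radius_for_displacement(displacement, angle_deg):
    """Distance from the rotation axis needed to produce a given displacement."""
    return displacement / (2.0 * math.sin(math.radians(angle_deg) / 2.0))


# Values quoted in the text: rotation of about 30 degrees, displacement of about 30 angstroms.
r = radius_for_displacement(30.0, 30.0)
print(f"distance from rotation axis: {r:.1f} angstroms")
```

Under this idealization the most displaced sites would lie somewhat less than 60 Å from the rotation axis, a plausible scale for an external element of a large enzyme.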

Figure 3

Switches, clamp loops, and the hybrid-binding site. (A) Stereoview of the clamp core (1, yellow) and the DNA and RNA backbones. The view is as in Fig. 2C. The five switches are shown in pink and are numbered. Three loops, which extend from the clamp and may be involved in transactions at the upstream end of the transcription bubble, are in violet. Major portions of the protein are omitted for clarity. (B) Stereoview of nucleic acids bound in the active center.

Table 1

Switch regions.


Downstream DNA mobility. Downstream DNA lies in the cleft between the clamp and Rpb2 (Figs. 1B and 2, B and C), consistent with results from electron crystallography of the transcribing complex (20) and results of DNA-protein cross linking (4–8). The DNA contacts the Rpb5 “jaw” domain at a loop containing proline residue Pro118, as previously suggested (1), and then passes between the Rpb2 “lobe” region and the Rpb1 “clamp head.” The sequence of the Rpb2 lobe is divergent between yeast and bacteria, but the fold is conserved, whereas the clamp head is not conserved.

Details of downstream DNA–pol II interaction are lacking because the electron density is weak, indicative of mobility of the DNA. Furthermore, downstream DNAs from neighboring transcribing complexes in the crystal interact end to end, stacking on one another, so the precise location of the DNA may be determined by crystal packing forces. This could be the reason why there is no apparent contact between downstream DNA and the upper jaw. In addition, the length of DNA used here is possibly too short for passage all the way through the jaws.

Transcription bubble. The downstream edge of the transcription bubble lies between the poorly ordered downstream duplex DNA and the first ordered nucleotide of the template strand at position +4, three nucleotides before the beginning of the RNA-DNA hybrid (Fig. 3B). The nucleotide at position +4 in the nontemplate strand and the remainder of this strand are disordered. The template strand follows a path along the bottom of the clamp and over the “bridge” helix. Template nucleotides +4, +3, and +2 are stacked in the manner of right-handed B-DNA. The base of nucleotide +1 is flipped with respect to that of nucleotide +2 by a left-handed twist of 90°. The base at +1 therefore points downward into the floor of the cleft for readout at the active site, whereas the base at +2 is directed upward into the opening of the cleft. This unusual conformation of the DNA results from binding to switches 1 and 2, as well as to the bridge helix (Figs. 1, C and D, and 3). Invariant bridge helix residues Ala832 and Thr831 position the coding nucleotide through van der Waals interactions, whereas Tyr836 binds nucleotide +2 and may correspond to a tyrosine in the “O-helix” of some single subunit DNA polymerases (21, 22).

Maintenance of the downstream edge of the transcription bubble may be attributed not only to the binding of nucleotides +2, +3, and +4 but also to Rpb2 “fork loop” 2 (Figs. 1D and 4). Although this loop includes several disordered residues (23), it would likely clash with the nontemplate strand at position +3 if the nontemplate strand were still base paired with the template strand. A corresponding loop in the bacterial enzyme (“βD loop I”), four residues longer than that in yeast, was previously suggested to play such a role (5). Rpb2 fork loop 1 may help maintain the transcription bubble further upstream (Figs. 1D and 4). This loop is absent from the bacterial enzyme, perhaps reflecting a difference in promoter melting between eukaryotes, which require general transcription factors for the process, and bacteria, which do not. Both fork loops, although exposed, are highly conserved between yeast and human polymerases.

Figure 4

Maintenance of the transcription bubble. (A) Schematic representation of nucleic acids in the transcribing complex. Solid ribbons represent nucleic acid backbones from the crystal structure. Dashed lines indicate possible paths of nucleic acids not present in the structure. (B) Protein elements proposed to be involved in maintaining the transcription bubble. Protein elements from Rpb1 and Rpb2 are shown in silver and gold, respectively.

DNA-RNA hybrid. The base in the template strand at position +1 forms the first of nine base pairs of DNA-RNA hybrid, located between the bridge helix and Rpb2 “wall” (Figs. 1D and 4). The length of the hybrid corroborates the value of eight to nine base pairs determined biochemically (24, 25). The hybrid heteroduplex adopts a nonstandard conformation, intermediate between those of standard A- and B-DNA (Fig. 5), and is underwound (26), in comparison with the crystal structure of a free DNA-RNA hybrid, which is closely related to the A-form (27).

Figure 5

DNA-RNA hybrid conformation. The view is similar to that in Fig. 2C. The conformation of the DNA-RNA hybrid is intermediary between canonical A- and B-DNA. DNA, blue; RNA, red.

The electron density for the hybrid is strongest in the downstream region around the active center, indicative of a high degree of order, important for the high fidelity of transcription. The electron density remains strong for the DNA template strand further upstream, but the density for the RNA strand becomes weaker (Fig. 2A). This gradual loss of density reflects a diminution in the number of RNA-protein contacts. The template DNA strand is bound by protein over the entire length of the hybrid, whereas RNA contacts are limited to the downstream region (Fig. 1C). The five upstream ribonucleotides are held mainly through base pairing with the template DNA.

Contacts to the downstream and upstream parts of the hybrid are made by Rpb1 and Rpb2, respectively (Fig. 1C). Fifteen protein regions are involved, with a substantial portion of the contacts arising from the ordering of Rpb1 switches 1, 2, and 3 upon nucleic acid binding. The entire set of protein contacts forms an extended, highly complementary binding surface. A surface area of 3400 Å² is buried in the protein–nucleic acid interface, comparable to values for transcription factors bound specifically to DNA sites of similar size. Biochemical studies have shown the binding interaction contributes substantially to the stability of a transcribing complex and thus to the high processivity of transcription (25, 28).

Although a strong pol II–nucleic acid interaction is important for the ordering of nucleic acids in the active center region and for the stability of a transcribing complex, the interaction must not interfere with the translocation of nucleic acids during transcription. Indeed, the nucleic acids in the transcribing complex are mobile, as shown by the partial order of the downstream DNA (see above) and by a high overall crystallographic temperature factor of the hybrid, which appears to reflect mobility rather than static disorder (29). The conflicting requirements of tight binding and mobility may be reconciled in at least three ways. First, almost all protein contacts are to the sugar-phosphate backbones of the DNA and RNA. There are no contacts with the edges of the bases, so there is no base specificity. A large open space between pol II and the major groove of the hybrid is a prominent feature of the structure. Second, several side chains interact with two phosphate groups along the backbone simultaneously (Fig. 1C), which may reduce the activation barrier for translocation. Finally, about 20 positively charged side chains form a “second shell” around the hybrid at a distance of 4 to 8 Å, which may attract the hybrid without restraining its movement across the enzyme surface (30).
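The distance criteria used above (direct contacts within 4 Å, as in Fig. 1C, and a “second shell” at 4 to 8 Å) can be expressed as a simple classification by minimum interatomic distance. The sketch below is illustrative only: the function names, residue labels, and toy coordinates are hypothetical, and it is not the procedure used to identify contacts in this work.

```python
import math


def min_distance(atoms_a, atoms_b):
    """Smallest distance between any atom in atoms_a and any atom in atoms_b."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def classify_shell(distance, contact_cutoff=4.0, second_shell_cutoff=8.0):
    """Assign a distance (in angstroms) to the shells described in the text."""
    if distance <= contact_cutoff:
        return "direct contact"
    if distance <= second_shell_cutoff:
        return "second shell"
    return "distant"


# Toy coordinates in angstroms (hypothetical, not from the structure).
phosphate = [(0.0, 0.0, 0.0)]
side_chains = {
    "residue_A": [(3.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
    "residue_B": [(0.0, 6.0, 0.0)],
    "residue_C": [(0.0, 0.0, 12.0)],
}

shells = {
    name: classify_shell(min_distance(atoms, phosphate))
    for name, atoms in side_chains.items()
}
print(shells)
```

Using the minimum distance over all atom pairs means a side chain counts as a direct contact if any one of its atoms lies within the cutoff, which matches the usual reading of “within 4 Å.”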

RNA synthesis. The active site metal ion in the transcribing complex structure corresponds to one of two metal ions in the 2.8 Å pol II structure, referred to as metal A (3). The location of this metal in the transcribing complex is appropriate for binding the phosphate group between the nucleotide at the 3′-end of the RNA and the adjacent nucleotide, designated +1 and –1, respectively (Fig. 1C). In the two-metal–ion mechanism proposed for single subunit polymerases, metal A contacts the α-phosphate of the incoming nucleoside triphosphate and metal B binds all three phosphates (21, 31–35). Metal B may be absent from the transcribing complex structure because it has left with the pyrophosphate after nucleotide addition. On this basis, position +1 in the transcribing complex would be that of a nucleotide just added to the growing RNA, before translocation to bring the next template base into position opposite an empty nucleotide-binding site at the end of the RNA (36) (Fig. 6).

Figure 6

Proposed transcription cycle and translocation mechanism. (A) Schematic representation of the nucleotide addition cycle. The nucleotide triphosphate (NTP) fills the open substrate site (top) and forms a phosphodiester bond at the active site (“Synthesis”). This results in the state of the transcribing complex seen in the crystal structure (middle). We speculate that “Translocation” of the nucleic acids with respect to the active site (marked by a pink dot for metal A) involves a change of the bridge helix from a straight (silver circle) to a bent conformation (violet circle, bottom). Relaxation of the bridge helix back to a straight conformation without movement of the nucleic acids would result in an open substrate site one nucleotide downstream and would complete the cycle. (B) Different conformations of the bridge helix in pol II and bacterial RNA polymerase structures. The view is the same as in Fig. 2C. The bacterial RNA polymerase structure (2) was superimposed on the pol II transcribing complex by fitting residues around the active site. The resulting fit of the bridge helices of pol II (silver) and the bacterial polymerase (violet) is shown. The bend in the bridge helix in the bacterial polymerase structure causes a clash of amino acid side chains (extending from the backbone shown here) with the hybrid base pair at position +1.

The ribonucleotide in position +1 lies in the entrance to the previously noted “pore 1,” which extends from the floor of the cleft through to the backside of the enzyme. This location and orientation of the 3′-end of the RNA lend strong support to the previous proposal that nucleoside triphosphates enter through the pore during RNA synthesis and that RNA is extruded through the pore during back-tracking (1). The close fit of the DNA-RNA hybrid to the surrounding protein leaves no alternative to the pore for access of nucleotides to the active site. (Major conformational changes creating access are unlikely, because they would disrupt protein–nucleic acid contacts important for the fidelity and processivity of transcription.)

Specificity for ribo- rather than deoxyribonucleotides may be attributed to recognition of both the ribose sugar and the DNA-RNA hybrid helix. The 2′-hydroxyl group of a ribonucleotide in the substrate binding site (position +1) is 5 Å from the side chain of the highly conserved Rpb1 residue Asn479. Although this distance is too great for specific interaction, a slightly different positioning of an incoming nucleoside triphosphate might permit hydrogen bonding and discrimination of the ribose sugar. Different positioning of the nucleoside triphosphate could result from chelation by metal B, bound at a site in the structure of free pol II (3). RNA 2′-hydroxyl groups at positions −1, −3, and −5 are at hydrogen bonding distance from the side chains of Rpb1 residue Arg446 and Rpb2 residues His1097 and Gln481. The nucleic acid binding site is, furthermore, highly complementary to the nonstandard conformation of the hybrid helix and not to the standard conformation of a DNA double helix. Such indirect discrimination was previously suggested to contribute to the specificity of T7 RNA polymerase transcription (37).

Recognition of RNA in the transcribing complex from positions –1 to –5, by both hydrogen bonding and indirect discrimination, can contribute to the specificity of RNA synthesis through proofreading. The presence of a deoxyribonucleotide or of an incorrect base anywhere in this region of the RNA will be destabilizing. A back-tracked complex, with previously correctly synthesized RNA in the hybrid region and with the RNA containing the misincorporated nucleotide extruded at the 3′-end, will be favored. The extruded RNA can be removed by cleavage at the active site, through the action of transcription factor TFIIS.

Key nonspecific (van der Waals) contacts to the nucleotide base at the end of the hybrid region, in position +1, are made by residues Thr831 and Ala832 from the Rpb1 bridge helix, as mentioned above. Although highly conserved, the bridge helix is essentially straight in the pol II structures so far determined but bent in the bacterial enzyme structure in the vicinity of the residues corresponding to Thr831 and Ala832 (1, 2). The bend would produce a movement of this region of the bridge helix by 3 to 4 Å, resulting in a clash with the nucleotide at position +1 (Fig. 6). Modeling of a bacterial transcribing complex resulted in such a clash (5). We speculate that the bridge helix oscillates between straight and bent states and that this movement accompanies the translocation of nucleic acids during transcription: Addition of a nucleotide at position +1 would occur in the straight state; translocation to position –1 and movement of nucleic acids through the distance between base pairs, about 3.2 Å, would be accompanied by a conformational change to the bent state; and reversion to the straight state without movement of nucleic acids would create an empty site at position +1 for entry of the next nucleotide, completing a cycle of nucleotide addition during RNA synthesis (Fig. 6).
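The speculative cycle described above can be summarized as a three-state loop. The following Python sketch is a bookkeeping illustration only and makes no claim about kinetics or energetics. The state names, the `run_cycles` function, and the starting transcript length are hypothetical; the 3.2 Å step is the base-pair spacing quoted in the text.

```python
from enum import Enum


class State(Enum):
    OPEN_SITE = "straight helix, position +1 empty"
    POST_SYNTHESIS = "straight helix, RNA 3' end at +1 (state seen in the crystal)"
    TRANSLOCATED = "bent helix, RNA 3' end at -1"


# Each state maps to the step that leaves it and the state that follows.
NEXT = {
    State.OPEN_SITE: ("synthesis", State.POST_SYNTHESIS),
    State.POST_SYNTHESIS: ("translocation", State.TRANSLOCATED),
    State.TRANSLOCATED: ("relaxation", State.OPEN_SITE),
}

RISE_PER_STEP = 3.2  # angstroms moved by the nucleic acids per translocation


def run_cycles(n_cycles, rna_length=0):
    """Walk through n_cycles complete nucleotide addition cycles."""
    state = State.OPEN_SITE
    moved = 0.0
    log = []
    for _ in range(3 * n_cycles):
        step, following = NEXT[state]
        if step == "synthesis":
            rna_length += 1          # one nucleotide added at position +1
        elif step == "translocation":
            moved += RISE_PER_STEP   # nucleic acids shift by one base pair
        # "relaxation" moves nothing: only the bridge helix straightens
        log.append(step)
        state = following
    return state, rna_length, moved, log


final_state, length, distance, steps = run_cycles(2, rna_length=10)
print(final_state.name, length, distance, steps)
```

The point of the sketch is that nucleic acid movement happens only in the translocation step, whereas the relaxation step resets the helix and reopens the substrate site without any further movement.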

Protein-RNA contacts are of special importance at the very beginning of transcription. Nucleoside triphosphates must be held in positions +1 and –1 for the synthesis of the first phosphodiester bond. After translocation to positions –1 and –2, the dinucleotide product must still be held by protein-RNA contacts, as the energy of base-pairing alone is insufficient for retention in the complex. Indeed, RNA is deeply buried in the transcribing complex as far as position –3 (Fig. 1C). Di- and trinucleotides are nevertheless occasionally released, and transcription must restart, resulting in “abortive cycling” (38). RNA is exposed at position –4 and beyond, with no direct protein contacts except for the hydrogen bond at position –5 mentioned above. Coincident with exposure of the RNA, biochemical studies reveal a transition in stability at a transcript length of four residues, beyond which the RNA is generally retained (39). Although the direct protein-RNA contacts observed up to this point may be largely responsible for retention, long-range interactions also play a role. For example, a highly conserved arginine makes long-range electrostatic interactions with the RNA around position –4 (Arg497 in Rpb2, Arg529 in Escherichia coli β), and mutation of this residue results in the overproduction of abortive transcripts (40).

RNA exit. Abortive cycling yields an abundance of two- to three-residue transcripts, as well as transcripts of up to 10 residues (41). An initiating complex evidently undergoes a second transition when the transcript reaches 10 residues in length. At this point, the newly synthesized RNA must separate from the DNA-RNA hybrid and enter an exit channel on the surface of the enzyme, where it remains protected from nuclease attack for about six more residues (42). Three loops extending from the clamp, termed “rudder,” “lid,” and “zipper,” have been suggested to play roles in hybrid dissociation, RNA exit, and maintenance of the upstream end of the transcription bubble (2, 3) (Fig. 4). Modeling of the DNA-RNA hybrid beyond the nine base pairs seen in the transcribing complex structure would produce a clash with the rudder. Extension of the RNA from the last hybrid base pair leads beneath the rudder to the previously proposed “exit groove 1.” Continuation of this RNA path also leads beneath the lid, whose role may be to maintain the separation of RNA and template DNA strands. The zipper may play a similar role in separating template and nontemplate DNA strands. The lid and a small portion of the rudder are disordered in the transcribing complex structure but are ordered in the free pol II structure. The lid and rudder may become ordered in the transcribing complex in conjunction with the second transition (43) and with the establishment of a stable, elongating complex.

Conclusions and prospects. The atomic structure of RNA polymerase II in the act of transcription reveals the protein-DNA and -RNA interactions underlying the process. The structure shows a right angle bend of the DNA path at the active center. This feature is understandable in retrospect. The bend orients the DNA-RNA hybrid optimally for transcription, which occurs along the direction of the hybrid axis. Nucleotides enter through the funnel and pore, add to the RNA at the end of the RNA-DNA hybrid, translocate through the hybrid-binding region, and exit beneath the rudder and lid.

Answers to many long-standing questions about the transcription mechanism may be found in the structure of the clamp. This mobile, multifunctional element does more than close over the nucleic acids in the active center to enhance the processivity of transcription. First, switch regions at the base of the clamp couple its closure to the presence of DNA-RNA hybrid in the active center. This coupling satisfies the dual requirement for retention of nucleic acids during transcript elongation and their release after termination. Second, through the rudder, lid, and zipper, the clamp plays a key role in the events of hybrid melting and template reannealing at the upstream end of the transcription bubble.

Experiments to test the proposed roles for these structural elements by site-directed mutagenesis are among the many that can now be designed on the basis of the structure. In addition, polymerase may be cocrystallized with synthetic transcription bubbles (44) and other forms of RNA and DNA.

  • * Present address: Department of Pharmacology and Experimental Therapy, University of Maryland, 655 West Baltimore Street, HH403, Baltimore, MD 21201, USA.

  • Present address: Institute of Biochemistry, Gene Center, University of Munich, 81377 Munich, Germany.

  • Present address: Department of Molecular Biology and Genetics, Cornell University, 223 Biotechnology Building, Ithaca, NY 14853, USA.

  • § To whom correspondence should be addressed. E-mail: kornberg{at}

