Flexibility in DNA Recombination: Structure of the Lambda Integrase Catalytic Core


Science  04 Apr 1997:
Vol. 276, Issue 5309, pp. 126-131
DOI: 10.1126/science.276.5309.126


Lambda integrase is archetypic of site-specific recombinases that catalyze intermolecular DNA rearrangements without energetic input. DNA cleavage, strand exchange, and religation steps are linked by a covalent phosphotyrosine intermediate in which Tyr342 is attached to the 3ʹ-phosphate of the DNA cut site. The 1.9 angstrom crystal structure of the integrase catalytic domain reveals a protein fold that is conserved in organisms ranging from archaebacteria to yeast and that suggests a model for interaction with target DNA. The attacking Tyr342 nucleophile is located on a flexible loop about 20 angstroms from a basic groove that contains all the other catalytically essential residues. This bipartite active site can account for several apparently paradoxical features of integrase family recombinases, including the capacity for both cis and trans cleavage of DNA.

The integrase protein (Int) of Escherichia coli phage lambda (λ) belongs to a large family of site-specific DNA recombinases from archaebacteria, eubacteria, and yeast (1-3) that catalyze rearrangements between DNA sequences with little or no sequence homology to each other (4-8). Like λ Int, many of these recombinases function in the integration and excision of viral genomes into and out of the chromosomes of their respective hosts. Others function in the decatenation or segregation of newly replicated chromosomes, conjugative transposition, regulation of plasmid copy number, or expression of cell surface proteins. Integrase family members have the distinctive ability to carry out a complete site-specific recombination reaction between two DNAs in the absence of high energy cofactors. DNA cleavage and rejoining is accomplished in two steps. First, a tyrosine hydroxyl attacks the scissile phosphate, nicking the DNA and forming a 3ʹ phosphotyrosine-linked DNA complex. This covalent protein-DNA intermediate is resolved when the 5ʹ terminal hydroxyl of the invading DNA strand attacks the phosphotyrosine linkage and displaces the protein, forming a Holliday junction. The reaction is repeated for the other strand of each DNA partner, generating the recombinant DNA duplexes. It is the transient covalent linkage of protein and DNA that conserves the energy of the broken phosphodiester bond, enabling a pair of reciprocal strand exchanges to proceed.
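The bookkeeping behind this energy conservation can be sketched in a few lines of Python. This is an illustrative model only, not part of the original analysis: the class `LinkageCount` and the functions `cleave`, `ligate`, and `recombination_pathway` are hypothetical names, and the model tracks only the four scissile positions of the two recombining partners.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkageCount:
    """Linkages at the four scissile positions (hypothetical bookkeeping model)."""
    phosphodiester: int   # intact DNA backbone bonds
    phosphotyrosine: int  # covalent 3' phosphotyrosine Int-DNA linkages

    def total(self) -> int:
        return self.phosphodiester + self.phosphotyrosine


def cleave(state: LinkageCount, n: int) -> LinkageCount:
    """Tyrosine attack: n phosphodiester bonds become n phosphotyrosine linkages."""
    if n > state.phosphodiester:
        raise ValueError("not enough intact scissile bonds to cleave")
    return LinkageCount(state.phosphodiester - n, state.phosphotyrosine + n)


def ligate(state: LinkageCount, n: int) -> LinkageCount:
    """5' hydroxyl attack: n phosphotyrosine linkages become n phosphodiester bonds."""
    if n > state.phosphotyrosine:
        raise ValueError("not enough covalent intermediates to resolve")
    return LinkageCount(state.phosphodiester + n, state.phosphotyrosine - n)


def recombination_pathway():
    """Walk through the two rounds of strand exchange described in the text."""
    state = LinkageCount(phosphodiester=4, phosphotyrosine=0)
    steps = [("substrate duplexes", state)]
    state = cleave(state, 2)   # one strand of each partner is nicked
    steps.append(("first covalent intermediates", state))
    state = ligate(state, 2)   # invading strands displace the protein
    steps.append(("Holliday junction", state))
    state = cleave(state, 2)   # the other strand of each partner is nicked
    steps.append(("second covalent intermediates", state))
    state = ligate(state, 2)
    steps.append(("recombinant duplexes", state))
    return steps


if __name__ == "__main__":
    for label, state in recombination_pathway():
        print(f"{label:32s} {state}  total = {state.total()}")
```

In this sketch the total number of linkages never changes from step to step, which is the sense in which the covalent intermediate conserves the energy of the broken phosphodiester bond.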

The 40-kD Int protein, which was first purified by Kikuchi and Nash (9), has been separated into discrete domains by limited proteolysis (10, 11). On the basis of these results, a minimal catalytic domain, termed Int c170 (residues 170 to 356), was cloned and purified. The c170 domain is approximately the same size as the smallest Int family members; it can cleave and ligate DNA and it functions as a type I topoisomerase (11). This catalytic domain encompasses both of the highly conserved sequence motifs of the integrase family (1, 2), including an invariant arginine-histidine-arginine catalytic triad, and the tyrosine nucleophile (12). Amino acid substitutions within the catalytic triad severely impair phage recombination, and they compromise DNA-binding and cleavage in vitro. The results of mutational analyses (13-15) and phosphate interference footprinting (16) of various recombinases belonging to the integrase family are consistent with the suggestion (17) that the Arg-His-Arg triad activates the scissile phosphate for cleavage of DNA. The location of the attacking tyrosine is carboxyl terminal to the catalytic triad in a less conserved region that is rich in acidic residues. Despite their sequence similarity, Int family proteins exhibit a remarkable mechanistic duality. In some cases, the tyrosine nucleophile of one subunit cleaves DNA that is bound by an adjacent subunit (trans cleavage) (17), whereas in other cases the tyrosine cleaves the site bound by the same subunit (cis cleavage) (18). These distinctive modes of DNA cleavage by conserved catalytic residues may reflect intrinsic differences among Int family proteins, or the tailoring of the cleavage mechanism in response to reaction conditions.

The crystal structure of the Int catalytic domain was determined by multiple isomorphous replacement and refined at 1.9 Å resolution (Table 1). Int c170 has a mixed α-β structure consisting of seven α helices and seven β strands (Fig. 1). An α-helical bundle with an unusual packing geometry is cradled by two antiparallel β hairpins, which together form a globular structure of roughly 25 Å by 38 Å by 50 Å. Helices αA, αB, and αC form the core of the domain and they are circumscribed by a helical collar consisting of αD, αE, and αF. The packing angles between these helices are irregular and apparently unprecedented; an automated search of structures in the Protein Data Bank (19) did not identify any proteins with a similar fold. More specifically, the Int c170 domain does not resemble the catalytic domains of other site-specific recombinases of known structure, including the HIV-1 and ASV retroviral integrases (20, 21), MuA transposase (22), and γδ resolvase (23). The Int catalytic domain also differs from those of topoisomerase I and topoisomerase II, both of which have buried tyrosine nucleophiles (24, 25). The base of the molecule is formed by a four-stranded mixed β sheet (β1, β2, β3, β7) with two β-hairpin extensions (β2-β3 and β4-β5) that wrap around the α-helical bundle. The connection between αG and β6 (residues Lys334 to Gln341) is disordered in both molecules of the crystallographic asymmetric unit, implying that this segment is flexible (Fig. 1B). In one Int c170 protomer the catalytic Tyr342 (phenylalanine in the crystal structure) is adjacent to a short β strand (β6) that is hydrogen-bonded to the COOH-terminal strand β7. Tyr342 is not visible in the electron density of the other protomer, and the disordered (flexible) segment extends from Lys334 to Trp350. In both molecules strand β7 is hydrogen-bonded in a parallel orientation to strand β3 at one edge of the β sheet. 
Strand β7 is fixed in this orientation by several buried hydrophobic residues, including Trp350 and Ile353. These contacts anchor β7 to the rest of the protein, thereby defining the distal boundary of the flexible loop and restricting the range of motion of the attacking nucleophile Tyr342.

Table 1.

Summary of crystallographic structure determination. The Int c170 domain of lambda integrase (residues 170 to 356), in which the active site residue Tyr342 is substituted with phenylalanine, was prepared as described (11). Crystals of Int c170 were grown by mixing an equal volume of Int c170 (26 mg/ml) with a well solution containing 50 mM MES (pH 6.15), 75 mM NaCl, 7 mM MgCl2, 40 mM sodium citrate, 1 mM DTT, 0.1 mM EDTA, and 1 mM spermine-HCl. Crystals were grown by vapor diffusion at 22°C during the course of several days, and larger cube-like crystals were obtained by seeding with microcrystals. The Int c170 crystals belong to space group R3 (a = b = 107.3 Å, c = 108.7 Å; triply-primitive hexagonal indexing) with two Int c170 protomers occupying the asymmetric unit. The crystals were flash-frozen in a -160°C nitrogen stream after being stabilized in a well solution containing 30% ethylene glycol. Diffraction data were integrated and scaled with DENZO and SCALEPACK (49). Heavy atom derivatives were obtained by soaking crystals for 12 to 18 hours in well solution containing 1 mM chloro (2,2ʹ:6ʹ,2"-terpyridine)platinum(II) plus 10 mM 2-mercaptoethanol, or 0.5 mM PtCl4, or 1 mM Au(CN)2. Heavy atom parameters were refined with the programs HEAVY (50) and MLPHARE (CCP4, 1979). The initial MIRAS map was subjected to solvent flattening, twofold averaging, and histogram matching (CCP4, 1979) (51-53). A model was built with the program O (54), and electron-density maps were further improved by incorporating SIGMAA-weighted model phases (55). Model refinement by energy minimization and simulated annealing was performed with X-PLOR (56). An explicit bulk solvent correction was applied to the Fcalc values so that low resolution reflections could be included in the refinement. The final structure was checked by inspection of simulated annealing-omit electron density (57). 
Noncrystallographic symmetry restraints were imposed during the early stages of model refinement, then released as high-resolution data were incorporated into the refinement. The final model consists of residues 177 to 333 and residues 342 to 355 for protomer A, residues 177 to 334 and 350 to 355 for protomer B, and 195 bound water molecules. Side chain density is lacking for residues Lys256, Glu269, Glu301, Glu319, Lys320, Asp324, Lys325, and Arg343 in protomer A and Glu269, Glu301, and Asp324 in protomer B, and these residues are modeled as alanines.
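As a plausibility check on the crystal parameters quoted above, the packing density can be estimated with a short Python sketch. The cell dimensions and the two protomers per asymmetric unit come from the table legend; everything else is an assumption made for illustration: a protomer mass of about 187 residues times 110 Da per residue, nine asymmetric units per cell for R3 in the hexagonal setting, and the conventional factor of 1.23 for converting a Matthews coefficient into a solvent fraction. The function names are hypothetical.

```python
import math


def hexagonal_cell_volume(a: float, c: float) -> float:
    """Volume (Å^3) of a hexagonal cell with a = b, gamma = 120 degrees."""
    return a * a * c * math.sin(math.radians(120.0))


def matthews_coefficient(cell_volume: float, asu_per_cell: int,
                         protomers_per_asu: int, protomer_mass_da: float) -> float:
    """Crystal volume per unit of protein mass (Å^3 per Da)."""
    return cell_volume / (asu_per_cell * protomers_per_asu * protomer_mass_da)


def solvent_fraction(vm: float) -> float:
    """Approximate solvent fraction from the Matthews coefficient."""
    return 1.0 - 1.23 / vm


# Values from the table legend.
A_AXIS = 107.3          # Å
C_AXIS = 108.7          # Å
PROTOMERS_PER_ASU = 2

# Assumptions, not values from the paper.
ASU_PER_CELL = 9                 # R3, hexagonal (triply primitive) setting
PROTOMER_MASS_DA = 187 * 110.0   # residues 170 to 356, about 110 Da per residue

if __name__ == "__main__":
    volume = hexagonal_cell_volume(A_AXIS, C_AXIS)
    vm = matthews_coefficient(volume, ASU_PER_CELL, PROTOMERS_PER_ASU, PROTOMER_MASS_DA)
    print(f"cell volume      = {volume:,.0f} Å^3")
    print(f"Matthews coeff.  = {vm:.2f} Å^3/Da")
    print(f"solvent fraction = {solvent_fraction(vm):.2f}")
```

Under these assumptions the estimate comes out at about 2.9 Å³/Da, within the range commonly observed for protein crystals, which is consistent with two protomers in the asymmetric unit.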

Fig. 1.

The lambda integrase catalytic domain is a seven-helix bundle cradled by a β sheet and two β hairpins. The ribbon diagram (A) is colored with a gradient from the NH2-terminus (blue) to the COOH-terminus (red) with the program SETOR (58). (B) Topology diagram of the Int catalytic domain with the flexible loop connecting helix G and strand β6 depicted by spheres. The α helices of Int c170 are intertwined in an irregular packing arrangement that, to our knowledge, has not been seen before. (C) The catalytically essential residues Arg212, His308, and Arg311 face a small depression in the protein surface, located far from the attacking nucleophile Tyr342. Tyr342 is at the junction of the flexible loop (spheres) and strand β6, which forms a hairpin structure with β7 that is, in turn, anchored to the protein core by hydrophobic and electrostatic interactions.

The amino acid sequences of 66 members of the integrase family were aligned with a tree-based algorithm (26), and several striking features emerge (Fig. 2B) (27). First, the invariant residues of the integrase family (1, 2) are all located on the proposed DNA interaction surface of λ Int (Fig. 2, A and B), and many of these residues are essential for catalytic function. In contrast, few of the exposed residues away from the enzyme active site are conserved in the integrase family. Furthermore, many of the buried residues of Int c170 are conserved. In particular, Leu205, Val207, Val208, Leu216, Met219, Leu229, Val231, Ile242, Pro243, Met255, and Leu330 form a highly conserved core surrounding Thr209. This clustering of conserved, functionally important residues around the active site implies that the catalytic domains of integrase family members have similar folds.
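The way invariant positions are read off an alignment can be sketched in Python. This is not the AMPS procedure used for Fig. 2B; it is a minimal per-column scoring under the assumption that the sequences are already aligned, and the four short sequences in `TOY_ALIGNMENT` are invented for illustration rather than taken from any integrase. The function names are hypothetical.

```python
from collections import Counter


def column_conservation(alignment):
    """Fraction of sequences carrying the most common residue in each column.

    Gap characters ('-') are never counted as the conserved residue, but they
    do count toward the denominator, so a gapped column cannot score 1.0.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)
            continue
        most_common_count = Counter(residues).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores


def invariant_columns(alignment):
    """Zero-based indices of columns in which every sequence has the same residue."""
    return [i for i, s in enumerate(column_conservation(alignment)) if s == 1.0]


# Invented toy alignment (hypothetical sequences, not real integrase data).
TOY_ALIGNMENT = [
    "ARLHGR",
    "SRMHAR",
    "ARVH-R",
    "TRIHGR",
]

if __name__ == "__main__":
    print(column_conservation(TOY_ALIGNMENT))
    print(invariant_columns(TOY_ALIGNMENT))
```

A column of hydrophobic residues such as the third one here scores poorly under strict identity even though its chemical character is conserved, which is why buried core positions are usually judged by residue class rather than by identity alone.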

Fig. 2.

(A) Conserved residues cluster around the active site pocket of lambda integrase. These include the Arg212-His308-Arg311 triad (navy), Lys235 and Lys239 (cyan), His333 (magenta), Ser312-Gln328 (green), and Leu331-Gly332 (yellow). Tyr342 (red) is located some distance from the other catalytic residues. (B) Sequence alignment of selected integrase family members showing the conserved hydrophobic residues (brown) that form the core of the Int catalytic domain. This conservation of buried residues strongly implies that integrase family recombinases have similar folds. The other highly conserved motifs (1, 2) are predominantly surface residues that cluster around the enzyme active site, and they are color-coded as for (A). The AMPS program suite (26) was used for alignment of integrase sequences, and (B) was prepared with ALSCRIPT (59). Residue numbers refer to λ Int residues and the secondary structure of the Int catalytic domain is shown. The aligned sequences are: λ, phage lambda Int; HK022, phage HK022 Int; HP1, Haemophilus influenzae phage HP1 Int; P2, phage P2 Int; XerD, E. coli XerD; L5, Mycobacterium spp. phage L5 Int; Tn1545, Streptococcus pneumoniae transposase Tn1545; P22, phage P22 Int; P1, Cre of phage P1; SSV1, Sulfolobus phage SSV1; mcoc, Methanococcus jannaschii putative Int; Flp, Saccharomyces cerevisiae Flp.

A shallow groove that is approximately 25 Å wide runs along one face of the protein (Figs. 1 and 2). It is circumscribed by αC, αD, αE, αF, and the β2-β3 hairpin. Conserved, polar residues facing this groove include Arg212, Asp215, His308, Arg311, Ser312, Gln328, and His333 (Fig. 2). The catalytic triad residues Arg212, His308, and Arg311 lie within about 7 Å of each other at the base of the groove (Arg212 and Arg311 are within 3.4 Å, and His308 is 6.4 Å from Arg311 and 7.2 Å from Arg212). His308 and Arg311 extend from the face of αF (Fig. 1), and Arg212 is located at the NH2-terminus of the short αC helix, where it forms a water-mediated interaction with Asp215. The strict conservation of the Arg-His-Arg triad within the Int family, in conjunction with the demonstrated roles of these residues in binding and cleavage of DNA or DNA ligation (13-15, 18), implicates this surface as part of the enzyme active site. Other basic residues facing the groove of Int c170 include Arg177, Arg179, Lys235, Arg287, Arg291, Arg293, Lys294, and Arg317, and these bolster the positive electrostatic character of this surface. The shape and charge of this surface of the protein, together with the presence of essential catalytic residues, suggest that this groove is the DNA-binding surface of the Int catalytic domain. This is consistent with the identification of six amino acid substitutions in λ integrase that decrease recombinase activity for the phage λ attachment site (att site) DNA sequence and promote activity on the att sites of the closely related phage HK022 (28, 29). All but one of these six residues are located in the Int c170 domain, where they face the surface of the proposed DNA-binding groove. The phenotypes of other mutants, along with the pattern of conserved amino acids, suggest that the highly conserved Gly332-His333 residues preceding the loop and the Tyr342 nucleophile in the middle of the loop (Fig. 2, A and B) also interact with DNA during catalysis.
Residue substitutions involving Gly332 and Tyr342 have varied effects on DNA binding, DNA bending, and catalysis by λ Int and FLP (14, 30).

The catalytic Tyr342 (phenylalanine in the crystal structure), which covalently links to DNA during Int-mediated recombination, must be activated for nucleophilic attack of the scissile phosphate. This most likely occurs by donating the tyrosine hydroxyl proton to a nearby Lewis base. In the Int c170 structure, the phenyl ring of Tyr342 is sandwiched between Glu349 and Asp351. These or other nearby acidic residues like Asp344 (His in some Int family proteins) (Fig. 2B), Asp345, or Glu354 could activate Tyr342 or aid in positioning it on DNA. These acidic residues are not strictly conserved in the integrase-related proteins, but the acidic character of this segment (Fig. 2) is a consistent feature of these recombinases.

One striking feature of the Int catalytic domain structure is the location of the catalytic Tyr342 on an exposed 17-amino acid loop extending from Lys334 to Trp350 between αG and β7 (Fig. 1D). In one protomer no electron density is evident for strand β6. In the other protomer, a crystal contact stabilizes the orientation of β6 and the β6 to β7 turn, and the atomic B factors of this segment are high, averaging 58 Å² as opposed to 25 Å² elsewhere in the protein. Downstream of the flexible segment, β7 contacts the protein core in both protomers and thereby restricts the range of motion of Tyr342. The finding that Tyr342 is flexibly tethered to the rest of the protein by a peptide loop is consistent with the proteolytic sensitivity of this segment in both λ Int and yeast FLP recombinases (11, 31). Three- or four-residue insertions in this region of FLP inactivate strand cleavage, although DNA-binding activity is retained (31).
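To give the B factors quoted above a physical scale, they can be converted to root-mean-square atomic displacements. The sketch below assumes the standard isotropic relation B = 8π²⟨u²⟩ and ignores any contribution from static disorder or model error; the function name `rms_displacement` is hypothetical, and the two input values are the averages reported in the text.

```python
import math


def rms_displacement(b_factor: float) -> float:
    """RMS displacement (Å) from an isotropic B factor (Å^2), using B = 8 * pi^2 * <u^2>."""
    if b_factor < 0:
        raise ValueError("B factor must be non-negative")
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


B_LOOP = 58.0   # Å^2, average for the β6 segment (from the text)
B_CORE = 25.0   # Å^2, average elsewhere in the protein (from the text)

if __name__ == "__main__":
    u_loop = rms_displacement(B_LOOP)
    u_core = rms_displacement(B_CORE)
    print(f"loop: {u_loop:.2f} Å, core: {u_core:.2f} Å, ratio: {u_loop / u_core:.2f}")
```

On this estimate the loop atoms are displaced by roughly 0.86 Å compared with about 0.56 Å in the rest of the protein, about one and a half times as much, even in the protomer where a crystal contact holds the segment in place.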

Flexible loops are a common feature of proteins that bind to the phosphates of DNA or nucleotides. Examples include the glycine-rich loop of nucleotide-binding folds (32), the active site loop of protein kinases (33, 34), the DNA-binding loop of deoxyribonuclease (DNase I) (35), the surface loops of the γδ resolvase catalytic domain (36, 37), and the loop harboring the conserved glutamic acid of the D-D-E motif in HIV-1 integrase (20) and MuA transposase (22). The movable loop of λ Int could shield the phosphotyrosine intermediate from solvent-mediated hydrolysis, or it could contribute to the correct positioning of the scissile phosphate for cleavage. Serine and threonine are prominent in the loops of the integrase-related proteins, and these residues could bind to DNA phosphates.

The relatively long flexible loop that loosely tethers the active site Tyr342 to the body of the protein explains one of the major mechanistic paradoxes concerning DNA cleavage by Int family recombinases. Conservative site-specific recombination involves the cleavage, exchange, and rejoining of four DNA strands. Although it is generally recognized that this reaction is executed by four protomers (one for each DNA strand cleaved), experiments addressing the roles of individual protomers support both cis cleavage and trans cleavage of DNA by Int family proteins. In the cis cleavage model, a single protomer provides both the catalytic tyrosine and the Arg-His-Arg triad for cleavage of one DNA strand (8, 18). In the trans cleavage model, the tyrosine from one protomer cleaves a DNA strand that is bound and activated by the Arg-His-Arg triad of a neighboring protomer (17). A substantial body of evidence indicates that FLP cleaves DNA in trans, and current data suggest that the two collaborating protomers are bound to the same DNA (as opposed to synapsing partners) during at least one step of recombination (38). Trans cleavage has also been reported for λ Int and Cre (39, 40). However, other experiments point to a cis cleavage mechanism for λ Int (18) and the related XerC-XerD recombinases (15). These findings have raised fundamental questions of how closely related proteins, and possibly the same protein, can cleave DNA either in cis or in trans (3, 8, 38, 41, 42).

In considering how the λ Int catalytic domain interacts with att site DNA, we are guided by the experimental observations that Tyr342 covalently attaches to the scissile phosphate and that the Arg311 homologue of the closely related XerD recombinase directly contacts the scissile phosphate (16). We have positioned two Int c170 protomers on an idealized att site in a configuration that accounts for the interactions described above and that places the positively charged groove of the protein in contact with DNA. For clarity, we show only one of the two Int protomers modeled on DNA (Fig. 3). Both subunits lie to the 3ʹ-side of their respective cut sites with the dimer interface located over the major groove surface of the overlap region between cut sites. The NH2-terminal end of each c170 protomer lies over the minor groove at the 3ʹ-end of the consensus att core sequence 5ʹ-CAACTT-3ʹ, adjacent to its major groove surface. This orientation is consistent with evidence that residues upstream of the c170 fragment are important for core site recognition by lambda Int and related proteins (11, 28, 29, 43). Our theoretical model of the Int c170-DNA complex is also consistent with the pattern of methylation protection for Int-DNA complexes, in which the minor groove adjacent to the cut site and the major groove surfaces of the overlap region and of the core binding sites are protected by Int along one face of the helix (44).

Fig. 3.

Theoretical model of the λ Int catalytic core bound to a B-form half-att site. A full att site contains a pair of inverted core-type Int binding sites. An Int protomer at each site is responsible for cleaving one DNA strand via formation of a covalent 3ʹ phospho-tyrosine linkage and a free 5ʹ-hydroxyl. The two nicks are staggered by seven base pairs with a 5ʹ overhang. For clarity, only one subunit of the Int c170 dimer that was modeled on DNA is shown. The catalytic Arg-His-Arg triad (cyan) of Int is docked over one of the scissile phosphates (shown as breaks in the DNA ribbon). The Cα trace of Int c170 (blue) is displayed with the active site loop containing the Tyr342 nucleophile shown in two alternative conformations. The orientation corresponding to cis cleavage (orange tyrosine) is a theoretical model, whereas that corresponding to trans cleavage (red tyrosine) is present in one of two Int protomers in the crystal structure. The segment of the loop that is disordered in both protomers (Lys334 to Gln341) is modeled in pink.

The active site loop of λ Int is disordered in the unliganded structure of Int c170, and it could deliver Tyr342 to the Arg-His-Arg catalytic triad in either a cis or a trans configuration. The cis-cleavage mode requires the disruption of the β6-β7 hairpin, which is only visible in one of the two protomers in the crystal structure, and the movement of Tyr342 toward the active site center. This is consistent with the observation that the topoisomerase activity of λ Int is enhanced by deletion of residues 350 to 356 (β7) (14). Alterations that remove or destabilize the β6-β7 hairpin are likely to release the active site Tyr342, and these findings support the possibility that a fixed orientation of Tyr342 may not be required or preferred for enzymatic activity. The λ Int loop can readily be modeled in a cis configuration that orients Tyr342 for an in-line attack of the scissile phosphate coordinated by the Arg-His-Arg triad (Fig. 3). In the trans cleavage mode, Tyr342 would engage one scissile phosphate and the catalytic triad of the same protomer would bind to the opposite scissile phosphate located some 25 Å away in a B-form att site. Indeed, the Tyr342 of one Int protomer in the asymmetric unit extends away from the protein core and it is located 23 Å from His308 and about 17 Å from Arg212 and Arg311 (Figs. 1 and 3). In this orientation, each Int protomer can bridge the scissile phosphates of the att site in a trans cleavage mode. It is to be noted that FLP, for which trans cleavage has been most thoroughly documented, and all five of the other eukaryotic recombinases, have loop segments that are five to eight residues longer than those of other Int family proteins (3). These additional residues may restrict the tyrosine-containing loop of FLPs to the trans orientation. 
The structure of the λ Int catalytic domain provides an explanation for both cis and trans DNA cleavage reactions catalyzed by closely related recombinases and, perhaps, catalyzed by the same protein.
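The distances involved in the two cleavage modes can be checked with simple geometry. The Python sketch below is a rough estimate, not a structural model. It assumes a B-form helical rise of 3.4 Å per base pair and a maximum spacing of 3.8 Å between consecutive Cα atoms, and it treats residue 333 as the last ordered anchor before the disordered segment, as listed for protomer A in Table 1. The seven base pair stagger and the 17 to 23 Å distances between Tyr342 and the triad come from the text; the function names are hypothetical.

```python
RISE_PER_BP = 3.4       # Å, assumed B-form helical rise per base pair
CA_CA_SPACING = 3.8     # Å, assumed upper limit between consecutive Cα atoms


def axial_separation(n_bp: int, rise: float = RISE_PER_BP) -> float:
    """Separation along the helix axis (Å) between two sites n_bp base pairs apart."""
    return n_bp * rise


def max_tether_reach(anchor_residue: int, tip_residue: int,
                     spacing: float = CA_CA_SPACING) -> float:
    """Upper bound (Å) on the Cα-Cα distance spanned by a fully extended chain segment."""
    if tip_residue < anchor_residue:
        raise ValueError("tip residue must follow the anchor residue")
    return (tip_residue - anchor_residue) * spacing


STAGGER_BP = 7                  # spacing between the two nicks (from the text)
TYR_TO_HIS308 = 23.0            # Å, observed in one protomer (from the text)
TYR_TO_ARG212_ARG311 = 17.0     # Å, approximate (from the text)

if __name__ == "__main__":
    axial = axial_separation(STAGGER_BP)
    reach = max_tether_reach(333, 342)
    print(f"axial spacing of the scissile phosphates: {axial:.1f} Å")
    print(f"upper bound on loop reach (333 to 342):  {reach:.1f} Å")
    print(f"observed Tyr342-triad distances: {TYR_TO_ARG212_ARG311:.0f} to {TYR_TO_HIS308:.0f} Å")
```

The axial component alone, about 24 Å, is close to the 25 Å quoted for the separation of the scissile phosphates and to the observed Tyr342-His308 distance, as expected for the trans mode. The loop's contour length is only an upper bound on its reach and says nothing about steric accessibility, so the most it shows is that the tether is long enough not to rule out the cis mode.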

The folding of disordered segments within the DNA-binding domains of many proteins accompanies their binding to specific DNA targets, and such an "induced fit" may enhance discrimination against nonspecific sites (45). The productive interaction of a site-specific recombinase with DNA generates double-stranded breaks, so it is essential that DNA-binding or cleavage activity is stringently controlled. Because λ Int and other recombinases show relatively high nonspecific DNA binding (44), strand cleavage must be tightly controlled. The flexible active site loop of λ Int may provide such a control mechanism. In the case of cis cleavage, the flexible loop may adopt the catalytically competent orientation only in the context of the correct multiprotein-DNA complex, thereby keeping the active site tyrosine at bay until the recombination apparatus is poised to proceed. In the case of trans cleavage, the requirement for two precisely aligned protomers could serve to restrain inappropriate cleavages, as proposed for FLP (17). A second potential role of the flexible catalytic loop may be to accommodate strand transfer during synapsis. Movement of the protein-DNA complex may be required for access of the invading strand to the catalytic center, and this movement might be facilitated by adjustment of the active site loop.

The structure of λ Int c170 with its flexibly tethered tyrosine nucleophile also suggests a mechanism to explain another variable property of Int family recombinases. Some members of the Int family are, like λ Int, quite fastidious in their requirement for DNA-DNA homology within the overlap regions of two recombining partners (46). Other Int family recombinases, such as the Tn1545 and Tn916 transposases, are more relaxed in their response to heterologies between recombining partners (47, 48). This difference might be due in part to the variety of amino acids occupying this segment of Int family recombinases (Fig. 2B). Alterations in the delivery arm of Tyr342 could be expected to significantly influence such parameters as the rate, reversibility, and specificity of the DNA cleavage and ligation reactions.


We thank E. Healey for purified protein, S. Nunes-Düby for help with sequence alignments, R. Sweet for assistance with data collection at beamline X-12C, National Synchrotron Light Source (Upton, NY), T. Oliveira for technical assistance, J. Boyles for assistance with manuscript preparation, and J. Cheah, S. Doublié, S. Nunes-Düby and other members of our research groups for their assistance and comments. Supported by the Lucille P. Markey Charitable Trust (TE) and NIH grants AI13544 and GM33928 (AL) and a Howard Hughes Medical Institute predoctoral fellowship (HJK). The coordinates have been deposited in the Brookhaven Protein Data Bank with accession number 1AE9.