The Crystal Structure of TAL Effector PthXo1 Bound to Its DNA Target

Science  10 Feb 2012:
Vol. 335, Issue 6069, pp. 716-719
DOI: 10.1126/science.1216211


DNA recognition by TAL effectors is mediated by tandem repeats, each 33 to 35 residues in length, that specify nucleotides via unique repeat-variable diresidues (RVDs). The crystal structure of PthXo1 bound to its DNA target was determined by high-throughput computational structure prediction and validated by heavy-atom derivatization. Each repeat forms a left-handed, two-helix bundle that presents an RVD-containing loop to the DNA. The repeats self-associate to form a right-handed superhelix wrapped around the DNA major groove. The first RVD residue forms a stabilizing contact with the protein backbone, while the second makes a base-specific contact to the DNA sense strand. Two degenerate amino-terminal repeats also interact with the DNA. Containing several RVDs and noncanonical associations, the structure illustrates the basis of TAL effector–DNA recognition.

TAL effectors are proteins that are injected into plant cells by pathogens in the bacterial genus Xanthomonas. There they enter the nucleus, bind to effector-specific promoter sequences, and activate the expression of individual plant genes, which can either benefit the bacterium or trigger host defenses (1, 2). In each TAL effector, a variable number of tandem amino acid repeats (which are usually 34 residues in length), terminated by a truncated “half repeat,” mediates DNA recognition. Each of the repeats preferentially associates with one of the four nucleotides in the target site (3, 4). The repeats are located centrally in the protein between N-terminal sequences required for bacterial type III secretion and C-terminal sequences required for nuclear localization and activation of transcription (Fig. 1A).

Fig. 1

(A and B) Domain organization of PthXo1 and (C) structure of a single TAL effector repeat. TAL effectors contain N-terminal signals for bacterial type III secretion, tandem repeats that specify the target nucleotide sequence, nuclear localization signals (NLS), and a C-terminal region that is required for transcriptional activation. PthXo1 contains 23.5 canonical repeats (color coded to match Fig. 2) that contact the DNA target found in the promoter of the rice Os8N3 gene (15). Blue bases correspond to positions in the target where the match between protein and DNA differs from the optimal match specified by the recognition code (3, 4). Arrows indicate the start and end of the crystallized protein construct. In the structure, repeats 22 to 23.5 are poorly ordered, as are the C termini of the two N-terminal cryptic repeats. The sequence and structure of a representative repeat (number 14) are shown; RVD residues (HD) that recognize cytosine are shown in red. Single-letter abbreviations for the amino acid residues are as follows: A, Ala; C, Cys; D, Asp; E, Glu; G, Gly; H, His; I, Ile; K, Lys; L, Leu; N, Asn; P, Pro; Q, Gln; R, Arg; S, Ser; T, Thr; and V, Val.

The nucleotide specificity of individual TAL effector repeats is encoded by two adjacent residues (located at positions 12 and 13) called the repeat-variable diresidue (RVD) (Fig. 1, B and C) (4). More than 20 unique RVD sequences have been observed in TAL effectors, but just seven—HD, NG, NI, NN, NS, “N*” (which corresponds to a 33-residue repeat in which the RVD appears to be missing its second residue), and HG—account for nearly 90% of all repeats (5) and, respectively, specify C, T, A, G/A, A/C/T/G, C/T, and T (3, 4). These relationships enable prediction of targets for existing TAL effectors and engineering of artificial TAL effectors that bind DNA sequences of choice. Consequently, TAL effectors have received much attention as DNA-targeting tools (6).
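The RVD-to-nucleotide relationships listed above can be written as a simple lookup table. The following is a minimal Python sketch for illustration only, not a tool used in this study; the names RVD_CODE, predict_target, and matches are hypothetical, and the table holds only the seven RVDs and specificities stated in the text.

```python
# Specificities of the seven most common RVDs, as listed in the text (3, 4).
# Each value is the set of nucleotides the RVD is reported to specify.
RVD_CODE = {
    "HD": "C",
    "NG": "T",
    "NI": "A",
    "NN": "GA",
    "NS": "ACTG",
    "N*": "CT",
    "HG": "T",
}


def predict_target(rvds):
    """Return, for each repeat in order, the set of nucleotides its RVD specifies."""
    return [set(RVD_CODE[rvd]) for rvd in rvds]


def matches(rvds, site):
    """True if every base of `site` is among those specified by the aligned RVD."""
    if len(rvds) != len(site):
        return False
    return all(base in allowed for base, allowed in zip(site, predict_target(rvds)))
```

For example, matches(["HD", "NG", "NI", "NN"], "CTAA") is true because NN accepts either G or A. A yes-or-no match of this kind is a simplification: natural targets, including the PthXo1 site, contain positions where the aligned base differs from the optimal match.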

Nearly all TAL effector binding sites observed in nature are preceded by a T (3, 4). Notably, the protein sequence immediately preceding the canonical TAL effector repeats bears some similarity to the repeat consensus. It has therefore been suggested that this region of the protein may participate in DNA binding by forming a cryptic repeat structure that specifies the T (7).

A recent nuclear magnetic resonance structural study of 1.5 repeats of TAL effector PthA, and an accompanying small-angle x-ray scattering study of the entire protein, indicated that an isolated TAL effector repeat is largely α-helical, similar to a tetratricopeptide (TPR) fold, and that the full-length protein compacts upon DNA binding (8). However, in that study, it was unclear to what extent the structure of repeats in the context of the entire protein might differ from that of an isolated repeat, and the manner in which individual repeats associate with contiguous DNA base pairs was not resolved.

A protein construct corresponding to residues 127 to 1149 of the 23.5 repeat TAL effector PthXo1 from the rice pathogen Xanthomonas oryzae (Fig. 1 and fig. S1) was crystallized bound to a 36–base pair DNA duplex (table S1) containing the target sequence found in the rice genome along with flanking sequences ending in short 3′ overhangs. The structure was determined with a high-throughput computational approach in which structural models built with the Rosetta software package (9) were iteratively refined and selected, guided by molecular replacement searches (fig. S2). The best model was subsequently validated with a variety of model-free features of electron density, including anomalous difference peaks calculated from a selenomethionyl derivative (fig. S3). The final structure was refined to 3.0 Å resolution to values for Rwork/Rfree of 0.264/0.294 and excellent geometry (Table 1).
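For readers less familiar with the refinement statistics quoted above, the conventional crystallographic R-factor can be written as follows. This is the standard textbook definition rather than a formula taken from this report; the symbols F_obs and F_calc (observed and model-calculated structure factor amplitudes) are notation introduced here, and any scale factor between them is omitted.

```latex
R = \frac{\sum_{hkl} \bigl|\, |F_{\mathrm{obs}}(hkl)| - |F_{\mathrm{calc}}(hkl)| \,\bigr|}
         {\sum_{hkl} |F_{\mathrm{obs}}(hkl)|}
```

Rwork is evaluated over the reflections used in refinement, and Rfree over a small set of reflections withheld from refinement, so the difference between the two reported values (0.294 versus 0.264) serves as a check against overfitting of the model.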

Table 1

Crystallographic data and refinement statistics. WT, wild type; SeMet, selenomethionine; RMSD, root mean square deviation; ALS, Advanced Light Source; APS, Advanced Photon Source.

The structure consists of a relatively unperturbed B-form DNA duplex, with 23 consecutive bases of the target site intimately engaged in the major groove by a superhelical arrangement of TAL effector repeats (Fig. 2). The overall dimensions of the protein-DNA complex are approximately 60 Å by 60 Å by 90 Å. The quality of the electron density is excellent from repeat 1 through the middle of repeat 22, and then becomes less well defined.

Fig. 2

Structure of the PthXo1 DNA binding region in complex with its target site. The coloring of individual repeats matches the schematic in Fig. 1.

All of the repeats in the DNA-bound PthXo1 structure form highly similar two-helix bundles (Fig. 1C). The helices span positions 3 to 11 and 14 to 33, locating the RVD in a loop between them. A proline located at position 27 creates a kink in the second helix that appears to be critical for the sequential packing and association of tandem repeats with the DNA double helix. The packing of consecutive helices within and between individual repeats is left-handed, in contrast to the right-handed packing of helices found in TPR proteins (10). The modular architecture of the TAL effector repeats is reminiscent of the mitochondrial transcription terminator mTERF (11) and the RNA-binding attenuation protein TRAP (12); however, interactions of those proteins with their nucleic acid targets are structurally distinct from those of TAL effectors with DNA and lack modular correspondence to single nucleotides.
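The repeat topology described above can be summarized as a position-to-element map. This is an illustrative Python sketch under stated assumptions: the function name repeat_element is hypothetical, the numbering assumes the common 34-residue repeat, and positions 1, 2, and 34, which the text does not assign to a helix, are labeled simply as "unassigned".

```python
# Position 27 is the proline that kinks the second helix.
KINK_POSITION = 27


def repeat_element(position):
    """Classify a position (1 to 34) of a canonical 34-residue repeat."""
    if not 1 <= position <= 34:
        raise ValueError("position must be between 1 and 34")
    if 3 <= position <= 11:
        return "helix 1"
    if position in (12, 13):
        return "RVD loop"
    if 14 <= position <= 33:
        return "helix 2"
    # Positions 1, 2, and 34 are not assigned to either helix in the text.
    return "unassigned"
```

Calling repeat_element(KINK_POSITION) returns "helix 2", reflecting that the kink lies within the second helix rather than between the two helices.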

Sequence-specific contacts of PthXo1 to the DNA are made exclusively by the second residue in each RVD to the corresponding base on the sense strand. In contrast, the side chain at the first position of each RVD contacts the backbone carbonyl oxygen of position 8 in each repeat, constraining the RVD-containing loop (Fig. 3). Additional, nonspecific contacts to the DNA are made by a lysine and glutamine found at positions 16 and 17. The average root mean square deviation between backbone atoms in any two repeats in the PthXo1 structure is ~0.8 Å for all atoms; it is slightly greater for the 33-residue N* repeats, which are missing one residue in the RVD loops (fig. S4). The positions within the core of individual repeats are occupied entirely by small aliphatic residues, whereas several positions in the interface between repeats correspond to polar residue pairs.
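The root mean square deviation quoted above is the square root of the mean squared distance between corresponding atoms. The sketch below shows that calculation in Python; it is not the procedure used in this study. The function name rmsd is hypothetical, and the code assumes the two atom sets have already been paired one-to-one and optimally superposed, a step that is omitted here. The 33-residue N* repeats would need to be aligned to the 34-residue repeats before their atoms could be paired.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) coordinates.

    Assumes the atoms are paired in order and the structures are
    already superposed; no fitting is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

As a toy check, two atom sets that differ by a uniform 1 Å shift along one axis give an RMSD of 1 Å.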

Fig. 3

Topology and contacts between TAL effector repeats and DNA bases. (A) Eight distinct combinations of RVDs and DNA bases are observed in the structure. HD (repeat 5) forms a steric and electrostatic contact with cytosine; HG (repeat 4) and NG (repeat 6) both form nonpolar interactions between the glycine α-carbon and the thymine methyl group. A “mismatch” between NG and a cytosine (repeat 11) results in a longer distance from the RVD to the base. NN associates with either guanine (repeat 16) or with adenine (which would interact with the same N7 nitrogen of the purine base). NI forms a desolvating interface with either adenine (repeat 3) or cytosine (repeat 19). The reduction in loop length by one residue in the N* RVD (repeat 7) results in an increased distance to the base. (B) Two adjacent repeats form a tightly packed left-handed bundle of helices that position the second amino acid of each RVD in proximity to corresponding consecutive bases in an unperturbed B-form DNA duplex. The first residue of each RVD (position 12, either His or Asn) forms H-bonds to the backbone carbonyl oxygen of amino acid position 8 of the same repeat.

The PthXo1-DNA structure displays five HD-containing repeats (all aligned to cytosines), four NG repeats and one HG repeat (aligned to thymines), one additional NG repeat aligned to cytosine, seven NI repeats (aligned with four adenosines and three cytosines), two NN repeats (both opposite a guanosine), and two N* repeats paired to cytosines (Fig. 1A). The observed contacts by individual repeats (Fig. 3) correlate well with their specificity and fidelity (or lack thereof) that have been described via bioinformatic and genetic analyses. The sole NS in PthXo1 and one additional N* are located in the last full repeat and the half repeat, respectively, which are disordered in the structure.
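The repeat tally given above can be checked with a short calculation. The Python sketch below only restates the counts from the text; the variable names are hypothetical.

```python
# (RVD, aligned base) -> number of ordered repeats, as enumerated in the text.
ordered = {
    ("HD", "C"): 5,
    ("NG", "T"): 4,
    ("HG", "T"): 1,
    ("NG", "C"): 1,
    ("NI", "A"): 4,
    ("NI", "C"): 3,
    ("NN", "G"): 2,
    ("N*", "C"): 2,
}

# RVDs in the disordered last full repeat and half repeat.
disordered = {"NS": 1, "N*": 1}

n_ordered = sum(ordered.values())                 # repeats with observed contacts
n_total = n_ordered + sum(disordered.values())    # all repeat units in PthXo1
```

The ordered repeats sum to 22 and the total to 24 repeat units, that is, 23 full repeats plus the terminal half repeat, consistent with the 23.5 repeats of PthXo1. The eight keys of the ordered table correspond to the eight distinct RVD-base combinations shown in Fig. 3A.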

In the HD RVDs, the aspartate residue makes van der Waals contacts with the edge of the corresponding cytosine base and a hydrogen bond to the cytosine N4 atom. Contacts between cytosine bases in protein-DNA complexes and charged acidic side chains, which exclude alternative base identities via physical and electrostatic clash, have been observed in a wide variety of solved sequence-specific protein-DNA complexes (13).

Both the NG and HG repeats make a contact in which the backbone α carbon of the glycine residue forms a nonpolar van der Waals interaction with the methyl group of the opposing thymine base (average distance ~3.3 Å). At the one position where an NG is aligned opposite a cytosine base, the backbone carbonyl and α carbon of the same glycine residue display a less favorable, far more distant contact (~6 Å).

The second asparagine residue in the NN RVDs is positioned to make a hydrogen bond with the N7 nitrogen of an opposing guanine base. This RVD associates with either guanine or adenine with roughly equal frequency (3, 4, 14); the availability of an N7 nitrogen in either purine ring appears to explain that observation (13).

PthXo1 contains two 33-residue N* repeats (7 and 22). Because RVDs are followed immediately by two conserved glycine residues, this repeat is equivalent to an NG repeat in which one of those glycine residues is missing. The crystal structure indicates that the deletion results in a truncated RVD loop that extends less deeply into the DNA major groove, with the glycine at position 13 located a considerable distance (>6 Å) from the corresponding sense strand base. Consistent with this observation, the observed specificity of the N* repeat is relatively lax (4).

Finally, NI, which is the second most common RVD overall, accounting for roughly 20% of all TAL effector repeats, occurs seven times in PthXo1 and displays an unusual contact pattern to adenosine or cytosine bases. The aliphatic side chain of the isoleucine residue is observed to make nonpolar van der Waals contacts to C8 (and N7) of the adenine purine ring, or to C5 of the cytosine pyrimidine ring. These contacts would appear to necessitate desolvation of at least one polar atom in the adenosine ring, without the formation of a compensating hydrogen bond, and might therefore reasonably be expected to represent a reduced-affinity interaction.

N-terminal to the canonical repeats, the PthXo1 structure reveals two degenerate repeat folds that appear to cooperate to specify the conserved thymine that precedes the RVD-specified sequence (Fig. 4). We have designated these as the 0th and –1st repeats. Residues 221 to 239 and residues 256 to 273 each form a helix and an adjoining loop that resembles helix 1 and the RVD loop in the canonical repeats; the remaining residues in each region are poorly ordered. Those two N-terminal regions converge near the 5′ thymine base, with the indole ring of tryptophan 232 (in the –1st repeat) making a van der Waals contact with the methyl group of that base. Mutation of the thymine reduces TAL effector activity at the target (3, 15). Tryptophan-232, as well as the surrounding residues, is highly conserved across available, intact TAL effector sequences. Some TAL effectors efficiently target sequences preceded by a cytosine rather than a thymine (14, 16). Though less favorable, the packing of tryptophan 232 would be expected to accommodate this substitution.

Fig. 4

N-terminal cryptic repeats and contacts with 5′ thymine. (A) 2Fo-Fc electron density maps contoured around thymine at position “0” and tryptophan 232 in the “–1” repeat. (B) Residues 221 to 239 and residues 256 to 273 each form a helix and an adjoining loop that resembles helix 1 and the RVD loop in the canonical repeats; the remaining residues in each region are poorly ordered. W232 forms a nonpolar van der Waals contact with the methyl carbon of the thymine base at position 0.

In addition to revealing folding and interactions of the N-terminal cryptic repeats with the 5′ end of the DNA target site and illustrating the functions of the six most common repeat types in TAL effector–DNA recognition, the structure provides a basis for prediction of structures that are not represented. For example, an alignment of the 35-residue repeat type found in some TAL effectors with the more common 34-residue repeat type found in PthXo1 (fig. S5) indicates that the additional residue (a proline) at position 33 would be located within the relatively disordered turn region that connects the helices of one repeat to the next. The 35-residue repeat therefore can be predicted to be functionally indistinguishable from the 34. Likewise, although the sole NS repeat in PthXo1 is in an apparently disordered part of the protein-DNA complex, the overall homogeneity of the repeat structures and the consistent role of the first RVD residue in stabilizing the RVD loop to facilitate specific contacts of the second residue with the DNA should make it possible to computationally model the potential nucleotide interactions of NS, as well as those of rare or artificial RVDs.

The protein-DNA complex studied leaves some questions unanswered, such as the structure of the N- and C-terminal portions of TAL effectors that are, respectively, required for translocation and interaction with host transcriptional machinery. As well, because of the observed disorder at either end, it does not yet precisely define the minimal TAL effector–DNA binding domain. However, by demonstrating the essential features that accomplish interaction specificity, the structure provides a foundation for more accurately predicting and efficiently exploiting TAL effector–DNA targeting. More fundamentally, it reveals the hitherto enigmatic structural nature of a simple solution that an important group of pathogens has evolved to manipulate host gene expression in a specific yet highly adaptable manner.

Supporting Online Material

Materials and Methods

Figs. S1 to S5

Table S1

References (17–29)

References and Notes

  1. Acknowledgments: This project was funded by NIH (grants RL1 0CA833133 to B.L.S., R01GM098861 to B.L.S. and A.J.B., and R01 GM088277 to P.H.B.), NSF grant 0820831 to A.J.B., and a Searle Scholars Fellowship to P.H.B. A.N-S.M. was supported by a training grant from the Northwest Genome Engineering Consortium. We thank the staff of the Advanced Light Source beamline 5.0.2 and L. Doyle, B. Shen, R. Takeuchi, J. Bolduc, and C. Schmidt for technical assistance and advice; T. Edwards and M. Clifton for collecting SeMet data; and C. Pabo for helpful discussion. A.J.B. is an inventor on a patent application titled “TAL effector-mediated DNA modification” (US-2011/0145940-A1 and PCT/US2010/059932). This intellectual property, co-owned by Iowa State University and the University of Minnesota, has been licensed to Cellectis. The refined coordinates and corresponding x-ray intensities for the PthXo1-DNA structure have been deposited in the RCSB Protein Data Bank (accession code 3UGM).