A General Strategy for Selecting High-Affinity Zinc Finger Proteins for Diverse DNA Target Sites


Science  31 Jan 1997:
Vol. 275, Issue 5300, pp. 657-661
DOI: 10.1126/science.275.5300.657


A method is described for selecting DNA-binding proteins that recognize desired sequences. The protocol involves gradually extending a new zinc finger protein across the desired 9- or 10-base pair target site, adding and optimizing one finger at a time. This procedure was tested with a TATA box, a p53 binding site, and a nuclear receptor element, and proteins were obtained that bind with nanomolar dissociation constants and discriminate effectively (greater than 20,000-fold) against nonspecific DNA. This strategy may provide important information about protein-DNA recognition as well as powerful tools for biomedical research.

Design of DNA-binding proteins that will recognize desired sites on double-stranded DNA has been a challenging problem. Although a number of DNA-binding motifs have yielded variants with altered specificities, zinc finger proteins related to TFIIIA (1) and Zif268 (2) appear to provide the most versatile framework for design. Modeling, sequence comparisons, and phage display have been used to alter the specificity of an individual zinc finger within a multifinger protein (3–7), and fingers also have been “mixed and matched” to construct new DNA-binding proteins (8, 9). These design and selection studies have assumed that each finger [with its corresponding 3-base pair (bp) subsite] can be treated as an independent unit (Fig. 1B). This assumption has provided a useful starting point for design studies, but crystallographic studies of zinc finger-DNA complexes (10–13) reveal many examples of contacts that couple neighboring fingers and subsites, and it is evident that context-dependent interactions are important for zinc finger-DNA recognition (3, 7, 8). Existing strategies have not taken these interactions into account in the design of multifinger proteins, and this may explain why there has been no effective, general method for designing high-affinity proteins for desired target sites (14).
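The "independent unit" assumption described above can be made concrete with a minimal sketch. It splits a 9-bp site into three 3-bp subsites and pairs each with one finger, with finger 1 at the 3′ end as in the protocol described in this report. The function names are hypothetical, and the example sequence is the commonly cited Zif268 consensus site, which is not given in this text. The sketch deliberately ignores the cross-subsite contacts that the crystallographic studies reveal.

```python
def split_into_subsites(site, subsite_len=3):
    """Split a target site (written 5'->3') into consecutive subsites."""
    if len(site) % subsite_len != 0:
        raise ValueError("site length must be a multiple of the subsite length")
    return [site[i:i + subsite_len] for i in range(0, len(site), subsite_len)]


def assign_fingers(site):
    """Pair each 3-bp subsite with one finger under the modular assumption.

    Finger 1 is paired with the 3' subsite, finger 3 with the 5' subsite.
    Contacts that reach into flanking subsites are NOT modeled here.
    """
    subsites = split_into_subsites(site)
    return {f"finger {n}": s for n, s in enumerate(reversed(subsites), start=1)}


# Assumption: commonly cited Zif268 consensus site (primary strand, 5'->3').
ZIF268_SITE = "GCGTGGGCG"

if __name__ == "__main__":
    for finger, subsite in assign_fingers(ZIF268_SITE).items():
        print(finger, "->", subsite)
```

A 10-bp target, which the report also considers, does not fit this simple partition. That is one small illustration of why a strictly modular picture is only a starting point.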

Fig. 1.

(A) Amino acid sequence and secondary structure of the Zif268 zinc fingers. [Adapted from (10)] Randomized positions (circled) correspond to residues −1, 1, 2, 3, 5, and 6 in each of the α helices and include every position that makes a base contact in one of the known zinc finger-DNA complexes (10–13, 30). The wild-type Zif268 sequence was retained at all other positions in the new proteins. (B) Key base contacts (solid arrows) in the Zif268-DNA complex (10, 13). Most of the bases contacted are located on the primary (guanine-rich) strand (boldface). Each finger makes several base contacts with its 3-bp subsite (dashed boxes), but also makes important base and phosphate contacts in flanking subsites. The 1.6 Å structure (13) shows that the aspartic acid at position 2 in finger 2 contacts a cytosine that is just outside the canonical 3-bp subsite. Analogous contacts from position 2 in the other fingers (dashed arrows) have less favorable hydrogen-bonding geometry, but binding site selections (32) suggest that these contacts may contribute to recognition. Contacts made by Tramtrack (11) and GLI (12) also include bases and phosphates outside the canonical 3-bp subsites. (C) DNA sequences of the sites used in our selections. The TATA box is from the adenovirus major late promoter (33), the p53 binding site is from the human p21WAF1/CIP1 promoter (31), and the NRE is from the human apolipoprotein AI promoter (34). One strand of each duplex site is shown. (D) Structure of the wild-type Zif268 zinc finger-DNA complex (10, 13). The DNA is gray, and a ribbon trace of the three zinc fingers is shown in red (finger 1), yellow (finger 2), and purple (finger 3). The 18 residues that were randomized in this study (van der Waals surfaces shown in blue) occupy the major groove of the DNA and span the entire length of the binding site. [Image created with Insight II (Biosym Technologies, San Diego, California)]

We have developed a selection strategy that can accommodate many of these context-dependent interactions between neighboring fingers and subsites. Our strategy involves gradual assembly of a new zinc finger protein at the desired binding site—adding and optimizing one finger at a time as we proceed across the target site. We use the Zif268 structure (10, 13) as our framework and randomize six potential base-contacting positions in each finger (Fig. 1, A and D) (15). Our protocol includes three selection steps (Fig. 2), one for each finger of the new protein: (i) A finger that recognizes the 3′ end of the target site is selected by phage display (Fig. 2A). At this stage, two wild-type Zif fingers are used as temporary anchors to position the library of randomized fingers over the target site, and we use a hybrid DNA site that has Zif subsites fused to the target site. (ii) The selected finger is retained as part of a “growing” protein and, after the distal Zif finger is discarded, phage display is used to select a new finger that recognizes the central region of the target site (Fig. 2B). (iii) Finally, the remaining Zif finger is discarded, and phage display is used to select a third finger that recognizes the 5′ region of the target site (Fig. 2C). Optimization of this finger yields the new zinc finger protein.
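A rough count of sequence space helps show why randomizing one finger at a time is tractable. The sketch below assumes that all 20 amino acids are allowed at each of the six randomized positions. The actual library design is described in reference 15, which is not reproduced here, so these figures are illustrative upper bounds on protein-level diversity rather than reported library sizes.

```python
# Assumption: every randomized position may hold any of the 20 amino acids.
AMINO_ACIDS = 20
POSITIONS_PER_FINGER = 6   # residues -1, 1, 2, 3, 5, and 6 of each alpha helix
FINGERS = 3

# Variants of one finger with six randomized positions.
per_finger = AMINO_ACIDS ** POSITIONS_PER_FINGER

# Variants if all 18 positions were randomized in a single library.
all_at_once = AMINO_ACIDS ** (POSITIONS_PER_FINGER * FINGERS)

# Variants spanned by three successive single-finger libraries.
sequential_total = FINGERS * per_finger

if __name__ == "__main__":
    print(f"one finger:        {per_finger:.3e}")
    print(f"three at once:     {all_at_once:.3e}")
    print(f"three in sequence: {sequential_total:.3e}")
```

Under these assumptions, a single-finger library spans 6.4 × 10^7 variants, whereas simultaneous randomization of all 18 positions would span roughly 2.6 × 10^23.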

Fig. 2.

Overview of protocol that successively selects finger 1, finger 2, and finger 3 to create a new zinc finger protein. Fingers that are present in the phage libraries used in these steps (15) are indicated on the left side of each panel. Zif1 and Zif2, wild-type Zif268 fingers; R, a randomized finger library; and asterisk, a selected finger. Small horizontal arrows indicate the multiple cycles of selection and amplification used when selecting each finger by phage display (35). The right side of each panel shows the binding sites used in selections with the TATA site and indicates the overall binding mode for the selected fingers [each DNA duplex has biotin (not shown) attached at the 3′ end of the upper strand]. Vertical arrows indicate how fingers selected in earlier steps are incorporated into the phage libraries used in later steps and reselected to optimize affinity and specificity in the new context (16). (A) A randomized finger 1 library was cloned into the pZif12 phagemid display vector (36), and selections with this library were performed in parallel at the TATA, p53, and NRE sites (17). (B) The wild-type Zif1 finger was removed, and a randomized finger 2 cassette was ligated to the appropriate vector pool and optimized by phage display (29). (C) The remaining wild-type finger was removed, and a randomized finger 3 cassette was added and optimized by phage display. To construct the sites used in these selections, we fused the target strand with the higher purine content to the guanine-rich strand of the Zif268 site. Because of the overlapping base contacts that can occur at the junction of neighboring subsites (Fig. 1B), the 3′ end of the target site (Fig. 1C) was aligned so that it overlapped with the Zif2 subsite.

Our strategy ensures that the new fingers are always selected in a relevant structural context. Because an intact binding site is present at every stage, and because our selections are performed in the context of a growing protein-DNA complex, our method readily optimizes context-dependent interactions between neighboring fingers and subsites and naturally selects for fingers that will function well together (16). To ensure that the selected proteins will bind tightly and specifically to the desired target sites, we performed all selections in the presence of calf thymus competitor DNA (3 mg/ml) (17). This serves to counterselect against any proteins that bind promiscuously or prefer alternative sites, and our protocol thus directly selects for affinity as well as specificity of binding (18).

We tested our protocol by performing selections with a TATA box, a p53 binding site, and a nuclear receptor element (NRE) (Fig. 1C). These important regulatory sites were chosen because they normally are recognized by other families of DNA-binding proteins and because these sites are quite different from the guanine-rich Zif268 site and from sites that have been successfully targeted in previous design studies (14). After the multiple rounds of selections (Fig. 2) were completed, the final phage pools bound tightly to their respective target sites. DNA sequencing of eight clones from each pool revealed marked patterns of conserved residues (Fig. 3) (19), and many of the selected residues (Arg, Asn, Gln, His, and Lys) could readily contribute to base recognition (20).

Fig. 3.

Amino acid sequences of new zinc finger proteins that recognize (A) the TATA box, (B) the p53 binding site, and (C) the NRE. Residues selected at each of the six randomized positions are shown (37). Six or more of the eight clones in each phage pool encode unique zinc finger proteins (16, 19). A box indicates the clone that was overexpressed and used for binding studies. Residues that are fully conserved (eight of eight clones) are shown in boldface; residues that are partially conserved (four or more of eight) are denoted by lowercase letters in the consensus sequence below the set of clones. Modeling (38) suggests that these new zinc finger proteins (including those that recognize the TATA box) can bind to B-form DNA. Each panel indicates how the fingers could dock with a canonical 3-bp spacing (dashed boxes), and dashed arrows indicate plausible base contacts (20, 26). Recent data from studies of a designed zinc finger protein provide precedence for many of these contacts (39). Detailed modeling suggests many additional contacts (not shown), including some that couple neighboring fingers and subsites (38). For the p53 site, there is an alternative, equally plausible, docking arrangement with a 4-bp spacing for one of the fingers (40). A section of the NRE site shows a 5 of 6 bp match (underlined) with the Tramtrack binding site, and these matching segments happen to be aligned such that the new fingers bind in the same register as the Tramtrack fingers (11). Every Tramtrack residue that contacts one of the matching bases (solid arrows) was recovered in our selections (26). Two residues that do not directly contact the DNA in the Tramtrack complex were also recovered (at positions 5 and 6 in NRE finger 3).
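The conservation convention used in this figure (fully conserved in eight of eight clones, partially conserved in four or more of eight) can be sketched as a small consensus routine. The clone sequences below are hypothetical placeholders, not the sequences from Fig. 3. Uppercase letters stand in for the boldface used in the figure, and "-" marks positions with no residue reaching the threshold.

```python
from collections import Counter


def consensus(clones, partial_threshold=4):
    """Return a consensus string over equal-length clone sequences.

    Uppercase: residue present in every clone (fully conserved).
    Lowercase: most common residue, present in >= partial_threshold clones.
    '-'      : no residue reaches the threshold.
    If two residues tie, the one encountered first in the column is reported.
    """
    if not clones:
        raise ValueError("need at least one clone")
    if len({len(c) for c in clones}) != 1:
        raise ValueError("all clones must have the same length")
    out = []
    for column in zip(*clones):
        residue, count = Counter(column).most_common(1)[0]
        if count == len(clones):
            out.append(residue.upper())
        elif count >= partial_threshold:
            out.append(residue.lower())
        else:
            out.append("-")
    return "".join(out)


# Hypothetical residues at positions -1, 1, 2, 3, 5, 6 for eight clones.
EXAMPLE_CLONES = [
    "RSDNLT", "RSDNLT", "RADNLS", "RSDHLT",
    "RTDNLK", "RSDNLQ", "RQDHLA", "RSDNLR",
]

if __name__ == "__main__":
    print(consensus(EXAMPLE_CLONES))  # prints "RsDnL-"
```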

Because of the marked sequence conservation within each of the final phage pools, we used a single clone from each set for further analysis. The corresponding peptides were overexpressed in Escherichia coli and purified (21). Affinities of the peptides for their respective target sites were determined by electrophoretic mobility shift analysis (22), and the measured dissociation constants (Kd's) were 0.12 nM for the TATA box, 0.11 nM for the p53 binding site, and 0.038 nM for the NRE. These new complexes are almost as stable as the wild-type Zif268-DNA complex (Kd of 0.010 nM under these buffer conditions).
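To put these dissociation constants on an energy scale, one can convert each Kd to a standard binding free energy with ΔG° = RT ln(Kd), taking a 1 M standard state. The sketch below uses the Kd values reported above. The temperature of 298.15 K is an assumption, because the assay temperature is not stated in this text, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3   # gas constant, kcal/(mol*K)
T_ASSUMED = 298.15     # K; assumed, the assay temperature is not given here

# Dissociation constants reported in the text, in nM.
KD_NM = {
    "TATA box": 0.12,
    "p53 binding site": 0.11,
    "NRE": 0.038,
    "Zif268 (wild type)": 0.010,
}


def binding_free_energy(kd_nm, temperature=T_ASSUMED):
    """Standard binding free energy in kcal/mol (1 M standard state)."""
    kd_molar = kd_nm * 1e-9
    return R_KCAL * temperature * math.log(kd_molar)


def fold_weaker_than(kd_nm, reference_kd_nm):
    """How many times larger a Kd is than a reference Kd."""
    return kd_nm / reference_kd_nm


if __name__ == "__main__":
    wild_type = KD_NM["Zif268 (wild type)"]
    for name, kd in KD_NM.items():
        dg = binding_free_energy(kd)
        fold = fold_weaker_than(kd, wild_type)
        print(f"{name}: dG = {dg:.1f} kcal/mol, {fold:.1f}x wild-type Kd")
```

On this scale, the roughly 4- to 12-fold differences in Kd relative to wild-type Zif268 correspond to about 1.5 kcal/mol or less at the assumed temperature.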

Apparent Kd's for nonspecific DNA were estimated by competition experiments with calf thymus DNA (23). Ratios of the nonspecific to specific dissociation constants (Kdns/Kd) indicate that the peptides selected for the TATA box, p53 binding site, and NRE discriminate effectively against nonspecific DNA (preferring their specific sites by factors of 25,000, 54,000, and 36,000, respectively). These ratios are similar to the specificity ratio of 31,000 that we measured for wild-type Zif268. Taken together, the affinities and specificities of the new proteins indicate that they bind as well as many natural DNA-binding proteins.
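The specificity ratios can be combined with the specific dissociation constants to back-calculate the apparent nonspecific Kd's, which are not listed explicitly in this text. The sketch below does only that arithmetic. The resulting values are derived here, not reported by the authors, and they inherit whatever concentration convention was used for calf thymus DNA in the competition experiments (reference 23, not reproduced here).

```python
# Specific dissociation constants reported in the text, in nM.
SPECIFIC_KD_NM = {
    "TATA box": 0.12,
    "p53 binding site": 0.11,
    "NRE": 0.038,
    "Zif268 (wild type)": 0.010,
}

# Reported ratios of nonspecific to specific dissociation constants.
SPECIFICITY_RATIO = {
    "TATA box": 25_000,
    "p53 binding site": 54_000,
    "NRE": 36_000,
    "Zif268 (wild type)": 31_000,
}


def nonspecific_kd_nm(specific_kd_nm, ratio):
    """Apparent nonspecific Kd implied by ratio = Kd(nonspecific) / Kd(specific)."""
    return specific_kd_nm * ratio


if __name__ == "__main__":
    for name, kd in SPECIFIC_KD_NM.items():
        kd_ns = nonspecific_kd_nm(kd, SPECIFICITY_RATIO[name])
        print(f"{name}: apparent nonspecific Kd ~ {kd_ns / 1000:.2f} uM")
```

By this arithmetic, all four proteins have apparent nonspecific Kd's in the range of roughly 0.3 to 6 µM.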

Many discussions of zinc finger-DNA recognition have considered the idea of a “code” that specifies which positions along the α helix contact the DNA and which side chain-base interactions are most favorable at each position (5, 24). There are recurring patterns of contacts in some zinc finger proteins (10, 11), and similar patterns are apparent in the proteins we selected (Fig. 3). Thus, when adenine or guanine occurs in the primary strand of one of our binding sites (the strand corresponding to the guanine-rich strand of the Zif268 site), there often is a conserved residue at position −1, 3, or 6 of the α helix that could form hydrogen bonds with this base (20). Related patterns have been discussed in previous design and selection studies (3–6). There also are strong “homologies” between the zinc fingers we have selected and natural zinc fingers that may recognize the same subsites (Fig. 3) (25).

Such simple patterns are not seen at other positions in our selected proteins. Thus, we found no simple patterns of residues at positions 1, 2, and 5 of the α helix, and when thymine or cytosine occurs on the primary strand (Fig. 3), we found no simple pattern of potential contacts from residues at positions −1, 3, and 6. However, there still are numerous instances in which residues at these positions are highly conserved within a particular set of proteins (Fig. 3), and we infer that many of these conserved residues make energetically significant contributions to folding or binding (26). Because no readily predicted pattern of coded contacts is apparent, we surmise that residues at these positions may be involved in more subtle, context-dependent interactions. In short, there still is no general code that can be used to design optimal zinc finger proteins for any desired target sequence or that can predict the preferred binding site of every zinc finger protein (27). Nonetheless, our sequential selection strategy should provide valuable information about potential patterns in zinc finger-DNA recognition, because it (i) makes few assumptions about the preferred spacing, docking, or contacts of the individual fingers; (ii) yields proteins with essentially wild-type affinities and specificities; (iii) yields sequences that match very well with those of natural zinc finger proteins that recognize similar subsites (25); and (iv) can readily be adapted to pursue analogous studies with other TFIIIA-like zinc finger proteins.

The sequential selection strategy provides a general and effective method for design of new zinc finger proteins, and our success with a diverse set of target sites suggests that it should be possible to select zinc finger proteins for many important regulatory sequences. These proteins could then be fused with appropriate regulatory or effector domains for a variety of applications. The protocol also could be adapted to allow selection of proteins with four, five, or six fingers or to allow optimization of zinc fingers fused to other DNA-binding domains (28). Related selection methods might be developed for other families of multidomain proteins, including other DNA- and RNA-binding proteins, and possibly even modular domains involved in protein-protein recognition. The sequential selection strategy should open the field to a host of applications and studies, including tests to see how designer zinc finger proteins can be used in gene therapy.

