A Structural Basis for Recognition of A·T and T·A Base Pairs in the Minor Groove of B-DNA


Science  02 Oct 1998:
Vol. 282, Issue 5386, pp. 111-115
DOI: 10.1126/science.282.5386.111


Polyamide dimers containing three types of aromatic rings—pyrrole, imidazole, and hydroxypyrrole—afford a small-molecule recognition code that discriminates among all four Watson-Crick base pairs in the minor groove. The crystal structure of a specific polyamide dimer-DNA complex establishes the structural basis for distinguishing T·A from A·T base pairs. Specificity for the T·A base pair is achieved by means of distinct hydrogen bonds between pairs of substituted pyrroles on the ligand and the O2 of thymine and N3 of adenine. In addition, shape-selective recognition of an asymmetric cleft between the thymine-O2 and the adenine-C2 was observed. Although hitherto similarities among the base pairs in the minor groove have been emphasized, the structure illustrates differences that allow specific minor groove recognition.

Before the first structure of a molecule bound to DNA had been determined, specific recognition of double helical B-form DNA was predicted to occur primarily in the major, rather than the minor, groove (1). This proposal was based on the observation that for A,T base pairs, the hydrogen bond acceptors at N3 of adenine and O2 of thymine are similarly placed and lack any prominent distinguishing feature (1) (Fig. 1). Subsequent structures of DNA binding domains cocrystallized with DNA supported this idea, because most of the specific contacts were made with the major groove (2). The principle that “the major groove is a better candidate for sequence-specific recognition than the minor groove” (3) continues to provide the basis for strategies to decipher rules for protein-DNA recognition.

Figure 1

Anatomy of the T·A base pair. Arrows indicate potential sites for discrimination of A·T from T·A in the major and minor grooves. Lone pair electrons in the minor groove are shown as ovals, and Watson-Crick hydrogen bonds of the base pair as dotted lines. Arrows for previously described sites (1) are black, and sites identified in this report are red. The type of potential recognition is labeled: a, hydrogen bond acceptor; d, hydrogen bond donor; and vdW, van der Waals.

Although there has been remarkable progress in the design of zinc fingers to recognize the major groove (4), no protein structure motif has been identified that provides an α-amino acid–base pair code for the minor groove. Eight-ring hairpin polyamides have affinities and specificities that rival those of major groove–binding proteins (5) and have been shown to permeate living cells and inhibit specific gene expression (6). The side-by-side pairing of the residues in the polyamide dimer determines the DNA sequence recognized. An imidazole (Im)/pyrrole (Py) pair distinguishes G·C from C·G and both of these from A·T and T·A base pairs (5), and the structural basis of this discrimination is now understood (7). However, a structural understanding for how a hydroxypyrrole (Hp)/Py pair distinguishes T·A from A·T and both of these from G·C and C·G (8) has yet to be established. To address this question, we determined the cocrystal structure of a polyamide of sequence ImHpPyPy-β-Dp (Fig. 2A), bound as a dimer to a self-complementary 10–base pair oligonucleotide containing all four Watson-Crick base pairs, 5′-CCAGTACTGG-3′ (binding site in bold; β, β-alanine; Dp, dimethylamino-propylamide) (Fig. 2B and Table 1). The structure of the polyamide ImPyPyPy-β-Dp, containing a Py-Py pair that does not distinguish A·T and T·A (9, 10), bound to the same duplex was solved for comparison. In both the ImHpPyPy and ImPyPyPy structures, the polyamides bind as antiparallel dimers centered over the target GTAC sequence in the minor groove of a B-form DNA duplex (Fig. 2B). The NH2- to COOH-terminal orientation of each fully overlapped polyamide is parallel to the adjacent 5′-to-3′ strand of DNA, consistent with previous chemical (11) and structural studies of polyamide dimers (7, 10, 12, 13).
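The pairing rules described above can be written out as a small lookup table. The following is a minimal sketch, not code from the report: the names `PAIR_CODE` and `target_site` are hypothetical, the IUPAC letter W stands in for "A or T", and the dimer is assumed to be fully overlapped and antiparallel, so that each ring pairs with the ring at the mirrored position of the partner molecule.

```python
# Side-by-side ring pair -> base read on the 5'->3' strand that runs
# parallel to the first polyamide (N to C terminus).
# Illustrative table only; the names here are assumptions.
PAIR_CODE = {
    ("Im", "Py"): "G",  # Im/Py targets G.C
    ("Py", "Im"): "C",  # Py/Im targets C.G
    ("Hp", "Py"): "T",  # Hp/Py targets T.A
    ("Py", "Hp"): "A",  # Py/Hp targets A.T
    ("Py", "Py"): "W",  # Py/Py does not distinguish A.T from T.A
}


def target_site(rings):
    """Return the site read by a fully overlapped antiparallel homodimer.

    `rings` lists the aromatic residues of one polyamide from the
    N terminus to the C terminus, for example ["Im", "Hp", "Py", "Py"].
    Ring pairs that are not in PAIR_CODE raise KeyError.
    """
    partner = list(reversed(rings))  # antiparallel partner molecule
    return "".join(PAIR_CODE[(a, b)] for a, b in zip(rings, partner))


if __name__ == "__main__":
    for name in (["Im", "Hp", "Py", "Py"], ["Im", "Py", "Py", "Py"]):
        print("".join(name), "->", target_site(name))
```

Under these assumptions the ImHpPyPy dimer reads the central GTAC of the duplex, whereas the ImPyPyPy dimer leaves the two central positions ambiguous between A and T.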

Figure 2

(A) Omit |Fo| – |Fc| electron density map for one of the ImHpPyPy polyamide molecules, contoured at 1.5σ, showing the position of the 3-hydroxyl group. The numbering of the atoms used in the text is indicated below on the chemical structure. The Hp is red and the Py that would be paired with it is yellow. The Im, the other Py, β, and Dp are silver. (B) Space-filling model of (ImHpPyPy)2·5′-CCAGTACTGG-3′. Adenosine is purple and thymidine cyan; polyamide is colored as above. A schematic is shown to the right, with the aromatic residues of the polyamide indicated by filled circles and β by the diamonds. The overall structure of (ImPyPyPy)2·5′-CCAGTACTGG-3′ is similar.

Table 1

Data collection and refinement statistics. The ImHpPyPy and ImPyPyPy structures crystallized in an isomorphous lattice (22) and were solved by molecular replacement with a B-DNA model (23). The ImHpPyPy data were collected on beamline 9-1 at the Stanford Synchrotron Radiation Laboratory (SSRL), with a MAR Research image plate detector at wavelength 0.98 Å. The ImPyPyPy data were collected on an R-Axis IIC image plate with CuKα radiation produced by a Rigaku RU200 rotating anode generator with double-focusing mirrors and a Ni filter. Both sets of data were collected on flash-cooled crystals. The data were processed with DENZO/SCALEPACK (24). Free-R sets comprising 5% of the data were chosen to contain the same reflections in resolution shells that overlapped between the data sets. All data were used, with bulk solvent correction and anisotropic B-scale applied with the program X-PLOR (25) and no sigma cutoff. The polyamide β-Dp tails of both structures were modeled in alternate conformations. The planarity of the bases, aromatic rings, and peptide bonds was restrained throughout the refinement. Topology and parameter files for polyamides and Tris were generated with XPLO2D (26), and nucleic acid parameters were those of Parkinson et al. (27).

Although the functional groups of adenine and thymine are very similar in the minor groove, the number of lone pairs on the hydrogen bond acceptors is different: a thymine-O2 has two free lone pairs, whereas an adenine-N3 has only one (Fig. 1). The amide nitrogens of the ligand form hydrogen bonds with the purine-N3 (A or G) or pyrimidine-O2 (T or C). As a result, the hydrogen bond potential of adenine-N3 is filled when a polyamide composed of imidazole or pyrrole residues is bound, but the thymine-O2 has the capacity to accept an additional hydrogen bond. We found that both the hydroxyl group of the Hp and the amide-NH of the preceding residue form hydrogen bonds with the target thymine-O2 of the adjacent DNA strand (Fig. 3A). A similar interaction between the Hp and the adenine-N3 would be impossible without loss of the hydrogen bond from the preceding amide-NH.

Figure 3

(A) The hydrogen bonds between ImHpPyPy and one strand of DNA, indicated by dashed lines. (B) Space-filling model of the Hp/Py pair interacting with the T·A base pair shows that the Hp-OH tightly fits the cleft formed by the adenine-C2H. Figures were prepared by use of Molscript, Bobscript, and Raster3D (21).

Although a hydrogen bond of favorable length (Table 2) is formed between Hp and thymine-O2, it was possible that the position of the hydroxyl and the amide out of the plane of the thymine-O2 sp2-hybridized lone pairs would weaken the hydrogen bonds. The thymine-C2=Ô2⋯O6-Hp and thymine-C2=Ô2⋯N-amide angles and their out-of-plane and in-plane components were calculated to be 17° or 35° out of the plane, for the Hp and amide, respectively, and 25° in the plane (Table 2). The observed values for the components of the hydroxyl and amide hydrogen bond angles with thymine were found to be comparable to hydrogen bond angles between carbonyls and waters in protein structures, which range from ∼0° to 60° for both the in- and out-of-plane angles (14). In addition, the out-of-plane thymine-C2=Ô2⋯N-amide components in the ImPyPyPy structure are approximately the same as those of ImHpPyPy, indicating that formation of an additional hydrogen bond with the hydroxyl does not substantially perturb the hydrogen bond geometry between the amide and the thymine-O2.
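The decomposition of a hydrogen bond angle into in-plane and out-of-plane components can be sketched as follows. The report does not spell out its exact convention, so this example assumes one common choice: the reference axis is the extension of the C2=O2 bond, the plane is that of the thymine ring (defined here by C2, O2, and one ring atom bonded to C2), and both components are measured at O2. The function name and the coordinates are hypothetical and chosen only for illustration.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    n = math.sqrt(_dot(a, a))
    return tuple(x / n for x in a)


def hbond_angle_components(c2, o2, ring_atom, donor):
    """Return (in_plane_deg, out_of_plane_deg) for a donor bound to O2.

    Assumed convention: angles are measured at O2 relative to the
    extension of the C2=O2 bond; the base plane contains C2, O2, and
    `ring_atom` (a ring atom bonded to C2, such as N1 or N3).
    """
    axis = _unit(_sub(o2, c2))                        # along C2 -> O2
    normal = _unit(_cross(axis, _sub(ring_atom, c2)))  # base-plane normal
    v = _sub(donor, o2)                                # O2 -> donor
    along = _dot(v, axis)
    out = _dot(v, normal)
    # Remainder of v lying in the plane, perpendicular to the bond axis.
    rest = _sub(v, tuple(along * a + out * n for a, n in zip(axis, normal)))
    in_perp = math.sqrt(_dot(rest, rest))
    length = math.sqrt(_dot(v, v))
    ratio = max(-1.0, min(1.0, abs(out) / length))     # guard rounding
    out_of_plane = math.degrees(math.asin(ratio))
    in_plane = math.degrees(math.atan2(in_perp, along))
    return in_plane, out_of_plane


def place_donor(o2, distance, in_plane_deg, out_of_plane_deg):
    """Build a synthetic donor position for a base lying in the z = 0 plane
    with the C2=O2 bond along +x (illustrative helper, not real data)."""
    i, o = math.radians(in_plane_deg), math.radians(out_of_plane_deg)
    return (o2[0] + distance * math.cos(o) * math.cos(i),
            o2[1] + distance * math.cos(o) * math.sin(i),
            o2[2] + distance * math.sin(o))


if __name__ == "__main__":
    c2, o2, n3 = (0.0, 0.0, 0.0), (1.22, 0.0, 0.0), (-0.7, 1.2, 0.0)
    donor = place_donor(o2, 2.8, 25.0, 17.0)  # synthetic geometry
    print(hbond_angle_components(c2, o2, n3, donor))
```

With real coordinates, the three acceptor-side atoms and the donor atom would be read from the refined model; the synthetic donor here only checks that the decomposition recovers the angles used to build it.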

Table 2

Polyamide-DNA hydrogen bonds. Dashes indicate not applicable.

In addition to the difference in number of lone pairs of the adenine-N3 versus thymine-O2, adenine is also distinguished from thymine by a bulkier aromatic ring. Although the adenine-C2-H does not protrude into the minor groove like the guanine exocyclic amine, the additional carbon results in an asymmetric cleft in the minor groove of a T·A base pair (8, 15) (Fig. 1). The adenine-C2 of the ImHpPyPy structure contacts the Hp hydroxyl (Fig. 3B). Modeling the target thymine as an adenine reveals that the C2 carbon of a mismatch “adenine” opposite an Hp residue would sterically overlap the hydroxyl by 1 to 2 Å (depending on the hydrogen positions). Furthermore, the orientation of the Hp hydroxyl observed in the ImHpPyPy structure, 3.5 Å from the adenine-C2, with an average adenine-C2-H⋯O6-Hp angle of 165° (depending on the hydrogen positions) (Table 2), indicates that the Hp-O6 forms a favorable C-H hydrogen bond with the adenine-C2-H. As in this case, C-H hydrogen bonds are strongest between aromatic carbons adjacent to nitrogen atoms with oxygen hydrogen bond acceptors (16). Shape-selective recognition of the asymmetric cleft is the second feature that allows the Hp/Py pair to discriminate T·A from A·T.

The sugar-phosphate backbones in the ImHpPyPy and ImPyPyPy structures superimpose with 0.75 Å root-mean-square (rms) difference. In both structures, the oligonucleotides have the standard B-DNA features of 35° twist, 3.4 Å rise per residue, and C2′-endo sugar pucker, but they are distinguished from ideal B-form by a strong propeller twist and opening of the target T·A base pairs. However, the Hp/Py pairs induce a change in the T·A base pairs from no shear (−0.2 Å average displacement between the bases in the base pair, perpendicular to the helix axis) to a large positive shear (1.2 Å, average) (Table 3). The movement of the bases past one another may result from the Hp-O6 contact with the adenine-C2, pressing the adenine of the target base pair back into the major groove. The increased displacement between the bases stretches the Watson-Crick hydrogen bonds between them by 0.5 Å, on average (Table 3, center portion). Although the specificity of Hp-containing polyamides is greatly increased for T·A compared with A·T, the affinities are slightly reduced relative to the Py counterparts. For example, ImHpPyPy-β-Dp and ImPyPyPy-β-Dp bind a 5′-AGTACT-3′ site with equilibrium dissociation constants of 344 and 48 nM, respectively (17). The energetic penalty due to the partial “melting” of the target T·A base pairs could account for the 1.2-kcal/mol reduction in binding affinity (18).
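The 1.2-kcal/mol figure follows from the ratio of the two dissociation constants through ΔΔG = RT ln(Kd,Hp/Kd,Py). The short sketch below reproduces that arithmetic. The temperature of 298.15 K is an assumption (the text does not state the measurement temperature), and the function name `ddg_from_kd` is hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def ddg_from_kd(kd_weaker, kd_tighter, temperature_k=298.15):
    """Binding free energy difference (kcal/mol) between two ligands.

    Both dissociation constants must be in the same units; only their
    ratio matters. The default temperature is an assumed value.
    """
    return R_KCAL * temperature_k * math.log(kd_weaker / kd_tighter)


if __name__ == "__main__":
    # Kd values quoted in the text: 344 nM (ImHpPyPy) and 48 nM (ImPyPyPy).
    ddg = ddg_from_kd(344.0, 48.0)
    print(f"{ddg:.2f} kcal/mol")
```

A roughly sevenfold difference in Kd therefore corresponds to a little under 1.2 kcal/mol near room temperature, consistent with the value quoted above.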

Table 3

DNA conformation.

The change in the shear in the presence of the Hp/Py versus the Py/Py pair is more dramatic for one of the two crystallographically independent T·A base pairs than for the other (2.2 Å compared with 0.4 Å). A buffer molecule from the crystallization solution, tris(hydroxymethyl)aminomethane (Tris), is bound in the major groove of this A·T base pair of the ImHpPyPy structure. No evidence for a corresponding buffer molecule was found in the major groove of the ImPyPyPy structure. The Tris molecule bound in the major groove selectively in the presence of an Hp/Py pair in the minor groove, suggesting that Hp-containing polyamides may be used as an indirect lever to manipulate interactions of proteins with the major groove.

The hydrogen bonds between the amides of each ImPyPyPy polyamide and the purine-N3 or pyrimidine-O2 of the adjacent DNA strand are maintained for the ImHpPyPy polyamide. However, the hydrogen bonds between the DNA and the ImHpPyPy amides are longer for the residues that follow the Hp than those observed for the ImPyPyPy complex (Table 2). The hydroxyl forms an intramolecular hydrogen bond with the following amide, causing the hydrogen bond of that amide with the adenine-N3 to become bifurcated and therefore weaker. This may be an additional source of the slightly decreased affinity of the Hp-containing polyamides relative to the Py counterparts.

These studies have established how a designed ligand can predictably discriminate A·T from T·A in the minor groove, using the double hydrogen bond acceptor potential of the thymine-O2 and the asymmetry of the adenine-C2 cleft (8, 15). The structure eliminates the possibilities that a bulky substitution at the Py 3-position might sterically clash with the thymine-O2 (12) or cause a gross distortion of the DNA duplex (19). In addition, the structural basis of minor groove recognition by a synthetic molecule raises the question of whether naturally occurring DNA binding proteins may use similar principles to distinguish between the base pairs in the minor groove (20).

  • * To whom correspondence should be addressed. E-mail: dervan{at} (P.B.D.); dcrees{at}

