Structures from Anomalous Diffraction of Native Biological Macromolecules

Science  25 May 2012:
Vol. 336, Issue 6084, pp. 1033-1037
DOI: 10.1126/science.1218753


Crystal structure analyses for biological macromolecules without known structural relatives entail solving the crystallographic phase problem. Typical de novo phase evaluations depend on incorporating heavier atoms than those found natively; most commonly, multi- or single-wavelength anomalous diffraction (MAD or SAD) experiments exploit selenomethionyl proteins. Here, we realize routine structure determination using intrinsic anomalous scattering from native macromolecules. We devised robust procedures for enhancing the signal-to-noise ratio in the slight anomalous scattering from generic native structures by combining data measured from multiple crystals at lower-than-usual x-ray energy. Using this multicrystal SAD method (5 to 13 equivalent crystals), we determined structures at modest resolution (2.8 to 2.3 angstroms) for native proteins varying in size (127 to 1148 unique residues) and number of sulfur sites (3 to 28). With no requirement for heavy-atom incorporation, such experiments provide an attractive alternative to selenomethionyl SAD experiments.

Crystallographic structure determinations for biomolecules require the retrieval of phases, which are lost when measuring x-ray diffraction patterns. For the first protein crystal structures, phase evaluation was by the method of multiple isomorphous replacement (MIR) with derivatives incorporating mercury [atomic number (Z) = 80] or other heavy atoms. Once many structures were known, phases could often be estimated by the method of molecular replacement; however, de novo structure determination remained essential for molecules without adequately close structural relatives. Multiwavelength anomalous diffraction (MAD) analyses (1), which exploit element-specific scattering from x-ray resonance with atomic orbitals, came to be used increasingly for de novo structures as tunable synchrotron beamlines developed (2). Whereas MAD gives definitive phase information, its single-wavelength counterpart, SAD, is ambiguous in defining only trigonometric sines of phases. This phase ambiguity could be resolved once density-modification procedures, based largely on molecular boundaries and symmetry, were devised (3, 4); and SAD then surged (5). MAD and SAD now dominate de novo phasing, as they have the advantage that lighter atoms can be effective sources of phasing signals. Selenomethionine is easily incorporated into proteins (6), and selenium (Z = 34) is now by far the most-used phasing element (2). With MAD and SAD, metal atoms such as iron (Z = 26) present in some native proteins can also suffice.

Sulfur (Z = 16) is the heaviest element in most native proteins. Its K-shell resonance at 2.47 keV (λ = 5.02 Å) is inaccessible to standard MAD experiments, and its anomalous scattering at conventional wavelengths is slight; nevertheless, sulfur anomalous scattering can suffice for SAD phasing. The structure of crambin was the first to be determined from sulfur SAD phasing (7), although the experiment was not then identified as SAD. Later, broader effectiveness of sulfur SAD was demonstrated with tests on lysozyme (8) and in solving the structure of obelin (9). Similarly, the feasibility of phosphorus SAD was demonstrated for nucleic acids (10). The motivation for truly routine native SAD is great, because heavy-atom incorporations are often problematic, even for the most reliable selenomethionine. Subsequent optimization of native-SAD experiments has included developments for low-energy measurements (11), assessments of the impact of high data redundancy (10, 11), optimal wavelength selection (12), control of complications from radiation damage (13, 14), and the use of home-source CrKα radiation (15).
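The energies and wavelengths quoted in this article are related by λ = hc/E. The following is a minimal sketch of that conversion in Python; the function name and the rounded constant hc ≈ 12.39842 keV·Å are choices made here for illustration, not part of the original analysis.

```python
# Photon energy-to-wavelength conversion: lambda [Å] = hc / E [keV].
HC_KEV_ANGSTROM = 12.39842  # hc in keV·Å (rounded)


def wavelength_angstrom(energy_kev: float) -> float:
    """Return the x-ray wavelength in angstroms for a photon energy in keV."""
    if energy_kev <= 0:
        raise ValueError("energy must be positive")
    return HC_KEV_ANGSTROM / energy_kev


# Energies quoted in the article: sulfur K edge, Fe-K edge, Se-K edge.
for label, energy in [("S K edge", 2.47), ("Fe K edge", 7.112), ("Se K edge", 12.658)]:
    print(f"{label}: {energy} keV -> {wavelength_angstrom(energy):.3f} Å")
```

For the sulfur K-shell resonance at 2.47 keV this reproduces the 5.02 Å wavelength given above.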

Besides test cases and technical developments, some novel protein structures beyond crambin and obelin have been determined by sulfur SAD analyses. As compared with the swelling numbers of SAD structures in general, however, the sulfur SAD component is small and not growing. Our February 2012 compilation of light-only SAD structures (all atoms with Z ≤ 20) as reported in the Protein Data Bank (PDB) reveals 57 novel structures determined since 1981 (table S1). Although more than 30 novel native-SAD structures were reported between 2004 and 2007, there have been only six after 2009 (not counting deposits reported here). Most of the table S1 native-SAD structures (42 of 57) are at high resolution (dmin < 2.0 Å); only two have dmin > 2.3 Å. Native-SAD numbers are to be compared with overwhelming SAD and MAD PDB deposits overall (>5000 each since 1996). Why is the sulfur SAD contribution so meager? Surely a major factor is the low strength of anomalous scattering signals from sulfur as compared with selenium, the mainstay of SAD and MAD experiments. With one sulfur atom for every 30 residues, the average metal-free protein will produce a Bijvoet diffraction ratio (|ΔF±h|/|F|) (1) of only ~1.0% even for 7 keV x-rays as compared with ~6.5% for a corresponding average selenomethionyl protein at the Se-K edge (12.658 keV). When compounded with noise from counting statistics, diffuse scattering, absorption, radiation damage, and many other sources, such feeble signals are difficult to measure with sufficient accuracy to place the dozens of atomic positions in a typical sulfur substructure and then to obtain phases for full structure analysis.

The ratio of signal to noise in diffraction measurements can be enhanced by increasing data multiplicity; however, radiation damage is the enemy of multiplicity, as crystal deterioration limits useful redundancy (16). We previously devised a procedure for combining the SAD data from multiple crystals to increase multiplicity without added radiation damage, whereby we were able to solve a relatively large and poorly diffracting selenomethionyl protein structure (17). For light-only native-SAD phasing, the anomalous signals are much weaker than those from Se; thus, sample preparation, data collection, and data analysis require much more care. Here, we describe procedures for robust structure evaluation from the small anomalous signals of light-atom–only molecules and apply these procedures in determining four protein structures.
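The benefit of multiplicity can be illustrated with a simple idealization. The sketch below assumes that repeated measurements of a Bijvoet difference carry independent random errors of equal size, so that the uncertainty of the merged value falls as the square root of the multiplicity. Systematic errors and radiation damage are ignored, and the numerical values are hypothetical; the function names are introduced here for illustration only.

```python
import math


def merged_sigma(sigma_single: float, multiplicity: int) -> float:
    """Standard error of the mean of `multiplicity` independent, equally noisy measurements."""
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    return sigma_single / math.sqrt(multiplicity)


def anomalous_signal_to_noise(delta_f: float, sigma_single: float, multiplicity: int) -> float:
    """Idealized |dF|/sigma(dF) after merging `multiplicity` measurements."""
    return abs(delta_f) / merged_sigma(sigma_single, multiplicity)


# Hypothetical example: a Bijvoet difference half the size of its single-measurement error.
for n in (1, 4, 16, 100):
    print(n, round(anomalous_signal_to_noise(1.0, 2.0, n), 2))
```

In practice the gain saturates once systematic errors dominate, so this square-root scaling should be read as an upper bound rather than a prediction.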

Anomalous signals for light elements increase steadily with decreasing energy; notably, f″ (the determinative, imaginary component of anomalous scattering) for sulfur increases from 0.24 electrons (e) at the Se-K edge (12.658 keV) to 0.73 e at 7 keV and to 1.93 e at 4 keV (table S2). Thus, native-SAD analyses benefit from low-energy experiments. Enhanced f″ comes at the expense of increased x-ray absorption and incoherent scattering, however, which also increases with decreasing energy (table S2). All materials in all beam paths contribute, including the sample itself and gases between the sample and detector. Absorption diminishes diffracted signals, and incoherent scattering dramatically increases the background beneath integrated Bragg intensities; both reduce the signal-to-noise ratio. To optimize the experimental design, we estimated transmitted anomalous signals as a function of x-ray energy and sample size (fig. S1). For samples of the sizes used in these studies (100 to 300 µm), the optimal x-ray energy is at 6 to 7 keV, and we chose to use 7.112 keV x-rays as defined by the Fe-K edge; for smaller samples and microbeams, lower energies become optimal if one assumes appropriate detector geometry. To minimize diffuse scattering, we introduced a helium-filled cone in the sample-to-detector beam path and conscribed the beam size to match crystal size. To enhance signal-to-noise ratios, we adopted an inverse-beam data collection strategy (18), which reduces systematic errors by measuring Friedel mates with equivalent geometry, and we used a multicrystal strategy (17) to mitigate random errors by greatly increasing data multiplicity. To avoid crystal variation, we devised measures to assure statistical equivalence of all included crystals. To minimize effects of radiation damage, we merged data in successive wedges and excluded highly deteriorated wedges.
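The energy dependence of the expected signal can be sketched with a commonly used zero-angle approximation for the Bijvoet diffraction ratio, √2 · (N_A/N_P)^½ · f″/Z_eff, where N_A is the number of anomalous scatterers and N_P the number of non-hydrogen protein atoms. The values of about 7.7 non-hydrogen atoms per residue and Z_eff ≈ 6.7 are assumptions made for this illustration, as are the function and constant names; the f″ values are those quoted above for sulfur.

```python
import math

ATOMS_PER_RESIDUE = 7.7  # assumed average number of non-hydrogen atoms per residue
Z_EFF = 6.7              # assumed effective atomic number of an average protein atom


def bijvoet_ratio(n_anomalous: int, n_residues: int, f_double_prime: float) -> float:
    """Approximate zero-angle Bijvoet ratio: sqrt(2) * sqrt(N_A / N_P) * f'' / Z_eff."""
    n_protein_atoms = n_residues * ATOMS_PER_RESIDUE
    return math.sqrt(2.0) * math.sqrt(n_anomalous / n_protein_atoms) * f_double_prime / Z_EFF


# Sulfur f'' values (electrons) quoted in the text, keyed by x-ray energy in keV.
SULFUR_F_DOUBLE_PRIME = {12.658: 0.24, 7.0: 0.73, 4.0: 1.93}

# One sulfur atom per 30 residues, as for the average metal-free protein.
for energy_kev, f2 in SULFUR_F_DOUBLE_PRIME.items():
    print(f"{energy_kev} keV: {100 * bijvoet_ratio(1, 30, f2):.2f}%")
```

Under these assumptions the 7 keV case comes out close to 1%, in line with the estimate given earlier for an average metal-free protein.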

We undertook to develop our multicrystal native-SAD procedures by determining four crystal structures; experimental details are given in Table 1 and in the materials and methods section in the supplementary materials. Three of these applications solved protein structures without homologs of known structure (HK9S, CysZ, and netrin G2) and one (TorT/TorSS) “re-solved” a challenging test problem solved previously by other methods. One (CysZ) is an integral membrane protein and the others are domains of membrane receptors.

Table 1

Summary of native-SAD structure determinations. The Bijvoet diffraction ratio is estimated at zero scattering angle from a priori contents (6). SHELXD substructure success rate is from 100 tries for HK9S and netrin G2, 1000 tries for CysZ, and 2000 tries for TorT/TorSS. MapCC is the correlation coefficient between density-modified (DM, no averaging) experimental- and model-phased maps. Auto-built residues are model building (successes/refined as ordered) by ARP/wARP except by Buccaneer for TorT/TorSS.

Crystal variation is always a concern for structure determination from multiple crystals, especially because variations may arise upon rapid freezing (19); nevertheless, we did not observe adverse effects in our earlier eight-crystal mergings (17). Here, as before, the crystals did differ somewhat in various parameters, including unit cell dimensions, I/σ(I), Rmerge, anomalous correlation coefficient (ACC), and ∆F/σ(∆F) (tables S3 to S6). We evaluated these crystal variations, testing for statistical equivalence in three characteristics: unit cell variations, overall diffraction dissimilarity (1.0–pairwise intensity correlation coefficient), and the relative anomalous correlation coefficient (RACC). RACC correlates the Bijvoet differences from an individual data set with those of the data merged from all crystals. Note that we found evidence for rejecting only one data set (CysZ data set 8) as an outlier.
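The RACC described above can be sketched as a correlation over the reflections that an individual data set shares with the merged data. The code below assumes a plain Pearson correlation of Bijvoet differences keyed by Miller index; the article does not spell out the exact formula or weighting, so this form, the dictionary layout, and the function names are assumptions for illustration.

```python
import math


def pearson_cc(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


def racc(single, merged):
    """Correlate Bijvoet differences of one data set with those of the merged data.

    Both arguments map Miller indices (h, k, l) to a Bijvoet difference;
    only reflections present in both are compared.
    """
    common = sorted(set(single) & set(merged))
    return pearson_cc([single[hkl] for hkl in common], [merged[hkl] for hkl in common])
```

A data set whose Bijvoet differences track the merged values closely gives a RACC near 1, whereas one dominated by noise gives a value near 0.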

Analyses of unit cell variations (left), overall diffraction variations (center), and anomalous differences (right) are shown in Fig. 1 for 6 crystals of HK9S (A, B, and C); 8 crystals of CysZ (D, E, and F); 5 crystals of netrin G2 (G, H, and I); and 13 crystals of TorT/TorSS (J, K, and L). Variations among crystals within an experiment were small except for data set 8 of CysZ, which deviated from other CysZ crystals both for unit cell parameters (>4 σ) and diffracted intensities (>15%). The next biggest deviations were from data set 2 of TorT/TorSS, where deviations were within 2.2 σ for unit cell parameters and within 5% for overall diffraction patterns, which indicated its compatibility with the other crystals.

Fig. 1

Variations among crystals from multicrystal data sets. (Left) Cluster analyses of unit cell variations. (Center) Cluster analyses of overall diffraction dissimilarity. (Right) Relative anomalous correlation coefficient. (A, B, and C) HK9S. (D, E, and F) CysZ. (G, H, and I) netrin G2. (J, K, and L) TorT/TorSS. Unit cell variations are standard Euclidean distances normalized by population variances; i.e., the distance between crystals j and k among N crystals is $\Delta_{j,k} = \{\sum_i [(u_{i,j} - u_{i,k})^2 / V_i]\}^{1/2}$, where $u_i$ includes all variable unit cell parameters i, each having a variance of $V_i = \sigma_i^2 = \sum_k (u_{i,k} - \bar{u}_i)^2 / N$, k = 1 → N. Overall diffraction dissimilarity between crystals j and k is defined as $D_{j,k} = 1.0 - C_{j,k}$, where $C_{j,k}$ is the correlation coefficient between all Bragg intensities in common between the two diffraction patterns. For $C_{j,k}$ calculations, high-angle data cutoffs were at 3.0 Å for HK9S, CysZ, and netrin G2 and 3.9 Å for TorT/TorSS. Clustering calculations were made in a single-linkage (minimum) mode for unit cell analyses and in a complete-linkage (maximum) mode for diffraction analyses. The linkage line connecting any two clusters defines the “distance” between those clusters. The RACC compares Bijvoet differences from an identified individual data set with those in the data set merged from all crystals.
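The variance-normalized unit cell distance defined in this caption can be written out as a short Python sketch. Each crystal is represented here by a tuple of its variable unit cell parameters; this data layout and the function names are illustrative assumptions, and parameters with zero variance (for example, those fixed by symmetry) are simply skipped.

```python
import math


def population_variances(cells):
    """Population variance V_i of each unit cell parameter over all N crystals."""
    n = len(cells)
    variances = []
    for i in range(len(cells[0])):
        values = [cell[i] for cell in cells]
        mean = sum(values) / n
        variances.append(sum((v - mean) ** 2 for v in values) / n)
    return variances


def unit_cell_distance(cells, j, k):
    """Distance between crystals j and k: sqrt(sum_i (u_ij - u_ik)^2 / V_i)."""
    total = 0.0
    for i, variance in enumerate(population_variances(cells)):
        if variance == 0.0:
            continue  # parameter does not vary among crystals; contributes nothing
        total += (cells[j][i] - cells[k][i]) ** 2 / variance
    return math.sqrt(total)
```

The resulting pairwise distances could then be passed to any standard hierarchical clustering routine run in single-linkage mode, as described for the unit cell analyses.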

Although the anomalous correlation coefficient (ACC) was a most effective measure of SAD phasing efficacy in our previous SeMet study (17), ACC proved to be unreliable for many single native-SAD data sets, with overall ACC values approaching zero. Thus, we devised the RACC to provide a more robust measure of anomalous signals. As shown in tables S3 to S6, typical RACC values are at or above 40%; CysZ outlier data set 8 was the one exception (RACC = 24.1%). For the four crystal systems in this study, we found that rejection criteria of unit cell deviations of >3σ, overall diffraction dissimilarity of >5%, and RACC of <35% were satisfactory and consistent with one another for outlier identification. A combinatorial criterion could be more robust, and appropriate rejection criteria could vary with characteristics of specific native-SAD experiments. Populated subclusters of data sets might be identified to yield satisfactorily valid but nonisomorphous structures.
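The three rejection criteria stated above can be expressed as a small screening function. This is a sketch only: the thresholds are those reported for the four crystal systems in this study, whereas the class, field names, and example values are hypothetical (the CysZ data set 8 entry uses placeholder numbers chosen to match the reported ">4 σ" and ">15%" deviations together with its reported RACC of 24.1%).

```python
from dataclasses import dataclass


@dataclass
class CrystalStats:
    name: str
    cell_deviation_sigma: float       # unit cell deviation, in units of sigma
    diffraction_dissimilarity: float  # 1.0 - intensity correlation coefficient, as a fraction
    racc: float                       # relative anomalous correlation coefficient, as a fraction


def outlier_reasons(stats, max_cell_sigma=3.0, max_dissimilarity=0.05, min_racc=0.35):
    """Return the list of criteria that a data set fails (empty if it is accepted)."""
    reasons = []
    if stats.cell_deviation_sigma > max_cell_sigma:
        reasons.append("unit cell")
    if stats.diffraction_dissimilarity > max_dissimilarity:
        reasons.append("diffraction")
    if stats.racc < min_racc:
        reasons.append("RACC")
    return reasons


# Placeholder values consistent with the deviations reported for CysZ data set 8.
print(outlier_reasons(CrystalStats("CysZ data set 8", 4.1, 0.16, 0.241)))
```

Returning the failed criteria separately, rather than a single yes-or-no flag, makes it easy to see whether the three measures agree for a given data set, or to build the combinatorial criterion that the text suggests could be more robust.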

Individual data sets from accepted crystals were further checked for exclusion of radiation-damaged wedges and for individual outlier rejections upon scaling and merging with the program SCALA (20). For each application, individual data sets were reordered according to overall RACC, and mergings were made by successively adding one data set at a time, both best to worst (tables S7A to S10A) and worst to best (tables S7B to S10B), followed by attempts at structure determination. In general, substructures were determined at reduced resolution (3.0 to 3.9 Å) and then used for phasing at the data limit. In each case, assessment parameters improved with successive mergings once substructure determination was feasible, and measures of anomalous signal strength (fig. S2) and diffracted intensity precision (fig. S3) were appreciably enhanced for merged data versus individual data sets. Most important, substructure determinations by SHELXD (21) were robust in all cases after data merger (Fig. 2, A, C, E, and G) and resulting phases were accurate as evidenced by Bijvoet-difference peak profiles (Fig. 2, B, D, F, and H) and electron density maps (fig. S4). Each of the multicrystal electron density distributions supported automated chain tracing at a high level of completion (77 to 93%) (Table 1). Phasing efficacy increased with the number of crystals and with multiplicity ordered by data wedges, but it appeared to reach an asymptotic limit (Fig. 3). Map quality continued to improve even at 150-fold data multiplicity.

Fig. 2

Analysis of anomalously scattering substructures. (Left) Profiles of SHELXD correlation coefficients (CC) between observed and calculated Bijvoet differences. (A) HK9S, 35 solutions in 100 tries. (C) CysZ, 34 solutions in 1000 tries. (E) Netrin G2, 11 solutions in 100 tries. (G) TorT/TorSS, 14 solutions in 2000 tries. Successful solutions are colored in red; random solutions are colored in blue. (Right) Bijvoet-difference Fourier peak profiles. (B) HK9S, 10 highest peaks. (D) CysZ, 40 highest peaks. (F) Netrin G2, 30 highest peaks. (H) TorT/TorSS, 50 highest peaks. For each experiment, ordered peak-height profiles are shown for maps from each single-crystal data set (identified by inset keys) and for the map from the merged data set (red in all cases). Peak heights are given in units of root-mean-squared deviation over the entire respective Fourier syntheses.

Fig. 3

Phasing efficacy dependence on data multiplicity. Phasing efficacy is measured here by the map correlation coefficient (mapCC), and average data multiplicity is given in the order of wedges of accumulated diffraction data. Each individual data set was divided into wedges of sequentially measured frames (6 wedges for HK9S, 6 wedges for CysZ, 10 wedges for netrin G2, and 9 wedges for TorT/TorSS), and these data were then merged wedge by wedge. Accumulations from these wedges were then used for native-SAD phasing based on substructures obtained before from all data. For each accumulation, experimental electron densities after density modification were compared with model electron densities. Resultant mapCCs for these wedge structures are plotted with respect to multiplicity for the respective accumulations: HK9S (red), CysZ (green), netrin G2 (magenta), and TorT/TorSS (blue). Fittings are with an asymptotic formula described in supplementary text.

The four crystal structures determined here are shown as ribbon diagrams in Fig. 4, including the sulfur, chlorine, calcium, and sulfate substructures identified and used in the SAD analyses. These structures range from 127 to 1148 ordered residues, are in crystal symmetries from monoclinic to tetragonal, and have resolution limits from 2.8 to 2.3 Å. Full descriptions of the structures and associated biology are being published elsewhere, including the structure of netrin G2 as refined at higher resolution (22) and the prior structure of TorT/TorSS (23). Each structure was unknown previously; both HK9S and netrin G2 had resisted structure determination by other methods; CysZ is an unprecedented kind of membrane protein; and the TorT/TorSS complex is both large and at modest resolution (2.8 Å). The range of these applications indicates substantial generality and robustness.

Fig. 4

Native-SAD crystal structures. (A) HK9S, a substructure of 3 S atoms and 1 Cl atom defined 127 ordered residues at 2.3 Å resolution. (B) CysZ, a substructure of 20 S atoms, 4 Cl atoms, and 1 SO4 defined 453 ordered residues at 2.3 Å resolution. (C) Netrin G2, a substructure of 26 S atoms and 1 Ca atom defined 312 ordered residues at 2.3 Å resolution. (D) TorT/TorSS, a substructure of 28 S atoms and 3 SO4 defined 1148 ordered residues at 2.8 Å resolution. Each molecular oligomer or complex is shown as a ribbon diagram with those residues in the asymmetric unit colored orange. Anomalously scattering substructures are shown as spheres with sulfur atoms in magenta, chloride ions in green (HK9S and CysZ), sulfate ions in yellow and magenta (CysZ and TorT/TorSS), and the one calcium ion in red (netrin G2).

Although the multicrystal native-SAD procedure was effective as implemented here, our experimental setup was not ideal. These results are from a bending-magnet beamline on a second-generation synchrotron source. As suggested by fig. S1 and table S1, substantially better performance can be expected by microdiffraction at an advanced undulator beamline that is optimized for low-energy experiments, including the provision of a detector designed to capture high-angle diffraction effectively. Improvements in scaling and weighting procedures can also be anticipated.

Notwithstanding prospects for improvement, we suggest that multicrystal-SAD phasing can facilitate robust de novo determination of native proteins and nucleic acids at a level rivaling that from the use of selenomethionyl proteins at present. The TorT/TorSS complex has a size (~1200 residues) and a resolution limit (2.8 Å) that are at a complexity level exceeding more than 90% of entries in the current PDB. Although difficulties may arise in obtaining sufficient numbers of adequate crystals to support multicrystal determinations, advantages will accrue in being liberated from the need for derivatization or selenomethionine incorporation.

Supplementary Materials

Materials and Methods

Supplementary Text

Figs. S1 to S4

Tables S1 to S10

References (24–43)

References and Notes

Acknowledgments: We thank R. Abramowitz and J. Schwanof for help with synchrotron data collection. This work was supported in part by cooperative agreement GM095315 from the Protein Structure Initiative (CysZ), and by NIH grants GM034102 (HK9S and TorT/TorSS) and GM062270 (netrin G2). Beamline X4A of the NSLS at Brookhaven National Laboratory, a U.S. Department of Energy facility, is supported by the New York Structural Biology Center. Accession codes for PDB deposits are listed in Table 1. DNA constructs and cell lines may require a material transfer agreement with Columbia University.