Unequivocal determination of complex molecular structures using anisotropic NMR measurements

Science  07 Apr 2017:
Vol. 356, Issue 6333, eaam5349
DOI: 10.1126/science.aam5349

Picking structures out of a lineup

Pharmaceutical research relies critically on determining the correct structures of numerous complex molecules. When well-ordered crystals are not available for x-ray analysis, nuclear magnetic resonance (NMR) spectroscopy is the most common structure-elucidation method. However, sometimes it is hard to distinguish isomers with similar spectra. Liu et al. showcase a protocol that combines computer modeling with anisotropic NMR data acquired using gel-aligned samples. Because of its uniform sensitivity to relative bond orientations across the whole molecular framework, the method overcomes common pitfalls that can lead to invalid structure assignments.

Science, this issue p. eaam5349

Structured Abstract

INTRODUCTION

Single-crystal x-ray diffraction studies represent the gold standard for unequivocal establishment of molecular structure and configuration. For molecules that will not crystallize or that form poorly-diffracting crystals, alternative methods must be used. Crystalline sponges and atomic force microscopy are techniques with increasing potential, although nuclear magnetic resonance (NMR) spectroscopy methods provide the primary viable alternative means to determine molecular structures. However, misinterpretation of NMR data—as a result of poor data quality, inappropriate experiment selection, or investigator bias—has led to burgeoning numbers of structure revision reports. Clearly, the development of a method to more effectively use NMR data and simultaneously quell reports of incorrect structures would be highly beneficial.

RATIONALE

Combining computer-assisted structure elucidation (CASE) algorithms and density functional theory (DFT) calculations with measured anisotropic NMR parameters, specifically residual dipolar coupling (RDC) and residual chemical shift anisotropy (RCSA), holds strong promise as an effective alternative means of assigning three-dimensional (3D) molecular structures. Anisotropic NMR data provide a spatial view of the relative orientations between bonds (RDCs) and chemical shielding tensors (RCSAs), regardless of the separation between the bonds and atoms, respectively. Hence, these data are sensitive reporters of global structural validity. The combination of DFT calculations and anisotropic NMR data represents an orthogonal approach to conventional NMR data interpretation that is not subject to the interpretational biases of human investigators and, as such, mitigates the risk of incorrect structure assignments.

RESULTS

Anisotropic NMR data can be used directly to evaluate the validity of investigator-proposed structures or can be combined with a CASE program in conjunction with DFT calculations for both structural proposal and validation. The RDC data are typically used to structurally define C-H bond vectors, whereas the RCSA data report on the chemical shift tensors of both protonated and nonprotonated carbons, the latter only accessible by long-range RDC data that are difficult to measure and interpret. These data are used to evaluate a given structure proposal on the basis of the agreement between the experimentally measured data and theoretical values calculated for the corresponding 3D DFT models. When structures generated by a CASE program are being considered, the method only requires a multidimensional NMR data set of sufficient quality and sophistication to allow the CASE program to generate a set of proposals that contains the correct structure of the molecule. The molecules being studied should also be amenable to modern DFT calculations for 3D model building. The CASE program output is sorted on the basis of cumulative error between experimental and calculated 13C data for the ensemble of structures generated, and the best-fitting molecules are subsequently subjected to DFT calculation for analysis. Results obtained using the proposed method demonstrate its applicability to a diverse range of complex molecules, each of which challenged the investigators originally reporting the structures.

CONCLUSION

The technique described here represents a potential paradigm shift from conventional NMR data interpretation and can provide an unequivocal and unbiased confirmation of interatomic connectivity and relative configuration for organic and natural product structures.

The principle of residual dipolar coupling (RDC)–based model differentiation is shown using aquatolide as an example.

The revised structure of aquatolide is shown on the top left, with the originally reported structure shown on the bottom left. The selected C-H bond vectors in the two structures have different orientations, as is evident after translating them to the same origin in the middle diagrams. Theoretical RDC values associated with these vectors can be calculated for each model on the basis of the experimentally determined alignment tensor. Correlation data are shown for only the four highlighted CH groups, although the alignment tensor was actually determined using all available data. The originally proposed (incorrect) structure clearly shows poorer agreement between the calculated and experimental data.

Abstract

Assignment of complex molecular structures from nuclear magnetic resonance (NMR) data can be prone to interpretational mistakes. Residual dipolar couplings and residual chemical shift anisotropy provide a spatial view of the relative orientations between bonds and chemical shielding tensors, respectively, regardless of separation. Consequently, these data constitute a reliable reporter of global structural validity. Anisotropic NMR parameters can be used to evaluate investigators’ structure proposals or structures generated by computer-assisted structure elucidation. Application of the method to several complex structure assignment problems shows promising results that signal a potential paradigm shift from conventional NMR data interpretation, which may be of particular utility for compounds not amenable to x-ray crystallography.

Single-crystal x-ray diffraction is unquestionably the gold standard in structure elucidation. However, a crystal that diffracts well is not always easily obtained, and many compounds, especially natural products, are isolated and purified in quantities too small for conventional crystallization screening. For some molecules that will not crystallize directly, crystalline sponges offer an exciting possibility, although currently, the sample-soaking step requires careful optimization and serendipity on a per-molecule basis (1). On a different front, atomic force microscopy with atomic resolution (2) has reached the breakthrough point of enabling visualization of individual atoms, but the current state of the art requires that the molecule under study adopt a planar structure for quality resolution (3, 4).

In the realm of spectroscopic methods, nuclear magnetic resonance (NMR) is the primary technique for full structure elucidation. However, conventional NMR has inherent limitations subject to the interpretational biases of the investigator. A recent SciFinder search under the keywords “structure revision” revealed >1200 reports, including 39 in 2016, with five of these examples appearing in a single week in Organic Letters and the Journal of Organic Chemistry (5–9). Clearly, a more objective and robust protocol for NMR structural determination not hampered by interpretational difficulties is highly desirable.

Traditional NMR structural elucidation is essentially a reverse-engineering process—i.e., one that deduces the actual structure in a puzzle-solving fashion, starting from various pieces of experimental data. For example, proton and carbon chemical shifts are first measured, providing information on the numbers of atoms and types of functional groups. Next, experiments based on scalar (J) couplings, such as homonuclear correlation spectroscopy (COSY) and heteronuclear multiple-bond correlation (HMBC), are used to establish connectivities between these groups. Distance constraints afforded by nuclear Overhauser effect spectroscopy (NOESY) and rotating frame Overhauser effect spectroscopy (ROESY) can also be employed to further facilitate the assembly of connectivity networks and to define a three-dimensional (3D) configuration. Unfortunately, prejudices of the investigator can intrude at any point in this process.

Here we describe an orthogonal check on the validity of molecular structure assignments based on anisotropic NMR parameters, namely residual dipolar coupling (RDC) (10–12) and residual chemical shift anisotropy (RCSA) (13–16). These data provide 3D information on relative orientations of different bonds and chemical shielding tensors in the molecule. Because RDC and RCSA data are insensitive to the distances between the bonds and atoms, respectively, they reveal angular relationships of various structural elements from all positions of a molecule and therefore reflect the correct overall structure without being subject to investigator bias.

Fundamental principles of combining CASE and DFT calculations with RDC and RCSA measurements

Over the past decade, advances in computer algorithms (17–19) and quantum chemistry computational methods (20–23) have underpinned a trend in structural elucidation that could be termed “forward engineering.” Rather than striving to assemble the correct structure in a single stroke using all available information, all possible structures consistent with the ensemble of available experimental data are first assembled by a computer-assisted structure elucidation (CASE) algorithm using homo- and heteronuclear shift correlation data (17–19). Next, theoretical values associated with critical measurements (i.e., 1H and 13C NMR chemical shifts) are calculated from each of the proposed structural candidates through density functional theory (DFT). Structures generated by the program are finally sorted on the basis of the congruence of the experimental and calculated 13C chemical shift data, leading to the selection of the best candidate or candidates.
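The sorting step described above can be illustrated with a minimal sketch. It assumes that the experimental and calculated 13C shifts are already paired atom by atom and that the cumulative error is a simple sum of absolute deviations; the function names and all shift values are hypothetical, and the scoring actually used by any particular CASE program may differ.

```python
def cumulative_abs_error(experimental, calculated):
    """Sum of absolute deviations (ppm) between paired 13C shifts."""
    if len(experimental) != len(calculated):
        raise ValueError("shift lists must have the same length")
    return sum(abs(e - c) for e, c in zip(experimental, calculated))


def rank_candidates(experimental, candidates):
    """Return (name, error) pairs sorted from best to worst agreement."""
    scored = [
        (name, cumulative_abs_error(experimental, calc))
        for name, calc in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical shifts in ppm, invented for illustration only.
experimental = [170.2, 135.4, 128.9, 77.1, 45.3, 21.0]
candidates = {
    "candidate_A": [169.8, 136.0, 128.1, 76.5, 46.0, 21.4],
    "candidate_B": [173.9, 131.2, 125.0, 81.3, 41.8, 24.6],
}

ranking = rank_candidates(experimental, candidates)
for name, error in ranking:
    print(f"{name}: cumulative |error| = {error:.1f} ppm")
```

Only the top few entries of such a ranking would then be carried forward to DFT geometry optimization.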

Clearly, the CASE software undertakes some of the reverse-engineering task in its black-box suite of algorithms. However, in the approach presented here, we are not so concerned with obtaining the exact structure, but more with covering sufficient chemical space so that the correct structure is contained in the ensemble of structural proposals generated by the program. Inclusion of additional long-range 1H-13C correlations absent in conventional HMBC experiments but available from newer NMR experiments such as LR-HSQMBC (long-range heteronuclear single quantum multiple-bond correlation) (24), and 1/n-bond 13C-13C correlations available from 1,1/n-HD-ADEQUATE (adequate sensitivity double-quantum spectroscopy) experiments (25–27), can greatly facilitate candidate generation or narrow down the range of candidates in the CASE output (28, 29).

As an alternative to using a CASE algorithm, candidate structures deduced by an investigator can also be examined using anisotropic NMR data. Examples of both CASE-generated structures and those deduced by an investigator will be discussed.

Typically, the top candidates from the CASE program or those deduced by the investigator are subjected to DFT geometry optimization to obtain 3D structures. These calculations can be a rate-limiting step if too many candidates are to be evaluated; however, on modern computational clusters, they can often be completed in a matter of hours, and optimization with an extensive conformer search can be completed for a dozen candidates within 2 to 3 days. With the 3D models in hand, experimentally observable parameters such as chemical shifts, scalar (J) couplings, NOE and ROE patterns, and vibrational circular dichroism (VCD) and/or infrared spectra can be calculated. Although these data can potentially serve as critical measurements, all have well-known limitations. For example, structural information from chemical shifts, scalar (J) couplings, and NOE and ROE correlations are inherently local, as these phenomena arise mostly through short-range interactions; the readout from VCD is a cumulative effect from all stereogenic centers in a molecule, so that diastereomeric differentiation can be very challenging for compounds possessing multiple chiral centers.

For the reasons above, we propose here an alternative, orthogonal method that provides a check on the global structure in an objective manner. The anisotropic NMR parameters used here for critical structure and configuration assessment, RDCs (10–12) and RCSAs (13–16), arise from well-known NMR phenomena. RDCs and RCSAs are observed only when a molecule is partially aligned, usually with the aid of an alignment medium such as a constrained polymeric gel. A schematic of how a molecule might orient with a polymeric gel stretched along the B0 axis of the magnetic field is shown in Fig. 1. Experimentally, RDC values are extracted from the difference of two measurements that yield the heteronuclear 1JCH coupling constant in isotropic solution and the total coupling constant 1JCH + 1DCH in an anisotropic medium, respectively. The 1DCH component of the total coupling constant measured in an anisotropic medium is the RDC, which can be either positive or negative; that is, the total coupling constant measured in anisotropic media can be either larger or smaller than the value of 1JCH. The commonly measured one-bond 13C-1H RDC data, for example, report on the relative orientations of different C-H bonds in the molecule. This information forms the basis of RDC-based structural differentiation. The corresponding bond-bond orientation relationships are invariably different between the correct and incorrect structures. Whereas the theoretical RDC values based on a correct structure must agree with experimental measurements, the same is not true for an incorrect structure. Such agreement can be qualitatively assessed by a theoretical versus experimental correlation plot or numerically measured by the quality (Q) factor (30). A low Q factor indicates good agreement between theory and experiment.
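As a small worked illustration of the two quantities just introduced, the sketch below extracts RDCs as the difference between the anisotropic and isotropic couplings and then scores two sets of back-calculated values. It assumes the commonly used definition Q = rms(experimental - calculated) / rms(experimental); the exact normalization and any relative weighting of RDC and RCSA data used in this work are not specified here, and every number is invented.

```python
import math


def rdc_from_couplings(total_aniso_hz, j_iso_hz):
    """1D_CH = (1J_CH + 1D_CH) measured in the gel minus 1J_CH measured in solution."""
    return total_aniso_hz - j_iso_hz


def q_factor(experimental, calculated):
    """rms(exp - calc) / rms(exp); a lower value means better agreement."""
    num = sum((e - c) ** 2 for e, c in zip(experimental, calculated))
    den = sum(e ** 2 for e in experimental)
    return math.sqrt(num / den)


# Hypothetical one-bond couplings in Hz for four C-H pairs.
j_isotropic = [145.0, 160.0, 128.0, 152.0]      # isotropic solution
total_aligned = [155.0, 152.0, 134.0, 147.0]    # stretched gel

# RDCs can come out positive or negative.
d_exp = [rdc_from_couplings(t, j) for t, j in zip(total_aligned, j_isotropic)]

# Hypothetical back-calculated RDCs for two candidate structures.
d_calc_candidate_1 = [9.0, -7.0, 6.0, -5.0]
d_calc_candidate_2 = [2.0, 3.0, -4.0, -5.0]

q1 = q_factor(d_exp, d_calc_candidate_1)
q2 = q_factor(d_exp, d_calc_candidate_2)
print(f"Q(candidate 1) = {q1:.3f}, Q(candidate 2) = {q2:.3f}")
```

In this toy data set the first candidate gives the lower Q and would be preferred.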

Fig. 1 Illustration of partial molecular ordering at the gel polymer surface.

The polymer structure (left) represents a poly(methyl methacrylate) (PMMA) filament. The filament is displayed along B0, demonstrating the excess projection along this direction in the actual NMR experimental setup. The rotation of the analyte (center) is restricted at the polymer surface due to steric occlusion, leading to partial molecular alignment. The experimentally determined principal axis frame of the alignment tensor, a particular molecular frame in which the alignment tensor is diagonalized, is displayed as the yellow dashed frame on the far right. The most ordered axis, Pz, can frequently be understood intuitively on the basis of the molecular shape. For example, in aquatolide, the most ordered molecular axis Pz is, as expected, roughly perpendicular to the planar envelope of this nearly flat molecule.

A complementary anisotropic NMR parameter, RCSA, has the advantage of providing structural information for carbons that are not bonded to hydrogen. Structural information for these carbons was previously only obtainable with challenging-to-acquire long-range RDC measurements (31–33). Due to difficulties in the elimination of isotropic contributions to the RCSA measurement and other issues, the RCSA measurements were only recently applied to small molecules (13–16). In the most current report, two alternative methods for successfully measuring RCSAs in stretched and compressed gels have been described, along with the means of eliminating any isotropic chemical shift change that otherwise contaminates RCSA measurement (16). As the measurement of RCSA is operationally somewhat more demanding than RDCs, any satisfactory RCSA-enabling techniques usually facilitate RDC measurements as well, although the reverse is not always true. As RCSA and RDC can provide complementary structural information, the measurement of both is advisable when RCSA data are being acquired. Unlike RDCs, which inform on relative bond orientations, RCSAs report on the relative orientations of different chemical shielding tensors in the molecular structure. In terms of their application in structure elucidation, RCSAs share a similar utility to that of RDCs.

The principle for structural differentiation based on RCSAs is illustrated in Fig. 2 and is comparable to RDC-based analysis, except that the bond vectors are now replaced by chemical shielding tensors that can be accurately calculated by gauge-invariant atomic orbital (GIAO) DFT (20–23). Simultaneously using both RDC and RCSA data also leads to more reliable determination of the alignment tensor, consequently enhancing the robustness of overall structure differentiation.
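The alignment-tensor determination mentioned above can be sketched for the RDC part alone. The example assumes the standard linear model in which each RDC equals r^T A r for the unit bond vector r and a symmetric, traceless tensor A, with the dipolar coupling constant absorbed into A so that its elements are in Hz. The five independent elements are obtained by least squares through the normal equations in plain Python (an SVD-based solver would be the more usual choice), and RCSA rows built from calculated shielding tensors could be appended in the same way, although their scaling convention is not covered here. The geometries, the tensor, and all function names are hypothetical.

```python
import math


def design_row(v):
    """Coefficients multiplying (Axx, Ayy, Axy, Axz, Ayz) for a unit bond vector.

    The tensor is symmetric and traceless, so Azz = -Axx - Ayy.
    """
    x, y, z = v
    return [x * x - z * z, y * y - z * z, 2 * x * y, 2 * x * z, 2 * y * z]


def solve_linear(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("bond vectors do not determine the tensor uniquely")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - known) / aug[i][i]
    return solution


def fit_tensor(vectors, rdcs):
    """Least-squares tensor elements from the normal equations (A^T A) t = A^T d."""
    rows = [design_row(v) for v in vectors]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(5)] for i in range(5)]
    atd = [sum(r[i] * d for r, d in zip(rows, rdcs)) for i in range(5)]
    return solve_linear(ata, atd)


def back_calculate(vectors, tensor):
    """Predicted RDC for every bond vector, given the five tensor elements."""
    return [sum(c * t for c, t in zip(design_row(v), tensor)) for v in vectors]


def q_factor(experimental, calculated):
    num = sum((e - c) ** 2 for e, c in zip(experimental, calculated))
    return math.sqrt(num / sum(e ** 2 for e in experimental))


def unit(v):
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


# Invented C-H bond directions for two candidate 3D models.
model_a = [unit(v) for v in [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0),
                             (1, 0, 1), (0, 1, 1), (1, 1, 1)]]
# Model B differs from model A in the orientation of the last two bonds.
model_b = model_a[:5] + [unit((0, 1, -1)), unit((1, -1, 1))]

true_tensor = [8.0, -3.0, 2.0, -2.5, 5.0]        # Hz, invented
measured = back_calculate(model_a, true_tensor)  # stands in for experimental RDCs

# Each candidate gets its own best-fit tensor before being scored.
fit_a = fit_tensor(model_a, measured)
fit_b = fit_tensor(model_b, measured)
q_a = q_factor(measured, back_calculate(model_a, fit_a))
q_b = q_factor(measured, back_calculate(model_b, fit_b))
print(f"Q(model A) = {q_a:.3f}, Q(model B) = {q_b:.3f}")
```

Because each candidate is allowed its own best-fit tensor, a nonzero Q for the second model reflects bond orientations that no single alignment tensor can reconcile with the data. Having more measurements than the five fitted parameters is what makes this test meaningful.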

Fig. 2 The principle of residual chemical shift anisotropy (RCSA)–based model differentiation shown using aquatolide as an example.

The chemical shielding anisotropy tensors are shown in magenta and green for two sp2 carbons for which residual dipolar coupling (RDC) data are unavailable. In the revised (top) and the original (bottom) structures, these two anisotropy tensors have different orientations relative to each other, as clearly seen here, and relative to anisotropy tensors of the other carbons, which are not displayed for visual clarity. The revised structure is clearly favored by much better agreement between experimental RCSA data and calculated values predicted using alignment-tensor parameters determined from all data, including both RCSA and RDC. Correlations for carbons whose chemical shielding anisotropy tensors are not displayed are shown in tan.

Recently, we reported the application of a combination of RDC and RCSA measurements in establishing the stereochemistry of the natural product homodimericin A (29). During the course of that work, it became obvious that the combination of RDCs and RCSAs provides a powerful and orthogonal means of confirming not only the relative configuration of a given stereocenter but also the overall molecular structure and atomic connectivity of the molecule under study. Combined analysis of RDC and RCSA data provides an independent means of confirming or refuting a proposed structure or choosing among alternative structures generated by a CASE program. The confluence of capabilities embodied by CASE methods, DFT calculations, and now the relatively facile measurement of anisotropic NMR parameters has facilitated the development of a generally applicable method for the definition of molecular structure and configuration that should help to address the growing and general problem of structural mischaracterization.

Application to cryptospirolepine

The structure proposed for cryptospirolepine, 1, first reported in 1993, was based solely on COSY, 1H-13C heteronuclear multiple-quantum coherence, and optimized 8-Hz 1H-13C HMBC data (34). In 2002, the original NMR sample was examined chromatographically and found to contain 26 components with none of 1 remaining. The two major components, 2 and 3, were isolated and fully characterized (35). Identification of the major degradants strongly suggested that the original structure report was incorrect (Fig. 3). Finally, in 2015, with the use of 1.7-mm MicroCryoProbe technology and the newly developed 1,1-HD-ADEQUATE experiment (27), it was possible to revise the structure to 4.

Fig. 3 Evolution of the cryptospirolepine structure.

The originally reported structure, 1 (34), was found to degrade to two major compounds, 2 and 3, in 2002 (35). Although the formation of 2 could be mechanistically rationalized, the formation of 3 could not, which suggests that the originally reported structure, 1, was likely incorrect. Using 1.7-mm MicroCryoProbe capabilities in conjunction with the recently reported 1,1-HD-ADEQUATE experiment allowed revision of the structural assignment to 4 (27). The formation of both 2 and 3 can be mechanistically rationalized from 4.

Although cryptospirolepine, 4, is racemic, the molecule provides a useful sample for assessing the structural validation capability of RDC and RCSA data, which are measured in achiral alignment media that are insensitive to absolute configuration. The comparison is presented in Fig. 4. Even a cursory inspection reveals that the correlation between theoretical and experimental values for the revised structure, 4 (panel B), is considerably stronger than for the original structure, 1 (panel A). This difference is also reflected in a much lower Q value of 0.122 for 4, in comparison with the value of 0.245 for 1. There are three strongly coupled pairs of resonances in the 500-MHz 1H NMR spectrum of 4 that can potentially undermine RDC measurement accuracy (36). After removing the corresponding RDC data, the Q value for 4 decreased to 0.082, whereas that for 1 was 0.217, still providing ample basis for differentiating the correct structure, 4, from the erroneous original structure, 1 (see supplementary materials).

Fig. 4 RDC and RCSA analysis for structural assignment of cryptospirolepine.

(A) Plot of the experimental versus calculated RDC (red) and RCSA (blue) data for the incorrect structure of cryptospirolepine reported in 1993 (34). Q = 0.245. (B) Plot of the experimental versus calculated RDC and RCSA data for the revised structure of cryptospirolepine (4) reported in 2015 (27). Q = 0.122.

In this particular case (Fig. 5), the bottom half of the structure (green) is actually identical in structures 1 and 4 (correlation Q factors of 0.16 and 0.13, respectively), but the top half (red) is substantially different (correlation Q factors of 0.29 and 0.12 for 1 and 4, respectively). The fact that the poor agreement in 1 is localized to the top half further confirms the location of the structural assignment mistake.

Fig. 5 Division of RDC and RCSA data on the basis of their structural locations narrows the structural error in 1 to the top half.

The correlation associated with the common green part is nearly identical between 1 (A) and 4 (B), whereas the correlation associated with the diverging red part is a substantially poorer fit in 1 than in 4, which indicates that the red part in 1 contains the structural error. Blue denotes nitrogen atoms.

Application to spiroketal rearrangement products

We recently collaborated on a study that involved the rearrangement of a spiroketal molecule triggered by an enol-ether epoxidation (37). Because of the indeterminate nature of the number of bonds spanned by HMBC correlations, it was not possible to assemble the structure from the normal ensemble of NMR data [COSY, heteronuclear single-quantum coherence (HSQC), and HMBC]. Resorting to computer-assisted structure elucidation reduced the number of structures consistent with the data to the two best choices (based on cumulative error between the experimental versus calculated 13C shift data), represented by 5 and 6. The correct structure was readily identified as 5 after the acquisition of a 40-Hz optimized 1,1-HD-ADEQUATE spectrum (27). In the event that an investigator did not have either a working knowledge of the ADEQUATE experiments or access to a spectrometer equipped with a cryoprobe with which to acquire those data, the differentiation of 5 from 6 afforded another opportunity for structure assignment using RDC and RCSA data.

Density functional theory calculations performed on both 5 and 6 suggested multiple conformers of comparable thermal energies, but the RDC and RCSA data instead suggested a single major conformer (see supplementary materials). Plotting the experimental versus the back-calculated RDC and RCSA data as above produced the results shown in Fig. 6. Although it was readily possible to identify the correct structure between the two choices using the 1,1-HD-ADEQUATE data, the RDC and RCSA data independently identify the correct choice as 5 as well, based on the significant difference in Q values: 0.20 for 5 versus 0.55 for 6.

Fig. 6 RDC and RCSA analysis for spiroketal rearrangement.

(A) Plot of the RDC (red) and RCSA (blue) data for the structure of the enol-epoxide rearrangement product confirmed with 1,1-HD-ADEQUATE data (27) as 5. The Q value was 0.20. (B) Correlation plot of the RDC and RCSA data for the alternative structure, 6, suggested by CASE, associated with a Q value of 0.55. RDC and RCSA data alone can clearly differentiate between the two structures.

Application to aquatolide

For a final example, we chose the natural product aquatolide (38–42). The originally proposed structure, 7, incorporated a very rare ladderane moiety in the molecular framework (38). Reisolation followed by extensive NMR, quantum mechanical calculations, and x-ray crystallography led to the revision of the structure, 8 (39), which was followed in 2015 by the total synthesis reported by Saya et al. (41). A more recent study by Buevich and Elyashberg examined the use of DFT calculation of chemical shifts as a means of choosing between alternative structures suggested by a CASE program (42). In addition to the revised structure, 8, the CASE program output contained two additional structures, 9 and 10, that were considered to be potentially viable alternatives on the basis of the cumulative error between the experimentally observed and calculated 13C chemical shifts for the structures.

As is readily apparent from the RDC and RCSA data for aquatolide in dimethyl sulfoxide (DMSO)–d6 plotted in Fig. 7, the best fit is obtained for the revised structure (39, 41) for which the data are shown in panel A (Q = 0.12). In stark contrast, the fit for the originally proposed structure with the ladderane moiety is extremely poor, with Q = 0.72. The alternative structures, 9 and 10, had successively poorer correlation plots with Q = 0.23 and 0.59, respectively. Hence, as is shown in the two previous examples, a correlation plot between the experimental and back-calculated RDC and RCSA data for the model structures readily establishes which structure is in best agreement with the data.

Fig. 7 Plots of the calculated versus experimental RDC (red) and RCSA (blue) data for aquatolide candidate structures.

(A) The revised structure, 8; (B) the originally proposed incorrect structure of aquatolide, 7; and (C and D) two alternative structures, 9 and 10, respectively. Clearly there is a vast difference in the Q value for the correct structure, 8, and the originally reported structure of aquatolide, 7. The alternative structures generated by CASE, 9 and 10, had intermediary Q values of 0.23 and 0.59, respectively. A twofold difference in the Q value between the correct structure, 8, and the best alternative structure from the CASE program, 9, still allowed an ample basis for choosing between the structures.

Outlook

The suite of model compounds in this report, all of which have occasioned either incorrect structure reports or structural assignment difficulties, illustrates the power of combining modern CASE algorithms and DFT calculations with RDC and RCSA data to simultaneously define chemical connectivity and configuration as well as 3D structure. The best candidates in our examples have RDC- and RCSA-combined Q values of ~0.1 to ~0.2. A Q value in this range is typically associated with high-resolution crystallographic analysis (43). A 3D structure of this accuracy provides valuable insights into drug-target interactions and structure-activity relationships. On the other hand, a successful execution of this approach has some prerequisites when a CASE algorithm is employed to generate structures. First, the initial input for the program must contain sufficient information to allow for candidate generation within a reasonable amount of time. In many cases, proton and carbon chemical shifts and HMBC correlations suffice, but in more challenging cases, such as that of proton-deficient compounds like homodimericin A (29), long-range H-C (24) correlation and C-C correlation data will be required. Fortunately, these data are now available, even for submilligram samples, owing to new NMR pulse sequences and developments in spectrometer hardware (27). Second, the candidates generated by CASE must contain the correct structure. Our experience with the CASE program indicates that this requirement is met in most instances. In fact, a good computer algorithm has been reported to outperform a human expert, with respect to deep exploration of all structural possibilities (17–19). Third, the proposed chemical structures should be approachable by DFT geometry optimization. In this work, DFT geometry optimization was conducted in Gaussian 09 at the B3LYP/6-31G(d,p) level. B3LYP works well with a small basis set such as that used in this work, and employing a large basis set causes virtually no difference in the final structure in our tests. All DFT calculations were performed in vacuum without solvent modeling, although the actual samples were analyzed in DMSO. Our previous experience shows that the inclusion of solvent effects minimally changes the optimized structure but does alter its calculated energy, which should be considered when accurate Boltzmann distribution weighting is needed among multiple rotamers of comparable energies. For compounds in this study, only a single lowest-energy rotamer was predicted, except for the spiroketal molecule. In that case, as described in the supplementary materials, anisotropic NMR data agreed best with a major rotamer that is actually not the lowest-energy rotamer according to DFT in vacuum. This discrepancy may reflect an effect of DMSO that was not taken into account, although including the implicit effect of DMSO by a polarizable continuum model did not lower the relative energy of this rotamer.
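To make the remark on Boltzmann weighting concrete, the following sketch converts relative rotamer energies into populations using p_i = exp(-ΔE_i/RT) / Σ_j exp(-ΔE_j/RT). The energies are invented, and the temperature and the choice of kcal/mol units are assumptions made for illustration; the point is only that shifts of a few tenths of a kcal/mol, of the kind a solvent model might introduce, can noticeably change the weights of rotamers with comparable energies.

```python
import math

R_KCAL_PER_MOL_K = 0.0019872041  # gas constant in kcal/(mol*K)


def boltzmann_weights(energies_kcal, temperature_k=298.15):
    """Fractional populations from (relative) energies in kcal/mol."""
    e_min = min(energies_kcal)  # shift so the largest exponent is zero
    rt = R_KCAL_PER_MOL_K * temperature_k
    factors = [math.exp(-(e - e_min) / rt) for e in energies_kcal]
    total = sum(factors)
    return [f / total for f in factors]


# Hypothetical relative energies of three rotamers, in kcal/mol.
vacuum_energies = [0.0, 0.5, 1.5]
solvent_energies = [0.3, 0.0, 1.5]  # invented reordering with a solvent model

w_vacuum = boltzmann_weights(vacuum_energies)
w_solvent = boltzmann_weights(solvent_energies)

for label, weights in (("vacuum", w_vacuum), ("solvent", w_solvent)):
    print(label, [f"{w:.2f}" for w in weights])
```

In this invented case the most populated rotamer changes between the two energy sets, which is the kind of sensitivity that motivates checking the conformer choice against the anisotropic data.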

Although for some compounds even simple molecular mechanics calculations can yield 3D structures of high accuracy, it can still be quite challenging to obtain useful results from DFT calculation for other compounds—for instance, those whose structures are stabilized mostly through intra- or intermolecular hydrogen bonds, such as polypeptides. However, other structure-prediction tools, such as CS-ROSETTA (chemical shift ROSETTA), are better tailored to these needs (30). For GIAO chemical shielding calculation, the mpw1pw91 functional and the 6-31G(d,p) basis set were used, which consistently produced slightly better RCSA Q factors than the B3LYP/6-31G(d,p) combination in all tested cases. The RCSA-based analysis is more robust against GIAO-DFT inaccuracy than the chemical shift–based analysis, because for the former only the size of the prediction error relative to the overall shielding actually affects the analysis, whereas for the latter the absolute prediction error directly influences the analysis. Once a reasonable 3D model associated with each candidate is generated, whether via computational methods or investigator deduction, and chemical shielding tensors are calculated by DFT based on this 3D model, RDC and RCSA data can be employed as a sensitive critical measure to evaluate the validity of the structural assignment. The possibility of a false-positive determination—that is, agreement of RDC and RCSA data with an incorrect structure—is substantially lower (44) than that in an analysis using only conventional NMR data, especially when both RDC and RCSA are jointly used. These data can serve as a convenient NMR litmus test of structure and stereochemical validity. As such, the method described in this work has considerable potential to be widely applied, which could help to quell the flow of incorrect structures appearing in the literature.

Materials and methods

Preparation of poly-(2-hydroxyethyl methacrylate) (poly-HEMA) gel

The preparation of EGDMA (ethylene glycol dimethacrylate) cross-linked poly-HEMA gel followed a published protocol (45), but the HEMA monomer concentration during polymerization and the cross-linking ratio were optimized specifically for use with the gel-stretching device to final values of 60% (v/v) and 0.07% (v/v), respectively. Polymerization was carried out in 1/8′′ ID FEP tubing (Cole-Parmer) at 50°C with 0.06% (v/v) V70 (Wako Chemicals USA) as the radical initiator. The polymerized gel was cut into 2-cm segments and washed three times in methanol over a period of 2 days. The gel sticks were dried on a glass surface before use.

NMR sample preparation and experiments

The NMR experiments for RDC and RCSA measurements are relatively straightforward and well documented (16, 29). The experiment time depends closely on the amount of material available. For example, J-resolved HSQC and 13C{1H} spectra of excellent quality can be obtained in about 15 hours each for 1 mg of aquatolide in an unstretched HEMA gel on a 500-MHz magnet equipped with a Prodigy probe. In the stretched-gel state, longer experiments with 50% more transients are advised to account for the reduction in active sample volume in the narrower segment of the tube, as previously described (16, 29). The compounds used in our current work, which ranged from 1 to 3 mg in quantity, took ~2 to ~3.5 days of analysis time per sample. The DMSO-compatible gel has an advantage over the chloroform-compatible gel for dilute samples in that longer experiments can be run without solvent evaporation.

The samples used for this study were either isolated and purified or synthesized as described in the primary references for the individual compounds. Resonance assignments of cryptospirolepine (34) were adopted from a previous publication (27). Samples of the spiroketal (1 mg) (37) and aquatolide (1 mg) (41) were first dissolved in 150 μl DMSO-d6 for resonance assignment prior to RDC and RCSA measurements, as the original chemical shift assignments were carried out in different solvents. NMR resonance assignments were completed using a combination of 1H, 13C, COSY, 1H-13C HSQC, and 1H-13C HMBC experiments.

To prepare the gel sample, the test compound was dissolved in 350 μl DMSO-d6 with the addition of 5 μl tetramethylsilane for carbon chemical shift referencing, and a gel stick was added in a horizontal position to swell for a period of 3 days. The fully swollen gel was then transferred to a gel-stretching device with inner diameters of 4.2 and 3.2 mm for the wide and narrow sections, respectively, as described previously (16). For RDC measurements, the J-resolved BIRD-HSQC experiment with F2 homonuclear decoupling (HD-J-HSQC) (46) was utilized with an F1 acquisition time ranging from 256 to 312 ms, an F2 acquisition time of 120 ms, and a recycling delay of 1.5 s. For the spiroketal and aquatolide, some signals overlapped with signals from the gel polymer, so the F2-coupled CLIP-HSQC experiment (47) was employed to obtain coupling data for those overlapped resonances, as well as couplings of individual C-H vectors of methylene groups. (Note that the HD-J-HSQC experiment only provides the sum of two C-H couplings for anisochronous protons of a methylene group.) An F1 acquisition time of 12 ms, an F2 acquisition time of 500 ms, and a recycling delay of 1.5 s were used in the F2-coupled CLIP-HSQC experiment; the aromatic regions were folded in the F1 dimension to conserve spectrometer time. All NMR experiments were conducted at 25°C on a Bruker 500-MHz spectrometer equipped with a Prodigy probe.
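How RDCs follow from the measured couplings can be sketched as below. The sketch assumes the convention D = T − J, where T is the one-bond C-H splitting in the stretched gel and J is the corresponding splitting in the unstretched gel; other conventions (for example T = J + 2D) exist, so the sign and scaling here are assumptions. The function name, resonance labels, and coupling values are hypothetical.

```python
def extract_rdcs(relaxed_hz, stretched_hz):
    """Return D = T - J (Hz) for every resonance measured in both gel states.

    relaxed_hz and stretched_hz map a resonance label to its one-bond C-H
    splitting in the unstretched and stretched gel, respectively.
    """
    common = sorted(set(relaxed_hz) & set(stretched_hz))
    return {label: stretched_hz[label] - relaxed_hz[label] for label in common}


# Hypothetical splittings in Hz (illustration only, not data from this study).
relaxed = {"C1-H1": 145.2, "C5-H5": 160.1, "C7-H7a+H7b": 258.4}
stretched = {"C1-H1": 152.0, "C5-H5": 151.3, "C9-H9": 130.0}

rdcs = extract_rdcs(relaxed, stretched)
for label, value in rdcs.items():
    print(f"{label}: D = {value:+.1f} Hz")
```

Resonances observed in only one state (here the hypothetical methylene sum and C9-H9) are skipped rather than guessed, which mirrors the need for a second experiment when signals overlap with the gel polymer. For a methylene group measured by HD-J-HSQC, the same subtraction yields the sum of the two C-H RDCs rather than the individual values.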

Supplementary Materials

www.sciencemag.org/content/356/6333/eaam5349/suppl/DC1

Supplementary Text

Figs. S1 to S10

Tables S1 to S9

References (49, 50)

Data S1

References and Notes

  44. In an unpublished study on a nine-residue cyclized peptide, aureobasidin, out of 100,000 rotamers generated by a diversity-oriented conformer search, three incorrectly folded structures were identified that had Q factors of 0.33, 0.33, and 0.35, respectively. In contrast, comparison of 26 RDCs with the single-crystal x-ray structure [(48), Cambridge Crystallographic Data Center entry QEFHUE] provided a significantly better Q factor of 0.23. The average Q factor in the test set was 0.81, with a standard deviation of 0.10 (these results are illustrated graphically in fig. S10). Therefore, the false-positive rate in this case is estimated to be ~30 per million (3 in 100,000), even with a loose Q-factor cutoff of 0.35. This rate is expected to drop orders of magnitude further if carbonyl RCSA data are also included.
Acknowledgments: This work was funded, in part, by NIH grant GM086258. Experimental data are available in supplementary materials. We thank K. Lexa for providing the coordinates of 100,000 peptide conformers for the false-positive rate estimation (44) and fig. S10.