Concise total synthesis of glucosepane

Science  16 Oct 2015:
Vol. 350, Issue 6258, pp. 294-298
DOI: 10.1126/science.aac9655

Getting a handle on a cross-linking motif

Although protein backbones consist exclusively of amino acids, various other molecules in the cell often get latched on afterward in a process termed posttranslational modification. In one such motif, called glucosepane, the side chains of lysine and arginine form a condensed cross-link through a reaction sequence with glucose. Formation of this cross-link is of interest in diabetes research. Draghici et al. now report a chemical synthesis of glucosepane outside the broader environment of a surrounding protein (see the Perspective by Boger). This synthesis should facilitate more precise characterization of the structure and function of the motif in vivo.

Science, this issue p. 294; see also p. 275


Glucosepane is a structurally complex protein posttranslational modification that is believed to exist in all living organisms. Research in humans suggests that glucosepane plays a critical role in the pathophysiology of both diabetes and human aging, yet comprehensive biological investigations of this metabolite have been hindered by a scarcity of chemically homogeneous material available for study. Here we report the total synthesis of glucosepane, enabled by the development of a one-pot method for preparation of the nonaromatic 4H-imidazole tautomer in the core. Our synthesis is concise (eight steps starting from commercial materials), convergent, high-yielding (12% overall), and enantioselective. We expect that these results will prove useful in the art and practice of heterocyclic chemistry and beneficial for the study of glucosepane and its role in human health and disease.

Posttranslational modifications (PTMs) of proteins are responsible for a number of critical functions, ranging from acceleration of protein folding to mediation of protein-protein interactions (1). Protein glycation is a nonenzymatic process for PTM formation wherein protein side chains react spontaneously with open-chain tautomers of carbohydrates. Mounting evidence suggests that protein glycation adducts [also called advanced glycation end products (AGEs)] are critically involved in both healthy and disease processes, including inflammation, diabetes, cancer, and normal human aging (2, 3). AGEs often possess highly complex chemical structures, impeding their detailed chemical and biological characterization (4).

Glucosepane (1) is an AGE formed as a cross-link from reaction sequences between arginine and lysine side chains and one equivalent of hexose carbohydrate, most commonly glucose (Fig. 1). Glucosepane is present on long-lived proteins in the human body, such as collagen and lens crystallin (3, 5), and is also found in high levels in various dietary sources, especially alkali-treated baked goods (6). Researchers have speculated that glucosepane is directly involved in the pathophysiology of various conditions (e.g., diabetes, diabetes-related complications, and aging), owing to the patterns of glucosepane formation on disease-associated proteins. For example, via analysis of skin biopsies obtained through the Diabetes Control and Complications Trial, Monnier et al. have determined that increases in skin glucosepane levels represent an independent risk factor for the onset of diabetic nephropathy, retinopathy, and neuropathy (3, 7). Additional studies have demonstrated that nonenzymatic glucosepane cross-links outnumber enzyme-catalyzed cross-links in the collagen of people over 65 years of age (8). By age 100, glucosepane levels reach 2 nmol/mg of collagen, almost 10 times the normal level, and levels in diabetic patients can reach up to 20 times those in healthy controls (9, 10).

Fig. 1 Glucosepane and the 4H-imidazole.

(A) Chemical structure of the protein-bound glucosepane cross-link, depicting both nonaromatic 4H-imidazole (1) and aromatic 1H-imidazole (2) tautomers. (B) Proposed biosynthetic pathway for glucosepane cross-links. (C) Retrosynthetic analysis employed in this work for glucosepane total synthesis.

Despite glucosepane’s health implications, biological investigations have been hampered by a scarcity of chemically homogeneous material available for study. Its complex nonenzymatic biosynthesis involves serial tautomerizations of Amadori adduct 3 to provide glucosone 4 (a process known as carbonyl mobility) (Fig. 1B). During this process, each stereocenter undergoes epimerization, and therefore the glucosepane core exists naturally as a mixture of all eight possible diastereomers (3, 11). These stereoisomers can only be chromatographically resolved into four binary mixtures, each putatively containing two spectroscopically indistinguishable diastereomers with the same relative configuration at the 6, 7, and 8a positions but opposite absolute configurations with respect to the enantiomerically pure backbone amino acids (11). Despite substantial effort, purification of stereochemically homogeneous glucosepane from biological samples has proven elusive. It is therefore unknown which of the eight stereoisomers is the most prevalent in vivo. Furthermore, these binary diastereomeric mixtures can only be isolated in low yields (0.2 to 1.4%) following model reactions between lysine, arginine, and glucose, and via extensive chromatographic purification (5, 11). Because of these difficulties in purification, antibody reagents that would enable biological detection of glucosepane in unprocessed tissue preparations are unavailable. Published investigations into glucosepane biology have thus had to rely on time-consuming extraction protocols involving exhaustive enzymatic hydrolysis followed by high-performance liquid chromatography (HPLC) separation. The development of synthetic routes toward chemically defined glucosepane constructs represents an essential next step toward our understanding of the roles that this compound plays in human health and disease and also toward the identification of associated therapeutic and/or diagnostic agents.
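The stereoisomer counting in this paragraph can be sketched in a few lines of Python. The sketch assumes three independently configurable stereocenters (C-6, C-7, and C-8a) and uses schematic R/S labels that are illustrative only, not CIP assignments taken from the paper. It pairs each core configuration with its mirror image, which reproduces the eight isomers and four binary mixtures described above.

```python
from itertools import product

CENTERS = ("C-6", "C-7", "C-8a")  # stereocenters named in the text

def mirror(config):
    """Invert every schematic descriptor (mirror image of the core)."""
    return tuple("S" if c == "R" else "R" for c in config)

def enumerate_core_isomers():
    """All 2**3 schematic configurations of the three stereocenters."""
    return list(product("RS", repeat=len(CENTERS)))

def group_binary_mixtures(isomers):
    """Group configurations that share the same relative configuration."""
    groups = {}
    for config in isomers:
        key = min(config, mirror(config))  # canonical label for the pair
        groups.setdefault(key, []).append(config)
    return list(groups.values())

isomers = enumerate_core_isomers()
mixtures = group_binary_mixtures(isomers)
print(len(isomers), len(mixtures))
```

Each group holds two core configurations of opposite absolute configuration; attached to the enantiomerically pure amino acid backbone, these become diastereomers rather than enantiomers.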

Glucosepane incorporates a stereogenic polyol motif within a fused hetero-bicyclic constitution, an epimerizable stereocenter at C-8a, and an arginine-derived 4H-imidazole at its core. At first glance, one would expect glucosepane to tautomerize spontaneously to the corresponding aromatic 1H-imidazole (Fig. 1A); however, reported structural assignments of the 4H-imidazole in glucosepane are consistent with one- and two-dimensional nuclear magnetic resonance (NMR) data reported by Biemel et al. (11). Furthermore, because glucosepane forms naturally as a protein adduct (not as the free bis–amino acid cross-link), any useful synthesis needs to be compatible with glucosepane incorporation into peptides. Also, because glucosepane is formed naturally as a mixture of all eight possible diastereomers, synthetic efforts targeting both enantio- and diastereomerically pure material are essential for detailed biochemical study.

In analyzing the glucosepane core, we were first intrigued by its reported tendency to adopt a 4H-imidazole constitution rather than that of the corresponding aromatic 1H-imidazole. We therefore set out to investigate this tautomeric preference through theoretical calculations performed on model compounds designed to mimic the glucosepane core and protonation state (Table 1). Although unsubstituted imidazolium (entry 1) greatly prefers the aromatic arrangement, addition of methylamino substituents to the 2 or 5 positions (entries 2 and 3) decreases this preference substantially. The 2,5-diamino–substituted derivative (entry 4), which contains the same substitution pattern as glucosepane, exhibits a strong preference for the nonaromatic tautomer. This trend may be partly explained by the decreasing aromatic stabilization of imidazole upon addition of electron-donating substituents to the 2 and 5 positions, as indicated in prior work and additional calculations provided in the supplementary materials (fig. S5) (12). This model is further supported by geometry minimization experiments (tables S10 and S11), which demonstrate for the 2,5-diamino imidazole system that the substituent at the 5 position is rotated such that N lone pairs only partially overlap with the heterocycle’s π system. The inability of electron-donating substituents at this position to delocalize into the imidazole ring may drive a decrease in stabilization energy, as well as a tendency to tautomerize into the 4H-imidazole, which permits such delocalization. We hypothesize that in the setting of 5- and 2,5-diamino imidazoles, the decrease in aromaticity does not afford a sufficiently high degree of energetic stabilization. In the 4H-imidazole tautomer, on the other hand, electron-donating amines can contribute extensively to resonance stabilization.
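A free-energy difference between two tautomers maps onto an equilibrium population through the standard relation K = exp(−ΔG/RT). A minimal sketch follows, assuming a simple two-state model. The ΔG inputs are hypothetical placeholders, not the computed values from Table 1, and the temperature of 298.15 K is likewise an assumption.

```python
import math

R = 8.314462618e-3  # gas constant, kJ/(mol*K)

def fraction_4h(delta_g_kj_mol, temperature_k=298.15):
    """Equilibrium fraction of the 4H tautomer in a two-state model.

    delta_g_kj_mol is G(4H) - G(1H); negative values favor the 4H form.
    """
    return 1.0 / (1.0 + math.exp(delta_g_kj_mol / (R * temperature_k)))

# Hypothetical inputs for illustration only (not values from Table 1).
for dg in (-10.0, 0.0, 10.0):
    print(f"dG = {dg:+.1f} kJ/mol -> fraction 4H = {fraction_4h(dg):.3f}")
```

Because the dependence is exponential, a preference of only a few kJ/mol already shifts the population almost entirely to one tautomer.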

Table 1 Results from DFT calculations comparing energies of various tautomerization states of protonated imidazoles.

ΔG, Gibbs free energy; Me, methyl.

With this information in mind, we constructed our retrosynthesis (Fig. 1C). We reasoned that, given the strong thermodynamics driving the core heterocycle’s tautomerization state, formation of C–N bonds between the arginine guanidine and the lysine-derived azepane (6 + Arg5) would be accompanied by spontaneous isomerization to the correct structure. Therefore, we first chose to disconnect at the two C–N bonds endocyclic to the imidazole motif. This is the same disconnection suggested by Biemel et al. for the final step in the biosynthesis of glucosepane (11), wherein arginine is proposed to condense directly with an α-keto iminium intermediate (6) formed from an adduct derived from lysine and glucose (Fig. 1B). We then reasoned that 6 could be generated through N-fluorination and regioselective elimination of fluorine from azepane 7. In turn, azepane 7 could be deconstructed via an Amadori rearrangement sequence to a suitably protected lysine derivative and known epoxide 8 (13). In this sense, 8 would serve as the source of the chiral diol encountered in glucosepane diastereomer 5. As it is unknown which stereoisomer(s) of glucosepane are most prevalent in vivo, 8 was chosen because it reflects the stereochemistry of glucose, which is the most common precursor in vivo. In future studies, simply inverting the C-6 and C-7 stereocenters of the starting epoxide 8 would then permit access to other reported diastereomers.

Our synthesis began with epoxide 8, which was prepared from diacetone-D-glucose as previously described (Fig. 2) (13, 14). Nucleophilic addition of Dod-protected lysine derivative 9 to the less-substituted side of the epoxide in 8, followed by acidic deprotection of the resulting tertiary amine, provided amino alcohol 10 in 80% yield over two steps. Acetonide removal in the presence of aqueous acetic acid then afforded azepane acetal 13. The conversion of 10 to 13 proceeded by way of intramolecular attack of the lysine amino group onto the anomeric carbon of the carbohydrate with accompanying acetonide loss to give bicyclic intermediate 11 (15). This material spontaneously underwent Amadori rearrangement (16, 17) to give an intermediate ketone 12, which was then trapped intramolecularly by the C-6 hydroxyl to produce bridged bicyclic acetal 13 in 60% yield. Reinstallation of the acetonide group proceeded with reformation of the ketone functionality to afford the desired protected ketone (14).

Fig. 2 Preparation of the lysine-derived azepane intermediate (14).

iPr, isopropyl; PPTS, pyridinium p-toluenesulfonate; Cbz, carboxybenzyl.

Access to 14 set the stage for oxidation-trapping attempts, outlined in our retrosynthesis. Although we were able to achieve the desired α-keto iminium intermediate (16) by way of oxidation with 1-(chloromethyl)-4-fluoro-1,4-diazoniabicyclo[2.2.2]octane ditetrafluoroborate (Selectfluor), this material rapidly underwent ring contraction to produce aldehyde 17 (Fig. 3A). All attempts to condense 17 with guanidine derivatives, including various protected forms of arginine, were met only by complete decomposition of 17 and recovery of the guanidine nucleophile. Furthermore, attempts to perform oxidation and guanidine trapping in one pot were also unsuccessful, providing similar results to the two-step process.

Fig. 3 Synthesis of glucosepane’s 4H-imidazole core.

(A) Attempted oxidation-trapping sequence. (B) Reformulation of the synthetic sequence in terms of sequential sigmatropic rearrangement and cyclization reactions. (C) Mechanistic details of the proposed sigmatropic rearrangement-cyclization sequence.

In light of these observations, we decided to reengineer our synthetic strategy. Though we were encouraged by our ability to access α-keto iminium 16, the inability of this species to undergo intermolecular trapping suggested that perhaps condensation with the arginine guanidine functionality succeeds in vivo because of proximity effects. In other words, cross-linking is only likely to occur for proteins in which an appropriately modified lysine residue is directly adjacent to the attacking arginine, rendering the process functionally intramolecular (even for proteins such as collagen, in which intermolecular cross-linking is accelerated due to the high local concentration of reactive side chains) (3, 18). Hence, for the reaction to be successful, the nucleophilic (guanidine) and electrophilic (iminium) components must be tethered together at the time of oxidation.

We recognized that an intramolecular oxidation transfer process, by way of a [3,3]-sigmatropic rearrangement from semicarbazone tautomer 19, would afford an intermediate (20) with the same core oxidation state as α-keto iminium 16 (Fig. 3B). Furthermore, 20 also contains a tethered guanidine function that is perfectly poised for subsequent intramolecular cyclization and tautomerization to afford the glucosepane core. We reasoned that simple condensation of lysine-derived ketone 14 with semicarbazide derivatives (Fig. 3C) would permit rapid access to semicarbazone 19, which is capable of tautomerizing to the desired [3,3] rearrangement substrate (20). In this sense, 20 would function as a masked version of α-keto iminium 16, possessing the correct oxidation state and functional group disposition to afford the desired 4H-imidazole 18. We were encouraged by previous reports of analogous hetero-Claisen rearrangements (19–23). On the whole, we envisioned that this sequence would accomplish the goal of directly coupling the oxidation and condensation steps, thus solving the problems associated with our intermolecular-trapping sequence.

With this analysis in mind, we began with the condensation of thiomethyl semicarbazide 22 with ketone 14, which proceeded smoothly to afford semicarbazone 24 (as a mixture of E/Z isomers) in 78% yield (Fig. 4A). After several failed attempts, we discovered that treating semicarbazone 24 with excess chlorotrimethylsilane (TMSCl) in anhydrous, refluxing chloroform induced the formation of 4H-thioimidazole 26. We believe that this material forms via the pathway predicted in Fig. 3C—by way of tautomerization, [3,3]-sigmatropic rearrangement, and cyclodeamination—and is accompanied by acetonide removal, which likely results from HCl generated by aqueous quenching of excess TMSCl. Compound 26 was isolated as an epimeric mixture at C-8a, as confirmed by NMR analysis (24). Our attempts to purify 4H-thioimidazole 26 under open atmosphere led only to the isolation of C-8a–oxidized product 27. By displacing the thiomethyl group with an ornithine derivative followed by C-8a reduction using Na(OAc)3BH (OAc, acetate), we were able to access protected glucosepane 28 as a 4:1 mixture of epimers.

Fig. 4 Completion of the synthesis.

(A) Formation of glucosepane’s 4H-imidazole core using a cascade [3,3]-sigmatropic rearrangement-and-cyclization sequence. (B) Completion of the total synthesis and characterization of synthetic glucosepane. (C) 4H-imidazoles prepared using our sequence. d.r., diastereomeric ratio; Et, ethyl.

Despite this result, we sought a more concise route to intermediate 28. Replacement of 22 in this sequence with a fully elaborated arginine derivative (23) readily afforded 25 in good yield (69%), and this intermediate also underwent the desired rearrangement, cyclization, and acetonide removal sequence. Therefore, this sequence furnished fully protected glucosepane derivative 28 in a 4:1 diastereomeric ratio in a single synthetic step.

With backbone-protected glucosepane in hand, completion of the synthetic sequence proved straightforward (Fig. 4B). Global hydrogenolytic deprotection of carboxybenzyl and benzyl ester protecting groups was achieved using palladium on carbon under an atmosphere of hydrogen gas and either trifluoroacetic acid (TFA) or formic acid, enabling rapid access to 5 as either the formate or TFA salt. Although two C-8a epimers are produced in a 4:1 ratio via this route, these can be separated by preparative HPLC. Spectral data obtained from 1H and 13C NMR experiments using synthetic 5 proved identical to that reported by Lederer and colleagues for material isolated from model reactions (11). Overall, the full synthetic route proceeded in a total of eight steps and 12% overall yield.
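For a sense of scale, an overall yield can be converted into an equivalent average yield per step through the geometric mean. The sketch below uses only the eight steps and 12% overall yield stated above. Treating the route as a linear sequence of equally efficient steps is a simplifying assumption (the actual synthesis is convergent), and the function names are illustrative.

```python
def average_step_yield(total_yield, n_steps):
    """Geometric-mean yield per step for a linear sequence."""
    return total_yield ** (1.0 / n_steps)

def cumulative_yield(step_yields):
    """Overall yield as the product of individual step yields."""
    total = 1.0
    for y in step_yields:
        total *= y
    return total

avg = average_step_yield(0.12, 8)  # 12% overall, eight steps (from the text)
print(f"average yield per step: {avg:.1%}")
```

Under these assumptions, a 12% overall yield over eight steps corresponds to roughly 77% per step on average.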

To provide further experimental evidence that 2,5-diaminoimidazoles adopt the 4H-imidazole tautomer, we applied our newly developed rearrangement to the synthesis of two additional 2,5-diaminoimidazoles (29 and 30) (Fig. 4C) (25). These compounds were prepared in two steps from commercially available starting materials and, as expected, adopted the 4H-imidazole tautomer exclusively, based on NMR analysis. These results support our computational data and confirm that imidazoles with electron-rich amino substituents at the 2 and 5 positions are more stable as the 4H-imidazole tautomer.

With synthetic glucosepane (5) in hand, we used multidimensional NMR techniques to investigate its structural features. Two-dimensional 1H NMR nuclear Overhauser effect spectroscopy (NOESY) experiments with compound 5 revealed the presence of conformational exchange peaks (26), which we attribute to E/Z isomerization about the exocyclic C2-N bond in glucosepane. Although the original glucosepane isolation report noted the possibility of E/Z isomerism in acyclic 2-amino imidazoles, the presence of these exchange peaks in the case of glucosepane had previously been incorrectly attributed to stereoisomerism at the C-8a stereocenter (27). Using an exchange spectroscopy NOESY sequence (fig. S1) (26), we were able to calculate an approximate rate for this conformational exchange process on the order of 3 s−1 in D2O (24).
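An exchange rate of this magnitude can be translated into an approximate activation free energy with the Eyring equation, k = (k_B T/h) exp(−ΔG‡/RT). The sketch below assumes a transmission coefficient of 1 and a temperature of 298.15 K. The measurement temperature is not stated in this section, so the resulting barrier is illustrative only.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J*s
R = 8.314462618       # gas constant, J/(mol*K)

def eyring_barrier_kj_mol(rate_s, temperature_k=298.15):
    """Activation free energy (kJ/mol) from a first-order rate constant."""
    prefactor = K_B * temperature_k / H  # units of s^-1
    return R * temperature_k * math.log(prefactor / rate_s) / 1000.0

barrier = eyring_barrier_kj_mol(3.0)  # rate of ~3 s^-1 quoted in the text
print(f"activation free energy ~ {barrier:.0f} kJ/mol")
```

With these assumed inputs, the barrier comes out near 70 kJ/mol (about 17 kcal/mol). The estimate is insensitive to modest errors in the rate, since the rate enters only logarithmically.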

We next took advantage of glucosepane’s intrinsic spectral properties to measure the pKa (Ka, acid dissociation constant) of the 4H-imidazole core (fig. S2). These experiments revealed compound 5 to possess only one basic site under aqueous conditions with a pKa of ~12, which we believe to reflect protonation at the 4H-imidazole N1 atom, consistent with both NMR data and density functional theory (DFT) calculations (24). Exposure of either epimer of 5 to D2O leads to quantitative hydrogen-deuterium exchange at the C-8a H atom, which occurs rapidly (<60 min) under basic conditions (aqueous NaOD). Taken together, these studies suggest that glucosepane contains both acidic and basic sites, the latter of which possesses a pKa quite close to that of native arginine (pKa = 12.5).
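To illustrate what a pKa of ~12 implies, the Henderson-Hasselbalch relation gives the fraction of a single basic site that is protonated at a chosen pH. The pH of 7.4 used below is an assumed physiological value, not a condition taken from the paper, and the function name is illustrative.

```python
def fraction_protonated(ph, pka):
    """Fraction present as the conjugate acid for one ionizable site."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))

ASSUMED_PH = 7.4  # assumed physiological pH, not from the paper

for label, pka in (("glucosepane core (approx.)", 12.0), ("arginine", 12.5)):
    frac = fraction_protonated(ASSUMED_PH, pka)
    print(f"{label}: {frac:.5f} protonated at pH {ASSUMED_PH}")
```

In this simple one-site model, both the glucosepane core and arginine are almost entirely protonated near neutral pH, consistent with the similarity in pKa noted above.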

We believe that the brevity and modularity of our synthesis will render it compatible with the site-specific incorporation of glucosepane into synthetic oligopeptides, preparation of affinity reagents to identify molecular targets of glucosepane, development of immunogens for raising antibodies against glucosepane, and perhaps also the identification of novel therapeutic strategies for breaking glucosepane cross-links.

Supplementary Materials

Materials and Methods

Supplementary Text

Figs. S1 to S5

Tables S1 to S11

References (28–32)

Spectral Data

References and Notes

  1. The full synthetic sequence from diacetone-D-glucose to epoxide 8 is presented in detail in the supplementary materials.
  2. Analysis of the crude reactions always showed 15 to 20% of hemiaminal present, even after prolonged exposure to reaction conditions.
  3. See the supplementary materials for more information.
  4. The full synthetic sequence to 4H-imidazoles 29 and 30 is presented in detail in the supplementary materials.
  5. Acknowledgments: We thank the SENS Foundation for financial support and G. Micalizio, A. de Grey, and W. Bains for helpful discussions.