Research Article

Mapping the dark space of chemical reactions with extended nanomole synthesis and MALDI-TOF MS


Science  10 Aug 2018:
Vol. 361, Issue 6402, eaar6236
DOI: 10.1126/science.aar6236

A rapid screen for complex reactants

Chemists engaged in reaction discovery tend to report outcomes involving a few, relatively simple reactants. It remains a major challenge to fine-tune reported conditions when the reactants become more structurally complex, as often happens in pharmaceutical research. Lin et al. developed a protocol for rapidly screening different catalytic conditions for C–N coupling across a wide range of complex substrates. The product detection scheme relies on mass spectrometry of nanomole-scale reaction mixtures without any need for intervening chromatography.

Science, this issue p. eaar6236

Structured Abstract


The invention of new chemical reactions provides new bond construction strategies for improved access to diverse regions of structural space. However, a pervasive, long-standing bias toward reporting successful results means that the shortcomings of even mature reaction methods remain poorly defined, making practical syntheses of structurally diverse targets far from certain. Distinct tools and experimental approaches are required to expose and record the problematic structural elements that limit different synthetic methods. The experimental space required to systematically survey reaction failure is vast, and existing ultrahigh-throughput (uHT) reaction screening approaches are inadequate for exploring the diversity of conditions pertaining in modern synthetic methods. Additionally, analytical approaches must continuously improve to meet the throughput demands of this expansive reaction screening.


We report a nanomole-scale screening protocol that can be used to execute heterogeneous reactions with heating and agitation, use of volatile solvents, and capacity for photoredox chemistry. These advances in miniaturized chemistry screening were combined with the use of matrix-assisted laser desorption/ionization–time-of-flight mass spectrometry (MALDI-TOF MS), enabling analysis of 1536 reactions in ~10 min. Together, these advances create a platform that can enable systematic reaction evaluation and data capture to survey the dark space of chemical reactions.


Using the Buchwald-Hartwig C–N coupling reaction to exemplify this process, a uHT Glorius fragment additive poisons diagnostic approach was first applied to demonstrate that MALDI-MS could provide adequate data quality to monitor the formation of a single product under a wide variety of different synthetic conditions. Four catalytic methods—Ir/Ni and Ru/Ni dual-metal photoredox catalysis, as well as heterogeneous and high-temperature Cu and Pd catalysis—with extended nanomole chemistry requirements were evaluated for the synthesis of a single product in the presence of 383 structurally diverse simple and complex potential poisons. Using a normalizing internal standard that was closely related to the product and optimized operating parameters, MALDI-MS provided good correlation with existing ultra performance liquid chromatography (UPLC)–MS approaches (coefficient of determination R² up to 0.85), allowing correct binning of “hits” and “misses” (defined as >50% product signal knockdown) up to 95% of the time. Next, we pursued the more challenging goal of exploring diverse whole-molecule C–N couplings. In this case, it was not practical to have either product standards or closely related internal standards to enable analytical quantitation. A “simplest-partner test” was employed, in which 192 aryl bromides and 192 secondary amines were each coupled with a MS-active “simplest partner,” guaranteeing a somewhat normalized MS response for all products. The formation of 384 different products using the four aforementioned synthetic methods was monitored by MALDI-MS, with pass-fail binning of results correlating well with UPLC-MS in the identification of common structural elements (such as functional group counts, H-bond donors and acceptors, and polar surface area) that lead to reaction failure.


In the near future, each problematic structural element that is identified through systematic dark-space exploration can be promoted for in-depth examination to precisely define the specific parameters that determine reaction outcome at the atomic and quantum molecular level. Predictive machine learning models will use this focused data to enable synthetic practitioners to select the most appropriate reactions for use in a particular synthetic setting. In addition, functionality that persistently fails across synthetic methods can sharply define important challenges for the invention of improved chemical reactions.

Extended nanomole chemistry and MALDI-TOF MS for systematic reaction profiling.

Nanomole-scale chemistry tools that can execute a wide variety of synthetic protocols are combined with rapid MALDI-TOF MS analysis to enable broad reaction profiling to map the dark space of chemical reactivity. DMSO, dimethyl sulfoxide; DABCO, 1,4-diazabicyclo[2.2.2]octane.


Understanding the practical limitations of chemical reactions is critically important for efficiently planning the synthesis of compounds in pharmaceutical, agrochemical, and specialty chemical research and development. However, literature reports of the scope of new reactions are often cursory and biased toward successful results, severely limiting the ability to predict reaction outcomes for untested substrates. We herein illustrate strategies for carrying out large-scale surveys of chemical reactivity by using a material-sparing nanomole-scale automated synthesis platform with greatly expanded synthetic scope combined with ultrahigh-throughput matrix-assisted laser desorption/ionization–time-of-flight mass spectrometry (MALDI-TOF MS).

The field of synthetic organic chemistry has spawned a vast array of creative reactions that can be logically combined to prepare nearly any molecule. However, efficient selection of the precise reaction sequence that leads to a particular product remains a challenge, as poorly understood substrate-specific interactions often necessitate laborious screening of combinations of catalysts, reagents, and conditions. High-throughput experimentation (HTE) chemistry facilitates these investigations by increasing the pace of problem-solving (1–7), and an emerging strategy uses machine learning (ML) data generated with these screening tools to create predictive models for specific problems (8, 9). Data mining surveys of existing published or proprietary databases containing information on hundreds of millions of reactions have been somewhat encouraging (10–14), but a pervasive, long-standing bias toward reporting successful results limits the utility of this information for model building. Additionally, these data have not been collected under controlled experimental conditions, and most of the substrates are not representative of the complexity that arises in applied synthetic problems. Hence, no existing large, structured datasets meet all of the requirements for effective reaction modeling. Consequently, intentional surveys are required to systematically map reactivity patterns, with the rapid identification of inaccessible regions, the dark space of chemical reactivity, being particularly important for focusing subsequent ML investigations.

Chemical reactivity space is vast, even for a single synthetic transformation, with permutations of possible substrate structures, catalysts, and reaction conditions soon proliferating into unmanageable experimental complexity. Large-scale reactivity surveys have thus far been limited by the throughput of reaction screening and analysis technologies. Additionally, the scarcity and high cost of relevant complex substrates require that screening be carried out on as small a scale as possible. Existing miniaturized HTE chemistry tools have worked only under a narrow range of conditions and have thus far not been applicable to the vast majority of high-quality synthetic protocols. In this work, we describe how engineering advances in automated, miniaturized reaction experimentation and the use of fast matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) for routine reaction analysis afford an opportunity to rapidly survey the chemical reactivity landscape. We demonstrate how these advances can be used to systematically reveal diverse sources of reaction failure within large substrate sets, information that is largely missing from current chemistry data sources, representing a notable advance toward broadly effective reaction profiling.

Extending nanomole synthesis for systematic reaction profiling

Our interest in reaction profiling has grown from an internal effort termed the C–N Coupling Initiative, which aims to systematically build tools to completely understand and leverage a given synthetic transformation. A recent examination of our electronic notebooks (ELNs) shows C–N coupling to be a hit-or-miss affair (~10,000 examples, ~35% failure rate) that, despite its potential to render druglike molecules, may be underused because of its higher demand for optimization than, for example, the Suzuki reaction (~50,000 examples, 18% failure rate). Attempts to use this ELN dataset for creating predictive models have been disappointing, as the data are fractured, with many different classes of amines that would be expected to have different reactivity patterns (primary, secondary, NH-heterocyclic, amides, sulfonamides) while also containing deep channels of project-driven substrate similarity in the aryl halide coupling partners. Additionally, these data are collected under a wide array of synthetic protocols, with specific conditions seldom being repeated. Given this lack of a suitable dataset, a real and urgent goal of applied research is to increase the understanding of C–N coupling reactions by building an experimental process suitable for fast reaction mapping. Clearly, this systematic reaction profiling capability will also be of general value for surveying other reaction types where available knowledge is scarce.

Failure in metal-catalyzed reactions can result from a variety of causes, including inhibition of catalysts due to substrate binding, competitive side reactions, local steric and electronic interactions, and substrate decomposition caused by harsh reaction conditions. In addition, small perturbations in molecular structure may result in changes in reactivity, especially for complex substrates. We perceived that mapping reaction failure arising from this diverse patchwork of mechanisms would require a large collection of structurally diverse analogs as well as a tremendous number of experiments. Emerging nanomole-scale tools have the material-sparing miniaturized scale and ultrahigh-throughput (uHT) required for broad profiling; however, existing experimental strategies involve simplified engineering and are inadequate for examining most high-quality synthetic protocols. The first-generation nanomole synthesis platform that we developed (1) was limited to plastic-compatible, homogeneous, ambient temperature reactions in low-volatility polar, aprotic solvents, encompassing only a small subset of organic transformations. The recent work by Perera et al. (2) extends nanomole synthesis capabilities in a flow chemistry format, allowing heating and the use of diverse solvents; however, this format also requires that reactions remain homogeneous, necessitating the use of very high dilution and limiting study to extremely fast kinetics that can keep up with serial high-performance liquid chromatography (HPLC) analysis. Rather than accepting such constraints, we systematically engineered and validated new plate-based nanomole synthesis tools with a general ability to carry out a wide variety of typical synthetic reactions. We identified effective chemically compatible glass microplate reactors, performed fast 384-tip dosing of reagent solutions in volatile solvents, and designed aluminum sealing blocks that can retain volatile solvents on heating. In addition, we used LabRam resonant acoustic mixing to both agitate reactions and create milky slurries of solid inorganic bases that can be added to reactions by parallel liquid handling. Finally, we created a nanomole-scale photochemistry tool to enable reaction evaluation in the rapidly growing field of photoredox catalysis. We describe the use of these tools in reaction profiling experiments below, but they are expected to benefit extensions to nanoscale chemical and biological evaluation as well (15).

MALDI-TOF MS reaction analysis

Owing to the sheer number of experiments necessary, surveying chemical reactivity can be limited by the rate of analytical measurement. Despite considerable recent progress, HPLC-MS assays typically used for such screening are constrained by the fastest speed with which samples can be mechanically taken up and injected (~10 s per sample) (16). Spectroscopic techniques can be faster but often lack the resolving power needed to study a diverse collection of molecules. High-throughput mass spectrometry provides excellent mass-to-charge ratio separation of individual products (17–20), and MALDI-MS has been intensively optimized in recent years for rapid and robust molecule-specific MS imaging of biological tissue slices. The new generation of MALDI–time-of-flight (TOF) instruments equipped with a 10-kHz scanning beam laser, substantially faster X,Y stage, and faster plate loading-unloading cycle substantially improves sample throughput to hundreds of thousands of measurements per day. We were therefore intrigued by the possibility of applying this fast-analysis approach (several samples per second) to high-throughput reaction screening (HTS). Although MALDI is commonly used for high-throughput analysis, applications are typically limited to either quantitative biochemical assays that feature a single substrate or product within each well (21–28) or chemistry discovery using an MS-active tag or special MALDI plate to ensure MS detection of all labeled products (29–31). Neither of these approaches enabled the rapid, label-free analysis of a diverse collection of products that we required.

Direct MS analysis of small molecules can be vulnerable to interference from reaction components that would typically be removed by chromatography. To analyze samples by MALDI-TOF MS, we found a suitable ionization matrix, and instrument settings afforded excellent performance for diverse pharmaceutically relevant products, with the addition of an ionizing internal standard helping to normalize the effects of different reaction conditions on product ionization and MALDI spot formation. Screening and optimization of MALDI ionization matrices were carried out using a library of druglike informer compounds (32), leading to the identification of α-cyano-4-hydroxycinnamic acid (CHCA) as a generally preferred matrix [4 mg/ml in 50% H2O/acetonitrile (ACN) with 0.1% trifluoroacetic acid (TFA)] affording an optimal signal-to-noise ratio across a wide range of protonated model compounds. A general protocol including MALDI plate preparation, automated data acquisition, and data processing was established for MALDI analysis, in which 175 nl of quenched reaction mixture in dimethyl sulfoxide (DMSO) with added internal standard was directly spotted on an HTS MALDI target plate in 1536 format using a positive-displacement liquid-handling robot with 16 channels. In our experiments, the spotting time for a single plate was ~30 min, but MALDI plate preparation time could be shortened by using a liquid-handling robot with a 1536-channel pipetting head. The resulting DMSO on the target plate could be evaporated either under vacuum at 0.01 bar (2 min) or at atmospheric pressure (1 hour). CHCA matrix solution (150 nl) was then applied onto the dried reaction spots through the same liquid-dispensing procedure. The readout speed for each 1536 MALDI plate ranged from 8 to 11 min, depending on the number of laser shots per spectrum. All automated MALDI runs were set up, triggered, and processed using dedicated uHTS software for MALDI-TOF instruments (Bruker MALDI PharmaPulse 2.0). 
In the whole-molecule analysis experiment, a custom processing script was used to simultaneously extract nearly 400 masses, peak intensities, and peak areas for reaction products and the internal standard into a single spreadsheet, in addition to standard MALDI PharmaPulse processing. A signal-to-noise-ratio of 5/1 was used in the software to extract the final MALDI-MS data.
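The throughput figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text (1536 spots per plate, 8 to 11 min readout, ~10 s per HPLC-MS injection); the function and variable names are illustrative and not part of any instrument software.

```python
PLATE_WELLS = 1536  # spots on one HTS MALDI target plate


def samples_per_second(wells, readout_minutes):
    """Average MALDI readout rate for one plate."""
    return wells / (readout_minutes * 60)


# Readout time of 8 to 11 min per plate, as stated in the text.
fastest_rate = samples_per_second(PLATE_WELLS, 8)
slowest_rate = samples_per_second(PLATE_WELLS, 11)

# Serial HPLC-MS at ~10 s per injection (the injection-limited rate
# quoted in the text), expressed in minutes for the same plate.
hplc_minutes = PLATE_WELLS * 10 / 60

print(f"MALDI: {slowest_rate:.1f} to {fastest_rate:.1f} samples/s")
print(f"HPLC-MS at 10 s/sample: {hplc_minutes:.0f} min per plate")
```

Under these numbers the MALDI readout runs at roughly two to three samples per second, whereas serial injection at 10 s per sample would take more than 4 hours for the same plate. Plate spotting and drying time are not included in either figure.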

uHT fragment poisons analysis

To test our MALDI analytical approach for general reaction profiling, we began with an analytically simplified screening protocol, developed by Glorius and co-workers, for identifying reaction poisons—i.e., functional groups and reagents that interfere with a given reaction (Fig. 1) (33, 34). Because the same reaction is investigated in each experimental well, a single product standard can be used throughout, greatly simplifying analysis. We realized that the throughput of the MALDI technique could enable screening of much larger sets of steric and electronic functional variants and higher-order polyfunctional molecular fragments to precisely identify specific deleterious binding motifs and cooperative interactions. At the same time, we could evaluate the ability of the MALDI system to provide accurate product quantification in the presence of different reaction conditions and a wide variety of additives.

Fig. 1 uHT MALDI-MS approach to screening potential reaction poisons.

(A) 1536 reactions were run with four leading C–N coupling methods to evaluate the impact of 383 single and polyfunctional fragments on one simple coupling reaction, requiring 11 min of MALDI acquisition time. Ph, phenyl; DiF, difluoro; RuPhos, 2-dicyclohexylphosphino-2′,6′-diisopropoxybiphenyl; ppy, 2-phenyl pyridinyl; dtbbpy, di-tert-butyl di-pyridyl; DABCO, 1,4-diazabicyclo[2.2.2]octane; bpy, bipyridinyl. (B) Raw MALDI product response is not well correlated with UPLC-MS or UPLC-UV measures, but normalization using an internal standard reveals very good correlation. (C) Ranked, normalized MALDI product response data with corresponding UPLC-MS (EIC) data below for comparison.

We selected four different C–N coupling conditions (Fig. 1A)—Ir/Ni and Ru/Ni photoredox (35) as well as Pd and Cu conditions (32)—that were previously demonstrated to perform well using our chemistry informer libraries approach. Each method was evaluated using the new nanomole experimentation platforms in the presence of 383 polar molecular fragments (with one control reaction containing no additive) ranging from simple compounds containing a single functional group (such as amides, esters, and N-heterocycles) to more complex polyfunctional compounds. The resulting 1536 fragment-additive reactions were analyzed in a single MALDI experiment in 11 min, with the settings described above and using a deca-deuterated product analog, added to each well after reaction completion, as an internal standard (Fig. 1A). The reactions were also analyzed using a conventional 2-min ultra performance liquid chromatography (UPLC)–MS method. The raw MALDI product response shows only marginal correlation with the UPLC-MS or UPLC–ultraviolet absorption (UV) data for the same reactions [extracted ion count (EIC), coefficient of determination R² = 0.24; UV 210 nm, R² = 0.16]; however, normalization of the MALDI signal with the closely related internal standard signal affords notably good correlation (EIC, R² = 0.85), showing that chemistry-specific effects on MALDI signal intensity can be mitigated by the use of internal standards (Fig. 1B). The normalized MALDI product response metric, ranked from high to low for all four methods, is shown in Fig. 1C, with the corresponding UPLC-MS EIC data shown below for comparison. This analytical approach is similar to the application of MALDI-MS for HTS of enzyme inhibitors, but this work demonstrates that MALDI, in conjunction with an appropriate product standard, can be used for routine reaction screening in the presence of a diversity of interfering elements.
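The effect of internal-standard normalization can be illustrated with a minimal sketch. It assumes that the normalized response is simply the ratio of product intensity to internal-standard intensity in each well, and that R² is the square of the Pearson correlation coefficient; neither detail is specified in the text. All numbers are invented for illustration and are not data from the study.

```python
from statistics import mean


def normalize(product_signal, standard_signal):
    """Ratio of product to internal-standard intensity, well by well."""
    return [p / s for p, s in zip(product_signal, standard_signal)]


def r_squared(x, y):
    """Square of the Pearson correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Hypothetical reference response (e.g. UPLC-MS) for six wells.
uplc = [0.1, 0.3, 0.5, 0.7, 0.9, 1.0]
# Made-up well-to-well factor standing in for spotting and ionization effects.
well_factor = [3.0, 0.5, 2.0, 0.4, 1.5, 0.3]

# The product signal is distorted by the well factor; the internal
# standard, dosed equally in every well, is distorted the same way.
raw_product = [u * f * 1000 for u, f in zip(uplc, well_factor)]
raw_standard = [f * 1000 for f in well_factor]

normalized = normalize(raw_product, raw_standard)
r2_raw = r_squared(raw_product, uplc)
r2_normalized = r_squared(normalized, uplc)

print(f"R^2 raw: {r2_raw:.2f}, R^2 normalized: {r2_normalized:.2f}")
```

In this idealized case the well factor cancels exactly, so the normalized correlation is essentially perfect. Real data would not behave this cleanly, because a standard only compensates for effects that it shares with the product.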

To simplify MALDI-based assessments in fragment-additive screening, we binned the normalized MALDI product response for each reaction into poisons (>50% signal knockdown) and nonpoisons (<50% signal knockdown) for each fragment and synthetic method (Fig. 2). Across all four methods, the MALDI assignment matched the UPLC-MS EIC signal ~95% of the time and the UPLC-UV 210-nm data ~88% of the time (we found that UPLC-UV 210-nm data often show peaks that interfere with the product peak). The fragment screening data showed the expected outcome that each of the synthetic methods is susceptible to poisoning by particular single functional groups (see figs. S19 and S20 for complete lists). We were also able to observe that when these single-functional poisons are incorporated as parts of polyfunctional fragments, reaction poisoning almost always occurs (>90% of the time). Notably, polyfunctional fragments composed of functional groups that are not themselves poisons often (~50% of the time) act as poisons as well. The structural details of this poisoning are very specific. For example, we observed cases in which the combination of N-heterocycle and ketone strongly poisons a given catalytic system, whereas other similar structural combinations do not (Fig. 2). This work provides some indication of how screening large permutations of multifunctional structures can identify precise motifs that lead to reaction failure. We were gratified to find that with a large number of experiments, the effects of false negatives and positives on identifying trends are, to some degree, offset by the internal validation that stems from using multiple fragments that are structurally related. The fragment-based analysis herein reveals Cu to be substantially more resilient to single and poly-functional poisoning than the other methods evaluated, which should support its wider use in complex, densely functionalized substrates. 
The Ir/Ni and Ru/Ni photoredox methods were very similar to each other in their inhibition profiles and also generally performed very similarly to Pd. Most importantly, the specific problematic functional arrays that poison each method can be roughly mapped, and subsequent detailed nanomole chemistry exploration with structural variants can provide focused data for effective ML predictive modeling [as in (8)].
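The binning step described above can be sketched as follows. The text defines a poison as an additive that causes >50% signal knockdown; measuring the knockdown relative to the additive-free control well is an assumption of this sketch, as are all names and numbers in it.

```python
def is_poison(response, control_response, knockdown=0.5):
    """True when the signal falls by more than `knockdown` versus the control."""
    return response < (1.0 - knockdown) * control_response


def agreement(calls_a, calls_b):
    """Fraction of wells in which two methods give the same call."""
    matches = sum(a == b for a, b in zip(calls_a, calls_b))
    return matches / len(calls_a)


# Hypothetical normalized responses for six fragment additives,
# with the additive-free control scaled to 1.0 for both methods.
control = 1.0
maldi = [0.95, 0.40, 0.10, 0.70, 0.55, 0.20]
uplc = [0.90, 0.45, 0.05, 0.80, 0.45, 0.25]

maldi_calls = [is_poison(r, control) for r in maldi]
uplc_calls = [is_poison(r, control) for r in uplc]
match_rate = agreement(maldi_calls, uplc_calls)

print(maldi_calls)
print(f"agreement: {match_rate:.0%}")
```

Wells whose response sits close to the 50% line (the fifth additive here) are the ones where two analytical methods are most likely to disagree, which is one reason a binary call can agree well overall even when the underlying correlation is imperfect.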

Fig. 2 Fragment additives poisoning study.

Comparative rates of poisoning determined by MALDI analysis for single functional fragments and polyfunctional fragments for the four tested methods are displayed. This study reveals that Cu is well-suited to managing diverse functionality. Polyfunctional fragments that are composed of linked, single functional fragments that are not poisons themselves often lead to differential poisoning.

uHT simplest-partner whole-molecule analysis

We next investigated the much more challenging use of MALDI-MS to characterize the effects present in large, whole-molecule substrate sets (Fig. 3). Whereas the fragments approach is useful for rapid identification of problematic functionality, classifying local steric and electronic as well as bulk molecular effects such as solubility can be accomplished only by using whole-molecule substrates. Previous work in our labs explored the effects of pharma-relevant whole-molecule informer compounds (32) on the performance of different synthetic methods. Though useful to begin to systematically evaluate different synthetic methods, the 18 compounds in these test sets do not provide enough structural diversity to begin to assign general causes of reaction failure. In addition, the impact of amines in complex reactivity space was not previously taken into account. Hence, we envisioned a much larger virtual array of 192 N-heterocycle–containing aryl bromides crossed with 192 cyclic secondary amines (Fig. 3A), both of increasing molecular complexity, representing 36,864 distinct potential products. This array, if evaluated with the four aforementioned C–N coupling protocols, would result in 147,456 experiments. Although experimental assessment of all substrates and protocols is conceivable, we explored a systematic “simplest-partner” approach to study the isolated structural effects of individual building blocks in this space. In this protocol, each bromide is subjected to reaction with the simplest amine in the set and each amine with the simplest bromide. Applying the four different synthetic methods to this 384-substrate test set affords a total of 1536 experiments, or ~1% of the experimental space. To ensure that every reaction in the set would have a strong MALDI-MS response, the simplest coupling partners were chosen to incorporate a basic nitrogen atom, thereby ensuring MS detectability in the positive-ion mode. 
Although measuring products in negative-ion mode was not the focus of this work, analysis of small molecules by MALDI-TOF in negative-ion mode can be successfully achieved when appropriate sample preparation methods are used (36–38). Clearly, this general strategy would be poorly suited for the synthesis of hydrocarbons or other species that have poor MS ionization properties.
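The size of the experimental space described above follows from simple counting, reproduced here as a short check. The variable names are illustrative only.

```python
n_bromides = 192
n_amines = 192
n_methods = 4

# Full factorial array: every bromide crossed with every amine.
full_products = n_bromides * n_amines
full_experiments = full_products * n_methods

# Simplest-partner test: each bromide with the simplest amine and
# each amine with the simplest bromide, under all four methods.
test_substrates = n_bromides + n_amines
test_experiments = test_substrates * n_methods

coverage = test_experiments / full_experiments

print(full_products, full_experiments, test_experiments)
print(f"coverage: {coverage:.2%}")
```

The simplest-partner set of 1536 experiments is 1/96 of the full array, or about 1%, and it also fills exactly one 1536-well plate.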

Fig. 3 Whole-molecule simplest-partner evaluation with MALDI analysis.

(A) Bromides and amines (192 of each) are each crossed with a simple, mass-active coupling partner under the four previously described (Fig. 1A) catalytic methods. (B) Using a single internal standard, the normalized MALDI data show poor correlation with UPLC-MS and UPLC-UV metrics.

Traditional exploration of complex substrate space using HPLC analysis requires the preparation of product standards to obtain response factors for quantification and to confirm product identity for every new molecule of interest, an untenable proposition for our envisioned experiment. The prospect of adding an internal standard that is tailored to each product for MALDI normalization is equally daunting. We reasoned that in our search for causes of reaction failure, a focus on common structural reactivity trends rather than absolute product yields might afford valuable insights without the traditional need for individual product standards. Also, we hoped that the use of a single internal standard in all wells, although clearly not controlling for structure-based variations in MALDI signal intensity, might still be useful for normalizing the effects of MALDI spotting variability, thereby vastly simplifying experimental execution. Hence, we endeavored to use rapid MALDI responses for 384 different products coupled with parameterized and clustered molecular descriptors (such as functional group counts, steric hindrance, and bulk properties such as H-bond donors and acceptors and polar surface area) to identify fundamental incompatibilities between given sets of reaction conditions and the diverse structural properties within an enormous compound set. Once identified, such reactivity trends can be confirmed by conventional analytical methods. The MALDI signal for all 1536 reactions was acquired using a single internal standard for normalization across all reactions, as well as a conventional 2-min UPLC-MS analysis for comparison. Two-dimensional scatter plots for all 1536 reactions (Fig. 3B) reveal that the normalized product MALDI responses are poorly correlated with the UPLC-MS (EIC, R² = 0.33) and UPLC-UV data [total wavelength chromatogram (TWC), R² = 0.30; we found an averaged wavelength to be more useful in this diverse compounds set than a single wavelength, such as UV 210 nm]. However, we found that we could use the MALDI data to create binary thresholds for reaction success or failure that permitted us to identify general structural entities that correlate with reaction failure within a large, whole-molecule set. To this end, we set an arbitrary pass-fail threshold of 20% of the average MALDI value across the entire set. When the binned data for each synthetic method are clustered as a percentage of failed reactions for different structural parameters that have at least 10 examples within a set (Fig. 4A), the data from the normalized MALDI experiment show essentially the same trends revealed in the UPLC-MS data (both EIC and TWC responses), suggesting that MALDI can be used to stack-rank the structural entities that most often cause reaction failure for each set of conditions (see figs. S26 to S33 for complete lists). Again, any of these identified problematic structural parameters can be promoted for higher-order systematic ML studies.
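The thresholding and clustering procedure can be sketched as below. Two details are assumptions of this sketch: a reaction passes when its normalized response is at or above 20% of the set average, and each substrate carries a set of structural labels. The responses and labels are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def pass_fail(responses, fraction=0.20):
    """Pass when a response is at least `fraction` of the set average."""
    threshold = fraction * mean(responses)
    return [r >= threshold for r in responses]


def failure_rates(passed, features, min_examples=10):
    """Percent failures per structural label with enough examples."""
    counts = defaultdict(lambda: [0, 0])  # label -> [failures, total]
    for ok, labels in zip(passed, features):
        for label in labels:
            counts[label][1] += 1
            if not ok:
                counts[label][0] += 1
    return {
        label: 100 * failed / total
        for label, (failed, total) in counts.items()
        if total >= min_examples
    }


# Hypothetical normalized MALDI responses for 12 substrates.
responses = [1.0, 0.9, 0.05, 0.8, 0.02, 1.2, 0.7, 0.01, 1.1, 0.6, 0.9, 0.03]
# Hypothetical labels: the first 10 substrates contain an NH heterocycle,
# the last 3 contain a primary alcohol.
features = [{"NH heterocycle"} for _ in range(9)]
features += [{"NH heterocycle", "primary alcohol"}]
features += [{"primary alcohol"}, {"primary alcohol"}]

passed = pass_fail(responses)
rates = failure_rates(passed, features)
print(rates)
```

The label with only three examples is dropped by the `min_examples` filter, mirroring the requirement of at least 10 examples per structural parameter stated in the text.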

Fig. 4 Whole-molecule reactivity trends from binary thresholding analysis.

(A) Failure percentage for different clustered parameters determined by pass-fail binary binning using thresholded MALDI data correlates very well with similarly binned UPLC-MS EIC and UPLC-UV TWC data. Each point on the graph represents the failure rate of different specific aggregated structural parameters, such as NH heterocycles or H-bond donors, for each synthetic method. Circles and crosses denote whether the trend is for amines (circles) or bromides (crosses). The symbol color of each circle or cross indicates which synthetic method was used for the data point. (B) The average failure rate for clustered parameters for all four methods reveals the functionality that is problematic across methods. The color of each data point reveals which method has the lowest failure rate. (C and D) The most problematic parameters for aryl bromides (blue shaded area from Fig. 4B) and amines (red shaded area in Fig. 4B) are listed in descending order of failure rates across methods, along with the number of examples in the test set. In nearly all cases, Cu is the preferred method. However, several very specific structural types in amines are found to be problematic for Cu and show greater reactivity with Pd. MR, membered ring; TPSA, total polar surface area; NHR, nitrogen with an alkyl group and a hydrogen; MW, molecular weight; clogP, calculated logP (logarithm of the partition coefficient between n-octanol and water); Bn, benzyl.

Although lists of potential poisons for specific synthetic methods are useful, understanding the comparative reactivity of different synthetic methods for problematic functionality is arguably even more important. Figure 4B shows how the clustered binary pass-fail data for all four synthetic methods can be combined into a single graph that reveals structural entities that are generally problematic for bromides and amines across all methods. The data can serve as a valuable map for academic research focused on creating new methods to overcome these common problems. This graph also reveals which method affords the lowest failure rate for each parameter. Figure 4B clearly shows the synthetic advantage of the Cu system in nearly all parameters used to describe aryl bromides (the most problematic are listed in Fig. 4C), which again substantiates its value in complex synthesis. At the same time, for a number of parameters within the amine structural space (Fig. 4B, red), Pd is the best catalyst. Diving deeper (Fig. 4D), we can identify very specific structural features within amines that are problematic for Cu and for which Pd can provide a synthetic advantage. Particularly notable is the reproducibly higher performance of Pd versus Cu for 3-substituted-4-N-carbonyl piperazines and 3,4-N-heterocycle–appended piperazines, revealing subtle conformational effects several bonds removed from the reactive amine that are not considered in the current understanding of C–N coupling reactivity. These systems were explored at a larger reaction scale (10 μmol, 40× scale-up) with additional structural examples to confirm the trends (see figs. S37 to S39 for further details on Cu versus Pd amine trends).

The fragment-additive and whole-molecule studies both indicate that Cu has a substantial advantage over the other methods examined in this study, with respect to single and polyfunctional group tolerance and scope. These observations hold with aryl bromides, but Cu also appears to have specific liabilities in the amine structural space. This study demonstrates that uHT fragment and whole-molecule approaches can provide complementary reactivity information, and both approaches will likely be inextricably linked in the future of predictive chemistry. Given the observed importance of precisely defined structures evident in this work, as well as a shortage of diverse test substrates, chemists will likely alternate between the two approaches to refine their maps of holistic reactivity effects. The most important message from this work is that, rather than accurately determining reaction yields, broad uHT mapping of substrate space with no isolated product standards can reveal specific molecular interactions using adequately discriminating binary analytical approaches that simply separate successes from failures. The reactivity trends identified by MALDI in these experiments were confirmed with conventional UV/MS scoring, which suggests that MALDI alone can be used to uncover results previously accessible only via slower analytical methods. Looking forward, we hope that the convenience, robustness, and speed of MALDI will facilitate creation of large sets of structural and mass response data to enable quantitative yield prediction to further increase the value of the MALDI-MS approach.

Full factorial whole-molecule space

The remaining 99% of the full factorial whole-molecule crossover substrate coupling space may contain additional information on higher-order bimolecular interactions that can also influence reaction performance. The simplest-partner approach can be used to triage the experiments required to map the enormity of the remaining space. We reasoned that if either compound in a complex pair performs poorly in the simplest-partner test, then structural poisoning will be likely in the more complex combination. Likewise, when both partners perform well with simple partners, the combination is likely to work well together. We investigated 288 experiments in this crossover space and found that we could assign >50% of the space into “hits” or “misses” (with 90% accuracy, see supplementary materials for details). The remaining space, with substrates having middling MALDI responses, could not be effectively resolved. This pruning approach allows us to chart complex bimolecular space using the minimal coverage provided by the simplest-partner experiments, thereby eliminating >70,000 experiments that do not need to be performed.
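The triage rule described above can be sketched in a few lines, under stated assumptions. Each substrate is assumed to carry one simplest-partner response score, and two cutoffs (`low` and `high`) mark poor and good performers. The scores, the cutoffs, and the substrate labels below are hypothetical; the criteria actually used are given in the supplementary materials.

```python
from collections import Counter
from itertools import product


def triage_pair(bromide_score, amine_score, low, high):
    """Classify a bromide/amine pair from simplest-partner scores.

    - "miss": either partner performed poorly with its simple partner
    - "hit": both partners performed well with their simple partners
    - "unresolved": middling responses that the rule cannot call
    """
    if bromide_score <= low or amine_score <= low:
        return "miss"
    if bromide_score >= high and amine_score >= high:
        return "hit"
    return "unresolved"


# Hypothetical simplest-partner scores (placeholders, not measured data).
bromides = {"Br1": 0.9, "Br2": 0.1, "Br3": 0.5}
amines = {"Am1": 0.8, "Am2": 0.05}

# Predict every pair in the full cross without running the reactions.
calls = {
    (b, a): triage_pair(bromides[b], amines[a], low=0.2, high=0.7)
    for b, a in product(bromides, amines)
}
summary = Counter(calls.values())
```

In this toy cross of six pairs, only the "unresolved" pair would still need to be run, which is the sense in which simplest-partner data prune the full factorial space.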

Toward predictive informatics

The miniaturized uHT reaction engineering and MALDI-TOF MS analytical advancements described in this work enable generation of large, structured datasets that pinpoint problematic structural elements within complex substrate space. Identified problems can then be promoted for more detailed mapping with the use of uHT experimentation coupled with atomic and quantum molecular physical descriptors to enable structure-based ML. In this work, we ran more than 3000 experiments studying the effects of just four synthesis conditions on a relatively narrow area of chemistry space (the coupling of cyclic secondary amines and N-heterocycle–containing aryl bromides). The real power of this approach lies in harnessing the analytical speed of MALDI for the iterative evaluation of large substrate arrays against diverse catalysts, bases, solvents, reagent stoichiometries, and temperatures, enabling big-data chemistry informatics in the search for general solutions to problematic areas in organic synthesis.

A persistent focus on revealing structural limitations in chemistry is a win-win scenario for academic researchers and synthetic practitioners alike. Dark space that remains inaccessible across synthetic methods becomes fodder for new experimental research of known value, and specific knowledge of limitations will lead to predictivity and understanding that will increase the speed and success of the design-make-test cycle, helping to remove synthetic problem-solving from the critical path. Even with optimally efficient tools and strategies in place, mapping the entire landscape of useful chemical reactivity is currently beyond the reach of any single organization. Given the promising enabling value of such a massive survey, a precompetitive public-private partnership to address this gap might even be an achievable and worthwhile goal.

Materials and methods summary

The specific extended nanomole-scale chemistry protocols used for the four described C–N coupling synthetic methods (Ir/Ni and Ru/Ni photoredox, Cu, and Pd), as well as the MALDI-MS and comparative UPLC-MS analysis protocols, are described in detail in the supplementary materials. A brief summary is provided below.

Description of extended nanomole-scale chemistry platform

Nanomole-scale chemistry reactions, to this point, have been run using plastic plates with low-volatility, plastic-compatible solvents (polar aprotics, DMSO, and N-methyl-2-pyrrolidone) and homogeneous components (bases and catalysts) at room temperature. In this work, reaction engineering advancements have enabled a much wider scope of potential reaction conditions. The use of glass 1536-well plates and rapid 384-tip pipetting enables the use of volatile, non–plastic-compatible solvents (such as dioxane, used in this study) and substrates (piperidine). Development of a resonant acoustic LabRam grinding protocol to produce long-lasting slurries that can be dosed with liquid-handling robotics has enabled the use of heterogeneous inorganic bases (Cs2CO3 and K3PO4). An aluminum reactor block was designed to provide a high-quality seal that prevents solvent loss, and a heating mechanism was devised to heat the block inside the LabRam, which itself provides robust parallel agitation of the reactions. Finally, a nanomole-scale photochemistry tool was created using a modified aluminum reactor with an acrylic plastic bottom that allows uniform light penetration.

Description of MALDI-TOF analysis

All reaction mixtures were quenched and diluted to standard UPLC-MS concentration. These quenched reactions were then spotted using a Mosquito HTS liquid-handling robot on Bruker HTS MALDI targets with barcodes (1.0-mm thickness) in 1536 format [HTS MALDI plate 1.0 mm, BC, part 1833280]. The targets were mounted on Bruker HTS MALDI adapters (part 1847571). Reaction mixture (175 nl) was deposited and allowed to dry, followed by depositing 150 nl of a 4-mg/ml solution of α-cyano-4-hydroxycinnamic acid in 0.1% TFA, 50% ACN/H2O. These targets were analyzed on a Bruker Rapiflex MALDI-TOF/TOF system.
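As a rough worked example, the spotting volumes above translate into per-spot and per-plate quantities as follows. The sketch assumes that all 1536 positions of a target are spotted and ignores dead volume and pipetting losses. These assumptions, and the variable names, are ours rather than part of the protocol.

```python
def mass_ng(volume_nl, conc_mg_per_ml):
    """Mass delivered in ng: 1 nl = 1e-6 ml, so nl * (mg/ml) = 1e-6 mg = 1 ng."""
    return volume_nl * conc_mg_per_ml


SPOTS = 1536            # positions on one 1536-format target
SAMPLE_NL = 175         # reaction mixture per spot (from the text)
MATRIX_NL = 150         # matrix solution per spot (from the text)
MATRIX_MG_PER_ML = 4    # matrix concentration (from the text)

matrix_per_spot_ng = mass_ng(MATRIX_NL, MATRIX_MG_PER_ML)  # matrix per spot
plate_sample_ul = SPOTS * SAMPLE_NL / 1000                 # total sample volume
plate_matrix_ul = SPOTS * MATRIX_NL / 1000                 # total matrix volume
plate_matrix_mg = SPOTS * matrix_per_spot_ng / 1e6         # total matrix mass

print(f"{matrix_per_spot_ng} ng matrix per spot")
print(f"{plate_sample_ul:.1f} ul sample and {plate_matrix_ul:.1f} ul matrix per plate")
print(f"{plate_matrix_mg:.4f} mg matrix per plate")
```

Under these assumptions, a full target consumes well under 1 ml of either liquid and less than 1 mg of matrix.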

Description of comparative UPLC-MS analysis

The quenched reactions were also monitored using a Waters Acquity UPLC I-Class system (Waters Corp.) equipped with a binary pump, flow-through needle sampler, column manager, photodiode array detector, SQ detector 2 with electrospray ionization source in the positive mode, and MassLynx software. Separations were performed on a Waters CORTECS UPLC C18+ column (dimensions: 30 mm by 2.1 mm; particle size: 1.6 μm).

Supplementary Materials

Materials and Methods

Figures S1 to S55

Tables S1 to S8


Data S1 to S5

References and Notes

Acknowledgments: We thank A. Donofrio (Merck & Co., Inc., Kenilworth, NJ, USA) for helpful discussions. Funding: S.L. and K.Z. are grateful to the MRL Postdoctoral Research Fellows program for financial support. Additional financial support was provided by Merck & Co., Inc., Kenilworth, NJ, USA. Author contributions: S.L., R.P.S., Z.P., I.W.D., D.A.D., and S.D.D. designed the chemistry experiments. S.L., R.D.F., H.S., and S.D.D. performed the chemistry experiments. S.L., R.D.F., D.V.C., T.C., I.W.D., D.A.D., and S.D.D. developed the extended nanochemistry tools. S.L., S.D., W.D.B., H.W., H.S., and C.J.W. developed the MALDI-MS analytical platform. S.L., K.Z., and S.D.D. analyzed comparative UPLC-MS data. S.L., R.P.S., Z.P., and S.D.D. analyzed the reactivity trends. Z.P. conducted ELN data mining. S.L., S.D., T.C., I.W.D., H.S., C.J.W., and S.D.D. wrote the manuscript. Competing interests: None declared. Data and materials availability: All data described in this work are included in the Excel file associated with the supplementary materials.