Research Article

Structures of a dimodular nonribosomal peptide synthetase reveal conformational flexibility

Science  08 Nov 2019:
Vol. 366, Issue 6466, eaaw4388
DOI: 10.1126/science.aaw4388

Moving modules drive biosynthesis

Modular biosynthesis of small molecules—where enzyme units can be swapped in and out of assembly line complexes to produce desired products—is a distant goal in the lab despite a huge diversity of modular systems in nature. Part of the challenge is in understanding how modules interact and hand off intermediates. Reimer et al. determined crystal structures of portions of a nonribosomal peptide synthetase, including a full dimodule. Module positioning differed between these structures even when the same intermediate was attached to the enzyme. The authors used small-angle x-ray scattering to confirm that large conformational changes are possible during biosynthesis and handoff between modules.

Science, this issue p. eaaw4388

Structured Abstract

INTRODUCTION

Nonribosomal peptide synthetases (NRPSs) are microbial megaenzymes that make a wide variety of small-molecule products, including many that are clinically used as antitumor agents, antibiotics, or immunosuppressants. Nonribosomal peptide synthesis proceeds with assembly-line logic, where each station on the NRPS assembly line is a multidomain unit called a module. An excellent understanding of the structures and activities of isolated modules has been established, but much less is known about how modules work with each other in the context of the larger NRPS. Structural investigation of multimodular NRPSs is needed to understand NRPS architecture, organization, and intramodular function during the synthetic cycle of an NRPS and to facilitate the longstanding goal of bioengineering for production of new-to-nature bioactive small molecules.

RATIONALE

To gain insight into outstanding trans- and supermodular questions in NRPS function, we performed x-ray crystallography with a series of constructs of the dimodular NRPS protein linear gramicidin synthetase subunit A (LgrA). We performed complementary small-angle x-ray scattering experiments to analyze the behavior of the NRPS in solution. We also performed direct coupling analysis to confirm the biological relevance and evolutionary conservation of observed interdomain interfaces. Both the structures and direct coupling analyses were used to guide mutagenesis studies designed to enhance the activity of a chimeric NRPS.

RESULTS

We have determined five independent crystal structures of constructs of LgrA, bound with a series of ligands and intermediate analogs, to resolutions between 2.2 and 6 Å. The crystallized constructs include the complete initiation module and from one to all three canonical domains from the elongation module. Some structures are in markedly different conformations, implying large movements, and each structure seems to be in a catalytically relevant state. Small-angle x-ray scattering indicates that LgrA is also very flexible in solution, confirming that markedly different conformations are a bona fide feature of NRPS biology. The structures reveal previously unobserved states, including a full condensation conformation, where the thiolation (T) domains from both the initiation and elongation modules are simultaneously bound at the condensation (C) domain. Similar conformations in high-resolution structures allow analyses of the productive T:C domain-domain interface, which mediates the only known functional link between modules. Direct coupling analysis applied to large collections of NRPS sequences provides strong support for the biological relevance and evolutionary conservation of observed interdomain interfaces. Furthermore, both the structures and coupling scores for mutational effects were used to guide bioengineering, and we were able to double the activity of a module-swapped chimeric NRPS by introducing two point mutations at the unnatural T:C domain-domain interface.

CONCLUSION

The structures and small-angle x-ray scattering show NRPSs undergo very large conformational changes and challenge the general assumption that NRPSs have regular higher-order architecture. They demonstrate that there is no strict coupling between the catalytic state of a particular module and the overall conformation of the multimodular NRPS and suggest that the T:C interaction for condensation is the only point where adjacent modules must coordinate. This feature can be exploited in module-swapping bioengineering to produce new useful nonribosomal peptides.

Structures of a dimodular NRPS protein reveal the central condensation state and imply very large conformational changes.

A series of crystal structures of the dimodular nonribosomal peptide synthetase protein LgrA includes a structure of the condensation state (left). Condensation is the central event in synthesis, elongating the peptide intermediate and passing it to the downstream module. Additional structures in condensation and thiolation states show large conformational differences (indicated by arrows), which are supported by solution small-angle x-ray scattering data. These structures show decoupling of the catalytic state and overall conformation and imply that coordination of adjacent modules’ catalytic states is only required at condensation. The structures and coevolution analyses enable improvement of activity of a module-swapped chimeric enzyme (bottom left).

Abstract

Nonribosomal peptide synthetases (NRPSs) are biosynthetic enzymes that synthesize natural product therapeutics using a modular synthetic logic, whereby each module adds one aminoacyl substrate to the nascent peptide. We have determined five x-ray crystal structures of large constructs of the NRPS linear gramicidin synthetase, including a structure of a full core dimodule in conformations organized for the condensation reaction and intermodular peptidyl substrate delivery. The structures reveal differences in the relative positions of adjacent modules, which are not strictly coupled to the catalytic cycle and are consistent with small-angle x-ray scattering data. The structures and covariation analysis of homologs allowed us to create mutants that improve the yield of a peptide from a module-swapped dimodular NRPS.

Nonribosomal peptide synthetases (NRPSs) are intricate macromolecular machines that make small-molecule products with very high chemical diversity and activity (1). Compounds made by NRPSs have found widespread clinical use and are on lists of United Nations–designated essential medicines (2) and top-selling pharmaceuticals (3). Nonribosomal peptide synthesis uses modular, assembly-line logic where each multidomain module adds one amino acid substrate to the growing peptide (Fig. 1) (4). A module’s adenylation (A) domain selects and activates the amino acid and then covalently attaches it as a thioester to the thiolation (T) domain’s phosphopantetheine (ppant) arm. The condensation (C) domain catalyzes peptide bond formation between that aminoacyl-T domain and the donor peptidyl-T domain from the upstream module (5, 6). The newly elongated peptidyl-T domain is then the donor substrate for condensation in the downstream module, passing off and further elongating the peptide. Many modules also have tailoring domains integrated within them, which cosynthetically modify the nonribosomal peptide, such as the tailoring formylation (F) domain (7) found in the initiation module of the NRPS studied here, linear gramicidin synthetase (Fig. 1).

Fig. 1 Overview of the biosynthetic steps performed by LgrA.

(A) LgrA is a dimodular protein with initiation (F1A1T1) and elongation modules (C2A2T2Eo2). Valine and glycine are selected and adenylated by A1 and A2 and then transferred to T1 and T2. Val-T1 is formylated by F1, and then peptide bond formation between fVal-T1 and Gly-T2 by C2 produces fVal-Gly-T2. This is the donor substrate for peptide bond formation in the C3 domain of the next NRPS subunit, LgrB. (This schematic is not intended to indicate the timing of rebinding of substrates.) Eo, inactive epimerization domain; ATP, adenosine triphosphate; AMP, adenosine monophosphate; PPi, inorganic pyrophosphate; THF, tetrahydrofolate. (B) Chemical structure of linear gramicidin A, with a box highlighting the fVal-Gly portion assembled by LgrA.

An excellent structural understanding of the synthetic cycle of isolated modules has been gained from structures of domains, didomains, and individual modules [reviewed in (4, 8)]. However, modules typically function within the context of the full NRPS. They are physically attached to their neighbors by flexible peptide linkers or through small docking domains (9). Adjacent modules must functionally coordinate at least once during the synthetic cycle, when the C domain catalyzes peptide bond formation between aminoacyl and peptidyl moieties attached to T domains of adjacent modules. Little else is known about how modules work with each other in the context of the larger NRPS. Two previous high-resolution structures contain domains from adjacent modules: The T5C6 didomain of tyrocidine synthetase is in an unproductive conformation (10), and A1T1C2 of bacillibactin synthetase (11) showed that the sole observed intermodule contact must break in the course of peptide synthesis. Structural data for multimodular NRPS are limited to 26- to 29-Å negative-stain electron microscopy reconstructions of two modules of bacillibactin synthetase (CATCAT), which showed heterogeneity in the module:module conformation (11). Hypothetical models of multimodular NRPSs can be constructed by consecutively overlapping multidomain structures from different synthetases, and the models often take the form of rigid superhelices (4, 12), but there is no evidence that any of these conformations occur in vivo. More data are needed to understand NRPS architecture, organization, and intramodular function during the synthetic cycle of an NRPS and to facilitate their use to make new-to-nature compounds.

Results

Crystallography of five large NRPS constructs

Linear gramicidin synthetase is a 16-module, 4-protein NRPS that makes the clinically used eponymous antibiotic (Fig. 1) (13). The antibiotic acts by forming dimeric β-helical pores in Gram-positive bacterial membranes, which kills the bacteria by allowing free passage of monovalent cations across the membrane (14). To gain insight into outstanding trans- and supermodular questions in NRPS function, we undertook more than 100,000 crystallography screening trials with constructs of linear gramicidin synthetase subunit A (LgrA, with domains FATCATEo; Fig. 1) complexed with substrates, substrate analogs, and dead-end inhibitors. This yielded five structures: two structures of the four-domain construct FATC in peptide donation conformation (FATfValC and FATfValC*; Fig. 2A and fig. S1A); one of FATCA in peptide donation conformation (FATCA; Fig. 2B); one of FATCA in two thiolation conformations (FATVadCA, where Vad is valinyl-adenosine-vinylsulfonamide; Fig. 2, C and D); and one of the full dimodule FATCAT in overall condensation conformation (FATCAT; Fig. 2E) (figs. S1 and S2 and table S1). In every structure, each domain assumes its canonical form (4, 8): The F domain bears the formyltransferase catalytic domain fold; the C domain is a V-shaped pseudodimer of chloramphenicol acetyltransferase-like lobes with a tunnel to its active site; each A domain has its major portion, which includes the amino acid binding site (Acore) and its mobile C-terminal subdomain (Asub); and the T domains are small four-helix bundles with prosthetic ppant arms. As examined below, together these structures of LgrA demonstrate three general features of NRPS architecture: (i) The didomain structural unit (FAcore or CAcore) of an NRPS module largely maintains its overall conformation (15–19), (ii) the small domains (T and Asub) move according to the catalytic state (15–20), and (iii) observed in detail here, the relative orientations of adjacent modules in an NRPS can vary markedly.

Fig. 2 Crystal structures of dimodular LgrA.

(A to E) Structures of LgrA constructs. FATfValC, which has fVal-amino-ppant on T1 and is bound with valine, AMPcPP, and fTHF (space group P212121; 2.5-Å resolution) (A); FATCA (P212121; 2.5-Å resolution) (B); two crystallographically independent molecules of FATVadCA, for which dead-end Vad was used to stall T1 at A1 during thiolation (P212121; 6-Å resolution) [(C) and (D)]; and FATCAT, which has ppant on both T domains (C2221; 6-Å resolution) (E). See fig. S1, A and B, for additional structures. Cartoon insets show schematics depicting protein constructs that were crystallized. Domains that are grayed in the labels are disordered in the crystal structure.

Module conformation varies between structures

The main structural units of the modules are the FAcore or CAcore didomains. The current structures include 11 crystallographically independent FAcore or CAcore didomains, more than doubling the number available (15–19) (Figs. 2 and 3A). These show the didomains in each module as “catalytic platforms” (15) that present the binding site for each module’s T domains (the F1 and A1 active sites for T1 and the C2 acceptor site and A2 active site for T2) on the same face to facilitate substrate delivery. The didomains are fairly rigid, because the F:A or C:A configurations shift by only ~1° to 12°, propagating to ~10 Å (Fig. 3 and figs. S3 and S4). In FATCA, there are few crystal contacts at the distal end of A2, and variation in the C:Acore orientation from unit cell to unit cell is evident from progressively increasing B-factors and weaker electron density at the distal end of A2. Notably, there is substantially more variation in C:Acore conformations between different NRPSs than in a single NRPS: A2core superimposition with modules of enterobactin, AB3403, and surfactin synthetases places some equivalent C domain residues >20 Å apart, because of variations of the C:A interface and “openness” (21) of the V shape of the C domain (figs. S4 and S5).

Fig. 3 Comparison of intramodular conformations.

(A and B) The structural unit for the initiation module (F1A1core) (A) and elongation module (C2A2core) (B) are in similar conformations in all structures. The F1:A1core interface buries 773 to 860 Å2 and the C2:A2core interface buries 565 to 770 Å2 of surface area. Superimposing each structure by their Acores shows a ~1° to 12° shift of F1 or C2. Previously, EntF C1:A1 was seen to shift ~15° (18) between structures.

The positions of the Asub and T domains do vary depending on the catalytic state (Fig. 3A) (15–20). As further explored below, T1 is bound at the donor site of C2 in four structures, and in three of these (FATfValC, FATfValC*, and FATCAT), A1sub is bound to A1core in the adenylation conformation (Fig. 2, A and E, and fig. S1A). The simultaneous positioning of T1 for condensation and A1 for adenylation reiterates that NRPSs can start a second synthetic cycle before finishing the first (Fig. 1) (18). In both FATVadCA molecules, T1 and A1sub are bound to A1core in the thiolation conformation (Figs. 1 and 2, C and D) (17, 22, 23). This means that FATfValC and FATVadCA represent consecutive steps in synthesis (Fig. 1). To move between catalytic conformations observed here, Asub rotates up to ~151° and translates up to ~17 Å, and T1 rotates up to ~153° and translates up to ~47 Å (fig. S6) (22, 24). These transitions are as large as those that Asub and T1 require to move between their positions in the rest of the synthetic cycle, for example to and from formylation conformation (17).
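The rotation and translation values quoted above describe rigid-body motions of a domain between two states. As a minimal sketch (not the procedure used for fig. S6, which is not described in this section), a rotation angle can be recovered from the 3×3 rotation matrix of a superposition through its trace, and a translation can be taken as the displacement of the domain centroid. All function names and the example matrix are illustrative assumptions.

```python
import math

def rotation_angle_deg(R):
    """Rotation angle (degrees) of a proper 3x3 rotation matrix, from its trace."""
    trace = R[0][0] + R[1][1] + R[2][2]
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))

def centroid(coords):
    """Arithmetic mean position of a list of (x, y, z) coordinates."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))

def centroid_shift(coords_a, coords_b):
    """Distance (same units as the input, e.g. Å) between two centroids."""
    return math.dist(centroid(coords_a), centroid(coords_b))

def rotation_about_z(angle_deg):
    """Hypothetical test matrix: rotation by angle_deg about the z axis."""
    a = math.radians(angle_deg)
    return [
        [math.cos(a), -math.sin(a), 0.0],
        [math.sin(a), math.cos(a), 0.0],
        [0.0, 0.0, 1.0],
    ]
```

In practice the rotation matrix would come from a least-squares superposition of the domain in the two states; here it is constructed directly so that the sketch stays self-contained.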

Multiple structures show condensation and substrate donation conformations

Condensation is the central chemical event of peptide synthesis. It requires that donor T domain (here T1) and acceptor T domain (here T2) bind simultaneously to the C domain (Fig. 4). The structure of FATCAT features the full condensation state, with both T1 and T2 bound to C2, and represents a detailed three-dimensional view of a multimodular NRPS (Fig. 4A). The resolution of this structure is 6 Å, but the high-resolution structures of F1, A1, T1, C2, A2core, and homology models of the ~100-residue A2sub and ~90-residue T2 enabled the building of a high-quality structure for the full, 1800-residue dimodule (fig. S1). T2 occupies the acceptor binding site on C2, located near helices α1 and α10 (15, 18) (Fig. 4B) and positions the phosphate of its ppant arm at the entrance of the C domain active site tunnel (fig. S2H). This T2 position agrees with our direct coupling analysis (DCA) (25) and that in AB3403 (18) but is rotated by ~55° from that in surfactin synthetase (15) (fig. S7, A to C, and table S2).

Fig. 4 Condensation in LgrA.

(A) FATCAT shows a full condensation state, with T1 and T2 docked at the donor and acceptor binding sites, respectively. FATfValC and FATfValC* are overlaid to show that the first four domains are in analogous positions regardless of space group or resolution of the structures. (B) Overlay of LgrA structures with T1 at C2. (C) DCA between Tn and Cn+1 displayed on LgrA and coevolution signal (red lines) between residues in close proximity in the LgrA structures. Residues selected for mutation are indicated with brown α-carbon spheres. (D) Schematic of the LgrA-BmdB chimera protein FATCAT-CT3 and its product fVal-Gly-Tpm. (E) Liquid chromatography–mass spectrometry (LC-MS) peptide synthesis assay of mutations near the T1 binding site of C2. All reactions were performed in triplicate (n = 3) and are shown as mean values from integration of extracted ion chromatogram (EIC) peaks, normalized against the average wild-type value. Statistical significance was determined by two-sided Student’s t test [not significant (ns), p > 0.05; ***p ≤ 0.001; ****p ≤ 0.0001]. Error bars indicate plus or minus the standard error of the mean. A1008K, Ala1008→Lys; R1051N, Arg1051→Asn; P1086G, Pro1086→Gly; L1088Y, Leu1088→Tyr. (F) Unbiased FO-FC (3σ) simulated annealing Polder omit electron density map of the C2 active site of FATfValC, calculated with phases from a model that never included fVal-ppant ligand. (G) Magnified view of the FATfValC C2 active site, which shows that the Val rotates for condensation.

Four structures (FATfValC, FATfValC*, FATCA, and FATCAT) show T1 binding to the donor site of C2 (Fig. 4, B and C). This canonical Tn:Cn+1 interaction is the functional link between modules 1 and 2, which allows the nascent peptide to be elongated and passed downstream in the condensation reaction. The donor site is a shallow depression between helices α4 and α9 on the opposite side of the C domain tunnel from the acceptor site (Fig. 4, B and C). Each LgrA donor structure has slightly different residue-level T1:C2 contacts, all dominated by van der Waals interactions, which shift distal T1 residues up to ~3.5 Å (Fig. 4B and fig. S7, D and E). DCA of the Tn:Cn+1 interaction showed a strong coevolution signal between the areas of T1 and C2 that we observe in direct contact (Fig. 4C and table S3). We established a multiple-turnover peptide synthesis assay by fusing FATCAT to the terminal C (CT) domain of bacillamide synthetase, which catalyzes peptide release by condensation with free tryptamine (Tpm) (26, 27). This FATCAT-CT construct produces fVal-Gly-Tpm tripeptide (Fig. 4D). We then used DCA, the capacity of which to predict mutational effects in proteins has recently been established (28), to guide mutational analysis of the T1:C2 interface. Of four mutations in C2 predicted to be deleterious for the T1:C2 interaction but not for C2 folding, three showed moderate, but significant, decrease in tripeptide production (Fig. 4E and fig. S7, J to M). The observed binding is thus likely a faithful representation of an important T1:C2 interaction.

T domains have previously been observed bound to sites analogous to the donor site in two specialized C domain homologs. E domains, found downstream of T domains in some modules, catalyze chirality inversion in the peptide intermediate (29). Fungal NRPSs often end with a terminal condensation-like (Ct) domain, which catalyzes peptide release by macrocyclization with an internal nucleophile in the peptide intermediate (30). Both E and Ct domains have evolutionarily diverged from canonical C domains (5, 31), and each has only one T domain binding site, which is analogous to the donor site. The structure of TqaA didomain T3Ct (30) shows contacts clustered on the α9 side of the donor site depression, similar to our T1:C2 interaction, with T3 shifted by maximally ~7 Å (fig. S7G). However, the position of T1 in the structure of the GrsA didomain T1E1 (29) is quite different, rotated by ~37°, and translated ~13 Å to the α4 side of the donor site (fig. S7F). Correspondingly, DCA between Tn and En domains (within CnAnTnEn modules) shows signal clustered on that α4 side (fig. S7I and table S4). T domains thus bind C and E domains in distinct ways, with our structures and TqaA TCt representing Tn:Cn+1 binding, and the GrsA didomain representing Tn:En binding.

T domain binding to the donor site places the ppant arm into that side of the active site tunnel. The tunnel leads to the C domain’s conserved HHXXXDG catalytic motif, where X is any residue, with the second histidine (His908) most important for activity (32–34). FATfValC, FATfValC*, and FATCA all show electron density for the (amino-)ppant (35) arm in the C2 tunnel (Fig. 4F and fig. S2, C, E, and F). FATfValC contains extra density attached to the amino-ppant, which fits fVal (fig. S2, E and F), placing the formyl group within hydrogen bonding distance of Tyr810 (Fig. 4G). The donor ppant-fVal would require a small shift of the Val to expose the reactive carbonyl carbon to the acceptor site (Fig. 4G), which may only occur when acceptor substrate binds to the active site. A transient “opening” or “closing” of the V shape formed by the C domain’s N- and C-lobes (5, 34) could also be involved, though C2 is in very similar conformations in all the structures, not greatly influenced by whether the T domains are interacting with C2 or what is attached to the ppants (fig. S6). The small shift of a donor substrate to achieve a fully reactive conformation is reminiscent of the large ribosomal subunit, which maintains peptidyl-tRNA in a nonreactive conformation until the aminoacyl-tRNA binds (36).

Large conformational changes in dimodule structures

FATCAT, FATCA, and both molecules of FATVadCA provide insight into questions of supermodular architecture. In FATCAT and FATCA, T1 is similarly bound at the donor site of C2, but C2A2 or C2A2T2 is folded back toward the initiation module in two distinctive ways (Fig. 5A): In FATCAT (and also in FATfValC and FATfValC*), C2 makes contact with F1 near its active site. By contrast, the entire second module in FATCA is rotated ~114° around the F1A1 didomain, the C2A2core didomain center of mass is translocated by ~80 Å (Fig. 5A), and C2 makes extensive contacts with A1. Notably, the C2 acceptor site is not obstructed by any of these interactions, meaning that this conformation would allow a full condensation state in solution. Thus, the overall conformations of FATCAT and FATCA are very different, but both seem capable of peptide bond formation.

Fig. 5 Different dimodular conformations for the same catalytic states.

(A) FATCAT and FATCA both show T1 binding to the donor site of C2 but have very different overall conformations. (B) The two crystallographically independent molecules of FATVadCA both show module 1 in thiolation conformation but have very different positions of module 2. (C) The distances between positions of residue Asp1236 in the four dimodular conformations. The structures are superimposed by their A1core.

To obtain the crystals of FATVadCA, we used a Vad dead-end inhibitor of the thiolation reaction (Fig. 1A) (37, 38). Vad binds A1 and allows the nucleophilic attack of ppant-T1 on Vad but is not cleaved by the reaction, tethering T1 to Vad and stalling T1 and A1 in the thiolation state. Fortuitously, the crystals of FATVadCA contain two molecules in the asymmetric unit, revealing two different views of the complex. In both molecules, Vad is indeed at the A1 active site and T1 and A1sub are in thiolation conformation, but the elongation module has notably different orientations. In one molecule, C2A2 extends away from F1A1T1 at ~45° (Figs. 2C and 5B), and, in the other, it makes a ~135° angle on a different axis (Figs. 2D and 5B). The transition between the two conformations would require a ~82-Å translation and ~140° rotation of the C2A2 (Fig. 5B). The initiation and elongation modules do not form substantial interactions with each other in either conformation, with the T1-C2 linker acting as a flexible tether between the two modules.

Comparing the four dimodular conformations observed in our structures dramatically shows the scale of the conformational changes possible in a dimodular NRPS. Residues on the distal side of A2core would move by between 85 and 216 Å to transition between observed conformations. This is similar to the length of the full dimodular NRPS, because the longest distance within any structure is 220 Å. The kinds of conformational changes required for these transitions are presented in movie S1.
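The displacement range quoted above comes from comparing the position of the same residue across conformations after superimposing the structures on a common reference (A1core in Fig. 5C). A minimal sketch of that bookkeeping follows, assuming the coordinates have already been superimposed; the conformation labels and coordinates in the usage note are placeholders, not values from the structures.

```python
import math
from itertools import combinations

def pairwise_displacements(positions):
    """Distance between one residue's position in every pair of conformations.

    positions: dict mapping a conformation label to an (x, y, z) tuple,
    all expressed in the same superimposed frame.
    Returns a dict mapping (label_a, label_b) to the distance.
    """
    return {
        (a, b): math.dist(positions[a], positions[b])
        for a, b in combinations(sorted(positions), 2)
    }

def displacement_range(positions):
    """Smallest and largest pairwise displacement."""
    distances = pairwise_displacements(positions).values()
    return min(distances), max(distances)
```

For four conformations this yields six pairwise distances, for example `displacement_range({"conf1": (0, 0, 0), "conf2": (3, 4, 0), "conf3": (0, 0, 12), "conf4": (3, 4, 12)})`.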

To assess whether the conformational variability seen in the crystal structures reflected flexibility in solution, we analyzed the behavior of FATCA, the LgrA construct in three of the four different crystallographically observed conformations, using small-angle x-ray scattering (SAXS). FATVadCA, FATfValCA, and apo-FATCA samples behaved well in solution, as judged by the initial characterization of their scattering curves (fig. S8). Notably, their pair-distance distribution functions had limited features and large Dmax values (fig. S8, K to M), which is a characteristic of molecules that either adopt extended conformations or are flexible (39). Comparison of the experimental scattering curves to theoretical scattering curves calculated from the crystal structures resulted in very poor fits (fig. S9, A to F, and table S6), indicating that none of the individual conformations observed in the crystal structures fully described the conformation of FATCA in solution. Using weighted combinations of the crystal structures improved the chi-square values somewhat (table S7) (40). Modeling of the scattering curves by using the ensemble optimization method (41) resulted in ensembles with excellent fits (fig. S9, G to L). The ensembles were reminiscent of the series of conformations observed in the crystal structures (fig. S9, M to P) and retained a level of flexibility similar to the original pool of 1000 independent models (table S5), consistent with the interpretation that LgrA is highly flexible.
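The comparison of experimental and theoretical scattering curves described above rests on a chi-square statistic. The exact definition used by the software behind tables S6 and S7 is not given in this section, so the following is only a sketch of one commonly used form: the theoretical profile is scaled to the data by least squares, and the squared, error-weighted residuals are averaged. The function names and the profile-mixing helper are assumptions for illustration.

```python
def scale_factor(i_exp, i_calc, sigma):
    """Least-squares scale that best maps i_calc onto i_exp, weighted by 1/sigma^2."""
    num = sum(e * c / s**2 for e, c, s in zip(i_exp, i_calc, sigma))
    den = sum(c * c / s**2 for c, s in zip(i_calc, sigma))
    return num / den

def reduced_chi_square(i_exp, i_calc, sigma):
    """Error-weighted squared residuals after scaling, divided by (N - 1)."""
    c = scale_factor(i_exp, i_calc, sigma)
    n = len(i_exp)
    total = sum(((e - c * m) / s) ** 2 for e, m, s in zip(i_exp, i_calc, sigma))
    return total / (n - 1)

def mix_profiles(profiles, weights):
    """Weighted sum of theoretical profiles sampled on the same q grid."""
    return [
        sum(w * p[i] for w, p in zip(weights, profiles))
        for i in range(len(profiles[0]))
    ]
```

With this form, a flexible molecule whose solution scattering is an average over several conformations fits a weighted mixture better than any single profile, which is the qualitative behavior reported above.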

Structures and sequence statistics enable module-swapping bioengineering

The structures and SAXS suggest that there is little constraint on positions of adjacent modules in NRPSs. Other than the intermodular linker, only the condensation reaction and the T1:C2 interaction during this reaction strictly couple neighboring modules. Thus, high-resolution views of the T1:C2 interaction should enable module swapping experiments. We fused module 1 of LgrA (LgrAM1) with the termination module 4 of cereulide synthetase (CesBM4) to produce a chimera that synthesizes fVal-Val (Fig. 6 and fig. S10). We then used DCA to guide mutagenesis of the T1:CCesB-4 interface aimed at increasing activity of the chimera. Mutations in CCesB-4 rather than T1 were targeted because T1 must interact with F1, A1, and CCesB-4 for synthesis, but the donor site of CCesB-4 must interact only with T1. Saturation mutagenesis in silico of 75 residues around the donor site of CCesB-4 was performed, and five mutations predicted to improve T:C interaction without substantially affecting C domain folding were constructed and tested in vitro (Fig. 6, B and C, and table S8). Three of the mutations decreased activity, but E925A (Glu925→Ala) and H1008Q (His1008→Gln) showed a significant increase in fVal-Val formation (Fig. 6D). DCA scores predicted that effects of C domain mutations should be additive; LgrAM1-CesBM4(E925A, H1008Q) did indeed show an additive effect, doubling the peptide production of the LgrAM1-CesBM4 chimera.
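The in silico saturation mutagenesis described above replaces each of 75 positions with each of the other 19 amino acids, giving 75 × 19 = 1425 single mutants to score. The sketch below enumerates such a library and filters it on two score changes. It does not compute DCA scores; the score dictionaries, the sign convention (positive means predicted improvement), and the tolerance are all hypothetical stand-ins.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes

def saturation_mutants(wild_type):
    """Yield every single substitution as a label such as 'E925A'.

    wild_type: dict mapping residue number to its one-letter wild-type residue.
    """
    for position, wt in sorted(wild_type.items()):
        for aa in AMINO_ACIDS:
            if aa != wt:
                yield f"{wt}{position}{aa}"

def select_candidates(mutants, delta_interaction, delta_fold, fold_tolerance):
    """Keep mutants predicted to improve the T:C interaction without a large folding penalty.

    Assumed convention (not from the source): positive delta_interaction is an
    improvement, and delta_fold below -fold_tolerance is an unacceptable penalty.
    """
    return [
        m for m in mutants
        if delta_interaction[m] > 0 and delta_fold[m] >= -fold_tolerance
    ]
```

A usage example with invented scores: `select_candidates(["E925A", "D1052R"], {"E925A": 0.8, "D1052R": 0.5}, {"E925A": -0.1, "D1052R": -2.0}, fold_tolerance=0.5)` keeps only `"E925A"`.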

Fig. 6 Module-swapping bioengineering using structural and direct coevolution analysis.

(A) Schematic of the LgrAM1-CesBM4 chimera and its product fVal-Val. (B) Scatter diagram of the difference in C domain score against the difference in Tn:Cn+1 interaction score for the DCA analysis of 75 residues around the donor T domain binding site of CesB C4 mutated to each of the other 19 amino acids. CesB C4 mutations in red were analyzed biochemically. E925A, Glu925→Ala; K1078Q, Lys1078→Gln; D1052R, Asp1052→Arg; N1089A, Asn1089→Ala; H1008Q, His1008→Gln. (C) The corresponding positions of these five mutations in LgrA C2. (D) Wild-type LgrAM1-CesBM4 and mutant proteins were assayed for peptide production by LC-MS. All reactions were performed in triplicate (n = 3) and are shown as mean values from integration of EIC peaks, normalized against the average wild-type value. Statistical significance was determined by two-sided Student’s t test (*p ≤ 0.05; ***p ≤ 0.001; ****p ≤ 0.0001). Error bars indicate plus or minus the standard error of the mean.
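The statistics named in this legend (normalization to the wild-type mean, standard error of the mean, and Student’s t test on triplicates) can be sketched as follows. This is an illustration of the standard formulas, not the authors' analysis script, and the example peak areas in the usage note are invented. Only the t statistic and degrees of freedom are computed; converting them to a two-sided p value requires the t distribution, for example via SciPy's `scipy.stats.ttest_ind`.

```python
import math
import statistics

def normalize_to_wild_type(values, wild_type_values):
    """Divide each value by the mean of the wild-type replicates."""
    wt_mean = statistics.mean(wild_type_values)
    return [v / wt_mean for v in values]

def standard_error(values):
    """Sample standard deviation divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

def student_t(a, b):
    """Pooled-variance (equal-variance) two-sample t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / df
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df
```

For example, with hypothetical EIC peak areas `wt = [90, 100, 110]` and `mutant = [190, 200, 210]`, both sets are first passed through `normalize_to_wild_type(..., wt)` and then compared with `student_t`.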

Discussion

The series of structures and SAXS data presented portray multimodular NRPSs as very flexible. Previous results showed that within a single module, the catalytic state can define the positions of all domains in the module (e.g., the thiolation state has specific positions of all domains in an initiation or elongation module) (15, 17, 18). By contrast, our current structures suggest that for multimodular NRPSs, there is no strict coupling between the catalytic state of a particular module and the overall conformation of the multimodular NRPS. Rather, it appears that many overall conformations can allow the various catalytic states: FATVadCA molecule 1, FATVadCA molecule 2, and a continuum of unobserved other conformations should allow thiolation; FATCA, FATCAT, and a continuum of unobserved conformations should allow condensation. The continuum of conformations need not be equally populated: The observation of the same condensation conformation in multiple crystal forms hints that it may be more common than others. Each conformation we observed was fortuitously selected through crystallization in the very different packing environments of the five unrelated crystal forms. Inspection of the crystal packing reveals a myriad of different crystal contacts and few trends, though perhaps predictably, many more contacts are mediated by the larger domains and subdomains (F, Acores, and C) than the small, mobile ones (Asub and T), and the lower-diffracting crystal forms have more porous packing and spacious solvent channels.

The four conformations we observed are all markedly different from each other, and none of the observed positions of module 2 would allow module 1 to perform each step of its catalytic cycle. For example, module 2 has to move from any observed position to allow T1 to reach the F1 active site. To test if there is a single overall dimodule conformation that is compatible with the full synthetic cycle, we visualized possible positions of module 2 by drawing spheres with radii of the length of the T1-C2 linker, at the observed positions of the last residue of T1. In FATCAT, as well as in an extrapolated CATCAT dimodule, there is little or no available position within the overlapping spheres (fig. S11). In addition, if NRPSs possess supermodular architecture, it would be mediated by domain:domain interactions involving CAcore structural units of adjacent modules, but we are unable to detect CnAn:Cn+1An+1 coevolution expected for such interactions. The volume constraints and lack of detectable CnAn:Cn+1An+1 coevolution make it likely that no static module:module conformation exists that accommodates the full NRPS synthetic cycle. Although it is possible that some NRPSs have a different nature from LgrA, we suggest that NRPSs, in general, do not possess constant and rigid supermodular architecture.
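The sphere construction described above asks whether any position for the start of module 2 lies within one linker length of every observed position of the last residue of T1. A coarse way to probe this is to sample a grid and count the points that fall inside all spheres. The sketch below does that; the centers, radius, and grid step are placeholders, and a real analysis would also need to exclude positions that clash with the protein.

```python
import math
from itertools import product

def within_all(point, centers, radius):
    """True if point lies inside (or on) every sphere of the given radius."""
    return all(math.dist(point, c) <= radius for c in centers)

def intersection_volume(centers, radius, step=1.0):
    """Grid estimate of the volume common to all spheres (units: step cubed).

    Any common point must lie inside the first sphere, so only the bounding
    cube of that sphere is scanned.
    """
    cx, cy, cz = centers[0]
    n = math.ceil(radius / step)
    offsets = [i * step for i in range(-n, n + 1)]
    count = sum(
        1
        for dx, dy, dz in product(offsets, repeat=3)
        if within_all((cx + dx, cy + dy, cz + dz), centers, radius)
    )
    return count * step**3
```

A result of zero (or a very small volume) for a set of T1 end positions corresponds to the situation reported above, where little or no position is available within the overlapping spheres.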

The high flexibility of NRPSs could facilitate synthetic cycles that include cosynthetic modification (tailoring), are noncanonical, or are nonlinear. Flexibility could be important for NRPSs with tailoring domains inserted at different positions in their architecture (17, 42) or NRPSs with an abnormal order of domains, like heterobactin synthetase (which includes a C-T-A module) (43), obafluorin synthetase (which has an A domain C-terminal to its TE domain) (44, 45), or vibriobactin synthetase (which includes VibF: Cy-Cy-A-C-T-C) (46). Equally unusual systems include beauvericin synthetase, which uses tandem T domains to perform iterative condensation (47); myxochromide synthetase, which performs skipping of module 4, with T3 donating the peptide directly to module 5 (48); and thalassospiramide B synthetase, where A2 is proposed to thiolate T1, T2, and T5 (49). Presumably, some of these specialized systems have additional interactions that bring domains that are far apart in sequence close together in space.

Because of their straightforward synthetic logic and important bioactive products, NRPSs have long been the subject of bioengineering attempts to create new-to-nature peptides, with mixed success (50, 51). Strategies include mutation of the A domain substrate-binding pocket (52, 53), domain swaps (54), module deletion or insertion (55, 56), module swaps (54), swaps of module-sized segments (57–59), and multimodular swaps via docking domains (60, 61). This includes interesting recent results using swapped AnTnCn+1 segments (57, 58), a strategy that conserves the native donor Tn:Cn+1 interaction but not the acceptor Cn:Tn interaction (57, 58), and using the junction of the N and C lobe of the C domain as the split point (58, 59). Notably, both the N and C lobes contribute to both the donor and the acceptor T domain binding sites.

The lack of strong interactions between the didomain structural units of adjacent modules observed here should facilitate module swapping experiments, but unnatural Tn:Cn+1 interactions could inhibit synthesis. That we increased the activity of a chimera by improving the unnatural T1:C2 interaction indicates that this interaction can be rate-limiting in module-swapped NRPSs. Although it did not produce orders of magnitude higher product quantity, our approach for improving unnatural Tn:Cn+1 interactions could be combined with other bioengineering strategies and may help NRPSs fulfill their long-held promise as sources of new designer bioactive small molecules.

Materials and methods summary

Constructs were cloned with cleavable octa-histidine and calmodulin binding peptide tags and modified by site-directed mutagenesis. Proteins were expressed in Escherichia coli and purified for crystallization by calmodulin affinity, nickel affinity, tag removal, anion exchange, and gel filtration chromatography. FATCAT-CT and LgrAM1-CesBM4 were purified by calmodulin affinity, nickel affinity, and gel filtration. Ppants were added to apo T domains using Sfp and fVal-NH-CoA or coenzyme A. Valinyl-adenosine-vinylsulfonamide was complexed to FATCA by including it in the Sfp reaction.

Initial nanovolume crystallization conditions were optimized in large format to the conditions listed in the supplementary materials. FATC was phased by molecular replacement in Phenix (62) with FAcore (17) and the N-lobe of TycC [Protein Data Bank (PDB) 2JGP] (10), followed by (re)building in Coot (63) and refinement in Phenix. Models of F, A1core, and C and homology models A2 and T2 were used for molecular replacement phasing or as a starting point for (re)building and refinement for other structures.

Small-angle scattering data were collected at three concentrations and processed using ATSAS (40). Because of evidence of high flexibility, EOM2 (41) was used to generate ensembles of FATCA conformations whose theoretical combined scattering matches the experimentally measured scattering well.
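The idea of an ensemble whose combined theoretical scattering matches the measured curve can be illustrated with a minimal sketch. It assumes each conformer has a precomputed intensity profile sampled on the same points as the experimental data, averages the profiles with equal weights, and scores the fit with a least-squares scale factor and a reduced chi-square. The function names are hypothetical, and the sketch covers only the scoring step, not the pool generation or ensemble selection performed by EOM2.

```python
def ensemble_profile(profiles):
    """Equal-weight average of per-conformer intensity profiles."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]

def scaled_chi2(i_exp, sigma, i_calc):
    """Least-squares scale factor and reduced chi-square of the fit.

    i_exp  : experimental intensities
    sigma  : experimental uncertainties (same length as i_exp)
    i_calc : theoretical intensities of the ensemble
    """
    num = sum(e * c / s ** 2 for e, c, s in zip(i_exp, i_calc, sigma))
    den = sum(c * c / s ** 2 for c, s in zip(i_calc, sigma))
    scale = num / den
    residuals = sum(((e - scale * c) / s) ** 2
                    for e, c, s in zip(i_exp, i_calc, sigma))
    return scale, residuals / (len(i_exp) - 1)

if __name__ == "__main__":
    # Toy profiles, for illustration only.
    conformers = [[4.0, 2.0, 1.0], [2.0, 2.0, 1.0]]
    averaged = ensemble_profile(conformers)
    print(scaled_chi2([6.0, 4.0, 2.0], [1.0, 1.0, 1.0], averaged))
```

A candidate ensemble with a lower reduced chi-square reproduces the measured curve more closely than one with a higher value.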

For peptide synthesis by FATCAT-CT3, 3.7 μM protein was incubated with 0.2 mM N10-fTHF, 5 mM valine, 2 mM glycine, 1 mM tryptamine, and 5 mM adenosine triphosphate (ATP) at 23°C for 5 hours before quenching and liquid chromatography–mass spectrometry (LC-MS). For peptide synthesis by LgrAM1-CesBM4, 5 μM protein was incubated with 5 mM valine, 5 mM ATP, and 0.5 mM N10-fTHF for 6 hours before quenching and LC-MS.

For DCA (64), we extracted 45,015 Tn:Cn+1 pairs, 14,506 Tn:En pairs, and 29,700 Cn:Tn pairs and calculated interdomain contact scores and domain-domain interaction scores. To computationally suggest mutations to improve the Tn:Cn+1 interaction, we altered all single amino acids in Cn+1 interface positions to all 19 other amino acids and evaluated the interaction scores, looking for those that improved the Tn:Cn+1 interaction score and did not substantially affect the C domain score.
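The single-site scan described above can be sketched as follows. The two scoring functions are placeholders supplied by the caller and stand in for the DCA-derived Tn:Cn+1 interaction score and C domain score; their form, the `max_domain_loss` tolerance, and all names are assumptions made for illustration, not the scoring used in this work.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def scan_mutations(c_seq, interface_positions, interaction_score,
                   domain_score, max_domain_loss=0.5):
    """Try all 19 substitutions at each interface position of the C domain.

    Keeps mutants that raise the interaction score without lowering the
    C domain score by more than `max_domain_loss`. Returns tuples of
    (position, wild type, mutant, delta interaction, delta domain),
    sorted by the largest gain in interaction score.
    """
    base_inter = interaction_score(c_seq)
    base_dom = domain_score(c_seq)
    hits = []
    for pos in interface_positions:
        for aa in AMINO_ACIDS:
            if aa == c_seq[pos]:
                continue  # skip the wild-type residue
            mutant = c_seq[:pos] + aa + c_seq[pos + 1:]
            d_inter = interaction_score(mutant) - base_inter
            d_dom = domain_score(mutant) - base_dom
            if d_inter > 0 and d_dom >= -max_domain_loss:
                hits.append((pos, c_seq[pos], aa, d_inter, d_dom))
    hits.sort(key=lambda h: h[3], reverse=True)
    return hits

if __name__ == "__main__":
    # Toy scores: reward lysine at the interface, penalize proline anywhere.
    toy_interaction = lambda s: s.count("K")
    toy_domain = lambda s: -s.count("P")
    print(scan_mutations("AGA", [1], toy_interaction, toy_domain))
```

Separating the two scores keeps the selection criterion explicit: a substitution is retained only if it is predicted to improve the interface while leaving the C domain itself largely undisturbed.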

Supplementary Materials

science.sciencemag.org/content/366/6466/eaaw4388/suppl/DC1

Materials and Methods

Figs. S1 to S11

Tables S1 to S10

References (65–91)

Movie S1

References and Notes

Acknowledgments: We thank D. Alonzo (MS and ligand building); A. Pistofidis and Y. Ripstein (purification help); K. Bloudoff and C. Fortinez (bmdB plasmid); J. Jiang (cloning); Schmeing lab members (discussions); N. Rogerson (editing); G. Hura (SAXS advice and analyses); K. Burnett (SAXS data collection); A. Berghuis (discussions); and B. Nagar (diffraction data discussions). We thank C. Chalut for BL21(DE3)entD cells; J. Colucci, M. Guerard, and R. Zamboni (ZCS); staff at CLS 08ID-1 (S. Labiuk, J. Gorin, M. Fodje, K. Janzen, D. Spasyuk, and P. Grochulski); APS 24-ID-C (grants GM124165, RR029205, and DE-AC02-06CH11357; F. Murphy); and SAXS data Advanced Light Source (ALS) SIBYLS beamline (US-DOE-BER Integrated Diffraction Analysis Technologies, NIGMS ALS-ENABLE-P30-GM124169 and S10OD018483). Funding: This work was funded by CIHR (FDN-148472) and a Canada Research Chair to T.M.S., a European Union H2020 research and innovation program MSCA-RISE-2016 (#734439 INFERNET) grant to M.W., and studentships from NSERC (J.M.R.), Boehringer Ingelheim Fonds (M.E.), and CIHR (I.H.). Author contributions: J.M.R. performed the crystallography with the assistance of I.H. J.M.R. performed activity assays for LgrA-CesB chimeric proteins. M.E. performed activity assays for LgrA-BmdB proteins and structure refinement. J.M.R. and M.E. prepared samples for SAXS, and A.G. performed analyses of SAXS data. M.W. performed coevolution and bioinformatic analyses. T.M.S. directed the project. T.M.S., J.M.R., and M.W. wrote the manuscript. Competing interests: The authors declare no competing interests. Data and materials availability: Coordinates and structure factors are available in the RCSB Protein Data Bank under the following PDB IDs: FATfValC, 6MFW; FATfValC*, 6MFX; FATVadCA, 6MFY; FATCAT, 6MFZ; and FATCA, 6MG0. All other data are available in the main text or the supplementary materials.
