Natural noncanonical protein splicing yields products with diverse β-amino acid residues


Science 16 Feb 2018:
Vol. 359, Issue 6377, pp. 779-782
DOI: 10.1126/science.aao0157

Protein backbone, broken and mended

Small, posttranslationally modified peptides are produced by microorganisms as antimicrobial agents or to communicate with neighboring cells. Alterations to the peptide backbone can change the structure of peptides or introduce reactive chemical moieties. Morinaka et al. characterized a bacterial enzyme that excises the side chain and α-carbon of a tyrosine residue from a short peptide, leaving behind an α-ketoamide. This backbone functional group is found in some protease inhibitors and is a valuable handle for bio-orthogonal chemistry. The enzyme accepts peptide substrates with a short recognition motif, suggesting that it could be used to generate libraries of modified peptides.

Science, this issue p. 779


Current textbook knowledge holds that the structural scope of ribosomal biosynthesis is based exclusively on α-amino acid backbone topology. Here we report the genome-guided discovery of bacterial pathways that posttranslationally create β-amino acid–containing products. The transformation is widespread in bacteria and is catalyzed by an enzyme belonging to a previously uncharacterized radical S-adenosylmethionine family. We show that the β-amino acids result from an unusual protein splicing process involving backbone carbon-carbon bond cleavage and net excision of tyramine. The reaction can be used to incorporate diverse and multiple β-amino acids into genetically encoded precursors in Escherichia coli. In addition to enlarging the set of basic amino acid components, the excision generates keto functions that are useful as orthogonal reaction sites for chemical diversification.

Overcoming the structural limitations of ribosomal biosynthesis, which is based on a restricted set of amino acids, is important in biology and applied life sciences (1–4). Natural proteins and ribosomally produced peptides are diversified through posttranslational modifications that equip them with a wide range of added structural and functional features (1). All known ribosomally generated biomolecules are based on α-amino acid backbone topologies. In synthetic biology, efforts have been made to incorporate non–α-amino acid residues into proteins by ribosome engineering or by tRNA misacylation. Although in vitro strategies have resulted in products containing α-hydroxy or β-amino acids (5–9), we know of only a single in vivo case, which used mutated ribosomes to incorporate β3-Phe–based units (10).

We recently reported the discovery of radical S-adenosylmethionine (rSAM) enzymes that irreversibly introduce multiple D-amino acids into ribosomally synthesized and posttranslationally modified peptides (RiPPs) (11–14). These enzymes act on small precursor proteins containing either a nitrile hydratase–like (also termed proteusin) or a Nif11-like N-terminal leader and a variable C-terminal core region. During biosynthesis, the core is modified and proteolytically released from the leader. Although the encoding biosynthetic gene clusters are widespread in bacteria (15), the only proteusins that have been characterized to date are the cytotoxic, pore-forming polytheonamides (11, 16–18), which are the most extensively modified known peptides and contain numerous D-amino acids introduced by an rSAM epimerase (11, 14).

In a broader analysis of such RiPP clusters, genes encoding an orphan rSAM family (TIGR04103, rSAM_nif11_3) attracted our attention owing to their consistent colocalization with Nif11 precursor genes. These rSAMs contain a C-terminal region with homology to SPASM domains present in various peptide-modifying rSAM enzymes (19). One homolog, here termed PlpX, is encoded in the orphan plp locus from Pleurocapsa sp. PCC 7319. The locus contains one proteusin and two Nif11 precursor genes, as well as the previously characterized rSAM epimerase PlpD (Fig. 1A) (12, 13). To assign function to the remaining genes in the plp cluster, we studied the activity of PlpX. We coexpressed the gene with either of the two Nif11 precursor genes plpA2 and plpA3, which are located directly upstream and encode precursors with predicted cores of 25 and 23 amino acids, respectively (Fig. 1B). Both precursor genes were modified to produce N-terminally His6-tagged proteins carrying a factor Xa (Fx) cleavage site at the leader-core interface. After coexpression with plpX in Escherichia coli BL21(DE3), Ni-affinity purification, and cleavage with trypsin, the core peptides 1 and 2 of His6-PlpA2-Fx and His6-PlpA3-Fx, respectively, lacked detectable modifications (Fig. 2A). Closer analysis of the plp locus (Fig. 1A), however, revealed an unannotated small open reading frame, named plpY, located downstream of plpX, an architecture that we also detected in other clusters encoding PlpX homologs (figs. S1 and S2 and table S1). PlpY has weak similarity to PqqD from pyrroloquinoline quinone biosynthesis (20) and to the RiPP recognition element domains that occur in various modifying enzymes and mediate precursor binding (21) (fig. S3). When plpY was coexpressed with plpX and plpA2 or plpA3, each liquid chromatography–mass spectrometry (LC-MS) chromatogram contained two additional broad peaks (Fig. 2A), products 3 and 4 from PlpA2 and products 5 and 6 from PlpA3, which partially overlapped with the peaks of unmodified 1 and 2, respectively.
Tandem mass spectrometry (MS/MS) experiments (Fig. 2B) revealed a mass difference consistent with C8H9NO loss in PlpA2 (diastereomers 3 and 4) and PlpA3 (diastereomers 5 and 6), which was mapped to core regions carrying tyrosine residues (Tyr21 and Tyr6, respectively; numbering relates to core position). For detailed product characterization, simultaneous digestion of the His6-PlpA3-Fx product with trypsin and chymotrypsin cleanly provided a core fragment (Ala1 to Trp12) composed of two diastereomers (7 and 8; Fig. 2C), which were separated by means of high-performance LC. Nuclear magnetic resonance (NMR) analysis (figs. S4 to S15 and tables S3 and S4) revealed heteronuclear two- and three-bond correlations that suggest an α-keto-β3-Met moiety present in both diastereomers. Specifically, we observed cross peaks to the newly formed ketone and amide carbonyls with characteristic chemical shifts of δ = 196 and 165 parts per million (figs. S9 and S15), respectively, similar to those of other α-keto-β3-Met–containing peptides (22). Attempts to obtain an enzymatic product enriched in one diastereomer were unsuccessful, in line with the tendency of α-ketoamides to tautomerize at physiological pH (23).
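
The net C8H9NO loss assigned to the splice site can be checked with monoisotopic-mass arithmetic. The Python sketch below is illustrative only: `formula_mass` and `mz_shift` are hypothetical helpers written for this example, and standard monoisotopic element masses are assumed. It computes the expected neutral loss, the corresponding m/z shifts at a few charge states, and shows that what remains of the Tyr residue after the loss is a single CO unit, consistent with Tyr C1 becoming the new amide carbonyl.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}

def formula_mass(formula: str) -> float:
    """Monoisotopic mass of a flat formula such as 'C8H9NO' (no brackets)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONO[element] * (int(count) if count else 1)
    return total

def mz_shift(neutral_delta: float, charge: int) -> float:
    """m/z change produced by a neutral mass change at charge state z."""
    return neutral_delta / charge

# Net loss assigned to the splice site (labeled -Tyn in the MS/MS spectra).
loss = formula_mass("C8H9NO")
# Tyr residue as it sits in a peptide chain: -NH-CH(CH2-C6H4-OH)-CO-
tyr_residue = formula_mass("C9H9NO2")

print(f"neutral loss: {loss:.4f} Da")
for z in (1, 2, 3):
    print(f"  z = {z}: m/z lower by {mz_shift(loss, z):.4f}")
# What remains of the Tyr residue after the loss is exactly one CO unit.
print(f"Tyr residue - loss = {tyr_residue - loss:.4f} Da; CO = {formula_mass('CO'):.4f} Da")
```

The charge states observed for a given peptide depend on its sequence and the instrument; the values above are only the arithmetic expected for the C8H9NO change.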

Fig. 1 The plp and pcp loci from Pleurocapsa spp. PCC 7319 and 7327 and the investigated core peptides.

(A) Map of the plp and pcp loci. Black, precursor genes (plpA1, A2, A3, and pcpA); red, rSAM genes (plpX and pcpX, examined in this study, and epimerase gene plpD); yellow, plpY and pcpY encoding the PlpY and PcpY accessory proteins; blue, putative hydroxylase gene; white, unknown function. (B) Core peptide sequences from Nif11-type precursors (PlpA2, PlpA3, and PcpA) studied in this work. YG motifs and their preceding amino acids are shown in red and bold, respectively. The PlpA1 proteusin core does not contain a YG motif. Single-letter abbreviations for the amino acid residues are as follows: A, Ala; C, Cys; D, Asp; E, Glu; F, Phe; G, Gly; H, His; I, Ile; K, Lys; L, Leu; M, Met; N, Asn; P, Pro; Q, Gln; R, Arg; S, Ser; T, Thr; V, Val; W, Trp; and Y, Tyr.

Fig. 2 Detection of PlpX activity by coexpression in E. coli.

(A) Total ion chromatograms from trypsin digests of His6-PlpA2-Fx and His6-PlpA3-Fx, indicating the starting material (1 and 2) in all runs. New products 3 to 6 are detected only in coexpression experiments with PlpX and PlpY. The insets are extracted mass spectra (retention time = 27.6 to 28.6 min for PlpA2-Fx and 21.3 to 22.5 min for PlpA3-Fx). (B) MS/MS spectra for 1 to 6 derived from LC-MS/MS to localize the modification. -Tyn indicates loss of C8H9NO. m/z, mass/charge ratio. (C) Peptide fragments (3 to 6) detected in (A). Products 5 and 6 were obtained from trypsin digest and 7 and 8 from combined trypsin and chymotrypsin digest. Products 3 and 4, 5 and 6, and 7 and 8 are pairs of epimers differing at the newly formed stereochemically labile β position. (D) Net reaction catalyzed by PlpX. The products lack one equivalent of tyramine formally excised from the backbone. The keto carbonyl group shown in green was selectively labeled with [1-13C]Met and [U-13C]Met, whereas the amide carbonyl shown in blue was selectively labeled with [1-13C]Tyr and [U-13C]Tyr (figs. S16 to S19).

Using His6-PlpA3-Fx, we next investigated the origin of the β-amino acid moiety by feeding various 13C-labeled amino acids to expression cultures. In individual feeding experiments, label from [1-13C]Met, [U-13C]Met, [1-13C]Tyr, and [U-13C]Tyr was detected by MS in the peptide products (figs. S16 to S19). NMR-based characterization of the purified core fragment 8 (PlpA3, residues 1 to 12) revealed enhanced 13C signals, indicating that the Met unit is fully intact, whereas of the Tyr carbons only C1 is retained, forming the amide carbonyl of the product (figs. S16 to S19). These data confirm that PlpX catalyzes an unusual reaction involving tyramine excision from the backbone and reconnection of the remaining protein sections to generate β-amino acids (Fig. 2, C and D). β-Amino acids with or without α-keto groups were previously known only from nonribosomal and polyketide pathways (fig. S20), where they are introduced by direct incorporation of β-residues, by amination of polyketide moieties (24), or by as-yet uncharacterized net C1 extension mechanisms (25, 26).
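
The labeling logic can be made explicit by predicting which mass shifts each precursor would produce if the product keeps all Met carbons but only C1 of Tyr. The sketch below assumes complete label incorporation and counts only the splice-site Met and Tyr; other Met or Tyr residues in a real peptide would add their own shifts. The label codes "1" and "U" and all function names are hypothetical conventions for this illustration.

```python
C13_DELTA = 1.00335483  # mass difference between 13C and 12C (Da)

# Carbon numbering per residue, C1 = backbone carbonyl carbon.
CARBONS = {"Met": set(range(1, 6)), "Tyr": set(range(1, 10))}

# Carbons retained in the spliced product according to the labeling data:
# the whole Met unit, but only C1 of Tyr (the new amide carbonyl).
RETAINED = {"Met": set(range(1, 6)), "Tyr": {1}}

def labeled_carbons(residue: str, label: str) -> set:
    """Carbons carrying 13C for a '1' ([1-13C]) or 'U' ([U-13C]) precursor."""
    if label == "1":
        return {1}
    if label == "U":
        return set(CARBONS[residue])
    raise ValueError(f"unknown label pattern: {label}")

def predicted_shift(residue: str, label: str, retained=RETAINED) -> float:
    """Mass shift (Da) contributed by one splice-site residue in the product."""
    return len(labeled_carbons(residue, label) & retained[residue]) * C13_DELTA

for residue in ("Met", "Tyr"):
    for label in ("1", "U"):
        before = len(labeled_carbons(residue, label)) * C13_DELTA
        after = predicted_shift(residue, label)
        print(f"[{label}-13C]{residue}: unspliced +{before:.2f} Da, spliced +{after:.2f} Da")
```

The diagnostic case is [U-13C]Tyr: the unspliced residue would carry nine labeled carbons, but only one survives splicing, which is why uniformly labeled Tyr marks just the amide carbonyl.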

C–C bond cleavage is known for various rSAM enzymes acting on free amino acids (27–33), for example, the hydrogenase maturation protein HydG, which cleaves Tyr to p-cresol, CO, and CN− (30). These lyases are only distantly related to PlpX (<15% amino acid identity) and do not contain a SPASM domain. We hypothesized that the PlpXY mechanism might involve formation of p-cresol, as for HydG. However, repeated attempts to detect p-cresol or structural variants with or without nitrogen in in vivo coexpressions were unsuccessful. In RiPP biosynthesis, proteins containing rSAM and SPASM domains catalyze diverse reactions (34). In protein alignment and network analyses of PlpX and other SPASM domain proteins, the closest characterized homolog was MftC (figs. S21 and S22), which catalyzes a C-terminal oxidative decarboxylation and a Tyr-Val C–C cross-link (35).

To obtain insights into the prevalence and distribution of this modification, we analyzed bacterial (meta)genomes deposited in GenBank, as well as 93 newly sequenced cyanobacteria from the Pasteur Culture Collection, for genes belonging to the same TIGRFAM family as the splicase PlpX. We detected 53 positives in 436 phylogenetically diverse cyanobacteria, almost all of them neighboring nif11- and plpY-type genes (figs. S23 and S24 and table S5). All precursors comprise between one and three Tyr residues that are part of conserved “XYG” motifs, which are also present at the splice sites of PlpA2 and PlpA3. Further members of TIGRFAM rSAM_nif11_3 occur in various proteobacteria and in Frankia spp. actinomycetes (table S5). In these bacteria, each splicase gene was flanked by a small gene encoding YG peptides; notable examples are Thiothrix spp. genes encoding proteins with up to five YG copies. Neither Nif11-type leaders nor PlpY homologs were identified in these noncyanobacterial systems. We detected 27 further rSAM gene candidates in the Tara Oceans metagenomic data set (36); for six of these, the contigs were long enough to reveal adjacent genes for YG peptides (table S5). These data suggest that various prokaryotes are able to generate β-amino acid–containing products. To test whether multiple YG sites direct several splicing events in one precursor, we coproduced His6-PcpA bearing three YG motifs (9; Fig. 1B) from the thermophile Pleurocapsa sp. PCC 7327 (Fig. 1A and fig. S24) with its splicase gene partners pcpX and pcpY. Proteolysis and LC-MS analysis (figs. S25 and S26) revealed two modifications at the C-terminal YG motifs (Tyr15 and Tyr56), giving product 10 (Fig. 2C and figs. S27 to S29) and demonstrating that a single splicase can catalyze excisions at multiple sites in one protein. The fate of the excised tyramine moiety and the structures of the natural products from these gene clusters remain unknown and are under investigation in our laboratory.
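
Because every splice site identified here is a Tyr within an XYG motif, candidate cores can be scanned for that motif directly. The sketch below is a hypothetical helper, not the genome-mining procedure used above (which relied on TIGRFAM membership and gene neighborhood); it only locates XYG motifs and reports Tyr positions using the 1-based core numbering of the text. The upper bound of five sites mirrors the Thiothrix proteins, and the example sequences are invented.

```python
import re
from typing import List, Tuple

# Lookahead so that overlapping motifs (e.g. ...AYGYG...) are all reported.
XYG = re.compile(r"(?=([A-Z])YG)")

def find_xyg_sites(core: str) -> List[Tuple[int, str]]:
    """Return (Tyr position, X) for each XYG motif in a core peptide.

    Positions are 1-based core positions, as in the numbering Tyr6, Tyr21.
    """
    return [(m.start() + 2, m.group(1)) for m in XYG.finditer(core)]

def screen_precursors(cores: dict, max_sites: int = 5) -> dict:
    """Keep candidate cores with 1..max_sites XYG motifs (hypothetical filter)."""
    hits = {}
    for name, core in cores.items():
        sites = find_xyg_sites(core)
        if 1 <= len(sites) <= max_sites:
            hits[name] = sites
    return hits

# Made-up sequences for illustration; these are not real precursors.
candidates = {
    "demo1": "GAMYGKLSAYGWTRCFYGE",
    "demo2": "AKLTWNDE",
}
print(screen_precursors(candidates))
```

Finding a motif does not show that a site is processed: in PcpA, only two of the three YG motifs were modified.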

The introduction of α-keto-β-amino acids into gene-encoded precursors has considerable potential for applications in drug discovery, chemical biology, and synthetic biology, which we began to explore in a series of experiments. Eight different residues (X = Ala, Met, His, Leu, Ile, Cys, Phe, or Lys) occur in the currently known cyanobacterial XYG motifs, and six further residues (X = Gly, Val, Ser, Asp, Glu, or Arg) occur in other bacteria (table S5). To evaluate the potential of PlpX to generate diverse β-residues, we constructed ten mutants of His6-PlpA3-Fx in which the Met of the MYG site was substituted (Fig. 3A). Of these, seven (Gly, Val, Leu, Ser, Ala, and the not yet naturally observed Pro and Gln) were converted, whereas the Phe, Glu, and Arg mutants were accepted poorly or not at all (figs. S30 to S39 and table S6). Various insertion, deletion, or substitution mutations surrounding the XYG motif were generally tolerated (Fig. 3B and figs. S40 to S44), but mutations within YG abolished splicing (Fig. 3B, figs. S45 and S46, and table S6). We also generated His6-PlpA3-Fx truncation variants (Fig. 3C, figs. S47 to S52, and table S6) to identify a minimal core sequence converted by PlpXY. These studies defined an accepted 11-residue sequence (11), suggesting opportunities to introduce α-ketoamides into peptides and proteins in a motif-based fashion. The mutational analyses also provided initial insights into the nature of the excision reaction. Coexpression of one insertion mutant (His6-PlpA3-Fx-4A5 plus PlpXY) yielded, as a by-product, a truncated peptide with a C-terminal Met5 amide (Fig. 3D and fig. S53), which was not observed with the wild-type precursor. This shunt product might arise from inefficient conversion of an imine intermediate (e.g., a dehydroglycine moiety) that could form after initial radical cleavage of the tyrosine side chain, analogous to HydG catalysis (fig. S54) (30).
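
The substitution results above can be recorded as a simple lookup and applied to XYG sites in a sequence. This sketch is a record of outcomes in one precursor context (His6-PlpA3-Fx with PlpXY), not a predictive model; residues that were not tested, such as His, Ile, Cys, Lys, and Asp, are reported as "untested". The function names and example cores are hypothetical.

```python
import re

# Reported PlpXY outcomes for X in the MYG site of His6-PlpA3-Fx:
# wild-type Met plus the ten Met substitution mutants.
CONVERTED = set("MGVLSAPQ")
POOR_OR_NONE = set("FER")

def reported_outcome(x: str) -> str:
    """Look up the reported outcome for residue X; other residues were not tested."""
    if x in CONVERTED:
        return "converted"
    if x in POOR_OR_NONE:
        return "poor/none"
    return "untested"

def annotate_sites(core: str):
    """List (Tyr position, X, reported outcome) for every XYG motif in a core.

    A core whose YG dipeptide is mutated yields no site, mirroring the
    observation that changes within YG abolished splicing.
    """
    return [
        (m.start() + 2, m.group(1), reported_outcome(m.group(1)))
        for m in re.finditer(r"(?=([A-Z])YG)", core)
    ]

# Hypothetical cores, for illustration only.
print(annotate_sites("APYGKTEYGW"))
print(annotate_sites("AMYAW"))
```

Phe was poorly accepted by PlpXY in this context yet occurs in natural cyanobacterial XYG motifs, so outcomes likely depend on the splicase and sequence context as well as on X.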

Fig. 3 Mutational analysis and detection of amide intermediate.

(A) Mutations at X to install a variety of β-amino acids. (B) Mutations within and around the YG motif. Bold indicates mutated residues; dashes at conserved sites show deletions. (C) Truncations to yield a minimal core motif that is excised. (D) Detection of amide intermediate from His6-PlpA3-Fx-4A5 insertion mutant. Green check marks indicate sequence conversion. Red X’s indicate minimal or no conversion. WT, wild-type core sequence.

Two strategies to genetically incorporate reactive carbonyls use evolved tRNA–aminoacyl tRNA synthetase pairs (37) and the aldehyde chemical tag (38). To test the potential of PlpX for orthogonal labeling, the His6-PlpA3-Fx ketoamide was converted in vitro to the corresponding oxime in the presence of methoxyamine (fig. S55). In addition, the ketoamide was conjugated to fluorescein-5-thiosemicarbazide (12) under mild conditions to give the corresponding fluorescent product 13 (Fig. 4A and fig. S56).
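
The mass change expected for each conjugate follows from condensation stoichiometry: the keto carbonyl reacts with an H2N-R reagent and releases one water. The sketch below assumes a single condensation at the keto group and monoisotopic masses; the formula C21H15N3O5S for fluorescein-5-thiosemicarbazide is an assumption made for this illustration, and the helper names are hypothetical.

```python
import re

# Monoisotopic element masses (Da).
MONO = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "S": 31.97207117,
}

def formula_mass(formula: str) -> float:
    """Monoisotopic mass of a flat formula such as 'CH5NO'."""
    return sum(
        MONO[el] * (int(n) if n else 1)
        for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    )

def condensation_shift(reagent: str) -> float:
    """Mass change when a ketone condenses with an H2N-R reagent and releases H2O."""
    return formula_mass(reagent) - formula_mass("H2O")

# Methoxyamine, CH3-O-NH2, gives the O-methyl oxime.
print(f"methoxyamine oxime: {condensation_shift('CH5NO'):+.4f} Da")
# Fluorescein-5-thiosemicarbazide; the formula C21H15N3O5S is assumed here.
print(f"fluorescein thiosemicarbazone: {condensation_shift('C21H15N3O5S'):+.4f} Da")
```

Other amine-type probes can be checked the same way by passing their molecular formula to `condensation_shift`.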

Fig. 4 Applications for tyramine splicing.

(A) Conjugation to a fluorescent probe (fluorescein-5-thiosemicarbazide 12) to give thiosemicarbazone 13. WT, unmodified His6-PlpA3-Fx; Keto, same precursor modified by PlpXY. DMF, N,N-dimethylformamide; TFA, trifluoroacetic acid. (B) Synthetic hepatitis C virus protease inhibitors. CVS 4453 (14) was a lead compound used to generate boceprevir 15. (C) Introduction of a genetically encoded protease inhibitor sequence into His6-PlpA3-Fx. Tyr is located at the position where the keto group is to be introduced. Coproduction of precursor 16 with PlpXY gave the corresponding ketoamide 17 (fig. S57).

Natural (nonribosomal; fig. S20) and synthetic products containing α-ketoamide moieties show diverse bioactivities (39). Of particular medical importance is the inhibition of cysteine and serine proteases through reversible covalent binding of the active-site nucleophile to the keto group. Industrial screening of synthetic ketoamide peptide libraries identified CVS 4453 (14) as a lead for hepatitis C virus protease inhibitors. Further optimization provided boceprevir 15 (Fig. 4B), an approved drug for the treatment of chronic hepatitis C infections (40–42). We generated a core related to 14 by replacing the first five core residues of His6-PlpA3-Fx to give engineered precursor 16. This precursor contains six non-native residues N-terminally flanking the YG motif. Coproduction with PlpXY provided the corresponding ketoamide 17 with a conversion comparable to that of His6-PlpA3-Fx (fig. S57). This result suggests the potential of using synthetic biology platforms to access and screen genetically encoded ketoamide pharmacophore libraries for drug discovery.
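
A motif-based ketoamide library of the kind suggested here could be enumerated by randomizing positions N-terminal to the splice site while keeping YG intact and drawing X from residues reported as accepted. The sketch below is entirely hypothetical: the template "A**T", the suffix "W", and the function name are invented, and each variant would still need experimental confirmation, as was done for precursor 16.

```python
from itertools import product
from typing import Iterator

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# X residues reported as converted by PlpXY in the His6-PlpA3-Fx context.
ACCEPTED_X = "MGVLSAPQ"

def design_library(template: str, suffix: str) -> Iterator[str]:
    """Yield hypothetical cores of the form <template> + X + 'YG' + <suffix>.

    '*' in the template is randomized over all 20 amino acids, X is drawn
    from ACCEPTED_X, and variants containing a second YG are skipped so that
    each core carries exactly one intended splice site.
    """
    slots = [AMINO_ACIDS if c == "*" else c for c in template]
    for prefix in product(*slots):
        for x in ACCEPTED_X:
            core = "".join(prefix) + x + "YG" + suffix
            if core.count("YG") == 1:
                yield core

library = list(design_library("A**T", "W"))
print(len(library), library[0], library[-1])
```

Excluding variants with an extra YG is a design choice: since a single splicase processed several sites in PcpA, an accidental second motif could lead to unintended additional splicing.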

We demonstrate a naturally occurring splicing reaction that introduces diverse and multiple α-keto-β-amino acids into proteins. This work opens multiple avenues for applications in chemistry and biology and highlights how bioinformatic analyses can streamline experiments to discover pathways and transformations that change the way we view fundamental biology.

Supplementary Materials

Materials and Methods

Supplementary Text

Figs. S1 to S57

Tables S1 to S6

References (43–48)


Acknowledgments: J.P. thanks the Swiss National Science Foundation (31003A_146992/1) and the European Union (SYNPEPTIDE and European Research Council Advanced Grant “SynPlex”) for support. S.S. is grateful for financial support by the Helmut Horten Foundation. B.I.M. was the recipient of an Alexander von Humboldt Fellowship. A.L.V. was the recipient of an ETH Postdoctoral Fellowship. We thank D. Seebach and A. Eschenmoser for discussion and J. Keller, A. Geers, and C. Klaus for assistance with protein purification. M.G. thanks the P2M team of the Institut Pasteur for the novel cyanobacterial genomes and the Institut Pasteur for funding the Pasteur Culture Collection of Cyanobacteria. All data are contained in the main text and supplementary materials. New sequence data are archived in GenBank under accession numbers MG373765 to MG373783. B.I.M. and J.P. are inventors on patent application #EP17150498.8 submitted by the ETH Zurich that covers the use of radical S-adenosyl methionine enzymes for introducing α-keto-β-3-amino acids into (poly)peptides.
