Small molecules from the human microbiota

Science 24 Jul 2015:
Vol. 349, Issue 6246, 1254766
DOI: 10.1126/science.1254766

Microbial bioactive molecules

Human cells are outnumbered by the microbial cells of our commensals by an order of magnitude. All of these organisms are metabolically active and secrete multiple bioactive molecules. Genomics has unveiled a remarkable array of biosynthetic gene clusters in the human microbiota, which encode diverse metabolites. Donia et al. review how molecules ranging from lantibiotics and microcins to indoxyl sulfate and immunomodulatory oligosaccharides and lipids could affect the health and physiology of the whole organism, depending on the composition of an individual's microbial community.

Science, this issue p. 10.1126/science.1254766

Structured Abstract

Background

Two developments in distinct fields are converging to create interest in discovering small molecules from the human microbiome. First, the use of genomics to guide natural product discovery has led to the unexpected discovery of numerous biosynthetic gene clusters in genomes of the human microbiota. Second, the microbiome research community is moving from a focus on “who’s there?” to “what are they doing?” with an accompanying emphasis on understanding microbiota-host interactions at the level of molecular mechanism. This merger has sparked a concerted hunt for the mediators of microbe-host and microbe-microbe interactions, including microbiota-derived small molecules.

Advances

Numerous small molecules produced by the human microbiota are already known. The microbiota-derived ribosomally synthesized, posttranslationally modified peptides (RiPPs) include widely distributed lantibiotics and microcins; these molecules have narrow-spectrum activity and are presumptive mediators of interactions among closely related species. Another notable RiPP is Escherichia coli heat-stable enterotoxin, a guanylate cyclase 2C agonist from which the recently approved gastrointestinal motility drug linaclotide was derived. Fewer amino acid metabolites are synthesized by the microbiota, but they are produced at very high levels that vary widely among individuals (e.g., indoxyl sulfate at 10 to 200 mg/day). Gut bacterial species convert common dietary amino acids into distinct end products, such as tryptophan to indoxyl sulfate, indolepropionic acid, and tryptamine, indicating that humans with the same diet but different gut colonists can have widely varying gut metabolic profiles. Microbially produced oligosaccharides differ from other natural products because they are cell-associated (i.e., nondiffusible) and because many more biosynthetic loci exist for them than for other small-molecule classes. Well-characterized examples, such as Bacteroides polysaccharide A, show that oligosaccharides may not simply play a structural role or mediate adhesion; rather, they can be involved in highly specific ligand-receptor interactions that result in immune modulation. Similarly, the (glyco)lipids α-galactosylceramide and mycolic acid can play roles in immune signaling. The most prominent microbiota-derived terpenoids are microbial conversion products of cholic acid and chenodeoxycholic acid from host bile. These secondary bile acids can reach millimolar concentrations in the gut and vary widely in composition among individuals.
Several canonical virulence factors from pathogens are derived from nonribosomal peptides (NRPs) and polyketides (PKs), but less is known about NRPs and PKs from the commensal microbiota. A recent computational effort has identified ~14,000 biosynthetic gene clusters in sequenced genomes from the human microbiota, 3118 of which were present in one or more of the 752 metagenomic sequence samples from the NIH Human Microbiome Project. Nearly all of the gene clusters that were present in >10% of the samples from the body site of origin are uncharacterized, highlighting the potential for identifying the molecules they encode and studying their biological activities.

Outlook

There are two central challenges facing the field. The first is to distinguish, from among thousands of microbiota-derived molecules, which ones drive a key phenotype at physiologically relevant concentrations. The second is to determine which experimental systems are appropriate for testing the activity of an individual molecule from a complex milieu. Meeting these challenges will require new computational and experimental technologies, including the capacity to identify biosynthetic genes and to predict the structures of their products and the targets of their biological activities, and systems in which germ-free mice are colonized by mock communities that differ only by the presence or absence of a single biosynthetic gene cluster.

Small-molecule–mediated microbe-host and microbe-microbe interactions.

Commensal organisms of the human microbiota produce many diverse small molecules with an equally diverse array of targets that can exacerbate or modulate immune responses and other physiological functions in the host. Several act as antibacterials to remove competing organisms, but many other products have unknown targets and effects on commensals and the host.


Developments in the use of genomics to guide natural product discovery and a recent emphasis on understanding the molecular mechanisms of microbiota-host interactions have converged on the discovery of small molecules from the human microbiome. Here, we review what is known about small molecules produced by the human microbiota. Numerous molecules representing each of the major metabolite classes have been found that have a variety of biological activities, including immune modulation and antibiosis. We discuss technologies that will affect how microbiota-derived molecules are discovered in the future and consider the challenges inherent in finding specific molecules that are critical for driving microbe-host and microbe-microbe interactions and understanding their biological relevance.

Symbiotic relationships, including mutualism, commensalism, and parasitism, are ubiquitous in nature (1). Some of the best-known symbioses are between microorganisms and multicellular hosts; in these interkingdom relationships, the fitness of the microbe-host system (the holobiont) often relies on a diverse set of molecular interactions between the symbiotic partners (2, 3). Examples include food digestion, nitrogen and carbon fixation, oxidation and reduction of inorganic molecules, and the synthesis of essential amino acids and cofactors (2, 4–6). In light of the critical role of a molecular dialog in maintaining a productive mutualism, the community of researchers studying the symbiosis between humans and their microbiota has begun moving from a focus on “who’s there?” to “what are they doing?” The accompanying emphasis on molecular mechanism has sparked a concerted hunt for the mediators of microbe-host interactions, including microbiota-derived small molecules.

It is now possible to identify biosynthetic genes in bacterial genome sequences and, in some cases, to predict the chemical structures of their small-molecule products. This genome mining has led to the discovery of a growing number of molecules, and recently developed algorithms (7–9) have not only automated biosynthetic gene cluster identification but also led to the unexpected discovery of numerous biosynthetic gene clusters in genomes of the human microbiota (10). In addition, a wealth of natural products has been discovered from bacterial and fungal symbionts of insects, nematodes, sponges and ascidians, and plants (11–15). The many known examples of microbe-host mutualisms in which the microbe synthesizes a metabolite important for the ecology of the pair raise an intriguing question: To what extent are mammals, including humans, a part of this paradigm?

We review what is known about small molecules from the human microbiota, examining in depth the diverse chemistries and biological functions of these molecules. Although our focus is predominantly on commensal bacterial species, we include a few notable examples of small molecules from bacterial pathogens. We also discuss recent insights into the metabolic potential of the human microbiota from computational analyses and conclude by considering approaches used to identify and discover the function of microbial molecules within a complex milieu. We have excluded some prominent microbiota-derived metabolite classes, including short-chain fatty acids (SCFAs) and trimethylamine-N-oxide, because their role in microbe-host interactions in the gut has been explored recently by Lee and Hase (16).

A wide range of small molecules has been isolated from human-associated bacteria (Fig. 1). These molecules cover the entire spectrum of chemical classes discovered so far from terrestrial and aquatic bacterial species and include well-characterized mediators of microbe-host and microbe-microbe interactions. Exploring their chemistry and function provides an entry point for understanding the effects of the microbiota on human health and disease.

Fig. 1 Structurally diverse small molecules from the human microbiota.

The diversity of chemical classes produced by the human microbiota rivals that of microorganisms from any ecological niche. Representative molecules are shown for each of the major molecular classes discussed: the RiPPs lactocillin and linaclotide; the amino acid metabolites indolepropionic acid and tryptamine; the oligosaccharide polysaccharide A; the lipids/glycolipids mycolic acid and α-galactosylceramide; the terpenoid deoxycholic acid, in which carbons 3, 7, and 12 of the bile acid scaffold are labeled; the NRPs corynebactin, tilivalline, and mutanobactin; and the PK mycolactone.

Ribosomally synthesized, posttranslationally modified peptides (RiPPs)

Most human-associated bacteria live in complex communities and compete with other species for resources. Bacteria secrete several classes of natural products to mediate these competitive and social interactions, including ribosomally synthesized, posttranslationally modified peptides (RiPPs). Diverse RiPPs are produced by many organisms in the microbiota (10); they are often toxic to a limited set of species closely related to the producer and are likely to help determine niche colonization. RiPPs are divided into numerous subclasses (17), five of which include members that have been isolated from human-associated bacteria: lantibiotics, bacteriocins, microcins, thiazole/oxazole-modified microcins (TOMMs), and thiopeptides.

Lantibiotics and bacteriocins

Lantibiotics and bacteriocins are the most commonly isolated RiPPs from the human microbiota, and dozens have been found to date. Lantibiotics are short peptides of <40 amino acids with chemical cross-links formed posttranslationally between the side-chain thiol of a cysteine residue and a dehydrated serine or threonine. The resulting “lanthionine” contains a thioether bond, which is typically more redox-stable than a disulfide. In contrast, bacteriocins are longer peptides that are usually unmodified. Microbiota-derived lantibiotics are predominantly produced by the Firmicutes and are usually active against a narrow spectrum of Gram-positive bacteria that are closely related to the producing strain.

Some microbiota-derived lantibiotics are synthesized by commensals (18), including the salivaricins from the oral resident Streptococcus salivarius (19–22), a cocktail of five lantibiotics from the skin commensal Staphylococcus epidermidis (23–27), and ruminococcin A from the gut commensals Ruminococcus gnavus and Clostridium nexile (Table 1) (28, 29). Each of these molecules inhibits the growth of pathogens that are closely related to the producer. Lantibiotics have also been isolated from human pathogens: Staphylococcin Au-26 (also known as Bsa) from Staphylococcus aureus (30, 31), SA-FF22 from Streptococcus pyogenes (32, 33), and the two-component lantibiotic cytolysin from Enterococcus faecalis (34) exert antibacterial activity against a range of common human commensals. Hence, both commensals and pathogens use lantibiotics to compete and to establish resilient colonization.

Table 1 Selected small molecules from the human microbiota.

A representative set of compounds is shown that covers the chemical classes discussed in the review. Asterisks indicate bacterial pathogens that are not normally present in human-associated communities.

Microcins and TOMMs

Microcins are prototypical narrow-spectrum antibacterials displaying a wide range of unusual posttranslational modifications, including conversion of cysteine and serine residues to thiazoles and oxazoles (microcin B17), addition of adenosine monophosphate (microcin C7) or a siderophore to the C terminus (microcin E492; Fig. 2), and internal amide cross-linking to form a lasso-like topology (microcin J25) (35–38). Because they derive exclusively from enterobacteria and have potent antibacterial activity against close relatives of the producer (35), the role of microcins in the Gram-negative microbiota is analogous to that of lantibiotics in the Gram-positive microbiota. Most microcins have been isolated from Escherichia coli strains and are widely distributed in both commensal and pathogenic enterobacteria (35, 39–41).

Fig. 2 Small-molecule–mediated microbe-host and microbe-microbe interactions.

The microbiota produces a range of small molecules from various classes with distinct targets. Four examples are shown: the NRP tilivalline, whose host target is unknown; the ribosomally synthesized and posttranslationally modified peptide microcin E492 (MccE492), a narrow-spectrum antibacterial; lipid A, the glycolipid core of LPS, which targets TLR4 in host immune cells; and indolepropionic acid, a reductive metabolite of tryptophan that enters host circulation but whose biological activity is poorly understood. These metabolites are each produced by different species of the microbiota but are shown here in a single cell for schematic purposes. The following are abbreviations for domains in the NRPS that produces tilivalline: A, adenylation domain; T, thiolation domain; C, condensation domain; R, terminal reductase domain. ACP, acyl carrier protein; UDP-GlcNAc, uridine 5′-diphosphate N-acetylglucosamine.

TOMMs are similar to microcin B17 in their biosynthesis and posttranslational modifications but encompass a larger family of natural products generated by both Gram-positive and Gram-negative bacteria (17, 42). The best-studied example is streptolysin S from the human pathogen S. pyogenes (43). Despite intensive efforts for almost a century, the precise chemical structure and mechanism of action of streptolysin S have not been fully determined (43). However, streptolysin S contains multiple oxazole and thiazole residues that are required for its hemolytic activity (44, 45). Related biosynthetic gene clusters from other human pathogens and commensals have been characterized (43), including listeriolysin S from Listeria monocytogenes (46) and clostridiolysin S from Clostridium botulinum and Clostridium sporogenes (47).

Heat-stable enterotoxin

Although most RiPPs from the human microbiota are thought to mediate microbe-microbe interactions, heat-stable enterotoxin is a RiPP produced by strains of E. coli associated with diarrheal disease and has a well-characterized host target. It is a 14–amino acid peptide stabilized by three internal disulfide bonds (48) and mimics the effect of the host peptide hormones guanylin and uroguanylin by agonizing guanylate cyclase 2C, a transmembrane protein expressed in intestinal epithelial cells with an extracellular ligand-binding domain and a cytoplasmic catalytic domain (49). Guanylate cyclase 2C generates cyclic guanosine monophosphate to stimulate electrolyte secretion into the gut lumen. A single–amino acid variant of heat-stable enterotoxin, linaclotide, was approved by the Food and Drug Administration in 2012 for the treatment of irritable bowel syndrome with constipation (Fig. 1) (50). The enzyme that introduces the disulfide cross-links posttranslationally into heat-stable enterotoxin is not encoded in the toxin biosynthetic gene cluster, raising two questions: whether the producer's endogenous disulfide bond formation machinery carries out this step, and whether other small peptides from enterobacteria undergo similar posttranslational processing. Importantly, a modified heat-stable enterotoxin peptide can survive the proteolytic milieu of the gut lumen and target a host receptor expressed in intestinal epithelial cells, delivering potent biological activity without absorption into host circulation.

Products of amino acid metabolism

Gut bacteria living in an anaerobic environment require either an electron acceptor to drive fermentation or an anaerobic electron transport chain (51). Many bacteria use amino acids as electron acceptors, so the gut microbiota produces high levels (sometimes exceeding 100 mg/day) of reductive amino acid metabolites such as phenylpropionic acid and phenylacetic acid, molecules that are not found in most other habitats. Importantly, the amounts of these metabolites produced can vary widely among individuals, and, unlike RiPPs, they are generally membrane-permeable and accumulate systemically in the host (52). For example, humans with comparable levels of dietary tryptophan but distinct gut bacterial communities can have markedly different profiles of gut metabolites. One prominent tryptophan metabolite, indole, is derived from tryptophan by as-yet-unidentified enzyme(s) that are presumably homologs of the tryptophanases found in other bacterial species. In its unmodified form, indole serves as a signaling agent in bacterial communities (53). In addition, following absorption through the intestinal epithelium, indole is 3-hydroxylated and O-sulfated in the liver to become indoxyl sulfate, a well-known uremic toxin that germ-free rodent studies have shown to be derived entirely from the gut microbiota (54). Indoxyl sulfate is excreted in human urine at widely varying rates (10 to 200 mg/day), likely reflecting differences among individuals in diet and in the level of indole-producing bacterial species in the gut community (55). A second reductive tryptophan metabolite, indolepropionic acid (Fig. 2), is found in mouse serum if C. sporogenes is present in the gut (52). Although the function of this molecule is unknown, several producers, including C. sporogenes, have been identified.
A third tryptophan metabolite, the decarboxylation product tryptamine, which can act as a biogenic amine neurotransmitter, is synthesized by a variety of gut bacteria (56) and has been linked to signaling in the enteric nervous system (57), one of several findings that have revealed a role for the microbiota in the gut-brain axis (58–61). Thus, tryptophan can be diverted to end products with distinct biological activities depending on the composition of the gut community.

The metabolic products of aliphatic amino acids are equally prominent but less well characterized. Notable examples include δ-aminovaleric acid, which derives from arginine, proline, and ornithine and acts as an electron source for secondary fermenters (62), and α-aminobutyric acid, which derives from threonine or methionine. In addition, the neurotransmitter γ-aminobutyric acid (GABA), the decarboxylation product of glutamate, is both produced and consumed by various species of gut bacteria (62), although its potential role in microbe-host signaling remains unexplored. Many of the less common SCFAs, including isobutyric, valeric, 2- and 3-methylbutyric, caproic, and isocaproic acids, are also products of reductive amino acid metabolism, but it is not known whether their signaling properties differ from those of the better-known SCFAs (62).

Oligosaccharides

Oligosaccharides provide some of the best-characterized examples of how small molecules from the human microbiota can mediate microbe-host interactions. Diffusible oligosaccharides are well known in the natural products community (63, 64), but the best-studied oligosaccharides from the human microbiota are cell-associated. Capsular polysaccharides from Bacteroides and Streptococcus are not simply structural or nonspecifically adhesive; like glycolipids (see below), they can engage in highly specific ligand-receptor interactions that result in immune modulation.

Species of Bacteroides, the most abundant bacterial genus in the human gut, produce an array of capsular polysaccharides, the best characterized of which is polysaccharide A from Bacteroides fragilis. Polysaccharide A is an oligomer whose tetrasaccharide repeating unit consists of four derivatives of galactose: galactofuranose, N-acetylgalactosamine (GalNAc), 4,6-pyruvoylgalactose, and 4-amino-6-deoxy-GalNAc. Although the biosynthesis of polysaccharide A has only begun to be explored (65, 66), its biological activity has been investigated in detail. Polysaccharide A signals to the host’s innate immune system through Toll-like receptor 2 (TLR2), leading to the induction of regulatory T cells that produce the tolerogenic cytokine interleukin-10 (IL-10). This signaling event restricts the activity of T helper 17 (TH17) cells and not only promotes B. fragilis colonization but also suppresses Helicobacter hepaticus–induced colitis (67, 68). Remarkably little is known about the structures and biological activities of the tens to hundreds of other Bacteroides capsular polysaccharides, although they are likely to be the most abundant small molecules in the human gut (10).

Cell-associated oligosaccharides can also play a defensive role, as is the case with the richly diverse capsular polysaccharides elaborated by species of Streptococcus, including a family of capsular polysaccharides from group B Streptococcus, a pathogen (69). Although the repeating unit varies in size, composition, and connectivity among group B Streptococcus serotypes, a common chemical feature is a terminal sialic acid, which blocks phagocytosis by inhibiting the deposition of the complement component C3b (69).

An area of great therapeutic promise is the identification of microbially derived ligands for host receptors. In addition to recent studies on the role of SCFAs and G protein–coupled receptor 43 (GPR43) in regulatory T cell function (70), glycolipids and saccharides are significant candidate ligands. One example is muramyl dipeptide, a glycopeptide fragment of the peptidoglycan repeating unit, which is a ligand for nucleotide-binding oligomerization domain–containing protein 2 (NOD2) and forms the scaffold for the immunostimulatory osteosarcoma drug mifamurtide (71, 72). Additionally, the large numbers of uncharacterized oligosaccharide biosynthetic loci in the human microbiome are particularly interesting in light of the C-type lectin receptors in the dectin/langerin/DC-SIGN (dendritic cell–specific C-type lectin) family, which are known to bind oligosaccharides and modulate immune-cell function, but for which few convincing ligands have been discovered (73).

Glycolipids and terpenoids

Perhaps the best-known microbiota-derived molecule is lipopolysaccharide (LPS), a glycolipid that is a major component of the outer membrane of Gram-negative bacteria (Fig. 2). LPS is the ligand for the innate immune receptor TLR4 and has been reviewed extensively elsewhere (74); here, we focus on two other families of bacterial glycolipids with similar immunomodulatory activities: the Bacteroides glycosphingolipid α-galactosylceramide and the mycolic acids of Mycobacterium and Corynebacterium.


α-Galactosylceramide is a glycosphingolipid that was originally discovered nearly two decades ago as a natural product from a sponge and was later found to be a potent ligand for CD1d-restricted natural killer T (NKT) cells (75). Even though >1000 papers have been published on synthetic derivatives of this sphingolipid and the identification and stimulation of NKT cells, the source of the “native” ligand for CD1d has remained a mystery (76). Recently, B. fragilis, a common gut commensal, was discovered to produce α-galactosylceramide (77). Colonization of germ-free mice as neonates by wild-type B. fragilis suppresses NKT cells in the gut and blocks oxazolone-induced colitis (78). These findings suggest that the “native” ligand for the highly conserved mammalian receptor CD1d might originate in the microbiota rather than the host. This may be true for other “orphan” receptors expressed by immune and epithelial cells.

Mycolic acids are distinctive components of the cell wall of Mycobacterium (mostly pathogens) and Corynebacterium (both pathogens and skin and oral commensals). They consist of an all-carbon backbone with two lipid tails, one of which can be 40 to 60 carbons long in Mycobacterium, and occur both as free carboxylic acids and as esters of cell-wall polysaccharides. In addition to serving as important structural components of the capsule (79), mycolic acids are incorporated into immunologically active glycolipids: glucose monomycolate serves as a ligand for CD1b-restricted T cells, eliciting a specific immune response against infection (80–83), and trehalose-6,6′-dimycolate is a potent immune elicitor that binds to the C-type lectin Mincle to induce macrophage activation and a T cell response characteristic of vaccination (84). Mycolic acids carry many chemical modifications, including methylation and cyclopropanation, both of which appear to shield them from immune detection: The MmaA4-catalyzed methylation of mycolic acid in Mycobacterium tuberculosis blocks IL-12 production and prevents detection and elimination by macrophages (85), and an M. tuberculosis mutant that is deficient in mycolic acid cyclopropanation is attenuated and hyperinflammatory in a mouse model of infection (86).


The major microbiota-derived terpenoids are not synthesized de novo by the microbiota; they are secondary bile acids derived from the host’s primary bile acids cholic acid (CA) and chenodeoxycholic acid (CDCA) (Fig. 1). CA and CDCA are biosynthesized in the human liver, conjugated to taurine or glycine, and then excreted in bile; although 90% of the bile acid pool is absorbed in the terminal ileum, the remaining 10% enters the large intestine (87). There, the bile acid pool reaches concentrations of ~1 mM and varies widely in composition among healthy humans. Numerous biochemical transformations of CA and CDCA are performed by gut bacteria, including deconjugation from taurine and glycine; oxidation and subsequent epimerization of the hydroxyl groups at C3, C7, and C12; removal of the hydroxyl group at C7 by dehydration and reduction; and esterification with ethanol at the C24 carboxylate. Among these, dehydroxylation at C7 has been characterized most extensively (87–89).

Several species of Firmicutes (e.g., Clostridium scindens and Clostridium hylemonae) dehydroxylate CA and CDCA at the C7 position to form deoxycholic acid (DCA) and lithocholic acid (LCA), respectively. The high flux of this biochemical transformation results in DCA and LCA making up nearly two-thirds of the fecal bile acid pool (87). Both DCA and LCA are toxic to human cells and have been implicated in hepatotoxicity and colon cancer (90). 7-Dehydroxylation is carried out in part by the eight-gene bai operon. This gene cluster is thought to perform eight successive chemical transformations in a pathway for which the early oxidative steps have been characterized biochemically, but the later, reductive steps remain speculative (89). Little is known about the biosynthetic genes for other secondary bile acids, although there is preliminary evidence that some bile acid pathways might involve transformations by more than one gut bacterial species.

Carotenoids are another class of terpenoids, exemplified by staphyloxanthin, the golden-colored pigment for which S. aureus is named. Staphyloxanthin is composed of a glucose residue that is esterified with both a fatty acid (12-methyltetradecanoate) at the C6″ position and a carotenoid (4,4′-diaponeurosporen-4-oate) at the C1″ position (91). The core structure of staphyloxanthin is assembled by two biosynthetic enzymes: a glycosyltransferase, which esterifies the C1″ position of glucose, and an acyltransferase, which esterifies its C6″ position. The unusual 4,4′-diaponeurosporen-4-oate originates from dehydrosqualene by further dehydrogenation and oxidation steps (92, 93). The conjugated double bonds of staphyloxanthin’s carotenoid tail serve as a “sponge” for oxygen radicals, protecting S. aureus against killing by hydrogen peroxide, superoxide, and hydroxyl radicals, which are produced by host neutrophils and macrophages (94, 95).

Polyketides and nonribosomal peptides

Although polyketides (PKs) and nonribosomal peptides (NRPs) are among the largest classes of natural products in soil and aquatic bacteria, relatively few are known from human-associated bacteria. Two recently discovered examples come from the common pathobionts S. aureus and Streptococcus mutans (Table 1 and Fig. 1). A conserved NRP gene cluster in S. aureus encodes a family of pyrazinones, which are derivatives of the ubiquitous diketopiperazines (96, 97). Although the role of the pyrazinones in regulating the expression of S. aureus virulence factors remains unclear (96, 98), they are unlikely to function exclusively in the context of pathogenesis because they are also produced by the skin commensal S. epidermidis (97). Isolates of S. mutans, the leading cause of dental caries, harbor a genomic island encoding hybrid PK synthase (PKS)/NRP synthetase (NRPS) pathways (99). The product of one of these pathways, mutanobactin, contains an unusual 1,4-thiazepan-5-one ring, which originates from the cyclization of a cysteine and a glycine residue. The biological activity of the mutanobactins has not been fully determined but may involve modulating growth and biofilm formation by the fungal pathogen Candida albicans (100–102).

Four NRPs and PKs from pathogens and pathobionts are implicated in disease: cereulide, mycolactone, colibactin, and tilivalline. Cereulide, a dodecadepsipeptide toxin, is responsible for the emetic effects of the food-poisoning pathogen Bacillus cereus (103, 104). The ester bonds in cereulide’s alternating ester/amide backbone give the molecule a high affinity for potassium ions, which results in uncoupling of oxidative phosphorylation and causes mitochondrial toxicity (105, 106).

Mycolactones are PK toxins produced by the causative agent of Buruli ulcer, Mycobacterium ulcerans (107, 108). These molecules, which cause the necrosis, ulceration, and immune suppression associated with this disease, are encoded by a >100-kb type I PKS biosynthetic gene cluster that includes two genes >40 kb. Interestingly, a heterogeneous suite of mycolactone derivatives is produced by different strains of M. ulcerans, which may explain the variation observed in the virulence of the strains and their biogeography (107, 109–112).

Colibactin is produced by a subset of enterobacteria, including strains of E. coli phylogroup B2, Enterobacter aerogenes, Klebsiella pneumoniae, and Citrobacter koseri (113, 114). Exposure of mammalian cells to colibactin-producing E. coli and K. pneumoniae induces DNA damage in vitro and in vivo. Surprisingly, the colibactin gene cluster occurs in one of the most commonly used probiotic E. coli strains (E. coli Nissle 1917, or EcN) (114–118). Considerable efforts have been made to study the biosynthesis of colibactin (119–121), and its chemical structure has recently been partially characterized, revealing a unique spirocyclopropane “warhead” that cross-links DNA (122). It is not yet clear what role colibactin plays in the ecology of the interaction between E. coli and the host or how the genotoxic activity of colibactin benefits its producer.

Tilivalline is an NRP toxin produced by colitogenic strains of the pathobiont Klebsiella oxytoca (Fig. 2) (123). Importantly, tilivalline is essential for the inflammatory pathology characteristic of antibiotic-associated hemorrhagic colitis and induces apoptosis in cultured human epithelial cells. Although the discovery of tilivalline sheds light on one mechanism of antibiotic-induced colitis, there are likely to be alternative mechanisms, because K. oxytoca is present in, at most, 10% of the healthy human population (124).

Much is known about the biosynthesis of NRP-derived and NRP-independent siderophores in a broad range of human pathogens and their role in iron acquisition as an essential component of bacterial pathogenesis, both of which have been reviewed extensively elsewhere (125–127). In contrast, very little is known about the mechanisms by which commensals acquire iron, and, to our knowledge, no iron acquisition system has ever been shown to be required for colonization by a commensal.


We have used ClusterFinder (7) to identify biosynthetic gene clusters in the human microbiome as a way to assess the metabolic potential of the human microbiota (10). Of >14,000 putative small-molecule biosynthetic gene clusters identified in human-associated bacterial genomes, 3118 were present in one or more of the 752 whole-genome shotgun metagenomic sequence samples from the NIH Human Microbiome Project (HMP). Although each of the major natural product classes is produced by the human microbiota, oligosaccharide and RiPP gene clusters predominate, underscoring the need to improve analytical chemical techniques to purify and assay these molecules. Nearly all of the gene clusters that were present in over 10% of the subjects in the study are uncharacterized.
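Counts such as "present in one or more of the 752 samples" or "present in over 10% of the subjects" rest on a presence/absence call for each cluster in each metagenomic sample. The sketch below shows one simple way such a prevalence table could be computed. It assumes reads have already been mapped to each cluster and summarized as breadth of coverage (fraction of the cluster covered); the 0.5 breadth cutoff and the function names are hypothetical and are not the criteria used in the HMP analysis.

```python
from typing import Dict, Iterable, List


def cluster_prevalence(
    coverage: Dict[str, Dict[str, float]],
    samples: Iterable[str],
    min_breadth: float = 0.5,
) -> Dict[str, float]:
    """Fraction of samples in which each gene cluster is called present.

    coverage: cluster ID -> {sample ID -> breadth of read coverage, 0..1}.
    samples:  every sample in the study, including those with no hits.
    A cluster counts as present when breadth >= min_breadth; this cutoff
    is a hypothetical choice for illustration.
    """
    sample_list = list(samples)
    n = len(sample_list)
    prevalence = {}
    for cluster, per_sample in coverage.items():
        # Samples missing from the table are treated as zero coverage.
        hits = sum(1 for s in sample_list if per_sample.get(s, 0.0) >= min_breadth)
        prevalence[cluster] = hits / n if n else 0.0
    return prevalence


def widespread(prevalence: Dict[str, float], cutoff: float = 0.10) -> List[str]:
    """Clusters present in more than `cutoff` of samples, most prevalent first."""
    return sorted(
        (c for c, p in prevalence.items() if p > cutoff),
        key=lambda c: prevalence[c],
        reverse=True,
    )
```

Passing the full sample list explicitly matters: a sample in which no cluster was detected still counts toward the denominator.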

There are two central challenges facing the field: First, from the wealth of microbiota-derived molecules, which ones are the functionally “important” ones? Second, what experimental systems are appropriate for testing the activity of an individual molecule from a complex milieu?

Identifying significant microbiota-derived molecules

Human-associated microbial communities can consist of hundreds of abundant bacterial species and perhaps thousands of molecules at physiologically relevant concentrations. Figuring out which of these molecules drive a phenotype and how they act requires new computational and experimental approaches.

Initial mapping of metagenomic sequence data onto KEGG (Kyoto Encyclopedia of Genes and Genomes) or COG (Clusters of Orthologous Groups) gene categories provides a way of seeing coarse changes, for example, a shift from oligosaccharide toward amino acid catabolism in a community. However, the resolution of this mapping is not yet high enough to make reliable predictions about specific biosynthetic pathways or products. Methods that predict the gene content of a sample from 16S rRNA gene sequencing data can predict pathways that are present or absent in every member of an operational taxonomic unit (128) but have limited utility for biosynthetic pathways, which are highly variable even among closely related strains of a bacterial species (129, 130).
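The limitation of 16S-based prediction follows from how such methods work in principle (PICRUSt is a well-known example). Each operational taxonomic unit (OTU) is assigned a reference genome, its read count is corrected for 16S copy number, and every gene family in that reference genome is credited to the sample. The sketch below is a simplified illustration of that idea, not any published tool's implementation; all identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict


def predict_gene_content(
    otu_counts: Dict[str, float],
    copies_16s: Dict[str, int],
    ref_genes: Dict[str, Dict[str, int]],
) -> Dict[str, float]:
    """Predict gene-family abundances for one sample from a 16S OTU profile.

    otu_counts: OTU ID -> 16S read count in the sample.
    copies_16s: OTU ID -> 16S rRNA gene copies in the OTU's reference genome.
    ref_genes:  OTU ID -> {gene family -> copies in the reference genome}.
    """
    predicted: Dict[str, float] = defaultdict(float)
    for otu, count in otu_counts.items():
        # Divide by 16S copy number so read counts approximate cell counts.
        cells = count / copies_16s[otu]
        # Every cell in the OTU is assumed to carry the reference genome's genes.
        for family, copies in ref_genes.get(otu, {}).items():
            predicted[family] += cells * copies
    return dict(predicted)
```

If only some strains of an OTU carry a biosynthetic gene cluster, this scheme credits it to all of them (or to none, if the reference genome lacks it), which is why strain-variable biosynthetic pathways are poorly captured.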

Methods are needed that take genomic and metagenomic sequence data as input and use them to predict, at high resolution, pathways for specific molecules. Multiple algorithms will likely be needed: some for identifying clustered biosynthetic pathways, characteristic of conventional secondary metabolites, and others for predicting unclustered pathways more commonly found in primary metabolism. The dearth of knowledge about primary metabolic pathways in anaerobes from the gut community is a critical gap, and addressing it will be a major achievement.
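Clustered pathways are tractable computationally because their genes sit next to each other on the chromosome. The sketch below illustrates that idea with a simple proximity rule: genes already annotated as biosynthetic are merged into one candidate cluster whenever they lie within a fixed distance of each other. This is a deliberately simplified stand-in. ClusterFinder itself uses a hidden Markov model over protein-domain annotations, and the `biosynthetic` flag, `max_gap`, and `min_genes` values here are hypothetical.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    start: int          # genomic start coordinate (bp)
    end: int            # genomic end coordinate (bp)
    biosynthetic: bool  # hypothetical annotation, e.g., a biosynthetic domain hit


def call_clusters(
    genes: List[Gene], max_gap: int = 5000, min_genes: int = 3
) -> List[List[str]]:
    """Group biosynthetic genes lying within max_gap bp of each other.

    Returns lists of gene names; groups with fewer than min_genes members
    are discarded. Both thresholds are illustrative assumptions.
    """
    hits = sorted((g for g in genes if g.biosynthetic), key=lambda g: g.start)
    groups: List[List[Gene]] = []
    group_end = 0
    for g in hits:
        if groups and g.start - group_end <= max_gap:
            # Close enough to the current group: extend it.
            groups[-1].append(g)
            group_end = max(group_end, g.end)
        else:
            # Too far away: start a new candidate cluster.
            groups.append([g])
            group_end = g.end
    return [[g.name for g in grp] for grp in groups if len(grp) >= min_genes]
```

A rule like this can only find clustered pathways; the unclustered pathways of primary metabolism mentioned above would be missed entirely, which is why different algorithms are likely needed for the two cases.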

Nicholson, Holmes, and colleagues have pioneered the use of metabolomics to profile microbiota-derived metabolites in a variety of sample types and disease models, developing powerful and widely applicable analytical pipelines (131, 132) (Fig. 3A). Using similar approaches, a range of microbiota-derived molecules has been connected to specific bacterial species through metabolomic profiling of various artificial and disease-associated communities in mice (52, 59, 133, 134). Although metabolomic profiling can measure hundreds to thousands of known metabolites in a single run, applying untargeted metabolomics to the discovery of new molecules of interest is laborious and generally requires purification and structural characterization of milligram quantities of compound.
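The germ-free versus colonized comparison in Fig. 3A reduces, at its simplest, to filtering mass spectrometry features by how their intensity depends on colonization. The sketch below is a minimal filter of that kind, assuming peak tables have already been aligned across samples (feature ID to intensity). The detection limit, fold-change cutoff, and feature names are hypothetical; a real analysis would also apply statistical testing and correct for multiple comparisons.

```python
from typing import Dict, List


def microbiota_dependent(
    germ_free: List[Dict[str, float]],
    colonized: List[Dict[str, float]],
    detection_limit: float = 1e3,
    min_fold: float = 10.0,
) -> List[str]:
    """Flag features enriched in colonized relative to germ-free samples.

    Each sample maps feature ID (e.g., an m/z and retention-time pair) to
    peak intensity; missing features are treated as undetected (0). Both
    sample lists are assumed to be non-empty. A feature is flagged when it
    is above the detection limit in every colonized sample and its mean
    intensity is at least min_fold higher than in germ-free samples.
    """
    features = set().union(*germ_free, *colonized)
    flagged = []
    for f in sorted(features):
        col = [s.get(f, 0.0) for s in colonized]
        gf = [s.get(f, 0.0) for s in germ_free]
        if min(col) < detection_limit:
            continue  # not reliably detected in colonized animals
        mean_col = sum(col) / len(col)
        mean_gf = sum(gf) / len(gf)
        # Floor the germ-free mean at the detection limit to avoid dividing by zero.
        if mean_col / max(mean_gf, detection_limit) >= min_fold:
            flagged.append(f)
    return flagged
```

Flagged features are only candidates: as the text notes, establishing what they are still requires purification and structural characterization.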

Fig. 3 Approaches to discovering small molecules from the microbiota.

(A) Samples from germ-free and colonized mice can be analyzed by untargeted metabolomics to identify molecules that are present in a microbiota-dependent fashion. (B) A mouse harboring a reference gut community can be subjected to antibiotic treatment, a dietary shift, or another perturbation. Comparative metabolomics can be used to identify microbiota-derived molecules whose abundance changes as a consequence of the perturbation. (C) Candidate biosynthetic gene clusters (BGCs) or bacterial species can be selected by metagenomic profiling, for example, for gene clusters or species that are widely distributed or differ in abundance between cases and controls. Comparative metabolomics can then be used to identify molecules produced by a gene cluster or bacterial species of interest. (D) Subsets of bacteria from a fractionated complex community or designed synthetic communities can be used to colonize mice in order to identify specific bacterial species whose presence correlates with the production of a molecule of interest. m/z, mass-charge ratio.
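Features in these comparisons are reported by mass-charge ratio (m/z), and a common first annotation step is to ask whether an observed m/z matches a candidate ion within a tolerance expressed in parts per million (ppm). The sketch below computes the monoisotopic m/z of singly charged [M+H]+ or [M−H]− ions from an elemental formula and checks a match. The 5-ppm tolerance is an assumed, instrument-dependent choice; indoxyl sulfate (C8H7NO4S), discussed earlier in this review, serves only as a worked example.

```python
from typing import Dict

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "S": 31.97207100,
}
PROTON_MASS = 1.007276467  # Da


def monoisotopic_mass(formula: Dict[str, int]) -> float:
    """Neutral monoisotopic mass of a formula given as {element: count}."""
    return sum(MONOISOTOPIC_MASS[el] * n for el, n in formula.items())


def ion_mz(formula: Dict[str, int], mode: str) -> float:
    """m/z of the singly charged protonated ('+') or deprotonated ('-') ion."""
    mass = monoisotopic_mass(formula)
    if mode == "+":
        return mass + PROTON_MASS  # [M+H]+
    if mode == "-":
        return mass - PROTON_MASS  # [M-H]-
    raise ValueError("mode must be '+' or '-'")


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def matches(observed_mz: float, formula: Dict[str, int], mode: str,
            tol_ppm: float = 5.0) -> bool:
    """True if observed_mz lies within tol_ppm of the candidate ion."""
    return abs(ppm_error(observed_mz, ion_mz(formula, mode))) <= tol_ppm


INDOXYL_SULFATE = {"C": 8, "H": 7, "N": 1, "O": 4, "S": 1}
```

For indoxyl sulfate, the deprotonated ion falls near m/z 212.0023. A formula match narrows the candidates but does not identify a molecule, because isomers share the same exact mass.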

Although bioassay-guided fractionation is immensely powerful, it is painstaking and difficult to scale, so it is better suited to unusually important phenotypes, such as the search for ligands of orphan G protein-coupled receptors (GPCRs) that sense small molecules in the gut (Fig. 3B). A derivative of this approach, in which microbes rather than molecules are “fractionated,” was recently used to identify a cocktail of 17 gut bacterial strains that induce regulatory T cells and attenuate colitis. This activity was further correlated with SCFAs produced by this anti-inflammatory cocktail of bacteria (135, 136).
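The logic of bioassay-guided fractionation is iterative: split an active pool, assay each fraction, follow the most active one, and repeat until a single active component remains. The sketch below simulates that loop. The `assay` callback is hypothetical, and splitting a list into contiguous slices is only a stand-in for a chromatographic step. The same loop applies whether the pool holds molecules or bacterial strains.

```python
from typing import Callable, List, Optional, Sequence


def guided_fractionation(
    pool: Sequence[str],
    assay: Callable[[Sequence[str]], float],
    n_fractions: int = 4,
    threshold: float = 0.5,
) -> Optional[str]:
    """Repeatedly split an active pool and follow the most active fraction.

    pool:  components of the starting material (molecules or strains).
    assay: hypothetical bioassay returning an activity score for a subset.
    Returns the single component carrying the activity, or None if the
    activity is absent at the start or lost on splitting.
    """
    if n_fractions < 2:
        raise ValueError("need at least two fractions per round")
    current: List[str] = list(pool)
    if not current or assay(current) < threshold:
        return None  # starting material is inactive
    while len(current) > 1:
        # Split into at most n_fractions contiguous fractions.
        size = -(-len(current) // n_fractions)  # ceiling division
        fractions = [current[i:i + size] for i in range(0, len(current), size)]
        best = max(fractions, key=assay)
        if assay(best) < threshold:
            return None  # activity does not survive fractionation
        current = best
    return current[0]
```

The None branch illustrates why the approach is painstaking: an activity that requires several components acting together can disappear when the pool is split, and each round costs one assay per fraction.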

An alternative to bioassay-guided fractionation is the candidate molecule approach (Fig. 3C). We used this method to discover the potent thiopeptide antibiotic lactocillin from Lactobacillus gasseri, a prominent member of the vaginal community (10, 137). We performed a systematic analysis of all biosynthetic gene clusters for small molecules in genomes of human-associated bacteria and identified 13 previously unknown thiopeptide gene clusters, four of which are present in >20% of HMP samples. Thiopeptides are a class of antibiotics with potent activity against Gram-positive bacteria; many of them bind to a site on the 50S subunit of the bacterial ribosome. One member of this class, LFF571, is currently in a phase II clinical trial (138). Lactocillin has been purified, structurally characterized, and shown to have low-to-mid nanomolar antibiotic activity against vaginal pathogens but not against vaginal commensals.
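Antibiotic potency is often measured as a minimum inhibitory concentration in µg/mL, whereas lactocillin's activity above is expressed in nanomolar terms. Converting between the two requires only the molecular weight, as in the sketch below. The 1000 g/mol value used in the checks is a round hypothetical figure, not lactocillin's molecular weight.

```python
def ug_per_ml_to_nM(conc_ug_per_ml: float, mw_g_per_mol: float) -> float:
    """Convert a mass concentration (µg/mL) to a molar concentration (nM)."""
    grams_per_liter = conc_ug_per_ml * 1e-3  # 1 µg/mL = 1 mg/L = 1e-3 g/L
    molar = grams_per_liter / mw_g_per_mol   # mol/L
    return molar * 1e9                       # mol/L -> nmol/L


def nM_to_ug_per_ml(conc_nM: float, mw_g_per_mol: float) -> float:
    """Inverse conversion: nM to µg/mL."""
    return conc_nM * 1e-9 * mw_g_per_mol * 1e3
```

For a molecule of about 1000 g/mol, 1 µg/mL corresponds to roughly 1000 nM, so low-nanomolar activity implies an MIC on the order of a few ng/mL.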

Another class of molecules ideally suited to the candidate molecule approach is the secondary bile acids. Bile acids, and the sterol scaffold more generally, are rich in biological activity, and there are numerous host receptors for these molecules (87). Levels of individual secondary bile acids vary widely among individuals, although the bile acid pool in the gut lumen as a whole is held at high micromolar to low millimolar concentrations. Although dozens of secondary bile acids are known, very few have been assigned a biological activity or have known biosynthetic genes, making this a promising area for detailed experimental investigation.

Studying individual molecules from a pool

The discoveries of highest impact will come not from simply cataloging new microbiota-derived molecules, but from studying the biological activities of individual molecules in a complex molecular milieu. With the exception of fecal metabolite profiling, few microbiota-derived molecules have been detected in host-derived samples. More sensitive analytical techniques, such as nanospray desorption electrospray ionization mass spectrometry, are needed to verify the production of a molecule in the skin, oral, and vaginal communities. An alternative approach—the detection of RNA transcripts for a particular bacterial gene cluster in metatranscriptomic data—has been used to profile the expression of the cluster in native samples. This approach is consistent with, but not proof of, small molecule production in the “natural” habitat of the microbiome (10).
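Expression of a gene cluster in metatranscriptomic data is commonly summarized by normalizing the reads that map to it for cluster length and sequencing depth, for example as reads per kilobase per million reads (RPKM). The sketch below assumes reads have already been mapped to the cluster and that per-sample totals are known; the 1.0 RPKM cutoff for calling a cluster "expressed" is a hypothetical choice.

```python
from typing import Dict, List, Tuple


def rpkm(mapped_reads: int, cluster_length_bp: int, total_reads: int) -> float:
    """Reads per kilobase of cluster per million reads in the sample."""
    return mapped_reads / (cluster_length_bp / 1e3) / (total_reads / 1e6)


def expressed_in(
    samples: Dict[str, Tuple[int, int]],
    cluster_length_bp: int,
    min_rpkm: float = 1.0,
) -> List[str]:
    """Samples in which the cluster's RPKM reaches min_rpkm.

    samples: sample ID -> (reads mapped to the cluster, total reads).
    """
    return sorted(
        s for s, (reads, total) in samples.items()
        if rpkm(reads, cluster_length_bp, total) >= min_rpkm
    )
```

As the text emphasizes, a cluster that passes such a filter is transcribed in that sample, which is consistent with, but not proof of, production of the molecule.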

Although preliminary studies of the biological effects of three microbiota-derived RiPPs have been carried out (139–143), their biological relevance in the host has not yet been tested experimentally. Colonization studies in germ-free mice have the advantage of simplicity, but there are drawbacks. For example, molecules derived from a monocolonist can be produced at supraphysiological levels, leading to a false signal. In addition, the activity of some molecules, such as immune modulators, may require co-stimulatory signals from other bacterial species. Last, the biological relevance of certain molecules, like antibacterials, may only be observed when other members of the microbiota are present. Faith et al. have overcome some of these challenges in studies on regulatory T cells in the gut by using small subsets of gut bacterial strains to colonize mice (Fig. 3D) (144).

Similar experimental systems have recently been developed for the skin. Conventional mice can be colonized by individual skin commensals, and the T cell response can be tracked over long periods (145). By contrast, few experimental systems are available for interrogating the role of small-molecule–mediated interactions in the community structure (146) and dynamics (137) of oral and vaginal communities.

Many molecules of interest are produced by bacterial species that are difficult to manipulate genetically, such as the anaerobic Firmicutes (Clostridium and its relatives). Two technologies would be transformative for studying them: gene knockout in the native host (147) and synthetic-biology-based approaches to express biosynthetic gene clusters in a more genetically tractable host (148).


Much is known about which bacterial species are most abundant in human-associated communities and how they vary among individuals. Yet comparatively little is known about the most abundant bacterially derived or modified small molecules in the gut, despite the fact that these molecules are present at high micromolar concentrations, their levels can vary greatly among individuals, and the human host is chronically exposed to them for decades with unknown consequences. Pharmaceutical companies go to great lengths to get a single molecule into the human gut at comparable concentrations. Certain low-abundance molecules with potent biological activities may also be significant to host physiology. Against this backdrop, it seems likely that in the near future the suite of microbiota-derived molecules in an individual’s gut community will not be left to chance. Discovering the most abundant, widely (or variably) distributed, and biologically active molecules produced by the microbiota, and connecting them to the genes that encode them, are critical first steps in understanding which molecules have desired effects and which are deleterious, what receptors they target, and how therapeutic communities of microorganisms can be designed in which the production and nonproduction of molecules can be genetically specified.

References and Notes

  1. Acknowledgments: We thank members of the Fischbach and Donia groups for helpful discussions. Work in the authors’ laboratories is supported by Princeton University (M.S.D.); a Medical Research Program Grant from the W.M. Keck Foundation (M.A.F.); a Fellowship for Science and Engineering from the David and Lucile Packard Foundation (M.A.F.); an Investigators in the Pathogenesis of Infectious Disease award from the Burroughs Wellcome Fund (M.A.F.); Defense Advanced Research Projects Agency award HR0011-12-C-0067 (M.A.F.); the Program for Breakthrough Biomedical Research (M.A.F.); and NIH grants OD007290, AI101018, GM081879, and DK101674 (M.A.F.). M.A.F. is on the scientific advisory boards of NGM Biopharmaceuticals and Warp Drive Bio.