Research Article

Genetic determinants of in vivo fitness and diet responsiveness in multiple human gut Bacteroides

See allHide authors and affiliations

Science  02 Oct 2015:
Vol. 350, Issue 6256, aac5992
DOI: 10.1126/science.aac5992

Diet shapes host and gut microbe fitness

The human gut microbiota is hugely diverse, with many strain variants having a multiplicity of effects on host metabolism and immunity. To define some of these functions, Wu et al. made libraries of mutants of Bacteroides species known for their capacity to process otherwise intractable dietary fiber. Germ-free mice colonized with defined gut microbiota communities containing the mutants were fed specific diets containing different ratios of fat and fiber. Genes, strains, and species were identified that were associated with specific metabolic pathways. The community responses to dietary shifts were manipulated in an attempt to characterize species for their probiotic or therapeutic potential.

Science, this issue 10.1126/science.aac5992>

Structured Abstract


Relatively little is known about the genetic factors that allow members of the human gut microbiota to occupy their niches. Identification of these factors is important for understanding mechanisms that determine microbiota assembly and perturbation through diet, disease, and clinical treatments. Discovery of these factors should enable new approaches for intervening therapeutically in the functional properties of the human gut microbiota. We present a generalizable approach by which to identify fitness determinants for multiple bacterial strains simultaneously in a model human gut microbiota, obtain gene-level characterization of responses to diet change, and design prebiotics for precision microbiota manipulation.


We developed a method—multi-taxon INsertion Sequencing (INSeq)—for monitoring the behavior of tens of thousands of transposon (Tn) mutants of multiple bacterial species and strains simultaneously in the guts of gnotobiotic mice. We focused on four prominent human gut Bacteroides: one strain of B. cellulosilyticus, one strain of B. ovatus, and two strains of B. thetaiotaomicron. INSeq libraries, each composed of 87,000 to 167,000 isogenic Tn mutant strains, were produced (single site of Tn insertion per mutant strain; a total of 11 to 26 Tn insertions represented in the library per gene; and 82 to 92% genes covered per genome). The four mutant libraries were introduced into germ-free mice together with 11 wild-type species commonly present in the human gut microbiota. Animals were given a diet rich in fat and simple sugars but devoid of complex polysaccharides [diet 1 (D1)] or one rich in plant polysaccharides and low in fat (D2), either monotonously or in the sequence D1-D2-D1 or D2-D1-D2. Wecalculated a “fitness index” for each gene on the basis of the relative abundance of its INSeq reads in the fecal or cecal microbiota compared with the input library. In vivo INSeq data were correlated with INSeq data generated from organisms cultured under defined in vitro conditions; microbial RNA-seq profiling of the community’s metatranscriptome; and reconstructions of metabolic pathways, regulons, and polysaccharide utilization loci. On the basis of the results, we designed a prebiotic intervention.


Multi-taxon INSeq (i) provided a digital readout of the remarkably consistent pattern of community assembly; (ii) identified shared as well as species-, strain-, and diet-specific fitness determinants associated with a variety of metabolic or nutrient processing pathways, including those involving amino acids, carbohydrates, and vitamins/cofactors; (iii) enabled quantitative gene-level measurement of the resilience of the responses to diet perturbations; (iv) revealed that arabinoxylan, the most common hemicellulose in cereals, could be used to deliberately manipulate the representation of Bacteroides cellulosilyticus; and (v) defined the niche adjustments of this and the other Bacteroides to arabinoxylan supplementation of the high-fat diet.


In principle, the approach described can be used to obtain a more comprehensive understanding of how host genotype, diet, physiologic, metabolic, and immune factors, as well as pathologic states, affect niches in gut and nongut habitats, as well as to facilitate development of therapeutic interventions for modifying community structure/function.

Identification of a prebiotic that increases the abundance of B. cellulosilyticus.

(Left) The four mutant libraries were pooled together with 11 other phylogenetically diverse wild-type strains, and this consortium, representing an artificial human gut microbiota, was introduced into germ-free mice. Community assembly, the effects of diet, and recovery from diet oscillations were characterized at a community, strain, and gene level in these gnotobiotic animals by use of multi-taxon INSeq. (Middle) Multi-taxon INSeq revealed an arabinoxylan utilization locus in B. cellulosilyticus that is critical for the organism’s fitness in the high-fat/simple-sugar diet (D1) context but not in the D2 context. A homologous arabinoxylan utilization locus in B. ovatus was not a fitness determinant with either diet. (Right) Consistent with this finding, supplementation of drinking water with arabinoxylan in mice consuming D1 selectively increased the abundance of B. cellulosilyticus but not B. ovatus.


Libraries of tens of thousands of transposon mutants generated from each of four human gut Bacteroides strains, two representing the same species, were introduced simultaneously into gnotobiotic mice together with 11 other wild-type strains to generate a 15-member artificial human gut microbiota. Mice received one of two distinct diets monotonously, or both in different ordered sequences. Quantifying the abundance of mutants in different diet contexts allowed gene-level characterization of fitness determinants, niche, stability, and resilience and yielded a prebiotic (arabinoxylan) that allowed targeted manipulation of the community. The approach described is generalizable and should be useful for defining mechanisms critical for sustaining and/or approaches for deliberately reconfiguring the highly adaptive and durable relationship between the human gut microbiota and host in ways that promote wellness.

The human gut microbiota is highly diverse (13). A current view is that strains acquired by an individual early in life persist for decades and that strains are shared among family members. The microbiota can rapidly adapt to changing conditions, but the degree to which given sets of strains share or compete for niche space in the gut ecosystem is poorly understood. Identification of the genetic factors that define an organism’s niche is important for understanding the mechanisms that determine community assembly, community responses to and recovery after various perturbations, and the food webs that link microbes to one another and to their host. Discovery of these factors should spawn new approaches for intentional manipulation of the functional properties of the microbiota.

In this Research Article, we describe an approach for simultaneously identifying genetic determinants of fitness for multiple members of a defined artificial human gut microbiota installed in gnotobiotic mice fed distinct diets monotonously or in an ordered sequence. We focus on four human gut Bacteroides strains: B. cellulosilyticus WH2, B. ovatus ATCC 8483, B. thetaiotaomicron 7330, and B. thetaiotaomicron VPI-5482 [finished genomes for the first three strains were obtained by combining PacBio and Illumina sequencing (supplementary materials, materials and methods) (4)]. Culture-independent surveys of a number of human populations indicate that all three of these Bacteroides species are prominently represented in the guts of healthy individuals (1, 2). All four strains contain numerous genes involved in the recognition and processing of otherwise indigestible dietary polysaccharides (table S1) (47). We applied genome-wide transposon mutagenesis to these three prominent human gut–derived Bacteroides species and the two strain representatives of one of them, and we colonized singly housed, germ-free mice with all four mutant libraries together with a defined consortium of 11 other wild-type human gut bacterial species representing major phylogenetic lineages present in the microbiota. The microbiome of this 15-member community encodes key metabolic functions identified in anaerobic food webs, including the ability to process polysaccharides to oligosaccharides and simple sugars and to ferment amino acids (7, 8).

This approach not only allowed us to examine assembly of the 15-member artificial community but to characterize responses to dietary perturbations and recovery after these perturbations (stability and resilience) at the community level, the individual species/strain level, and the gene level. We identified shared as well as species- and strain-specific genetic and metabolic features that affect fitness of Bacteroides in the gut environment.

Characterizing multiple transposon mutant libraries simultaneously in vivo (multi-taxon INSeq)

Each library we generated from the four bacterial strains was composed of 87,000 to 167,000 isogenic transposon (Tn) mutants. Each mutant contained a single site of insertion of the Tn element. Of the open reading frames (ORFs) in these four Bacteroides, 81.5 to 91.8% contained a Tn insertion, allowing us to conclude that disruption of these genes did not preclude growth in the rich medium used to construct the libraries (an analysis of “essential” genes not represented in the various mutant libraries is provided in the supplementary materials, with a focus on those involved in carbohydrate, amino acid, and vitamin/cofactor biosynthesis/catabolism).

Each ORF was covered by an average of 11 to 26 Tn insertions (table S2 and fig. S1, A to D). These Tn mutant libraries were introduced into germ-free mice together with 11 wild-type species that are common constituents of the adult gut microbiota (fig. S2A). To permit simultaneous analysis of multiple mutant libraries in the same recipient gnotobiotic mouse, a mariner Tn delivery vector with MmeI sites positioned at each end of the Tn (9, 10) was modified so that it contained two taxon-specific barcodes (fig. S2B). MmeI digestion of microbial DNA prepared from the gut contents or feces of recipient gnotobiotic mice cleaves genomic DNA at a site 20 or 21 base pairs (bp) distal to the restriction enzyme’s recognition site. The site of Tn insertion and the relative abundance of each Tn mutant in the input libraries introduced into mice and in the “output” communities recovered from these animals were defined by using a multi-taxon insertion sequencing (INSeq) protocol. This protocol enables sequencing of the flanking genomic sequence in addition to the taxon-specific barcode positioned within the Tn (fig. S2B). A software package (supplementary materials, materials and methods) was used to assign sequencing reads based on the sample-specific and taxon-specific barcodes. Control experiments, using “mock communities” composed of the 15-member collection of cultured human gut bacteria with different proportions of the mutant libraries (ranging from 0.5 to 40% of the community) revealed that this protocol had high specificity (91.6 ± 0.84% of reads mapped to sites of Tn insertion), high sensitivity (a mutant library representing as little as 0.5% of the entire community could be characterized), and good reproducibility [coefficient of determination (R2) > 0.9; n = 5 technical replicates/mock community] (supplementary materials, materials and methods, results, and table S2).

To identify mutations that were deleterious in the gut, we calculated a “fitness index” for each gene based on the relative abundance of its INSeq reads in the output community (for example, fecal microbiota) compared with the input library. We observed that the output-to-input log-ratios for the entire mutant population were bimodally distributed, with the vast majority of mutants (genes) falling in a normal distribution to the right of a much smaller distribution with lower fitness. To statistically distinguish these two populations, we implemented the expectation-maximization (EM) algorithm to iteratively estimate the unknown parameters (mean and variance) for each population using a likelihood function based on observed data (the log-ratios for each individual gene) (supplementary materials, materials and methods). We calculated a z score (fitness index) for the difference between each gene’s log-ratio and the mean of the normal distribution of the population represented by the vast majority of the mutants (fig. S2, C to E). A P value was determined based on a normal distribution by using a Z test, and a q value was assigned after applying a multiple hypothesis testing correction (FDR). To avoid misidentifying genes as significant fitness determinants, we only considered those that, when mutated, were significantly depleted in the output library across biological replicates. In silico simulations revealed that this EM algorithm–based procedure yielded a false positive rate of less than 5% for mutants in genes whose relative abundance was >0.002% (fig. S3B). These simulations were verified by applying the analysis pipeline to our mock communities (table S2). ORFs satisfying the criteria of having a significant q value in one diet condition and no fitness defects in any biological replicates under another diet condition were defined as “diet-specific fitness factors.”

Community assembly, stability, and resilience in response to different diets

Adult (10 to 12 weeks old) male germ-free C57BL/6J mice were placed on either a low-fat/high-plant polysaccharide (LF/HPP) chow or a diet in which calories were largely derived from hydrogenated vegetable shortening and beef tallow and carbohydrates were limited to sucrose, corn starch, and maltodextrin (plus an indigestible cellulose binder) [high–saturated fat/high–simple sugar (HF/HS) diet]. Diets were introduced 7 days before animals received a single oral gavage of the artificial community containing the four mutagenized strain libraries and 11 wild-type strains (Fig. 1A). In control experiments, we used a 15-member community composed of all wild-type strains, or 15-member communities that only contained a single mutant library (fig. S4A). Groups of singly caged mice were maintained on either the LF/HPP or HF/HS diet for the duration of the experiment, whereas other groups were subjected to a reciprocal set of diet oscillations (LF/HPP→HF/HS→LF/HPP or HF/HS→LF/HPP→HF/HS) (n = 5 mice/diet treatment/community type/gnotobiotic isolator) (Fig. 1A). We used short-read shotgun sequencing [community profiling by sequencing (COPRO-seq)] of DNA prepared from fecal samples collected over the course of each experiment to define the relative abundance of each community member.

Fig. 1 The effect of diet and diet oscillations on the configuration of the 15-member artificial community containing 11 wild-type strains and the four mutant libraries, including the structure of the population of B. cellulosilyticus WH2 mutants.

(A) Experimental design. 10- to 12-week-old germ-free male C57BL/6J mice were gavaged with the indicated consortium of 15 strains shown at the bottom of the panel. Animals were given the LF/HPP and/or the HF/HS diet in the order shown. Fecal samples were collected at the indicated time points for INSeq and microbial RNA-seq analyses. (B) Principal coordinates analysis (PCoA) of Hellinger distance measurements based on the relative abundance of B. cellulosilyticus WH2 Tn mutants in fecal samples, as defined by multi-taxon INSeq. Each solid circle represents the population of all mutants in B. cellulosilyticus WH2 sampled from an individual mouse in the indicated diet treatment group. Circles are color-coded based on the diet treatment group. Each open circle represents the location in the ordination plot of the population of Tn mutants, averaged for all animals in a given treatment group sampled at a given time point [data shown for T = 0 (input library before gavage), 4, 10, and 16 days after gavage]. Lines connect open circles for each of the four treatment groups. Dashed lines denote the two different diet oscillation groups. The colors of the lines correspond to the diet being consumed (green, LF/HPP; orange, HF/HS). Arrowheads superimposed on lines indicate the passage of time. (C and D) PCoA of Hellinger distance measurements based on COPRO-seq data. Each solid circle represents a fecal community sampled from an individually caged mouse belonging to the indicated diet treatment group. These circles were ordinated in the same coordinate space but are being shown as two separate panels for clarity. Vertical dashed lines in (D) indicate the time point at which a diet switch occurred.

When considered as a whole, each mutant library acted similarly to the parental wild-type strain in terms of its proportional abundance in the community (fig. S4B and table S3). In both monotonously fed treatment groups, the relative abundance of B. cellulosilyticus WH2 was high, although significantly greater in animals fed LF/HPP chow (36 ± 1.6% versus 16 ± 3.1% after 16 days on the diets; P < 0.01, Mann-Whitney test). B. ovatus ATCC 8483 had a preference for conditions associated with LF/HPP chow feeding (11 ± 0.7% versus 0.8 ± 0.3%; P < 0.01, Mann-Whitney test). Strain-specific differences were also evident; B. thetaiotaomicron VPI-5482 was more successful in the LF/HPP diet context than was B. thetaiotaomicron 7330 (9 ± 0.5% versus 0.02 ± 0.04%; P < 0.05, Mann-Whitney test), whereas the 7330 strain had significantly higher abundance than the VPI-5482 strain when animals were consuming the HF/HS diet (6 ± 1.4% versus 3 ± 0.5%; P < 0.05, Mann-Whitney test) (fig. S5A).

Multi-taxon INSeq allowed us to obtain a genome-wide, gene-level view of the effects of the selective pressures exerted by diets on fitness. Analysis of the relative abundances of all B. cellulosilyticus mutants in fecal samples collected at 4, 10, and 16 days after gavage in the two monotonous diet treatment groups disclosed that by day 4, the mutant population had already manifested diet-specific configurations [permutational multivariate analysis of variance (PERMANOVA) of Bray-Curtis dissimilarities, R2 = 0.676, P = 0.001]. The relative abundances of B. cellulosilyticus mutants were remarkably consistent between individually caged mice monotonously fed a given diet, including groups of mice housed in different gnotobiotic isolators (Student’s unpaired t test) (fig. S6A). This within-treatment group consistency was sustained as the configuration of the mutant population evolved during the ensuing 11 days of monotonous diet consumption (Fig. 1B and fig. S6, B and C). [Because we compared an INSeq library of a given single species/strain after in vivo selection under different dietary treatments, we used the nonphylogenetic Hellinger metric (11) to calculate the dissimilarity between the libraries within or between treatment groups.]

Multi-taxon INSeq of fecal samples collected at the end of 16 days of monotonous diet consumption yielded 550 HF/HS diet-specific B. cellulosilyticus fitness determinants and 34 LF/HPP diet-specific determinants: 244 of the 550 HF/HS diet-specific genes had Kyoto Encyclopedia of Genes and Genomes (KEGG) Orthology (KO) assignments; among this group, there was a significant enrichment of genes belonging to the KEGG categories “Membrane Transport” (for example, one operon involved in iron transport, another involved in phosphate transport), “Metabolism of Cofactors and Vitamins” [for example, seven genes in the cobalamin biosynthesis pathway, which is consistent with the view that the capacity to synthesize and use cobalamin and other substituted corrins is an important determinant of survival in the gut (9, 12)], and “Protein Folding, Sorting, and Degradation” (the functional annotations and fitness indices of these genes are provided in table S4).

Resilience—the ability of a system to “absorb changes of state variables and parameters, and still persist”—has been differentiated in community ecology from stability, which is “the ability of a system to return to an equilibrium state after a temporary disturbance” (13). The two groups of mice subjected to a reciprocal set of diet oscillations [LF/HPP (4 days)→HF/HS (6 days)→LF/HPP (6 days) or HF/HS (4 days)→LF/HPP (6 days)→HF/HS (6 days)] exhibited consistent and marked diet-specific changes in overall community structure as defined by COPRO-seq (Fig. 1, C and D; also see the patterns of change in relative abundances of B. cellulosilyticus and B. ovatus in fig. S5A). The 15-member community showed elements of both resilience and stability in the face of brief dietary disturbances. Oscillating the mice between the LF/HPP and HF/HS diets produced substantial changes in the relative proportions of community members, but all persisted through the disturbance (resilience), and perturbed and unperturbed communities converged on very similar states determined by the final diet consumed by the mice. At the end of the experiment, the diets explained 82.6% of the variance between fecal communities (Hellinger distances, PERMANOVA, P = 0.001) (fig. S7A). The interaction between diets and treatment (monotonous diet versus diet oscillation) explained only 8.7% of the variance, which is similar to its explanatory power before the disturbance (R2 = 8.7%).

Followup INSeq analysis of the same fecal DNA samples used for COPRO-seq afforded an opportunity to simultaneously characterize the degree to which the aggregate collections of B. cellulosilyticus and B. ovatus mutants were able to persist and recover from a diet disturbance (an INSeq-based definition of resilience and stability) and the variation in such recovery between animals (an INSeq-based measure of stochasticity). The B. cellulosilyticus WH2 INSeq library showed high resilience to the brief dietary perturbations in the sense that the library as a whole persisted throughout the experiment, although some individual mutants did not. However, it did not show the same stability that the 15-member community did; in the mice that experienced diet oscillations, Hellinger distances indicate that the library of mutants did not converge to the same state as in monotonously fed mice but rather persisted as a reconfigured population. This can be seen in the Hellinger distances between the libraries at the end of the experiment, when diet explained 38.7% of the variance, but the interaction between diet and treatment explained 35.1% (PERMANOVA, P = 0.001) (fig. S7B)—a value much greater than its explanatory power before the diet oscillation (R2 = 8.2%).

The configuration of the B. cellulosilyticus mutant population exhibited diet-specific changes. However, during the “recovery phase,” when these two groups of mice subjected to diet oscillations were returned to the starting diet of the sequence, the B. cellulosilyticus mutant library shifted toward a state similar but not identical to that observed in monotonously fed hosts (Fig. 1B). Of the 550 B. cellulosilyticus genes identified as HF/HS diet–specific fitness determinants in monotonously fed animals, 251 had significantly higher z scores (were less required) in animals that had been switched temporarily to LF/HPP chow and then returned to the HF/HS diet, as compared with mice that had only consumed the HF/HS diet [includes the six genes comprising the only one of its 113 polysaccharide utilization loci (PULs) identified as a HF/HS diet–specific fitness determinant (see table S4A and below)]. Similarly, 21 of the 34 genes designated as LF/HPP diet–specific fitness determinants in monotonously fed mice had significantly higher z scores in mice that had been temporarily switched to the HF/HS diet and then returned to LF/HPP chow (table S4A). As with B. cellulosilyticus, diet oscillations of the type A to B to A (or B to A to B) led to increased similarity in the overall patterns of fitness in the B. ovatus mutant library compared with the patterns of fitness in the library of mutants present in mice fed the diets monotonously (fig. S8 and table S4B). This response to perturbation may represent an important mechanism behind the maintenance of diet-specific traits in a member or members of a microbiota harbored by hosts experiencing more varied diets (here, we are using mutants as proxies for functional “traits”).

Identifying core in vivo fitness determinants in four Bacteroides strains

Genes that are conserved among the four Bacteroides strains and that show a significant effect on fitness in all strains in both dietary contexts can be defined as a core set of in vivo fitness determinants for these members of this genus; as such, they inform us about the resource requirements and selective pressures these taxa experience in the gut in this community context and under these two dietary conditions.

In total, 2238 genes are conserved among all four Bacteroides strains. Multi-taxon INSeq of fecal samples, collected from mice colonized for 16 days and monotonously fed one or the other diet, revealed 82 conserved genes with significant diet-independent effects on fitness in all four strains (table S5). The fitness indices (z scores) for these genes varied as a function of strain but less so across the different diets (P < 0.001; two-way ANOVA). Among the 82 core fitness determinants, 15 were components of biosynthesis pathways for arginine (ArgB, ArgC, ArgD, ArgF’, ArgE, ArgG, and ArgH), aspartate (AspC1), lysine (LysA), methionine (MetA), aromatic amino acids (AroH-TyrA, TyrA2, and TyrB), branched chain amino acids (LeuB), and histidine (HisI) (fig. S9A, table S6A, and supplementary materials). In follow-up studies, the B. cellulosilyticus mutant library was grown to stationary phase in minimal medium lacking amino acids and containing one of several carbon sources [glucose, xylose, arabinose, or wheat arabinoxylan (the most common hemicellulose in cereals)] or in control-rich medium [tryptone-yeast extract-glucose (TYG)] (n = 6 replicates/growth condition). INSeq analysis of the input library and stationary phase cultures (table S7) confirmed that Tn mutagenesis of the 15 core fitness determinants annotated as being involved in the biosynthesis of 11 amino acids precluded growth in any of four types of amino acid–depleted minimal media but did not affect growth in the rich medium (supplementary materials). Together, these results emphasize the important contribution of amino acid biosynthesis to the survival of these Bacteroides strains in the niches they occupy in the 15-member community in the two diet contexts tested in this study. This is consistent with our previous observation made in gnotobiotic mice that the biomass of a 10-member community composed of all wild-type human gut–derived strains—including the three Bacteroides used in the current study (B. thetaiotaomicron VPI-5482, B. ovatus ATCC 8483, and B. caccae ATCC 43185)—correlates with the amount of protein in the diet (8).

Genes related to carbohydrate utilization/metabolism were also prominently represented among the 82 shared in vivo fitness factors (fig. S9B and table S6B). These include (i) members of arabinose, fructose, glucose/galactose, and hexuronate utilization pathways [three inner-membrane monosaccharide transporters—glucose/galactose permease (GlcT), arabinose permease (AraP), and fructose transporter (FruP)—plus a previously unidentified arabinose-dependent repressor from the NrtR family (AraR), an enzyme from the arabinose catabolic pathway (AraA), two galactose catabolic enzymes (GalM, GalE), and two enzymes from the hexuronate-utilization pathway (KdgK, KdgA)]; (ii) three sugar-responsive transcriptional regulators, including homologs of BT4338, a Crp-like transcription factor that controls a proposed global sugar catabolic regulon (14) encompassing up to 30 genes per genome that participate in the utilization of arabinose, xylose, fucose, galacturonate, pectin, and β-hexosamines; and (iii) cytoplasmic enzymes involved in starch and glycogen synthesis [starch/glycogen synthase (BT1294), glycogen branching enzymes (BT0771 and BT4303), and alpha-glucanotransferases (BT4304 and BT4305)].

Group B vitamins are essential micronutrients that serve as cofactors (or their precursors) used by various enzymes involved in many facets of cellular metabolism. All three Bacteroides species are prototrophs for all B vitamins and their respective cofactors except cobalamin (vitamin B12), which is not synthesized by B. thetaiotaomicron or B. ovatus [application of the metabolic subsystem approach implemented in SEED (15) in order to investigate the ability of the three Bacteroides species to produce and salvage thiamine, riboflavin, niacin, biotin, pyridoxine, cobalamin, pantothenate, and folate is described in the supplementary materials]. All three Bacteroides species lack orthologs for all known transporters for B vitamins other than thiamine (vitamin B1) and cobalamin (fig. S9C and table S6C). Our in vivo multi-taxon INSeq analysis revealed that components of B-vitamin biosynthetic pathways as well as thiamine and cobalamin transporters functioned as strain-specific but not core fitness determinants (supplementary materials). This result indicated that in our gnotobiotic model, all three Bacteroides species, represented by four strains, have different requirements for B vitamins and/or deploy divergent strategies for acquiring them (for example, through de novo biosynthesis or from the diet, or from other microbes with B vitamin transporters) (detailed analysis of in vitro fitness determinants in vitamin B/cofactor biosynthetic pathways is provided in the supplementary materials).

One approach that has been used to search for genes that act as fitness determinants in vivo is to identify those that exhibit significant differences in their expression in vitro versus in a given body habitat (16). Therefore, we compared RNA sequencing (RNA-seq) data sets from the four Bacteroides strains, generated by using fecal samples collected at the same time as samples for INSeq, with RNA-seq data sets produced from B. cellulosilyticus during log phase growth in rich medium or in minimal medium containing single carbon sources such as arabinose, arabinoxylan, xylose, or glucose. The results revealed that only 3 of the 82 core fitness determinants had significantly higher levels of expression in vivo under both diet conditions than under any of the in vitro conditions examined (n = 4 mice/diet treatment; n = 3 biological replicates/in vitro growth condition) (fig. S10A). The fact that expression of the remaining 79 core fitness determinants, including the regulator of the proposed sugar catabolic regulon (fig. S10B), was not higher in vivo than in in vitro highlights one of the benefits of INSeq and provides a cautionary note that sole reliance on differential expression as a criterion may miss genes that are critical for survival in a given environmental context.

Differences in fitness determinants in the two B. thetaiotaomicron strains

B. thetaiotaomicron VPI-5482 is one of the most studied human gut Bacteroides (4, 5, 14, 1719). B. thetaiotaomicron 7330 is a strain recovered from a healthy adult. Syntenic regions in their genomes are illustrated in Fig. 2A. The representation of genes in KEGG categories and KEGG pathways for the two strains are described in table S8A. Provided in table S8B is a list of the 91 PULs encompassing 808 predicted ORFs in the 7330 strain, 18 of which (136 ORFs) are distinct (their protein products had <90% similarity to any protein in the predicted VPI-5482 proteome).

Fig. 2 HF/HS diet–specific fitness determinants in B. thetaiotaomicron 7330.

(A) Circos-generated alignment of the B. thetaiotaomicron VPI-5482 and B. thetaiotaomicron 7330 genomes (26). Gray lines connect segments of DNA conserved between these two genomes (these regions were identified through alignment with Mauve) (27). The color-coded outer circle denotes the similarity between their proteins: green, >90% identity (based on Blastp alignment); blue, 70 to 90% identity; red, <70% identity; black, intergenic regions. GC skew was calculated by using a sliding window size of 10kb (yellow, GC skew > 0; purple, GC skew < 0). (B) COPRO-seq analysis of the relative abundance of the two B. thetaiotaomicron strains in the fecal microbiota of mice sampled 2 weeks after gavage while consuming the HF/HS or LF/HPP diets. Mean values ± SEM are shown (n = 5 individually caged mice harboring a community consisting of 11 wild-type and the four mutant libraries per treatment group). The summed relative abundance of the two strains remains the same, even though the relative representation of the individual strains is significantly different in the two diet contexts (P < 0.001, 2-way ANOVA). (C) HF/HS diet–specific fitness determinants in B. thetaiotaomicron 7330 involved in degradation of glycosaminoglycans associated with the intestinal mucosa (genes highlighted with red lines and identified by the EC annotations of their protein products). (D) HF/HS diet–associated changes in the z scores of 7330-strain–specific fitness determinants that are involved in transcription regulation (Btheta7330_04415) or are components of operons encoding transport systems (vertical black lines to the left of the gene ID denote individual operons).

As noted above, the 7330 strain is more successful in the 15-member community context when mice are consuming the HF/HS diet. Although the relative representation of each B. thetaiotaomicron strain in the 15-member community varied by 340-fold depending upon the diet, their total relative abundance in the community was the same across the two diets (Fig. 2B), raising the possibility that the strains compete for a niche in this community and that occupancy of this niche by one strain or the other is highly sensitive to diet. Therefore, we used INSeq to identify factors that define their dietary niches (table S9, which includes the 385 HF/HS diet–specific fitness determinants in B. thetaiotaomicron 7330). In the absence of dietary polysaccharides, B. thetaiotaomicron VPI-5482 adaptively forages on host mucosal glycans, suggesting that the capacity to consume both dietary- and host-derived glycans is one of the mechanisms by which it is able to survive in the gut (5, 20, 21). The HF/HS diet–specific and 7330 strain–specific fitness factors include genes in two operons specifying components of cation efflux systems plus several glycoside hydrolases predicted to be involved in the catabolism of mucosal glycans (Fig. 2, C and D, and table S9B). The latter include genes (Btheta7330_01433, Btheta7330_02903, and Btheta7330_02906) that encode three members of glycoside hydrolase family GH20 (β-hexosaminidase, EC that likely cleave GlcNAc residues. Their orthologs are also represented in the VPI-5482 genome, but in this strain, they do not convey a significant fitness effect. Thus, one functional distinction between the two strains revealed through multi-taxon INSeq is that they likely use different strategies for adaptive foraging of mucosal glycans in the absence of dietary polysaccharides.

Individual Bacteroides species contain multiple capsular polysaccharide biosynthesis (CPS) loci, allowing for a large number of combinations of expressed loci in different environmental contexts. We used multi-taxon INSeq to identify CPS loci and their component genes that were critical for fitness in vivo within and across species and strains as a function of diet. The two B. thetaiotaomicron strains vividly illustrate the different contributions of individual CPS loci to survival. CPS4 was the only one of the VPI-5482 strain’s seven CPS loci that functioned as a significant fitness determinant in the context of the 15-member community and either of the two diets (as shown in fig. S11B, 20 of its 21 genes, collectively covered by >1200 Tn mutants, were defined as essential). In contrast, CPS3—a locus in the 7330 strain’s genome composed of 18 genes not found in the VPI-5482 strain—was the only one among its six CPS loci critical for fitness in either diet context (all 18 genes, covered by >850 mutants, had significant fitness indices) (fig. S11C). INSeq established that this strain specificity for CPS fitness effects was evident along the length of the gut (table S9C). The importance of these loci is not simply attributable to their levels of expression: Microbial RNA-seq of fecal samples collected at 4 and 16 days after gavage revealed that neither CPS4 in B. thetaiotaomicron VPI-5482 nor CPS3 in B. thetaiotaomicron 7330 is the dominantly expressed locus in either diet context (table S10).

Identifying a diet supplement that can specifically manipulate B. cellulosilyticus abundance

B. cellulosilyticus WH2 is equipped with more carbohydrate active enzymes (CAZYmes) dedicated to glycan digestion than is any other sequenced Bacteroidetes genome reported to date (7). Its 510 CAZYmes—comprising glycoside hydrolases, polysaccharide lyases, carbohydrate esterases, and associated noncatalytic carbohydrate binding modules—are distributed among 113 PULs. Only one of the 113 PULs (BcellWH2_04321-4327) functioned as a significant fitness determinant in the HF/HS diet context (Fig. 3A). The fitness indices of the 127 Tn mutants that mapped to the six genes in this PUL were remarkably consistent between individually caged animals as a function of their diet. Moreover, as noted above, the relative abundances of these mutants were higher in mice that had undergone an HF/HS-LF/HPP-HF/HS diet oscillation than in mice monotonously fed the HF/HS diet (Fig. 3A). This PUL is contained within a region of the genome spanning BcellWH2_04292-BcellWH2_04327 that contains three hybrid two-component systems (an analysis of putative HTCS regulons in this genomic region is provided in the supplementary materials) (fig. S12).

Fig. 3 Arabinoxylan increases the relative abundance of B. cellulosilyticus WH2 in vivo.

(A) INSeq analysis reveals that all genes in PUL BcellWH2_04321-4327 have significant fitness indices (z values) in the HF/HS diet context. BACWH2_04327, encoding a hybrid two-component system (HTCS) regulator, is the only gene in this PUL that has a significant fitness effect on the LF/HPP diet. Functional annotations for genes in the PUL are shown together with the direction of their transcription. Fitness indices for each gene in the different diet contexts (orange, HF/HS; green, LF/HPP) are plotted as mean values ± SEM. The horizontal dashed line indicates the cutoff for significance (P < 0.05; z test with FDR correction). dist_CE, distant relative of carbohydrate esterase; GH, glycoside hydrolase; HTCS, hybrid two-component system. (B) Experimental design. Adult C57BL/6J germ-free mice were gavaged with a consortium containing 11 wild-type strains plus the four Bacteroides INSeq libraries. Animals were fed the HF/HS or LF/HPP diets with or without supplementation of their drinking water with 7.5% arabinoxylan (n = 5 individually caged mice per group). (C) The relative abundance of B. cellulosilyticus and B. ovatus was defined by COPRO-seq analysis of fecal samples collected at the indicated time points. Mean values ± SEM are plotted. ***P <0.001 compared with the reference group A at 14 days post-gavage (dpg); +++P <0.001 for within group comparisons of the 30- versus 14-dpg fecal sample (Student’s t test after FDR correction). B. ovatus, the only other Bacteroides strain in the community that exhibited significantly increased growth in minimal medium supplemented with arabinoxylan (fig. S13B) did not manifest a significant change in its relative abundance in vivo when arabinoxylan was added to the drinking water. ns, not significant.

The BcellWH2_04321-4327 PUL consists of a hybrid two-component system (HTCS) transcriptional regulator (a characterization of DNA binding motifs and regulated operons is presented in the supplementary materials) and a xylan utilization system core (22), which consists of a SusC/D pair (BcellWH2_04325/26), a hypothetical protein (BcellWH2_04324), a predicted xylanase belonging to glycoside hydrolase family 10 (GH10; BcellWH2_04323), a predicted β-galactosidase from GH35 (BcellWH2_04322), and a predicted feruloylesterase (BcellWH2_04321) (Fig. 3A and fig. S12, A and B). Gas chromatography/mass spectrometry (GC/MS) of the products of acid hydrolysis of the HF/HS diet revealed small amounts of xylose and arabinose (61.2 ± 7.5 μg/g and 62.8 ± 6.2 μg/g, respectively). Our previous in vitro growth studies of B. cellulosilyticus WH2 cultured in minimal medium supplemented with one of 31 distinct carbohydrate substrates, plus RNA-seq analysis of the bacterium recovered at mid-log phase from those minimal media that supported its growth, revealed that this PUL was induced by xylan and arabinoxylan (7) (fig. S13A). Arabinoxylans are made of a xylan backbone with α-l-arabinose side chains. They also contain ferulic acid and other phenolic acids that are covalently linked to them via ester linkages. A likely role for the product of BcellWH2_04321 in the context of this PUL would be removal of the ferulic esters from arabinoxylan. The GH10 xylanase targets the xylan backbone, whereas the GH35 enzyme likely removes the α-linked l-arabinose side chains (α- l-arabinose is identical to β-d-galactose, except for the C-6 moiety).

B. ovatus, the only other Bacteroides strain in the artificial community that could grow in minimal medium containing purified arabinoxylan as the sole carbon source (fig. S13B), contains a PUL (Boavatus_01723-32) that is induced when this medium is supplemented with arabinoxylan (or xylan) (6). Multi-taxon INSeq of fecal samples collected from mice harboring the 15-member community showed that unlike the xylan- and arabinoxylan-inducible B. cellulosilyticus PUL BcellWH2_04321-27, this PUL is not required for survival of B. ovatus in the HF/HS diet context (Tn mutants in its SusC and xylanase genes produced no significant fitness defects) (fig. S13C).

Given these results, we examined the effects of consumption of arabinoxylan purified from wheat on the relative abundances of B. cellulosilyticus and B. ovatus and their fitness determinants. Our rationale was that we could use arabinoxylan to induce expression of the one B. cellulosilyticus WH2 PUL that was a key fitness factor in the HF/HS diet context, and that PUL induction would be accompanied by improved fitness. Therefore, before colonization with the 15-member community containing the INSeq libraries, one group of germ-free mice received the HF/HS diet plus drinking water supplemented with wheat-derived arabinoxylan (7.5% w/v) for 7 days. Another group received the same diet but without arabinoxylan in their drinking water. All groups were then gavaged with the artificial community: The group pretreated with arabinoxylan continued to receive the HF/HS diet and supplemented water ad libitum for the next 14 days and then was switched to unsupplemented water. A reciprocal treatment group received unsupplemented water plus the HF/HS diet for 14 days, followed by a switch to arabinoxylan-supplemented water. A third group received unsupplemented water plus the low-fat, high-plant polysaccharide (LF/HPP) diet for 14 days, followed by a switch to supplemented water (while being maintained on the LF/HPP chow); these animals served as a control because the targeted PUL only manifests itself as a fitness determinant in the context of the high-fat, polysaccharide-deficient diet (n = 5 individually caged mice per treatment groups A, B, and C in Fig. 3B).

COPRO-seq analysis of fecal samples collected at the end of each 14-day treatment period demonstrated that arabinoxylan produced a significant increase in B. cellulosilyticus abundance in mice fed the HF/HS diet (adjusted P value <0.001; Student’s t test; comparing group B to group A at the 14-day time point, and the 30-day versus 14-day time points within group A in Fig. 3C) but no significant effect in the LF/HPP diet context (as seen in the 30-day versus 14-day time points for group C). Consistent with the INSeq results showing that the arabinoxylan utilization PUL in B. ovatus is not a fitness determinant in the HF/HS diet context, we observed no significant effects of arabinoxylan treatment on the relative abundance of this community member (Fig. 3C). We confirmed these findings in a separate experiment in which two groups of gnotobiotic mice harboring the 15-member community were treated for 56 days with a HF/HS diet with or without supplementation of their drinking water with 15% (w/v) arabinoxylan. In this higher-dose, longer-duration, monotonous-diet experiment, the abundance of B. cellulosilyticus WH2 increased significantly (as did levels of cecal short chain fatty acids and deconjugated bile acids), whereas B. ovatus showed no response (fig. S14, A and B). Arabinoxylan treatment did not produce a statistically significant difference in total body weight (Student’s t test).

We identified 407 B. cellulosilyticus WH2 genes that functioned as fitness determinants when mice consumed the HF/HS diet but not when mice also received arabinoxylan in their drinking water (table S11A). The arabinoxylan- and xylan-inducible PUL BcellWH2_04321-4327 was included among this group of genes (fig. S15A). B. cellulosilyticus contains another PUL encoding a predicted xylanase (BcellWH2_04296-4307) that is also induced by xylan and arabinoxylan in vitro. Neither of these two PULs are fitness determinants when B. cellulosilyticus is grown in minimal medium containing arabinoxylan as the sole carbon source (table S7). The BcellWH2_04296-4307 PUL is not a fitness determinant when mice are consuming the HF/HS diet, with or without arabinoxylan, suggesting that together, these two PULs—BcellWH2_04321-04327 (the HF/HS diet specific fitness PUL that is no longer required with arabinoxylan supplementation) and BcellWH2_04296-4307 (not required under either of the two conditions)—provide redundant yet distinguishable functions in the 15-member community context. A recent study of the orthologous PULs in B. ovatus ATCC 8483 showed that they are regulated by different xylan structures: All xylans contain a conserved β1,4-xylose core backbone that is variably decorated with monosaccharide side chains; Bovatus_01727-01732, the ortholog of BcellWH2_04321-04327, is regulated by simpler xylans, whereas B. ovatus_03715-03733 is regulated by more complex (decorated) xylans (23). This finding together with our results suggest that (i) the xylan present in the HF/HS diet is a simpler type that is processed by the BcellWH2_04321-4327 PUL, and (ii) adding a xylan of intermediate complexity (wheat arabinoxylan) induces both of the PULs. We postulate that this dual induction creates a functional redundancy with respect to arabinoxylan utilization in B. cellulosilyticus WH2 that accounts for our finding that neither PUL alone is required for fitness in the setting of arabinoxylan supplementation of the HF/HS diet. Targeted mutation and complementation of genes in one, the other, or both BcellWH2_04321-04327 and BcellWH2_04296-4307 would provide an opportunity to independently validate the roles of these two PULs in different diet contexts. Once genetic tools have been developed to stably express the one or two proteins required to complement the corresponding single- and double-knockout strains in either or both PULs in vivo, these experiments can and should be performed.

There were several transporters that were no longer significant fitness determinants in arabinoxylan-supplemented mice; these include four MFS-type transporters [a nucleoside H+ symporter, plus transporters for galactose/glucose (GlcT), fructose (FruP), and di/tri-peptides], iron and zinc transporters, and two ABC-type antimicrobial peptide transporters (a ranking based on their q scores is available in fig. S15B). Two MFS-type multidrug transporters and one ABC-type antimicrobial peptide transporter were the only B. ovatus genes in the KEGG category “Membrane transport” that lost their significant fitness index scores with arabinoxylan treatment (these transporters are not homologous to those identified in B. cellulosilyticus). A number of human gut Bacteroides, including B. ovatus, are known to be highly resistant to antimicrobial peptides (24). Import of gut mucosal–derived antimicrobial peptides is one strategy gut symbionts use for avoiding destruction of their cell membranes (25). Some Bacteroides species are able to turn to gut mucosal glycans as alterative sources of nutrients when polysaccharides are not well represented in the diet. Arabinoxylan supplementation may mitigate this need, and thus, the observed diminished requirement for antimicrobial peptide transporters could reflect alterations in interactions with the mucosa.

A total of 27 genes were identified in the B. cellulosilyticus genome that functioned as significant fitness determinants when mice received the HF/HS diet plus arabinoxylan but not when animals received the HF/HS diet alone (table S11A). These include (i) all genes in an operon (BcellWH2_00893-895) that includes a sulfate transferase (EC, which is consistent with the observed increase in deconjugated bile acids documented by UPLC-MS of cecal contents (fig. S14C), and (ii) several genes that highlight how increases in arabinoxylan availability enhance the importance of ammonium utilization for synthesis of amino acids and proteins in this community and diet context [an ammonium transporter (BcellWH2_05255), plus glutamine, and glutamate synthetases (BcellWH2_5244 and BcellWH2_05271, respectively)].

Even though B. ovatus did not manifest a significant change in its relative abundance when mice received arabinoxylan supplemented water, multi-taxon INSeq revealed 41 genes that were not fitness determinants when mice consumed the HF/HS diet alone but “acquired” significant z scores when arabinoxylan was introduced (table S11B). These include seven closely linked glycosyltransferase genes (Bovatus_03504-11), suggesting that this organism changes its glycan utilization strategies when it encounters arabinoxylan in the gut environment.


We cannot distinguish a direct effect of a given diet or diet oscillation on a given community member from a primary effect of that diet or diet perturbation on one or more other community members that interact with the member/strain exhibiting changes in its abundance. We could have limited ourselves to mono-colonizations in order to establish direct effects of diet on these features, but we would have lost our ability to describe responses to diet changes and ascertain the niches of these organisms in the more “natural” context of a microbial community.

In our study, multi-taxon INSeq is used to provide an operational description of an organism’s niche by determining which sets of genes allow a bacterial strain to coexist with other strains or species under a defined set of habitat conditions (for example, gnotobiotic mice representing a given genetic background, colonized with a given set of sequenced organisms, and fed a given set of diets). In principle, this approach can be applied to gut and nongut habitats in gnotobiotic mice representing different genetic backgrounds, harboring different microbial consortia and manipulated in various ways to obtain a more comprehensive picture of how host genotype influences the niches available. It also offers a way to address a range of questions: from how to functionally discriminate strains when exploring the microbial genetic foundations of opportunism, stochasticity in community assembly, or stability and resilience after perturbations, to the selection of probiotic candidates for bioremediation of perturbed/dysbiotic microbial communities, or identifying compounds that affect the functional configurations of a targeted microbiota, or characterizing of the nutritional effects/value of diets and their ingredients.

Supplementary Materials

Materials and Methods

Supplementary Text

Figs. S1 to S17

References (2844)

Tables S1 to S14

References and Notes

  1. Acknowledgments: We thank D. O’Donnell and M. Karlsson for their assistance with gnotobiotic mouse husbandry; X. Zhou and A. Bell for their work on PacBio sequencing; and A. Reyes, M. Charbonneau, P. Ahern, and A. Hsiao for many helpful suggestions. This work was supported in part by grants from the NIH (DK30292, DK70977, DK52574, and GM108527), the Crohn’s and Colitis Foundation of America, the European Union’s Seventh Framework Program (FP/2007/2013)/European Research Council (ERC) Grant Agreement 322820 (to B.H.), the Monsanto Company (for PacBio and Illumina sequencing and finishing the B. thetaiotaomicron 7330, B. ovatus ATCC8483, and B. cellulosilyticus WH2 genomes), and the Russian Science Foundation (to D.A.R. and M.S.K.) INSeq, COPRO-seq, and microbial RNA-seq data sets have been deposited in the European Nucleotide Archive (ENA) under accession no. PRJEB9434. The finished and annotated genome sequences of B. thetaiotaomicron 7330, B. ovatus ATCC 8483, and B. cellulosilyticus WH2 have been deposited in the National Center for Biotechnology Information under accession no. PRJNA289334. J.I.G. is cofounder of Matatu, a company characterizing the role of diet-by-microbiota interactions in animal health. N.P.M. is currently an employee of Matatu. A.L.O. is an Adjunct Vice President for Research for Buffalo BioLabs. A U.S. patent application (PCT/US2014/045141) has been filed by Washington University related to methods for identifying dietary supplements, such as arabinoxylans, that can be used to deliberately manipulate the representation of targeted members of the human gut microbiota.
View Abstract

Stay Connected to Science

Navigate This Article