Research Article

Population-level analysis of gut microbiome variation


Science  29 Apr 2016:
Vol. 352, Issue 6285, pp. 560-564
DOI: 10.1126/science.aad3503

“Normal” for the gut microbiota

For the benefit of future clinical studies, it is critical to establish what constitutes a “normal” gut microbiome, if it exists at all. Through fecal samples and questionnaires, Falony et al. and Zhernakova et al. targeted general populations in Belgium and the Netherlands, respectively. Gut microbiota composition correlated with a range of factors including diet, use of medication, red blood cell counts, fecal chromogranin A, and stool consistency. The data give some hints for possible biomarkers of normal gut communities.

Science, this issue pp. 560 and 565

Abstract

Fecal microbiome variation in the average, healthy population has remained under-investigated. Here, we analyzed two independent, extensively phenotyped cohorts: the Belgian Flemish Gut Flora Project (FGFP; discovery cohort; N = 1106) and the Dutch LifeLines-DEEP study (LLDeep; replication; N = 1135). Integration with global data sets (N combined = 3948) revealed a 14-genera core microbiota, but the 664 identified genera still underexplore total gut diversity. Sixty-nine clinical and questionnaire-based covariates were found to be associated with microbiota compositional variation, with a 92% replication rate. Stool consistency showed the largest effect size, whereas medication explained the largest total variance and interacted with other covariate-microbiota associations. Early-life events such as birth mode were not reflected in adult microbiota composition. Finally, we found that proposed disease marker genera are associated with host covariates, urging inclusion of the latter in study design.

Sequencing-based assessment of microbial communities in human fecal material has linked alterations in gut microbiota composition to disease, as well as chronically suboptimal health and well-being (1–3). The discovery of these associations has stimulated the search for specific microbiome-based biomarkers for a wide range of pathologies (4–9). However, major challenges still hamper the once assumed imminent translation of microbiome monitoring into diagnostic and clinical practice. One such hurdle is the lack of knowledge about the impact of host and environmental factors on microbiota variation within an average, healthy population. Such information is essential for robust disease marker identification in clinical metagenomics (10). To identify and characterize major microbiome-associated variables, the Flemish Gut Flora Project (FGFP) initiated a large-scale cross-sectional fecal sampling effort in a confined geographic region (Flanders, Belgium). FGFP collection protocols combined rigorous sampling logistics, including frozen sample collection and cold chain monitoring, with exhaustive phenotyping through online questionnaires, standardized anamnesis and health assessment by general medical practitioners (GPs), and extended clinical blood profiling (fig. S1). Encompassing an equilibrated range of age, gender, health, and lifestyle, the FGFP cohort is expected to be representative of the average gut microbiota composition in a Western European population (table S1). From this cohort, fecal samples of 1106 individuals (98.5% of Western or Eastern European ethnicity; 96.8% born in Belgium) with time-matched blood and questionnaire data were analyzed. Microbiome phylogenetic profiling was performed using 16S ribosomal RNA (rRNA) gene amplicon sequencing. In addition, a Dutch cohort (N = 1135, LifeLines-DEEP, LLDeep; the Netherlands) was profiled and analyzed (11) for validation purposes.

Characterizing the core microbiota

First, we identified a human core microbiota by combining the FGFP and LLDeep data with other U.K. and U.S. studies (12–14), yielding nearly 4000 well-profiled individuals. Combined, these data sets comprised a total richness of 664 genera (fig. S2A). Extrapolation estimated total western genus richness at 784 ± 40 (fig. S2B), suggesting that total western richness is still undersampled. Observing total richness would require sampling an estimated additional 40,739 individuals. The current data set yielded a core microbiota (i.e., the genera shared by 95% of samples) composed of 17 genera with a median core abundance (MA) of 72.20% (fig. S2, C and D, and table S2). Complementing this data set with 308 samples collected in Papua New Guinea (15), Peru (16), and Tanzania (17) reduced the size of the human core microbiota to 14 genera. Notably, Alistipes, Clostridium IV, Parabacteroides, and all Actinobacteria were excluded from the global core composition (fig. S3 and table S2). Within the FGFP data set specifically, 35 genera met the core definition proposed (MA 90.40%), while a 99% cutoff reduced core composition to 20 genera (MA 80.67%; table S2). These 20 core genera also occurred among the top 33 most abundant taxa in the FGFP cohort (table S2). Independently of gender, genus richness correlated positively with age, whereas total core abundance decreased (fig. S4).
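The core definition used above (genera shared by a given fraction of samples, summarized by a median core abundance) can be illustrated with a minimal sketch. The function name, the toy count matrix, and the choice to treat any nonzero read count as "present" are assumptions made for illustration; the detection rule and data handling actually used in the study are described in its Materials and Methods, not here.

```python
import numpy as np

def core_genera(counts, genera, prevalence=0.95):
    """Return the core genera and the median core abundance (in %).

    counts: samples x genera matrix of read counts (hypothetical input).
    A genus is called 'core' when it is detected (count > 0, an assumed
    rule) in at least `prevalence` of the samples.
    """
    counts = np.asarray(counts, dtype=float)
    present = counts > 0
    prev = present.mean(axis=0)                        # fraction of samples per genus
    is_core = prev >= prevalence
    rel = counts / counts.sum(axis=1, keepdims=True)   # relative abundances per sample
    core_share = rel[:, is_core].sum(axis=1)           # core fraction of each sample
    median_abundance = 100.0 * float(np.median(core_share))
    names = [g for g, keep in zip(genera, is_core) if keep]
    return names, median_abundance

# Toy data: 4 samples x 3 genera, made-up numbers
toy = [[50, 30, 20],
       [40, 10, 0],
       [60, 20, 20],
       [70, 30, 0]]
core, ma = core_genera(toy, ["Bacteroides", "Faecalibacterium", "Prevotella"])
```

Raising `prevalence` from 0.95 to 0.99 in such a function mirrors the stricter cutoff that shrank the FGFP core from 35 to 20 genera.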

Based on unconstrained canonical correspondence analysis of genus-level community composition, we identified the main genera contributing to microbiome variation within the FGFP data set (table S3). Interindividual variation in microbiota composition mainly resulted from changes in relative abundance of core taxa (Fig. 1A). The taxa showing the largest variation in abundance were Ruminococcaceae, Bacteroides, and Prevotella; all previously proposed as enterotype identifiers (18). However, microbiota variation was not only defined by fluctuations in the core or dominant microbiota members, as less abundant genera, such as Akkermansia and Methanobrevibacter, were also discriminative (table S3). The density of individuals within the FGFP microbiome composition landscape resolved into three major peaks, coinciding with the three main contributors to variation identified above (Fig. 1B), as well as enterotypes [based on clustering (18) or Dirichlet multinomial mixtures (19, 20); fig. S5].

Fig. 1 Microbial community variation in the FGFP cohort, represented by principal coordinates analysis (PCoA, genus-level Bray-Curtis dissimilarity).

(A) Top 10 contributors to community variation as determined by canonical correspondence analysis on unscaled genera abundances, plotted on the two first PCoA dimensions (arrows scaled to contribution). (B) FGFP sample density on the PCoA plot; arrows indicate density peaks enriched in the three previously proposed enterotype drivers: Prevotella, Bacteroides, and Ruminococcaceae genera.
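For readers unfamiliar with the ordination in Fig. 1, the following sketch computes Bray-Curtis dissimilarities between samples and projects them with classical PCoA. It uses only NumPy; the toy abundance table and function names are hypothetical, and the software used to produce the published figure is not specified in this text.

```python
import numpy as np

def bray_curtis(abund):
    """Pairwise Bray-Curtis dissimilarity between rows (samples)."""
    x = np.asarray(abund, dtype=float)
    n = x.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            num = np.abs(x[i] - x[j]).sum()
            den = (x[i] + x[j]).sum()
            d[i, j] = d[j, i] = num / den
    return d

def pcoa(dist, k=2):
    """Classical PCoA: double-center squared distances, then eigendecompose."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gower = -0.5 * centering @ d2 @ centering
    vals, vecs = np.linalg.eigh(gower)
    order = np.argsort(vals)[::-1]                 # largest eigenvalue first
    vals, vecs = vals[order], vecs[:, order]
    keep = np.clip(vals[:k], 0.0, None)            # guard against tiny negative eigenvalues
    return vecs[:, :k] * np.sqrt(keep)

# Toy genus table: samples 0-1 and samples 2-3 are dominated by different genera
toy = np.array([[10, 0, 5], [8, 2, 5], [0, 12, 3], [1, 10, 4]])
dist = bray_curtis(toy)
coords = pcoa(dist)    # rows are samples, columns are the first two PCoA axes
```

In this toy example the first axis separates the two groups of samples, which is the kind of structure the density peaks in Fig. 1B reflect at cohort scale.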

Identifying microbiome covariates

Building upon the extensive FGFP phenotyping, we tested 503 metadata variables (table S1) to identify microbiome covariates. To achieve a balance between number of phenotypes of interest and rates of false discovery, a stepwise approach was applied. After removing collinear variables (table S4), 69 factors were shown to correlate significantly [false discovery rate (FDR) <10%] with overall microbiome community variation (Bray-Curtis dissimilarity; Fig. 2 and table S5). Of those covariates, 26 had an analog in the LLDeep record (11). Despite differences in study population and sample analysis (e.g., DNA extraction methods), 24 matching covariates were found to be significantly associated with microbiome composition in the LLDeep cohort, leading to an overall replication success rate of 92% (Fig. 2). All 69 covariates identified correlated with alpha-diversity measures and individual taxa abundances (table S6). However, the predictive power of the linear covariate-based models was limited, as they only explained 1.50 to 14.74% of genus abundance variation (table S7), suggesting additional contribution from unknown factors, stochastic effects, and/or biotic interactions (21). Moreover, correlations were affected by interactions between specific covariates, notably medication (see below; table S8).
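The FDR threshold and the replication rate quoted above can be made concrete with a short sketch. The Benjamini-Hochberg step-up procedure is used here as an assumption, since this text does not name the FDR method applied; the P values are invented, and only the 24 and 26 are taken from the paragraph above.

```python
import numpy as np

def benjamini_hochberg(pvalues, fdr=0.10):
    """Boolean mask of tests passing the Benjamini-Hochberg step-up procedure."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = fdr * np.arange(1, m + 1) / m     # rank-dependent cutoffs
    passed = p[order] <= thresholds
    keep = np.zeros(m, dtype=bool)
    if passed.any():
        last = np.max(np.nonzero(passed)[0])       # largest rank meeting its cutoff
        keep[order[: last + 1]] = True
    return keep

# Invented P values for five hypothetical covariates
pvals = [0.001, 0.02, 0.03, 0.2, 0.9]
significant = benjamini_hochberg(pvals, fdr=0.10)

# Replication rate: 24 of the 26 covariates with an LLDeep analog replicated
replication_rate = 24 / 26
```

With 24 of 26 matched covariates replicating, the rate is about 92.3%, which is the figure rounded to 92% in the text.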

Fig. 2 Microbiome covariates identified in the FGFP cohort (left) and their replication in the LLDeep study (right).

Factors are sorted according to their effect size in FGFP and colored based on metadata category (fig. S1), with “Medication” (pink) here split out of the “Health” category (dark blue). Covariates identified in the FGFP and successfully replicated in LLDeep (P < 5%) are colored in green; nonreplicated covariates are in black.

Calculation of the covariates’ combined effect size per phenotypical category revealed that medication had the largest explanatory power on microbiome composition, explaining 10.04% of community variation (Fig. 3A and table S9). Blood parameters, bowel habits, health status, anthropometric features, and lifestyle followed with decreasing combined correlation, raising the total additive effect size of all categories to 16.43%. To identify nonredundant covariates of microbiome variations from our shortlist of 69 correlating factors, we performed a forward stepwise redundancy analysis (RDA) that resulted in a set of 18 variables (Fig. 3B and table S10) with a cumulative (nonredundant) effect size on community variation of 7.63%. Here, we identified stool consistency as the top single, nonredundant microbiome covariate in the FGFP metadata (see below) (22, 23). Among the other nonredundant covariates were age (12) and gender (24), but also the intake of specific drugs and dietary information (including fiber uptake, bread preference, and fruit consumption; Fig. 3B). Regarding the ongoing debate on the association between microbiome composition and body mass index (BMI) (25, 26), our analyses revealed that effect size is small but significant (table S10). Notably, previously unidentified factors such as red blood cell (RBC) count and hemoglobin concentration indicated covariation of microbiome composition with blood oxygen uptake capacity (27). Previous work in mice has shown an effect of oxygen diffusion on the microbiota (28). Moreover, correlations between RBC counts and Faecalibacterium abundances are in line with the known oxygen requirements of this genus (29). Of the 18 covariates with nonredundant contributions to microbiome variation, 10 were found to be significant by generalized linear model analysis (table S11). This approach confirmed the top covariate status of stool consistency (22, 23) and revealed associations between genus abundances and hip circumference, uric acid concentrations, amoxicillin intake, and chocolate-type preference (namely, an increased abundance of unclassified Lachnospiraceae in participants with a preference for dark chocolate).

Fig. 3 Effect sizes of covariates of FGFP microbiome composition.

(A) Combined effect size of FGFP covariates pooled in predefined categories (Fig. 2 color codes) with covariate distance-based selection. (B) Cumulative effect size of nonredundant covariates selected by stepwise RDA analysis (right bars) as compared to individual effect sizes assuming independence (left bars). Rings in each panel show the fraction of microbial variation explained with the approach.
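The contrast in Fig. 3B between individual effect sizes and the cumulative, nonredundant effect size can be illustrated with a simplified forward selection. This is a sketch under stated assumptions, not the authors' procedure: it greedily adds covariates by gain in the fraction of variance explained by a least-squares fit, whereas a full stepwise RDA typically relies on permutation tests and adjusted statistics. The covariate names and the simulated data are hypothetical.

```python
import numpy as np

def explained_fraction(y, x):
    """Fraction of total variance in y (samples x taxa) explained by a
    least-squares fit on the covariates in x (samples x covariates)."""
    yc = y - y.mean(axis=0)
    xc = x - x.mean(axis=0)
    coef, *_ = np.linalg.lstsq(xc, yc, rcond=None)
    fitted = xc @ coef
    return float((fitted ** 2).sum() / (yc ** 2).sum())

def forward_select(y, x, names, min_gain=0.01):
    """Greedy forward selection; returns [(name, cumulative R2), ...]."""
    chosen, history, current = [], [], 0.0
    remaining = list(range(x.shape[1]))
    while remaining:
        scores = [(explained_fraction(y, x[:, chosen + [j]]), j) for j in remaining]
        best_r2, best_j = max(scores)
        if best_r2 - current < min_gain:       # stop when the gain is negligible
            break
        chosen.append(best_j)
        remaining.remove(best_j)
        current = best_r2
        history.append((names[best_j], current))
    return history

# Simulated example: 'transit' is almost a copy of 'stool' (redundant covariates)
rng = np.random.default_rng(0)
n = 200
stool = rng.normal(size=n)
transit = stool + 0.05 * rng.normal(size=n)
age = rng.normal(size=n)
x = np.column_stack([stool, transit, age])
y = np.column_stack([2 * stool + rng.normal(size=n),
                     age + rng.normal(size=n)])
steps = forward_select(y, x, ["stool", "transit", "age"])
individual = [explained_fraction(y, x[:, [j]]) for j in range(3)]
```

Summing the entries of `individual` overstates the joint contribution because the two correlated covariates are counted twice, which is why the nonredundant total in the study (7.63%) is smaller than the sum of per-covariate effect sizes.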

Out of a total of 503 parameters, stool consistency, as measured by self-assessed Bristol stool scale (BSS) score, emerged as the top feature covarying with fecal microbiome composition. BSS score has been put forward as an indicative measure of transit time (30), but also reflects water availability and potential niche differentiation within the colon ecosystem (23). We confirmed previously reported associations of stool consistency with microbiota richness, prevalence of Prevotella-enterotyped samples, and Akkermansia and Methanobrevibacter abundances (22, 23) (Fig. 4, fig. S6, and table S12). In addition, we showed that 12 out of 20 of the FGFP 99% core genera covary with BSS scores, with overall core abundance increasing in looser stools. We assessed the confounding effect of stool consistency on the remaining 68 microbiome covariates using RDA. Among the features losing most explanatory power were time since previous relief (also indicative of passage rates), blood uric acid and hemoglobin levels, BMI, gender, and frequency of beer consumption (table S12).

Fig. 4 BSS score association to microbiota variation.

(A) BSS score variation across the FGFP cohort, as represented on the genus-level PCoA ordination (Bray-Curtis dissimilarity). Each cell is colored according to median BSS score of individual samples allocated to the cell coordinates. (B) Enterotype distribution over BSS scores [JSD enterotyping (18)] showing an increase in Prevotella individuals with looser stool consistency. (C) Median differences in abundance of the core microbiota (FGFP genus-level core at 99%) and in observed genus richness across BSS score.

Bacterial genera associated with disease

Years of disease-targeted microbiome research have generated an extensive inventory of bacterial genera with a reported association with one or more pathologies. We have assessed correlations between taxa that have been reported to be more abundant or depleted in individuals suffering from specific conditions (table S13) and the set of 18 nonredundant microbiome covariates identified. Our analyses confirmed previous work showing that Akkermansia abundance positively correlated with time since previous relief (23), but it was also negatively associated with insulin resistance risk factors such as BMI and blood triglyceride concentrations (31). Faecalibacterium numbers were, as discussed, dependent on RBC counts, but our analysis did find a decreased abundance in ulcerative colitis patients (32). The presence of Fusobacterium could not be linked to any of the nonredundant covariates identified in this study, which could indicate the specificity of its association with colorectal cancer (8). Given these associations, inclusion of the identified covariates in future clinical study design seems appropriate.

Next, we identified sample subsets with specific taxonomic signatures using a biclustering approach (33). Two stable biclusters were detected, spanning 410 and 374 samples, respectively, with an intersection of 92 (table S14). The first bicluster comprised 15 genera, including several Clostridia, as well as hydrogenotrophic genera, such as Methanobrevibacter and Desulfovibrio. The cluster was predominantly composed of women, individuals with a lower weight, and participants with a longer transit time, as reflected both by stool consistency and time since previous relief. Both microbiota richness and evenness were elevated in this cluster. In contrast, the second bicluster, consisting of seven genera, including Bacteroides and Parabacteroides, comprised individuals with reduced microbiome diversity. Characterization of these individuals revealed a preference for white, low-fiber bread [bread being the major source of carbohydrates in an average Belgian diet (34)] and higher prevalence of recent amoxicillin treatment. Thus, this biclustering analysis hinted at microbiome configurations that at least partially overlap with previously described enterotypes. Indeed, while the Ruminococcus enterotype was overrepresented in the first bicluster, the second was enriched in Bacteroides-type individuals. This, together with the results from Fig. 1B, suggested that although not discrete, enterotypes do indeed represent “densely populated areas in a multidimensional space of community composition,” as stated in the original publication (18).

The effect of medical interventions

When combining FGFP covariates in predefined categories (fig. S1 and table S15), the use of medication showed the largest explanatory value for microbiome variation in our study. The use of medication in the FGFP cohort was widespread [with 1950 records of over-the-counter plus prescription drug intake during the past 12 (antibiotics) or 6 months (all others) prior to sampling]. The shortlist of 69 FGFP microbiome covariates included 13 drugs, among them antibiotics, osmotic laxatives, inflammatory bowel disease (IBD) medication, female hormones, benzodiazepines, antidepressants, and antihistamines. Independently of other covariates, intake of several of these substances was associated with community composition variation (Fig. 5A and table S15). The only drugs significantly associated with the abundance of specific genera in phenotype-matched case-control analyses were β-lactam antibiotics (FDR <5%). As medication was shown to affect the outcome of microbiome association studies (35), we performed an interaction analysis of covariate-microbiome correlations in the FGFP data set (table S8). Of the covariate interactions detected, 63% were driven by medication (Fig. 5B). This result highlights the versatility of drug-microbiome associations and stresses their importance as potentially confounding factors in clinical studies.

Fig. 5 Drug interactions in the FGFP.

(A) Overview of the association between different types of medication and microbiome composition. Colored boxes (color coding according to medication) represent a significant result in the matched case-control (FDR<5%) or boosted additive general linear modeling (FDR<10%, table S11) analyses. The effect (decrease/increase) of medication on genera abundances is specified. (B) Circos plot showing correlations between covariates and genus abundances (FDR<10%) interacting with drugs. Genera are grouped at phylum level; ribbons represent genus-phenotype associations and are colored according to the confounding medication (gray indicates nonconfounded).

Some early-life events that are generally thought to affect adult microbiota composition were not associated with microbiota composition variation in our study, including mode of birth [cesarean section (N = 36) or vaginal delivery (N = 1036)], place of birth [home (N = 207) or hospital (N = 899); increased diversity in home-born individuals, FDR>5% when controlling for age], and infant nutrition [breastfed (N = 537) or not breastfed (N = 359)] (fig. S7). Residence type [ranging from countryside (N = 77) over rural village (N = 500), small town (N = 272), suburb (N = 137), to city (N = 102)] during early childhood (up to 5 years old), one of the 69 FGFP microbiome covariates, was linked to adult microbial community composition, with a positive correlation between evenness and residence in more industrialized areas, though not statistically significant (FDR >5%) when correcting for age, gender, and BMI. Although the lack of signal in the data was unexpected, these results by no means imply that early-life events do not affect microbiota assembly during infancy, nor do they question previous associations with disease or allergy (36, 37); our analyses only indicated that such events were not significantly associated with microbiome composition at adult age in the FGFP cohort.

Power analysis and conclusions

Finally, the sample size and phenotypic breadth of the FGFP data set provided a unique opportunity to perform an informed power analysis for clinical microbiome studies. In a first approach, we calculated the number of samples needed to assess a difference in dominant microbiota members in a case-control setting where the type of microbiota shift is unknown (e.g., for a discovery project in an unstudied disease). We could detect a 9% difference between taxon proportions with 400 samples per group at a power above 95% and a 5% difference with 500 samples per group at a power of 80% (table S16). In a second approach, we estimated the sample size needed to identify a microbiome shift specific to a known association in a background of other factors (e.g., for intervention studies). Focusing on the prevalent concern of BMI increase and suboptimal health, we assessed the sample size needed to evaluate microbiota compositional changes associated to obesity. To do so, we calculated the independent effect sizes of obesity status, gender, age, and BSS on microbiota variation (table S16). This allowed us to estimate that 865 lean (BMI <25) and 865 obese (BMI ≥30) volunteers would be necessary to study microbiota compositional shifts with P < 5% significance level and a power of 80%. When taking into account gender, age, and BSS score as covariates, the estimated sample size was reduced to 535 (table S16).
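To give a feel for how sample size, effect size, and power trade off, the sketch below uses the textbook normal approximation for a two-sided test comparing two proportions. This is an assumption made for illustration: the method behind table S16 is not described in this text, so the sketch is not expected to reproduce the sample sizes quoted above, and the proportions used are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    se = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    shift = abs(p1 - p2) / se                  # standardized difference
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

def n_for_power(p1, p2, target=0.80, alpha=0.05):
    """Smallest per-group sample size reaching the target power."""
    n = 2
    while power_two_proportions(p1, p2, n, alpha) < target:
        n += 1
    return n

# Hypothetical taxon proportions in two groups (not taken from the study)
needed = n_for_power(0.20, 0.25, target=0.80)
power_small = power_two_proportions(0.20, 0.25, 100)
power_large = power_two_proportions(0.20, 0.25, 1000)
```

Even in this simplified univariate setting, a five-percentage-point difference requires on the order of a thousand samples per group for 80% power, consistent with the general message that microbiome studies need large cohorts and benefit from accounting for known covariates.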

Overall, this study identified a global human core microbiota, while also highlighting that total gut diversity is not yet covered, even combining microbiome data from almost 4000 individuals. Building upon rich metadata and a two-cohort design, we identified a set of microbiota covariates with a replication rate of over 92% and a cumulative, nonredundant effect size of 7.63%. This suggests the influence of additional, currently unknown covariates as well as intrinsic microbial ecological processes such as founder effects, species interactions, and dynamics. We showed that some of the medical conditions targeted by fecal microbiota research have much smaller microbiome effect sizes than commonly assumed. However, some of the covariates that we identified (such as BSS and medication) are currently largely ignored and should be taken into account in future clinical studies. Our power analyses showed that large-scale study design is indispensable for characterizing microbiome shifts, even in a controlled setting, confirming that scale indeed matters, but knowledge of confounders can help to ease power issues. The results from this study form a solid basis for the development of microbiome research as a clinical and diagnostic field.

Supplementary Materials

www.sciencemag.org/content/352/6285/560/suppl/DC1

Materials and Methods

Figs. S1 to S9

Additional Data tables S1 to S17

References (38–57)

References and Notes

Acknowledgments: FGFP procedures were approved by the medical ethics committee of the University of Brussels–Brussels University Hospital (approval 143201215505, 5/12/2012). A declaration concerning the FGFP privacy policy was submitted to the Belgian Commission for the Protection of Privacy. Written informed consent was obtained from all participants. The FGFP was funded with support of the Flemish government (IWT130359), the Research Fund–Flanders (FWO) Odysseus program (G.0924.09), the King Baudouin Foundation (2012-J80000-004), FP7 METACARDIS HEALTH-F4-2012-305312, VIB, the Rega institute for Medical Research, and KU Leuven. D.V. is funded by the Agency for Innovation by Science and Technology (IWT). S.V.S. is supported by Marie Curie Actions FP7 People COFUND–Proposal 267139 (acronym OMICS@VIB). M.J., S.V.S., G.L.M., S.C., K.F., J.W., and M.V., K.D. are supported by postdoctoral (six) and predoctoral (two) fellowships, respectively, from the FWO. The LifeLines-DEEP study was funded by the Top Institute Food and Nutrition (TiFN GH001), CardioVasculair Onderzoek Nederland (CVON 2012-03), the Netherlands Organization for Scientific Research (NWO-VIDI 864.13.013), and an FP7 ERC Advanced Grant to C.W. (Agreement no. 2012-322698). We thank P. Cornelis and all members of the Raes lab for lively discussions and feedback. We acknowledge the contribution of Flemish GPs and pharmacists to data and sample collection. Finally, we thank all FGFP volunteers for participating in the project. Data are available at the European Genome-phenome Archive (https://www.ebi.ac.uk/ega/)–study no. EGAS00001001689.