Decoupling function and taxonomy in the global ocean microbiome

See allHide authors and affiliations

Science  16 Sep 2016:
Vol. 353, Issue 6305, pp. 1272-1277
DOI: 10.1126/science.aaf4507


Microbial metabolism powers biogeochemical cycling in Earth’s ecosystems. The taxonomic composition of microbial communities varies substantially between environments, but the ecological causes of this variation remain largely unknown. We analyzed taxonomic and functional community profiles to determine the factors that shape marine bacterial and archaeal communities across the global ocean. By classifying >30,000 marine microorganisms into metabolic functional groups, we were able to disentangle functional from taxonomic community variation. We find that environmental conditions strongly influence the distribution of functional groups in marine microbial communities by shaping metabolic niches, but only weakly influence taxonomic composition within individual functional groups. Hence, functional structure and composition within functional groups constitute complementary and roughly independent “axes of variation” shaped by markedly different processes.

Microbial communities drive global biogeochemical cycling (1). Bacteria and archaea, for example, strongly influence marine carbon, nitrogen, and sulfur fluxes, thereby modulating global ocean productivity and climate (2, 3). Elucidating the processes that shape microbial communities over space and time is important for predicting how biogeochemical cycles will change with changing environmental conditions.

Taxonomic microbial community profiling can reveal intriguing, but often unexplained, variation between environments (4). Differences in metabolic function among organisms are thought to underlie much of this variation as a result of selection for specific metabolic pathways based on physicochemical conditions (“metabolic niche effects” or “environmental filtering”). However, other factors such as biotic interactions (57), limits to spatial dispersal (4), and neutral demographic drift (8) could also affect community composition. Moreover, distantly related microbes can often perform similar metabolic functions, further complicating a mechanistic interpretation of taxonomic community variation (9). Direct estimates of functional potential in terms of community gene content using environmental shotgun sequencing—or metagenomics—have revealed correlations between the distribution of particular metabolic pathways and environmental conditions (2, 10). However, it is not known how metabolic niche effects interact with other community assembly mechanisms, nor how variation in taxonomic composition relates to ecosystem function. Ideally, one would hope to split community variation into distinct components (“axes of variation”), each of which relates to different ecological processes.

Here, we combine taxonomic and functional profiling of more than 100 bacterial and archaeal communities across the global ocean to elucidate the role of environmental filtering, global functional redundancy, and dispersal limitation in shaping marine microbial communities. We generated taxonomic profiles based on shotgun DNA sequences of the 16S ribosomal gene, a standard marker gene in microbial ecology (11). Functional profiles were generated by associating individual organisms with metabolic functions of particular ecological relevance, such as photoautotrophy and nitrate respiration, using an annotation database that we created on the basis of experimental literature. We then explored taxonomic composition, functional potential, and taxonomic composition within individual functional groups in relation to environmental conditions and geographical location.

We performed regression modeling of the relative abundances of functional groups, as well as the proportions of various operational taxonomic units (OTUs) within each functional group, using 15 key abiotic environmental variables such as dissolved oxygen, salinity, temperature, and depth (table S1). For each OTU or functional group, we used a stepwise model selection algorithm to choose the appropriate subset of environmental variables that maximized predictive power while avoiding unrelated variables. In general, environmental conditions strongly predicted the functional profiles of microbial communities but weakly predicted the taxonomic composition within each functional group, as measured by the cross-validated coefficient of determination of the models (Embedded Image). In particular, the Embedded Image for the relative abundances of almost all functional groups [mean Embedded Image = 0.48 ± 0.27 (SD) over all functions] exceeded the average Embedded Image achieved for the OTU proportions within those groups (mean Embedded Image = 0.15 ± 0.09 over all functions; Fig. 1C). To further explore the relative importance of individual environmental variables, we performed correlation analyses between each functional group or OTU and each environmental variable. Consistent with our regression modeling, most environmental variables either correlated more strongly with relative functional group abundances than with OTU proportions within functional groups, or did not correlate strongly with either of them (Fig. 1, D and E). These patterns persisted at higher taxonomic levels (e.g., genus, family, or order; figs. S1 and S2). Hence, our results are not caused by suboptimal choice of taxonomic resolution, but reflect a general lack of environmental effects on the composition within functional groups.

Fig. 1 Functional versus taxonomic predictability.

(A) Cross-validated coefficients of determination (Embedded Image) for relative taxon abundances, achieved by regression based on environmental predictor variables, averaged across taxa. (B) Absolute rank correlations between environmental variables and relative taxon abundances, at various taxonomic levels (one circle per variable and per taxonomic level). Circle surface area and color saturation are proportional to the absolute correlation. (C) Embedded Image achieved by regression for relative functional group abundances (blue bars) as well as for OTU proportions within each functional group (gray bars, averaged across OTUs in each group). (D) Rank correlations between environmental variables and relative functional group abundances. Blue and red colors indicate positive and negative correlations, respectively. Significant correlations (P < 0.05) are indicated by a black perimeter. (E) Absolute rank correlations between environmental variables and OTU proportions within each group, averaged across OTUs. For full distributions of Embedded Image across taxonomic levels, see figs. S2 and S3. For full distributions of correlations across taxonomic levels, see figs. S24 to S28.

The interaction between environmental conditions and function is only partly reflected in overall taxonomic community structure. Regression models of relative taxon abundances against environmental variables at the whole-community level and at any taxonomic resolution had a lower Embedded Image (Fig. 1A and fig. S3) than achieved for most functional groups (blue bars in Fig. 1C), but comparable to or higher than the mean Embedded Image achieved for the taxon proportions within functional groups (gray bars in Fig. 1C and fig. S2). Similarly, environmental variables were generally less correlated to relative taxon abundances at the community level than to relative functional group abundances (Fig. 1B versus Fig. 1D). Further, the distinction between water column zones based on function is comparable in strength to a distinction based on taxonomy at the whole-community level (Fig. 2, B and C, and fig. S4). Indeed, permutational multivariate analysis of variance (PERMANOVA) tests (11) of dissimilarities in community composition yielded similar pseudo-F statistics (23.8 and 24.4, respectively; P < 0.001 in both cases). Similarly, the segregation of functional groups between water column zones (e.g., mesopelagic versus surface) was comparable to or higher than the segregation of taxa (as determined by permutation tests; Fig. 2, E to G). Together, these results support the interpretation that overall community structure results from the superposition of functional structure, which is strongly related to environmental conditions, and composition within functional groups, which is only weakly related to environmental conditions. In particular, an organism’s metabolic potential appears to be the main trait selected for by environmental conditions across the global ocean. In support of this interpretation, we found that OTUs sharing a higher number of functions tend to correlate with each other more positively (Fig. 2H), consistent with a strong role for metabolic niche effects (11).

Fig. 2 Environmental filtering of microbial communities in the global ocean.

(A) Functional community profiles, with samples ordered according to water column zone. SRF, surface water; DCM, deep chlorophyll maximum; MIX, mixed layer; MES, mesopelagic (11). Darker colors correspond to higher relative abundances. (B and C) Metric multidimensional scaling (MDS) of microbial communities (one point per sample) based on Bray-Curtis dissimilarities in terms of functional groups (B) and OTUs (C). Points in greater proximity correspond to similar communities. (D) Community richness in terms of functional groups and OTUs (one point per sample) after rarefaction. (E to G) Box plots of statistical significances of the segregation of taxa or functional groups between water column zones: DCM versus MES (E), DCM versus SRF (F), and MES versus SRF (G). The horizontal line at 0.05 is shown for reference. (H) Box plots of pairwise rank correlations between relative OTU abundances, depending on the number of shared functions. In (E) to (H), whisker bars extend between the 17th and 83th percentiles.

Our results also suggest that energetic and stoichiometric constraints, not the identity of the microbes, drive ocean microbial metabolic function. Indeed, we found that the proportions of individual OTUs within a functional group were generally only weakly correlated to the group’s overall abundance (figs. S5 and S6). Similar smaller-scale observations have been reported previously. In a wastewater treatment plant, the ratio of aerobic ammonia-oxidizing bacteria and heterotrophic bacteria remained constant over time, while highly variable taxonomic composition within each functional group was only weakly explained by environmental factors (8). In another study, metatranscriptomics in two distinct ocean regions revealed strongly conserved diurnal succession patterns of community gene expression, despite largely nonoverlapping taxonomic affiliations of transcripts between the two regions (12).

If environmental conditions interact with functional community structure, what drives the taxonomic variation within individual functional groups? The high taxonomic richness and variability within functional groups indicate a high global functional redundancy (Fig. 3 and fig. S6), which may have emerged partly as a result of horizontal gene transfer (1). The variation within functional groups that is not explained by our regression models could be due to limited granularity of our environmental characterizations and functional assignments. For example, unknown physicochemical variables and phenotypic differences within functional groups could drive location-dependent growth differences between competing clades. This may explain why, in some cases, functional groups were dominated by a few OTUs in multiple samples despite a high OTU richness (fig. S6). However, the variables considered here are known to be important predictors for marine microbial community composition (13, 14), and it is unlikely that unconsidered environmental variables alone could explain the apparent randomness seen within so many functional groups (fig. S6) when compared to the much better predictability of functional community structure (Fig. 1).

Fig. 3 Functional redundancy in the global ocean.

(A) Number of bacterial and archaeal taxa represented within each functional group (one point per group) at various taxonomic levels. At the OTU and genus level, aerobic chemoheterotrophs represent the most diverse group. (B) OTU proportions within the group of aerobic ammonia oxidizers (one color per OTU). Samples are sorted according to the relative abundance of the entire functional group. For OTU proportions within other functional groups, see fig. S6. (C) Association of functional groups (columns) with members of microbial classes (rows). A darker color corresponds to a higher relative contribution of a class (in terms of the number of OTUs) to a functional group. Rows are clustered by taxonomic relationships [based on the SILVA database, release 119 (28), cladogram shown on the right]; columns are hierarchically clustered by similarity.

Alternatively, spatially limited dispersal may cause distribution patterns not attributable to environmental conditions, although its importance for microorganisms is generally thought to be low and to depend strongly on the type of environment [e.g., lakes versus open ocean (4)]. Marine microorganisms, in particular, can be dispersed at global scales by large ocean currents and were previously found to be recruited from a global marine seed bank (15). To test the effects of dispersal limitation, we compared geographical sample distances with dissimilarities in functional and taxonomic composition at the community level as well as within individual functional groups. To avoid comparisons spanning continental land masses, we only considered sample pairs within the same ocean region. When environmental effects were not taken into account, Mantel correlation tests revealed significant positive correlations between geographical distances and community dissimilarities (P < 0.05; Fig. 4, A and B, fig. S7, A and B, and fig. S8, A to C), consistent with previous findings at similar scales (14). We also observed weaker distance decay in similarity within functional groups (figs. S9 and S10). When we accounted for environmental effects, residual dissimilarities showed no significant correlations with geographical distance, neither at the community level (Fig. 4, C and D, fig. S7, C and D, and fig. S8, D to F) nor within functional groups (figs. S11 and S12). This suggests that the distance decay in similarity is driven by environmental variation affecting community function (and, to a lesser extent, composition within functional groups) and that dispersal limitation in the open ocean is of limited importance relative to other community assembly mechanisms.

Fig. 4 Community dissimilarities versus geographical distances in the mesopelagic zone.

(A and B) Bray-Curtis dissimilarities between microbial communities, compared with geographical distances (one point per sample pair). Community dissimilarities are calculated in terms of relative functional group abundances (A) and relative OTU abundances (B). Spearman rank correlations (ρ) and statistical significances (P) are shown. Only sample pairs in the mesopelagic zone and within the same ocean region are considered. Least-squares regression lines are shown for reference. (C and D) Similar to (A) and (B), but showing residual Bray-Curtis dissimilarities after accounting for environmental effects. For other taxonomic resolutions, see fig. S8. For dissimilarities within functional groups, see figs. S9 and S11. For surface water samples, see fig. S7.

Unexplained variation may be driven by community-level processes such as metabolic interdependencies, chemical signaling and defense, or predation by viruses and eukaryotes (57, 16). For example, predation by bacteriophages specialized on specific hosts can cause variation in host community composition that cannot be attributed to environmental variation (16). Functional community structure and taxonomic composition within functional groups thus appear to constitute roughly complementary axes of variation, with the former being affected more strongly by environmental conditions and the latter shaped greatly by community-level processes (5). Other mechanisms such as priority effects (17) or demographic drift (8) may also explain some of the variation within functional groups.

Marker gene sequencing can yield detailed microbial taxonomic community profiles and reveal intriguing variation—for example, down the ocean water column (14). The high dimensionality of these profiles, however, poses a challenge to their mechanistic interpretation. Generic statistical techniques such as principal components analysis (18) are often used to identify important axes of variation, but these axes typically lack an ecological interpretation. Alternatively, detected species may be combined at higher taxonomic levels that resemble the depth at which traits vary across lineages; however, the optimal taxonomic level is highly trait-dependent (9). Further, many metabolic phenotypes are dispersed irregularly across microbial clades (19). Similar functions may be performed in distant clades (Fig. 3, A to C, and fig. S13) and, conversely, members of the same clade can fill separate metabolic niches (9, 19). For example, none of the 28 metabolic functions considered here appear to be monophyletic (Fig. 3C and fig. S13). Irregular trait distributions are caused by diverse evolutionary processes, including adaptive loss of function (20) and metabolic convergence accelerated by frequent horizontal gene transfer (1). As a result, taxonomic profiles, at any level, often obscure the relationship between community structure and biogeochemical processes.

Here we have shown that binning organisms into functional groups enables the integration of functional, taxonomic, and environmental information to elucidate community assembly processes. Comparison of functional profiles with environmental variables (Fig. 1D) yields insight into the processes driving variation in community composition along geochemical gradients and, reciprocally, gives information about the effects of that variation on ecosystem processes, although specific causal mechanisms need to be experimentally validated. For example, our functional profiles reveal that groups capable of fermentation or nitrate respiration are overrepresented in deeper zones, where oxygen is often limiting and nitrate serves as an alternative electron acceptor for respiration (Fig. 2A). The prevalence of alternative metabolic modes at depth leads to increased richness of both functional groups and OTUs, especially in the mesopelagic zone (Fig. 2D). Although an increase of taxonomic richness with depth has been noted before (14), our analysis shows that this richness gradient is related to the number of available metabolic niches.

The zonation of functional groups with depth (Fig. 2A) is reflected in metagenomic profiles of the same samples (fig. S14) (14) and in previous metagenomic studies (2). However, metagenomic profiles do not directly translate into functional potential, because the same or similar genes can be involved in different—in some cases reversed—metabolic pathways, depending on the microorganism (21). For example, variants of the dissimilatory sulfite reductase genes (dsrAB), abundant in the mesopelagic zone (fig. S14), could be involved in either respiratory sulfur reduction or chemolithotrophic sulfur oxidation (22). Such ambiguities are inherent to metagenomics (21) and apply equally to methods that estimate community gene content by projecting marker genes to sequenced genomes (fig. S14) (23). On the other hand, our phenotype profiles suggest that sulfide and sulfite oxidizers are generally more abundant than sulfate respirers (Fig. 2A), indicating that detected dsrAB genes are mainly involved in sulfur oxidation. Translation of taxonomic information into phenotype profiles based on experimental evidence can thus facilitate the ecological interpretation of metagenomes. The full potential of phenotype profiling remains underused because of our current inability to associate many known taxa with any function. For example, a large fraction of the ubiquitous but poorly studied phylum Thaumarchaeota is potentially involved in ammonia oxidation (24) but had to be excluded from our functional annotations (11). Similarly, microeukaryotes likely contribute to several metabolic functions, such as photosynthesis or cellulolysis. Future functional profiling should thus include eukaryotic microorganisms and incorporate putative metabolic potential for uncultured clades revealed by single-cell genomics (25).

The bulk of global biogeochemical fluxes is driven by a core set of metabolic pathways that evolved in response to past geochemical conditions (1). Through time, these pathways have spread across microbial clades that compete within metabolic niches, resulting in an enormous microbial diversity characterized by high functional redundancy. As shown here, splitting variation of microbial community composition into variation of functional structure and taxonomic variation within functional groups reveals an intriguing pattern: The functional component in itself captures most of the variation predicted by environmental conditions, whereas the residual component (i.e., variation within functional groups) only weakly relates to environmental conditions. This has implications for the interpretation of differences in community structure across environments and time. Differences in taxonomic composition that do not affect functional composition may have little relevance to ecosystem biochemistry; conversely, physicochemically similar environments could host taxonomically distinct communities (26). Functional (rather than purely taxonomic) descriptions of microbial communities should therefore constitute the baseline for microbial biogeography, particularly across transects where geochemical gradients shape microbial niche distribution (27). The residual variation within functional groups can then be analyzed separately to elucidate additional community assembly mechanisms such as biotic interactions, dispersal limitation, or demographic drift. An incorporation of global microbial functional profiles, and their response to potentially changing environmental conditions, into future biogeochemical models will greatly benefit reconstructive and predictive modeling of Earth’s elemental cycles.


Materials and Methods

Tables S1 to S4

Figs. S1 to S28

References (2962)

Additional data table S1


  1. See supplementary materials on Science Online.
  2. Acknowledgments: We thank S. P. Otto, A. L. González, M. M. Osmond, and S. J. Hallam for comments and advice. Supported by the University of British Columbia Department of Mathematics (S.L.) and the Natural Sciences and Engineering Research Council of Canada (S.L., L.W.P., and M.D.). The authors declare that they have no competing financial interests. Metagenomic sequence data are accessioned at the International Nucleotide Sequence Database Collaboration (accession numbers listed in table S2). The FAPROTAX database (Functional Annotation of Prokaryotic Taxa) and associated software, which we developed for generating the functional profiles, are freely available at
View Abstract

Navigate This Article