Condition-Dependent Transcriptome Reveals High-Level Regulatory Architecture in Bacillus subtilis

See allHide authors and affiliations

Science  02 Mar 2012:
Vol. 335, Issue 6072, pp. 1103-1106
DOI: 10.1126/science.1206848


Bacteria adapt to environmental stimuli by adjusting their transcriptomes in a complex manner, the full potential of which has yet to be established for any individual bacterial species. Here, we report the transcriptomes of Bacillus subtilis exposed to a wide range of environmental and nutritional conditions that the organism might encounter in nature. We comprehensively mapped transcription units (TUs) and grouped 2935 promoters into regulons controlled by various RNA polymerase sigma factors, accounting for ~66% of the observed variance in transcriptional activity. This global classification of promoters and detailed description of TUs revealed that a large proportion of the detected antisense RNAs arose from potentially spurious transcription initiation by alternative sigma factors and from imperfect control of transcription termination.

Bacterial transcriptomes are surprisingly complex (15) and include diverse and abundant small RNAs and antisense RNAs (asRNAs). Because only a small number of environmental conditions have been investigated, the full extent of transcriptome complexity remains to be established for a single bacterial species. This prompted us to undertake a systematic and quantitative exploration of transcriptome changes in Bacillus subtilis, whose natural habitat, the soil, is subject to severe environmental fluctuations (6). B. subtilis is also a laboratory model for Gram-positive bacteria and is grown in industrial-scale fermentors for the production of enzymes and vitamins. We selected 104 conditions covering the use of various nutrients, aerobic and anaerobic growth, the development of motility, biofilm formation, adaptation to diverse stresses, high–cell-density fermentation, and hallmark adaptations of B. subtilis: the development of competence for genetic transformation, cell differentiation to form resistant spores, and germination from such spores.

In our labs, we grew a prototrophic strain under the selected conditions and hybridized 269 RNA samples to tiled microarrays with a resolution of 22 bases [supporting online material (SOM) 1 and table S1] (7). We then estimated strand-specific RNA signals from the raw data (Fig. 1A) (8) and computed an aggregated expression index for all transcribed regions (SOM 2 and table S2). The expression threshold was stringently set to reduce the number of potentially false short RNA features detected. Of the previously annotated coding sequences (CDSs), only 186 (4.4%) were not expressed under any condition. Most of these CDSs were of unknown function and predicted to originate from horizontal transfer (SOM 3 and table S3). The 30% of the CDSs most highly expressed under each condition were defined as “highly expressed” (SOM 3). Eighty-five percent of all CDSs were highly expressed in one or more conditions (fig. S3A), but only ~3% (144) of all CDSs were highly expressed under all conditions, indicating that most B. subtilis genes are differentially expressed. Genes in the latter group encode proteins with essential functions (9) and enzymes involved in glycolysis, iron sulfur metabolism, and detoxification pathways (table S3).

Fig. 1

Condition-dependent transcription landscape. (A) A region of the B. subtilis genome is shown with 50 representative mRNA abundance profiles (24). Cut-off levels are marked with horizontal lines indicating 1-, 5- and 10-fold above background level. New RNA features are labeled S1 to S1583. The yhjM gene has two alternative 5′ leaders (S373, S374) and a 3′ extended mRNA (S375) due to partially efficient termination. The independent feature S372 is antisense of glcP. (B) Four distinct promoters of the qoxAB genes. A high background is observed on the opposite strand. The lack of termination at ycgT results in an extended transcript (S120) that is antisense of nasDEF. Similarly, extended transcription of the new S555 feature is antisense of the pbpB and ftsL cell-division genes under certain conditions. (C) Expression profiles (three replicates) in the prototype strain (black) and its rho-null mutant derivative (red). In the mutant, the mRNA of ytoQ lacking an intrinsic termination site and that of ylaL with a termination site are elongated up to the next terminator. (D) Projection of the 269 transcriptomes on the three main axes of the principal component analysis (SOM 8). These axes are highly relevant to B. subtilis life-styles and account for up to 60% of the total variability of the data.

For each of the 269 profiles, we used the repertoire of high-confidence 5′ and 3′ mRNA ends to delineate the transcribed regions, which we derived directly from the positions of abrupt increases and decreases in RNA abundance, hereafter called up- and downshifts, respectively (Fig. 1A and SOM 2). More than 85% of previously known transcriptional start sites (10, 11) mapped within ~12 base pairs (bp) of an upshift (SOM 4), indicating that most mRNAs have unprocessed 5′ ends. Therefore, most of the 3242 detected upshifts appear to correspond to genuine promoters, increasing ~threefold the number of promoters mapped in B. subtilis. Downstream of these promoters, 3000 transcription units (TUs) were defined and then decomposed into previously annotated genes and newly identified RNAs (SOM 2 and tables S4 and S5). This analysis revealed that ~46% of all annotated CDSs can be transcribed from more than one promoter. After manual validation, we classified 1583 previously unannotated RNAs (S1 to S1583) according to their structural relationships with neighboring annotated CDSs (Table 1). These RNAs considerably expand the registry of potential regulatory RNAs (SOM 5 and table S6) and increase the total number of potential genes in B. subtilis by 11% (Table 1). The condition-dependent experimental annotation can be queried at

Table 1

Summary of mRNA features. The 1583 previously unidentified RNA features were categorized according to size (columns) and their structural relationships with annotated genes (SOM 2): 5′ and 3′ regions; Indep, RNA independent of previously annotated features; Inter, mRNA segment lying between two genes with distinct promoters; Intra, portion of a polycistronic mRNA from a single promoter. The 3′ RNAs comprised three subgroups: 3′ untranslated regions (3′UTR) and 3′ extended transcripts with either no termination (3′NT) or partial termination (3′PT). Similarly, the Indep RNAs displayed either termination at a site (Indep) or no termination (Indep-NT) (Fig. 1B). L, length; nt, nucleotides; n, number of features. Pred.; predicted.

View this table:

The 3′ ends of most mRNAs were associated with predicted intrinsic terminators (that is, an RNA hairpin followed by a U-track) at which RNA polymerase (RNAP) dissociates without the need for auxiliary proteins (SOM 6 and table S7). A subgroup of 174 TUs generated asRNAs due to the lack of a termination site and read-through transcription (categories Indep-NT, 3′NT, and 3′PT in Table 1). These mRNAs could extend up to 3.4 kb beyond the 3′ boundaries of their cognate CDS with a gradual decrease in signal intensity (Fig. 1B). Overall, 13% of B. subtilis CDSs are overlapped by asRNAs (table S11) that can potentially act to regulate their cognate sense mRNAs at the transcriptional, RNA stability, or translational levels (12). In a mutant strain lacking the termination factor Rho, which is not required for growth of B. subtilis in a rich medium (13), the mRNA extensions reached up to 12 kb (average ~2.8 kb) (Fig. 1C and table S9). Without Rho, additional asRNAs were formed by extension of a subset of TUs, many of which have only partially efficient intrinsic terminators (Fig. 1C and SOM 7), indicating that Rho is a general inhibitor of antisense transcription.

The relations among the expression profiles revealed highly correlated changes in expression (Fig. 1D and SOM 8) and provided functional insights (SOM 9). To elucidate the global transcriptional regulatory network, we used the mRNA signal downstream of each of the 3242 detected upshifts to estimate pairwise correlations between promoter activities (SOM 10). The results are summarized in the form of a tree created by hierarchical clustering (Fig. 2A). In this tree built independently of DNA sequence information, the DNA binding sites for different RNAP sigma subunits (11) are clustered (Fig. 2B), highlighting the prominent role of sigma factor–mediated promoter recognition in differential gene expression.

Fig. 2

Classification of promoters. (A) Hierarchical clustering tree summarizing the correlations (x axis) between promoter pairs arranged on the y axis. Detailed information on each promoter and its neighbors can be found in table S4. (B) Each promoter is shown as a horizontal bar that is colored according to the distinct clusters revealed by the unsupervised identification of bipartite motifs. Color coding of clusters is as displayed in Fig. 2C, and promoters with low-confidence cluster assignment are shown in gray. For each cluster, the defined consensus motif is labeled with the sigma factor name (letter) and motif number (as in table S4). (C) Estimated activity of clusters (y axis, log2 scale) across the conditions that are arranged on the x axis according to the “shortest tour” (SOM 8 and fig. S4). (D) Decomposition of the total variance of promoter activity (SOM 10 and table S12). Unexplained variance is shown in gray.

For de novo genome-wide identification of sigma factor regulons, we developed an unsupervised algorithm combining information from the DNA sequences near the upshifts and from the clustering of promoters to model the bipartite degenerate motifs recognized by the sigma-containing RNAP (SOM 10). This approach is an alternative to the targeted search for motifs in prespecified clusters (14). A predicted sigma factor binding site could be assigned to the majority (2935/3242) of promoters identified (Fig. 2B and table S12). The promoters expected to bind SigA formed six clusters reflecting substantial motif diversity (15), and SigA bound in vivo to all six motifs (SOM 11). The promoters recognized by the other sigma factors formed either distinct clusters, each with a distinctive DNA binding motif (SigB, SigD, SigH, SigI, SigK, and SigL), or clusters that grouped several sigma factors that displayed similar DNA binding motifs (Fig. 2B and SOM 12). This de novo classification is consistent with the cross-recognition observed between the sporulation-specific SigE, SigF, and SigG factors (16, 17) and the extracellular function sigma-factors SigM, SigW, SigX, and SigY (18, 19).

The comprehensive definition of regulons allows quantification of the contribution of each sigma factor to transcriptome plasticity in B. subtilis. The global activity of a single motif cluster was estimated as the average transcription signal for all the promoters of the cluster (Fig. 2C). Assuming a simple linear relation between individual promoters and cluster activities, we computed the fraction of the transcriptome variance explained by each cluster motif (SOM 10). Overall, ~66% of the observed variance was attributed to variations in the activity of sigma factors mainly associated with sporulation and responses to stress, processes that are essential for adaptation and survival in nature (Fig. 2D and table S12). The proportion of variation attributed to sigma factor activities varied considerably. The highest values (>0.75) were obtained for SigB, SigD, and the sporulation sigma factors. In contrast, the lowest value (0.24) was obtained for SigA, suggesting that condition-specific regulation of SigA-dependent promoters relies mostly on other transcription factors (TFs). Our classification captured biologically meaningful regulatory information on TF function, despite the multilevel organization of the TF regulatory network (SOM 13) (20). It also provides leads for further functional analyses and biotechnological applications (SOM 14).

asRNAs exhibited diverse condition-dependent expression profiles (Fig. 3). Indeed, asRNAs are more often initiated from non-SigA promoters (48%) than protein-coding RNAs (26%) (SOM 15 and table S14). asRNAs also tend to be expressed in lower amounts than protein-coding RNAs (fig. S19), and expression of sense and asRNA pairs displays mutual exclusion more often than coexpression (47% negative correlation versus 30% positive correlation). The predominance of negative correlation results primarily from the differential expression of sense and antisense transcripts from SigA and non-SigA promoters, respectively (SOM 15 and fig. S20). Transcription extending beyond the boundaries of CDSs often generates asRNAs (Fig. 1B). Importantly, inefficient termination of a subset of TUs is globally counteracted by the action of Rho, consistent with its role in matching transcription to translational needs (2123). Altogether, ~80% of asRNAs result from transcription initiated by alternative sigma factors or from imperfect transcription termination (table S14). We propose that many asRNAs arise from spurious transcriptional events, which are more prevalent when driven by alternative sigma factors. Supporting this hypothesis, the sigma factor binding sites appear to be less evolutionarily conserved in promoters driving potentially spurious asRNAs than in those driving protein-coding RNAs (SOM 15 and fig. S23). Nevertheless, some asRNAs might have important biological functions (12). We detected asRNAs with established or potential functions (table S6), the latter including the asRNAs expressed in amounts sufficient to regulate their target genes (SOM 15 and fig. S19). In conclusion, our quantitative analysis of the condition-dependent transcriptome defined the contribution of the sigma factor complement to B. subtilis transcriptome plasticity. It revealed that asRNAs generated by inefficient control of transcriptional events might be a drawback of this plasticity, though they might contribute to the creation of previously unknown regulatory functions.

Fig. 3

Patterns of natural antisense transcription across conditions. (A) Variation of the number of highly expressed (upper 30%) asRNAs across the conditions (x axis). (B) Heat map representation: asRNAs are ordered as in table S11 and are colored according to their expression levels.

Supporting Online Material

SOM Text (SOM 1 to 15)

Figs. S1 to S23

Tables S1 to S14

References (26102)

Database S1

References and Notes

  1. Acknowledgments: The data are deposited in the Gene Expression Omnibus (GEO) database (accession numbers GSE27219, GSE27303, GSE27419, and GSE27652), and are tabulated in the SOM and at This Web page provides links to the data archived in the GEO and Openbis databases. This study was supported by the European Commission–funded BaSysBio project (LSHG-CT-2006-037469), the SysMO network funded by the German Federal Ministry of Education and Research (0313978B), Department of Education, Science and Training (CG110055), and the National Health and Medical Research Council (455646). We are grateful to N. Pelé, B. Giovani, C. Sautot, and B. Diep for management support. The authors declare no competing interests.
View Abstract

Stay Connected to Science

Navigate This Article