Transcriptome Complexity in a Genome-Reduced Bacterium


Science  27 Nov 2009:
Vol. 326, Issue 5957, pp. 1268-1271
DOI: 10.1126/science.1176951

Simply Mycoplasma

The bacterium Mycoplasma pneumoniae, a human pathogen, has a genome of reduced size and is one of the simplest organisms that can reproduce outside of host cells. As such, it represents an excellent model organism in which to attempt a systems-level understanding of its biological organization. Now three papers provide a comprehensive and quantitative analysis of the proteome, the metabolic network, and the transcriptome of M. pneumoniae (see the Perspective by Ochman and Raghavan). Anticipating what might be possible in the future for more complex organisms, Kühner et al. (p. 1235) combine analysis of protein interactions by mass spectrometry with extensive structural information on M. pneumoniae proteins to reveal how proteins work together as molecular machines and map their organization within the cell by electron tomography. The manageable genome size of M. pneumoniae allowed Yus et al. (p. 1263) to map the metabolic network of the organism manually and validate it experimentally. Analysis of the network aided development of a minimal medium in which the bacterium could be cultured. Finally, Güell et al. (p. 1268) applied state-of-the-art sequencing techniques to reveal that this "simple" organism makes extensive use of noncoding RNAs and has exon- and intron-like structure within transcriptional operons that allows complex gene regulation resembling that of eukaryotes.


To study basic principles of transcriptome organization in bacteria, we analyzed one of the smallest self-replicating organisms, Mycoplasma pneumoniae. We combined strand-specific tiling arrays, complemented by transcriptome sequencing, with more than 252 spotted arrays. We detected 117 previously undescribed, mostly noncoding transcripts, 89 of them in antisense configuration to known genes. We identified 341 operons, of which 139 are polycistronic; almost half of the latter show decaying expression in a staircase-like manner. Under various conditions, operons could be divided into 447 smaller transcriptional units, resulting in many alternative transcripts. Frequent antisense transcripts, alternative transcripts, and multiple regulators per gene imply a highly dynamic transcriptome, more similar to that of eukaryotes than previously thought.

Although large-scale gene expression studies have been reported for various bacteria (1–7), comprehensive strand-specific data sets are still missing, limiting our understanding of operon structure and regulation. Similarly, the number of classified noncoding RNAs in bacteria has recently been expanded (8), but a complete and unbiased repertoire is still not available. To obtain a blueprint of bacterial transcription, we combined the robustness and versatility of spotted arrays [62 independent conditions and 252 array experiments (9)], the superior resolution of strand-specific tiling arrays (Fig. 1A) (designed after genome resequencing, table S1), and the mapping capacity of RNA deep sequencing [direct strand-specific sequencing (DSSS)] (Fig. 1A and fig. S1) to analyze one of the smallest bacteria that can live outside a host cell, Mycoplasma pneumoniae, with 689 annotated protein-coding genes and 44 noncoding RNAs (ncRNAs).

Fig. 1

Transcriptome features under the reference condition. (A) The first operon in the genome on the forward strand shows staircase behavior: consecutive genes have successively lower, but steady, expression levels. (B) Example of an antisense RNA transcript. (C) Analysis of staircase operons. (Left) All reference operons subdivided by the number of protein-coding genes they contain. (Right) All reference operons subdivided by their staircase behavior (see bottom graphs). (D) (Left) Overlap of operon starts and single-gene starts with previously identified –10 promoter sequence motifs in M. pneumoniae (29) and with promoters predicted from hexamers. (Right) Overlap of operon ends and single-gene ends with predicted transcription termination hairpins.
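As a rough illustration of the promoter overlap in panel D, the sketch below scans a sequence for hexamers close to the textbook sigma-70 –10 box (TATAAT) and asks whether one lies just upstream of a transcription start. The consensus, the one-mismatch tolerance and the 20-bp window are assumptions chosen for illustration; they are not the M. pneumoniae motif (29) or the hexamer-based predictor used here, and the function names are hypothetical.

```python
# Hypothetical sketch: scan for sigma-70-like -10 boxes near transcription starts.
# TATAAT is the textbook sigma-70 -10 consensus, used here only as an assumption.
CONSENSUS_MINUS10 = "TATAAT"


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_minus10_boxes(seq: str, consensus: str = CONSENSUS_MINUS10,
                       max_mismatches: int = 1) -> list[tuple[int, str]]:
    """Return (0-based position, hexamer) for every window within max_mismatches of consensus."""
    seq = seq.upper()
    k = len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if hamming(window, consensus) <= max_mismatches:
            hits.append((i, window))
    return hits


def start_has_minus10(seq: str, start: int, upstream: int = 20,
                      max_mismatches: int = 1) -> bool:
    """True if a -10-like hexamer lies entirely within `upstream` bp 5' of `start` (0-based)."""
    region = seq[max(0, start - upstream):start]
    return bool(find_minus10_boxes(region, max_mismatches=max_mismatches))


if __name__ == "__main__":
    toy = "GGGGGTATAATGGGGG"  # toy sequence, not M. pneumoniae DNA
    print(find_minus10_boxes(toy))           # [(5, 'TATAAT')]
    print(start_has_minus10(toy, start=16))  # True
    print(start_has_minus10(toy, start=5))   # False
```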

Considering DSSS under reference conditions (9) and 43 tiling arrays from four time series (growth curve, heat shock, DNA damage, and cell cycle arrest) (table S8), we observed the expression of all genes. Using a segmentation algorithm for the tiling arrays (10), we identified an additional 117 regions with no previous annotation (table S2) (9). These regions were further confirmed by DSSS (Fig. 1B and fig. S1) and, in four cases, by quantitative polymerase chain reaction (table S3). Sequence similarity with known proteins revealed two previously undescribed protein-coding genes, one pseudogene, one N-terminal truncation, and five 5′ extensions of known genes (table S2). The remaining 108 transcripts are probably regulatory rather than structural RNAs, because their predicted secondary structures do not differ substantially from those of coding genes (9). Eighty-nine of them are antisense to previously annotated genes. Of the nonoverlapping ones, two (NEW87 and NEW8) are conserved in M. genitalium and could be involved in DNA replication and repair and in peptide transport, respectively (9) (figs. S3 to S5). In total, 13% of the coding genes are covered by antisense transcripts; this is about twice the fraction in yeast (7%) (11) and about half of that reported for plants (22.2%) (12, 13) or humans (22.6%) (14). Antisense transcripts may affect expression of the overlapping functional sense transcripts through several mechanisms (15): double-stranded RNA-dependent mechanisms require coexpression with their targets (16), whereas transcriptional interference implies mutual exclusion of sense and antisense transcripts (17, 18). In M. pneumoniae, as in mammals (19), we observed a predominance of double-stranded RNA mechanisms (47% of sense-antisense pairs positively correlated versus 2% negatively correlated). In addition, we detected reduced expression of genes targeted by antisense transcripts, as reported in some prokaryotes (9, 18) (fig. S6).
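The two quantities in this paragraph, the fraction of coding genes overlapped by antisense transcripts and the sign of the sense-antisense correlation across conditions, can be sketched as follows. The interval convention (0-based, end-exclusive), the Pearson correlation and the ±0.5 cutoff are illustrative assumptions rather than the criteria used in this study (9), and the function names are hypothetical.

```python
import numpy as np

# Hypothetical sketch; intervals are (start, end, strand), 0-based and end-exclusive.


def overlaps(a, b) -> bool:
    """True if two intervals share at least one base (strand ignored)."""
    return a[0] < b[1] and b[0] < a[1]


def antisense_coverage(genes, transcripts) -> float:
    """Fraction of genes overlapped by at least one transcript on the opposite strand."""
    covered = sum(
        any(overlaps(g, t) and g[2] != t[2] for t in transcripts)
        for g in genes
    )
    return covered / len(genes)


def classify_pair(sense, antisense, cutoff=0.5) -> str:
    """Label a sense/antisense pair by the Pearson correlation of their profiles across conditions.

    Positive correlation fits double-stranded RNA mechanisms (coexpression);
    negative correlation fits transcriptional interference (mutual exclusion).
    """
    r = np.corrcoef(sense, antisense)[0, 1]
    if r >= cutoff:
        return "positive"
    if r <= -cutoff:
        return "negative"
    return "uncorrelated"


if __name__ == "__main__":
    genes = [(0, 100, "+"), (200, 300, "+"), (400, 500, "-")]
    transcripts = [(50, 150, "-"), (450, 480, "-")]  # the second is on its gene's own strand
    print(antisense_coverage(genes, transcripts))     # 1/3 of genes covered
    print(classify_pair([1, 2, 3, 4], [2, 4, 6, 8]))  # positive
    print(classify_pair([1, 2, 3, 4], [4, 3, 2, 1]))  # negative
```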

We identified operon boundaries as sharp changes in transcription level in the reference-condition tiling data, using local convolution methods (Fig. 1A) (9, 20). More than 90% of the operons (139 polycistronic and 202 monocistronic, table S4) were well supported by DSSS reads [DSSS alone was not sufficient to characterize operons unambiguously (fig. S2) (9)]. Most polycistronic operons contain two or three genes (Fig. 1C, fig. S7, and table S4); the largest is the ribosomal operon, with 20 genes. For most operons, we observed a canonical or slightly altered version of a standard sigma 70 promoter region (fig. S8), with transcription starts located within 60 base pairs upstream of the translation start (fig. S9) (6). Contrary to previous suggestions (21), but as proposed by others (22), we observed preferential use of termination hairpins for tight regulation of gene expression (Fig. 1, A and D, and table S5). Moreover, almost half of the polycistronic operons show decaying expression of consecutive genes (Fig. 1A and fig. S1), indicating that such staircase-like expression is a widespread phenomenon in bacteria (9).
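A minimal sketch of both ideas in this paragraph follows, assuming a single strand's log2 tiling intensities ordered by genomic position. Boundaries are taken as local maxima of a step filter (mean of the next w probes minus mean of the previous w probes) implemented with np.convolve; this is one simple local-convolution scheme, not necessarily the method of (20), and the window, threshold and tolerance values are hypothetical. The staircase test asks whether per-gene levels never rise (within a tolerance) and drop clearly at least once.

```python
import numpy as np


def step_scores(log_signal, w):
    """scores[i] = mean(signal[i:i+w]) - mean(signal[i-w:i]); zero where a full window is missing."""
    x = np.asarray(log_signal, dtype=float)
    scores = np.zeros(len(x))
    if len(x) < 2 * w:
        return scores
    kernel = np.r_[np.ones(w), -np.ones(w)] / w
    # np.convolve flips the kernel, so each valid output is (right-window mean) - (left-window mean)
    valid = np.convolve(x, kernel, mode="valid")
    scores[w:len(x) - w + 1] = valid
    return scores


def find_boundaries(log_signal, w=3, threshold=1.0):
    """Probe indices where a step of at least `threshold` (log2 units) begins."""
    s = np.abs(step_scores(log_signal, w))
    hits = []
    for i in range(len(s)):
        lo, hi = max(0, i - w), min(len(s), i + w + 1)
        # keep only the strongest response within +/- w probes
        if s[i] >= threshold and s[i] == s[lo:hi].max():
            hits.append(i)
    return hits


def is_staircase(gene_levels, min_drop=0.5, tol=0.1):
    """True if per-gene log2 levels along an operon never rise by more than `tol`
    and fall by at least `min_drop` between some pair of consecutive genes."""
    diffs = np.diff(np.asarray(gene_levels, dtype=float))
    return bool(np.all(diffs <= tol) and np.any(diffs <= -min_drop))


if __name__ == "__main__":
    toy = [0.0] * 10 + [3.0] * 10 + [1.0] * 10  # background, operon, weaker downstream unit
    print(find_boundaries(toy))           # [10, 20]
    print(is_staircase([5.0, 4.0, 3.2]))  # True
    print(is_staircase([3.0, 4.0, 5.0]))  # False
```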

Analysis of the 43 tiling arrays, integrated with 252 spotted arrays representing 173 independent conditions (some of them from time series), revealed context-dependent modulation of operon structure, involving repression or activation of internal operon genes as well as of genes located at the beginning or end of operons (Fig. 2, A and B, fig. S10, and table S5). In some cases this modulation can be assigned to specific environmental changes. Down-regulation of the first four genes of the ftsZ operon (involved in the initiation of cell division) coincides with entry into stationary phase (Fig. 2B, lower panel). An increase in the expression of arginine fermentation genes (arcA, arcI, and arcC) (Fig. 2B) in stationary phase could be a mechanism to cope with acidification (23). We found formal evidence for a total of 447 transcriptional units (336 monocistronic and 111 polycistronic), implying a high rate of alternative transcripts (42%) in this bacterium under the conditions studied, similar to that in eukaryotes (40%, although still under debate) (24) and archaea (40% in H. salinarum) (25). We found that genes that are split into different suboperons tend to belong to different functional categories (9). Thus, although genome reduction leads to longer operons that accommodate genes with different functions (26), such operons can still retain internal transcription start and termination sites that are used under certain conditions.
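The conditional splitting of operons into suboperons can be illustrated with a deliberately simple rule: walk along the operon and start a new transcriptional unit wherever two adjacent genes are poorly correlated across conditions. This is an assumption-laden stand-in for the pipeline in Fig. 2A and (9); the Pearson correlation, the 0.5 cutoff and the function name are hypothetical.

```python
import numpy as np


def split_operon(gene_ids, expr, min_adjacent_r=0.5):
    """Split an operon into candidate transcriptional units.

    gene_ids: gene names in transcriptional order.
    expr: one row per gene, one column per condition (e.g. log2 expression).
    A new unit starts whenever adjacent genes correlate below min_adjacent_r.
    Constant profiles give NaN correlations, which never trigger a split here.
    """
    expr = np.asarray(expr, dtype=float)
    units = [[gene_ids[0]]]
    for i in range(1, len(gene_ids)):
        r = np.corrcoef(expr[i - 1], expr[i])[0, 1]
        if r < min_adjacent_r:
            units.append([gene_ids[i]])
        else:
            units[-1].append(gene_ids[i])
    return units


if __name__ == "__main__":
    genes = ["g1", "g2", "g3", "g4"]  # hypothetical operon
    expr = [[1, 2, 3, 4],
            [2, 4, 6, 8],
            [4, 3, 2, 1],
            [8, 6, 4, 2]]
    print(split_operon(genes, expr))  # [['g1', 'g2'], ['g3', 'g4']]
```

Each returned unit is one candidate suboperon; a real pipeline would also need to decide under which conditions a split is supported.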

Fig. 2

Alternative operon structure. The continuous lines in (A) and (B) indicate expression levels measured with tiling arrays. (A) Pipeline for discovering alternative transcripts (9). Reference operon 001 is split into three suboperons. (Top) Tiling and DSSS data under reference conditions. (Middle) Specific expression changes for genes dnaA and xdj1, involved in DNA repair and replication. (Bottom) Coexpression matrices corresponding to the final conditional operon splitting based on 252 arrays. (B) Two examples of conditional operons. (Top) Specific induction of the middle genes of operon 126 when the cells reach stationary phase. (Bottom) Repression of the first four genes of operon 129, involved in cell division, when the cells reach stationary phase. (C) Example of heat shock–induced genes sharing the known CIRCE element. The calculated consensus sequence is shown below.
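The consensus in panel C is derived from aligned instances of the element. A minimal column-wise majority rule, assuming the sites are already aligned and of equal length, is sketched below; the 0.5 frequency cutoff, the use of "N" for ambiguous columns and the function name are illustrative choices, and the procedure actually used for the figure may differ.

```python
from collections import Counter


def consensus(sites, min_freq=0.5):
    """Column-wise majority consensus of equal-length aligned sites.

    A column gets its most common base if that base reaches min_freq of the sites,
    otherwise 'N'.
    """
    if not sites or len({len(s) for s in sites}) != 1:
        raise ValueError("sites must be non-empty and of equal length")
    out = []
    for column in zip(*(s.upper() for s in sites)):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count / len(column) >= min_freq else "N")
    return "".join(out)


if __name__ == "__main__":
    print(consensus(["TTAGCA", "TTAGCA", "TAAGCT", "TTCGCA"]))  # TTAGCA
    print(consensus(["AAT", "ACT", "AGT"]))                     # ANT
```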

The high frequency of alternative transcripts of M. pneumoniae genes hints at a situation similar to that in eukaryotes, where many factors contribute to the regulation of gene expression. To explore this hypothesis further, we clustered gene expression across the 62 distinct conditions (table S7) to identify groups of coexpressed genes and their possible common regulatory motifs. Using a correlation cutoff of 0.65, we identified 94 coexpression groups (table S6 and fig. S11) encompassing 416 genes. Thirty of the clusters contained genes from more than two operons. Of these, 14 share a specific sequence motif in their upstream regions and another 8 share a specific combination of motifs (fig. S12), which might drive the coexpression (for example, 4 of the 14 motifs are found at splitting sites inside operons). The remaining genes did not group together, implying multiple, complex levels of regulation orchestrated by the various environmental conditions. This is exemplified by the five heat shock–induced genes containing a regulatory CIRCE (controlling inverted repeat of chaperone expression) element (27) (Fig. 2C). Not all of them clustered together, indicating that at least one other regulatory element is involved. Similarly, overexpression of a transcription factor (Fur, ferric uptake regulator) revealed a common motif in all genes that changed expression significantly, although they belong to different coexpression clusters (fig. S13 and table S6).
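One way to turn a correlation cutoff such as 0.65 into coexpression groups is single-linkage grouping: link every pair of genes whose profiles correlate at or above the cutoff and report the connected components. The sketch below assumes this rule and Pearson correlation purely for illustration; the clustering actually used for the 94 groups is described in (9) and may differ, and the function name is hypothetical.

```python
from collections import deque

import numpy as np


def coexpression_groups(gene_ids, expr, cutoff=0.65, min_size=2):
    """Connected components of the graph linking genes with Pearson r >= cutoff.

    expr: one row per gene, one column per condition.
    Groups smaller than min_size (e.g. unlinked genes) are dropped.
    """
    r = np.corrcoef(np.asarray(expr, dtype=float))
    n = len(gene_ids)
    seen = [False] * n
    groups = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        members = []
        while queue:  # breadth-first search over above-cutoff links
            i = queue.popleft()
            members.append(gene_ids[i])
            for j in range(n):
                if not seen[j] and r[i, j] >= cutoff:
                    seen[j] = True
                    queue.append(j)
        if len(members) >= min_size:
            groups.append(members)
    return groups


if __name__ == "__main__":
    genes = ["g1", "g2", "g3", "g4", "g5"]  # hypothetical profiles over four conditions
    expr = [[1, 2, 3, 4], [1, 2, 3, 5], [4, 3, 2, 1], [4, 3, 1, 1], [1, 4, 1, 4]]
    print(coexpression_groups(genes, expr))  # [['g1', 'g2'], ['g3', 'g4']]
```

With single linkage, two genes can share a group without exceeding the cutoff with each other, as long as a chain of sufficiently correlated genes connects them; stricter schemes such as complete linkage would give tighter, smaller groups.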

Our work revealed an unanticipated complexity in the transcriptome of a genome-reduced bacterium. This complexity cannot be explained by the eight predicted transcription factors alone (26). Furthermore, the fact that proteome organization cannot be explained by genome organization (28) indicates the existence of other regulatory processes. The surprisingly frequent expression heterogeneity within operons, the changes in operon structure that produce alternative transcripts in response to environmental perturbations, and the frequency of antisense RNAs, which might explain some of these expression changes, suggest that transcriptional regulation in bacteria resembles that of eukaryotes more than previously thought.

Supporting Online Material

Materials and Methods

SOM Text

Figs. S1 to S14

Tables S1 to S8


* Present address: Center for Genomic Medicine, Graduate School of Medicine, University of Kyoto, Kyoto, Japan.

References and Notes

  1. Materials and methods are available as supporting material on Science Online.
  2. We thank the Genomics core facility at EMBL (Heidelberg, Germany), J. Lozano for help with statistical analysis, and the Ultrasequencing Unit at CRG. This work was funded by the Foundation Marcelino Botín, the Ministry of Education of Spain (MEC)–Consolider, and the European Research Council. M.G. is funded by the Spanish MEC–Formación Profesorado Universitario. V.N. is funded by the Netherlands Organization for Scientific Research.