Global Transposon Mutagenesis and a Minimal Mycoplasma Genome

Science  10 Dec 1999:
Vol. 286, Issue 5447, pp. 2165-2169
DOI: 10.1126/science.286.5447.2165


Mycoplasma genitalium with 517 genes has the smallest gene complement of any independently replicating cell so far identified. Global transposon mutagenesis was used to identify nonessential genes in an effort to learn whether the naturally occurring gene complement is a true minimal genome under laboratory growth conditions. The positions of 2209 transposon insertions in the completely sequenced genomes of M. genitalium and its close relative M. pneumoniae were determined by sequencing across the junction of the transposon and the genomic DNA. These junctions defined 1354 distinct sites of insertion that were not lethal. The analysis suggests that 265 to 350 of the 480 protein-coding genes of M. genitalium are essential under laboratory growth conditions, including about 100 genes of unknown function.

One important question posed by the availability of complete genomic sequences (1–3) is how many genes are essential for cellular life. We are now in a position to approach this problem by rephrasing the question “What is life?” in genomic terms: “What is a minimal set of essential cellular genes?”

Interest in the minimal cellular genome predates genome sequencing [for a review, see (4)]. The smallest known cellular genome (5) is that of Mycoplasma genitalium, which is only 580 kb. This genome has been completely sequenced, and analysis of the sequence revealed 480 protein-coding genes plus 37 genes for RNA species (2).

The fraction of nonminimal genomes that is essential for cell growth and division has been experimentally measured in yeast (12%) and in the bacterium Bacillus subtilis (9%) (6). The indispensable portion of the B. subtilis genome was estimated to be 562 kb, close to the size of the M. genitalium genome. Theoretical approaches to defining a minimal gene set have also been attempted. With the availability of the first two complete genome sequences (Haemophilus influenzae and M. genitalium) and the assumption that genes conserved across large phylogenetic distances are likely to be essential, a minimal gene set of 256 genes was proposed (7).

Mycoplasma pneumoniae is the closest known relative of M. genitalium, with a genome size of 816 kb, 236 kb larger than that of M. genitalium (3). Comparison of the two genomes indicates that M. pneumoniae includes orthologs of virtually every one of the 480 M. genitalium protein-coding genes, plus an additional 197 genes (8). There is a substantial evolutionary distance between orthologous genes in the two species, which share an average of only 65% amino acid sequence identity. The existence of these two species with overlapping gene content provided an experimental paradigm to test whether the 480 protein-coding genes shared between the species were already close to a minimal gene set. We applied transposon mutagenesis to these completely sequenced genomes, which permitted precise localization of insertion sites with respect to each of the coding sequences.

Populations of 200 to 1000 viable mycoplasmas harboring independent transposon insertions were produced, and libraries of DNA fragments containing the junctions between the transposon and the chromosome were prepared and sequenced (9) (Table 1). Analysis of 2209 transposon junction fragments yielded 1354 different insertion sites. This data set is divided approximately equally between the two organisms. A total of 71% of the insertions were within genes in M. genitalium versus 61% in M. pneumoniae. This represents a substantial preference for intergenic insertion, because coding sequence constitutes 85% of the M. genitalium genome and 89% of the M. pneumoniae genome, and is consistent with the idea that intergenic sequences are less critical than protein-coding regions for viability. Transposon insertions have been identified in 140 different genes in M. genitalium and 179 different genes in M. pneumoniae.
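The size of this intergenic preference can be illustrated with a short calculation. The following is a minimal sketch in Python, assuming that insertions would otherwise land uniformly along the genome (an assumption made here for illustration, not a model stated in the report); the function name is hypothetical.

```python
def intergenic_enrichment(frac_hits_in_genes, frac_genome_coding):
    """Observed share of intergenic insertions divided by the share
    expected if insertions were spread uniformly over the genome
    (the uniformity baseline is an illustrative assumption)."""
    observed_intergenic = 1.0 - frac_hits_in_genes
    expected_intergenic = 1.0 - frac_genome_coding
    return observed_intergenic / expected_intergenic


# Percentages quoted in the text.
mg_enrichment = intergenic_enrichment(0.71, 0.85)  # M. genitalium
mp_enrichment = intergenic_enrichment(0.61, 0.89)  # M. pneumoniae

print(f"M. genitalium: {mg_enrichment:.1f}-fold")
print(f"M. pneumoniae: {mp_enrichment:.1f}-fold")
```

Under this baseline, intergenic DNA receives roughly two to three and a half times its expected share of viable insertions.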

Table 1

Summary of sequenced viable transposon insertion sites.

The preference for insertion into the species-specific portion of the M. pneumoniae genome was striking (Fig. 1). The average density of distinct viable insertion events observed in M. pneumoniae–specific regions (1.8 hits/kb) was about 5.5 times that found in the portion common to both species (0.33 hits/kb). This result supports our assumption that the M. pneumoniae–specific portion of the genome is fully dispensable. In addition to the species-specific insertions in M. pneumoniae, insertions were observed widely distributed throughout the shared regions of both genomes. The conspicuous absence of transposon insertions into certain regions expected to be essential (for example, the region containing a cluster of ribosomal genes, MP637 to MP668; Fig. 1) provides additional support for the validity of transposon mutagenesis as an assay for dispensability. A paucity of hits in other genes involved in transcription, translation, and DNA metabolism was also apparent.

Figure 1

Viable transposon insertions displayed on a composite M. pneumoniae–M. genitalium map. The M. pneumoniae genome is shown at a scale of 30 kb per line. Colored arrows above the line indicate annotated genes. Genes are colored according to their functional category, as indicated in the key. The genes are numbered sequentially from 1 to 677 as listed on Richard Herrmann's Web page ( html) and are referred to in the text as MP001 to MP677. Red triangles below the line indicate positions of transposon insertions documented in M. genitalium, mapped onto the M. pneumoniae genome (20). Red triangles above the line indicate positions of transposon insertions documented in M. pneumoniae. Regions of the genome that are M. pneumoniae–specific (absent from M. genitalium) are highlighted in pink directly on the line (20).

Not every transposon insertion within a gene is expected to disrupt gene function. An insertion near the 3′ end of a gene may only remove a nonessential COOH-terminus of the protein. Similarly, an insertion near the 5′ end of a gene may not always destroy gene function. Transposon Tn4001 (10) contains an outward-directed promoter that could drive transcription of flanking chromosomal DNA (11), leading to translation if an internal start site is located nearby downstream. For the purposes of cataloging potentially dispensable genes, we have considered an insertion disruptive if it is within the 5′-most 80% of the gene but downstream of nucleotide 9 of the protein-coding region. This criterion eliminates events in which the 5′ end of the gene may actually be intact because of duplication of a short sequence at the target site (10), and eliminates potentially nondisruptive COOH-terminal insertions. The fraction of putatively disruptive insertions within M. genitalium genes is 66%, compared with 84% for M. pneumoniae (Table 1). This difference can be attributed to a higher proportion of nonessential genes in the M. pneumoniae genome. The majority of M. genitalium orthologs that have disruptive insertions are absent from the third fully sequenced mycoplasma genome, Ureaplasma urealyticum, consistent with the idea that they are not essential.
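The disruption criterion described above can be written as a small predicate. This is a minimal sketch in Python, not the authors' software: the function name, the 1-based inclusive coordinate convention, the strand encoding, and the treatment of the exact 80% boundary as disruptive are all assumptions made here for illustration.

```python
def is_putatively_disruptive(insertion_pos, gene_start, gene_end, strand):
    """Return True if an insertion lies within the 5'-most 80% of a gene
    but downstream of nucleotide 9 of the coding region.

    Assumed conventions (hypothetical, for illustration only):
    - coordinates are 1-based and inclusive, with gene_start <= gene_end;
    - strand is '+' or '-', giving the direction of the coding sequence.
    """
    gene_length = gene_end - gene_start + 1
    if strand == "+":
        offset = insertion_pos - gene_start + 1  # distance from the 5' end
    elif strand == "-":
        offset = gene_end - insertion_pos + 1
    else:
        raise ValueError("strand must be '+' or '-'")

    if offset < 1 or offset > gene_length:
        return False  # insertion is outside this gene

    downstream_of_nt9 = offset > 9
    # Integer form of offset <= 0.8 * gene_length, avoiding float rounding.
    within_first_80_percent = offset * 5 <= gene_length * 4
    return downstream_of_nt9 and within_first_80_percent
```

For a hypothetical 1000-nucleotide gene on the forward strand, an insertion 5 nucleotides into the coding region or 950 nucleotides into it would not count as disruptive, whereas one at nucleotide 500 would.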

One approach to estimating the total number of nonessential M. genitalium genes is to determine the number of dispensable orthologs of these genes in M. pneumoniae (see Table 2). We obtain similar estimates of nonessential M. genitalium orthologs whether we use data from all genes disrupted in M. pneumoniae (∼121 orthologs) or just from those independently disrupted more than once (∼108 orthologs). Using the pooled data from both species (12), the total number of M. genitalium orthologs in which presumptively disruptive insertions are observed is 129, close to the estimates of the total number obtained from the M. pneumoniae data.

Table 2

Estimating the total number of dispensable M. genitalium orthologs in M. pneumoniae. Among the 150 putative gene disruptions in M. pneumoniae, 57 are in genes that have orthologs in the M. genitalium genome, and 93 are in M. pneumoniae–specific genes. We have identified disruptive insertions in 47% (93/197) of the M. pneumoniae–specific genes. It is then a reasonable assumption that 47% of the dispensable M. pneumoniae genes common to both genomes have also been disrupted. This leads to an estimate that ∼121 M. genitalium orthologs are dispensable and ∼318 genes in total (121 + 197) are dispensable in M. pneumoniae. An analogous calculation using only those genes that have been hit more than once leads to an estimate of ∼108 dispensable M. genitalium orthologs.
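The extrapolation in this legend is a simple proportion, sketched below in Python. The function and variable names are hypothetical; the numbers are those quoted in the legend, and the key assumption (stated in the legend itself) is that dispensable shared genes were sampled at the same rate as the M. pneumoniae–specific genes.

```python
def estimate_dispensable_orthologs(hits_in_shared_genes,
                                   hits_in_specific_genes,
                                   n_specific_genes):
    """Scale the number of disrupted shared genes by the fraction of
    (presumed fully dispensable) species-specific genes that were hit."""
    sampling_fraction = hits_in_specific_genes / n_specific_genes
    return hits_in_shared_genes / sampling_fraction


# Values from the legend: 57 disrupted shared genes, 93 of 197 specific genes hit.
dispensable_orthologs = estimate_dispensable_orthologs(57, 93, 197)
total_dispensable_in_mp = dispensable_orthologs + 197

print(round(dispensable_orthologs))    # about 121
print(round(total_dispensable_in_mp))  # about 318
```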

We have also estimated the number of nonessential genes to be ∼180 to ∼215 under the assumption that the number of sites hit per gene follows a Poisson distribution. These larger estimates fit reasonably with the observed proportion of orthologs hit in both species. Therefore, on the basis of our highest and lowest estimates for nonessential genes, we estimate that the number of essential mycoplasma protein-coding genes is between 265 and 350.
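The report does not spell out how the Poisson estimate was computed. One common way to carry out such an estimate, shown here only as an illustrative sketch, is to fit a zero-truncated Poisson distribution to the number of hits per disrupted gene and then infer how many dispensable genes received zero hits. The function names and the example counts below are hypothetical and are not data from the study.

```python
import math


def truncated_poisson_rate(mean_hits_among_hit_genes, tol=1e-12):
    """Solve m = lam / (1 - exp(-lam)) for lam by bisection.

    m is the mean number of hits among genes hit at least once;
    a positive solution exists only when m > 1.
    """
    m = mean_hits_among_hit_genes
    if m <= 1.0:
        raise ValueError("mean hits per hit gene must exceed 1")
    lo, hi = 1e-9, m  # the solution always lies below m
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        # -expm1(-x) equals 1 - exp(-x), computed accurately for small x
        if mid / -math.expm1(-mid) < m:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def estimate_total_dispensable(hits_per_hit_gene):
    """Estimate the number of dispensable genes, including those never hit."""
    n_hit = len(hits_per_hit_gene)
    mean_hits = sum(hits_per_hit_gene) / n_hit
    lam = truncated_poisson_rate(mean_hits)
    prob_hit_at_least_once = -math.expm1(-lam)
    return n_hit / prob_hit_at_least_once


# Hypothetical counts (NOT from the study): 60 genes hit once,
# 30 genes hit twice, 10 genes hit three times.
example_counts = [1] * 60 + [2] * 30 + [3] * 10
example_total = estimate_total_dispensable(example_counts)

# Range quoted in the text: 480 protein-coding genes minus the highest
# nonessential estimate (about 215) gives the lower bound of 265.
lower_bound_essential = 480 - 215
```

The estimate is sensitive to the assumption that every dispensable gene is hit at the same rate, which gene length and insertion hot spots can violate, so the output of such a sketch should be read as a rough guide only.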

The 351 M. genitalium orthologs for which we have not yet identified a disruptive insertion constitute a first approximation to the true set of essential mycoplasma genes. From our estimate of the total number of essential genes (265/351 = ∼3/4), we predict that at least 3/4 of the undisrupted genes are essential. We also expect that most undisrupted genes within each functional class represent essential genes. Examination of the gene disruption data, organized by functional role, reveals that not all functional classes of genes are equally mutable under the selective growth conditions used in this study, which suggests that the genes are closer to a minimal set for some cellular functions than for others (12).

The portion of the mycoplasma genome dedicated to coding lipoproteins is relatively large and suggests that this class of membrane proteins is important to the cell. Among the 19 genes encoding putative lipoproteins, we have identified potential disruptions in 13. There are several plausible interpretations of these seemingly incongruent facts, but perhaps the most likely is that the importance of these proteins is limited to fulfilling essential functions in the human host. This idea is substantiated by the occurrence of disruptions of several genes [M. genitalium/M. pneumoniae orthologs MG191 (MP014), MG192 (MP013), MG218 (MP527), and MG317 (MP388)] that are involved either directly or indirectly in mediating adherence to host cells (13).

The large group of genes with no functional assignment includes many genes with no known homologs outside of the mycoplasmas. As expected, many genes in this group have been disrupted (69 of 180). However, most of the 111 undisrupted genes of unknown function are apparently not dispensable and are expected to encode essential cellular functions.

Relatively few M. genitalium genes have functions related to biosynthesis and metabolism. This limited metabolic capacity has been compensated for by a proportionately greater dependence on transport of raw materials from the extracellular environment. Although the ability to generate nucleotides by salvage pathways has been retained, as have a limited number of biosynthetic and metabolic enzymes, it is evident that these pathways are nonessential in the laboratory, where the organism is apparently able to import nucleosides, amino acids, and other metabolites. Likewise, genes involved in the biosynthesis of cofactors [MG270 (MP450)], fatty acid and phospholipid metabolism [MG310 (MP395)], and hexose conversion [MG118 (MP577)] appear to be dispensable.

Our data strongly support the idea that some metabolic pathways are essential. Glycolysis is thought to be the major source of adenosine triphosphate (ATP) and energy for M. genitalium and M. pneumoniae. We have not observed any disruptive insertions in any of the 10 genes involved in this pathway. Likewise, we have not identified any dispensable genes among the eight genes encoding ATP–proton-motive force interconversion activities.

ABC transporters are heterotrimeric transport systems, each made up of a specificity (ligand-binding) subunit, a permease, and an ATP-binding protein. ATP-binding subunits are distinct in that many appear to be “orphan” proteins, which are apparently overrepresented, compared to the other two subunits, in all genomes sequenced thus far. Likewise, there are specificity subunits with unknown partners within the M. genitalium genome, although their occurrence appears to be much more limited. Analysis of the M. genitalium genomic sequence data with less stringent searching parameters aimed at finding partners of the orphan specificity subunits led to the identification of potential transport partners (14). Because the sequence relatedness of these transporters was quite low, M. genitalium was thought to compensate for a reduced transporter spectrum by encoding transporters with broadened specificity (14). The current annotation of the M. genitalium genome lists 12 “orphan” ATP-binding proteins. We have obtained central insertions in only three of these genes [MG014 (MP136), MG390 (MP271), and MG467 (MP159)]. The fact that only 25% of the ATP-binding subunits in our data set tolerate insertions suggests that at least some of these orphan subunits do serve an essential function within the cell.

Two of the three subunits of an ABC phosphate transporter [MG410 (MP233) and MG411 (MP232)] have been putatively disrupted. Phosphate transport is thought to be an essential function. The insertion data for these two genes appear to be quite solid, in that MG410 (MP233) insertions were uncovered in both M. genitalium and M. pneumoniae and MG411 (MP232) contained multiple independent insertion events in M. genitalium. This finding forces us to consider the possibility that some as yet undefined transport system exists in these mycoplasmas that can compensate for mutations in the putative phosphate transporter.

Both Mycoplasma species examined in this study include two genes homologous to DNA pol III subunits [MG261 (MP460) and MG031 (MP120)]. We have found an insertion into MG261 (MP460), which supports the idea that it may function as a repair enzyme (3, 15) rather than being the main replication enzyme in the cell. The MG261 (MP460) insertion, together with insertions in the recA and uvrA excision repair genes, represents a special class of dispensable cellular functions. It is not necessarily surprising that cells can tolerate transposon insertions in these genes. It is almost certain that cells bearing such gene disruptions in nature would be quickly selected against. Although it is difficult to address this idea quantitatively, it poses a relevant question for consideration when attempting to define a minimal gene set for cellular life.

As expected, the number of transposon insertions recovered in genes involved in transcription was small. We have identified apparent disruptions in two of the five genes annotated as putative ATP-dependent RNA helicases [MG017 (MP134) and MG308 (MP397)]. The level of functional redundancy within this cellular role is somewhat unexpected and may reflect a broader role for these genes with regard to replication, repair, or transcriptional regulation.

A few insertions were in genes generally believed to be essential (12). Such events represent less than 1% of the total number of mapped insertion sites (1354). They include single putatively disruptive insertion events in two aminoacyl tRNA synthetase genes, the gene for ribosomal protein L28, the DNA replication genes dnaA and gidB, and a sigma factor gene. These unexpected findings forced us to consider explanations other than the dispensability of a function presumed to be essential. Functional assignments for some of these genes on the basis of sequence similarity may be incorrect. Also, some events that meet the criterion used in our analysis may not disrupt gene function. It is highly improbable that these events were recovered by cloning transposon junctions from nonviable cells. However, some cells might contain a functional duplicate copy of a gene in addition to the disrupted gene. It may also be that some functions can be supplied by unexpected uptake of enzymes or other compounds from the medium, or by cross-feeding. Conclusive proof of the dispensability of any specific gene requires cloning and detailed characterization of a pure population carrying the disrupted gene.

It is possible in some cases to verify the disruption of a specific gene function despite the presence of a transposon insertion mutant within a mixed population. For fructose-permease, insertions in both the M. genitalium gene (MG062) and the orthologous M. pneumoniae gene (MP077) were identified. We were able to detect the sequenced insertion events, using the polymerase chain reaction (PCR), in DNA from populations of cells grown in glucose-supplemented medium, but not when the medium was instead supplemented with fructose (Fig. 2).

Figure 2

Analysis of conditionally dispensable genes. PCR was done using one transposon-specific primer directed toward the chromosomal junction, and a gene-specific primer priming toward the particular insertion site to be detected (21). (A) Control experiment in which an insertion in a gene of unknown function [MG296 (MP417)] was detected in the same DNAs used in (B). (B) PCR primers were designed to detect the presence of an insertion in MG062 in pool C of M. genitalium Tn4001 transformants. (C) PCR primers were designed to detect an insertion in the M. pneumoniae ortholog of MG062 (MP077) in pool E of M. pneumoniae Tn4001 transformants. Lane labels G (glucose) and F (fructose) indicate the sugar used to supplement the growth medium (21).

The power of global transposon mutagenesis in our case benefited greatly from application to the fully sequenced and annotated genomes of the closely related M. genitalium and M. pneumoniae. Although M. genitalium may possess close to the minimal genome required for survival in its human host, it is clear from the results presented here that it contains a large number of genes that are dispensable under laboratory growth conditions. Our results imply that of the 111 genes of unknown function that have not been disrupted in our experiments, the majority are essential. The presence of so many genes of unknown function among the essential genes of the simplest known cell suggests that the basic molecular mechanisms underlying cellular life may not yet all have been described. The essential gene set is not the same as the minimal genome. It is clear that genes that are individually dispensable may not be simultaneously dispensable. The data presented here suggest some specific experiments that could be carried out as a first step in the engineering of a cell with a minimal genome in the laboratory environment. One way to identify a minimal gene set for self-replicating life would be to create and test a cassette-based artificial chromosome, an experiment pending ethical review (16).

  • * These authors contributed equally to this report.

  • To whom reprint requests should be addressed.

  • Present address: Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
