Stop codon reassignments in the wild


Science, 23 May 2014
Vol. 344, Issue 6186, pp. 909-913
DOI: 10.1126/science.1250691


The canonical genetic code is assumed to be deeply conserved across all domains of life with very few exceptions. By scanning 5.6 trillion base pairs of metagenomic data for stop codon reassignment events, we detected recoding in a substantial fraction of the >1700 environmental samples examined. We observed extensive reassignment of opal and amber stop codons in bacteriophages and of opal stop codons in bacteria. Our data indicate that bacteriophages can infect hosts with a different genetic code and demonstrate phage-host antagonism based on code differences. The abundance and diversity of genetic codes present in environmental organisms should be considered in the design of engineered organisms with altered genetic codes in order to preclude the exchange of genetic information with naturally occurring species.

In translation, sometimes stop can mean go

The genetic code appears to be largely conserved across all domains of life, and only limited deviations have been reported. Ivanova et al. used metagenomics to survey the prevalence of stop codon reassignment in naturally occurring microbial populations. Certain bacteria and bacteriophages exhibited lineage-specific recoding of their stop codons. In one specific phage, the genome was restructured into two sets of genes. One set was encoded in a way that was incompatible with the host genome and probably helped with infection. A second, more host-compatible set encoded proteins expressed in the later stages of infection.

Science, this issue p. 909

Since the discovery of the genetic code and protein translation mechanisms (1), a limited number of variations of the standard assignment between unique base triplets (codons) and their encoded amino acids and translational stop signals have been found in bacteria and phages (2–7). Given the apparent ubiquity of the canonical genetic code, the design of genomically recoded organisms with noncanonical codes has been suggested as a means to prevent horizontal gene transfer between laboratory and environmental organisms (8–10). It is also predicted that genomically recoded organisms are immune to infection by viruses, under the assumption that phages and their hosts must share a common genetic code (6). This paradigm is supported by the observation of increased resistance of genomically recoded bacteria to phages with a canonical code (9). Despite these assumptions and accompanying lines of evidence, it remains unclear whether differential and noncanonical codon usage represents an absolute barrier to phage infection and genetic exchange between organisms.

Our knowledge of the diversity of genetic codes and their use by viruses and their hosts is primarily derived from the analysis of cultivated organisms. This is due to our limited access to genome sequences from uncultivated organisms, which are estimated to account for 99% of prokaryotes (11). Advances in single-cell sequencing and metagenome assembly technologies have enabled the reconstruction of genomes of uncultivated bacterial and archaeal lineages (12–14) and the discovery of a previously unknown reassignment of TGA opal stop codons to glycine (4, 5, 14). These initial findings suggest that large-scale systematic studies of uncultivated microorganisms and viruses may reveal the extent and modes of divergence from the canonical genetic code operating in nature.

To explore alternative genetic codes, we carried out a systematic analysis of stop codon reassignments from the canonical TAG amber, TGA opal, and TAA ochre codons in assembled metagenomes and metatranscriptomes from environmental and host-associated samples, single-cell genomes of uncultivated bacteria and archaea, and a collection of viral sequences (Fig. 1A) (15). All sequence data were obtained from the Integrated Microbial Genomes (IMG) database (16). This global collection of sequences comprised 1776 samples from 145 studies, including 750 samples obtained from 17 human body sites (fig. S1) (17). In total, 5.6 terabases of sequence data, including 450 Gb of contiguous sequences (contigs) greater than 1 kb, were analyzed. All samples were classified into human-associated, other host–associated, soil, marine, or freshwater environments according to their metadata (15, 18).

Fig. 1 Recoded DNA sequences identified worldwide.

(A) Workflow used to identify contigs that contain stop codon reassignment. (B) Map showing the locations of 82 environmental samples around the globe together with nine sample sites (derived from 212 samples) of the human body for which recoded sequences have been identified.

As our detection statistic, we used the increase in coding potential under alternative genetic codes, as calculated by the ab initio gene finder Prodigal (19), which was selected for its low rate of false-positive predictions (15). Contigs showing significantly higher coding potential when annotated with modified translation tables were passed to filtering and quality control, which confirmed stop codon reassignment through multiple sequence alignments to known homologs from the National Center for Biotechnology Information protein database (Fig. 1A) (15).
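The core idea, that a reassigned stop codon shows up as extra coding potential once it is read as a sense codon, can be illustrated with a minimal sketch. Prodigal's actual scoring is considerably more sophisticated; the sketch below substitutes a crude, hypothetical proxy (bases covered by long stop-free reading frames across all six frames), and the function names and the `min_codons` cutoff are illustrative assumptions, not part of the published pipeline.

```python
from typing import FrozenSet

STANDARD_STOPS: FrozenSet[str] = frozenset({"TAA", "TAG", "TGA"})
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def long_orf_bases(contig: str, stops: FrozenSet[str], min_codons: int = 100) -> int:
    """Count bases lying in stop-free runs of at least `min_codons` codons,
    summed over all six reading frames (a crude stand-in for coding potential)."""
    contig = contig.upper()
    covered = 0
    for strand in (contig, revcomp(contig)):
        for frame in range(3):
            run = 0
            for i in range(frame, len(strand) - 2, 3):
                if strand[i:i + 3] in stops:
                    if run >= min_codons:
                        covered += 3 * run
                    run = 0
                else:
                    run += 1
            if run >= min_codons:  # run that reaches the end of the contig
                covered += 3 * run
    return covered


def reassignment_gain(contig: str, reassigned: str, min_codons: int = 100) -> float:
    """Extra long-ORF coverage per contig base when `reassigned` is read as a sense codon."""
    if not contig:
        return 0.0
    base = long_orf_bases(contig, STANDARD_STOPS, min_codons)
    alt = long_orf_bases(contig, STANDARD_STOPS - {reassigned}, min_codons)
    return (alt - base) / len(contig)


# Toy contig: a single in-frame amber codon splits an otherwise long reading frame.
contig = "GCT" * 50 + "TAG" + "GCT" * 50
print(reassignment_gain(contig, "TAG", min_codons=60))  # 1.0
print(reassignment_gain(contig, "TGA", min_codons=60))  # 0.0
```

Only the amber hypothesis increases coding potential for this toy contig. In practice such a raw gain would be only a first filter, which is why candidate contigs were then checked against protein homologs.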

By applying this approach to 450 Gb of contigs larger than 1 kb, we identified 31,415 contigs with evidence of stop codon reassignment, adding up to a total of 198 Mb of recoded DNA (Fig. 1A). No recoding was observed in the metatranscriptome data. Varying ratios of reassigned to total contigs were observed in samples from terrestrial and aquatic environments and from human mouth, throat, and stool microbiomes (Fig. 1B and fig. S1). The greatest reassignment ratio was in a groundwater sample from a sulfidic aquifer, where 10.4% of all the assembled contigs displayed evidence that one of the three stop codons had been reassigned (table S2). High ratios of contig recoding were also detected in human oral microbiome samples (table S2).

Reassignment of all three stop codons was found, but with different preferences by domain and habitat (Fig. 2). We observed distinct patterns of stop codon reassignment in the three domains of life, with bacteria showing only opal reassignments, ochre reassignments restricted to eukaryotes, and archaea devoid of codon reassignments (15). Among viruses, we found both amber and opal reassignments. These observations are restricted to DNA viruses because of the scarcity of sequence information for RNA viruses of bacteria and archaea (Fig. 2) (15). Metagenomes of human body sites showed a high rate of reassignments compared with most other sampling sites. Only 10% of all contigs examined in this study originated from human body sites, but they represented 51% of all contigs with codon reassignments. The majority of the remaining stop codon reassignments were found in freshwater environments (44%), representing 13% (56.0 Gb) of all examined metagenomes. In contrast, marine samples contributed only 4% of recoded sequence, although they represent 48% (211.6 Gb) of the total data set (15). This suggests that codon reassignments are more abundant in freshwater than in marine samples.
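The habitat comparison above amounts to an enrichment ratio: a habitat's share of recoded sequence divided by its share of the examined data. The short calculation below uses only the percentages quoted in the text; note the caveat that the human figures are shares of contigs while the freshwater and marine figures are shares of sequence data, so the ratios are indicative rather than strictly comparable. The `SHARES` table and `enrichment` helper are illustrative names.

```python
# Percent shares quoted in the text. Caveat: the human figures compare shares of
# contigs, whereas the freshwater and marine figures compare shares of sequence data.
SHARES = {
    "human body sites": (10, 51),  # (% of examined data, % of recoded data)
    "freshwater": (13, 44),
    "marine": (48, 4),
}


def enrichment(pct_of_data: float, pct_of_recoded: float) -> float:
    """Fold enrichment of a habitat among recoded sequences (>1 = over-represented)."""
    return pct_of_recoded / pct_of_data


for habitat, (data_pct, recoded_pct) in SHARES.items():
    print(f"{habitat}: {enrichment(data_pct, recoded_pct):.2f}x")
# human body sites: 5.10x
# freshwater: 3.38x
# marine: 0.08x
```

Freshwater samples are thus roughly 40-fold more enriched for recoding than marine samples on this crude measure (3.38 / 0.083).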

Fig. 2 Stop codon reassignment by taxonomy and habitat.

Relative abundance of amber, ochre, and opal stop codon reassignments among bacteria, eukaryotes, and viruses in metagenomes of different habitats. For the sake of clarity, contig sets with less than 1 Mb total length for each combination of domain, stop codon, and habitat were excluded; see fig. S5, A and B (15).

Among bacteria, previous reports of recoding were restricted to reassignment of the opal stop codon (3–5, 20). Despite our extensive sampling of bacterial sequences, we likewise observed reassignment exclusively for opal codons. Opal reassignments to Trp have been previously observed in Mollicutes (20) and Candidatus Hodgkinia cicadicola (3), and opal reassignments to Gly have been observed in uncultivated representatives of candidate phyla SR1 and Gracilibacteria (4, 5). Our extensive survey suggests that opal reassignment in bacteria is likely limited to the same specific lineages (Fig. 3). The multiple SR1 and Gracilibacteria sequences enabled us to explore the evolutionary origin of stop codon reassignment in these closely related, uncultivated bacterial lineages. A maximum likelihood phylogenetic tree revealed that opal reassignment occurred in the last common ancestor of these sister lineages after its separation from the Peregrines (PERs) group and before the divergence of SR1 and Gracilibacteria (Fig. 3). The same phylogenetic analysis performed for the opal reassigned members of the class Mollicutes indicates a single reassignment event within the last common ancestor of the Mycoplasmatales and Entomoplasmatales.
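Why a sister relationship between the recoded lineages implies a single reassignment event can be illustrated with Fitch's small-parsimony algorithm. This is a swapped-in illustration, not the maximum likelihood approach used for Fig. 3, and the four-tip topology below is a hypothetical, heavily simplified stand-in for the real tree.

```python
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch small-parsimony pass on a rooted binary tree given as nested tuples.
    Returns the candidate state set at this node and the minimum number of changes below it."""
    if isinstance(tree, str):  # leaf
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical simplified topology: SR1 and Gracilibacteria as sisters, then PER, then an outgroup.
states = {"SR1": "TGA=Gly", "Gracilibacteria": "TGA=Gly", "PER": "TGA=stop", "Outgroup": "TGA=stop"}
sister_tree = ((("SR1", "Gracilibacteria"), "PER"), "Outgroup")
print(fitch(sister_tree, states))  # ({'TGA=stop'}, 1)

# Contrast: if the recoded lineages were not sisters, two independent changes would be needed.
split_tree = ((("SR1", "PER"), "Gracilibacteria"), "Outgroup")
print(fitch(split_tree, states)[1])  # 2
```

With the sister topology, the root retains the canonical stop meaning and one change on the branch leading to SR1 plus Gracilibacteria suffices, matching the single event inferred above.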

Fig. 3 A maximum likelihood phylogenetic tree of bacterial stop codon reassigned sequences, based on concatenated alignments of protein-coding marker genes

The arrow at the root of the tree points to the outgroup (Terrabacteria). The tree shows the recoded taxonomic groups Mycoplasmatales and Entomoplasmatales (opal to Trp), SR1 (opal to Gly), and Gracilibacteria (opal to Gly) along with non-recoded reference phyla. The highly reduced alpha-proteobacterial Candidatus Hodgkinia cicadicola genome was not included. The red circles denote two reassignment events. PVC, Planctomycetes, Verrucomicrobia, and Chlamydiae; FCB, Fibrobacteres, Chlorobi, and Bacteroidetes. Sequences published in (4, 5, 14).

Although the average GC content of the entire data set was 55%, recoded bacterial sequences had an average GC content of 32%, consistent with previous studies (21). In recoded contigs, we observed a shift to low-GC synonymous codons and/or a shift to low-GC nonsynonymous codons for amino acids with similar chemical properties, supporting the hypothesis that changes in the genomic GC content correlate with and may drive reassignment of stop codons (figs. S3 and S4). In addition, low-GC organisms used the ochre stop codon (TAA) to a higher extent (84% in organisms with GC content <32% versus 41% in organisms with GC content >64%) than amber (TAG) and opal (TGA). In extreme cases of low GC content, nearly all genes terminate in an ochre stop codon.
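The two per-organism quantities compared above, genomic GC content and the fraction of genes ending in each stop codon, are straightforward to compute. The minimal sketch below assumes genes are supplied as in-frame coding sequences ending in their stop codon; the toy gene list is hypothetical, and binning organisms by GC (for example, below 32% versus above 64%) would simply apply `stop_usage` to each bin.

```python
from collections import Counter
from typing import Dict, Iterable

STOPS = ("TAA", "TAG", "TGA")


def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def stop_usage(genes: Iterable[str]) -> Dict[str, float]:
    """Fraction of genes terminating in each canonical stop codon.
    Genes whose last codon is not a canonical stop are ignored."""
    counts = Counter(g.upper()[-3:] for g in genes)
    total = sum(counts[c] for c in STOPS)
    return {c: (counts[c] / total if total else 0.0) for c in STOPS}


# Hypothetical toy genes from a low-GC genome.
genes = ["ATGAAATTTTAA", "ATGAATTAA", "ATGAAATGA", "ATGTTATAA"]
print(stop_usage(genes))  # {'TAA': 0.75, 'TAG': 0.0, 'TGA': 0.25}
```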

Although our pipeline for alternative genetic code detection was initially developed for prokaryotic genes, it can also identify eukaryotic sequences with reassigned stop codons. In agreement with previous reports (2), among eukaryotic sequences we observed opal reassignment in mitochondrial sequences and ochre and amber reassignment in nuclear sequences. Nuclear sequences with stop codon reassignments appear to belong to representatives of Ciliophora (table S3), and ochre reassignments were found exclusively in freshwater samples (Fig. 2).

We identified 19 complete and nearly complete DNA phages with amber stop codon reassignments in 177 out of 784 human microbiome samples (15). Previous reports of alternative genetic codes in DNA viruses are limited to opal reassignment to Trp in Mycoplasma phages (7, 22). In our study, we identified two phages with opal reassignment to Gly, a phage with amber reassignment to Ser, and 14 phages with amber reassignment to Gln (table S8), a code previously observed only in nuclear genes of eukaryotes (23). These reassignments are supported by protein alignments (figs. S6 to S8), as well as by the presence of amber-recognizing Gln-tRNACUA and Ser-tRNACUA in the phage contigs (figs. S9 and S10). We infer from the genome structure of recoded DNA phages that they belong to the order Caudovirales. None of the amber-reassigned phage sequences was embedded into recognizable bacterial sequences, and no integrases were detected in the phage genomes, suggesting that these DNA phages are lytic.
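The practical effect of a reassigned code on gene annotation can be seen by translating the same open reading frame under different codes. The sketch below builds the standard code from its conventional 64-letter amino acid string and derives amber-to-Gln and opal-to-Gly variants; the names `STANDARD`, `AMBER_GLN`, `OPAL_GLY`, and `variant_code` are illustrative, and only unambiguous A/C/G/T codons are handled.

```python
from typing import Dict

BASES = "TCAG"
# Standard code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD: Dict[str, str] = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def variant_code(reassignments: Dict[str, str]) -> Dict[str, str]:
    """Copy of the standard code with selected stop codons given amino acid meanings."""
    code = dict(STANDARD)
    code.update(reassignments)
    return code


AMBER_GLN = variant_code({"TAG": "Q"})  # amber read as glutamine
OPAL_GLY = variant_code({"TGA": "G"})   # opal read as glycine


def translate(orf: str, code: Dict[str, str]) -> str:
    """Translate codon by codon until the first codon the code treats as a stop ('*')."""
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = code[orf[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


orf = "ATGAAATAGCTGTAA"
print(translate(orf, STANDARD))   # MK (truncated at the amber codon)
print(translate(orf, AMBER_GLN))  # MKQL
print(translate(orf, OPAL_GLY))   # MK
```

Under the standard code, an amber-recoded phage gene appears prematurely truncated, which is exactly the signal that protein alignments to homologs were used to confirm.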

Because phages largely depend on the translational machinery of their hosts, it has been suggested that they must use the same genetic code (6, 9). Evidence supporting the matching usage of genetic codes between an opal-reassigned phage and its host was obtained by looking for footprints of phage infections in phage-derived spacers of the CRISPR (clustered regularly interspaced short palindromic repeat) adaptive immune system of bacteria (24). Out of 26 unique spacers found on CRISPR-harboring contigs with opal reassignment, two spacers had an exact full-length match in the sequence of opal-recoded phages (table S9). Alignment of protein-coding genes on both contigs confirmed that they have opal to Gly reassignment.
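The spacer test described above reduces to asking whether a spacer occurs as an exact, full-length substring of a phage contig on either strand. The sketch below implements only that exact-match criterion; the contig and spacer identifiers and sequences are hypothetical and much shorter than real spacers (33 to 37 bp in this study).

```python
from typing import Dict, List, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def spacer_hits(spacers: Dict[str, str], contigs: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (spacer_id, contig_id, strand) for every spacer found as an exact,
    full-length substring of a contig on the forward (+) or reverse (-) strand."""
    hits = []
    for sid, spacer in spacers.items():
        spacer = spacer.upper()
        rc = revcomp(spacer)
        for cid, seq in contigs.items():
            seq = seq.upper()
            if spacer in seq:
                hits.append((sid, cid, "+"))
            if rc != spacer and rc in seq:  # avoid double-counting palindromes
                hits.append((sid, cid, "-"))
    return hits


phage_contigs = {"c1": "GGGGATGCCGTAGGGG"}
spacers = {"s1": "ATGCCGTA", "s2": "GCATCC", "s3": "TTTTTT"}
print(spacer_hits(spacers, phage_contigs))  # [('s1', 'c1', '+'), ('s2', 'c1', '-')]
```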

The observation of amber reassignments in phages raises questions about the genetic code of their target hosts, given the apparent absence of amber-recoded bacterial genomes from environments in which amber-recoded phages were present (Fig. 2). This points to the possibility that genetic code differences between phages and hosts do not constitute an obligate barrier to phage infection. By analyzing 29,017 spacers found in CRISPR elements from 553 human oral and stool samples, we identified five spacers (each 33 to 37 bp long) that were identical to sequences from three different amber-recoded phage genomes (table S9). The contigs containing the CRISPR spacers also included bacterial genes with the full complement of canonical stop codons. The identified bacterial genes were nearly identical to genes from two Prevotella strains that were isolated from human airways and subgingival plaque and shown to have a standard genetic code. These data suggest that amber-reassigned phages can infect hosts with different genetic codes, in this case the standard code.

To gain further insight into mechanisms that may enable amber-recoded phages to infect hosts with different genetic codes, we examined the assembled genomes of amber-recoded phages. In several of these phage genomes, we identified genes for peptide chain release factor 2 (RF-2), which terminates translation at ochre and opal stop codons. Sequenced phage isolates lack genes for release factors and apparently rely on host-encoded release factors. The presence of RF-2 in a phage genome suggests that the phage may infect a host lacking RF-2, a hallmark of opal-reassigned bacterial genomes (3, 25). Consistent with this possibility, the human oral cavity environments where amber-recoded RF-2–containing phages were detected lack amber-recoded bacteria but are enriched for opal-recoded bacteria. A further atypical feature noted in the genome of one of these phages (phage 2) is a bimodal pattern of amber reassignment across the genome (Fig. 4A). Initial annotation of this phage genome suggests that it is a lytic phage broadly related to T4 (fig. S11), in which amber has been reassigned to code for Gln.
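A bimodal amber pattern of this kind can be quantified gene by gene as the share of glutamine positions encoded by TAG rather than by the canonical CAA/CAG codons. The minimal sketch below assumes amber has been reassigned to Gln and labels each gene as high-amber (HA) or low-amber (LA); the 0.2 cutoff and the gene names are hypothetical, and the published domains are contiguous genome regions rather than per-gene labels.

```python
from typing import Dict

GLN_SENSE = ("CAA", "CAG")


def amber_share(orf: str) -> float:
    """Share of Gln-encoding codons that are TAG, assuming amber has been reassigned to Gln.
    The terminal codon is excluded so that a TAG used as a true stop is not counted."""
    codons = [orf[i:i + 3].upper() for i in range(0, len(orf) - 2, 3)][:-1]
    tag = codons.count("TAG")
    gln = tag + sum(codons.count(c) for c in GLN_SENSE)
    return tag / gln if gln else 0.0


def label_genes(genes: Dict[str, str], threshold: float = 0.2) -> Dict[str, str]:
    """Label each gene 'HA' (high amber) or 'LA' (low amber); the cutoff is hypothetical."""
    return {name: "HA" if amber_share(seq) >= threshold else "LA" for name, seq in genes.items()}


genes = {
    "early_gene": "ATG" + "CAG" * 4 + "TAA",                     # Gln only via CAG
    "late_gene": "ATG" + "TAG" + "CAG" + "TAG" + "CAA" + "TAA",  # half of Gln sites use TAG
}
print(label_genes(genes))  # {'early_gene': 'LA', 'late_gene': 'HA'}
```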

Fig. 4 Phage infections across genetic code boundaries.

(A) Genome of phage 2. The phage genome is broadly divided into two domains with strong bias in codon utilization as well as strand preference. (B) Model of infection of opal-recoded hosts by amber>Gln-recoded phages.

This phage also contains a noncanonical Gln-tRNACUA, but closer examination of amber distribution across its genome reveals two large domains with distinct gene content and codon usage. The low-amber (LA) domain contains genes often found in early-stage phage infection, such as DNA polymerase. The LA domain also contains the RF-2 gene required for normal translation of amber-recoded genes. Open reading frames in the LA domain are almost entirely devoid of in-frame amber codons and instead rely nearly exclusively on canonical glutamine codons to encode glutamine (Fig. 4A). In contrast, the high-amber (HA) domain, with frequent in-frame amber codons, contains genes often associated with late stages of phage infection, such as packaging and assembly components (e.g., predicted tail fiber protein, minor tail protein, tail tape measure protein, or tail-associated lysozyme) (Fig. 4A). This distinct codon usage, combined with the presence of RF-2 and a Gln-tRNACUA in the amber-recoded phage, suggests that the amber-recoded phage actively interferes with the translation of its presumed opal-recoded host through a proposed mode of phage-host antagonism (Fig. 4B). In this model, upon initial phage infection, abundant host-derived RF-1 (the release factor that terminates peptide chain elongation at amber codons) interferes with the translation of amber-containing phage HA domain genes, so they are initially not expressed. In contrast, critical amber-free phage LA domain genes can be translated normally. Next, expression of phage-derived RF-2 increasingly interferes with translation of opal-recoded host genes. Last, the simultaneous depletion of host-derived RF-1 and the increasing availability of phage-derived Gln-tRNACUA enable the efficient production of assembly and packaging proteins from the phage HA domain.
Although direct in vivo observations of such processes remain to be established, this evidence supports a mechanism of phage-host antagonism in which the host’s viability is undermined by the phage through the targeted codon-based disruption of the translation of the host’s genetic code.
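The staged model can be made concrete with a qualitative sketch of what happens at a stop codon, given which release factors and stop-reading tRNAs are present. It uses the release factor specificities generally described for bacteria (RF-1 reads TAA and TAG; RF-2 reads TAA and TGA), assumes an opal-to-Gly host for illustration, and ignores factor abundances, so "competition" only marks codons where termination and decoding both compete; all names are illustrative.

```python
from typing import Dict, Set

# Release factor specificities as generally described for bacteria.
RF_SPECIFICITY: Dict[str, Set[str]] = {"RF1": {"TAA", "TAG"}, "RF2": {"TAA", "TGA"}}


def stop_outcome(codon: str, release_factors: Set[str], trnas: Dict[str, str]) -> str:
    """Qualitative fate of a ribosome at `codon` given the release factors present and
    any tRNAs that decode stop codons (codon -> amino acid). Abundances are ignored."""
    terminates = any(codon in RF_SPECIFICITY[rf] for rf in release_factors)
    decoded_as = trnas.get(codon)
    if terminates and decoded_as:
        return f"competition (stop vs {decoded_as})"
    if terminates:
        return "terminate"
    if decoded_as:
        return f"read as {decoded_as}"
    return "stall"


host_trnas = {"TGA": "Gly"}  # assumption: opal>Gly host lacking RF-2
stages = {
    "early": ({"RF1"}, host_trnas),
    "middle": ({"RF1", "RF2"}, {**host_trnas, "TAG": "Gln"}),
    "late": ({"RF2"}, {**host_trnas, "TAG": "Gln"}),  # host RF-1 depleted
}
for stage, (rfs, trnas) in stages.items():
    print(stage, "| phage TAG:", stop_outcome("TAG", rfs, trnas),
          "| host TGA:", stop_outcome("TGA", rfs, trnas))
```

Early on, HA-domain amber codons terminate while host opal codons are read as Gly; once phage RF-2 appears, host opal codons face termination; and after RF-1 depletion, phage amber codons are read as Gln.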

This survey of environmental sequence data revealed the abundance and diversity of stop codon reassignments in prokaryotes and phages. Several lines of evidence suggest that phages are not obligated to adapt to the codon usage of their hosts and that phages can exploit differences in codon usage to manipulate their hosts. Recently, genomically recoded organisms were created in an attempt to isolate the organism's genetic information from horizontal transfer to natural organisms and viruses (9). The diversity and abundance of recoding among uncultured environmental microbes and their phages suggest that even synthetic genomically recoded organisms (9) may not be immune to the exchange of genetic information with microbes and phages that populate many ecosystems.

Supplementary Materials

Materials and Methods

Supplementary Text

Figs. S1 to S12

Tables S1 to S11

References (26–53)

References and Notes

  Materials and methods are available as supplementary materials on Science Online.
  Acknowledgments: We thank the DOE JGI production sequencing, IMG, and Genomes OnLine Database teams for their support and J. Kim, A. Tadmor, and A. Nord for reviewing the manuscript. The work conducted by the DOE JGI was supported in part by the Office of Science of DOE under contract DE-AC02-05CH11231. Supporting data can be accessed through and can be downloaded from