Metagenomic Analysis of Coastal RNA Virus Communities


Science  23 Jun 2006:
Vol. 312, Issue 5781, pp. 1795-1798
DOI: 10.1126/science.1127404


RNA viruses infect marine organisms from bacteria to whales, but RNA virus communities in the sea remain essentially unknown. Reverse-transcribed whole-genome shotgun sequencing was used to characterize the diversity of uncultivated marine RNA virus assemblages. A diverse assemblage of RNA viruses, including a broad group of marine picorna-like viruses, and distant relatives of viruses infecting arthropods and higher plants were found. Communities were dominated by distinct genotypes with small genome sizes, and we completely assembled the genomes of several hitherto undiscovered viruses. Our results show that the oceans are a reservoir of previously unknown RNA viruses.

High mutation rates and short generation times cause RNA viruses to exist as dynamic populations of genetic variants that are capable of using multiple host species (1). In the oceans, the largest ecosystem on Earth, RNA viruses infect ecologically and economically important organisms at all trophic levels, including heterotrophic bacteria (2), fish (3), crustaceans (4), and marine mammals (5). Recently, a series of previously unknown RNA viruses have been characterized that infect marine phytoplankton. These include positive-sense single-stranded (ss) RNA viruses (HaRNAV and HcRNAV) that lyse the toxic-bloom formers Heterosigma akashiwo and Heterocapsa circularisquama (6, 7), a positive-sense ssRNA virus (RsRNAV) that infects the diatom Rhizosolenia setigera (8), and a double-stranded (ds) RNA virus (MpRNAV) with a genome organization similar to reoviruses that infects the cosmopolitan species Micromonas pusilla (9).

Despite the apparent importance of RNA viruses to marine organisms, almost nothing is known about natural communities of RNA viruses in the sea. The most tantalizing evidence that the diversity of RNA viruses in the sea extends well beyond what has been revealed in culture comes from a study that used gene-specific primers to target a subset of picorna-like viruses (10). The work showed that these positive-sense ssRNA viruses are persistent, widespread, and diverse members of marine virus communities.

Cultivation-independent genomic approaches have recently been used to characterize entire microbial (11, 12) and bacteriophage (13, 14) assemblages from a diversity of ecosystems. This approach does not require prior assumptions of the composition of the target community and produces data that can be used to estimate community structure. For this study we used randomly reverse-transcribed whole-genome shotgun sequencing to characterize the diversity of uncultivated marine RNA virus assemblages.

Natural virus communities were concentrated from English Bay at Jericho Pier (JP) and the Strait of Georgia (SOG), British Columbia, Canada (table S1). RNA was extracted from the purified virus fraction, reverse-transcribed into cDNA, and used to construct libraries representative of the natural RNA virus communities (15). Few sequence fragments [37 and 19% for JP and SOG, respectively (Fig. 1)] showed significant similarity [tBLASTx (16) expect value (E) < 0.001] to sequences in the National Center for Biotechnology Information (NCBI) database and no similarity to sequences from the Sargasso Sea microbial metagenome (17). In contrast, ∼90% of Sargasso Sea microbial sequence fragments are notably similar to sequences in the NCBI database (18). These results imply that most RNA viruses in the sea are distantly related to known viruses and that their genetic diversity is much less explored relative to that of the prokaryotic community.
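The significance screen described above (a read counts as matched when its best tBLASTx hit has E < 0.001) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the dictionary layout, and the toy E values are all hypothetical.

```python
from typing import Mapping, Optional

E_VALUE_CUTOFF = 0.001  # threshold stated in the text


def fraction_with_significant_hit(
    best_evalues: Mapping[str, Optional[float]],
    cutoff: float = E_VALUE_CUTOFF,
) -> float:
    """Return the fraction of reads whose best database hit has E < cutoff.

    `best_evalues` maps a read identifier to the E value of its best hit,
    or to None when the search returned no hit at all (assumed layout).
    """
    if not best_evalues:
        raise ValueError("no reads supplied")
    significant = sum(
        1 for e in best_evalues.values() if e is not None and e < cutoff
    )
    return significant / len(best_evalues)


# Hypothetical toy input, not data from the study.
toy_hits = {"read_1": 1e-30, "read_2": 0.5, "read_3": None, "read_4": 2e-4}
print(fraction_with_significant_hit(toy_hits))
```

Applied to a whole library, the same calculation gives the kind of per-library percentage reported for JP and SOG.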

Fig. 1.

Composition of the JP (outer circle, n = 216) and the SOG (inner circle, n = 61) libraries. The top tBLASTx matches of sequences from JP and SOG with the GenBank nonredundant database (E value < 0.001) are categorized by taxonomic group. Virus families or genera are color coded. The Comoviridae, Dicistroviridae, Marnaviridae, and Picornaviridae are families in the proposed order Picornavirales (25). The percent values for each virus group in each library are shown. The identification of the individual viruses from each taxonomic group can be found in table S1.

Sequence similarity (tBLASTx E < 0.001) in our samples revealed that 98% of sequences belonged to positive-sense ssRNA viruses. The one exception was a sequence with similarity to a dsRNA virus. No RNA phage were detected, supporting arguments that most marine phages have DNA genomes (19) and that the predominant hosts of marine RNA viruses are eukaryotes. In addition, no sequences were similar to retroviral or negative-sense ssRNA viruses. Our results are minimum estimates of the richness of marine viral communities, because some viruses may have been excluded by our sampling methods. Nonetheless, we observed sequences resembling those of tombusviruses (20), umbraviruses (21), and nanoviruses (22), all of which are RNA viruses that have not previously been reported from aquatic environments (Fig. 1 and table S2). Most sequences with significant matches to known sequences (77%) were similar to viral genes with known functions, which is not surprising given the limited number of genes encoded by RNA viruses and their relatively small average genome size (Table 1).

Table 1.

Classification of significant tBLASTx matches (E value < 0.001, n = 92) to viral sequences into protein categories.

Protein classification            % of total viral hits
RNA-dependent RNA polymerase      39
Capsid                            33
Unidentified structural           16
Unidentified nonstructural        7
Helicase                          3
RNA binding protein               1
Replication initiator protein     1

The sequence fragments from the two aquatic viral communities were assembled by using a minimum match percentage of 98% and an overlap of 20 base pairs (bp), the most stringent settings given the total introduced error of the RNA virus shotgun library construction method (15). Simulations demonstrated that these parameters correctly reassembled the genomes of different strains of the same species of RNA viruses from a random assortment of sequence fragments. After assembly, 50% of JP and 36% of SOG sequence fragments overlapped with other sequence fragments and formed contiguous segments (contigs) of overlapping sequence fragments. In the JP library, 66% of the overlapping sequence fell within four large contigs, which were subsequently assembled into two complete viral genomes that are similar in structure to each other but that differ from most other known picorna-like viruses (Fig. 2A). In contrast, over 90% of the remaining 14 contigs were formed from three sequence fragments or fewer, indicating that the JP RNA virioplankton is composed of two very abundant genotypes and others that were relatively rare. Similarly, the genotypic composition of the SOG library was also uneven, with 59% of the sequence fragments forming a single contig that contained most of a novel viral genome, including the 3′ untranslated region (UTR), the structural proteins, and all eight conserved regions of the replicase (23) (Fig. 2B). All the remaining sequences fell into contigs composed of two or fewer fragments.
Attempts to quantify the structure and diversity of the two RNA virus communities with Phage Communities from Contig Spectrum (PHACCS) (24), an online tool designed to estimate the diversity of phage communities on the basis of the frequency of overlapping sequence fragments from whole-genome shotgun libraries, failed primarily owing to the disproportionate contribution of sequence fragments from a small number of dominant genotypes to the total number of contigs in both RNA virus libraries. Nevertheless, marine RNA virus communities appear to be dominated by even fewer genotypes than the dsDNA phage communities, which are also quite uneven (14).
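The assembly criterion described above (two fragments are joined only when they overlap by at least 20 bp at 98% identity or better) can be illustrated with a short sketch. It assumes a simple ungapped comparison of the end of one read with the start of the next; real assemblers also handle gaps, reverse complements, and quality scores, and the function name and toy reads here are hypothetical.

```python
def best_overlap(
    left: str,
    right: str,
    min_overlap: int = 20,       # minimum overlap in bp, as stated in the text
    min_identity: float = 0.98,  # minimum fraction of matching positions
) -> int:
    """Return the longest suffix(left)/prefix(right) overlap that satisfies
    both thresholds, or 0 when no acceptable overlap exists.

    Assumption: ungapped, same-strand comparison only.
    """
    longest_possible = min(len(left), len(right))
    for length in range(longest_possible, min_overlap - 1, -1):
        suffix = left[-length:]
        prefix = right[:length]
        matches = sum(a == b for a, b in zip(suffix, prefix))
        if matches / length >= min_identity:
            return length
    return 0


# Hypothetical toy reads sharing a 30-bp core.
core = "ACGTTGCAAGCTTAGGCATCGATCCGATAG"
read_a = "TTTTT" + core
read_b = core + "GGGGG"
print(best_overlap(read_a, read_b))
```

With overlaps shorter than 50 bp, a 98% threshold tolerates no mismatches at all, which is one way to see why these settings are described as stringent.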

Fig. 2.

Comparison of the general genomic organization of the RNA virus genomes assembled from the JP and SOG libraries with representative viruses from the (A) proposed order Picornavirales (25) and the (B) family Tombusviridae and genus Umbravirus. Genomes are shown from 5′ to 3′, where conserved RNA virus protein domains are labeled as Hel for helicase; Pro, protease; RdRp, RNA-dependent RNA polymerase; IGR, intergenic region; MP, movement protein; and An, the presence of a poly(A) tail. The characteristic read-through stop codon of the Tombusviridae replicase (represented by a divided RdRp) and the –1 frame shift of the Umbravirus replicase (represented by a staggered RdRp) are also shown (B). Regions in gray refer to sequences that code for protein of unknown function. The colors adjacent to each virus genome correspond to the colors used in Figs. 1 and 3.

The complete genomes assembled from the JP and SOG genomic libraries do not fall within any of the established families of RNA viruses. The JP genomes appear to be dicistronic single molecules of positive-sense ssRNA about 9 kb in size (Fig. 2A). The JP genomes have characteristics similar to viruses in the proposed order Picornavirales (25), including synteny of putative nonstructural genes, a polyadenylate [poly(A)] tail, a similar G + C content, and core regions of sequence similarity. However, phylogenetic analysis based on aligned RNA-dependent RNA polymerase (RdRp) amino acid sequences placed the JP genomes definitively outside the family Dicistroviridae (Fig. 3A), the only dicistronic family of viruses in the proposed order Picornavirales. Instead, the sequences fell within a well-supported clade that included HaRNAV, RsRNAV, and SssRNAV, suggesting that they share a common ancestry with viruses that infect marine protists (Fig. 3A). Phylogenies based on alignments of RdRp sequences from RNA viruses were congruent with established family assignments (10, 23) and hence provided a means of classifying unknown RNA virus sequences from the environment. Like the JP genomes, the SOG genome appears to be from a positive-sense ssRNA virus. BLASTp searches and phylogenetic analyses (Fig. 3B), as well as genomic features such as a putative polymerase domain interrupted by an in-frame termination codon and the absence of obvious helicase motifs (Fig. 2B) (20), indicated similarity to viruses in the family Tombusviridae and the unassigned Umbravirus genus, which infect flowering plants. However, unlike these viruses, the SOG genome had no detectable movement protein (on the basis of sequence similarity) and is therefore unlikely to be from a virus that infects a terrestrial plant.
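As a rough illustration of how aligned RdRp amino acid sequences feed into a distance-based phylogeny such as neighbor-joining, the sketch below computes pairwise p-distances (the proportion of differing residues) from an alignment. It is not the authors' procedure: the function names are hypothetical, the sequences are invented toy strings, and no correction for multiple substitutions is applied.

```python
from itertools import combinations
from typing import Dict, Tuple


def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing residues between two aligned sequences.

    Columns in which either sequence has a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for x, y in zip(seq_a, seq_b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Pairwise p-distances for every unordered pair of sequence names."""
    return {
        (n1, n2): p_distance(alignment[n1], alignment[n2])
        for n1, n2 in combinations(sorted(alignment), 2)
    }


# Hypothetical toy alignment, not sequences from the study.
toy_alignment = {"seq_a": "GDDLVA", "seq_b": "GDDLIA", "seq_c": "GD-MIA"}
for pair, dist in distance_matrix(toy_alignment).items():
    print(pair, round(dist, 3))
```

A matrix of this kind is the input to a neighbor-joining tree builder; Bayesian methods work from the alignment itself rather than from pairwise distances.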

Fig. 3.

Bayesian maximum likelihood trees of aligned RdRp amino acid sequences from (A) the JP RNA virus community and representative members of the proposed order Picornavirales (25) and from (B) the SOG virus library and representative viruses from the Tombusviridae and Umbravirus genus (see table S3 for complete virus names and sequence accession numbers). Bayesian clade credibility values are shown for relevant nodes in boldface followed by bootstrap values based on neighbor-joining analysis. JP-A, JP-B, and SOG-A are from the assembled environmental genomes. The Bayesian scale bar indicates a distance of 0.1. Environmental sequence numbers followed by a “d” are from excised denaturing gradient gel electrophoresis bands (15).

In the JP sample, 97% of the significant sequence matches were to viruses in the proposed order Picornavirales (25) (Fig. 1 and table S2). Of these, 43% were most similar to HaRNAV, which was first isolated from British Columbia waters (26) and which is the lone genome in the database for a picorna-like, phytoplankton-infecting RNA virus. Although the sequences were divergent from HaRNAV, the results suggest that related viruses were important members of the RNA virus community at the JP site. The second most frequent top scoring matches were to picorna-like virus RdRp sequence fragments amplified from the coastal waters of British Columbia (10), followed by matches to an array of Picornavirales sequences, including viruses infecting higher plants (apple latent spherical virus), arthropods (Taura syndrome virus), and mammals (foot-and-mouth disease virus) (table S2). Nonetheless, the sequences were notably divergent from others in the database, showing that the marine viruses were distantly related to known RNA viruses. One sequence fragment was most similar to a rotavirus sequence, indicating that dsRNA viruses were also likely present, although rare. A significant match (tBLASTx E value = 3 × 10^-20) to the RdRp of Sclerophthora macrospora virus A, an unclassified positive-sense ssRNA mycovirus with a unique genome organization (27), further illustrates the genetic novelty of marine RNA viruses.

In contrast, in the SOG sample, 73% of the significant sequence matches and the largest contig containing the highest number of sequence fragments were similar to sequences from the Tombusviridae (Fig. 1 and table S2). Known members of this family infect higher plants and have positive-sense ssRNA genomes greater than 5.5 kb in size (20). These data suggest that another unknown group of viruses related to the Tombusviridae can dominate the RNA virus community in temperate coastal waters. Also present in the community were sequences similar to viruses in the genera Umbravirus and Nanovirus (Fig. 1 and table S2).

Although the population structures of the SOG and JP assemblages were similar, there was little similarity in community composition. Picorna-like virus RdRp sequences were amplified from the JP site but not the SOG sample (10). tBLASTx searches among sequence fragments from both libraries resulted in 7% of SOG and 8% of JP having significant similarity (E value < 0.001) with each other. Numerous factors may have affected the composition of the JP and SOG virus communities, including salinity [12 parts per thousand (ppt) for JP versus 27 ppt for SOG], interannual variability (JP and SOG samples were collected in 2000 and 2004, respectively), and depth of sampling (JP is a surface sample whereas SOG was collected at 11 m; see table S1 for additional station characteristics). An indirectly shared characteristic between the samples was that 4% of JP and 9% of SOG sequence fragments had significant similarity to the same cripavirus (KBV), although these sequences had no demonstrable homology. Even though the JP and SOG communities were very different, they showed the same pattern of unevenness dominated by a few genotypes, consistent with the “boom or bust” oscillations (28) of virus-host dynamics.
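The unevenness described above can be summarized with two simple quantities: a contig spectrum (how many contigs contain 1, 2, 3, ... fragments) and the share of all fragments captured by the largest contig. The sketch below is only an illustration under stated assumptions; the function names are hypothetical, the counts are invented, and it does not reproduce PHACCS or any model fitting.

```python
from collections import Counter
from typing import Dict, Sequence


def contig_spectrum(contig_sizes: Sequence[int], n_singletons: int = 0) -> Dict[int, int]:
    """Map fragment count per contig to the number of contigs of that size.

    `contig_sizes` lists the fragments in each assembled contig;
    `n_singletons` counts fragments that joined no contig (size 1).
    """
    spectrum = Counter(contig_sizes)
    if n_singletons:
        spectrum[1] += n_singletons
    return dict(sorted(spectrum.items()))


def dominance(contig_sizes: Sequence[int], n_singletons: int = 0) -> float:
    """Fraction of all fragments that fall in the single largest contig."""
    total = sum(contig_sizes) + n_singletons
    if total == 0:
        raise ValueError("library contains no fragments")
    largest = max(contig_sizes, default=1)
    return largest / total


# Hypothetical toy library: four contigs plus six unassembled fragments.
toy_contigs = [12, 3, 2, 2]
toy_singletons = 6
print(contig_spectrum(toy_contigs, toy_singletons))
print(dominance(toy_contigs, toy_singletons))
```

A dominance value near 0.5 or above, as in this toy case, corresponds to the situation in which one genotype supplies most of the assembled sequence.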

Our results demonstrate that marine RNA virus communities are diverse and that their dominant members are distantly related to established groups of viruses. Both Bayesian (29) and neighbor-joining (30) phylogenetic analyses of RdRp sequences strongly supported the occurrence of a distinct clade of marine picorna-like viruses. The only known viruses in this clade are HaRNAV, RsRNAV, and SssRNAV (Fig. 3A), all of which infect marine photosynthetic protists; hence, it seems likely that the environmental sequences were also from viruses that infect phytoplankton. Moreover, the large differences between the two communities show that RNA virus populations can differ greatly from one location to another.

The congruence between RdRp sequences and the established taxonomy of picorna-like viruses (Fig. 3A) suggests that the environmental RdRp sequences likely originate from 10 previously unknown genera of positive-sense ssRNA viruses. A second well-supported clade included two sequences that were related to sequences from viruses in the genus Cripavirus (Fig. 3A), a group of viruses known only to infect arthropods. This suggests these environmental sequences may have originated from viruses that infect marine arthropods. Phylogenetic analyses of RdRp sequences from the SOG library and representative members of the family Tombusviridae and genus Umbravirus indicated that the environmental sequences did not belong within established genera (Fig. 3B) and clearly supported the existence of other marine RNA viruses that are only distantly related to extant taxa.

Our analyses suggest the existence of a diverse group of RNA viruses that includes sequences related to viruses known to infect marine protists. Compared with the intensive sequencing required to characterize marine prokaryotic communities (17), the relatively small genome size of RNA viruses makes the construction of whole-genome shotgun libraries a realistic approach to rapidly survey the diversity of RNA virus communities. In conjunction with the isolation and the sequencing of individual RNA viruses, genomic surveys of RNA virus assemblages are an important step toward a greater understanding of the diversity and ecological impact of these pathogens in the ocean.

Supporting Online Material

Materials and Methods

SOM Text

Tables S1 to S3


References and Notes
