Research Article

The 1.2-Megabase Genome Sequence of Mimivirus


Science  19 Nov 2004:
Vol. 306, Issue 5700, pp. 1344-1350
DOI: 10.1126/science.1101485

Abstract

We recently reported the discovery and preliminary characterization of Mimivirus, the largest known virus, with a 400-nanometer particle size comparable to mycoplasma. Mimivirus is a double-stranded DNA virus growing in amoebae. We now present its 1,181,404–base pair genome sequence, consisting of 1262 putative open reading frames, 10% of which exhibit a similarity to proteins of known functions. In addition to exceptional genome size, Mimivirus exhibits many features that distinguish it from other nucleocytoplasmic large DNA viruses. The most unexpected is the presence of numerous genes encoding central protein-translation components, including four amino-acyl transfer RNA synthetases, peptide release factor 1, translation elongation factor EF-TU, and translation initiation factor 1. The genome also exhibits six tRNAs. Other notable features include the presence of both type I and type II topoisomerases, components of all DNA repair pathways, many polysaccharide synthesis enzymes, and one intein-containing gene. The size and complexity of the Mimivirus genome challenge the established frontier between viruses and parasitic cellular organisms. This new sequence data might help shed a new light on the origin of DNA viruses and their role in the early evolution of eukaryotes.

Mimivirus, the sole member of the newly proposed Mimiviridae family of nucleocytoplasmic large DNA viruses (NCLDVs), was recently isolated from amoebae growing in the water of a cooling tower of a hospital in Bradford, England, in the context of a pneumonia outbreak (1). The study of Mimivirus grown in Acanthamoeba polyphaga revealed a mature particle with the characteristic morphology of an icosahedral capsid with a diameter of at least 400 nm. Such a virion size, comparable to that of a mycoplasma cell, makes Mimivirus the largest virus identified so far. A phylogenetic study with preliminary sequence data from a handful of conserved viral genes tentatively classified Mimivirus in a new independent branch of NCLDVs (1). The sequencing of the genome of Mimivirus was undertaken to determine its complete gene content, to predict some of its physiology, to confirm its phylogenetic position among known viruses, and to gain insight into the origin of NCLDVs.

Overall Genome Structure

The Mimivirus genome (Fig. 1) was assembled (2) into a contiguous linear sequence of 1,181,404 base pairs (bp), significantly larger than our initial conservative estimate of 800 kbp (1). The size and linear structure of the genome were confirmed by restriction digests and pulsed-field gel electrophoresis. Two inverted repeats of about 900 nucleotides are found near both extremities of the assembled sequence, suggesting that the Mimivirus genome might adopt a circular topology as a result of their annealing, as in some other NCLDVs. From transmission electron microscopy pictures, we estimated the volume of the dark central core of the virion (approximated as a sphere) at about 2.6 × 10⁻²¹ m³, which is 3.7 times as large as the core volume of Paramecium bursaria chlorella virus (PBCV-1) (3). This is quite consistent with the respective genome sizes (1180/331 kb = 3.56) of the two viruses, indicating similar physical constraints for DNA packing (i.e., a core DNA concentration of about 450 mg/ml).
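The packing arithmetic above can be checked with a short calculation. This is only a sketch: the average mass of one base pair (650 g/mol) is an assumed textbook value, not a figure from the article, and the function names are illustrative.

```python
import math

AVOGADRO = 6.022e23           # molecules per mole
BP_MASS_G_PER_MOL = 650.0     # ASSUMPTION: average mass of one base pair, not from the article

def sphere_radius_nm(volume_m3):
    """Radius (in nm) of a sphere with the given volume (in m^3)."""
    return (3.0 * volume_m3 / (4.0 * math.pi)) ** (1.0 / 3.0) * 1e9

def dna_concentration_mg_per_ml(genome_bp, core_volume_m3):
    """DNA mass per unit core volume, in mg/ml."""
    mass_g = genome_bp * BP_MASS_G_PER_MOL / AVOGADRO
    volume_ml = core_volume_m3 * 1e6      # 1 m^3 = 1e6 ml
    return mass_g / volume_ml * 1e3       # g/ml -> mg/ml

# Values quoted in the text
mimi_genome_bp = 1_181_404
mimi_core_volume_m3 = 2.6e-21
genome_ratio = 1180 / 331                 # Mimivirus / PBCV-1 genome sizes (kb)
volume_ratio = 3.7                        # Mimivirus / PBCV-1 core volumes

core_radius_nm = sphere_radius_nm(mimi_core_volume_m3)
concentration = dna_concentration_mg_per_ml(mimi_genome_bp, mimi_core_volume_m3)

print(f"core radius        : {core_radius_nm:.0f} nm")
print(f"genome size ratio  : {genome_ratio:.2f} (core volume ratio {volume_ratio})")
print(f"core DNA density   : {concentration:.0f} mg/ml")
```

With the assumed 650 g/mol per base pair the estimate comes out near 490 mg/ml; an assumed mass of about 600 g/mol brings it close to the 450 mg/ml quoted above, so the result should be read as an order-of-magnitude check.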

Fig. 1.

Map of the Mimivirus chromosome. The predicted protein coding sequences are shown on both strands and colored according to the function category of their matching COG. Genes with no COG match are shown in gray. Abbreviations for the COG functional categories are as follows: E, amino acid transport and metabolism; F, nucleotide transport and metabolism; J, translation; K, transcription; L, replication, recombination, and repair; M, cell wall/membrane biogenesis; N, cell motility; O, posttranslational modification, protein turnover, and chaperones; Q, secondary metabolites biosynthesis, transport, and catabolism; R, general function prediction only; S, function unknown. Small red arrows indicate the location and orientation of tRNAs. The A+C excess profile is shown on the innermost circle, exhibiting a peak around position 380,000 (2) (fig. S1).

The nucleotide composition was 72.0% A+T. The genome exhibited a significant strand asymmetry. Both the cumulative A+C excess and the cumulative gene excess plots (2) (fig. S1) exhibit a slope reversal (around position 400,000, Fig. 1) as found in bacterial genomes and usually associated with the location of the origin of replication. Mimivirus genes are preferentially transcribed away from this putative origin of replication. Despite this local asymmetry, the total numbers of genes transcribed from either strand are similar [450 “R” versus 461 “L” open reading frames (ORFs)]. Repeated sequences represented less than 2.2% of the Mimivirus genome (2).
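A cumulative excess plot of this kind can be sketched in a few lines. The exact definition used for fig. S1 is given in the supporting material (2); the version below assumes the simplest convention (+1 for each A or C, -1 for each G or T, summed along one strand) and uses a toy sequence, so it illustrates the idea and does not reproduce the published profile.

```python
def cumulative_ac_excess(seq):
    """Running sum along the sequence: +1 for A or C, -1 for G or T.

    ASSUMPTION: this simple convention stands in for the article's own
    definition, which is described in its supporting material.
    """
    total = 0
    profile = []
    for base in seq.upper():
        if base in "AC":
            total += 1
        elif base in "GT":
            total -= 1
        profile.append(total)   # other symbols (e.g., N) leave the sum unchanged
    return profile

def slope_reversal_position(profile):
    """0-based index where the profile is farthest from zero (first such index)."""
    return max(range(len(profile)), key=lambda i: abs(profile[i]))

# Toy sequence: A+C-rich first half, G+T-rich second half
toy_seq = "ACAC" * 5 + "GTGT" * 5
toy_profile = cumulative_ac_excess(toy_seq)
toy_peak = slope_reversal_position(toy_profile)
print(toy_peak, toy_profile[toy_peak])
```

On a real genome the profile is noisy, so the extremum is usually located after smoothing or by inspecting the plot. The same running-sum approach applies to the cumulative gene excess by scoring genes on one strand as +1 and genes on the other as -1.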

We identified a total of 1262 putative ORFs of length ≥100 amino acid residues, corresponding to a theoretical coding density of 90.5%. Of these ORFs, 911 were predicted to be protein-coding genes, based on their statistical coding propensity and/or their similarity to database sequences. The remaining ORFs have been downgraded to the unidentified reading frame category. We were able to associate 298 ORFs with functional attributes (2).

The overall amino acid composition of the predicted Mimivirus proteome exhibits a strong positive bias for residues encoded by codons rich in A+T. For instance, isoleucine (9.87%), asparagine (8.89%), and tyrosine (5.43%) are twice as frequent in Mimivirus as in amoeba or human proteins. Alanine (encoded by A+T-poor codons GCN) is half as frequent (3.06%) as in the other two organisms. Similar variations have been observed in the amino acid compositions of other DNA viruses rich in A+T (4). For any given amino acid, the relative usage of synonymous codons is also biased by the A+T-rich genome composition. For instance, ATT is largely dominant for Ile, as is AAT for Asn and TAT for Tyr. In contrast, GCG is rarely used for Ala, CGG is rarely used for Arg, and GGG and GGC are rarely used for Gly. The codon usage in Mimivirus is almost the exact opposite of the one exhibited by Acanthamoeba castellanii: The least frequent codon in the amoeba is systematically the dominant one for Mimivirus. The codon usage in human genes also differs from the one in Mimivirus but to a lesser extent because of the more even vertebrate codon distribution.
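The relative usage of synonymous codons described above can be computed as follows. This is a minimal sketch: the coding sequence is a made-up toy string, not Mimivirus data, and only four amino acids from the standard genetic code are included for brevity.

```python
from collections import Counter

# Synonymous codon sets from the standard genetic code (subset for illustration)
SYNONYMOUS = {
    "Ile": ["ATT", "ATC", "ATA"],
    "Asn": ["AAT", "AAC"],
    "Tyr": ["TAT", "TAC"],
    "Ala": ["GCT", "GCC", "GCA", "GCG"],
}

def codon_counts(cds):
    """Count in-frame codons of a coding sequence, ignoring a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))

def relative_usage(counts, amino_acid):
    """Fraction of each synonymous codon among all codons for one amino acid."""
    codons = SYNONYMOUS[amino_acid]
    total = sum(counts[c] for c in codons)
    if total == 0:
        return {c: 0.0 for c in codons}
    return {c: counts[c] / total for c in codons}

# HYPOTHETICAL toy sequence: ATT ATT ATC AAT AAT AAC TAT GCT
toy_cds = "ATTATTATCAATAATAACTATGCT"
toy_counts = codon_counts(toy_cds)

for aa in SYNONYMOUS:
    usage = relative_usage(toy_counts, aa)
    dominant = max(usage, key=usage.get)
    print(aa, dominant, {c: round(f, 2) for c, f in usage.items()})
```

Running the same functions over the concatenated coding sequences of a virus and of its host gives two usage tables that can be compared codon by codon, which is the kind of comparison summarized in the paragraph above.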

NCLDV Core Genes Identified in the Mimivirus Genome

Iyer et al. (5) identified a set of genes present in all or most members of the four main NCLDV families: Poxviridae, Phycodnaviridae, Asfarviridae, and Iridoviridae. These core genes are subdivided into four classes, from the most to least evolutionarily conserved: class I genes are found in all known NCLDV genome sequences; class II genes are found in all NCLDV clades but are missing in some species; class III genes are identified in three out of the four NCLDV clades; and class IV genes are found in two clades only (5). The pattern of presence and absence of class I, II, and III core genes in Mimivirus is summarized in Table 1. We identified homologs for all (9 out of 9) class I genes, 6 out of 8 class II genes, 11 out of 14 class III genes, and 16 out of 30 class IV genes (2) (table S2). Both class II genes that are missing in Mimivirus are relevant to the biosynthesis of 2′-deoxythymidine 5′-triphosphate (dTTP): thymidylate kinase and 2′-deoxyuridine 5′-triphosphate pyrophosphatase (dUTPase), a paradox given its A+T-rich genome. Ectocarpus siliculosus virus (ESV) also lacks these enzymes. However, Mimivirus exhibits homologs for the class IV core genes thymidylate synthase and thymidine kinase. Additional nucleotide synthesis enzymes include deoxynucleoside kinase (DNK) and cytidine deaminase, as well as the first nucleoside diphosphate kinase (NDK) identified in a double-stranded DNA (dsDNA) virus. Mimivirus also lacks an adenosine 5′-triphosphate (ATP)–dependent DNA ligase (a class III core gene), which was apparently replaced by a nicotinamide adenine dinucleotide (NAD)–dependent DNA ligase (class IV), as found in Iridoviruses (5). With the exception of RNA polymerase subunit 10, the Mimivirus genome exhibits the same transcription-related core genes as found in Poxviridae and Asfarviridae. This suggests that the transcription of at least some Mimivirus genes occurs in the cytoplasm.
Overall, the pattern of presence and absence of core genes (class II to IV) in Mimivirus is unlike any of the established patterns. This confirms our initial suggestion (1) that Mimivirus constitutes the first representative of a new distinct NCLDV class (the “Mimiviridae”).

Table 1.

NCLDV core genes (classes I, II, and III) identified in Mimivirus. Black squares, best matching homologs; X, significant homolog detected in all available genomes; x, homolog not detected in all available genomes; sub., subunit.

ORF no. Phycodnaviridae Poxviridae Iridoviridae Asfarviridae Gene group Definition/putative function
L206 X X X I Helicase III / VV D5-type ATPase
R322 X X X I DNA polymerase (B family)
L437 X X X I VV A32 virion packaging ATPase
L396 X x X I VV A18 helicase
L425 X X X I Capsid protein D13L (4 paralogs)
R596 X X X I Thiol oxidoreductase (e.g., E10R)
R350 X X X I VV D6R helicase, + 1 paralog
R400 X X X I S/T protein kinase (e.g., F10L)
R450 X X X I Transcription factor (e.g., A1L)
R339 ◼x X X X II TFII-like transcription factor
L524 x X ◼X X II MutT-like NTP pyrophosphohydrolase
L323 x X ◼X X II Myristoylated virion protein A
R493 ◼X x X X II PCNA + 1 paralog
R313 X ◼x X X II Ribonucleotide reductase, large sub.
L312 X ◼x X X II Ribonucleotide reductase, small sub.
Not found x x X X II Thymidylate kinase
Not found x X X X II dUTPase
R429 - X X III PBCV1-A494R-like (9 paralogs)
L37 X X X III BroA, KilA-N term
R382 X X - III mRNA-capping enzyme
L244 - X X III RNA polymerase subunit 2 (Rpb2)
R501 - X X III RNA polymerase largest sub. (Rpb1)
R195 X X - III Glutaredoxin (e.g., ESV128)
R622 X X - III Dual spec. S/Y phosphatase
R311 - x X X III BIR domain (e.g., CIV193R)
L65 - ◼X X X III Virion-associated membrane protein
R480 - X X III Topoisomerase II
L364 X X - III SWI/SNF2 helicase (e.g., MSV224)
Not found x X X - III RuvC-like HJR (e.g., A22R)
Not found x x - X III ATP-dependent DNA ligase (e.g., A50R)
Not found - x X X III RNA polymerase subunit 10
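The per-class counts quoted in the text (9 of 9, 6 of 8, and 11 of 14) can be cross-checked against the rows of Table 1. The snippet below is a simple bookkeeping sketch: the ORF identifiers and classes are transcribed from the table, `None` marks a "Not found" row, and the variable names are illustrative.

```python
from collections import Counter

# (Mimivirus ORF, core-gene class) transcribed from Table 1; None = "Not found"
TABLE1 = [
    ("L206", "I"), ("R322", "I"), ("L437", "I"), ("L396", "I"), ("L425", "I"),
    ("R596", "I"), ("R350", "I"), ("R400", "I"), ("R450", "I"),
    ("R339", "II"), ("L524", "II"), ("L323", "II"), ("R493", "II"),
    ("R313", "II"), ("L312", "II"), (None, "II"), (None, "II"),
    ("R429", "III"), ("L37", "III"), ("R382", "III"), ("L244", "III"),
    ("R501", "III"), ("R195", "III"), ("R622", "III"), ("R311", "III"),
    ("L65", "III"), ("R480", "III"), ("L364", "III"),
    (None, "III"), (None, "III"), (None, "III"),
]

total_per_class = Counter(cls for _, cls in TABLE1)
found_per_class = Counter(cls for orf, cls in TABLE1 if orf is not None)

for cls in ("I", "II", "III"):
    print(f"class {cls}: {found_per_class[cls]} of {total_per_class[cls]} core genes found")
```

Class IV genes (16 out of 30) are listed in table S2 rather than Table 1, so they are not included in this tally.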

Global Gene Content Statistics

All predicted Mimivirus ORFs were compared with the Clusters of Orthologous Groups (COG) database (6) with the Reverse PSI-BLAST program (7). We found that 194 Mimivirus ORFs exhibited significant matches with 108 distinct COG families (table S3). This is more than twice the number of COGs represented in PBCV-1 virus (46 ORFs matching with 41 COGs). Compared with other NCLDVs, the Mimivirus COG profile exhibits a significant overrepresentation in the functional categories of translation (COG category J), posttranslational modification (COG category O), and amino acid transport and metabolism (COG category E) (χ² test: P < 0.001, P = 0.006, and P = 0.08, respectively) (2) (table S3).

Features in the Mimivirus Genome Unique Among dsDNA Viruses

The detailed analysis of Mimivirus genome (2) revealed a number of unique features, including many genes never before identified in a viral genome. Until now, some of these genes were thought to be the trademark of cellular organisms. These previously unknown and unique genes are listed in Table 2. They can be classified in four generic functional categories: protein translation, DNA repair enzymes, chaperones, and new enzymatic pathways. In addition, Mimivirus is the sole virus and one of the rare microorganisms that simultaneously possesses type IA, type IB, and type II topoisomerases.

Table 2.

Major new features identified in the Mimivirus genome. dTDP, 2′-deoxythymidine 5′-diphosphate; ADP, adenosine 5′-diphosphate.

ORF no. Definition/putative function Comment
R663 Arginyl-tRNA synthetase Translation
L124 Tyrosyl-tRNA synthetase Translation
L164 Cysteinyl-tRNA synthetase Translation
R639 Methionyl-tRNA synthetase Translation
R726 Peptide chain release factor eRF1 Translation
R624 GTP-binding elongation factor eF-Tu Translation
R464 Translation initiation factor SUI1 Translation
L496 Translation initiation factor 4E (mRNA cap binding) Translation
R405 tRNA (Uracil-5-)-methyltransferase tRNA modification
L359 DNA mismatch repair ATPase MutS DNA repair
R693 Methylated-DNA-protein-cysteine methyltransferase DNA repair
R406 Alkylated DNA repair DNA repair
L687 Endonuclease for the repair of UV-irradiated DNA DNA repair
L315 L720 Hydrolysis of DNA containing ring-opened N7-methylguanine DNA repair
R194 R480 L221 Topoisomerase I pox-like, topoisomerase II, topoisomerase I bacterial type DNA accessibility
L254 L393 Heat shock 70-kD Chaperonin
L605 Peptidylprolyl isomerase Chaperonin
L251 Lon domain protease Chaperonin
R418 NDK synthesis of nucleoside triphosphates Metabolism
R475 Asparagine synthase (glutamine hydrolyzing) Metabolism
R565 Glutamine synthetase (Glutamate-ammonia ligase) Metabolism
L716 Glutamine amidotransferase domain Metabolism
R689 N-acetylglucosamine-1-phosphate uridyltransferase Polysaccharide synthesis
L136 Sugar transaminase, dTDP-4-amino-4,6-dideoxyglucose biosynthesis ExoPolysaccharide synthesis
L780 dTDP-4-dehydrorhamnose reductase ExoPolysaccharide synthesis
L612 Mannose-6P isomerase Glycosylation
L230 Procollagen-lysine,2-oxoglutarate 5-dioxygenase Glycosylation, capsid structure
L543 ADP-ribosyltransferase (DraT) ?
L906 Cholinesterase Host infection?
L808 Lanosterol 14-alpha-demethylase Host infection?
R807 7-dehydrocholesterol reductase Host infection?
R322 Intein insertion In DNA polymerase B

Protein translation–related genes. The inability to perform protein synthesis independently from their host is one of the main characteristics distinguishing viruses from cellular (“living”) organisms. However, tRNA-like genes are found in isolated dsDNA virus species such as bacteriophage T4 (8) and BxZ1 (9), herpes virus 4 (10), and chlorella viruses (11). The chlorella viruses are also the first ones found to encode a translation elongation factor (EF-3) (12). The genome analysis of Mimivirus now greatly expands the known repertoire of viral genes related to protein translation. In addition to six tRNA-like genes [three Leu (two TTA and one TTG), Trp (TGG), Cys (TGC), and His (CAC)], the Mimivirus genome exhibits homologs to 10 proteins with functions central to protein translation: four aminoacyl-tRNA synthetases (aaRSs), translation initiation factor 4E (i.e., mRNA cap–binding), translation factor eF-TU [guanosine 5′-triphosphate (GTP)–binding translocation factor], translation initiation factor SUI1, translation initiation factor IF-4A (a helicase), and peptide chain release factor eRF1. In addition, the Mimivirus genome encodes the first identified viral homolog of a tRNA-modifying enzyme [tRNA (Uracil-5-)–methyltransferase]. All of these ORFs have significant sequence similarity with their eukaryotic homologs and exhibit all the domains and specific signatures expected from functional representatives of these various gene families. Preliminary functional characterizations have been obtained for several of these genes. For instance, we produced Mimivirus tyrosyl-tRNA synthetase in Escherichia coli, purified it, and measured its enzymatic activity (2) (fig. S2). Crystals of the protein have been obtained and its three-dimensional (3D) structure has been determined (13).
In addition, mRNAs encoding Mimivirus tyrosyl-, cysteinyl-, and arginyl-tRNA synthetases are found associated with purified virus particles (2) (table S4), suggesting that they are involved in infection.

New DNA repair enzymes. Genomes are subject to damage by chemical mutagens (e.g., free radicals, alkylating agents), ultraviolet (UV) light, or ionizing radiation. Different repair pathways have evolved to prevent the lethal accumulation of the various types of DNA errors. They usually correspond to well-conserved protein families found in the three domains of life (Archaea, Eubacteria, and Eukarya) but to a much lesser extent in viruses. The analysis of the Mimivirus genome revealed several types of DNA repair enzyme homologs, including four never before reported in dsDNA viruses. For instance, we identified two genes (L315 and L720) encoding putative formamidopyrimidine-DNA glycosylases, which serve to locate and excise oxidized purines. The Mimivirus genome also exhibits a UV-damage endonuclease (UvdE) homolog (L687). Although this is the first report of such an enzyme in a dsDNA virus, we identified an isolated UvdE homolog among the “hypothetical” proteins of the recently sequenced Aeromonas hydrophila phage Aeh1 (ORF111c, GenBank accession code: AAQ17773). The major mutagenic effect of methylating agents in DNA is the formation of O6-alkylguanine. The corresponding repair is performed by a DNA-[protein]-cysteine S-methyltransferase. The Mimivirus genome encodes the first viral 6-O-methylguanine-DNA methyltransferase (R693). In addition, Mimivirus R406 ORF is strongly homologous to a number of bacterial genes annotated as belonging to the same alkylated DNA repair pathways. Finally, ORF L359 was found to clearly belong to the MutS protein family, which is involved in DNA mismatch repair and recombination. Again, this is the first DNA repair enzyme of this family described in a dsDNA virus. Aside from the above DNA repair system components, which have never before been reported in a dsDNA virus, Mimivirus ORFs L386 and R555 encode homologs to the rad2 and rad50 yeast genes, respectively, both central to the repair of UV-induced DNA damage.
Homologs for these genes are also found in Iridoviruses. Overall, Mimivirus appears uniquely well equipped to repair DNA mismatches and damage caused by oxidation, alkylating agents, or UV light.

Topoisomerases. DNA topoisomerases are the enzymes in charge of solving the topological (entanglement) problems associated with DNA replication, transcription, recombination, and chromatin remodeling (14). Type I topoisomerases (ATP independent) work by passing one strand of the DNA through a break in the opposite strand. Type II topoisomerases are adenosine triphosphatases (ATPases) and work by introducing a double-stranded gap. Topoisomerases of various types are involved in relaxing or introducing DNA supercoils. With the notable exception of Poxviridae, many dsDNA viruses (including NCLDVs and phages) encode their own type IIA topoisomerase. Accordingly, Mimivirus exhibits a large ORF (>1263 amino acids, R480) 41% identical to PBCV-1 topoisomerase IIA amino acid sequence. Its best database match overall is with a homologous protein in the small eukaryote Encephalitozoon cuniculi (42% identical). More surprisingly, Mimivirus is the first dsDNA virus found to also encode a Poxviridae-like topoisomerase (topoisomerase IB). Mimivirus ORF R194 is 27% identical to Amsacta moorei entomopoxvirus topoisomerase IB (AMV052) and 25% identical to the well-studied vaccinia virus topoisomerase (H6R). In addition to encoding both type IIA and type IB topoisomerases, Mimivirus exhibits the first type IA topoisomerase reported in a virus (14). The best overall database match for ORF L221 (37%) is with its homolog in Bacteroides thetaiotaomicron (a Gram-negative anaerobe colonizing the human colon) within a well-defined subgroup of well-conserved type IA eubacterial topoisomerases, the prototype of which is E. coli Omega untwisting enzyme. Among all available genome sequences, only a small number of microorganisms simultaneously exhibit topoisomerases of type IA, IB, and IIA. They include yeast, Deinococcus radiodurans, and various environmental bacteria such as Pseudomonas sp., Agrobacterium tumefaciens, and Sinorhizobium meliloti.

Protein folding. The folding of many proteins, in particular those involved in large molecular assemblies, is guided toward their native structures by different families of protein chaperones. The Mimivirus genome uniquely exhibits two ORFs entirely and highly homologous to chaperones of the HSP70 (DnaK) family. ORF L254 is 42% identical to DnaK protein 2 of Thermosynechococcus elongatus, and ORF L393 is 59% identical to bovine heat-shock 70-kD protein 1A. In addition, the Mimivirus genome exhibits three ORFs (R260, R266, and R445) with clear DnaJ domain signatures. Proteins containing a DnaJ domain are known to associate with proteins of the HSP70 family. The above Mimivirus ORFs might thus encode a set of proteins interacting to form a specific viral chaperone system, possibly required for the productive assembly of its huge capsid.

In addition to its gene equipment related to protein folding, Mimivirus is the first virus to encode a homolog of the E. coli Lon heat-shock protein, an ATP-dependent protease thought to dispose of unfolded polypeptides. Mimivirus also exhibits components of the ubiquitin-dependent protein degradation pathway, already described in other NCLDVs. Finally, the Mimivirus genome encodes a putative peptidyl-prolyl cis-trans isomerase of the Cyclophilin family (ORF L605). This type of enzyme, seen here in a virus for the first time, accelerates protein folding by catalyzing the cis-trans isomerization of proline imidic peptide bonds. Again, this new virally encoded function might be required for the Mimivirus capsid to be assembled within physiological time limits.

New metabolic pathways. The genome analyses of large Phycodnaviruses and other NCLDVs already contributed the notion that large viruses possess significant metabolic pathways in addition to the minimal infection, replication, transcription, and virion packaging systems. PBCV-1, for instance, exhibits enzymes for the synthesis of homospermidine, hyaluronan, guanosine diphosphate (GDP)–fucose, and many other sugar-, lipid-, and amino acid–related manipulations (15). With its larger genome, Mimivirus builds on this established trend by exhibiting previously described as well as new virally encoded biosynthetic capabilities.

For instance, the Mimivirus genome encodes homologs to many enzymes related to glutamine metabolism: asparagine synthase (glutamine hydrolyzing) (ORF R475), glutamine synthetase (ORF R565), and guanosine 5′-monophosphate synthase (glutamine hydrolyzing) (ORF L716). All are identified in a dsDNA virus for the first time. In addition, Mimivirus exhibits a glutamine:fructose-6-P aminotransferase (i.e., glucosamine synthase) as previously described in PBCV-1. Mimivirus can proceed further along this pathway with the use of its own encoded N-acetylglucosamine-1-phosphate uridyltransferase (the well-studied GlmU enzyme) (ORF R689) to synthesize uridine 5′-diphosphate–N-acetyl-glucosamine. This metabolite is central to the biosynthesis of all types of polysaccharides in both eukaryotic and prokaryotic systems. The Mimivirus genome encodes six glycosyltransferases: three from family 2, and one each from families 8, 10, and 25.

Glycosyltransferases form a complex group of enzymes involved in the biosynthesis of disaccharides, oligosaccharides, and polysaccharides that are involved in the posttranslational modification of proteins (N- and O-glycosylation), and the synthesis of lipopolysaccharides included in high–molecular weight cross-linked periplasmic or capsular material. Among other NCLDVs, PBCV-1 has been well studied in that respect and shown to encode an atypical N-glycosylation pathway and hyaluronan biosynthesis (15). Other chloroviruses promote the synthesis of chitin (16). Preliminary proteomic studies of Mimivirus particles (see below) indicate that several proteins are glycosylated, including the predicted major capsid protein. In addition, Mimivirus particles are positive upon standard Gram staining (1), suggesting the presence of a reticulated polysaccharide at their surface. It is likely that some of the Mimivirus glycosyltransferases are involved in its synthesis. For instance, Mimivirus encodes (L136) a homolog to perosamine synthetase. Such an enzyme catalyzes the conversion of GDP-4-keto-6-deoxymannose to 4-NH2-4,6-dideoxymannose (perosamine), which is found in the O-antigen moiety of the lipopolysaccharide of various bacteria. Another Mimivirus ORF (L230) is homologous to procollagen-lysine, 2-oxoglutarate 5-dioxygenase. This enzyme catalyzes the formation of hydroxylysine in collagens and other proteins with collagen-like amino acid sequences by the hydroxylation of lysine residues in X-Lys-Gly sequences. These hydroxyl groups then serve as sites of attachment for carbohydrate units and are also essential for the stability of the intermolecular collagen cross-links. Given that Mimivirus also contains a large number of ORFs exhibiting the characteristic collagen triple-helix repeat, it is tempting to speculate that the hairy-like appearance of the virion (1) might be due to a layer of cross-linked glycosylated collagen-like fibrils.

Among other enzymes never yet reported in a virus, Mimivirus includes an NDK [Enzyme Commission (EC) number 2.7.4.6] (ORF R418). NDK catalyzes the synthesis of nucleoside triphosphates (NTPs) other than ATP. This enzyme may help circumvent a limited supply of NTPs for nucleic acid synthesis, UTP for polysaccharide synthesis, and GTP for protein elongation.

Finally, Mimivirus also encodes homologs to three lipid-manipulating enzymes: cholinesterase (L906), lanosterol 14-alpha-demethylase (L808), and 7-dehydrocholesterol reductase (R807), the physiological roles of which remain to be determined but possibly include the disruption of the host membrane.

Intein and introns. Inteins are protein-splicing domains encoded by mobile intervening sequences (IVSs) (17). They self-catalyze their excision from the host protein, ligating their former flanks by a peptide bond. They have been found in all domains of life (Eukarya, Archaea, and Eubacteria), but their distribution is highly sporadic. Only a few instances of viral inteins have been described, in Bacillus subtilis bacteriophages (18) and in the ribonucleotide reductase alpha subunit of Chilo iridescent virus (CIV) (19). Mimivirus is thus the second eukaryotic dsDNA virus exhibiting an intein (2). In contrast with the one described for CIV (lacking a C-terminal Asn), the Mimivirus intein is canonical and exhibits valid amino acids at all essential positions, as well as the dodecapeptide homing endonuclease motif (20). For reasons not yet understood, inteins are most often found associated with essential enzymes of DNA metabolism. Inserted within DNA polymerase B, the Mimivirus intein is no exception to this rule.

Self-splicing type I introns are a different type of mobile IVS, self-excising at the mRNA level. They are rare in viruses and mostly found in phages. One type IB intron has been identified in several chlorella virus species (15). Mimivirus exhibits four instances of self-excising introns (2), all in RNA polymerase genes: one in the largest subunit and three in the second-largest subunit.

Gene families or protein domains expanded in Mimivirus. The ankyrin-repeat signature is the most frequent motif, found in more than 30 distinct ORFs. This motif, about 33 amino acids long, is one of the most common protein-protein interaction motifs. It has been found in proteins with a wide diversity of functions. Another protein interaction domain, defined by the BTB signature, is found in 20 ORFs. This domain mostly mediates homomeric dimerization. It is found in proteins that contain the KELCH motif such as Kelch and a family of pox virus proteins. We identified 14 different ORFs exhibiting the protein kinase motif (PFAM) signature (21) of the catalytic domain of eukaryotic protein kinases (P < 0.05). Four of them resemble known cell division–related kinases.

The collagen triple-helix motif is another frequently represented motif, found in eight ORFs. This motif is characteristic of extracellular structural proteins involved in matrix formation and/or adhesion processes. Like other collagens, the product of these collagen-like ORFs might be posttranslationally modified by the procollagen-lysine, 2-oxoglutarate 5-dioxygenase homolog uniquely found in the Mimivirus genome. Mimivirus also contains eight ORFs with significant similarity to helicases. Finally, Mimivirus exhibits eight ORFs containing a specific glucose-methanol-choline (GMC) oxidoreductase motif. The role of these flavin adenine dinucleotide flavoproteins is unknown.

Phylogeny

Relationship to other NCLDVs. Our preliminary study based on the protein sequences of ribonucleotide reductase small and large subunits and topoisomerase II suggested an independent branching of Mimivirus in the phylogenetic tree of NCLDVs (1). This analysis was refined by using the concatenated sequences of the eight “class I” genes conserved in Mimivirus and all other NCLDVs. The resulting phylogenetic tree again suggested that Mimivirus defines an independent lineage of NCLDVs (Fig. 2) roughly equidistant from known Phycodnaviruses and Iridoviruses.

Fig. 2.

Phylogenetic position of Mimivirus among established NCLDV families. Viral species representing the diverse families of NCLDV are included as follows: Mimivirus, Phycodnaviridae (PBCV and ESV), Iridoviridae [CIV, Regina ranavirus (RR), lymphocystis disease virus type 1 (LDV), and infectious spleen and kidney necrosis virus (ISKNV)], Asfarviridae (African swine fever virus), and Poxviridae [Amsacta moorei entomopoxvirus (AME), variola virus (VAR), fowlpox virus (FOP), bovine papular stomatitis virus (BPSV), Yaba monkey tumor virus (YMTV), sheeppox virus (SHP), and swinepox virus (SWP)]. Fully sequenced viral genomes were analyzed to ensure the proper assessment of orthologous genes. This tree was built with the use of maximum likelihood and based on the concatenated sequences of eight conserved proteins (NCLDV class I genes): vaccinia virus (VV) D5-type ATPase, DNA polymerase family B, VV A32 virion packaging ATPase, capsid protein, thiol oxidoreductase, VV D6R helicase, serine/threonine protein kinase, and A1L transcription factor. One of the class I genes (VV A18 helicase) was absent in LDV and was not included. The alignment contains 1660 sites without insertions and deletions. A neighbor joining tree and a maximum parsimony tree exhibited similar topologies (2). Bootstrap percentages are shown along the branches.

Relationship to the three domains of life. There are 63 COGs common to all known unicellular genomes from the three domains of life: Eukarya, Eubacteria, and Archaea. Seven of them are now identified in the genome of Mimivirus: three aminoacyl-tRNA synthetases [ArgRS (COG0018), MetRS (COG0143), and TyrRS (COG0162)], the beta (COG0085) and beta′ (COG0086) subunits of RNA polymerase, the sliding clamp subunit of DNA polymerase [three proliferating cell nuclear antigen (PCNA) paralogs; COG0592], and a 5′-3′ exonuclease (COG0258). The unrooted phylogenetic tree built from the concatenated sequences of those proteins (2) is shown in Fig. 3. Mimivirus branches out near the origin of the Eukaryota domain. This is supported by a high bootstrap value and the Shimodaira-Hasegawa statistical test (2). The tree topology is also invariant to a variety of methodological changes (2) [figs. S3 to S6 and supporting online material (SOM) text]. Consistently, scatter plots for the best BLAST scores against the three domains of life indicate that most Mimivirus ORFs exhibit higher sequence similarities to eukaryotic sequences than to prokaryotic sequences, and are equidistant from the four main eukaryotic kingdoms: Protista, Animalia, Plantae, and Fungi (2) (fig. S7). However, strictly speaking, the tree shown in Fig. 3 can be rooted on any of the deepest branches, including the branch separating Mimivirus from eukaryotes, making its specific affinity with Eukaryota still uncertain.

Fig. 3.

A phylogenetic tree of species from the three domains of life (Eukaryota, Eubacteria, and Archaea) and Mimivirus. The tree was inferred with the use of a maximum likelihood method based on the concatenated sequences of seven universally conserved protein sequences: arginyl-tRNA synthetase (COG0018), methionyl-tRNA synthetase (COG0143), tyrosyl-tRNA synthetase (COG0162), RNA polymerase II largest subunit (COG0086), RNA polymerase II second largest subunit (COG0085), PCNA (COG0592), and 5′-3′ exonuclease (COG0258). The alignment contains 3164 sites without insertions and deletions. Bootstrap percentages are shown along the branches. Similar trees were obtained with the use of a variety of other approaches (SOM text).
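The caption describes an alignment built from concatenated protein sequences and restricted to sites without insertions and deletions. The following is a minimal sketch of that filtering step only, assuming toy aligned sequences and hypothetical function names; it is not the authors' pipeline, which is described in the SOM (2).

```python
# Illustrative sketch: concatenate per-gene alignments, then keep only
# the columns in which no taxon has a gap. All sequences and names here
# are invented toy values, not data from the article.

def concatenate_alignments(alignments):
    """Join per-gene alignments (dict: taxon -> aligned sequence) taxon by taxon.

    Only taxa present in every alignment are kept.
    """
    taxa = sorted(set.intersection(*(set(a) for a in alignments)))
    return {t: "".join(a[t] for a in alignments) for t in taxa}


def drop_gapped_columns(alignment, gap="-"):
    """Remove every column in which at least one taxon carries a gap."""
    taxa = list(alignment)
    length = len(alignment[taxa[0]])
    if any(len(alignment[t]) != length for t in taxa):
        raise ValueError("all aligned sequences must have the same length")
    keep = [i for i in range(length)
            if all(alignment[t][i] != gap for t in taxa)]
    return {t: "".join(alignment[t][i] for i in keep) for t in taxa}


# Two toy "gene" alignments (hypothetical).
gene_a = {"Mimivirus": "MK-LV", "Eukaryote": "MKALV", "Archaeon": "MRALI"}
gene_b = {"Mimivirus": "GTWQ", "Eukaryote": "GSW-", "Archaeon": "GTWQ"}

concatenated = concatenate_alignments([gene_a, gene_b])
ungapped = drop_gapped_columns(concatenated)
n_sites = len(ungapped["Mimivirus"])  # number of gap-free sites retained
```

Counting the retained columns in this way yields the kind of figure quoted in the caption (sites without insertions and deletions); the resulting alignment would then be passed to a tree-building program.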

Genome Complexity: Mimivirus Versus Parasitic Cellular Organisms

The number of Mimivirus COGs was compared to the numbers found for representatives of the three domains of life with the smallest known genomes: Nanoarchaeum equitans (490 kb), Mycoplasma genitalium (580 kb), and Encephalitozoon cuniculi (2,498 kb) (Fig. 4). Despite its comparable genome size, Mimivirus exhibits fewer identified COGs. However, there was no specific category in which it was significantly underrepresented, except for the translation category (P < 0.01). By this standard, the absence of a functional protein-translation apparatus is what most distinguishes Mimivirus from its parasitic cellular counterparts.
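The statistical test behind the reported P value is not specified in this section. As an illustration of how underrepresentation in one functional category could be assessed, the sketch below computes a one-sided hypergeometric tail probability. The choice of test, the function name, and all counts are assumptions made for illustration and are not taken from the article.

```python
# Illustrative sketch: probability of seeing k or fewer genes in a
# functional category when n genes are drawn from a reference pool of N
# genes, K of which belong to that category.
from math import comb


def hypergeom_lower_tail(k, K, n, N):
    """Return P(X <= k) for X ~ Hypergeometric(N, K, n)."""
    lowest = max(0, n - (N - K))  # smallest count that is possible at all
    total = comb(N, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i)
                    for i in range(lowest, k + 1))
    return favorable / total


# Invented counts: a reference pool of 1000 COG assignments, 150 of them
# in the translation category, and a genome with 100 COG assignments of
# which only 3 fall in that category.
p_value = hypergeom_lower_tail(k=3, K=150, n=100, N=1000)
underrepresented = p_value < 0.01
```

A small tail probability indicates that the category holds fewer genes than expected from the reference distribution, which is the sense in which a category is called underrepresented above.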

Fig. 4.

Distribution of COG homologs in Mimivirus compared with the cellular organisms of the three domains of life with the smallest known genomes.

Preliminary Analysis of Mimivirus Particles

Detection of viral RNAs. Large viruses, such as those of the Herpesviridae family, incorporate viral transcripts during the particle assembly process (22). We thus investigated whether viral RNAs could be found associated with ribonuclease (RNase)–treated Mimivirus particles with the use of reverse transcription polymerase chain reaction and virus-specific primers targeting several genes (2). Positive results were obtained for three aminoacyl tRNA synthetases (TyrRS, CysRS, and ArgRS), DNA polymerase, transcription factor TFIIB, and the predicted major capsid protein gene (L425) (2) (table S4).

Virion proteomics. Constituent proteins of Mimivirus particles were extracted and analyzed. In a preliminary set of experiments, 2D gel electrophoresis resolved 438 spots, many of them visibly corresponding to multiple isoforms of the same gene product (arising from modifications such as glycosylation and phosphorylation) (2) (fig. S8). The most abundant of the best-resolved spots were eluted and characterized by mass spectrometry (MALDI-TOF and ion trap). Six predicted ORF products corresponding to proteins with homologs of known functions were unambiguously identified. As expected, they include the major capsid (L425) and core (L410) proteins, but also an mRNA-capping enzyme (R382), thioredoxin (R548), glutaredoxin (R195), and a GMC-type oxidoreductase (R135).

Virion resistance to adverse conditions. Mimivirus particles remained infectious for 1 year when kept at 4°C, 25°C, and 32°C in Page's amoeba saline (PAS) buffer. Incubation of a suspension of 10^9 particles in PAS buffer at 55°C for 15 to 90 min reduced its titer by 100. By comparison, no viable E. coli were retrieved after the same treatment. No decrease in Mimivirus titer was observed after 48 hours of desiccation. Mimivirus particles are thus quite resistant to adverse conditions. However, despite its many predicted DNA repair genes, Mimivirus is quickly killed by 35 kilograys of irradiation with gamma rays or by exposure for 15 min (30 W, 20 cm) to UV light (2).

Discussion

A feature common to all known viruses is their total dependency on the host translation machinery for protein synthesis. Surprisingly, the Mimivirus genome sequence now reveals genes relevant to all key steps of mRNA translation: tRNA and tRNA charging, initiation, elongation, and termination, with the exception of ribosome components themselves. Two main evolutionary scenarios may account for the presence of this partial complement of translation-related genes in Mimivirus. On one hand, they could be the relics of a more complete ancestral protein-translation apparatus, gradually lost through a genome reduction process similar to the one governing the evolution of intracellular bacteria (23). On the other hand, these genes could have been individually acquired from cellular organisms and used to control the host translation apparatus in favor of Mimivirus mRNAs. Our phylogenetic analysis did not support a recent acquisition of these genes; this, together with the low probability that they were all acquired independently, favors the loss scenario over the gain scenario. By extrapolating this model, we could speculate that the Mimivirus lineage originated from a more complex ancestor possibly exhibiting an even more complete protein-translation machinery.

By its particle size, and now by its genome complexity, Mimivirus significantly challenges our vision of viruses. Lwoff (24) proposed that viruses should have at least one dimension lower than 200 nm and speculated that viruses may possess only one type of nucleic acid. Both criteria are invalidated by Mimivirus. Lwoff also pointed out the lack of enzymes generating energy from substrates. This criterion is still valid, because very few genes of this category were detected in Mimivirus. Other criteria such as the strictly intracellular character and the inability to grow or undergo binary fission have not yet been challenged. By these three last criteria, Mimivirus remains a regular virus. However, by the unprecedented number of enzymes and putative metabolic pathways encoded by its 1.2-Mb genome, Mimivirus blurs the established frontier between viruses and the parasitic cellular organisms with small defective genomes such as Rickettsia prowazekii (25), Buchnera (26), Nanoarchaeum (27), Mycoplasma (28), and Tropheryma whipplei (29). As of today, the genome of Mimivirus is larger than the published genomes of 20 cellular organisms from two domains of life (Archaea and Eubacteria) and five main bacterial divisions: Proteobacteria, Firmicutes, Actinobacteria, Chlamydiae, and Spirochaetes. The presence versus absence of ribosomes remains, at the moment, a key property distinguishing these minimal cellular organisms from large DNA viruses.

Several independent studies have led to the hypothesis that DNA viruses may have a common origin, and a common ancestor, originating before the emergence of the three domains of life (30). Given the inherent uncertainty of phylogenetic reconstruction dating back 3 billion years, our results (Fig. 3) are consistent with the hypotheses that a lineage of large DNA viruses could have emerged before the individualization of cellular organisms from the three domains of life (31) or from an ancestor distinct from these three domains (32). The topology of this new “tree of life” is also consistent with the hypothesis that ancestral DNA viruses were involved in the emergence of Eukaryotes (33–37).

The serendipitous discovery of Mimivirus from samples initially thought to contain a new type of intracellular Gram-positive bacterium allowed the characterization of the largest virus known so far. The sequencing of its 1.2-Mb genome revealed a wealth of genes encoding functions not previously encountered in viruses, probably due to its unprecedented size. The numerous new genes related to the protein-translation apparatus challenge the established vision of viruses. Using these first viral representatives of universally conserved gene families, we could now build a tentative tree of life, within which Mimivirus appears to define a new branch distinct from the three other domains. We believe that our work should prompt the search for more giant viruses, the genome analysis of which could shed additional light on the origin of DNA viruses and their role in the evolution of cellular organisms.

Supporting Online Material

www.sciencemag.org/cgi/content/full/1101485/DC1

Materials and Methods

SOM Text

Figs. S1 to S8

Tables S1 to S4

References and Notes
