Technical Comments

Comment on "The 1.2-Megabase Genome Sequence of Mimivirus"


Science  20 May 2005:
Vol. 308, Issue 5725, pp. 1114
DOI: 10.1126/science.1110820

Raoult et al. (1) analyzed the complete genome sequence of Mimivirus, the largest known virus. One of the most exciting results of this analysis is the identification of seven conserved mimiviral proteins that are evolutionarily related to their eukaryotic homologs: arginyl-tRNA synthetase, methionyl-tRNA synthetase, tyrosyl-tRNA synthetase, the largest and second-largest subunits of RNA polymerase II, DNA-polymerase sliding clamp protein (PCNA), and 5′-3′ exonuclease. Moreover, the phylogenetic analysis of a concatenation of these protein sequences suggests that Mimivirus emerges at the base of the eukaryotic branch in a tree of species from the three domains of life: eukaryotes, bacteria, and archaea [figure 3 in (1)]. Raoult et al. concluded their analysis by suggesting that “Mimivirus appears to define a new branch distinct from the three other domains” and that it could have played a role in the origin of eukaryotes. Such a revolution in our conception of the tree of life deserves close scrutiny.

The concatenation of several protein sequences can produce artifactual trees if the proteins have evolved by different mechanisms. The aminoacyl-tRNA synthetases are prominent among the protein families whose members exhibit the most diverse evolutionary histories, as demonstrated by recent phylogenetic analyses (2). For example, the tyrosyl-tRNA synthetase of Escherichia coli was acquired by horizontal gene transfer (HGT) from Gram-positive bacteria (2) (Fig. 1). Multiple HGT events appear to have occurred between eukaryotes and different groups of bacteria (including Proteobacteria and Gram-positive bacteria) for the arginyl-tRNA synthetase, whereas the methionyl-tRNA synthetase of Gammaproteobacteria, such as E. coli, has very likely been acquired by HGT from Archaea (2). Therefore, the aminoacyl-tRNA synthetases from a single species (e.g., E. coli) can have up to three completely different evolutionary origins: bacterial, archaeal, and eukaryotic. The concatenation of these sequences must yield an artifactual phylogenetic tree, where the root most likely occupies a midpoint location to accommodate such a distorted data set (3).
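The masking effect of concatenation can be illustrated with a toy calculation. The sketch below is only an illustration under stated assumptions: the sequences are invented, the taxon names ("focal", "bacterium", "archaeon") are hypothetical, and a simple proportion of differing sites (p-distance) stands in for the Bayesian and maximum-likelihood methods used in real analyses. Each gene on its own places the focal taxon clearly next to one reference, but the two genes disagree, and the concatenated alignment averages the conflict into an intermediate position.

```python
def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def concatenate(*genes: dict) -> dict:
    """Join per-gene alignments taxon by taxon into one concatenated alignment."""
    taxa = genes[0].keys()
    return {taxon: "".join(gene[taxon] for gene in genes) for taxon in taxa}


def distances_from(alignment: dict, focal: str = "focal") -> dict:
    """p-distance from the focal taxon to every other taxon in the alignment."""
    return {
        taxon: p_distance(alignment[focal], seq)
        for taxon, seq in alignment.items()
        if taxon != focal
    }


# Invented toy alignments (not real data). In gene1 the focal taxon resembles
# the bacterial reference; in gene2 it resembles the archaeal one, as could
# happen if gene2 had been acquired by horizontal transfer.
gene1 = {"focal": "AAAAAAAAAA", "bacterium": "AAAAAAAAAT", "archaeon": "TTTTTTTTAA"}
gene2 = {"focal": "CCCCCCCCCC", "bacterium": "GGGGGGGGCC", "archaeon": "CCCCCCCCCG"}

per_gene = [distances_from(gene1), distances_from(gene2)]
combined = distances_from(concatenate(gene1, gene2))

print("gene1:", per_gene[0])
print("gene2:", per_gene[1])
print("concatenated:", combined)
```

In this toy case the per-gene distances point in opposite directions, while the concatenated distances to the two references are equal. The concatenated signal therefore reflects neither gene's history.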

Fig. 1.

Bayesian phylogenetic tree of the tyrosyl-tRNA synthetase, showing the position of Mimivirus (in red) nested within the eukaryotes, as sister to the amoebal genus Entamoeba. The tree also shows the Gram-positive origin of the E. coli gene. Bacterial species are in orange, eukaryotes in green, and archaea in blue. Numbers at nodes are Bayesian posterior probabilities, except for the node concerning the sisterhood of Mimivirus and Entamoeba spp., which also shows the Γ-law corrected maximum-likelihood bootstrap value (100%). The scale bar indicates the number of substitutions per position.

An additional source of problems in reconstructing a universal tree can arise from the use of sequences that are not well conserved among the three domains of life, which introduces a large amount of noise into the data set. This is the case for PCNA, which is very well conserved in archaea, eukaryotes, and Mimivirus, but phylogenetically distant from its bacterial homologs (4). The inclusion of those distant sequences should also induce a midpoint rooting. This problem can be magnified if a reduced number of species is used to reconstruct the phylogenetic tree (5), as in the phylogenetic analysis of Raoult et al. (which used only three sequences for each one of the three domains of life). The only reliable way to reconstruct the evolutionary history of the Mimivirus using the selected protein sequences is to use a much larger sampling of species, including species from the eukaryotic group parasitized by this giant virus, the amoebae. When this approach is used, the phylogenetic trees obtained are completely different, and the Mimivirus now emerges well nested within the eukaryotes for all seven proteins studied by Raoult et al. For example, Fig. 1 shows a Bayesian phylogenetic tree for the tyrosyl-tRNA synthetase, where Mimivirus is closely related, with very strong statistical support, to several amoebal species of the genus Entamoeba. This strongly suggests that the Mimivirus has acquired this gene from its amoebal host. The same is observed for other proteins, especially for those with a better sequence conservation.

The possible role of viruses in the origin and early evolution of cellular organisms has been a matter of intense debate (6–9). However, in most cases it becomes clear that these parasites, instead of being a source of new genes for their hosts, are actually very efficient “gene pickpockets” that acquire genetic material from the cells that they parasitize (8). This is most likely also the case for Mimivirus, which certainly does not define a fourth domain of life but still deserves the title of king among these gene robbers. Mimivirus is therefore an amazing model for studying gene acquisition and genome size increase in viruses. Future studies will hopefully clarify the forces behind the voracious appetite of this virus for eukaryotic genes.

References and Notes
