Mechanisms of Evolution in Rickettsia conorii and R. prowazekii


Science  14 Sep 2001:
Vol. 293, Issue 5537, pp. 2093-2098
DOI: 10.1126/science.1061471


Rickettsia conorii is an obligate intracellular bacterium that causes Mediterranean spotted fever in humans. We determined the 1,268,755-nucleotide complete genome sequence of R. conorii, containing 1374 open reading frames. This genome exhibits 804 of the 834 genes of the previously determined R. prowazekii genome plus 552 supplementary open reading frames and a 10-fold increase in the number of repetitive elements. Despite these differences, the two genomes exhibit a nearly perfect colinearity that allowed the clear identification of different stages of gene alterations with gene remnants and 37 genes split into 105 fragments, of which 59 are transcribed. A 38-kilobase sequence inversion was dated shortly after the divergence of the genus.

Rickettsia species live in different ecological niches inside different arthropod hosts (insects or ticks), in which most of them are transmitted vertically from the mother to the progeny (1). R. conorii naturally infects the brown dog tick Rhipicephalus sanguineus. When transmitted to humans through tick bites, the bacterium causes Mediterranean spotted fever (1, 2). R. conorii is closely related to the previously sequenced R. prowazekii (3), the agent of louse-borne typhus. We determined the complete sequence of the R. conorii genome (GenBank accession number AE006914) (Table 1) (4, 5). Comparative analysis of these two closely related Rickettsia species (Table 2) provides snapshots of the progression of the gene degradation process, which has been linked to adaptation to intracellular parasitism (3, 6–9).

Table 1

Comparison between R. conorii and R. prowazekii.

Table 2

Numbers of ORFs (each split case was counted once). Numbers in parentheses indicate the corresponding number of intact ORFs. Rp, R. prowazekii; Rc, R. conorii.


The overall gene order in the R. conorii genome [Figs. 1 and 2 and Web fig. 1 (10)] is remarkably similar to that of R. prowazekii, except for the translocation/inversion of a few short segments in the region corresponding to the end of replication. The detailed comparison of the two genome sequences revealed numerous cases of apparently orthologous pairs of open reading frames (ORFs) (according to the best reciprocal match criterion) exhibiting large differences in size. For instance, the phoR gene encoding a 921-residue protein in R. prowazekii becomes a set of three consecutive ORFs (RC0702 through RC0704) of 643, 132, and 82 residues in R. conorii. Similarly, the sca1 gene encoding a 1902-residue protein in R. conorii corresponds to three consecutive ORFs (RP016 through RP018) in R. prowazekii (Fig. 3). In addition to this “gene splitting” phenomenon (11), we also identified many additional R. conorii genes (229) and fewer R. prowazekii genes (6), for which a residual similarity could be found in homologous but noncoding regions in the other genome (12), thus representing cases of “decaying orthologs” (Fig. 2). Finally, 323 R. conorii ORFs exhibited no orthologous relationship (regular, split, or decaying), whereas 24 R. prowazekii ORFs had no equivalent in R. conorii. In total, 552 R. conorii genes have no orthologous functional counterpart in R. prowazekii, whereas 30 R. prowazekii genes have no counterpart in R. conorii (Table 2). Those genes are most likely to be responsible for the phenotypic differences between the two species.
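The best reciprocal match criterion used above can be illustrated with a minimal sketch. It assumes that pairwise alignment scores are already available as dictionaries; the identifiers (rcA, rpX, and so on) and the scores are hypothetical and serve only to show the logic, not the actual pipeline used for the annotation.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: dict mapping (query, subject) -> alignment score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_matches(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs in which each ORF is the other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical identifiers and scores, for illustration only.
rc_vs_rp = {
    ("rcA", "rpX"): 900.0,
    ("rcA", "rpY"): 120.0,
    ("rcB", "rpY"): 300.0,
    ("rcC", "rpY"): 310.0,
}
rp_vs_rc = {
    ("rpX", "rcA"): 880.0,
    ("rpY", "rcC"): 305.0,
    ("rpY", "rcB"): 290.0,
}
pairs = reciprocal_best_matches(rc_vs_rp, rp_vs_rc)
print(pairs)
```

In this toy case rcB is left unpaired because rpY matches rcC better. A split or decaying gene can fall out of a strict reciprocal pairing in a similar way, which is why such cases need to be examined separately.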

Out of the 552 ORFs (reduced to 514 when every set of multiple ORFs arising from split genes is counted as one) constituting the gene excess in R. conorii, 106 ORFs (reduced to 79 as before) were assigned a putative function. These supplementary genes are overrepresented (P < 0.05, χ² test) as compared with those of R. prowazekii in the categories of DNA replication, transporters, regulatory functions, and drug sensitivity (Table 2). R. conorii has three genes (RC0843-4, RC0017, and RC0450) related to DNA transformation functions, including the DNA uptake protein ComF, the competence operon protein ComE3, and the chromosomal transformation protein Smf. The presence of such DNA transformation genes has not been previously described for other obligate intracellular parasites [R. prowazekii (3), Chlamydia spp. (13), and Mycobacterium leprae (9)]. R. conorii's capability for exogenous DNA uptake is further suggested by the presence of four ORFs of apparently foreign origin: one phage-related protein (RC0490), one insertion element (RC0688), and two lysozyme-like proteins from viruses (RC0727 and RC1298). Both Rickettsia species are naturally resistant to penicillin and aminoglycoside antibiotics, and R. conorii exhibits higher resistance than R. prowazekii to antibiotics (14). Consistently, its genome contains nine additional genes related to its sensitivity to drugs, including four genes for β-lactamases (RC1243-4) and their regulation (RC0535, RC0788, and RC1358); three drug efflux transporters (RC0301, RC0564-9, and RC1181); an aminoglycoside 3′-phosphotransferase (RC0947); and an acetyltransferase (RC0554). R. conorii is known to move around inside host cells by propulsion produced by continuous actin polymerization (15). No clear homolog of proteins known to be responsible for the actin-based motility of Listeria monocytogenes (ActA) or Shigella flexneri (IcsA) (16) was found, but an ORF (RC0909) coding for a 520-residue-long protein exhibits an overall organization similar to that of ActA. Both proteins share a highly charged NH2-terminus (∼300 residues) and a central proline-rich region. RC0909 has a weak similarity to the WASP homology domain 2, found in a family of proteins known to regulate the formation of actin filaments.

As intracellular parasites, Rickettsia have small genomes and an evolutionary tendency toward further genomic reduction (6). Therefore, genes found as multiple copies may outline their specific adaptations. Using BLAST (17), we identified six gene families with more than three paralogs. Comparing their copy numbers with those in other bacterial genomes, we found that five gene families (Tlc, SpoT, ProP, Sca, and AmpG) were significantly overrepresented (P < 0.05, Fisher's exact test) in Rickettsia species [Web table 1 (10)]. Adenosine triphosphate (ATP)/adenosine diphosphate (ADP) translocases are known to be unique to Rickettsia spp. and Chlamydia spp. among bacteria and may be of plant origin (18). These translocases allow the import of ATP from the infected host cell. Five copies were found in R. conorii and R. prowazekii. Also, four SpoT copies were found in R. conorii and R. prowazekii (19). The SpoT protein hydrolyzes the nucleotide (p)ppGpp, also known as alarmone. This compound plays a major role in processes related to starvation in various bacteria (20). Alarmone also initiates the expression of virulence genes in Legionella pneumophila (21), the production of antibiotics in Streptomyces coelicolor (22), and the change in the cell density of Myxococcus xanthus (23). The four copies of SpoT may be related to the adaptation of Rickettsia to long starvation periods in pausing ticks or louse feces. Both Rickettsia species have significantly large numbers of ProP proline/betaine transporter paralogs: 11 and 7 ORFs for R. conorii and R. prowazekii, respectively. In many organisms, including bacteria and plants, proline transporters play critical roles in the response to osmotic changes in the environment (24). Leishmania donovani, an intracellular protozoan parasitizing both arthropod and mammalian cells, uses temperature-regulated proline transporters to adapt to different host temperature conditions (25). The numerous ProP paralogs in Rickettsia might be linked to their adaptation to osmotic stress or to the temperature-dependent regulation of their virulence known as “reactivation” (26, 27). R. conorii harbors five genes (four in R. prowazekii) encoding outer membrane proteins of the Sca family, including rOmpA, which accounts for antigenic differences between Rickettsia species. R. conorii exhibits four copies of AmpG (three in R. prowazekii) that are likely to contribute to its natural resistance to β-lactam antibiotics. Finally, genes for the ATP-binding protein of multidrug resistance ABC transporters are present as four copies in R. conorii (three in R. prowazekii), but these numbers are not significantly higher than in other bacteria.

The genome of R. conorii exhibits a much higher density of interspersed repetitive DNA than that of R. prowazekii (Fig. 2). In the R. conorii genome, we identified 10 families (656 elements) of repeated DNA (28, 29). Those repeats vary in size between 19 and 172 base pairs (bp) and constitute 3.2% of the entire genome. Overall, the repeat fraction of the genome is G+C rich (40%) and is in part responsible for the higher G+C content of R. conorii as compared with R. prowazekii. The distribution of the repeated elements is essentially random throughout the genome (Fig. 1). The quasiperfect colinearity maintained between the two Rickettsia genomes contrasts with the view that the multiplication of interspersed repeats promotes genomic rearrangements (30, 31).

Figure 1

Circular representation of the R. conorii genome (strain Malish 7). The outermost circle indicates the nucleotide positions. The second and third circles locate the ORFs on the plus and minus strands, respectively. Function categories are color-coded [see Web fig. 1 (10)]. The fourth and fifth circles locate tRNAs. The locations of three rRNAs are indicated by black arrows. The sixth and seventh circles indicate the locations of repeats. The eighth circle shows the G-C skew [(G − C)/(G + C)] with a window size of 10 kb. The region locally breaking the genome colinearity with R. prowazekii is indicated by a shaded sector. The four major genomic segments involved in this rearrangement are colored in blue, yellow, green, and red [see Fig. 3 and Web fig. 1 (10) for details].
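The G-C skew plotted in the eighth circle of Fig. 1 can be computed along the following lines. This is a minimal sketch under stated assumptions: the windows are taken as consecutive and non-overlapping (the legend gives only the 10-kb window size, not the step), windows lacking both G and C are reported as 0.0 by convention, and the function name and toy sequence are illustrative.

```python
def gc_skew(sequence, window):
    """Return (G - C) / (G + C) for consecutive non-overlapping windows."""
    if window <= 0:
        raise ValueError("window must be positive")
    seq = sequence.upper()
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # The skew is undefined when a window has no G or C; use 0.0 here.
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


# Toy sequence with 8-nt windows; a real genome would use window=10_000.
toy = "GGGCAATT" + "CCCGAATT" + "ATATATAT"
values = gc_skew(toy, window=8)
print(values)
```

In many bacterial genomes the sign of this skew changes near the origin and terminus of replication, which makes it useful for locating the termination region where the colinearity with R. prowazekii is broken.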

The analysis of the R. conorii putative coding regions revealed numerous cases of consecutive ORFs matching consecutive segments of a single longer ORF in other species, including R. prowazekii. Such gene fragmentations (e.g., internal stop codons) are usually associated with “pseudogenes.” However, a truncated form of the outer-membrane protein (rOmpA) is normally expressed in R. felis (32). In addition, most of the split ORFs retained the statistical properties (coding potential and codon bias) of normal coding regions and a good similarity with intact protein orthologs. This prompted us to annotate these altered genes by the more neutral designation of “split genes,” pending further experimental evidence. Thirty-seven split genes (11) (resulting in 105 total ORFs) were identified in R. conorii (Table 2). Among them, 14 have intact orthologs in R. prowazekii, 4 exhibit intact paralogs in R. conorii, and 19 have intact homologs in other prokaryotes. In R. prowazekii, we identified 11 split genes (resulting in 23 ORFs) that all have intact orthologs in R. conorii. By reverse transcriptase–polymerase chain reaction (RT-PCR) (33), we examined the detailed transcription pattern of all 37 R. conorii split genes [Web table 2 (10)]. We observed at least one transcript for 31 of 37 genes and RT-PCR products (all of the expected size) for 59 of 105 ORFs. All ORFs were transcribed for 11 of 31 genes, and the sole 5′ ORF was transcribed for 8 genes. These cases are consistent with the continued usage of the promoter of the original gene. However, the RT-PCR results on the other split genes suggested more complex transcription patterns. In seven cases, only the 3′ ORFs were found to be transcribed; in four cases, the 5′ and 3′ segments were found to be transcribed but not the middle segments; and in one case, only the middle segment was detected. Transcripts were much more likely to be detected for larger ORFs (≥70 residues) than for smaller ORFs (<70 residues) (55 of 78 versus 4 of 27; P < 10⁻⁶, Fisher's exact test). These results suggest that these split genes might have retained some of their original functions [as already discussed for SpoT genes in R. prowazekii (19)]. The complete assessment of the physiological significance of these genes will require a detailed characterization of their translation products in Rickettsia.
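The comparison of transcript detection between large and small ORFs can be reproduced from the counts given above. The sketch below implements a one-sided Fisher's exact test directly from the hypergeometric distribution. The choice of a one-sided test is an assumption, since the text does not state which variant was used, and the function name is illustrative.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, where X is the
    top-left cell and all row and column totals are held fixed.
    """
    row1 = a + b   # here: large ORFs
    col1 = a + c   # here: ORFs with a detected transcript
    col2 = b + d   # here: ORFs without a detected transcript
    total = col1 + col2
    upper = min(row1, col1)
    numerator = sum(
        comb(col1, k) * comb(col2, row1 - k) for k in range(a, upper + 1)
    )
    return numerator / comb(total, row1)


# Counts from the text: 55 of 78 large ORFs and 4 of 27 small ORFs transcribed.
p_value = fisher_one_sided(55, 78 - 55, 4, 27 - 4)
print(f"one-sided P = {p_value:.2e}")
```

With these counts, the one-sided probability should fall below the reported bound of 10⁻⁶. A two-sided variant would add the probabilities of equally unlikely tables in the opposite tail.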

It has been previously suggested that most of the intergenic sequences in R. prowazekii consist of decayed genes that are no longer active but are not yet totally eliminated from the genome (3, 8, 19). Through a systematic survey, we identified noncoding remnant sequences for 229 ORFs (out of the 552 R. conorii supplementary genes) at their homologous locations in the R. prowazekii genome (Fig. 2). For example, the R. conorii gene (RC1273) for the outer membrane protein rOmpA is 6063 bp long and is located between the cell division protein FtsK (RC1274) and a hypothetical ORF (RC1272) (Fig. 3). R. prowazekii exhibits the orthologs for FtsK and the hypothetical ORF but not for rOmpA. Part of the intergenic sequence between the R. prowazekii orthologs exhibits a significant similarity to the rOmpA gene of R. conorii, although the remnant sequence identified in R. prowazekii (369 bp) contains several in-frame stop codons. Our comparative genome analysis thus strongly supports a model of massive gene decay in R. prowazekii.

Figure 2

Repeat density and colinearity of the R. conorii and R. prowazekii genomes. The two self-genome comparisons and the cross-genome comparison are presented in the upper left, upper right, and lower right panels, respectively. Each dot represents a high-scoring segment pair (HSP) identified by BLASTN (with a fixed database size parameter of 1 Mbp). Self matches are not shown. Red and black dots correspond to HSPs of E value < 10⁻⁴ and E value < 10⁻¹⁰, respectively. In addition, the lower left panel presents the similarities of the 552 R. conorii supplementary ORFs with the intergenic regions of R. prowazekii. Each dot represents an HSP detected by BLASTN (E value < 0.1). The black and red dots correspond to matches on the same and reverse strands, respectively. For the sake of readability, dot sizes are standard in all the panels and do not correspond to the actual size of the HSP.

Figure 3

Illustration of the colinearity. Three distinct segments from the R. conorii genome aligned with the homologous segments from the R. prowazekii genome are shown. These segments were chosen to show three types of gene alteration: split genes in R. prowazekii (top), a split gene in R. conorii (middle), and a gene remnant in R. prowazekii (bottom). A complete comparative map is available at (10).

We found one clear case of a gene decaying after its horizontal transfer from (or, less likely, to) Chlamydia. R. conorii exhibits a split form (RC0035-38) of the gene for a bifunctional folate synthesis protein described in Chlamydiae as composed of two distinct domains: a 7,8-dihydro-6-hydroxymethylpterin-pyrophosphokinase domain (HPPK) and a dihydropteroate synthase domain (DHPS). Homologs of this enzyme with the same domain organization are found only in Chlamydiae, plants, and fungi. R. prowazekii exhibits remnant sequences corresponding to the R. conorii ORFs. The proximity of the R. conorii ORFs and Chlamydia genes, supported by a phylogenetic tree analysis, and the exclusive presence of this gene in R. conorii and Chlamydia species among known bacteria suggest a gene exchange between Rickettsia and Chlamydia, as proposed for ATP/ADP translocases (18). The alteration of this folate synthesis gene in both Rickettsia species, detected as a split gene in R. conorii and as a remnant sequence in R. prowazekii, suggests that significant changes of evolutionary constraints occurred after exchanging the gene with Chlamydia.

To investigate gene degradation in a wider set of Rickettsia species for which no sequence was available, we performed PCR assays on the genomic DNA of eight different Rickettsia species (R. typhi, R. canadensis, R. helvetica, R. felis, R. australis, R. akari, R. rickettsii, and R. massiliae) using primers derived from seven R. conorii and seven R. prowazekii supplementary genes [Web table 3 (10)]. With the primers corresponding to the seven R. conorii genes, two or more genomic segments were amplified in seven species [R. rickettsii (7 of 7), R. australis (6 of 7), R. felis (6 of 7), R. helvetica (5 of 7), R. massiliae (5 of 7), R. akari (3 of 7), and R. canadensis (2 of 7)]. With the primers derived from the R. prowazekii genes, one or more genomic segments were amplified in four species [R. typhi (3 of 7), R. australis (1 of 7), R. felis (1 of 7), and R. akari (1 of 7)]. Thus, the supplementary genes observed in R. conorii and R. prowazekii do not originate from recent, species-specific, horizontal acquisitions, although the detailed pattern of PCR amplification does not exactly fit the standard classification of the Rickettsia genus [Web fig. 2 (10)]. Three out of the 14 supplementary genes were found in a split form (insertions/deletions generating stop codons) in one or more of the tested species. Thus, gene degradation appears to be a common feature of Rickettsia, targeting overlapping subsets of potentially dispensable genes while adapting to the selective pressures of different niches.

The few inversions/translocations locally breaking the otherwise perfect colinearity of the R. conorii and R. prowazekii genomes occur in the termination region of DNA replication. We identified several rearranged DNA segments, including a 38-kb segment containing 45 ORFs in R. conorii (Fig. 1). To date this inversion/translocation event within the phylogeny of Rickettsia, we used PCR with a set of primers designed from highly conserved adjacent sequences in the above eight species. An R. prowazekii-like arrangement was observed for R. typhi, whereas that of R. conorii was observed for R. felis, R. rickettsii, and R. massiliae. The result is consistent with the biphyletic division of Rickettsia and suggests that the genome rearrangement event happened relatively shortly after the initial divergence of the genus Rickettsia.

Genome reduction is thought to be a main force behind the evolution of parasitic and/or intracellular bacteria (6–9). The sequence of the R. conorii genome is consistent with this view, and R. prowazekii essentially appears as a subset of R. conorii. However, the genomes of R. conorii and R. prowazekii exhibit large differences in size, as well as in gene and G+C content, thus suggesting an adaptation to their specific niches rather than a simple model of random gene loss. Our analysis pointed out 137 R. conorii genes without any sequence similarity within the R. prowazekii genome. This provides an upper limit on the number of potential genes laterally acquired since their divergence, 40 to 80 million years ago (34). A single gene has its best match in eukaryotes (RC0781 to the NH2-terminal part of yeast biotin-protein ligase), suggesting that Rickettsia have no particular tendency to evolve by acquiring genes from their hosts. Given their genetic isolation, it is tempting to postulate that Rickettsia had to rely on internal mechanisms such as duplication to acquire or modify some of the gene functions required for a better adaptation to their niche. We saw evidence of the very gradual nature of the genome reduction process by identifying all possible intermediates from intact ORFs: transcribed split ORFs, further split ORFs no longer transcribed, fully decayed but still recognizable ORFs, and complete gene disappearance. Similar mechanisms probably occur in the evolution of all bacterial species but have remained undetected because of more active recombination and a faster evolutionary rate (35).

  • * To whom correspondence should be addressed. E-mail: Jean-Michel.Claverie{at}; Didier.Raoult{at}

