Report

Complete Genome Sequence of the Apicomplexan, Cryptosporidium parvum

Science  16 Apr 2004:
Vol. 304, Issue 5669, pp. 441-445
DOI: 10.1126/science.1094786

Abstract

The apicomplexan Cryptosporidium parvum is an intestinal parasite that affects healthy humans and animals, and causes an unrelenting infection in immunocompromised individuals such as AIDS patients. We report the complete genome sequence of C. parvum, type II isolate. Genome analysis identifies extremely streamlined metabolic pathways and a reliance on the host for nutrients. In contrast to Plasmodium and Toxoplasma, the parasite lacks an apicoplast and its genome, and possesses a degenerate mitochondrion that has lost its genome. Several novel classes of cell-surface and secreted proteins with a potential role in host interactions and pathogenesis were also detected. Elucidation of the core metabolism, including enzymes with high similarities to bacterial and plant counterparts, opens new avenues for drug development.

Cryptosporidium parvum is a globally important intracellular pathogen of humans and animals. The duration of infection and the pathogenesis of cryptosporidiosis depend on host immune status, ranging from a severe but self-limiting diarrhea in immunocompetent individuals to a life-threatening, prolonged infection in immunocompromised patients. A substantial degree of morbidity and mortality is associated with infections in AIDS patients. Despite intensive efforts over the past 20 years, there is currently no effective therapy for treating or preventing C. parvum infection in humans.

Cryptosporidium belongs to the phylum Apicomplexa, whose members share a common apical secretory apparatus mediating locomotion and tissue or cellular invasion. Many apicomplexans are of medical or veterinary importance, including Plasmodium, Babesia, Toxoplasma, Neospora, Sarcocystis, Cyclospora, and Eimeria. The life cycle of C. parvum is similar to that of other cyst-forming apicomplexans (e.g., Eimeria and Toxoplasma), resulting in the formation of oocysts that are shed in the feces of infected hosts. C. parvum oocysts are highly resistant to environmental stresses, including chlorine treatment of community water supplies; hence, the parasite is an important water- and food-borne pathogen (1). The obligate intracellular nature of the parasite's life cycle and the inability to culture the parasite continuously in vitro greatly impair researchers' ability to obtain purified samples of the different developmental stages. The parasite cannot be genetically manipulated, and transformation methodologies are currently unavailable. To begin to address these limitations, we have obtained the complete C. parvum genome sequence and its predicted protein complement. (This whole-genome shotgun project has been deposited at DDBJ/EMBL/GenBank under the project accession AAEE00000000. The version described in this paper is the first version, AAEE01000000.)

The random shotgun approach was used to obtain the complete DNA sequence (2) of the Iowa “type II” isolate of C. parvum. This isolate readily transmits disease among numerous mammals, including humans. The resulting assembly has roughly 13× genome coverage and comprises 9.1 Mb of total DNA sequence in eight chromosomes, with five gaps. The C. parvum genome is thus quite compact relative to the 23-Mb, 14-chromosome genome of Plasmodium falciparum (3); this size difference is predominantly the result of shorter intergenic regions, fewer introns, and a smaller number of genes (Table 1). Comparison of our assembled chromosome VI with the recently published chromosome VI sequence (4) revealed that our assembly contains an additional 160 kb of sequence and a single gap versus two, with the common sequences displaying 99.993% sequence identity (2).

Table 1.

General features of the C. parvum genome and comparison with other single-celled eukaryotes. Values are derived from respective genome project summaries (3, 26-28). ND, not determined.

Feature                                   C. parvum  P. falciparum  S. pombe  S. cerevisiae  E. cuniculi
Size (Mbp)                                9.1        22.9           12.5      12.5           2.5
(G + C) content (%)                       30         19.4           36        38.3           47
No. of genes                              3807       5268           4929      5770           1997
Mean gene length (bp) excluding introns   1795       2283           1426      1424           ND
Gene density (bp per gene)                2382       4338           2528      2088           1256
Percent coding                            75.3       52.6           57.5      70.5           90
Genes with introns (%)                    5          53.9           43        5              ND
Intergenic regions
    (G + C) content (%)                   23.9       13.6           32.4      35.1           45
    Mean length (bp)                      566        1694           952       515            129
RNAs
    No. of tRNA genes                     45         43             174       299            44
    No. of 5S rRNA genes                  6          3              30        100-200        3
    No. of 5.8S, 18S, and 28S rRNA units  5          7              200-400   100-200        22
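
The derived rows of Table 1 can be roughly cross-checked against the primary ones. The sketch below is only an illustration: it assumes that gene density is genome size divided by gene count, and that percent coding is gene count times mean gene length divided by genome size. Neither definition is stated in the table, and the function names are hypothetical. Because the genome sizes are rounded to 0.1 Mbp, the results should agree with the tabulated values only to within roughly 1%.

```python
# Primary values copied from Table 1; sizes are the rounded Mbp figures.
TABLE1 = {
    "C. parvum": {"size_bp": 9.1e6, "genes": 3807, "mean_gene_bp": 1795},
    "P. falciparum": {"size_bp": 22.9e6, "genes": 5268, "mean_gene_bp": 2283},
}


def gene_density(size_bp, genes):
    """Assumed definition: base pairs of genome per predicted gene."""
    return size_bp / genes


def percent_coding(size_bp, genes, mean_gene_bp):
    """Assumed definition: share of the genome covered by intron-free gene sequence."""
    return 100.0 * genes * mean_gene_bp / size_bp


for name, row in TABLE1.items():
    density = gene_density(row["size_bp"], row["genes"])
    coding = percent_coding(row["size_bp"], row["genes"], row["mean_gene_bp"])
    print(f"{name}: {density:.0f} bp per gene, {coding:.1f}% coding")
```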

The relative paucity of introns greatly simplified gene predictions and facilitated annotation (2) of predicted open reading frames (ORFs). These analyses provided an estimate of 3807 protein-encoding genes for the C. parvum genome, far fewer than the estimated 5300 genes predicted for the Plasmodium genome (3). This difference is primarily due to the absence of an apicoplast and mitochondrial genome, as well as the presence of fewer genes encoding metabolic functions and variant surface proteins, such as the P. falciparum var and rifin molecules (Table 2). An analysis of the encoded protein sequences with the program SEG (5) shows that these protein-encoding genes are not enriched in low-complexity sequences (34%) to the extent observed in the proteins from Plasmodium (70%).
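
For readers unfamiliar with low-complexity measures, the following sketch shows one generic way to flag compositionally biased stretches of a protein: a sliding window scored by Shannon entropy. It is not the SEG algorithm used in the paper (5), and the window length, the entropy cutoff, and the function names are illustrative assumptions only.

```python
import math
from collections import Counter


def window_entropy(window):
    """Shannon entropy (bits) of the residue composition of one window."""
    counts = Counter(window)
    total = len(window)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def low_complexity_fraction(seq, window=12, max_entropy=2.2):
    """Fraction of residues covered by at least one low-entropy window."""
    if len(seq) < window:
        return 0.0
    flagged = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if window_entropy(seq[start:start + window]) <= max_entropy:
            # Mark every residue of a window whose composition is biased.
            flagged[start:start + window] = [True] * window
    return sum(flagged) / len(seq)
```

A homopolymeric stretch such as a long Thr run is flagged in full, whereas a window in which every residue differs is not.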

Table 2.

Comparison between predicted C. parvum and P. falciparum proteins.

Feature                         C. parvum     P. falciparum*  Common†
Total predicted proteins        3807          5268            1883
Mitochondrial targeted/encoded  17 (0.45%)    246 (4.7%)      15
Apicoplast targeted/encoded     0             581 (11.0%)     0
var/rif/stevor‡                 0             236 (4.5%)      0
Annotated as protease§          50 (1.3%)     31 (0.59%)      27
Annotated as transporter∥       69 (1.8%)     34 (0.65%)      34
Assigned EC function¶           167 (4.4%)    389 (7.4%)      113
Hypothetical proteins           925 (24.3%)   3208 (60.9%)    126

* Values indicated for P. falciparum are as reported (3) with the exception of those for proteins annotated as protease or transporter.

† TBLASTN hits (e < 10^-5) between C. parvum and P. falciparum.

‡ As reported in (3).

§ Predicted proteins annotated as “protease or peptidase” for C. parvum (CryptoGenome database) and P. falciparum (PlasmoDB database).

∥ Predicted proteins annotated as “transporter, permease of P-type ATPase” for C. parvum (CryptoGenome) and P. falciparum (PlasmoDB).

¶ Bidirectional BLAST hit (e < 10^-15) to orthologs with assigned Enzyme Commission (EC) numbers. Does not include EC assignment numbers for protein kinases or protein phosphatases (due to inconsistent annotation across genomes), or DNA polymerases or RNA polymerases, as a result of issues related to subunit inclusion. (For consistency, 46 proteins were excluded from the reported P. falciparum values.)
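
The bidirectional BLAST criterion in the last footnote can be pictured as a reciprocal-best-hit filter. The sketch below is an assumption-laden illustration rather than the pipeline used for the table: the hit tuples, the gene identifiers, and the function names are hypothetical, and the cutoff is taken to be an E-value of 1e-15.

```python
def best_hits(hits, max_evalue):
    """Map each query to its lowest-E-value subject among hits below the cutoff."""
    best = {}
    for query, subject, evalue in hits:
        if evalue >= max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a, max_evalue=1e-15):
    """Pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b, max_evalue)
    reverse = best_hits(b_vs_a, max_evalue)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical hits: (query, subject, E-value).
cp_vs_pf = [("cp_a", "pf_x", 1e-40), ("cp_a", "pf_y", 1e-20),
            ("cp_b", "pf_y", 1e-18), ("cp_c", "pf_z", 1e-8)]
pf_vs_cp = [("pf_x", "cp_a", 1e-38), ("pf_y", "cp_a", 1e-22),
            ("pf_z", "cp_c", 1e-9)]
print(reciprocal_best_hits(cp_vs_pf, pf_vs_cp))
```

In this toy input only the cp_a/pf_x pair survives: cp_b's best hit points back to a different protein, and cp_c's hit fails the E-value cutoff.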

Our sequence analysis indicates that Cryptosporidium, unlike Plasmodium and Toxoplasma, lacks both mitochondrion and apicoplast genomes. The overall completeness of the genome sequence, together with the fact that similar DNA extraction procedures used to isolate total genomic DNA from C. parvum efficiently yielded mitochondrion and apicoplast genomes from Eimeria sp. and Toxoplasma (6, 7), indicates that the absence of organellar genomes was unlikely to have been the result of methodological error. These conclusions are consistent with the absence of nuclear genes for the DNA replication and translation machinery characteristic of mitochondria and apicoplasts, and with the lack of mitochondrial or apicoplast targeting signals for tRNA synthetases.

A number of putative mitochondrial proteins were identified, including components of a mitochondrial protein import apparatus, chaperones, uncoupling proteins, and solute translocators (table S1). However, the genome does not encode any Krebs cycle enzymes, nor the components constituting the mitochondrial complexes I to IV; this finding indicates that the parasite does not rely on complete oxidation and respiratory chains for synthesizing adenosine triphosphate (ATP). Similar to Plasmodium, no orthologs for the γ, δ, or ϵ subunits or the c subunit of the F0 proton channel were detected (whereas all subunits were found for a V-type ATPase).

Cryptosporidium, like Eimeria (8) and Plasmodium, possesses a pyridine nucleotide transhydrogenase integral membrane protein that may couple reduced nicotinamide adenine dinucleotide (NADH) and reduced nicotinamide adenine dinucleotide phosphate (NADPH) redox to proton translocation across the inner mitochondrial membrane. Unlike Plasmodium, the parasite has two copies of the pyridine nucleotide transhydrogenase gene. Also present is a likely mitochondrial membrane–associated, cyanide-resistant alternative oxidase (AOX) that catalyzes the reduction of molecular oxygen by ubiquinol to produce H2O, but not superoxide or H2O2. Several genes were identified as involved in biogenesis of iron-sulfur [Fe-S] complexes with potential mitochondrial targeting signals (e.g., nifS, nifU, frataxin, and ferredoxin), supporting the presence of a limited electron flux in the mitochondrial remnant (table S2).

Our sequence analysis confirms the absence of a plastid genome (7) and, additionally, the loss of plastid-associated metabolic pathways including the type II fatty acid synthases (FASs) and isoprenoid synthetic enzymes that are otherwise localized to the plastid in other apicomplexans. C. parvum fatty acid biosynthesis appears to be cytoplasmic, conducted by a large (8252 amino acids) modular type I FAS (9) and possibly by another large enzyme that is related to the multidomain bacterial polyketide synthase (10). Comprehensive screening of the C. parvum genome sequence also did not detect orthologs of Plasmodium nuclear-encoded genes that contain apicoplast-targeting and transit sequences (11).

C. parvum metabolism is greatly streamlined relative to that of Plasmodium, and in certain ways it is reminiscent of that of another obligate eukaryotic parasite, the microsporidian Encephalitozoon. The degeneration of the mitochondrion and associated metabolic capabilities suggests that the parasite largely relies on glycolysis for energy production. The parasite is capable of uptake and catabolism of monosugars (e.g., glucose and fructose) as well as synthesis, storage, and catabolism of polysaccharides such as trehalose and amylopectin. Like many anaerobic organisms, it economizes ATP through the use of pyrophosphate-dependent phosphofructokinases. The conversion of pyruvate to acetyl–coenzyme A (CoA) is catalyzed by an atypical pyruvate-NADPH oxidoreductase (CpPNO) that contains an N-terminal pyruvate–ferredoxin oxidoreductase (PFO) domain fused with a C-terminal NADPH–cytochrome P450 reductase domain (CPR). Such a PFO-CPR fusion has previously been observed only in the euglenozoan protist Euglena gracilis (12). Acetyl-CoA can be converted to malonyl-CoA, an important precursor for fatty acid and polyketide biosynthesis. Glycolysis leads to several possible organic end products, including lactate, acetate, and ethanol. The production of acetate from acetyl-CoA may be economically beneficial to the parasite via coupling with ATP production.

Ethanol is potentially produced via two independent pathways: (i) from the combination of pyruvate decarboxylase and alcohol dehydrogenase, or (ii) from acetyl-CoA by means of a bifunctional dehydrogenase (adhE) with acetaldehyde and alcohol dehydrogenase activities; adhE first converts acetyl-CoA to acetaldehyde and then reduces the latter to ethanol. AdhE predominantly occurs in bacteria but has recently been identified in several protozoans, including vertebrate gut parasites such as Entamoeba and Giardia (13, 14). Adjacent to the adhE gene resides a second gene encoding only the AdhE C-terminal Fe-dependent alcohol dehydrogenase domain. This gene product may form a multisubunit complex with AdhE, or it may function as an alternative alcohol dehydrogenase that is specific to certain growth conditions. C. parvum has a glycerol 3-phosphate dehydrogenase similar to those of plants, fungi, and the kinetoplastid Trypanosoma, but (unlike trypanosomes) the parasite lacks an ortholog of glycerol kinase and thus this pathway does not yield glycerol production. In addition to the modular fatty acid synthase (CpFAS1) and polyketide synthase homolog (CpPKS1), C. parvum possesses several fatty acyl–CoA synthases and a fatty acyl elongase that may participate in fatty acid metabolism. Further, enzymes for the metabolism of complex lipids (e.g., glycerolipid and inositol phosphate) were identified in the genome. Fatty acids are apparently not an energy source, because enzymes of the fatty acid oxidative pathway are absent, with the exception of a 3-hydroxyacyl-CoA dehydrogenase.

C. parvum purine metabolism is greatly simplified, retaining only an adenosine kinase and enzymes catalyzing conversions of adenosine 5′-monophosphate (AMP) to inosine, xanthosine, and guanosine 5′-monophosphates (IMP, XMP, and GMP). Among these enzymes, IMP dehydrogenase (IMPDH) is phylogenetically related to ϵ-proteobacterial IMPDH and is strikingly different from its counterparts in both the host and other apicomplexans (15). Unlike in other apicomplexans such as Toxoplasma gondii and P. falciparum, no gene encoding hypoxanthine-xanthine-guanine phosphoribosyltransferase (HXGPRT) was detected, which conflicts with a previous report of this enzyme's activity in C. parvum sporozoites (16). The absence of HXGPRT suggests that the parasite may rely solely on a single enzyme system including IMPDH to produce GMP from AMP. In contrast to other apicomplexans, the parasite appears to rely on adenosine for purine salvage, a model supported by the identification of an adenosine transporter. Unlike other apicomplexans and many parasitic protists that can synthesize pyrimidines de novo, C. parvum relies on pyrimidine salvage and retains the ability for interconversions among uridine and cytidine 5′-monophosphates (UMP and CMP), their deoxy forms (dUMP and dCMP), and dAMP, as well as their corresponding di- and triphosphonucleotides. The parasite has also largely shed the ability to synthesize amino acids de novo, although it retains the ability to convert select amino acids, and instead appears to rely on amino acid uptake from the host by means of a set of at least 11 amino acid transporters (table S2).

Most of the Cryptosporidium core processes involved in DNA replication, repair, transcription, and translation conform to the basic eukaryotic blueprint (2). The transcriptional apparatus resembles Plasmodium in terms of basal transcription machinery. However, a striking numerical difference is seen in the complements of two RNA binding domains, Sm and RRM, between P. falciparum (17 and 71 domains, respectively) and C. parvum (9 and 51 domains). This reduction results in part from the loss of conserved proteins belonging to the spliceosomal machinery, including all genes encoding Sm domain proteins belonging to the U6 spliceosomal particle, which suggests that this particle activity is degenerate or entirely lost. This reduction in spliceosomal machinery is consistent with the reduced number of predicted introns in Cryptosporidium (5%) relative to Plasmodium (>50%). In addition, key components of the small RNA–mediated posttranscriptional gene silencing system are missing, such as the RNA-dependent RNA polymerase, Argonaute, and Dicer orthologs; hence, RNA interference–related technologies are unlikely to be of much value in targeted disruption of genes in C. parvum.

Cryptosporidium invasion of columnar brush border epithelial cells has been described as “intracellular, but extracytoplasmic,” as the parasite resides on the surface of the intestinal epithelium but lies underneath the host cell membrane. This niche may allow the parasite to evade immune surveillance but take advantage of solute transport across the host microvillus membrane or the extensively convoluted parasitophorous vacuole. Indeed, Cryptosporidium has numerous genes (table S2) encoding families of putative sugar transporters (up to 9 genes) and amino acid transporters (11 genes). This is in stark contrast to Plasmodium, which has fewer sugar transporters and only one putative amino acid transporter (GenBank identification number 23612372).

As a first step toward identification of multi–drug-resistant pumps, the genome sequence was analyzed for all occurrences of genes encoding multitransmembrane proteins. Notable are a set of four paralogous proteins that belong to the sbmA family (table S2) that are involved in the transport of peptide antibiotics in bacteria. A putative ortholog of the Plasmodium chloroquine resistance–linked gene PfCRT (17) was also identified, although the parasite does not possess a food vacuole like the one seen in Plasmodium.

Unlike Plasmodium, C. parvum does not possess extensive subtelomeric clusters of antigenically variant proteins (exemplified by the large families of var and rif/stevor genes) that are involved in immune evasion. In contrast, more than 20 genes were identified that encode mucin-like proteins (18, 19) having hallmarks of extensive Thr or Ser stretches suggestive of glycosylation and signal peptide sequences suggesting secretion (table S2). One notable example is an 11,700–amino acid protein with an uninterrupted stretch of 308 Thr residues (cgd3_720). Although large families of secreted proteins analogous to the Plasmodium multigene families were not found, several smaller multigene clusters were observed that encode predicted secreted proteins, with no detectable similarity to proteins from other organisms (Fig. 1, A and B). Within this group, at least four distinct families appear to have emerged through gene expansions specific to the Cryptosporidium clade. These families (SKSR, MEDLE, WYLE, FGLN, and GGC) were named after well-conserved sequence motifs (table S2). Reverse transcription polymerase chain reaction (RT-PCR) expression analysis (20) of one cluster, a locus of seven adjacent CpLSP genes (Fig. 1B), shows coexpression during the course of in vitro development (Fig. 1C).
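
The "uninterrupted stretch" hallmark of mucin-like proteins is easy to measure on any protein sequence. The following sketch is a minimal illustration, not the screening procedure used for the annotation (2); the function names and the toy sequence are hypothetical.

```python
from itertools import groupby


def longest_run(protein_seq, residue):
    """Length of the longest uninterrupted run of one residue (0 if absent)."""
    return max(
        (sum(1 for _ in group) for aa, group in groupby(protein_seq) if aa == residue),
        default=0,
    )


def ser_thr_fraction(protein_seq):
    """Fraction of residues that are Ser (S) or Thr (T)."""
    if not protein_seq:
        return 0.0
    return sum(aa in "ST" for aa in protein_seq) / len(protein_seq)


toy = "MKT" + "T" * 5 + "SAT"  # made-up sequence, not a C. parvum protein
print(longest_run(toy, "T"), round(ser_thr_fraction(toy), 2))
```

Applied to the protein described above, `longest_run(seq, "T")` would be expected to report the 308-residue Thr stretch.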

Fig. 1.

(A) Schematic showing the chromosomal locations of clusters of potentially secreted proteins. Numbers of adjacent genes are indicated in parentheses. Arrows indicate direction of clusters containing unidirectional genes (encoded on the same strand); squares indicate clusters containing genes encoded on both strands. Non-paralogous genes are indicated by solid gray squares or directional triangles; SKSR (green triangles), FGLN (red triangles), and MEDLE (blue triangles) indicate three C. parvum–specific families of paralogous genes predominantly located at telomeres. Insl (yellow triangles) indicates an insulinase/falcilysin-like paralogous gene family. CpLSP (white square) indicates the location of a cluster of adjacent large secreted proteins (table S2) that are cotranscriptionally regulated. Identified anchored telomeric repeat sequences are indicated by circles. (B) Schematic showing a select locus containing a cluster of coexpressed large secreted proteins (CpLSP). Genes and intergenic regions (regions between identified genes) are drawn to scale at the nucleotide level. The length of the intergenic regions is indicated above or below the locus. (C) Relative expression levels of CpLSP (red lines) and, as a control, C. parvum Hedgehog-type HINT domain gene (blue line) during in vitro development, as determined by semiquantitative RT-PCR using gene-specific primers corresponding to the seven adjacent genes within the CpLSP locus as shown in (B). Expression levels from three independent time-course experiments are represented as the ratio of the expression of each gene to that of C. parvum 18S rRNA present in each of the infected samples (20).
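
The normalization described for panel (C), in which each gene's signal is divided by the 18S rRNA signal from the same sample, can be written out as a small calculation. This is a sketch under assumptions: the function names are hypothetical, the numbers in the usage lines are invented purely to show the arithmetic, and averaging the three experiments is one plausible way to summarize them rather than a step the caption specifies.

```python
from statistics import mean


def relative_expression(gene_signal, rrna_signal):
    """Ratio of a gene's RT-PCR signal to the 18S rRNA signal of the same sample."""
    if rrna_signal <= 0:
        raise ValueError("18S rRNA signal must be positive")
    return gene_signal / rrna_signal


def mean_relative_expression(replicates):
    """Average ratio over (gene_signal, rrna_signal) pairs from repeated experiments."""
    return mean(relative_expression(gene, rrna) for gene, rrna in replicates)


# Invented signals for one gene at one time point in three experiments.
replicates = [(2.0, 4.0), (3.0, 6.0), (1.0, 2.0)]
print(mean_relative_expression(replicates))
```

Normalizing within each sample before averaging keeps differences in parasite load between infected samples from being mistaken for changes in gene expression.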

An additional eight genes were identified that encode proteins having a periodic cysteine structure similar to the Cryptosporidium oocyst wall protein; these eight genes are similarly expressed during the onset of oocyst formation and likely participate in the formation of the coccidian rigid oocyst wall in both Cryptosporidium and Toxoplasma (21). Whereas the extracellular proteins described above are of apparent apicomplexan or lineage-specific invention, Cryptosporidium possesses many genes encoding secreted proteins having lineage-specific multidomain architectures composed of animal- and bacterial-like extracellular adhesive domains (fig. S1).

Lineage-specific expansions were observed for several proteases (table S2), including an aspartyl protease (six genes), a subtilisin-like protease, a cryptopain-like cysteine protease (five genes), and a Plasmodium falcilysin-like (insulin degrading enzyme–like) protease (19 genes). Nine of the Cryptosporidium falcilysin genes lack the Zn-chelating “HXXEH” active site motif and are likely to be catalytically inactive copies that may have been reused for specific protein-protein interactions on the cell surface. In contrast to the Plasmodium falcilysin, the Cryptosporidium genes possess signal peptide sequences and are likely trafficked to a secretory pathway. The expansion of this family suggests either that the proteins have distinct cleavage specificities or that their diversity may be related to evasion of a host immune response.
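
Whether a predicted protein carries the “HXXEH” motif can be checked with a simple pattern search, where X stands for any residue. The sketch below is illustrative only: the function names are hypothetical, the two sequences are made up, and a motif match alone does not establish catalytic activity.

```python
import re

# H, any two residues, E, H: the Zn-chelating motif named in the text.
HXXEH = re.compile(r"H..EH")


def has_hxxeh(protein_seq):
    """True if the sequence contains at least one HXXEH match."""
    return HXXEH.search(protein_seq.upper()) is not None


def split_by_motif(proteins):
    """Partition {name: sequence} into names with and without the motif."""
    with_motif = sorted(name for name, seq in proteins.items() if has_hxxeh(seq))
    without_motif = sorted(name for name, seq in proteins.items() if not has_hxxeh(seq))
    return with_motif, without_motif


# Made-up sequences for illustration; not real falcilysin-like proteins.
toy_proteins = {"toy_active": "MKLSHFLEHAGT", "toy_inactive": "MKLSQFLEQAGT"}
print(split_by_motif(toy_proteins))
```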

Completion of the C. parvum genome sequence has highlighted the lack of conventional drug targets currently pursued for the control and treatment of other parasitic protists. On the basis of molecular and biochemical studies and drug screening of other apicomplexans, several putative Cryptosporidium metabolic pathways or enzymes have been erroneously proposed to be potential drug targets (22), including the apicoplast and its associated metabolic pathways, the shikimate pathway, the mannitol cycle, the electron transport chain, and HXGPRT. Nonetheless, complete genome sequence analysis identifies a number of classic and novel molecular candidates for drug exploration, including numerous plant-like and bacterial-like enzymes (tables S3 and S4).

Although the C. parvum genome lacks HXGPRT, a potent drug target in other apicomplexans, it has only the single pathway dependent on IMPDH to convert AMP to GMP. The bacterial-type IMPDH may be a promising target because it differs substantially from that of eukaryotic enzymes (15). Because of the lack of de novo biosynthetic capacity for purines, pyrimidines, and amino acids, C. parvum relies solely on scavenging from the host via a series of transporters, which may be exploited for chemotherapy. C. parvum possesses a bacterial-type thymidine kinase, and the role of this enzyme in pyrimidine metabolism and its drug target candidacy should be pursued. The presence of an alternative oxidase, likely targeted to the remnant mitochondrion, gives promise to the study of salicylhydroxamic acid (SHAM), ascofuranone, and their analogs as inhibitors of energy metabolism in the parasite (23).

Cryptosporidium possesses at least 15 “plant-like” enzymes that are either absent in or highly divergent from those typically found in mammals (table S3). Within the glycolytic pathway, the plant-like PPi-PFK has been shown to be a potential target in other parasites including T. gondii, and PEPCL and PGI appear to be plant-type enzymes in C. parvum. Another example is a trehalose-6-phosphate synthase/phosphatase catalyzing trehalose biosynthesis from glucose-6-phosphate and uridine diphosphate–glucose. Trehalose may serve as a sugar storage source or may function as an antidesiccant, antioxidant, or protein stability agent in oocysts, playing a role similar to that of mannitol in Eimeria oocysts (24). Orthologs of putative Eimeria mannitol synthesis enzymes were not found. However, two oxidoreductases (table S2) were identified in C. parvum, one of which belongs to the same families as the plant mannose dehydrogenases (25) and the other to the plant cinnamyl alcohol dehydrogenases. In principle, these enzymes could synthesize protective polyol compounds, and the former enzyme could use host-derived mannose to synthesize mannitol.

Supporting Online Material

www.sciencemag.org/cgi/content/full/1094786/DC1

Materials and Methods

SOM Text

Fig. S1

Tables S1 to S4

