Research Article

Phytophthora Genome Sequences Uncover Evolutionary Origins and Mechanisms of Pathogenesis


Science  01 Sep 2006:
Vol. 313, Issue 5791, pp. 1261-1266
DOI: 10.1126/science.1128796


Draft genome sequences have been determined for the soybean pathogen Phytophthora sojae and the sudden oak death pathogen Phytophthora ramorum. Oömycetes such as these Phytophthora species share the kingdom Stramenopila with photosynthetic algae such as diatoms, and the presence of many Phytophthora genes of probable phototroph origin supports a photosynthetic ancestry for the stramenopiles. Comparison of the two species' genomes reveals a rapid expansion and diversification of many protein families associated with plant infection such as hydrolases, ABC transporters, protein toxins, proteinase inhibitors, and, in particular, a superfamily of 700 proteins with similarity to known oömycete avirulence genes.

Phytophthora plant pathogens attack a wide range of agriculturally and ornamentally important plants (1). Late blight of potato caused by Phytophthora infestans resulted in the Irish potato famine in the 19th century, and P. sojae costs the soybean industry millions of dollars each year. In California and Oregon, a newly emerged Phytophthora species, P. ramorum, is responsible for a disease called sudden oak death (2) that affects not only the live oaks that are the keystone species of the ecosystem but also a large variety of woody shrubs that inhabit the oak ecosystems, such as bay laurel and viburnum (2). Many other members of the oömycete phylum are plant or animal pathogens, and some pose biosecurity threats such as the maize downy mildews Peronosclerospora philippinensis and Sclerophthora rayssiae. Extensive classical and molecular genetic tools and genomics resources have been developed for P. sojae and P. infestans (3, 4).

Oömycetes fall within the kingdom Stramenopila (5, 6), which also includes golden-brown algae, diatoms, and brown algae such as kelp (Fig. 1A). The algal stramenopiles are secondarily photosynthetic, having engulfed a red alga and adopted its plastid approximately 1,300 million years ago (6). However, nonphotosynthetic stramenopiles, such as the oömycetes, do not even have the vestigial plastids found in apicomplexan and euglenoid parasites that originate from phototrophs. Therefore, an important evolutionary question is whether the kingdom Stramenopila was founded by a photosynthetic or nonphotosynthetic organism and, more generally, whether a much larger group of secondarily photosynthetic organisms, called the chromalveolates (6), was founded by a single photosynthetic ancestor.

Fig. 1.

Identification of genes potentially originating from a photosynthetic endosymbiont. (A) Schematic phylogenetic tree of the eukaryotes. The tree is adapted from that of Baldauf et al. (5) that is based on a concatenation of six highly conserved proteins. Filled green circles on the right indicate photosynthetic species, open green circles indicate species with vestigial plastids of photosynthetic origin. The dotted arrows indicate hypothetical events in which an ancient red algal endosymbiont might have been acquired by an ancestor of the chromalveolates (left arrow) or of the stramenopiles alone (right arrow). (B and C) Phylogenetic trees produced using maximum parsimony (with the branch and bound algorithm) of amino acid sequences with the computer program PAUP 4.0b10 (32). Inferred amino acid sequences were aligned using ClustalW, and these were manually trimmed at each end to a position of confident alignment. (B) and (C) show strict consensus trees for two and three equally parsimonious trees, respectively. In both cases, numerals indicate bootstrap support values, and any with less than 80% have been collapsed. Branch lengths are proportional to sequence change using the accelerated transformation mode for character state reconstruction. Trees were rooted by specifying Methanocaldococcus jannaschii and the NCAIR mutase/cpmA cluster of genes as outgroups for (B) and (C), respectively. Taxonomic affinities of the organisms listed are as in (A), with the following additions: green plants, Helicosporidium sp.; cyanobacteria, Nostoc sp., Trichodesmium erythraeum, and Synechocystis sp.; other eubacteria, Pseudomonas aeruginosa, Bacillus halodurans, and Clostridium acetobutylicum; archaebacteria, Thermoplasma volcanium, M. jannaschii, and Methanopyrus kandleri.
In (C), NCAIRm, AIRc, and cpmA denote, respectively, N-phosphoribosylcarboxy-aminoimidazole (NCAIR) mutase, 1-(5-phosphoribosyl)-5-amino-4-imidazole (AIR) carboxylase, and the circadian modifier gene cpmA that is a member of the NCAIR mutase family (14).

We report here the draft genome sequences of P. sojae and P. ramorum. The sequences, a nine-fold coverage of the 95 Mb P. sojae genome and a seven-fold coverage of the 65 Mb P. ramorum genome, were produced using a whole-genome shotgun approach (7). We constructed a physical map of P. sojae to aid the sequence assembly by using restriction enzyme fingerprinting of bacterial artificial chromosome (BAC) clones from two libraries (7). We identified 19,027 predicted genes (gene models) in the genome of P. sojae and 15,743 in the genome of P. ramorum, supported in part by expressed sequence tags (ESTs) from P. sojae and proteomic surveys in P. ramorum (7). Of these, 9768 pairs of gene models could be identified as putative orthologs (7). There are 1755 gene models in P. sojae and 624 in P. ramorum encoding unique proteins that do not have a homolog in the other genome at a significance threshold of 10⁻⁸. The overall higher number of predicted genes in P. sojae results from a greater size of many gene families within the species.
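According to the footnote to Table 1, orthologs between the two species were estimated from bidirectional best BLAST hits and/or ClustalW similarity trees. The Python below is a minimal sketch of the bidirectional-best-hit idea only, not the authors' pipeline. It assumes the BLAST results have already been parsed into (query, subject, bit score) tuples; the gene identifiers and scores are invented for illustration.

```python
def best_hits(hits):
    """Map each query to its single highest-scoring subject."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical gene identifiers and bit scores, for illustration only.
sojae_vs_ramorum = [
    ("Ps1", "Pr1", 500.0),
    ("Ps1", "Pr2", 120.0),
    ("Ps2", "Pr2", 300.0),
    ("Ps3", "Pr2", 310.0),
]
ramorum_vs_sojae = [
    ("Pr1", "Ps1", 495.0),
    ("Pr2", "Ps3", 305.0),
    ("Pr2", "Ps2", 290.0),
]

pairs = reciprocal_best_hits(sojae_vs_ramorum, ramorum_vs_sojae)
print(pairs)
```

In this toy input Ps2 and Ps3 both hit Pr2 best, but only Ps3 is Pr2's best hit in return, so Ps2 is left without a putative ortholog. This is the kind of situation that enlarged gene families produce.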

There is extensive colinearity of orthologs between the two genomes. One colinear block, illustrated in Fig. 2, spans 1.8 Mb of P. sojae sequence and 0.8 Mb of P. ramorum sequence and contains 425 P. sojae and 265 P. ramorum genes, respectively, of which 170 are orthologous (7). The longest colinear block spans an estimated 4.8 Mb in P. sojae and 2.9 Mb in P. ramorum and contains 1129 P. sojae and 793 P. ramorum gene models, respectively, of which 463 are orthologous. The long-range colinearity between the two genomes is preserved despite the presence of many local rearrangements and many nonorthologous genes. Local disruptions of the gene colinearity are particularly common in the vicinity of genes associated with plant infection such as P. sojae Avr1b-1 (8) (Fig. 2B).
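The block statistics above can be restated as the fraction of genes in each block that have an ortholog in the other genome. The short Python sketch below does only that arithmetic, using the counts quoted in the text; the dictionary keys are illustrative names, not terms from the article.

```python
# Gene and ortholog counts for the two colinear blocks, as quoted in the text.
blocks = {
    "Fig. 2 block": {"sojae_genes": 425, "ramorum_genes": 265, "orthologs": 170},
    "longest block": {"sojae_genes": 1129, "ramorum_genes": 793, "orthologs": 463},
}


def ortholog_fractions(block):
    """Return the orthologous fraction of (P. sojae, P. ramorum) genes in a block."""
    return (
        block["orthologs"] / block["sojae_genes"],
        block["orthologs"] / block["ramorum_genes"],
    )


fractions = {name: ortholog_fractions(block) for name, block in blocks.items()}
for name, (ps, pr) in fractions.items():
    print(f"{name}: {ps:.0%} of P. sojae genes and {pr:.0%} of P. ramorum genes are orthologous")
```

In both blocks roughly 40% of the P. sojae genes and about 60% of the P. ramorum genes are orthologous, which matches the statement that colinearity persists despite many nonorthologous genes.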

Fig. 2.

Long-range gene colinearity between the genomes of P. sojae and P. ramorum. In (A) and (B), black and red lines link orthologs of like and reversed orientation, respectively. In (A), colored bars indicate orthologs located in different P. sojae sequence scaffolds. Gray bars indicate genes without orthologs. Filled red circles indicate scaffolds linked by a single end-sequenced BAC, and open red circles indicate scaffolds linked by end-sequenced BAC contigs. The boxed area in (A) is enlarged in (B).

The genome sequences of P. sojae and P. ramorum imply several metabolic idiosyncrasies. For example, the CYP51 group of cytochrome P450 enzymes are considered necessary for sterol biosynthesis (9). Consistent with Phytophthora being sterol auxotrophs, none of these genes could be identified in either Phytophthora genome, although most other sterol biosynthetic genes could be recognized. More unexpectedly, neither genome appears to contain any gene for phospholipase C (PLC), an enzyme present in all eukaryotes sequenced so far (10), nor are PLC sequences present in a collection of 75,757 ESTs from Phytophthora infestans (11). In contrast, the diatom Thalassiosira pseudonana has three PLC genes. No other highly conserved genes were identified as missing from both the P. sojae and P. ramorum genomes.

Because P. ramorum has recently appeared in California and Europe, an important priority is the development of genetic markers for population genetics and strain tracking of the pathogen. Through sequencing the P. ramorum genome, we identified ∼13,643 single nucleotide polymorphisms (SNPs) (7) and numerous simple sequence repeats useful for this purpose. The P. sojae genome sequence contains only 499 SNPs, probably because P. sojae is homothallic (inbreeding), whereas P. ramorum is heterothallic (outcrossing).
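To put the two SNP counts on a common scale, they can be divided by the genome sizes given earlier (65 Mb for P. ramorum, 95 Mb for P. sojae). The Python sketch below does this. It is a rough normalization that assumes the nominal genome size is an adequate denominator, and the P. ramorum count is itself approximate.

```python
# SNP counts and genome sizes (in megabases) as stated in the text.
genomes = {
    "P. ramorum": {"snps": 13_643, "size_mb": 65},
    "P. sojae": {"snps": 499, "size_mb": 95},
}


def snps_per_mb(snps, size_mb):
    """SNP density per megabase of nominal genome size."""
    return snps / size_mb


density = {name: snps_per_mb(g["snps"], g["size_mb"]) for name, g in genomes.items()}
for name, value in density.items():
    print(f"{name}: {value:.1f} SNPs per Mb")

# Ratio of the two densities.
fold = density["P. ramorum"] / density["P. sojae"]
print(f"P. ramorum has about {fold:.0f}-fold more SNPs per Mb")
```

On these figures P. ramorum carries roughly 210 SNPs per Mb against about 5 per Mb in P. sojae, a difference of about 40-fold.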

To address whether the kingdom Stramenopila might have been founded by a photosynthetic ancestor (6), we searched for Phytophthora genes that had especially strong similarities to genes of photosynthetic organisms (7). We identified 855 genes with a putative heritage from a red alga or cyanobacterium (fig. S2), of which 30 are detailed in table S4. Some of the most striking examples of the putative acquisition of genes from a photosynthetic ancestor are provided by genes encoding biosynthetic enzymes targeted to the chloroplasts of photosynthetic organisms and to the mitochondria of nonphotosynthetic organisms. Table S4 includes 12 genes whose protein product has a predicted mitochondrial location in Phytophthora and a predicted plastid location in plants and/or algae. One example, the gene for 2-isopropylmalate synthase (functioning in leucine biosynthesis), is shown in Fig. 1B. Although a few details of this tree appear to be anomalous, owing perhaps to the ancient separation of these lineages and sparse taxon sampling, there are clearly two major phylogenetic groups of this gene: one acquired in fungi by transfer from an α-proteobacterium, presumably the endosymbiont that gave rise to mitochondria, and the other acquired in algae, plants, and stramenopiles from a cyanobacterium, presumably the endosymbiont that originally gave rise to plastids. It is further interesting that this gene in the diatom Thalassiosira pseudonana groups with those of green plants rather than red algae, perhaps indicating a separate ancestry, as has been suggested for some other chromalveolates (12, 13), although this could alternatively be an artifact due to incomplete sampling of lineages or of the genes within them. Figure 1C shows a more unusual example, from the sixth step of purine biosynthesis. 
The two Phytophthora species, together with the diatom Thalassiosira pseudonana and the green alga Chlamydomonas reinhardtii, are unique among eukaryotes because they have a prokaryotic, organelle-targeted N-phosphoribosyl-carboxy-aminoimidazole (NCAIR) mutase homolog closely resembling that of cyanobacteria (14), in addition to a conventional eukaryotic, cytoplasmic-targeted 1-(5-phosphoribosyl)-5-amino-4-imidazole (AIR) carboxylase (Fig. 1C). The presence of numerous genes of putative phototroph origin in the Phytophthora genomes lends support to the hypothesis that the stramenopile ancestor was photosynthetic, which is consistent with the chromalveolate hypothesis.

Genes involved in the interactions of P. sojae and P. ramorum with their hosts are of central interest. Motile Phytophthora zoospores exhibit chemotaxis toward signals from host tissue such as isoflavones (15). In other eukaryotes, chemotaxis reception is mediated by G protein–coupled receptors (GPCRs) (16). P. sojae and P. ramorum each have 24 GPCRs, four of which show a top match to the Dictyostelium cyclic adenosine monophosphate chemotaxis receptor. Another 12 GPCRs have a C-terminal intracellular phosphatidylinositol-4-phosphate 5-kinase domain similar to the RpkA gene of Dictyostelium (17); this domain would enable signaling to bypass the heterotrimeric G proteins, perhaps explaining why the Phytophthora genomes contain only single genes for G-α and G-β subunits (17).

Because P. sojae and P. ramorum have very different host ranges, it is expected that some of their genes involved in host interactions will have rapidly diverged between the two species as a result of strong selection for effective pathogenesis. Because Phytophthora species are cellular pathogens, secreted proteins are prime candidates for mediators of host interactions (18). The predicted secretomes (7) of the two species (1464 and 1188 proteins, respectively) are evolving significantly more rapidly than the overall proteome. For example, 17% and 11% of the secreted P. sojae and P. ramorum proteins, respectively, are unique at the 30% identity level, whereas only 9% and 4%, respectively, of the overall proteomes are unique. The relatively rapid diversification of the secretomes is also evident in the number of multigene families encoding these proteins: 77% of the proteins belong to families of two or more members, and 30% belong to families of 10 or more members.

Both P. sojae and P. ramorum derive their nutrition biotrophically from living plant tissue during the initial hours of infection, but they switch to necrotrophic growth once the infection has been established, deriving their nutrition from killed plant tissue. As hemibiotrophs, the two species are expected to produce gene products that enable them to evade or suppress the plant's defense responses during early biotrophic infection and to produce gene products that kill and destroy plant tissue during later necrotrophic growth. Table 1 summarizes a wide variety of hydrolytic enzymes encoded by the genomes of the two species in comparison with the genome of the diatom Thalassiosira pseudonana, an autotroph. These destructive enzymes potentially could be associated with the necrotrophic phase. The two Phytophthora genomes encode large numbers of secreted proteases in contrast to the diatom and also encode the pectinases and cutinases required for hydrolyzing plant cell wall and cuticular material. The number of proteinase inhibitor genes required to protect the pathogens from plant proteases is also expanded in the Phytophthora genomes.

Table 1.

Potential infection-related genes in the P. sojae and P. ramorum genome sequences.

Gene product                              Numbers of genes
                                          P. sojae    P. ramorum  Orthologs*  Diatom
Proteases, all                            282         311         221         314
    Extracellular                         47          48          38          8
    Serine proteases                      119         127         86          123
    Metalloproteases                      71          86          62          84
    Cysteine proteases                    67          74          52          63
Glycosyl hydrolases                       125         114         54          n.d.†
    Secreted                              56          37          23          n.d.
    Pectinesterases                       19          15          n.d.        0
    Pectate lyases                        43          41          n.d.        0
Cutinases                                 16          4           1           0
Chitinases                                5           2           2           49
Lipases                                   171         154         n.d.        n.d.
Phospholipases                            >50         >50         n.d.        23
    Phospholipase C                       0           0           0           3
    Phospholipase D                       18          18          18          3
Protease inhibitors, all                  22          19          13          9
    Kazal                                 15          12          8           2
    Cystatin                              4           4           4           0
Protein toxins
    NPP family‡                           29          40          7           0
    PcF family§
        Six Cys family                    2           4           0           0
        Eight Cys family                  17          0           0           0
    Crn family∥                           40          8           2           0
Secondary metabolite biosynthesis
    Nonribosomal peptide synthetases      4           4           4           16
    Polyketide synthases                  0           0           0           0
    Cytochrome P450's                     30          24          21          10
        CYP51 clan                        0           0           0           1
ABC transporters                          134         135         105         63
    PDR¶ (ABCG-full)                      45          46          30          3
    ABCG-half                             23          22          19          6
    MDR# (ABCB)                           7           7           4           3
    MRP** (ABCC)                          23          22          19          6
    Elicitins                             18          17          13          0
    Elicitin-like                         39          31          22          0
    Avh (RXLR) family                     350         350         83 (21)††   0

* Genes orthologous between P. sojae and P. ramorum were estimated based on bidirectional best BLAST hits and/or using similarity trees created by ClustalW.
† n.d., not determined.
‡ Necrosis and ethylene-inducing protein family (19, 20).
§ (18, 21).
∥ Crinkling and necrosis-inducing protein family (22).
¶ Pleiotropic drug resistance transporters.
# Multi-drug resistance transporters.
** Multi-drug resistance-associated transporters.
†† For the Avh family, the estimations of orthology are uncertain due to the rapid divergence of this family. The number in parentheses refers to orthologs that are syntenic and hence most likely to be correct.

Gene families encoding proteins previously demonstrated to be toxic to plants show striking diversification; fewer than 25% of the genes remain identifiably orthologous between the two species, and in several cases there are no identifiable orthologs (Table 1). There are also substantial differences in sizes of the gene families. The NPP1 family (19, 20) is more expanded and diversified in P. ramorum, whereas the PcF (18, 21) and crn (22) toxin families are more expanded in P. sojae. Figure 3A illustrates the explosive diversification of the NPP1 toxin family in the genus Phytophthora. This toxin family is interesting because several fungal plant pathogens also contain NPP1 toxin genes (19, 20), but they contain only two to four genes, whereas the Phytophthora species contain 29 or 40 (Fig. 3A).

Fig. 3.

Sequence divergence of two potential families of pathogenicity genes. (A) NPP1 or Nep1-like (NLP) protein sequences. A total of 89 sequences were used to construct this phylogram, including 40 P. ramorum and 29 P. sojae sequences. The remaining sequences were retrieved from GenBank. Protein sequences were edited to remove signal peptides and other domains and were aligned using ClustalW, and the unrooted phylogram was made using the neighbor-joining method (MEGA 3.1). The scale bar represents 10% weighted sequence divergence. Species of origin are abbreviated as follows: An, Aspergillus nidulans; Bh, Bacillus halodurans; Ec, Erwinia carotovora; Fo, Fusarium oxysporum; Gz, Gibberella zeae; Mg, Magnaporthe grisea; Nc, Neurospora crassa; Pa, Pythium aphanidermatum; Pi, Phytophthora infestans; Pm, Pythium monospermum; Pp, Phytophthora parasitica; Ps, Phytophthora sojae; Pr, Phytophthora ramorum; Sc, Streptomyces coelicolor; Vd, Verticillium dahliae; Vp, Vibrio pommerensis. (B) Similarity of P. sojae Avh genes to P. ramorum. Purple indicates Avh genes, and crimson indicates a set of randomly chosen P. sojae genes having a functional annotation. The red arrow indicates the class that contains the Avr1b-1 gene itself.

The largest and most diverse family of infection-associated genes identified in the P. sojae and P. ramorum genomes is a superfamily with ∼350 genes in each genome (7) that are similar to four oömycete genes identified as “avirulence” or “effector” genes, namely Avr1b-1 of P. sojae (8), Avr3a of P. infestans (23), and Atr1 (24) and Atr13 (25) of Hyaloperonospora parasitica. We have termed these Avh (avirulence homolog) genes. Avirulence genes were historically identified by their genetic interaction with plant disease resistance genes that encode defense receptors (26). In bacterial plant pathogens, some avirulence proteins function to promote infection by suppressing the plant defense response, hence their renaming as “effector” proteins (26). Many of these bacterial effector proteins are injected into host cells by the type III secretion machinery (26), which explains the intracellular location of many resistance gene–encoded plant defense receptors. Intriguingly, the plant defense receptors that interact with the four cloned oömycete avirulence proteins also have a predicted intracellular location (8, 23–25, 27). However, the mechanisms by which the oömycete proteins may enter the plant cell are unknown. The four oömycete avirulence proteins share only very modest sequence similarity, but they do share two motifs, named RXLR and dEER, near the N terminus (24, 28), which are also shared by all of the 700 Avh gene products. Comparison of the 700 Avh sequences reveals a nonrandom distribution of amino acid residues surrounding each motif (7), which could potentially contribute to their functions. Similarity of the RXLR motif to a motif used by the malaria parasite to transport proteins across the membrane of the parasitophorous vacuole into the cytoplasm of human erythrocytes (29, 30) suggests that the RXLR motif may function to transport oömycete effector proteins into the plant cytoplasm.
Figure 3B shows that the Avh gene family has undergone extensive diversification in comparison with a random set of P. sojae and P. ramorum genes. The diversification of the Avh family, driven presumably by selection pressure from the host defense machinery, underlines the potential importance of this superfamily for infection by these pathogens. Further characterization of these genomes will be published elsewhere (31).
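The text describes the Avh proteins as sharing an RXLR motif and a dEER motif near the N terminus. The Python below is a minimal sketch of how such a pattern could be screened for in protein sequences. It is not the authors' search method (that is described in their supporting material). The 100-residue window, the regular expression standing in for dEER, and both example sequences are assumptions made for illustration.

```python
import re

RXLR = re.compile(r"R.LR")      # Arg, any residue, Leu, Arg
DEER = re.compile(r"[DE]?EER")  # loose stand-in for dEER; an assumption


def find_rxlr_deer(protein, n_term_window=100):
    """Return (rxlr_start, deer_start) as 0-based indices for the first RXLR
    match in the N-terminal window that is followed by a dEER-like match in
    the same window, or None if no such pair exists."""
    region = protein[:n_term_window]
    for rxlr in RXLR.finditer(region):
        deer = DEER.search(region, rxlr.end())
        if deer:
            return rxlr.start(), deer.start()
    return None


# Invented sequences for illustration only; they are not real Avh proteins.
candidate = "MRLAQVVVLAAASLLVSSDALSTTNANQAKIIKGTSPGGHSPRLLRAYQPDDEERGLPSV"
control = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQ"

print(find_rxlr_deer(candidate))
print(find_rxlr_deer(control))
```

The control sequence contains an EER stretch but no RXLR, so it is not reported. A real screen would presumably also require a predicted signal peptide, since the Avh proteins are discussed here as candidates for delivery into the host.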

Supporting Online Material

Materials and Methods

SOM Text

Figs. S1 to S3

Tables S1 to S5

References and Notes
