Genome Evolution Following Host Jumps in the Irish Potato Famine Pathogen Lineage


Science, 10 Dec 2010
Vol. 330, Issue 6010, pp. 1540-1543
DOI: 10.1126/science.1193070


Many plant pathogens, including those in the lineage of the Irish potato famine organism Phytophthora infestans, evolve by host jumps followed by specialization. However, how host jumps affect genome evolution remains largely unknown. To determine the patterns of sequence variation in the P. infestans lineage, we resequenced six genomes of four sister species. This revealed uneven evolutionary rates across genomes, with genes in repeat-rich regions showing higher rates of structural polymorphism and positive selection. These loci are enriched in genes induced in planta, implicating host adaptation in genome evolution. Unexpectedly, genes involved in epigenetic processes formed another class of rapidly evolving residents of the gene-sparse regions. These results demonstrate that dynamic repeat-rich genome compartments underpin accelerated gene evolution following host jumps in this pathogen lineage.

Phytophthora infestans is an economically important specialized pathogen that causes the destructive late blight disease on Solanum plants, including potato and tomato. In central Mexico, P. infestans naturally co-occurs with two extremely closely related species, Phytophthora ipomoeae and Phytophthora mirabilis, which specifically infect plants as diverse as morning glory (Ipomoea longipedunculata) and four-o’clock (Mirabilis jalapa), respectively. Elsewhere in North America, a fourth related species, Phytophthora phaseoli, is a pathogen of lima bean (Phaseolus lunatus). Altogether, these four Phytophthora species form a tight clade whose members share ~99.9% identity in their ribosomal DNA internal transcribed spacer regions (1). Phylogenetic inferences clearly indicate that species in this Phytophthora clade 1c [nomenclature of (2)] evolved through host jumps followed by adaptive specialization on plants belonging to four different botanical families (2, 3). Adaptation to these host plants most likely involves mutations in the hundreds of disease effector genes that populate gene-poor, repeat-rich regions of the 240-megabase pair genome of P. infestans (4). However, comparative genome analyses of specialized sister species of plant pathogens have not been reported, and the full extent to which host adaptation affects genome evolution remains unknown.

To determine patterns of sequence variation in a phylogenetically defined species cluster of host-specific plant pathogens, we generated Illumina reads for six genomes representing the four clade 1c species. We included the previously sequenced P. infestans strain T30-4 (4) to optimize bioinformatic parameters (figs. S1 to S3) (5). To estimate gene copy number variation (CNV) in the five resequenced genomes relative to T30-4, we used the average read depth per gene with a GC content correction (5) (fig. S4). After GC content correction (6), average read depth provided a good estimate of gene copy number in T30-4 (fig. S5). In the five resequenced genomes, we detected 3975 CNV events in coding genes, including 1046 deletions (Fig. 1).
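The read-depth idea behind the copy number estimates can be sketched as follows. This is a simplified stand-in, not the procedure in the supporting material. It assumes that most genes in each GC-content bin are single copy, so the median depth of a bin approximates single-copy depth at that GC content. It then rounds each gene's depth ratio to an integer copy number, with 0 flagging a putative deletion. The function names, the 5% bin width, and the input layout are hypothetical.

```python
from collections import defaultdict
from statistics import median


def gc_bin(seq, bin_pct=5):
    """Integer GC-content bin (0-4%, 5-9%, ...) of a coding sequence (hypothetical helper)."""
    seq = seq.upper()
    gc_pct = 100 * (seq.count("G") + seq.count("C")) // len(seq)
    return gc_pct // bin_pct


def estimate_copy_numbers(genes, bin_pct=5):
    """Estimate an integer copy number per gene from mean read depth.

    genes: dict mapping gene_id -> (mean_read_depth, coding_sequence).
    Assumption: most genes in a GC bin are single copy, so the bin median
    approximates single-copy depth at that GC content.
    """
    depths_by_bin = defaultdict(list)
    for depth, seq in genes.values():
        depths_by_bin[gc_bin(seq, bin_pct)].append(depth)
    single_copy_depth = {b: median(d) for b, d in depths_by_bin.items()}

    calls = {}
    for gene_id, (depth, seq) in genes.items():
        expected = single_copy_depth[gc_bin(seq, bin_pct)]
        # GC-corrected depth ratio, rounded half up; 0 flags a putative deletion.
        calls[gene_id] = int(depth / expected + 0.5) if expected > 0 else 0
    return calls


genes = {
    "g1": (30, "GGGGGAAAAA"), "g2": (31, "GGGGGAAAAA"), "g3": (29, "GGGGGAAAAA"),
    "g4": (60, "GGGGGAAAAA"), "g5": (2, "GGGGGAAAAA"),
    "g6": (15, "GGGGGGGGAA"), "g7": (16, "GGGGGGGGAA"), "g8": (14, "GGGGGGGGAA"),
}
print(estimate_copy_numbers(genes))
# {'g1': 1, 'g2': 1, 'g3': 1, 'g4': 2, 'g5': 0, 'g6': 1, 'g7': 1, 'g8': 1}
```

In this toy input, genes g6 to g8 have half the raw depth of g1 to g3 because they sit in a different GC bin. They are still called single copy once their depth is compared only against genes of similar GC content, which is the point of the correction.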

Fig. 1

Summary of genome sequences obtained for Phytophthora clade 1c species. Six strains representing four species were analyzed. P. infestans T30-4 previously sequenced by Haas et al. (4) was included for quality control. CDS, coding sequence; CNV, copy number variation; SNP, single-nucleotide polymorphism; syn., synonymous.

In total, we identified 746,744 nonredundant coding sequence single-nucleotide polymorphisms (SNPs) in the resequenced strains (Fig. 1). We calculated rates of synonymous (dS) and nonsynonymous (dN) substitutions for every gene (5, 7). Average dS divergence from P. infestans T30-4 was consistent with the previously reported species phylogeny (Fig. 1) (2). We detected a total of 2572 genes (14.2% of all genes) with dN/dS ratios > 1, indicative of positive selection, in the clade 1c strains, with the highest number in P. mirabilis (1004 genes) (fig. S6). A high proportion of genes annotated as effector genes show signatures of positive selection (300 out of 796) (fig. S6). This supports previous observations that effector genes are under strong positive selection in oomycetes (8–10).
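The text computes dN and dS for every gene (5, 7) but leaves the method to the supporting material. As a hedged illustration, the sketch below implements the textbook Nei–Gojobori (1986) counting method with a Jukes–Cantor correction. This method is a swapped-in stand-in, not necessarily the one the authors used. It assumes aligned, gap-free, uppercase coding sequences and treats mutations to stop codons as nonsynonymous. The fallback for codon pairs whose every mutational pathway passes through a stop codon is a hypothetical choice, and so is the rule that returns "not selected" when dS = 0.

```python
import itertools
import math

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def synonymous_sites(codon):
    """Synonymous sites of one codon: per position, the fraction of the 3 possible
    point mutations that keep the amino acid (changes to stop count as nonsynonymous)."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    sites += 1 / 3
    return sites


def codon_differences(c1, c2):
    """(synonymous, nonsynonymous) differences averaged over all mutational
    pathways from c1 to c2 that avoid intermediate stop codons."""
    positions = [p for p in range(3) if c1[p] != c2[p]]
    paths = []
    for order in itertools.permutations(positions):
        current, syn, nonsyn = c1, 0, 0
        for pos in order:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            if step != c2 and CODE[step] == "*":
                break  # pathway passes through a stop codon: discard it
            if CODE[step] == CODE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = step
        else:
            paths.append((syn, nonsyn))
    if not paths:
        return 0.0, float(len(positions))  # hypothetical fallback
    return (sum(s for s, _ in paths) / len(paths),
            sum(n for _, n in paths) / len(paths))


def jukes_cantor(p):
    """Jukes-Cantor multiple-hit correction; saturated (infinite) once p >= 0.75."""
    if p <= 0:
        return 0.0
    return math.inf if p >= 0.75 else -0.75 * math.log(1 - 4 * p / 3)


def nei_gojobori(seq1, seq2):
    """Return (dN, dS) for two aligned, gap-free, uppercase coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    pairs = [(seq1[i:i + 3], seq2[i:i + 3]) for i in range(0, len(seq1), 3)]
    s_sites = sum(synonymous_sites(a) + synonymous_sites(b) for a, b in pairs) / 2
    n_sites = len(seq1) - s_sites
    s_diff = n_diff = 0.0
    for a, b in pairs:
        s, n = codon_differences(a, b)
        s_diff += s
        n_diff += n
    return jukes_cantor(n_diff / n_sites), jukes_cantor(s_diff / s_sites)


def under_positive_selection(dn, ds):
    """dN/dS > 1 criterion from the text; treated as False when dS = 0 (assumption)."""
    return ds > 0 and dn / ds > 1


# One synonymous change (GCT -> GCC, both Ala): dN = 0, dS > 0.
dn, ds = nei_gojobori("GCTGCAGCCGCGATGATG", "GCCGCAGCCGCGATGATG")
print(dn, ds, under_positive_selection(dn, ds))
```

Whole-gene dN/dS from pairwise counting is a conservative criterion, because a few positively selected codons can be diluted by many constrained ones. Model-based tools that estimate ω per site or per branch are the usual refinement.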

Haas et al. (4) reported that the P. infestans genome underwent a repeat-driven expansion relative to distantly related Phytophthora spp. and shows an unusual, discontinuous distribution of gene density. Disease effector genes localize to expanded, repeat-rich, gene-sparse regions of the genome, whereas core ortholog genes occupy repeat-poor, gene-dense regions (4). We used our sequence data to determine the extent to which genomic regions with distinct architectures evolved at different rates. We used statistical tests (table S1) and random sampling (table S2) to assess the significance of differences in CNV, presence/absence polymorphisms, SNP frequency, and dN/dS between genes located in gene-dense and gene-sparse regions (5) (fig. S7 and table S3). Although average gene copy numbers were similar in both regions, genes located in the repeat-rich regions showed significantly higher frequencies of CNV and gene gain/loss (Fig. 2A and fig. S7). Notably, presence/absence polymorphisms were 13 times as abundant in the gene-sparse regions as in the gene-dense regions. In addition, although SNP frequency was similar across the genomes, average dN/dS was significantly higher in gene-sparse regions, indicating that more genes there carry signatures of positive selection (Fig. 2A). Indeed, 23% of the genes in the gene-sparse regions showed dN/dS > 1 in at least one of the resequenced genomes, compared with only 11.5% of genes in the gene-dense regions. In total, 44.6% of the genes in the gene-sparse regions showed signatures of rapid evolution (deletion, duplication, or dN/dS > 1), compared with only 14.7% of the remaining genes.

The uneven distribution of gene density in the P. infestans genome can be visualized with plots of two-dimensional bins of 5′ and 3′ flanking intergenic region (FIR) lengths (4). We adapted these plots to illustrate the relationship between gene density and polymorphism and confirmed the increased rates in the gene-sparse regions (Fig. 2B and fig. S8). We conclude that different regions of the examined genomes evolved at markedly different rates, with the gene-sparse, repeat-rich regions experiencing accelerated evolution.
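The two-dimensional FIR binning and the region-level classification described above can be sketched as follows. Bins are log2-spaced. Each cell reports the number of genes and the mean of a per-gene value such as SNP count, copy number, or dN/dS. `rapidly_evolving` encodes the text's definition (deletion, duplication, or dN/dS > 1). The FIR-length cutoff separating gene-sparse from gene-dense regions is a hypothetical parameter, because this section does not state the paper's own criterion. All function names are illustrative.

```python
from collections import defaultdict


def log2_bin(length_bp):
    """Floor of log2 of a flanking intergenic region (FIR) length in bp."""
    return max(length_bp, 1).bit_length() - 1


def fir_heatmap(genes):
    """Aggregate a per-gene value into 2D bins of 5' and 3' FIR length.

    genes: iterable of (fir5_bp, fir3_bp, value) tuples.
    Returns {(bin5, bin3): (gene_count, mean_value)}, i.e. one heat-map cell per bin.
    """
    cells = defaultdict(list)
    for fir5, fir3, value in genes:
        cells[(log2_bin(fir5), log2_bin(fir3))].append(value)
    return {cell: (len(vals), sum(vals) / len(vals)) for cell, vals in cells.items()}


def is_gene_sparse(fir5, fir3, threshold_bp):
    """Hypothetical rule: a gene lies in a gene-sparse region if both FIRs exceed threshold_bp."""
    return fir5 > threshold_bp and fir3 > threshold_bp


def rapidly_evolving(deleted, duplicated, dn_ds):
    """Rapid evolution as defined in the text: deletion, duplication, or dN/dS > 1."""
    return deleted or duplicated or dn_ds > 1


# Toy genes: two with short FIRs (gene-dense), one with long FIRs (gene-sparse).
genes = [(100, 120, 0.5), (110, 127, 1.5), (5000, 9000, 2.0)]
print(fir_heatmap(genes))  # {(6, 6): (2, 1.0), (12, 13): (1, 2.0)}
print([is_gene_sparse(f5, f3, threshold_bp=3000) for f5, f3, _ in genes])
```

Log2 bins are a natural choice here because FIR lengths span several orders of magnitude, from a few hundred base pairs in gene-dense regions to tens of kilobases in repeat-rich ones.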

Fig. 2

The two-speed genome of P. infestans. (A) Distribution of copy number variation (CNV), presence/absence (P/A) and single-nucleotide polymorphisms (SNP), and dN/dS in genes from gene-dense regions (GDRs) and gene-sparse regions (GSRs). Statistical significance was assessed by unpaired t test assuming unequal variance (CNV, dN/dS) or equal variance (SNP frequency), or by Fisher’s exact test (P/A) (*P < 0.1; ***P < 10⁻⁴). Whiskers show the first value outside 1.5 times the interquartile range. (B) Distribution of polymorphism in P. mirabilis and P. phaseoli according to local gene density (measured as the lengths of the 5′ and 3′ flanking intergenic regions, FIRs). The number of genes (P/A polymorphisms) or average values (CNV, SNP, dN/dS) associated with genes in each bin are shown as a color-coded heat map.
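The caption names Fisher's exact test for the presence/absence comparison. A minimal two-sided version for a 2x2 table, such as gene-sparse versus gene-dense genes crossed with polymorphic versus not polymorphic, can be written directly from hypergeometric probabilities. The function name is illustrative, and any counts passed to it are the reader's own. In practice, `scipy.stats.fisher_exact` provides the same test.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are included.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_observed * (1 + 1e-7))
    return min(p, 1.0)


# Classic textbook table: p is about 0.00276.
print(fisher_exact_two_sided(1, 9, 11, 3))
```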

To gain insight into the functional basis of the uneven evolutionary rates in the gene-sparse versus gene-dense regions of the clade 1c species, we plotted genome-wide microarray expression data on the FIR length maps (fig. S9) (4). Gene-dense regions were enriched in genes induced in sporangia, the asexual spores produced by all Phytophthora species. In marked contrast, genes induced during the preinfection and infection stages were enriched in gene-sparse loci (fig. S9). χ2 tests showed that the relationships between gene density (FIR length) and patterns of gene expression are significant (fig. S9 and table S3). We conclude that the gene-sparse, repeat-rich regions are highly enriched in genes induced in planta, implicating host adaptation in genome evolution.
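The χ2 test of association between gene density and expression pattern can be illustrated on a 2x2 contingency table, for example rows for gene-sparse and gene-dense genes and columns for induced in planta versus not induced. This is a sketch under stated assumptions. The table layout is hypothetical, no Yates continuity correction is applied, and the closed-form P value below is valid only for one degree of freedom. The actual tests may have used more expression or FIR-length categories.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no Yates correction).

    table: [[a, b], [c, d]] of gene counts.
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical counts: rows = gene-sparse / gene-dense, columns = induced / not induced.
print(chi_square_2x2([[10, 20], [20, 10]]))
```

With more than two categories on either axis, the same statistic applies but the P value needs the general χ2 survival function, for example from `scipy.stats.chi2`.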

To assign biological functions to the rapidly evolving genes that populate the gene-sparse, repeat-rich regions, we performed Markov clustering on the predicted proteome of P. infestans and implemented gene ontology mapping. Protein families (tribes) significantly enriched or depleted in genes that are located in gene-sparse regions or are rapidly evolving were identified with Fisher’s exact test. In total, 811 tribes with five or more proteins were generated, covering 44% of the proteome (figs. S10 and S11). Of these, 163 tribes were significantly enriched in genes from gene-sparse regions (Fig. 3A and fig. S12), 123 tribes were enriched in fast-evolving genes (fig. S12), and 65 tribes were enriched for both criteria (Fig. 3B and fig. S12). As expected, several of these tribes (19 out of 65) consist of effector families (4, 11–13) (table S4). Other notable tribes include genes encoding various enzymes, such as cell wall hydrolases, and proteins related to epigenetic maintenance (Fig. 3B and table S4). Surprisingly, tribes annotated as histone and ribosomal RNA (rRNA) methyltransferases were particularly rich in genes located in gene-sparse regions and exhibiting presence/absence polymorphisms (table S4 and figs. S13 and S14). Several genes encoding DOT1-like and SET domain histone methyltransferases and SpoU-like rRNA methyltransferases are exceptional among genes involved in epigenetic maintenance in that they reside largely in gene-sparse regions and show high rates of polymorphism (fig. S15).
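Markov clustering groups proteins by simulating flow on a similarity graph. It alternates expansion (squaring the column-stochastic matrix) with inflation (raising entries to a power, then renormalizing columns) until the matrix stops changing. The rows of attractor nodes that retain mass then define the clusters. The sketch below runs on a toy similarity matrix. In a proteome-scale analysis, the input would typically be derived from all-against-all sequence similarity scores and processed with dedicated MCL software. The inflation value of 2.0, the added self-loops, and the 1e-6 pruning cutoff are assumptions, not parameters reported here.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    totals = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / totals[j] if totals[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-9, prune=1e-6):
    """Markov clustering of a symmetric, non-negative similarity matrix.
    Returns a list of clusters as frozensets of node indices."""
    n = len(similarity)
    # Self-loops (an assumption) stabilise the flow on small graphs.
    m = normalize_columns([[similarity[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow along longer paths
        inflated = normalize_columns(  # inflation: boost strong flows, prune weak ones
            [[x ** inflation if x > prune else 0.0 for x in row] for row in expanded])
        converged = all(abs(inflated[i][j] - m[i][j]) < tol
                        for i in range(n) for j in range(n))
        m = inflated
        if converged:
            break
    # Attractor rows keep mass; their nonzero columns form one cluster.
    clusters = []
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > prune)
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Two hypothetical three-member protein families with no similarity between them.
sim = [[0, 1, 1, 0, 0, 0],
       [1, 0, 1, 0, 0, 0],
       [1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 1],
       [0, 0, 0, 1, 0, 1],
       [0, 0, 0, 1, 1, 0]]
print(mcl(sim))  # [frozenset({0, 1, 2}), frozenset({3, 4, 5})]
```

Higher inflation values yield smaller, tighter clusters. The tribe counts reported above therefore depend on that setting as well as on the "five or more proteins" size filter applied afterwards.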

Fig. 3

Enrichment of P. infestans families (tribes) in genes residing in gene-sparse regions and rapidly evolving genes. (A) The 811 P. infestans tribes with five or more genes (x axis) ranked on the basis of ascending enrichment in GSR genes (y axis). P value of a χ2 test for significance of enrichment is indicated. Additional gene categories (core/not core orthologs, secreted/not secreted, and rapidly/not rapidly evolving) are shown for reference. (B) P values of χ2 tests for tribe enrichment in GSR genes (x axis) and rapidly evolving genes (y axis). Tribes with P values < 0.1 (log10) are shown. Bubble sizes are proportional to the number of genes in tribes. Bubble colors indicate functional categories. Numbers refer to tribe identifiers as listed in table S4.

Our study demonstrates that highly dynamic genome compartments enriched in noncoding sequences underpin accelerated gene evolution following host jumps. Gene-sparse regions that drive the extremely uneven architecture of the P. infestans genome are highly enriched in plant-induced genes, particularly effectors, implicating host adaptation as a driving force of genome evolution in this lineage. In addition, we unexpectedly identified several genes involved in epigenetic processes, notably histone methyltransferases, as rapidly evolving residents of the gene-sparse regions. Histone methylation indirectly modulates gene expression in various eukaryotes (14, 15) and could underlie concerted and heritable gene induction patterns through long-range remodeling of chromatin structure (16). Histone acetylation and methylation are thought to be key regulators of gene expression in P. infestans (17) and could modulate expression patterns of genes located in the gene-sparse regions. Moreover, histone hypomethylation reduces DNA stability (18, 19) and may have contributed to genome plasticity in the P. infestans lineage by regulating transposon activity as well as genomic and expression variability (20, 21). Finally, understanding P. infestans genome evolution should prove useful in designing rational strategies for sustainable late blight disease management based on targeting the most evolutionarily stable genes in this lineage.

Supporting Online Material

Materials and Methods

Figs. S1 to S15

Tables S1 to S4


References and Notes

  5. Materials and methods are available as supporting material on Science Online.
  Acknowledgments: We thank the Broad Institute Sequencing Platform and J. Pike for sequencing; G. Kessel and V. Vleeshouwers for biomaterial; and D. Weigel, J. Dangl, and J. Ecker for useful comments. This project was funded by the Gatsby Charitable Foundation, Marie-Curie IEF contract 255104, National Research Initiative of the U.S. Department of Agriculture grant 2006-35600-16623, NSF grant EF-0523670, and research funding program LOEWE of the Ministry of Research, Science and the Arts of Hesse (Germany). Sequences are deposited in GenBank under the submission accession numbers SRA02326–2329 and SRA024355 and in the Short Read Archive under study accession numbers ERP000341–344.