The Fusarium graminearum Genome Reveals a Link Between Localized Polymorphism and Pathogen Specialization


Science 07 Sep 2007:
Vol. 317, Issue 5843, pp. 1400-1402
DOI: 10.1126/science.1143708


We sequenced and annotated the genome of the filamentous fungus Fusarium graminearum, a major pathogen of cultivated cereals. Very few repetitive sequences were detected, and the process of repeat-induced point mutation, in which duplicated sequences are subject to extensive mutation, may partially account for the reduced repeat content and apparent low number of paralogous (ancestrally duplicated) genes. A second strain of F. graminearum contained more than 10,000 single-nucleotide polymorphisms, which were frequently located near telomeres and within other discrete chromosomal segments. Many highly polymorphic regions contained sets of genes implicated in plant-fungus interactions and were unusually divergent, with higher rates of recombination. These regions of genome innovation may result from selection due to interactions of F. graminearum with its plant hosts.

Fusarium, a genus of plant pathogenic fungi, causes diseases that affect most species of cultivated plants, including root and stem rots, blights, and wilts (1). F. graminearum, which causes Fusarium head blight (FHB) disease on wheat and barley, is a leading cause of economic loss in these crops (2). In addition to reducing seed mass and quality, the fungus contaminates grain with toxic metabolites that are a threat to human health (3). Fusarium species also can directly infect humans, causing localized necrotic diseases (4) and invasive infection, especially in immunocompromised individuals (5).

The F. graminearum genome was sequenced by whole-genome shotgun, paired-end sequencing of plasmid, Fosmid, and bacterial artificial chromosome (BAC) clones. The resulting assembly totals 36.1 Mb and displays high sequence quality and continuity. Nearly all (99.8%) of the assembly was anchored to the four chromosomes by genetically mapping markers derived from the genome sequence (6), and an initial set of 11,640 genes was predicted (table S1 and SOM text). Functional categories for the predicted genes were inferred from the presence of conserved InterPro domains (7) and compared with those found in the genomes of the related fungi Neurospora crassa, Magnaporthe grisea, and Aspergillus nidulans. The F. graminearum genome has more genes in several protein categories, including predicted transcription factors, hydrolytic enzymes, and transmembrane transporters (Fig. 1 and table S2).

Fig. 1.

Functional classification of F. graminearum proteins and comparison to other fungi. The InterPro gene categories displayed contain at least 50% more genes in F. graminearum than in N. crassa. Representative gene numbers for M. grisea and A. nidulans are also shown. Each circle displays the relative fraction of genes represented in each of the categories for each genome. For a full list of InterPro categories, see
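The selection rule in this caption (at least 50% more genes in F. graminearum than in N. crassa) amounts to a simple ratio threshold on per-category gene counts. The Python sketch below applies it; the function name `enriched_categories`, the dictionary input format, and the counts in the usage line are hypothetical placeholders, not the paper's data or pipeline.

```python
def enriched_categories(fg_counts, nc_counts, min_ratio=1.5):
    """Return {category: fold} for categories in which F. graminearum has at
    least `min_ratio` times as many genes as N. crassa.

    min_ratio=1.5 encodes "at least 50% greater gene abundance".
    Categories with zero N. crassa genes but at least one F. graminearum
    gene are reported with an infinite fold change.
    """
    enriched = {}
    for category, fg in fg_counts.items():
        nc = nc_counts.get(category, 0)
        if nc == 0:
            if fg > 0:
                enriched[category] = float("inf")
            continue
        fold = fg / nc
        if fold >= min_ratio:
            enriched[category] = fold
    return enriched


# Hypothetical counts, for illustration only.
fg = {"transcription factor": 600, "transporter": 300, "ribosomal protein": 80}
nc = {"transcription factor": 300, "transporter": 250, "ribosomal protein": 80}
print(enriched_categories(fg, nc))  # {'transcription factor': 2.0}
```

Reporting the fold change rather than a yes/no flag keeps the information needed to rank categories, as the circles in Fig. 1 do with relative fractions.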

The F. graminearum genome has few high-identity duplicated sequences, at least 15-fold fewer than other related fungi, including Saccharomyces cerevisiae (table S3 and SOM text). Only a few gene pairs originated from recent duplications (fig. S1 and table S4), and we identified only two small families of transposons (table S5 and SOM text). Unlike many other filamentous fungi, F. graminearum is homothallic (self-fertile) and rarely outcrosses, which limits its opportunities to acquire new repeats (2). In some ascomycetous fungi, including F. graminearum, the lack of repetitive sequence is due to a genome-wide defense system known as repeat-induced point mutation (RIP) (8). RIP identifies duplicated sequences (9) and introduces C:G-to-T:A transition mutations in both copies during the sexual cycle; this mutational bias was observed in F. graminearum transposons (tables S6 and S7 and SOM text).
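One common way to quantify a RIP mutational bias in a repeat such as a transposon copy is with dinucleotide indices. RIP converts CpA to TpA (and, read on the other strand, TpG to TpA), so the product index TpA/ApT rises and the substrate index (CpA + TpG)/(ApC + GpT) falls in RIP-affected sequence. These indices come from work on Neurospora and are offered only as an illustration; the text does not state that the authors used them (their analysis is in tables S6 and S7). The function names are hypothetical.

```python
def dinucleotide_counts(seq):
    """Count overlapping dinucleotides (e.g. 'TA', 'CA') in a DNA sequence."""
    seq = seq.upper()
    counts = {}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        counts[pair] = counts.get(pair, 0) + 1
    return counts


def rip_indices(seq):
    """Return (product_index, substrate_index) for a DNA sequence.

    product_index   = TpA / ApT                  (rises with RIP)
    substrate_index = (CpA + TpG) / (ApC + GpT)  (falls with RIP)
    A ratio with a zero denominator is returned as NaN.
    """
    c = dinucleotide_counts(seq)

    def n(pair):
        return c.get(pair, 0)

    product = n("TA") / n("AT") if n("AT") else float("nan")
    denom = n("AC") + n("GT")
    substrate = (n("CA") + n("TG")) / denom if denom else float("nan")
    return product, substrate


print(rip_indices("ATACATG"))  # (0.5, 2.0)
```

In practice the indices would be compared between repeat copies and single-copy sequence from the same genome, since base composition alone shifts both ratios.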

To experimentally confirm the activity of RIP, we examined the stability of transgenically introduced repeats of a hygromycin phosphotransferase (hph) gene during sexual and asexual development. Although cultures derived from asexual spores maintained drug resistance, 42% of cultures derived from ascospores, which are formed during the sexual cycle, were sensitive to hygromycin (table S8), and 99% of mutations found in the hph gene were C-to-T point mutations, most at CpA sites (tables S9 and S10). This indicates that repetitive sequences are frequently mutated by a RIP-like process during meiosis. Ascospores are important for FHB infection (2), and because a homothallic fungus does not need to find a compatible mate, F. graminearum may undergo meiosis and produce ascospores more frequently. More frequent meiosis would increase the frequency of RIP, allowing RIP to play a more central role in shaping the genome.
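The hph analysis reduces to comparing each mutated copy with the original sequence and tallying substitutions by type and context. The sketch below assumes the two sequences are already aligned without gaps and read on the same strand; a G-to-A change is counted as a C-to-T change on the opposite strand, where a TpG context corresponds to CpA. The function `classify_substitutions` and the toy sequences are hypothetical.

```python
def classify_substitutions(reference, mutant):
    """Tally substitutions between two aligned, gap-free, equal-length sequences.

    'CG_to_TA' counts C->T changes plus G->A changes (a C->T on the other strand).
    'CpA_context' counts those transitions whose mutated C was followed by A:
    CpA on the given strand, or TpG (which reads CpA on the other strand).
    """
    if len(reference) != len(mutant):
        raise ValueError("sequences must be aligned and of equal length")
    ref, mut = reference.upper(), mutant.upper()
    tally = {"total": 0, "CG_to_TA": 0, "CpA_context": 0}
    for i, (r, m) in enumerate(zip(ref, mut)):
        if r == m:
            continue
        tally["total"] += 1
        if r == "C" and m == "T":
            tally["CG_to_TA"] += 1
            if i + 1 < len(ref) and ref[i + 1] == "A":
                tally["CpA_context"] += 1
        elif r == "G" and m == "A":
            tally["CG_to_TA"] += 1
            if i > 0 and ref[i - 1] == "T":
                tally["CpA_context"] += 1
    return tally


# Toy example: two C:G-to-T:A changes in CpA context plus one C-to-G change.
print(classify_substitutions("ACAGTGCCA", "ATAGTACGA"))
# {'total': 3, 'CG_to_TA': 2, 'CpA_context': 2}
```

Dividing `CG_to_TA` by `total` gives the kind of fraction reported for hph (99%), and `CpA_context` divided by `CG_to_TA` gives the share at CpA sites.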

We compared the assembly of strain PH-1 with ∼0.4-fold coverage of whole-genome shotgun sequence from a second strain of F. graminearum, GZ3639. We identified 10,495 single-nucleotide polymorphisms (SNPs) between the two strains and mapped SNP positions along each chromosome. Because the GZ3639 reads do not cover every base in the genome, SNP densities were normalized to the number of high-quality base alignments (7). In this normalized data set, SNP densities ranged from 0 to 17.5 SNPs per kb. SNPs are unevenly distributed: 25% lie within 5% of the genome sequence, and 50% lie within 13%. Regions of high SNP density were clustered along each chromosome (Fig. 2). In particular, all telomere-proximal regions displayed very high SNP densities and contained the vast majority of the highest-density windows. In addition to the chromosome ends, three chromosomes have one or two large interstitial regions of high SNP density. Whereas telomeres are well established in many species as sites of sequence variation and rearrangement (10–12), the presence of discrete interstitial regions of high diversity in addition to the subtelomeres is striking. One simple explanation is that these interstitial regions mark former telomeres that became internal through an ancestral chromosome fusion. Although F. graminearum has four chromosomes, related fungi with similar genome size, including F. verticillioides, F. oxysporum, and F. solani, have many more, ranging from 9 to >17. All closely related species in the F. graminearum species complex, as well as F. culmorum, also have four chromosomes, which indicates that if chromosome fusion occurred, it was not a recent event.
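The normalization and concentration figures in this paragraph can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' pipeline: SNP positions are 0-based coordinates on one chromosome, the number of high-quality aligned bases per non-overlapping window is supplied as input (the 50-kb default follows Fig. 2), and the genome fraction is approximated by the fraction of equal-sized windows. All function names and the toy data are hypothetical.

```python
def snp_counts_per_window(snp_positions, n_windows, window=50_000):
    """Count SNPs falling in each non-overlapping window of `window` bp."""
    counts = [0] * n_windows
    for pos in snp_positions:
        k = pos // window
        if 0 <= k < n_windows:
            counts[k] += 1
    return counts


def normalized_density(counts, aligned_bases):
    """SNPs per kb of high-quality aligned bases (0.0 where nothing aligned)."""
    return [c / (a / 1000) if a > 0 else 0.0 for c, a in zip(counts, aligned_bases)]


def genome_fraction_holding(counts, densities, target):
    """Smallest fraction of windows, taken from highest density down,
    that together hold at least `target` (e.g. 0.25) of all SNPs."""
    total = sum(counts)
    order = sorted(range(len(densities)), key=lambda k: densities[k], reverse=True)
    running = 0
    for n_used, k in enumerate(order, start=1):
        running += counts[k]
        if running >= target * total:
            return n_used / len(densities)
    return 1.0


# Toy chromosome: four 1-kb windows (a small window keeps the example readable).
aligned = [1000, 500, 1000, 1000]
counts = snp_counts_per_window([100, 900, 1200, 1300, 1400, 3500], 4, window=1000)
dens = normalized_density(counts, aligned)
print(counts, dens)                                  # [2, 3, 0, 1] [2.0, 6.0, 0.0, 1.0]
print(genome_fraction_holding(counts, dens, 0.25))   # 0.25
print(genome_fraction_holding(counts, dens, 0.75))   # 0.5
```

Window 1 has only half its bases aligned, so its three SNPs translate into the highest normalized density; without normalization, poorly covered windows would look artificially SNP-poor.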

Fig. 2.

Correlation of high-SNP-density regions with recombination and gene sets. The four F. graminearum chromosomes are shown proportional to size. Panels (A) to (D) represent chromosomes 1 to 4, respectively. (I) Distribution of SNPs (red lines) and recombination rate (blue dashed lines) for non-overlapping 50-kb windows across each chromosome (x axis). SNPs are plotted on the left y axis as the number of SNPs per kb of high-quality aligned bases (7). Recombination rate [cM/27 kb (6)] is plotted on the right y axis. (II to V) Gene counts for the following gene sets (7) are shown with relative density shading for tiled 100-kb windows: (II) genes found in F. graminearum but absent from four related Fusarium species; (III) secreted proteins; (IV) genes expressed specifically in planta; (V) proteins involved in translation.

The regions of highest SNP density were significantly correlated with the regions of highest recombination (correlation coefficient 0.55, P = 1.2 × 10⁻¹³), similar to the correlations of SNP distribution or nucleotide diversity with recombination frequency observed in humans and Drosophila (13, 14). Additionally, high-SNP-density regions have significantly lower G+C content than the rest of the genome (correlation coefficient −0.43, P = 1.1 × 10⁻⁸). The low G+C content of the interstitial high-SNP-density regions further supports the idea that they may represent ancestral telomeres.
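The window-level comparison can be sketched as a correlation between two per-window series, SNP density and recombination rate. The text does not say which correlation coefficient was used, so this sketch assumes Pearson's r and approximates its P value with the Fisher z-transformation. That approximation is reasonable for the several hundred 50-kb windows in a 36-Mb genome, but it is still an approximation; `pearson_r` and `fisher_p` are hypothetical names.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def fisher_p(r, n):
    """Approximate two-sided P value for H0: rho = 0 via the Fisher z-transform.

    z = atanh(r) * sqrt(n - 3) is treated as standard normal under H0.
    """
    if abs(r) >= 1:
        return 0.0
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical per-window values (SNPs/kb and cM per window), for illustration.
snp_density = [0.4, 1.1, 2.9, 0.7, 5.8, 3.3, 0.2, 4.1]
recombination = [0.3, 0.9, 1.8, 0.8, 2.6, 1.2, 0.4, 2.2]
r = pearson_r(snp_density, recombination)
print(round(r, 3), fisher_p(r, len(snp_density)))
```

A rank-based coefficient (Spearman's rho) would be a natural alternative if the window values are strongly skewed, as SNP densities concentrated near telomeres tend to be.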

To determine whether the high-diversity regions evolved recently, we examined the sequence divergence of genes in these regions. We compared F. graminearum coding regions with those from a low-coverage (4×) assembly of F. verticillioides (7). Comparing the best matches of F. graminearum proteins from high- and low-SNP-density regions (top and bottom quartiles) against the F. verticillioides assembly showed that proteins from the highest-SNP-density regions have fewer putative orthologs than the rest of the genome and that these orthologs share lower identity (7). Although variation in the local mutation rate is expected to produce a correlation between polymorphism and divergence, high-SNP-density regions contained more polymorphisms than predicted from divergence (table S13), and their ratio of synonymous to nonsynonymous polymorphisms is higher than that of less diverse regions (χ² test, P = 3.7 × 10⁻⁷) (table S14).
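The synonymous-versus-nonsynonymous comparison has the shape of a 2 × 2 contingency table (rows: high- versus low-SNP-density regions; columns: synonymous versus nonsynonymous polymorphisms). Assuming a Pearson χ² test without continuity correction (the text names only a χ² test), the statistic and its 1-degree-of-freedom P value can be computed as below. `chi2_2x2` and the counts are hypothetical placeholders, not values from table S14.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
        [[a, b],
         [c, d]]
    Returns (chi2, p) with p from the chi-square distribution with 1 df.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 df, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical counts: rows = high / low SNP-density regions,
# columns = synonymous / nonsynonymous polymorphisms.
chi2, p = chi2_2x2(20, 10, 10, 20)
print(round(chi2, 3), round(p, 4))
```

Here the synonymous-to-nonsynonymous ratio is 2.0 in the first row and 0.5 in the second; the test asks whether a difference that large is plausible if both rows shared one underlying ratio.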

BLAST analysis (7) identified 704 genes as specific to F. graminearum, and these are significantly enriched in the high-SNP-density regions (P = 4.5 × 10⁻¹⁵). We also compared F. graminearum with the closely related F. asiaticum, F. boothii, F. culmorum, and F. pseudograminearum by hybridizing genomic DNA to an F. graminearum microarray (15) and identified 382 genes that are F. graminearum-specific. These genes were overrepresented by a factor of 2.7 in the high-SNP-density regions (P = 3.4 × 10⁻³⁴). These data further demonstrate that the genomic regions with the highest intraspecific variability also show the highest interspecific variability.
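Enrichment of a gene set in the high-SNP-density regions, such as the 704 BLAST-defined or 382 hybridization-defined F. graminearum-specific genes, can be expressed as a fold enrichment over random expectation plus a one-sided P value. The text does not name the test; this sketch assumes a hypergeometric (one-sided Fisher's exact) test. `enrichment` is a hypothetical name, and the number of genes lying in high-SNP-density regions used below is a placeholder, not a value from the paper.

```python
from math import comb


def enrichment(total_genes, genes_in_regions, set_size, set_in_regions):
    """Fold enrichment and one-sided hypergeometric P value.

    total_genes      N: genes in the genome
    genes_in_regions K: genes located in high-SNP-density regions
    set_size         n: genes in the set being tested
    set_in_regions   x: genes of the set located in those regions
    P = Pr(X >= x) for X ~ Hypergeometric(N, K, n).
    """
    expected = set_size * genes_in_regions / total_genes
    fold = set_in_regions / expected
    upper = min(set_size, genes_in_regions)
    # Exact integer arithmetic; math.comb returns 0 when k exceeds n.
    tail = sum(
        comb(genes_in_regions, k) * comb(total_genes - genes_in_regions, set_size - k)
        for k in range(set_in_regions, upper + 1)
    )
    return fold, tail / comb(total_genes, set_size)


# 11,640 predicted genes and 704 specific genes come from the text;
# 1,500 genes in high-SNP regions and 200 specific genes among them are placeholders.
fold, p = enrichment(11_640, 1_500, 704, 200)
print(f"fold = {fold:.2f}, P = {p:.2e}")
```

Keeping the binomial coefficients as Python integers avoids the overflow that floating-point factorials would hit at genome scale; only the final ratio is converted to a float.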

F. graminearum genes specifically expressed during plant infection—including predicted secreted proteins, major facilitator transporters, amino acid transporters, and cytochrome P450s—are all overrepresented in high SNP density regions (Fig. 2, table S14, and SOM text). Conversely, genes predicted to be highly conserved, such as nuclear encoded mitochondrial genes or genes involved in translation, are underrepresented in regions of high diversity (table S15 and SOM text).

Comparison of gene expression during F. graminearum infection of barley and under varied nutritional culture conditions (7, 15) identified 408 genes as exclusively expressed during barley infection. These genes are highly enriched in the high-SNP-density regions (P = 7.4 × 10⁻¹⁵), and 31% are predicted to be secreted, a threefold enrichment over the genome as a whole (table S14 and SOM text). Four of these genes have similarity to known virulence factors, and another 32 are predicted plant cell wall-degrading enzymes (table S16). Among these enzymes are xylanases, which degrade xylan, the major hemicellulose component of monocot cell walls; pectate lyases, which cleave pectin, another essential component of plant cell walls; and cutinases, which hydrolyze the cutin polyesters that coat all outer plant surfaces. Such enzymes may function in the penetration and maceration of plant tissues and in the acquisition of nutrients from plant polymers (16), and they may also act as effector molecules that trigger host-plant defense responses (17). The high genetic diversity of this group of genes suggests that the fungus has a great capacity for adaptability and genetic change during its interaction with even this single host species.

The completed genome of F. graminearum allowed us to identify distinct regions of high diversity. We found that these regions are enriched for infection-related genes, which may allow the fungus to adapt rapidly to changing environments or hosts. Identifying these high-diversity regions directs future work toward the parts of the genome that may have the greatest potential for elucidating the dynamics of host-pathogen interactions.

Supporting Online Material

Materials and Methods

SOM Text

Figs. S1 to S3

Tables S1 to S16


References and Notes
