Report

The Ashbya gossypii Genome as a Tool for Mapping the Ancient Saccharomyces cerevisiae Genome


Science  09 Apr 2004:
Vol. 304, Issue 5668, pp. 304-307
DOI: 10.1126/science.1095781

This article has a correction.

Abstract

We have sequenced and annotated the genome of the filamentous ascomycete Ashbya gossypii. With a size of only 9.2 megabases, encoding 4718 protein-coding genes, it is the smallest genome of a free-living eukaryote yet characterized. More than 90% of A. gossypii genes show both homology and a particular pattern of synteny with Saccharomyces cerevisiae. Analysis of this pattern revealed 300 inversions and translocations that have occurred since divergence of these two species. It also provided compelling evidence that the evolution of S. cerevisiae included a whole genome duplication or fusion of two related species and showed, through inferred ancient gene orders, which of the duplicated genes lost one copy and which retained both copies.

The filamentous fungus Ashbya gossypii is currently used in industry for the production of vitamin B2 (1). It is also an attractive model to study filamentous growth, because of its small genome, haploid nuclei, efficient gene targeting, propagation of plasmids, and growth on defined media (2–8). The A. gossypii genome project was initiated when conservation of gene order and orientation (synteny) to Saccharomyces cerevisiae was noted (9). We wanted to determine the complete gene repertoire for future work with this fungus, and we aimed at using the gene order information to fully explain the origin of gene cluster duplications in the S. cerevisiae genome that were proposed to represent relics of a whole genome doubling followed by extensive genome rearrangements (10, 11).

Details on the sequencing of the A. gossypii genome (GenBank accession numbers AE016814 through AE016821) and annotation are available in the supporting online material. The seven chromosomes encode 4718 proteins, 199 tRNA genes, and at least 49 small nuclear RNA (snRNA) genes. The ribosomal DNA carries 40 copies of ribosomal RNA genes sequenced previously (12). The genome lacks transposons and subtelomeric gene repeats, and gene duplications are rare (table S6). The number of protein-coding genes is similar to the 4824 genes found in Schizosaccharomyces pombe (13), suggesting that this may be close to the minimum number of genes needed by a free-living fungus. The genome is extremely compact with an average distance between open reading frames (ORFs) of only 341 base pairs, contributing to an average protein-coding gene size of only 1.9 kb, clearly less than the 2.1-kb average gene size found in S. cerevisiae (14), the 2.5 kb found in S. pombe (13), and the 3.7 kb found in Neurospora crassa (15). The presence of only 221 introns in the entire A. gossypii genome, many at identical positions in S. cerevisiae homologs, contributes to the compact nature of this genome.

A. gossypii and S. cerevisiae diverged more than 100 million years ago, and their genomes differ substantially in GC content (52% for A. gossypii and 38% for S. cerevisiae). Still, for 95% of the protein-coding sequences of A. gossypii, we found homologs in the S. cerevisiae genome, the majority (4281 ORFs) at syntenic locations. Only 175 A. gossypii protein-coding genes showed homology but not synteny with S. cerevisiae genes, and 262 lack homology (table S3). Several genes with no homologs in S. cerevisiae have homologs in S. pombe (table S4), supporting the idea that they are real genes not or no longer present in S. cerevisiae. The annotation of the A. gossypii genome also identified gene functions present in S. pombe and S. cerevisiae but not in A. gossypii (table S5). Protein sequence conservation between syntenic homologs of A. gossypii and S. cerevisiae varies considerably, ranging from less than 20% amino acid identity to nearly 100%. In cases where the sequence identity was less than 30%, synteny was particularly useful in the identification of highly diverged orthologs. The marked fluctuation of sequence conservation of syntenic homologs across the entire genome shows that no one region of the genome is more conserved between these species and that the plasticity of the primary protein sequence varies significantly between genes.
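The gene counts quoted above can be checked with a few lines of arithmetic. The sketch below is only a bookkeeping aid: the variable names are illustrative assumptions, and the three categories are taken directly from the numbers in the text (syntenic homologs, nonsyntenic homologs, and genes lacking homology).

```python
# Counts as reported in the text; variable names are illustrative only.
TOTAL_GENES = 4718

counts = {
    "syntenic_homolog": 4281,     # homology and synteny with S. cerevisiae
    "nonsyntenic_homolog": 175,   # homology but not synteny
    "no_homolog": 262,            # no homolog found in S. cerevisiae
}

# The three categories should partition the full gene set.
assert sum(counts.values()) == TOTAL_GENES

# Fraction of all protein-coding genes in each category.
fractions = {name: n / TOTAL_GENES for name, n in counts.items()}

# Fraction of genes with any S. cerevisiae homolog (syntenic or not).
with_homolog = counts["syntenic_homolog"] + counts["nonsyntenic_homolog"]
homolog_fraction = with_homolog / TOTAL_GENES

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
print(f"any homolog: {homolog_fraction:.1%}")
```

Under these counts, genes with any homolog make up a little over 94% of the total, close to the rounded 95% quoted above, and syntenic homologs alone account for just over 90%.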

More than 90% of the A. gossypii genome could be divided into several hundred synteny groups. In each of these groups, the homology relation of single genes or subgroups of genes alternated between two S. cerevisiae regions and A. gossypii. We refer to this pattern as double synteny. A relatively simple example of double synteny is shown in Fig. 1A. In this group, thirty-three consecutive protein-coding genes of A. gossypii chromosome I, flanked by two tRNA genes, align with regions of consecutive genes from S. cerevisiae chromosomes XV and XVI. Homologous ORFs typically are of conserved length (table S3).

Fig. 1.

A simple pattern of double synteny and its conversion to an ancient synteny map. (A) Double synteny between A. gossypii ORFs AAL119W to AAL087C and two S. cerevisiae regions, ORFs YOR265W to YOR246C and YPL176C to YPL159C, respectively. Arrows represent ORF sizes and orientation and are drawn to scale. ORFs classified as Watson can appear after rearrangement events as Crick ORFs and vice versa. Lines connect pairs of homologs, with thick lines marking S. cerevisiae twin ORFs that originate from the genome duplication. Crossing lines point to an inversion of five genes. (B) An ancient synteny map. ORFs are represented as rectangles and tRNA genes as squares. Syntenic homologs are connected by vertical bars and, in the case of twin ORFs, by thick vertical bars. Transcription orientation is indicated by white arrows only for the A. gossypii ORFs, because syntenic S. cerevisiae homologs are transcribed in the same direction. Thin lines above and below S. cerevisiae genes mark the most likely extent of syntenic genes at the time when the precursor genome of S. cerevisiae duplicated. These lines are interrupted when the ancient gene order is no longer colinear with the present gene order, as with YPL176C and YPL170W. Such breakpoints of ancient synteny (arrows) mark end points of inversions or translocations.

When both S. cerevisiae regions are combined (with the inversion in chromosome XVI reverted), the resulting gene order matches that of A. gossypii, including gene orientations, which justifies annotation of these ORFs as syntenic homologs. This principle has been applied to all ORFs annotated as syntenic homologs in table S3. Three ORFs did not participate in this synteny relation. The A. gossypii gene AAL119W is a nonsyntenic homolog of S. cerevisiae YFR021W, a gene involved in vacuolar protein processing. The S. cerevisiae gene YOR264W plays a role in daughter cell–specific gene expression, a dispensable function in a filamentous fungus, and this may explain the absence of a homolog in A. gossypii. YPL171C is a nonsyntenic homolog of AGR329C and encodes one member of the dehydrogenase family. Nonsyntenic homologs may represent former syntenic homologs, adjacent sequences of which were involved in genome rearrangements so that insufficient evidence of synteny remained.
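The consistency check described here (two S. cerevisiae regions that together reproduce the A. gossypii gene order once an inversion is reverted) can be sketched in code. This is a minimal illustration on toy data, not the annotation procedure used for table S3: the gene names, the `is_colinear`, `revert_inversion`, and `covers_all` helpers, and the inversion coordinates are all hypothetical, and gene orientation is ignored for brevity.

```python
# Toy data only: gene names are hypothetical placeholders, not real annotations.
ag_order = ["Ag1", "Ag2", "Ag3", "Ag4", "Ag5", "Ag6", "Ag7", "Ag8"]
ag_index = {gene: i for i, gene in enumerate(ag_order)}

# Each track lists (S. cerevisiae gene, A. gossypii homolog) in present-day order.
track_a = [("ScA1", "Ag1"), ("ScA2", "Ag3"), ("ScA3", "Ag4"), ("ScA4", "Ag7")]
track_b = [("ScB1", "Ag2"), ("ScB2", "Ag6"), ("ScB3", "Ag5"),
           ("ScB4", "Ag4"), ("ScB5", "Ag8")]


def is_colinear(track, index):
    """True if the track's homologs run strictly up or strictly down the A. gossypii order."""
    pos = [index[ag] for _, ag in track]
    increasing = all(a < b for a, b in zip(pos, pos[1:]))
    decreasing = all(a > b for a, b in zip(pos, pos[1:]))
    return increasing or decreasing


def revert_inversion(track, start, end):
    """Return a copy of the track with the segment [start, end) reversed."""
    return track[:start] + track[start:end][::-1] + track[end:]


def covers_all(tracks, order):
    """True if every A. gossypii gene has a homolog in at least one track."""
    covered = {ag for track in tracks for _, ag in track}
    return covered == set(order)


# Track B is not colinear until the three-gene inversion is reverted.
restored_b = revert_inversion(track_b, 1, 4)
print(is_colinear(track_b, ag_index))
print(is_colinear(restored_b, ag_index))
print(covers_all([track_a, restored_b], ag_order))
```

In this toy case neither track alone covers the A. gossypii region, but the two together do, which is the signature of double synteny.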

The pattern of double synteny reflects the gene order of the most recent common ancestor and changes in this order due to genome rearrangements in both lineages, in particular the loss of many genes in the S. cerevisiae lineage after the genome duplication. This makes reconstructions of ancient S. cerevisiae gene orders relatively easy (Fig. 1B). In this ancient synteny map, protein-coding genes are presented as rectangles and syntenic homologs are connected by vertical bars. All gaps emerging between S. cerevisiae genes in this type of presentation indicate positions at the time of the genome duplication of former homologs that were subsequently lost.
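The logic of reading gene losses from gaps can be illustrated with a small sketch. It assumes a simplified representation in which each of the two S. cerevisiae regions is a mapping from A. gossypii gene to its syntenic homolog; the `ancient_map` function, the status labels, and all gene names are hypothetical and serve only to show how a gap in one row implies a lost copy.

```python
from collections import Counter

# Toy data only: gene names are hypothetical placeholders.
ag_order = ["Ag1", "Ag2", "Ag3", "Ag4", "Ag5", "Ag6"]

# Syntenic homologs in the two duplicated S. cerevisiae regions,
# keyed by the A. gossypii gene they align with.
region_a = {"Ag1": "ScA1", "Ag3": "ScA2", "Ag4": "ScA3"}
region_b = {"Ag2": "ScB1", "Ag4": "ScB2", "Ag5": "ScB3"}


def ancient_map(order, a, b):
    """Build one row per A. gossypii gene: (gene, homolog in A, homolog in B, status)."""
    rows = []
    for ag in order:
        sc_a, sc_b = a.get(ag), b.get(ag)
        if sc_a is not None and sc_b is not None:
            status = "twin ORFs retained"
        elif sc_a is not None or sc_b is not None:
            status = "one copy lost"       # the gap marks the lost duplicate
        else:
            status = "no syntenic homolog"
        rows.append((ag, sc_a, sc_b, status))
    return rows


rows = ancient_map(ag_order, region_a, region_b)
for ag, sc_a, sc_b, status in rows:
    print(f"{ag:4} {sc_a or '-':5} {sc_b or '-':5} {status}")

summary = Counter(status for *_, status in rows)
print(dict(summary))
```

Each "-" in the printed rows corresponds to a gap in the ancient synteny map, that is, a position where a duplicated copy is inferred to have been lost.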

The A. gossypii gene order also allowed us to find inversions that occurred in the S. cerevisiae lineage, such as the one on chromosome XVI (Fig. 1B), because the alignment of homologs reveals their order before inversion. This ancient gene order shows a single break of synteny, between YPL176C and YPL170W. The other breakpoint of this inversion coincides with a double break of synteny (black arrows): Two other S. cerevisiae gene regions share homology and synteny with the eight A. gossypii ORFs proximal to AAL119W. The majority of such double breaks of synteny do not coincide with single breaks, and most of them mark end points of inversions or translocations in the evolutionary past of A. gossypii or of S. cerevisiae before the genome duplication. The A. gossypii genome often carries tRNA genes (open squares) or nonsyntenic homologs (blue rectangles) at such end points of rearrangements, as seen in Fig. 1. The presence of tRNA genes at such sites may be explained by the absence of other interspersed repeated DNA elements in A. gossypii, so that the tRNA genes served as sites for homology-induced rearrangements.

An example of ancient gene order reconstructed from a complex double synteny pattern is shown in Fig. 2 for the centromere region of A. gossypii chromosome I. The centromere-proximal genes display homology and synteny to the centromere regions of S. cerevisiae chromosomes III (region YCL to YCR) and XIV (region YNL to YNR). A duplication of these two regions was proposed 10 years ago (16) and extensively analyzed on completion of the S. cerevisiae genome (10). Other genes in the A. gossypii centromere I region show homology and synteny to two short regions of S. cerevisiae chromosome XII (YLR) and to a very short region of S. cerevisiae chromosome VII (YGL). Again, all gaps between S. cerevisiae genes mark sites of former homologs at the time of genome duplication that were lost during evolution. The alignment of the S. cerevisiae homologs revealed eight synteny breaks when compared with today's S. cerevisiae genome (14). These breaks originate from three reciprocal translocations (Tra) and one inversion (Inv) in S. cerevisiae after genome duplication (white arrows) and from one inversion (black arrows) in the A. gossypii lineage (or the S. cerevisiae lineage before genome duplication) (fig. S4).

Fig. 2.

Multiple rearrangements in an ancient centromere region. A. gossypii ORFs AAL030C to AAR004C are connected by vertical lines with those S. cerevisiae ORFs that encode homologous proteins. All ORFs participating in this pattern are classified in the GenBank entries as syntenic homologs. They originate from S. cerevisiae chromosomes III (YCL and YCR), VII (YGL), XII (YLR), and XIV (YNL and YNR). In addition to the homologous ORFs, homologs to the two flanking A. gossypii tRNA genes (open squares), one snRNA gene (black squares), and the centromeres (black circles) were identified in the syntenic S. cerevisiae regions and connected by lines. Black rectangles represent two of 262 A. gossypii ORFs that lack a homolog in S. cerevisiae (table S3). The transcription direction is indicated by white arrows and is identical in the syntenic S. cerevisiae ORFs. Thin horizontal lines above and below S. cerevisiae genes mark the extent of ancient synteny, as inferred from the reconstruction of gene deletion and genome rearrangement events (fig. S4). A major difficulty rests in the assignments of the ends of these lines in cases of extensive gene losses, e.g., between YNL001W and YNL007C. These cases are rare, and the final decision is based on the reconstruction of the most likely rearrangement events that interrupted the original gene order. End points of such events are marked by white arrows when the gene order in one of the two syntenic S. cerevisiae regions is affected (single breaks of synteny) and by black arrows when both gene orders are simultaneously affected (double breaks of synteny).

We aligned the A. gossypii and S. cerevisiae genomes according to these principles and highlighted synteny breaks. An example of a map of synteny breaks using A. gossypii chromosome I as template is given in Fig. 3, and the entire map is given as fig. S5. Close to 96% of the A. gossypii genome aligns with two regions from the S. cerevisiae genome, including seven duplicated centromere regions and relics of a centromere region downstream of AAL174C and ACR029C (Table 1 and table S3). These genome alignments are interrupted by 328 double breaks of synteny and 168 single breaks of synteny.
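The distinction between single and double breaks of synteny can be made concrete with a short sketch. It assumes a simplified encoding that is not taken from the paper: each position along the A. gossypii chromosome carries a pair of labels naming the ancient-synteny block that covers it in each of the two S. cerevisiae rows. A boundary where one label changes is a single break; a boundary where both change is a double break. The block labels and the `classify_breaks` function are hypothetical.

```python
# Toy data only: block labels are hypothetical. Each tuple gives the
# ancient-synteny block covering one A. gossypii position in the two
# aligned S. cerevisiae rows.
blocks = [
    ("XV-1", "XVI-1"),
    ("XV-1", "XVI-1"),
    ("XV-1", "XVI-2"),   # only the second row changes: single break
    ("XV-1", "XVI-2"),
    ("IV-1", "II-1"),    # both rows change: double break
]


def classify_breaks(block_pairs):
    """Return (boundary index, kind) for each break between adjacent positions.

    Boundary i lies between position i and position i + 1.
    """
    breaks = []
    for i, (prev, cur) in enumerate(zip(block_pairs, block_pairs[1:])):
        changed = sum(p != c for p, c in zip(prev, cur))
        if changed == 1:
            breaks.append((i, "single"))
        elif changed == 2:
            breaks.append((i, "double"))
    return breaks


breaks = classify_breaks(blocks)
print(breaks)

# Totals reported in the text for the whole-genome alignment.
reported = {"double": 328, "single": 168}
print(sum(reported.values()))
```

The last two lines simply add the reported counts, giving the total number of breaks that interrupt the genome alignments.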

Fig. 3.

Alignments of syntenic S. cerevisiae genes with chromosome I of A. gossypii. The vertical bars represent 381 ORFs, 27 tRNA genes, and 2 snRNA genes of A. gossypii chromosome I. Thick horizontal bars correspond to those regions of the S. cerevisiae genome that display synteny to the A. gossypii genes. The three capital letters and the three-digit numbers refer to the systematic nomenclature of S. cerevisiae genes. For example, the two bars marked as YJL 074 to 080 and YKR 010 to 014 (top left) encompass seven A. gossypii genes with homology, conserved gene order, and conserved gene orientation to the S. cerevisiae genes. This group of genes represents a cluster of ancient synteny, with ancient synteny referring to the degree of synteny at the time of S. cerevisiae genome doubling, when most A. gossypii genes had two syntenic S. cerevisiae homologs. In order to match the ancient synteny, thick bars are often extended at one end to complement the loss of one of the duplicated S. cerevisiae genes. This ancient synteny map displays double breaks and single breaks of synteny. The different origins of these breaks in the boxed region are explained in detail in fig. S4. The gaps of the homology bars at double and single breaks are due to A. gossypii genes that either lack homology to any S. cerevisiae gene or display homology to a nonsyntenic S. cerevisiae gene. Centromeres are indicated by black circles and show the evolutionary conservation of the centromere region after genome doubling.

Table 1.

Centromere assignments based on synteny of the genes flanking the seven A. gossypii and 16 S. cerevisiae centromeres. Roman numerals indicate the chromosome number in the respective organism. The two remaining centromeres (X and XII) have synteny with regions on chromosomes I and III, with the double break in synteny coming at the expected position of the centromere.

A. gossypii chromosomes              S. cerevisiae chromosomes
I                                    III, XIV
II                                   VIII, XI
III                                  XIII, XV
IV                                   V, IX
V                                    II, IV
VI                                   I, VII
VII                                  VI, XVI
Noncentromeric region (I and III)    X, XII

The essentially complete coverage of the seven A. gossypii chromosomes by clusters of ancient synteny, each containing two S. cerevisiae gene regions, demonstrates that both organisms originate from the same ancestor with seven or eight chromosomes. A speciation event, probably involving translocations (17) and an accompanying change in chromosome number, generated the precursors of A. gossypii and S. cerevisiae. At some later time, a genome duplication in the S. cerevisiae precursor opened new possibilities for functional divergence not available for the evolution of A. gossypii. The duplication event created ∼5000 twin ORFs in the duplicated S. cerevisiae genome, and 496 of these ancient twin ORFs can still be seen in the double synteny patterns (table S7). Several of these twin ORFs diverged and now encode proteins of different functions like ORC1, which is essential for DNA replication, and SIR3, which is important for gene silencing (18). Other examples can be extracted from the functional descriptions in table S7. For 59 pairs of the twin ORFs, functions are not known.

What does the frequency of different types of synteny breaks tell us about the time span since A. gossypii and S. cerevisiae diverged? On the basis of adjusted numbers of synteny breaks (fig. S5), we estimate 120 viable genome rearrangements in the A. gossypii lineage and 180 viable rearrangements in the S. cerevisiae lineage (∼60 before genome duplication). If one assumes similar rates of genome rearrangements in both species and takes into account a recent increase in S. cerevisiae rearrangement due to spreading of transposable elements (19), the time span since divergence of both species is about twice as long as the time span since the genome duplication in S. cerevisiae. This method for estimating relative evolutionary time scales from genome rearrangement frequencies has the potential to be used more often in the future, when additional whole genome synteny patterns become available.
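The arithmetic behind this estimate can be sketched as follows. This is one possible reading of the argument, not the authors' stated calculation: it assumes that the A. gossypii lineage provides the baseline rearrangement rate over the full time since divergence, and that the S. cerevisiae lineage rearranged at the same rate before its genome duplication. Time is expressed in units of the divergence time, and all variable names are illustrative.

```python
# Rearrangement estimates from the text.
ag_total = 120        # A. gossypii lineage since divergence
sc_total = 180        # S. cerevisiae lineage since divergence
sc_before_dup = 60    # of which roughly 60 occurred before the duplication
sc_after_dup = sc_total - sc_before_dup

# Assumption: measure time in units of the divergence time (T = 1) and take
# the A. gossypii lineage as the baseline rate over that whole span.
divergence_time = 1.0
baseline_rate = ag_total / divergence_time

# Assumption: S. cerevisiae rearranged at the baseline rate before duplication,
# so 60 rearrangements correspond to half of the divergence time.
time_before_dup = sc_before_dup / baseline_rate
time_since_dup = divergence_time - time_before_dup

# Ratio of time since divergence to time since duplication.
ratio = divergence_time / time_since_dup

# Implied post-duplication rate in S. cerevisiae relative to the baseline,
# consistent with a recent increase in rearrangements.
post_dup_rate_ratio = (sc_after_dup / time_since_dup) / baseline_rate

print(time_since_dup, ratio, post_dup_rate_ratio)
```

Under these assumptions the duplication falls at about half the divergence time, matching the factor of two stated above, and the implied post-duplication rate in S. cerevisiae is about twice the baseline.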

Supporting Online Material

www.sciencemag.org/cgi/content/full/1095781/DC1

Materials and Methods

Tables S1 to S8

Figs. S1 to S5

References and Notes
