Functional Characterization of the S. cerevisiae Genome by Gene Deletion and Parallel Analysis


Science  06 Aug 1999:
Vol. 285, Issue 5429, pp. 901-906
DOI: 10.1126/science.285.5429.901


The functions of many open reading frames (ORFs) identified in genome-sequencing projects are unknown. New, whole-genome approaches are required to systematically determine their function. A total of 6925 Saccharomyces cerevisiae strains were constructed, by a high-throughput strategy, each with a precise deletion of one of 2026 ORFs (more than one-third of the ORFs in the genome). Of the deleted ORFs, 17 percent were essential for viability in rich medium. The phenotypes of more than 500 deletion strains were assayed in parallel. Of the deletion strains, 40 percent showed quantitative growth defects in either rich or minimal medium.

The budding yeast S. cerevisiae serves as an important experimental organism for revealing gene function. Yeast carries out all the basic functions of eukaryotic cells, and up to 30% of positionally cloned genes implicated in human disease have yeast homologs (1). Determining the function of all yeast gene products will be an important step toward understanding their function in metazoans and will lay the foundation for a more complete comprehension of cellular processes and pathways.

A powerful way to determine gene function is the phenotypic analysis of mutants missing the gene. Several genome-wide approaches have been proposed including genetic footprinting and random mutagenesis (2, 3). While genetic footprinting has the advantage that all genes can be tested for their contribution to fitness under a particular growth condition relatively quickly, it has the disadvantage that the mutant strains cannot be recovered. In addition, testing each additional condition is as time-consuming as the first. Random mutagenesis is relatively rapid, but the subsequent matching of phenotypes to genes is slower. In addition, with random approaches a certain fraction of genes may be missed, even with oversampling. These limitations can be overcome by deleting each gene in the genome in a directed fashion and by marking each yeast gene with a molecular “barcode” that allows the phenotypes of the mutant strains to be assayed in parallel.

The precise deletion of yeast genes can be efficiently accomplished using a polymerase chain reaction (PCR)–mediated gene disruption strategy that exploits the high rate of homologous recombination in yeast (4). For this method, short regions of yeast sequence [∼50 base pairs (bp)] identical to those found upstream and downstream of a targeted gene are placed at each end of a selectable marker gene through PCR. The resulting PCR product, when introduced into yeast cells, can replace the targeted gene by homologous recombination. For most genes, >95% of the resulting yeast transformants carry the correct deletion (5). In addition, this method can be modified so as to introduce two molecular barcodes (UPTAG and DOWNTAG) into the deletion strain. The barcodes or “tags” are unique 20-base oligomer (20-mer) sequences that serve as strain identifiers (6, 7). We show that these barcodes allow large numbers of deletion strains to be pooled and analyzed in parallel in competitive growth assays. This direct, simultaneous, competitive assay of fitness increases the sensitivity, accuracy and speed with which growth defects can be detected relative to conventional methods.
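The cassette layout described above can be sketched in silico. The example below is a minimal illustration, not the consortium's primer-design pipeline: the genome is random sequence, the marker is a placeholder string, and the names `build_cassette` and `replace_orf` are hypothetical. The priming sites needed to amplify the tags are left out for brevity.

```python
import random

HOMOLOGY_LEN = 50  # approximate homology-arm length quoted in the text
TAG_LEN = 20       # barcode length quoted in the text


def random_dna(rng, length):
    """Return a random DNA string (stand-in for real sequence)."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def build_cassette(genome, orf_start, orf_end, marker, uptag, downtag):
    """Assemble upstream arm + UPTAG + marker + DOWNTAG + downstream arm.

    orf_start and orf_end are 0-based, end-exclusive coordinates that span
    the ORF from its start codon through its stop codon.
    """
    if orf_start < HOMOLOGY_LEN or orf_end + HOMOLOGY_LEN > len(genome):
        raise ValueError("not enough flanking sequence for the homology arms")
    upstream = genome[orf_start - HOMOLOGY_LEN:orf_start]
    downstream = genome[orf_end:orf_end + HOMOLOGY_LEN]
    return upstream + uptag + marker + downtag + downstream


def replace_orf(genome, orf_start, orf_end, cassette):
    """Simulate recombination: swap the ORF for the cassette core."""
    core = cassette[HOMOLOGY_LEN:len(cassette) - HOMOLOGY_LEN]
    return genome[:orf_start] + core + genome[orf_end:]


rng = random.Random(0)
genome = random_dna(rng, 300)      # toy "chromosome"
orf_start, orf_end = 100, 160      # hypothetical ORF coordinates
marker = "N" * 30                  # placeholder for the selectable marker
uptag = random_dna(rng, TAG_LEN)   # placeholder barcodes, not designed tags
downtag = random_dna(rng, TAG_LEN)

cassette = build_cassette(genome, orf_start, orf_end, marker, uptag, downtag)
deleted = replace_orf(genome, orf_start, orf_end, cassette)
print(len(cassette), len(deleted))
```

In the simulated deletion strain the flanking sequence is unchanged and the ORF is replaced exactly, from start codon to stop codon, by the tagged marker.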

To take full advantage of this approach and to accelerate the pace of progress, an international consortium was organized to generate deletion strains for all annotated yeast genes. Here, we report the construction of precise start-to-stop codon deletion mutants for 2026 ORFs (8).

Genes essential for viability in yeast, in particular those encoding proteins lacking human homologs, have been proposed to be the best targets for antifungal drugs. When spores from the 2026 heterozygous strains were germinated on YPD (yeast extract–peptone–dextrose) media at 30°C, haploid deletants could not be recovered for 356 ORFs (see yeast_deletion_project/deletions3.html for an exact list) (9). Despite the considerable interest in these genes as potential drug targets, only 56% of these ORFs had previously been shown to be essential for viability (10). Of the 2026 ORFs analyzed, 1620 were not essential for viability in yeast. For these, one additional homozygous and two haploid deletants (Table 1) were also constructed.

Table 1

Genotypes of strains used in study. For the YD strains, in a few cases deletions were generated in the BY4730 and BY4739 parent strains. These haploid deletants, as well as the resulting homozygous and heterozygous diploids, are HIS+.


A computational Smith-Waterman analysis indicated that 8.5% of the identified nonessential ORFs in the yeast genome have a closely related homolog elsewhere in the genome (P < 1.0e-150), whereas only four (1%) of the essential genes (PYK1, YDR341C, PRP22, and MYO2) encoded proteins that were homologous to another predicted protein in the genome. The redundancy may be why more genes in the yeast genome are not essential. The essential genes were distributed fairly evenly across the chromosome but were slightly biased toward being located near other essential genes (60% of essential genes were within 5 kb of another essential gene, whereas 47% of nonessential genes were). Essential genes were generally not found within 50 kb of the telomeres (Fig. 1A). Essentials were also more heavily transcribed. Transcripts were detected for >99% of essential genes versus 90% of nonessential genes (11). The average number of transcripts per cell for all essential genes was 70% higher than for all nonessential genes. The functional classification of the essential genes versus the nonessential genes is shown in Fig. 1B.
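For readers unfamiliar with the Smith-Waterman method mentioned above, the following is a minimal sketch of its core recurrence, which finds the best-scoring local alignment between two sequences. The scoring values (match, mismatch, linear gap penalty) and the function name `smith_waterman_score` are illustrative assumptions. The text does not state the scoring scheme or how the P values were computed, and protein comparisons normally use a substitution matrix and affine gap penalties rather than the simple scheme shown here.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local-alignment score between sequences a and b.

    Uses the standard recurrence with a linear gap penalty; each cell is
    floored at zero so that an alignment can start anywhere.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


# Toy sequences that share the local region "ACGT".
print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

A genome-wide homolog search along these lines would score every predicted protein against every other and keep pairs whose score is statistically significant.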

Figure 1

(A) Genomic locations of 1620 nonessential (short black bars) and 356 essential genes (tall black bars). Deletions were generated in consecutive groups on multiple chromosomes. A lighter gray background indicates the location of chromosomal duplication blocks (23). For 15 of the 356 essential ORFs, a haploid null mutant had been previously described. These inconsistencies may be due to differences in strain background or in the conditions used for germination of spores. For example, the previously constructed yhr101c (big1) and ybr196c (pgi1) null mutants show glucose sensitivity, and the ybr256c (rib5) deletion mutant requires a riboflavin supplement for viability (24). In several cases, haploid null mutants were reported to have slow-growth phenotypes [ymr308c (PSE1), yol022c, ypl243c (srp68), ypl210c (srp72), and ydr353w (trr1) (25)] or to be temperature sensitive [ydr113c (pds1) and ygr216c (gpi1) (26)]. In some cases, haploid deletion strains were constructed for genes previously determined to be essential by others [HKR1, RNR1, and FUN9 (27)]. (B) Distribution of functional classes of essential (inner circle) and nonessential (outer circle) ORFs using criteria from the Munich Information Centre for Protein Sequences (MIPS) (10).

The phenotypic analysis of the deletion strains, in particular those whose cognate protein is not essential to life, is a formidable task. The role of many genes will likely be manifested only under very specialized growth conditions, necessitating the examination of many different conditions. Previous work demonstrated that the barcodes allowed the relative abundance of their respective strains to be measured when 12 strains were grown competitively for many generations (6). The barcoding scheme thus has the potential to accelerate the phenotypic analysis of the deletion strains by allowing the growth rates of all strains to be assayed simultaneously. The first 558 homozygous deletion strains constructed were pooled (12) and grown in rich and minimal media for about 60 generations. During this time, aliquots were removed from the two pools. The tags were amplified and hybridized to high-density arrays containing the tag complements (Fig. 2A) (13). The hybridization data were used to calculate the relative growth rates for each deletion mutant in the population (14). It was expected that the growth rates obtained independently for each strain with the UPTAG and DOWNTAG signals would agree. For strains in which both the UPTAG and DOWNTAG signals were at least threefold over background, the correlations between growth rates measured with the UPTAG and the DOWNTAG were 0.97 in rich medium and 0.94 in minimal medium (Fig. 2, B and C). The weakest correlations were observed for strains that were the most growth impaired (growth rate <0.6 of that of the wild type) because sufficient signal was detected only for the first few time points.
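One simple way to turn a tag time course into a relative growth rate is sketched below. It is not necessarily the calculation used in (14). It assumes that each normalized tag signal is proportional to the strain's share of the pool, so that a strain growing at r times the pool rate changes by a factor of 2^((r − 1)g) after g pool generations; the slope of log2(signal) against g is then r − 1. The generation values, starting signals, and function names are hypothetical.

```python
import math


def relative_growth_rate(generations, signals):
    """Estimate growth rate relative to the pool from one tag's time course.

    Fits log2(signal) against pool generations by least squares and
    returns 1 + slope.
    """
    logs = [math.log2(s) for s in signals]
    n = len(generations)
    mean_g = sum(generations) / n
    mean_l = sum(logs) / n
    sxx = sum((g - mean_g) ** 2 for g in generations)
    sxy = sum((g - mean_g) * (l - mean_l) for g, l in zip(generations, logs))
    return 1.0 + sxy / sxx


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Simulated, noise-free data: six time points and four hypothetical strains.
generations = [0, 10, 20, 30, 40, 60]
true_rates = [1.00, 0.98, 0.85, 0.60]


def simulate(rate, start_signal):
    return [start_signal * 2 ** ((rate - 1.0) * g) for g in generations]


# UPTAG and DOWNTAG hybridize with different efficiencies, modeled here as
# different starting signals for the same strain.
uptag_rates = [relative_growth_rate(generations, simulate(r, 1000.0))
               for r in true_rates]
downtag_rates = [relative_growth_rate(generations, simulate(r, 400.0))
                 for r in true_rates]

print(uptag_rates)
print(pearson(uptag_rates, downtag_rates))
```

Because the estimate depends only on the slope, differences in absolute signal between the two tags cancel out. With real data, time points that fall to background would need to be excluded before fitting, which is why the slowest-growing strains yield the least reliable estimates.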

Figure 2

Analysis of 558 homozygous diploid strains in rich and minimal medium. Pools were grown for ∼60 generations in minimal medium (SD) supplemented with histidine, uracil, and leucine, or in rich medium (YPD) at 30°C. The batch-transfer method (2, 3) was used to ensure that the cell densities did not exceed 4 × 10^7 cells/ml during the growth study. At least 500,000 cells were transferred during each dilution to avoid unequal representation of strains. Samples (approximately 5 × 10^7 cells) were taken at six different time points for each growth study and genomic DNA was extracted. The two tags for each strain were amplified separately with biotin-labeled primers and were hybridized together to the arrays. (A) Two-color comparison of scanned images of high-density oligonucleotide arrays hybridized with fluorescently labeled tags amplified from 558 strains grown for 0 (red channel) or 6 hours (green channel) in minimal medium. Approximately 4000 different features, each containing more than 10^7 20-mers of a specific sequence, were synthesized in a specific physical location on the array by photolithography and photosensitive oligonucleotide chemistry (6, 28). Only a portion of the array is shown. The features containing probes to the barcodes for deletion strains that exhibit a growth defect in minimal medium, but not in rich medium, are labeled and indicated with an arrow. The array sequences were assigned sequentially to different deletion strains. A second-generation array has been designed, which contains enough tag complement sequences for every gene in the entire genome. [(B) and (C)] Correlation of growth rate data obtained with UPTAG and DOWNTAG sequences for strains grown in rich (B) and minimal (C) media. Data are shown for 331 strains (of the 401 strains that contained both an UPTAG and a DOWNTAG sequence) which had t = 0 UPTAG and DOWNTAG hybridization signals that were both at least threefold over background. More frequent sampling during the initial growth period should improve the correlation. (D) Normalized hybridization intensity data for the 10 slowest-growing (yer014c-a, rnr1, hem14, mot2, pfk2, rpl27a, yer044c, rp50a, ymr188c, yel029c) and the 10 fastest-growing strains in rich medium. (E) Hybridization data for the 10 slowest-growing (hem14, arg5,6, ade1, yer068c-a, ilv1, yer014c-a, yer044c, lys7, gcn4, rnr1) and the 10 fastest-growing strains in minimal medium.

As expected, the strains that disappeared at the highest rates in minimal medium but grew relatively normally in rich medium (at >98% of the pool growth rate) included all of the strains carrying deletions in genes known to be essential for growth in minimal medium, including ade1 [tag average <0.50 in minimal (M); 0.98 in rich (R)]; arg5,6 (<0.50, M; 0.98, R); yer068c-a (<0.50, M; 0.99, R; overlaps arg5,6); ilv1 (<0.50, M; 1.01, R); arg4 (<0.50, M; 1.0, R); gcn4 (0.53, M; 0.98, R); hom3 (0.54, M; 1.0, R); and ade4 (0.56, M; 1.01, R). In addition, the gyp1 (0.78, M; 0.99, R) deletant showed a minimal medium-specific growth defect (15). GYP1 (YOR070C) is a GTPase activating protein for Sec4p, a protein in the secretion pathway (16). Mutants of lys7 showed a growth defect in minimal medium but also grew somewhat slowly in rich medium (0.52, M; 0.88, R). Six strains exhibiting a specific growth defect in rich medium were also identified. These strains included cin8 (0.80, R; 0.95, M); erg6 (0.71, R; 0.96, M); rpl39 (0.85, R; 0.98, M); yml193c-a (0.85, R; 0.98, M; overlaps ribosomal protein rpl36a); esc1 (0.83, R; 0.97, M); and yml013w (0.78, R; 0.95, M).
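The medium-specific calls above can be reproduced from the quoted tag averages with a simple two-threshold rule. The sketch below is illustrative only: the cutoffs (defect below 0.90, normal at or above 0.95) are assumptions chosen for this example rather than values given in the text, and every "<0.50" value is entered as 0.50.

```python
# Tag averages quoted in the text, as (minimal, rich) relative growth rates.
# Values reported as "<0.50" are entered as 0.50 (an upper bound).
TAG_AVERAGES = {
    "ade1": (0.50, 0.98),
    "arg5,6": (0.50, 0.98),
    "yer068c-a": (0.50, 0.99),
    "ilv1": (0.50, 1.01),
    "arg4": (0.50, 1.00),
    "gcn4": (0.53, 0.98),
    "hom3": (0.54, 1.00),
    "ade4": (0.56, 1.01),
    "gyp1": (0.78, 0.99),
    "lys7": (0.52, 0.88),
    "cin8": (0.95, 0.80),
    "erg6": (0.96, 0.71),
    "rpl39": (0.98, 0.85),
    "yml193c-a": (0.98, 0.85),
    "esc1": (0.97, 0.83),
    "yml013w": (0.95, 0.78),
}


def classify(minimal, rich, defect_below=0.90, normal_from=0.95):
    """Label a strain by where its growth defect appears.

    The two thresholds are illustrative assumptions, not published cutoffs.
    """
    if minimal < defect_below and rich >= normal_from:
        return "minimal-specific"
    if rich < defect_below and minimal >= normal_from:
        return "rich-specific"
    if minimal < defect_below and rich < defect_below:
        return "both"
    return "unclassified"


calls = {name: classify(m, r) for name, (m, r) in TAG_AVERAGES.items()}
for label in ("minimal-specific", "rich-specific", "both"):
    names = sorted(n for n, c in calls.items() if c == label)
    print(label, len(names), names)
```

Under these assumed cutoffs the rule recovers the six rich medium-specific strains named in the text and places lys7 in the "both" group.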

Altogether, almost 40% of the deletants examined showed some sort of growth defect in the competitive growth assay (Fig. 3): 24 (5%) at less than 75% of the pool doubling time; 27 (5%) from 75 to 85%; 80 (15%) at 80 to 98%; and 71 (14%) at 98 to 100%. Strains that grew poorly in rich medium generally also grew poorly in minimal medium aside from the exceptions (e.g. rnr1, hem14) described above. The phenotypic profiles for the deletion strains were in good accordance with those obtained using other methods (17).

Figure 3

Deletion map for 336 ORFs and the results of parallel phenotypic analysis for 226 ORFs on chromosome XIII. Data for additional chromosomes can be found online. Chromosome right arms are shown with white backgrounds and left arms with gray. ORFs for which deletions were not generated (gray bars) resulted from failure during PCR or oligonucleotide synthesis (5.2%), failure for unknown reasons (4%), failure to pick unique primers (3.3%), or failure to generate deletions in all four strains (2.5%).

It is often assumed that if a gene is expressed under a particular set of conditions, then that gene is important for growth under those conditions. Deletion of the up-regulated gene would then be expected to cause a decrease in growth. These data offered a unique opportunity to correlate changes in gene expression with deletion phenotypes. The transcript abundance of all genes in the genome was measured in rich and minimal media to determine whether the set of genes whose inactivation produced a quantitative fitness defect in rich and minimal media would be the same set whose expression was induced under these growth conditions (18). Surprisingly little correlation existed between the growth study data and the expression data. Deletion of genes specifically induced in minimal medium was no more likely to affect fitness in minimal medium than deletion of the uninduced genes. Of the genes showing a strong minimal medium growth defect, only ARG4 and ARG5,6 were significantly up-regulated (greater than twofold) in minimal medium relative to rich medium. Similarly, only one of the six genes that showed a rich medium–specific growth defect was up-regulated more than 2.5-fold in rich medium relative to minimal medium. These data indicate the importance of multiple approaches in genome-wide functional analysis studies.

The results we present demonstrate that quantitative fitness data can be rapidly obtained under various conditions. Although the presence of the KanMX4 gene has been shown to have no effect on the fitness of some deletion strains (19), it is theoretically possible that the encoded neomycin phosphotransferase could have an impact on a particular deletion strain. The composition of a pool of deletion strains and the conditions under which the pool was cultured could also have an effect on the observed fitness of the strains. Finally, the phenotypes of certain deletion strains might be complemented by factors released into the medium by other strains in the pool. Additional tests will be required to determine how frequently these artifacts occur.

These results also show that thousands of deletants can be systematically made once the sequence of a genome is known. Several laboratories in Europe and North America are collaborating to finish construction of tagged deletions for all annotated S. cerevisiae ORFs within 1 year (20). Currently, more than three-quarters of the ORFs in the yeast genome have been deleted (5100 ORFs and more than 15,300 strains generated). Although a significant amount of work was expended to construct the strains, they provide a lasting resource, in contrast to other methods for generating functional data (2, 3). In addition, the availability of a consistent set of isogenic strains should provide a better way for researchers to compare their results with those of others, easing the task of curating the functional assignments that hitherto have been made in various strain backgrounds. Finally, while other efforts have been mounted by a European consortium and others to generate deletion strains (21), the inclusion of barcodes significantly enhances the usefulness of the strains. The ability to assess thousands of strains quantitatively and in parallel will significantly decrease the amount of labor and materials needed for fitness screens (22) and increase the reliability of the data interpretation and functional classifications.

  • * These authors contributed equally to this work.

  • To whom correspondence should be addressed. E-mail: dbowe{at}

