Direct Allelic Variation Scanning of the Yeast Genome


Science  21 Aug 1998:
Vol. 281, Issue 5380, pp. 1194-1197
DOI: 10.1126/science.281.5380.1194


As more genomes are sequenced, the identification and characterization of the causes of heritable variation within a species will be increasingly important. It is demonstrated that allelic variation in any two isolates of a species can be scanned, mapped, and scored directly and efficiently without allele-specific polymerase chain reaction, without creating new strains or constructs, and without knowing the specific nature of the variation. A total of 3714 biallelic markers, spaced about every 3.5 kilobases, were identified by analyzing the patterns obtained when total genomic DNA from two different strains of yeast was hybridized to high-density oligonucleotide arrays. The markers were then used to simultaneously map a multidrug-resistance locus and four other loci with high resolution (11 to 64 kilobases).

Knowledge of genetic variation is important for understanding why some people are more susceptible to disease than others or respond differently to treatments. Variation can also be used to determine which genes contribute to multigenic or quantitative traits such as increased yield or pest resistance in plants or for understanding why some strains of a microbe are exceptionally virulent. Genetic variation can also be used for identification purposes, both in microbiology and forensics, for studies of recombination, and in population genetics (1). Rapid and cost-effective ways to analyze variation are needed (2).

High-density oligonucleotide arrays have been used to simultaneously measure the expression of every gene in the entire yeast genome (3, 4). These expression arrays contain a total of 157,112 25-mer probes derived from yeast genome coding sequences. Although some regions of the genome have overlapping probes, the arrays cover 21.8% of the nonrepetitive regions of the yeast genome. Because the extent of hybridization of a target sequence to an oligonucleotide probe depends on the number and position of mismatches between the two sequences (5, 6), we hypothesized that a substantial fraction of the allelic variation between any two strains of yeast could be detected simply by hybridizing genomic DNA from the two strains to the arrays and analyzing the hybridization differences (Fig. 1A).

Figure 1

(A) Detecting allelic variation with high-density arrays. For nonduplicated regions of the genome, a minimum of 20 25-base oligonucleotide probes was chosen from yeast genomic sequence (S288c) for every annotated ORF in the yeast genome (3). Probes (only from predicted coding regions) were generally arranged on the array in order of their chromosome position. In addition to probes designed to be perfectly complementary to regions of yeast coding sequence (PM), probes containing a single base mismatch in the central position of the oligonucleotide were also synthesized in a physically adjacent position. The mismatch probes serve as background and nonspecific hybridization controls in other analyses (3, 31). If probes complementary to YJM789 DNA fragments containing polymorphisms (*) are found on the array, decreases in signal intensity at these probes relative to the S96 signal may be observed when YJM789 DNA is hybridized to the array. The amount of signal decrease will depend on several factors, such as initial probe intensity and whether the probed fragment is completely absent in YJM789 or contains a small substitution. The location of the polymorphism within the probe sequence will also affect the observed intensity decrease. (B) Comparative genomic DNA hybridization patterns. Genomic DNA from two strains of S. cerevisiae, YJM789 and S96, was fluorescently labeled and hybridized to two different arrays. Scanned images of the arrays were collected, digitally colored red or green, and then electronically superimposed. A portion of the composite image is shown. Probes that hybridized to S96 DNA more efficiently than YJM789 DNA are red, and probes that hybridize to both DNA samples with equal intensity are yellow. A region that is completely deleted in YJM789 is indicated by an arrow. The figure closeup shows a region in which one of the mismatch features is bright green. Shotgun sequencing of YJM789 demonstrated that the actual sequence of YJM789 was complementary to the sequence of the oligonucleotide in the mismatch row and not to that in the perfect match row.

Allelic variation is widespread in different strains and in different individuals in a population. The frequency of variation between common laboratory strains of yeast is estimated to be as high as 1% (7). Two Saccharomyces cerevisiae strains, S96 (MATa ho lys5) and YJM789 (MATα ho::hisG lys2 cyh), a clinical isolate from a human lung, were chosen for study (8). The strains are phenotypically different—at least five simple genetic loci, including a cycloheximide sensitivity locus from YJM789, can be followed in crosses between these two strains. Partial shotgun sequencing of YJM789 revealed one instance of allelic variation every 160 bases (9), with slightly more variation in noncoding regions (10). The high degree of array coverage (22%) and the frequency of variation suggested that if only a fraction of the variation could be reproducibly detected, a new genetic map containing a large number of closely spaced markers could be constructed. These markers could then be used to map the loci contributing to the phenotypic differences between the strains.

To test this, we isolated, fragmented, and biotin-labeled genomic DNA from both S96 and YJM789 (11). Each sample was hybridized to two different sets of arrays for 2 hours. Then the arrays were washed, stained with a phycoerythrin-streptavidin conjugate, and scanned with a laser confocal scanning device that detects and records the amount of fluorescence at about 3 million physical locations (3). Comparison of the images revealed hybridization differences for the two strains (Fig. 1B).

It was anticipated that these hybridization differences could be reproducibly detected and thus could serve as genetic markers. Markers were selected by analyzing the scanned images of arrays hybridized with DNA samples from each parental strain (three times each) and from 14 haploid progeny derived from sporulation of a YJM789/S96 diploid (12, 13). A total of 3714 of the probes on the array were estimated to have greater than 99% probability of being a marker distinguishing the two strains on the basis of their exhibiting a consistent bimodal distribution across all hybridizations. These markers were expected to be from probes whose complementary sequence is completely absent in YJM789 or whose complementary sequence contained a base change near the central region of the oligonucleotide probe. Excluding the ribosomal DNA (rDNA) repeat on chromosome (chr) XII, the average marker spacing was 3510 base pairs (bp). A total of 14 gaps were observed, with the largest gap (59 kb) centered near position 150,400 on chr III (14).
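The spacing and gap figures quoted above can be reproduced from a list of marker coordinates. The following is a minimal sketch, not the authors' procedure: the function name, the example positions, and the 20-kb gap threshold are all hypothetical, since the text does not state how a gap was defined.

```python
from statistics import mean

def marker_spacing(positions, gap_threshold=20_000):
    """Summarize marker spacing on one chromosome.

    positions: marker coordinates in bp (any order).
    gap_threshold: spacing in bp above which an interval is reported
    as a gap (an assumed cutoff, not taken from the paper).
    Returns (mean spacing, list of (left, right) gap intervals).
    """
    ordered = sorted(positions)
    pairs = list(zip(ordered, ordered[1:]))
    spacings = [right - left for left, right in pairs]
    gaps = [(left, right) for left, right in pairs if right - left > gap_threshold]
    return mean(spacings), gaps

# Hypothetical marker positions: evenly spaced except for one 59-kb hole.
example_positions = [1_000, 4_500, 8_000, 11_500, 70_500, 74_000]
average_spacing, gaps = marker_spacing(example_positions)
print(average_spacing, gaps)
```

Run per chromosome and pooled, a summary of this kind would yield the genome-wide mean spacing and the list of gaps, provided repeats such as the rDNA array are excluded beforehand.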

To determine whether the set of markers was reliable for linkage analysis, we examined meiotic inheritance. An S96/YJM789 diploid was sporulated, and DNA from four segregants of one tetrad was isolated and hybridized to the arrays. Each of the 3714 markers was assigned a genotype on the basis of whether the observed hybridization signal was closer to the YJM789 or the S96 expected signal response. The probability (p) that the observed signal was of S96 origin was computed (15). It was expected that half of the markers would be scored as having an S96 origin and half would be scored as YJM789 and that most markers would segregate with a ratio of 2:2 in the four segregants. The chromosomal locations of the markers, each marker's score (S96 or YJM789), and the location of reciprocal recombination events are shown for one chromosome (XIII) (Fig. 2).
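One way to picture the scoring step is as a two-class comparison of the observed signal against each parent's expected signal. The sketch below is an illustration under stated assumptions, not the model of reference (15): it assumes Gaussian signal distributions for each parent and equal prior odds, and all names and numbers are hypothetical. The color bins follow the legend of Fig. 2.

```python
import math

def log_normal_pdf(x, mu, sigma):
    """Log density of a normal distribution (assumed signal model)."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2 * math.pi))

def p_s96(signal, s96_mean, s96_sd, yjm_mean, yjm_sd):
    """Posterior probability that a signal is of S96 origin, equal priors."""
    diff = (log_normal_pdf(signal, yjm_mean, yjm_sd)
            - log_normal_pdf(signal, s96_mean, s96_sd))
    if diff > 700:  # avoid overflow in exp(); p is effectively zero here
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))

def fig2_color(p):
    """Map p to the tick colors of Fig. 2 (the bin edge at 50% is a guess)."""
    if p < 0.005:
        return "red"
    if p < 0.5:
        return "blue"
    if p <= 0.995:
        return "yellow"
    return "green"

# Hypothetical probe: S96 signal ~1000 +/- 100, YJM789 signal ~400 +/- 100.
for signal in (1000, 650, 400):
    p = p_s96(signal, 1000, 100, 400, 100)
    print(signal, fig2_color(p))
```

A signal near either parental mean gives a confident call, whereas a signal between the two means gives an intermediate p, which is what makes the reliability of each call predictable.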

Figure 2

Inheritance of markers for one chromosome in one tetrad from a cross between YJM789 and S96. Red ticks indicate the location of markers that have a less than 0.5% probability (p) of having an S96 origin; blue, p = 0.5 to 50%; yellow, p = 50 to 99.5%; and green, p > 99.5%.

For the entire genome, 97 reciprocal crossovers were observed, close to the expected value of 86 (16). For 1220 of the markers, p was less than 0.005 (high probability of YJM789 origin) or greater than 0.995 (high probability of S96) for all four segregants. For this set, 94.5% segregated with a ratio of 2:2; 51% were S96 in origin, and 49% were YJM789 in origin. Some of the markers segregating 3:1 or 4:0 are probably the result of nonreciprocal recombination events, which occur in yeast at frequencies ranging from 0.5 to 30% per locus per tetrad (17), consistent with these results. For the remaining markers, p was intermediate (between 0.005 and 0.995) for at least one of the segregants in the tetrad, making it difficult to estimate the frequency of gene conversion. Of all the markers (3714), 78.3% segregated with a ratio of 2:2. These data suggest that the probability of misscoring a marker is about 5%, but the probability that a marker will be incorrectly scored for a particular hybridization is strongly correlated with its p value and is thus predictable. In studies of single-marker events such as gene conversion or for high-resolution mapping, increased confidence in individual marker accuracy could be obtained by repeating those hybridizations that gave overall low confidence scores (p). Even with some noise, a very clear inheritance pattern was discerned, indicating that linkage analysis could be performed with this set of markers.
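The segregation tally described above can be sketched as follows. This is an illustrative reconstruction only: the input layout (one tuple of four p values per marker) and the example data are hypothetical, while the 0.005 and 0.995 cutoffs are the ones quoted in the text.

```python
from collections import Counter

def segregation_summary(tetrad_p, low=0.005, high=0.995):
    """Tally segregation ratios (S96:YJM789) over the markers of one tetrad.

    tetrad_p: iterable of 4-tuples, the probability of S96 origin for each
    of the four segregants at one marker. Markers with an intermediate p
    in any segregant are counted as "ambiguous".
    """
    counts = Counter()
    for ps in tetrad_p:
        if all(p < low or p > high for p in ps):
            n_s96 = sum(p > high for p in ps)
            counts[f"{n_s96}:{4 - n_s96}"] += 1
        else:
            counts["ambiguous"] += 1
    return counts

# Hypothetical p values for four markers in one tetrad.
example = [
    (0.999, 0.001, 0.999, 0.001),  # 2:2
    (0.999, 0.999, 0.999, 0.001),  # 3:1, a candidate gene conversion
    (0.600, 0.001, 0.999, 0.001),  # one intermediate call
    (0.001, 0.001, 0.999, 0.999),  # 2:2
]
summary = segregation_summary(example)
print(dict(summary))
```

Dividing the "2:2" count by the number of unambiguous markers gives the kind of percentage reported for the confidently scored subset.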

The YJM789 strain (MATα lys2 ho::hisG cyh) and the S96 strain (MATa lys5 ho) are phenotypically distinguishable. It was predicted that the genomic regions responsible for these differences could be identified by hybridizing DNA from segregants of an S96/YJM789 diploid to the array and analyzing the inheritance of markers. YJM789 (MATα) and S96 (MATa) are auxotrophic for lysine but have mutations in two different loci: lys2 (YJM789) and lys5 (S96) (18). YJM789 also carries an insertion in the homothallic mating type locus (ho::hisG) (19), whereas S96 has a deletion in the same locus (ho). In addition, relative to S96, YJM789 is hypersensitive to multiple drugs, including cycloheximide (cyh). The cyh locus segregated 2:2 in 99 tetrads of a cross between S96 and YJM789, indicating that a single locus is responsible for the phenotype. Altogether, four known and one unknown loci (cyh) could be scored in the cross. The segregants of 99 tetrads were genotyped (20). Of the 396 segregants examined, 17 segregants were identified that were MATα lys2 LYS5 ho cyh. DNA from 10 of these segregants was hybridized to the arrays and analyzed (21, 22) (Fig. 3).

Figure 3

Inheritance of DNA in 10 segregants for 5 of the 16 chromosomes. Tick colors are as described for Fig. 2. The data are superimposed on a diagram showing the probable location of chromosomal breakpoints, calculated as described in the text. Arrows indicate the known locations of genes. lys2, LYS5, MATα, and pdr5 were all inherited from YJM789 (pink), whereas ho was inherited from S96 (dark green). All segregants except 1a (ho pdr5 lys5 MATa), 1b (ho::hisG MATα), 1c (ho lys2 pdr5 lys5 MATα), and 1d (ho::hisG lys2 MATa) are ho lys2 pdr5 MATα. Data for the entire genome can be found at

The most probable parental origin of all DNA segments was determined by estimating the locations of recombination breakpoints for each of the segregants for the entire genome by means of a maximum likelihood method (23). This procedure eliminated noise by considering each marker in the context of its neighbors. These data were used to identify regions with a very low probability of random segregation. Probability minima (probability = 0.001 per interval) were located only on chromosomes II, III, IV, VII, and XV ( The physical size of these intervals ranged from 10.7 kb (LYS2) to 90 kb (HO), with an average genetic size of 17 centimorgans (cM), close to the 20 cM expected (24). Four of these regions encompass the known locations of LYS2 (chr II, 469,702), MAT (chr III, 198,278), LYS5 (chr VII, 215,281), and HO (chr IV, 46,272). The cyh locus could be unambiguously mapped to the remaining unassigned 57-kb region on chr XV (Fig. 4). These data strongly suggest that PDR5 (chr XV, 619,838), a multidrug resistance pump (25), is the gene responsible for cycloheximide sensitivity. To confirm the role of PDR5 in cycloheximide sensitivity, we deleted the PDR5 gene in the S96 genetic background and crossed the resulting strain to YJM789. The deletion strain was unable to complement the cycloheximide sensitivity of YJM789 (26).
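The idea of removing noise by reading each marker in the context of its neighbors can be illustrated with a two-state hidden Markov model decoded by the Viterbi algorithm. This is a substitute technique chosen for illustration and is not claimed to be the maximum likelihood method of reference (23); the function name, the switch probability, and the example p values are hypothetical.

```python
import math

def smooth_calls(p_s96, switch_prob=0.01, eps=1e-6):
    """Most likely parental origin along one chromosome of one segregant.

    p_s96: probability of S96 origin for each marker, in chromosome order.
    switch_prob: assumed chance of a breakpoint between adjacent markers.
    Returns a list of "S96" / "YJM789" labels, one per marker.
    """
    states = ("S96", "YJM789")
    stay, switch = math.log(1 - switch_prob), math.log(switch_prob)

    def emit(p):
        p = min(max(p, eps), 1 - eps)  # keep the logs finite
        return (math.log(p), math.log(1 - p))

    score = list(emit(p_s96[0]))  # uniform prior over the two parents
    back = []
    for p in p_s96[1:]:
        e = emit(p)
        new_score, pointers = [], []
        for s in (0, 1):
            cand = [score[prev] + (stay if prev == s else switch)
                    for prev in (0, 1)]
            best = 0 if cand[0] >= cand[1] else 1
            pointers.append(best)
            new_score.append(cand[best] + e[s])
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [states[s] for s in path]

# Hypothetical segregant: one isolated miscall, then a true breakpoint.
calls = smooth_calls([0.999, 0.999, 0.001, 0.999, 0.999, 0.001, 0.001, 0.001])
print(calls)
```

In this toy input the single discordant marker is absorbed into the surrounding S96 segment, while the run of three YJM789-like markers is kept as a breakpoint. A gene conversion near a true breakpoint can shift the inferred position, as the Fig. 4 legend notes for segregant 86c.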

Figure 4

Calculated probability of random segregation for chr XV. The y axis (log base 10) indicates the probability of random segregation calculated with a binomial distribution. The names and locations of ORFs [taken from SGD (16)] inside the intervals with the lowest probability of random segregation [10 out of 10 = (1/2)^10] are shown and are shaded in gray. The minimum interval (559,541 to 616,363) is located just upstream of the PDR5 gene (619,838 to 624,373) because of a chromosomal breakpoint being assigned to a position 3 kb upstream of PDR5 for one segregant (86c). Although several markers both upstream and downstream of PDR5 show S96 inheritance for this segregant, markers from PDR5 itself were of the YJM789 pattern. The misassignment of the chromosome breakpoint is most likely due to a gene conversion event near the breakpoint. Data for the other chromosomes can be found
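The binomial value quoted in the legend can be checked directly. The sketch below assumes that each segregant inherits a given interval from either parent with probability 1/2 and that the plotted quantity is the point probability of the observed count; the function name is hypothetical.

```python
import math

def random_segregation_prob(n_same, n_segregants):
    """Binomial probability that exactly n_same of n_segregants inherit an
    interval from one given parent, if inheritance is random (1/2 each)."""
    return math.comb(n_segregants, n_same) * 0.5 ** n_segregants

# All 10 selected segregants carrying the same parental interval.
p_all = random_segregation_prob(10, 10)
print(p_all, math.log10(p_all))
```

For 10 out of 10 this gives (1/2)^10, roughly 0.001, which corresponds to about -3 on the log base 10 axis of the figure.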

The set of 3714 markers constitutes about 4.7% of the estimated variation between the strains. At 1.0-cM resolution, the map marker density exceeds that of the traditional yeast genetic map (2600 markers) assembled over a period of 40 years (16). The high marker density and the fact that all markers can be scored simultaneously should allow the mapping of quantitative or multigenic trait loci (27). This method also offers a substantial advantage over any method for scanning or scoring markers described to date: The method does not depend on having probes to the second allele on the array, and because of the sensitivity of the arrays, all markers can be scored in parallel, in a few hours, without amplification steps, gels, or enzymatic manipulation (6, 28–30). The method is powerful because of the ease with which genetic markers are identified: A new set of informative markers can be quickly selected for any pair of strains, thus allowing efficient access to the unlimited genetic diversity in the natural world.

  • * To whom correspondence should be addressed. E-mail: winzeler{at}

  • These authors contributed equally to the work.

