Genetic Dissection of Transcriptional Regulation in Budding Yeast


Science  26 Apr 2002:
Vol. 296, Issue 5568, pp. 752-755
DOI: 10.1126/science.1069516


To begin to understand the genetic architecture of natural variation in gene expression, we carried out genetic linkage analysis of genomewide expression patterns in a cross between a laboratory strain and a wild strain of Saccharomyces cerevisiae. Over 1500 genes were differentially expressed between the parent strains. Expression levels of 570 genes were linked to one or more different loci, with most expression levels showing complex inheritance patterns. The loci detected by linkage fell largely into two categories: cis-acting modulators of single genes and trans-acting modulators of many genes. We found eight such trans-acting loci, each affecting the expression of a group of 7 to 94 genes of related function.

Genetic linkage analysis has traditionally focused on mapping loci that affect one or a small number of organism-level phenotypes. DNA microarray technology (1,2) makes it possible to apply such analysis to global patterns of gene expression, with the transcript abundance of each of thousands of genes treated as a quantitative phenotype (3). Although it has recently become clear that genetic variation has a strong effect on gene expression (4–7), little is known about the genetic basis of natural variation in expression levels (the number and type of loci involved, the effect of each locus, and the interaction between loci).

We carried out linkage analysis of global expression levels in a cross between two strains of the budding yeast Saccharomyces cerevisiae. The parents were haploid derivatives of a standard laboratory strain (BY) and a wild isolate from a California vineyard (RM) (8). We first measured expression of 6215 genes in six independent cultures of each parent undergoing log-phase growth in a defined medium and found profound differences in expression (9). A total of 1528 genes showed differential expression at P < 0.005, whereas only 23 are expected by chance (10). At P < 0.15, we observed 3422 differences, compared to 724 expected by chance, suggesting that nearly half (2698 out of 6215) of all the genes in the genome are differentially expressed. Of the 1528 messages that were different at P < 0.005, 1165 differed by <twofold, 363 by >twofold, 147 by >fourfold, and 62 by >eightfold. Expression measurements in 40 haploid segregants from a cross between the two parents showed that parental differences in expression were highly heritable; the median proportion of the observed variation that is genetic was estimated to be 84% (11).
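The estimate of "nearly half" follows from subtracting the number of differences expected by chance from the number observed. A minimal sketch of that arithmetic, using only the counts quoted above (the function and variable names are illustrative and not from the paper):

```python
def excess_over_chance(observed, expected_by_chance, total_genes):
    """Return (excess count, excess as a fraction of all genes).

    The excess is a rough estimate of the number of true differences:
    observed calls minus the calls expected by chance alone.
    """
    excess = observed - expected_by_chance
    return excess, excess / total_genes

TOTAL_GENES = 6215

# Counts quoted in the text for the two significance thresholds.
strict_excess, strict_fraction = excess_over_chance(1528, 23, TOTAL_GENES)
loose_excess, loose_fraction = excess_over_chance(3422, 724, TOTAL_GENES)

print(strict_excess, round(strict_fraction, 3))
print(loose_excess, round(loose_fraction, 3))
```

The looser threshold admits more false positives but, after correcting for them, recovers many more of the small true differences, which is why it gives the larger estimate.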

Genetic markers were identified with oligonucleotide microarrays using a method previously described by Winzeler et al. (12,13). The resulting genetic map of 3312 markers covered >99% of the genome. Analysis of four segregants from one tetrad showed 2:2 marker segregation (Fig. 1), with 73 crossovers observed across the genome, which is close to the expected number of 86 (14). In the analysis of the cross, we found that mating type, kanamycin resistance, and auxotrophies to lysine, uracil, and leucine were correctly linked with LOD scores >9 to regions containing the genes MAT, HO, LYS2, URA3, and LEU2, respectively. We also carried out linkage analysis of flocculation (agglutination of cells in liquid culture). Although neither parent is flocculent, tetrad analysis showed 1 flocculent:3 nonflocculent segregation in the cross. We found linkage to a pair of loci, one containing the FLO8 gene (chr V) and the other containing the FLO1 gene (chr I) (15). FLO1 encodes a cell wall protein responsible for agglutination of cells, and FLO8 encodes a transcription factor that regulates FLO1 expression (16). Sequencing of the corresponding BY and RM alleles showed that BY, but not RM, carried the S288c flo8 null mutation (17) and that RM, but not BY, carried a short deletion in FLO1.

Figure 1

Chromosome XII genotypes of four segregants (a, b, c, and d) isolated from a single tetrad. Each vertical bar represents one genetic marker, colored red, green, or blue when the genotype of the segregant at the marker is RM, BY, or ambiguous, respectively. The crosses indicate inferred positions of crossovers. The centromere position is shown by a black circle. The scale bar represents 100 kb.

We next tested for linkage between markers and the abundance of each message (18). Expression levels of 570 messages showed linkage to at least one locus at P < 5 × 10⁻⁵ (53 are expected by chance). Two examples of segregation of gene expression with the genotype of a linked marker are shown in Fig. 2. Two hundred and five of the linkages remained significant at P < 2 × 10⁻⁶ (<1 is expected by chance). Message levels for all engineered auxotrophies linked to regions containing the respective genes.
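To illustrate what testing one message against one marker can look like, the sketch below splits segregants by the parental origin of the marker allele and compares their expression levels with a rank-sum statistic and a permutation P value. This is an assumption made for illustration: the procedure actually used is described in note (18), which is not reproduced here, and every name in the code is hypothetical.

```python
import random

def rank_sum_u(group_a, group_b):
    """Mann-Whitney U for group_a: pairs with a > b count 1, ties count 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

def permutation_p_value(expression, genotypes, n_perm=2000, seed=0):
    """Two-sided permutation P value for linkage of one message to one marker.

    expression: one expression value per segregant.
    genotypes:  "BY" or "RM" per segregant at the marker (assumed labels).
    """
    rng = random.Random(seed)

    def statistic(labels):
        by = [x for x, g in zip(expression, labels) if g == "BY"]
        rm = [x for x, g in zip(expression, labels) if g == "RM"]
        # Distance of U from its null expectation, so the test is two-sided.
        return abs(rank_sum_u(by, rm) - len(by) * len(rm) / 2.0)

    observed = statistic(genotypes)
    labels = list(genotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any genotype-expression association
        if statistic(labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy data: 40 segregants, expression fully separated by marker genotype.
genotypes = ["BY"] * 20 + ["RM"] * 20
expression = [1.0 + 0.01 * i for i in range(20)] + [3.0 + 0.01 * i for i in range(20)]
p_linked = permutation_p_value(expression, genotypes)

# Toy data with no variation at all: no evidence of linkage.
p_flat = permutation_p_value([1.0] * 40, genotypes)
```

With only 2000 permutations the smallest attainable P value is about 5 × 10⁻⁴, so reaching a threshold such as 5 × 10⁻⁵ would need far more permutations or an analytical null distribution.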

Figure 2

Expression levels of parents and segregants for two genes that show linkage. In each panel, the first column shows expression levels for all 40 segregants, and the second and third columns show expression levels for six replicates of each parent. The fourth and fifth columns show expression levels for segregants that inherited the linked marker from BY and RM, respectively. (A) The gene is YLL007C, and the marker lies in YLL009C. (B) The gene is XBP1 (YIL101C), and the marker lies in YIL060W. Note that, in this example, the effect of the locus is in the opposite direction from the difference between the parents, illustrating transgressive segregation.

Of the 1528 messages with parental differences at P < 0.005, 308 showed linkage to at least one locus at P < 5 × 10⁻⁵. An additional 262 messages were not called different between the parents at P < 0.005, but showed linkage in the cross. This observation can be explained in three ways. First, a linkage may be a false positive; as noted above, we expect 53 false-positive linkages at P < 5 × 10⁻⁵. Second, a true difference in expression levels may exist between the parents, and be statistically significant in a comparison of 40 segregants separated by parental genotype, but not be statistically significant in a comparison of six replicates from each parent. Third, each parent may harbor several loci with alleles of opposite effect on message levels, reducing the parental difference relative to the range of the segregants. This phenomenon, called transgressive segregation, is common (19, 20) and is observed for a number of messages; an example is shown in Fig. 2B.

Conversely, levels of 1220 messages differed between the parents at P < 0.005 but did not show linkage to any locus; as noted above, we expect only 23 false-positive differences at P < 0.005. Simulations showed that if each of these differences in expression were caused by a single locus, we would expect to detect linkage for 97% of them (21). Simulations also showed that if multiple loci were involved and, for each differentially expressed message, the locus with the strongest effect accounted for more than a third of the difference, we would expect to detect linkage for >29% of these messages (21). Thus, detection of linkage for just 20% of the differences in the real data (308 out of 1528) indicates that most messages are affected by multiple loci and that most loci account for less than a third of the total parental expression differences. We also used simulations to estimate the number of loci involved under a model in which each difference is caused by n loci of equal effect (21). The results showed that we would expect to detect linkage for 82% of the differences if n = 2, 59% if n = 3, 49% if n = 4, and 39% if n = 5. The fact that in the real data set we detected linkage for only 20% of the differentially expressed messages implies that, under this model, >5 loci affect each message. In reality, both the number of loci contributing to expression differences and the distribution of their effects undoubtedly vary among genes, but our data are inconsistent with one or two major loci explaining the observed differences in expression for more than a small fraction of genes. Because the estimates above assume that all loci from one parent act in the same direction, existence of transgressive segregation implies even greater complexity.
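The logic of the equal-effect model can be sketched with a toy simulation: a fixed parental difference is split evenly among n unlinked loci, 40 segregants are generated, and a message counts as detected if any locus passes a significance threshold. This is not the simulation of note (21). The size of the parental difference, the unit-variance noise, the t statistic, and the threshold are all assumptions chosen for illustration, so the detection rates will not match the percentages quoted above.

```python
import random
import statistics

def t_statistic(a, b):
    """Welch t statistic between two samples (0.0 if either is too small)."""
    if len(a) < 2 or len(b) < 2:
        return 0.0
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return 0.0 if se == 0 else (statistics.mean(a) - statistics.mean(b)) / se

def detection_rate(n_loci, parental_diff=4.0, n_segregants=40,
                   t_threshold=5.0, n_trials=200, seed=0):
    """Fraction of simulated messages with at least one locus detected.

    parental_diff is in units of the residual standard deviation and is
    divided equally among n_loci unlinked loci (assumed values).
    """
    rng = random.Random(seed)
    effect = parental_diff / n_loci
    detected = 0
    for _ in range(n_trials):
        # Each segregant inherits the RM allele at each locus with probability 1/2.
        genotypes = [[rng.random() < 0.5 for _ in range(n_loci)]
                     for _ in range(n_segregants)]
        expression = [effect * sum(g) + rng.gauss(0.0, 1.0) for g in genotypes]
        for locus in range(n_loci):
            rm = [x for x, g in zip(expression, genotypes) if g[locus]]
            by = [x for x, g in zip(expression, genotypes) if not g[locus]]
            if abs(t_statistic(rm, by)) > t_threshold:
                detected += 1
                break
    return detected / n_trials

rates = {n: detection_rate(n) for n in (1, 2, 4, 8)}
```

Under these assumptions a single locus is detected almost always, and the rate falls as the same difference is divided among more loci, because each locus then explains a smaller share of the variation among 40 segregants.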

To determine whether the loci found by linkage act in cis or in trans, we looked for messages whose levels were linked to markers within 10 kb of their own gene. Such “self-linkages” suggest that a polymorphism affecting a gene's expression lies within the gene itself or its regulatory region, rather than elsewhere in the genome. We found that 185 (32%) of the 570 messages that show linkage at P < 5 × 10⁻⁵ fell into this category. Because no message is expected to link to its own gene by chance (18), a more accurate estimate of 36% for the fraction of cis-acting loci is obtained by dividing the number of self-linkages by the expected number of true positives (517).
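The correction from 32% to 36% is a change of denominator: the 53 linkages expected by chance are removed from the total, while the self-linkages are all treated as true. A short sketch of the calculation with the counts given above (variable names are illustrative):

```python
linked_messages = 570          # messages with linkage at the stated threshold
expected_false_positives = 53  # linkages expected by chance
self_linkages = 185            # messages linked to a marker near their own gene

# Raw fraction: self-linkages among all detected linkages.
raw_fraction = self_linkages / linked_messages

# Corrected fraction: false positives are assumed not to be self-linkages,
# so they are removed from the denominator only.
expected_true_positives = linked_messages - expected_false_positives
corrected_fraction = self_linkages / expected_true_positives

print(round(raw_fraction * 100), round(corrected_fraction * 100))
```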

We next considered whether there are many trans-acting regulatory polymorphisms, each affecting one or a few messages, or a small number of such polymorphisms with effects on many messages. We divided the genome into 20-kb bins and counted the number of linkages to markers within each bin (Fig. 3). With a random distribution of linkages across the genome, no bin is expected to contain >5 linkages (22). In our data, 10 bins contained >5 linkages, ranging from 7 to 87. In two cases, two nearby bins contained >5 linkages and were combined into one group for future analyses. Over 40% of all linkages (231) fell into one of the eight groups (Table 1).
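One way to see why a bin with more than 5 linkages is unexpected is to treat linkages as falling independently and uniformly across bins, so that the count per bin is approximately Poisson. This is an assumption made for illustration, and the calculation behind note (22) may differ. The sketch uses 611 bins, the figure given in the Fig. 3 caption, and takes 570 as an approximate total number of linkages.

```python
import math
from collections import Counter

BIN_SIZE = 20_000  # 20-kb bins, as in the text

def bin_counts(linkages):
    """Count linkages per (chromosome, bin index).

    linkages: iterable of (chromosome, base-pair position) pairs.
    """
    return Counter((chrom, pos // BIN_SIZE) for chrom, pos in linkages)

def poisson_tail(lam, k):
    """P(X > k) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k + 1))
    return 1.0 - cdf

n_bins = 611      # from the Fig. 3 caption
n_linkages = 570  # approximate total, an assumption
mean_per_bin = n_linkages / n_bins

# Expected number of bins holding more than 5 linkages under the random model.
expected_bins_over_5 = n_bins * poisson_tail(mean_per_bin, 5)

# Hypothetical positions, only to show the binning step.
example = bin_counts([("V", 375_000), ("V", 379_999), ("I", 203_000)])
```

Under this model the expected number of bins above the cutoff is well below one, so the 10 such bins observed point to loci that each influence many messages.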

Figure 3

The number of linkages plotted against genome location. The genome is divided into 611 bins of 20 kb each, shown in chromosomal order from the start of chromosome I to the end of chromosome XVI. The dashed line is drawn at 5 linkages; no bin is expected to contain >5 linkages by chance (22). The regions with an unusually large number of linkages are marked 1 through 8 and correspond to the groups in Table 1.

Table 1

Groups of messages linking to loci with widespread transcriptional effects. The location of the center of the linked bin is shown as chromosome:base pair. Lists of genes in each group are available as supplementary information (32).


Groups 2 through 4 contain known members of the leucine biosynthesis, mating, and uracil biosynthesis pathways and link to LEU2, MAT, and URA3, respectively. Because Leu2 and Ura3 are biosynthetic enzymes rather than transcription factors, these linkages illustrate indirect transcriptional effects on other genes in the perturbed pathways. The other five loci represent natural polymorphisms between the parent strains with large transcriptional effects. The genes within each group appear to be functionally related based on annotated group members.

Group 5 contains 28 genes, 13 of whose products contain heme, regulate heme synthesis, or are involved in fatty acid or sterol metabolism. Five of them are known to be regulated by the heme-dependent transcriptional activator Hap1 (essential for anaerobic growth, which requires ergosterol metabolism), and the gene encoding Hap1 is in the linkage region for this group. The S288c HAP1 allele was previously shown to carry a Ty1 insertion that reduces transcriptional activation of iso-1 cytochrome c (CYC1) by Hap1 (23). We amplified and sequenced the BY and RM HAP1 alleles and found that BY, but not RM, carries this Ty1 insertion, consistent with our observation that CYC1 and CYC7 are underexpressed in all segregants inheriting the BY allele. These results strongly suggest that the other genes in group 5 are also regulated by Hap1. We searched for the known Hap1 binding site consensus sequence in the upstream regions of these genes and found sites containing at most one mismatch for 11 genes.

Group 1 contains 18 genes, 10 of which were found to be co-regulated in previous array experiments, with six expressed specifically in daughter cells during budding (24). The gene controlling this group may be CST13, which is in the region of linkage, shares the group's expression pattern, and is also expressed specifically in daughter cells during budding (24). Group 6 consists of 16 genes, all putative subtelomerically encoded helicases; these closely related genes may cross-hybridize on arrays. SIR3, a known transcriptional silencer active at telomeres, is within the linked region. Group 7 contains 94 genes, 50 of which are known to function in mitochondria, including 34 that encode mitochondrial ribosomal proteins. This group shares 52 genes with a previously defined mitochondrial expression cluster (25). Several genes in the region of linkage function in mitochondria. Group 8 consists of 19 genes, 11 of which were previously shown to be expressed under acidic conditions in the presence of the transcription factors Msn2 and -4 (26), and 17 of which have at least one Msn2 and -4 binding site in their upstream region.

Unlike experiments that measure the correlations among transcript levels under different conditions, our approach allows causal connections to be made between modulator loci and the genes whose expression they directly or indirectly affect. In addition, studying naturally occurring alleles in the context of segregating variation allows the discovery of subtle effects obscured in strains with engineered knockouts.

We have found that regulatory genetic variation is characterized by a high rate of cis-acting alleles and a small number of trans-acting alleles with widespread transcriptional effects. Finally, genetic variation in physiological and behavioral quantitative phenotypes is known to be highly complex. Our results indicate that even in a single-cell organism grown in a controlled environment, variation in gene expression typically also has a polygenic basis.

  • * These authors contributed equally to this work.

  • To whom correspondence should be addressed. E-mail: leonid{at}

