Research Article

Replication Dynamics of the Yeast Genome

Science  05 Oct 2001:
Vol. 294, Issue 5540, pp. 115-121
DOI: 10.1126/science.294.5540.115

Abstract

Oligonucleotide microarrays were used to map the detailed topography of chromosome replication in the budding yeast Saccharomyces cerevisiae. The times of replication of thousands of sites across the genome were determined by hybridizing replicated and unreplicated DNAs, isolated at different times in S phase, to the microarrays. Origin activations take place continuously throughout S phase but with most firings near mid–S phase. Rates of replication fork movement vary greatly from region to region in the genome. The two ends of each of the 16 chromosomes are highly correlated in their times of replication. This microarray approach is readily applicable to other organisms, including humans.

The replication of eukaryotic chromosomes is highly regulated. Replication is limited to the S phase of the cell cycle; and within S phase, initiation of replication is controlled with respect to both location and time. The sites of initiation, called replication origins, have been best characterized in the budding yeast Saccharomyces cerevisiae, in which a functional assay based on plasmid maintenance has allowed identification of potential origins of replication [autonomous replication sequence elements (ARSs)]. There are estimated to be ∼200 to 400 ARSs in the yeast genome (1, 2), and most, but not all, function as chromosomal origins (3). The few origins investigated at the sequence level usually encompass ∼200 base pairs (bp); most contain a perfect match or a one-base mismatch to an 11-bp ARS consensus sequence (ACS) (4, 5). However, the presence of an ACS is not sufficient to predict an origin of replication: There are many more ARS consensus sequences in the genome than origins, and the ∼200 bp of sequence that flank the ACS, although essential, share no obvious sequence similarities (3).

From the analysis of a few origins, the general features of origin control in yeast appear to be as follows (6–8). Most (perhaps all) origins are found in intergenic regions, but only a subset are chosen as replication origins in any given S phase. Furthermore, origins range in efficiency from those that are active in almost every cell cycle to those that are used in only a small proportion of cell cycles. Origins are activated in a reproducible temporal sequence through S phase. The time of origin activation appears not to be an intrinsic property of the origin itself; rather, it is imposed by cis-acting elements that are separable from the origin (9, 10). Some trans-acting factors involved with the temporal program have been identified recently (11–14), but how they read and execute the program is still unclear.

How origins are chosen for activation and how timing is determined remain open questions. With the availability of the yeast genome sequence and DNA microarray technologies, it is possible to expand our understanding of origin activity greatly by examining the kinetics of replication across the entire yeast genome. Here we provide the locations and times of activation of the most efficient origins, as well as the directions of replication fork movement and fork migration rates from these origins. After growth in isotopically dense culture medium, cells are released into S phase, and replicated [heavy-light (HL)] DNAs and unreplicated [heavy-heavy (HH)] DNAs are isolated from samples collected 10, 14, 19, 25, 33, 44, and 60 min later (Fig. 1). Each DNA sample is labeled and hybridized to a high-density oligonucleotide array (15). Hybridization reveals the relative abundance of specific genome sequences present in the unreplicated versus replicated DNAs at different times in S phase. Summing the hybridization values over all the samples for both HH and HL DNA at each chromosome coordinate gives an aggregate %HL value for that coordinate, which reflects its relative time of replication [Fig. 1B (16)] [see (17) for details]. Graphing these aggregate %HL [%HL(total)] values versus chromosome coordinates yields a replication profile for a chromosome (Fig. 1C), with higher values of %HL(total) being indicative of earlier replication.
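The aggregate %HL calculation described above can be sketched in a few lines. This is a minimal illustration, assuming that %HL(total) is the HL share of the summed HL and HH signal at one coordinate; the function name and the intensity values are hypothetical, and the normalization steps referred to in (16) are not reproduced.

```python
def percent_hl_total(hh_signals, hl_signals):
    """Aggregate %HL for one chromosome coordinate.

    hh_signals, hl_signals: hybridization intensities of the unreplicated (HH)
    and replicated (HL) fractions, one value per S-phase sample.
    Assumption: the aggregate is 100 * sum(HL) / (sum(HL) + sum(HH)).
    """
    if len(hh_signals) != len(hl_signals):
        raise ValueError("need one HH and one HL value per time point")
    total_hh = sum(hh_signals)
    total_hl = sum(hl_signals)
    if total_hh + total_hl == 0:
        raise ValueError("no hybridization signal at this coordinate")
    return 100.0 * total_hl / (total_hl + total_hh)


# Hypothetical intensities for the samples taken at 10, 14, 19, 25, 33, 44, and 60 min.
early_hh = [8, 4, 2, 0, 0, 0, 0]
early_hl = [2, 6, 8, 10, 10, 10, 10]
late_hh = [10, 10, 10, 8, 4, 2, 0]
late_hl = [0, 0, 0, 2, 6, 8, 10]

early_value = percent_hl_total(early_hh, early_hl)
late_value = percent_hl_total(late_hh, late_hl)
```

With these made-up numbers the early-replicating coordinate scores higher than the late-replicating one, which is the ordering the replication profiles rely on.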

Figure 1

Outline of the experiment and the data-processing steps. (A) MATa yeast cells carrying a temperature-sensitive lesion in CDC7 are grown for many generations in medium containing ¹³C glucose and ¹⁵N nitrogen [(¹⁵NH₄)₂SO₄] (28). The cells are arrested before the onset of DNA replication by the addition of α factor and by shifting the cells to the restrictive temperature, then are washed and resuspended in isotopically light medium. When DNA replication begins again, the newly synthesized DNA is labeled exclusively with the light isotopes. Samples are collected throughout S phase. The DNA is fragmented with a restriction enzyme (Eco RI) and fractionated by cesium chloride density-gradient centrifugation to separate the molecules carrying the two different density labels. These two fractions are then biotinylated and separately hybridized to high-density arrays carrying probes to the entire genome. (B) Data processing (16) illustrated with a hypothetical data set for one early- and one late-replicating region. The shaded boxes with the numbers below them represent cells of the microarray and the corresponding hybridization values (a black box corresponds to no signal). The sum of the hybridization intensities obtained with unreplicated, fully dense DNA (HH) for the various S phase samples is calculated, as is the sum of the intensities with replicated (HL) DNA. The %HL(total) is then calculated with the equation %HL(total) = 100 × ΣHL / (ΣHL + ΣHH). A DNA fragment that replicates early in S phase will be present in the HL fraction through much of the time course; in contrast, a late-replicating fragment will be represented in the HL fraction only at later times in S phase. Thus, the %HL(total) value is a direct reflection of the time of replication of a fragment, which we have calculated in the past as t rep (10, 11, 28) [see (17)]. (C) The replication profile for a chromosome is constructed by plotting the %HL(total) as a function of the chromosome coordinate.
Peaks represent regions that replicate earlier than the neighboring sequences and must therefore correspond to origins of replication; the taller a peak, the earlier the origin fires (ori, origin). Valleys correspond to termination zones. Shoulders along the lines connecting peaks and valleys (open arrow) could result either from inefficient origins or from changes in the fork migration rate. The slope of the line connecting a peak and a valley gives the direction and rate of fork migration through that region: A shallow slope reflects a fast fork, whereas a steep slope reflects a slow fork.

Interpreting the plots.

To allow automated identification of peaks and valleys in the replication profiles, a Fourier convolution algorithm was used to smooth them (16, 17). These smoothed profiles were used to identify potential origins of replication: Because a peak reveals a region that replicates before its neighbors, it must correspond to an origin of replication. Likewise, valleys correspond to termination zones (“terminus” in Fig. 1C). Tall peaks correspond to early-activated origins and shorter peaks to later-activated origins. Overall, we detected 332 origins in the yeast genome. Replication profiles for all chromosomes and the corresponding data in tabular form can be found in the supplementary material (17).
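The smoothing and peak-calling step can be illustrated with a small sketch. It is not the algorithm of (16, 17): it swaps the Fourier convolution for a direct Gaussian-weighted moving average (by the convolution theorem, convolving with a kernel in coordinate space gives the same result as multiplying the transforms in Fourier space), and the kernel width, the synthetic profile, and all names are assumptions made for illustration.

```python
import math


def gaussian_smooth(values, sigma_points=3.0):
    """Smooth a profile by direct convolution with a truncated Gaussian kernel."""
    radius = int(math.ceil(4 * sigma_points))
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-0.5 * (k / sigma_points) ** 2) for k in offsets]
    norm = sum(weights)
    n = len(values)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, w in zip(offsets, weights):
            j = min(max(i + k, 0), n - 1)  # hold the end values constant past the ends
            acc += w * values[j]
        smoothed.append(acc / norm)
    return smoothed


def find_extrema(profile):
    """Return indices of local maxima (candidate origins) and minima (candidate termini)."""
    peaks, valleys = [], []
    for i in range(1, len(profile) - 1):
        if profile[i] > profile[i - 1] and profile[i] > profile[i + 1]:
            peaks.append(i)
        elif profile[i] < profile[i - 1] and profile[i] < profile[i + 1]:
            valleys.append(i)
    return peaks, valleys


def tent(x):
    """Hypothetical %HL profile with two origins (at probes 50 and 150)."""
    if x < 100:
        return 30 + 0.5 * (50 - abs(x - 50))
    return 30 + 0.4 * (50 - abs(x - 150))


# Add a probe-to-probe ripple to mimic measurement noise.
raw = [tent(x) + 0.3 * (-1) ** x for x in range(200)]
raw_peaks, _ = find_extrema(raw)

smoothed = gaussian_smooth(raw)
peaks, valleys = find_extrema(smoothed)
```

On the raw synthetic profile nearly every other probe is a spurious local maximum; after smoothing, only the two planted origins and the terminus between them remain. A real profile would also need a robustness score for each peak, which this sketch omits.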

Chromosome VI has been studied extensively (18, 19) and is a good candidate for assessment of the microarray approach. The replication profile deduced from the microarray data for chromosome VI is shown in Fig. 2. Superimposed on this plot are the locations of restriction fragments that are known to contain origins [gray bars; heights of bars correspond to the replication times of the origin fragments themselves or of adjacent restriction fragments, determined previously (18)]. The array data clearly show a good match with the known replication characteristics of chromosome VI. For example, the tallest peak in the profile, which predicts the location of the earliest-activated origin on the chromosome, matches the known location of ARS607, which has been shown previously to be the earliest-activated origin on the chromosome (18). Likewise, the shorter peaks (such as the one centered at ∼63 kb) correspond to the locations of origins known to be activated later in S phase.

Figure 2

Replication profile for chromosome VI, showing t rep plotted as a function of coordinate. The %HL(total) values were converted to t rep values using the correlation between the set of %HL values and the corresponding set of t rep values calculated for each coordinate (16, 17). The small circle on the x axis marks the location of the centromere. Dark gray bars correspond to restriction fragments known to contain replication origins (18, 19). The heights of the bars correspond to their relative replication times (18), scaled to match the t rep values calculated for the microarray experiment. Inverted triangles mark the four origins that are used in ≥50% of the cell cycles. Gaps in the profile correspond to regions of low probe density. Numbers above the peaks indicate the robustness of the peaks on a scale of 1 to 9 [larger numbers are indicative of more robust peaks (16)].

Predicting origin locations.

To further test the efficacy of the array data in predicting origin locations and replication times, we used two approaches. First, we asked how well the peaks in replication profiles reveal the locations of 18 origins known to be efficient and localized to regions of ≤5 kb (20). Peaks in the replication profiles occurred close to each of these 18 origins (21), with a mean distance of 3.7 kb between predicted origins and centers of known origin windows (Table 1), a match that is significantly better than that expected by chance (P = 4.8 × 10⁻¹⁷).

Table 1

Match between known efficient origins and peaks in replication profiles.

In a second approach, we examined chromosome X, which had been largely unexamined for replication. Four prominent peaks were selected at random from its replication profile (a, b, d, and e in Fig. 3A). Restriction fragments (∼13 kb or smaller) corresponding to the predicted origins were tested by two-dimensional (2D) gel electrophoresis. Abundant bubble structures were detected at each of the four locations (Fig. 3B), confirming that they did correspond to active replication origins, whereas flanking restriction fragments tested did not show bubble structures (22). Given that bubble structures can be detected only if replication originates within the central half of a restriction fragment, and assuming that there are ∼400 origins in the yeast genome (2), the probability of detecting origin activity in any random 13-kb fragment is ∼0.22. Therefore, the likelihood of having origin activity in all four fragments just by chance is 0.002. This value is a conservative estimate, as two of the four fragments tested were actually substantially smaller than 13 kb. These results indicate that the replication profiles can indeed predict locations of efficient origins. The shoulders on the sides of peaks may indicate the locations of inefficient origins; one example of a 2D gel analysis detecting a weak origin from such a shoulder (f in Fig. 3A) is shown (Fig. 3B).
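The chance estimate in this paragraph can be reproduced with a short calculation. The sketch below assumes a nuclear genome of about 12 Mb (a figure not stated in the text), origins scattered uniformly, and independence among the four fragments; the variable names are illustrative.

```python
GENOME_KB = 12_000   # assumption: ~12 Mb yeast nuclear genome (not given in the text)
N_ORIGINS = 400      # upper estimate of origins used in the text
FRAGMENT_KB = 13     # largest restriction fragment tested

# Bubbles are seen only if the origin lies in the central half of the fragment.
detectable_kb = FRAGMENT_KB / 2

# Expected number of origins in the detectable window, used as the chance
# that a random fragment shows origin activity.
p_single = N_ORIGINS * detectable_kb / GENOME_KB

# Chance that all four independently chosen fragments show origin activity.
p_all_four = p_single ** 4

print(f"single fragment: {p_single:.2f}, all four: {p_all_four:.3f}")
```

Under these assumptions the single-fragment value rounds to 0.22 and the four-fragment value to 0.002, matching the figures quoted above.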

Figure 3

Replication profiles predict origin locations. (A) Replication profile for chromosome X. Numbers above the peaks indicate the robustness of the peaks (16). (B) Origin activity at five locations on chromosome X. Features marked a, b, d, e, and f in the profile (A) were tested for origin activity by 2D gel electrophoresis (46). The small horizontal bars under the letters in (A) correspond to the restriction fragments that were tested. The origin marked e corresponds to ARS121. Bubbles were detected in each fragment, indicating the presence of active origins. The long exposure required to see the bubble intermediates in region f indicates that the origin at ∼250 kb is inefficient.

Some discrepancies between the array data and previous reports were observed; for example, the array data predicted an origin at ∼235 kb on chromosome VI, where no origin had previously been reported. Although errors arising from poor hybridization signals or from assumptions made in the smoothing process (21) may lead to such discrepancies, they could also indicate genuine origins that had been missed previously. In the chromosome VI example, 2D gel analysis of the directions of fork migration supports the prediction of the array data (22).

Predicting replication times.

Replication times predicted by the microarray analysis for several locations on chromosome X (a through d in Fig. 3A) were tested by slot blot analysis of CsCl density gradient fractions as described previously (18). In each case, the kinetics of replication predicted by the microarray data matched those detected by the standard slot blot method (Fig. 4A). As an additional test of the overall replication timing profile, the time of replication was assessed separately for ∼60 restriction fragments on chromosome X (23), which correspond to annotated open reading frames (ORFs) in the Saccharomyces Genome Database. The replication profile deduced from the microarray experiment closely matched the profile obtained from this slot blot experiment (Fig. 4B).

Figure 4

Profiles predict the temporal program of replication. (A) Replication kinetics of fragments corresponding to locations marked a through d in Fig. 3A were determined by slot blot analysis as described (28). The DNA samples used were from the same experiment as was used for hybridization to the microarrays. “Early marker” and “late marker” refer to the replication kinetics of ARS305 (chromosome III) and R11 (chromosome V), known to replicate very early and very late in S phase, respectively (33). (B) Replication times of 60 restriction fragments on chromosome X from slot blot hybridization (23). The profiles deduced from microarray data (gray shaded area; data of Fig. 3A) and from this slot blot experiment (bars connected by dotted lines) are shown. The width of a bar reflects the length of the restriction fragment that was tested; the height of the bar (mean of six replicates) corresponds to the relative %HL(total) for that fragment, and therefore corresponds to replication time: High bars represent early replication, and low bars represent late replication. Because this slot blot method measures the relative times of replication rather than the absolute times, the plot shows just the relative replication times (on the y axis), with the microarray and slot blot results scaled to match.

Taken together, the success of the replication profiles in predicting both general origin locations and relative times of replication for different chromosomal segments demonstrates the viability of using microarrays to map the topography of replication. At present, origin activity has been confirmed only for origins predicted with high confidence levels (16). More precise assignments of origin locations, especially for less efficient origins, will have to await several repetitions of the experiment. However, the replication profiles will provide useful information even without precise and exhaustive identification of all origins, such as in the comparison of genotypes or culture conditions.

Origin activation times.

S phase in the microarray experiment spans an interval of ∼55 min. Although we have previously described origins as belonging to distinct “early” or “late” classes, this genome-wide analysis reveals that origins really show a continuum of activation times (Fig. 5A). Most of the origin firings occur near mid-S. This observation is consistent with measurements made on a randomly selected set of 24 origins (20) examined by our premicroarray assay (22). Each chromosome shows a range of origin activation times, with the distribution of activation times varying from chromosome to chromosome (Fig. 5B). As deduced earlier from DNA fiber autoradiography for mammals and yeast (1, 24), adjacent origins are generally activated at about the same time (Fig. 5C).

Figure 5

Origin spacing and activation times. (A) Distribution of activation times of all predicted origins. (B) Distribution of origin activation times on two separate chromosomes. (C) Difference in activation time between pairs of adjacent origins. (D and E) Comparison of base composition and replication profiles. The base composition was calculated for a sliding window centered at locations 1 kb apart (47) for chromosome III (D) and VI (E). The plots were generated using a sliding window of 30 kb; window sizes ranging from 1 to 50 kb were also tried but did not reveal any correlations to origin location or replication time.

No sequence elements that were absolute predictors of origin location were found; nor have we, so far, uncovered any DNA sequence determinants that allow the prediction of replication time. An obvious feature to examine with respect to origin activation time is base composition. The yeast genome has broad variations in base composition along the length of the chromosomes, with AT-rich and GC-rich isochores occurring with a periodicity of ∼50 kb (25). Although the 11-bp AT-rich ARS consensus sequence is expected to occur at elevated frequency in AT-rich isochores, there does not appear to be any correlation between base composition and either origin frequency or activation time (Fig. 5, D and E). One measure of how closely the replication profiles are dictated by the underlying DNA sequence is the extent of similarity between profiles for regions believed to have arisen from gene duplications. We were unable to detect any similarity in the replication profiles for the largest reported blocks of duplicated sequence (26, 27). If the underlying DNA sequence does directly dictate the replication profile, the sequences in these blocks must have diverged enough to alter their replication profiles.

Replication of centromeres and telomeres.

Consistent with previous observations on a subset of centromeres and telomeres (28), we find that centromeres are replicated earlier than subtelomeric regions [the most distal ORF sequences represented on the microarrays (Fig. 6A)]. The average replication time of 10-kb windows containing centromeres is significantly earlier than that of all 10-kb windows (P = 1.9 × 10⁻⁹); replication of subtelomeric regions, on the other hand, is much later than the genomic average (P = 2.4 × 10⁻¹⁶). However, subtelomeric regions as a class are not the last sequences to be replicated; for instance, sequences at ∼280 kb on chromosome IV are replicated later than many telomeres. The times of replication of the two ends of the same chromosome are highly correlated (Fig. 6B), raising the possibility that intrachromosomal telomere interaction (29) may influence the time of origin activation.

Figure 6

Time of replication of centromeres and telomeres. (A) Distribution of replication times of centromere regions (CEN) and the most distal unique sequences represented on the oligonucleotide microarrays (TEL). (B) The times of replication of telomeres on the same chromosome. (Inset) Correlation in replication time between the left end and the right end of each chromosome. (C) Position effect on origin activation time. The plots show a comparison of the average activation time of predicted origins within a sliding 20-kb window centered at increments of 5 kb from chromosome ends (top panel) or centromeres (bottom panel). The difference in average t rep for origins within the window when compared to the average of all predicted origins is plotted (Δt rep; open circles) along with the probability in each instance of the observed difference being due to chance (P value; solid circles). The dashed horizontal line shows the threshold P value of P = 0.05.

The average activation time for the set of most distal predicted origins is ∼5 min later than that of all other origins (P = 2.5 × 10⁻⁴), consistent with the observation that telomeres cause late activation of origins placed in their vicinity (9, 29, 30). Although this position effect is known to extend at least ∼30 kb from the right telomere of chromosome V, the distance over which telomeres generally exert an effect had not been determined previously. To map the telomere position effect more systematically, the distribution of activation times of all predicted origins located within a sliding 20-kb window was compared to that of all predicted origins in the genome. As the sliding window was moved inward from the chromosome ends, the average activation time of predicted origins within the window approached that of the genomic average (Fig. 6C, top). The difference in average activation time was statistically significant (P < 0.05) until the 20-kb window was centered ∼45 to 50 kb from the end (Fig. 6C, top), suggesting that the telomere position effect, on average, extends at least ∼35 kb.
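The sliding-window comparison can be sketched as follows. This is a simplified illustration with hypothetical origin data: it reports only the difference in mean activation time (Δt rep) for each window and leaves out the significance test, whose details are not given in this text. The function and parameter names are assumptions.

```python
from statistics import mean


def position_effect(origins, window_kb=20.0, step_kb=5.0, max_center_kb=60.0):
    """Mean activation time in a sliding window minus the mean over all origins.

    origins: (distance_from_chromosome_end_kb, t_rep_min) pairs pooled over all
    chromosome ends. Returns (window_center_kb, delta_t_rep_min) pairs; windows
    that contain no origin are skipped.
    """
    overall_mean = mean(t for _, t in origins)
    half = window_kb / 2
    results = []
    center = half
    while center <= max_center_kb:
        in_window = [t for d, t in origins if center - half <= d < center + half]
        if in_window:
            results.append((center, mean(in_window) - overall_mean))
        center += step_kb
    return results


# Hypothetical origins: those near the chromosome end fire later.
origins = [(5.0, 40.0), (12.0, 38.0), (25.0, 35.0), (40.0, 31.0),
           (55.0, 30.0), (70.0, 30.0), (90.0, 29.0), (120.0, 31.0)]

deltas = position_effect(origins)
```

In this toy data set the first window (centered 10 kb from the end) is 6 min later than the overall mean, and the difference shrinks as the window moves inward. The same function applies to the centromere analysis if distances are measured from the centromere instead.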

Because centromeres are replicated earlier than the bulk chromosomal sequences, we did a similar analysis for origins that flank centromeres. The average activation time for the 32 predicted origins flanking centromeres was found to be ∼5 min earlier than that of all other origins (P = 8.0 × 10⁻⁵). As with the telomeres, this difference in activation time, when compared to the genomic average, decreased at increasing distances from the centromeres (Fig. 6C, bottom): Predicted origins up to a distance of ∼25 kb from the centromeres showed a significantly earlier activation time than the genomic average. These observations raise the possibility that centromeres, like telomeres, may have a position effect on origin activation. Consistent with this idea, introduction of a centromere close to one potential origin on a plasmid containing two identical ARS elements led to preferential activation of the origin closer to the centromere (31).

Fork migration rates.

The rates of fork migration were measured by taking absolute values of the slopes of the lines connecting peaks and valleys (that is, origins and termini) in the replication profiles, ignoring the region immediately flanking each peak or valley (5 kb on either side), where local flattening of the curve introduces artifacts in the measurement of fork rates. As with origin activation times, a broad range of fork rates was observed (32), with a mean of 2.9 kb/min and a median of 2.3 kb/min (Fig. 7). These values are close to a previous estimate of fork migration rates made for isogenic cells grown under similar culture conditions [3.7 kb/min (1)]. Previously determined slow fork rates in two late-replicating regions [the right end of chromosome V and the left end of chromosome III (10, 30, 33)] are confirmed by our microarray data. However, there is no general correlation between fork rate and the time in S phase when the forks are initiated.
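The slope-based estimate of fork rate can be illustrated with a minimal sketch. It assumes the profile is available as (coordinate, t rep) points sorted by coordinate and takes the rate from the two outermost points that remain after trimming 5 kb at each end; the authors' exact fitting procedure is not described here, and the names and data are hypothetical.

```python
def fork_rate_kb_per_min(points, peak_kb, valley_kb, trim_kb=5.0):
    """Estimate fork rate between an origin (peak) and a terminus (valley).

    points: (coordinate_kb, t_rep_min) pairs sorted by coordinate.
    The trim_kb nearest the peak and the valley are ignored, because local
    flattening of the profile there distorts the slope.
    """
    lo, hi = sorted((peak_kb, valley_kb))
    kept = [(x, t) for x, t in points if lo + trim_kb <= x <= hi - trim_kb]
    if len(kept) < 2:
        raise ValueError("interval too short after trimming")
    (x0, t0), (x1, t1) = kept[0], kept[-1]
    if t1 == t0:
        raise ValueError("flat profile: rate undefined")
    # Distance travelled divided by elapsed time, direction ignored.
    return abs((x1 - x0) / (t1 - t0))


# Hypothetical rightward fork: origin at 300 kb fires at 20 min and the fork
# moves at 2.5 kb/min until it terminates at 342 kb; probes every 2 kb.
true_rate = 2.5
points = [(float(x), 20.0 + (x - 300) / true_rate) for x in range(300, 343, 2)]

rate = fork_rate_kb_per_min(points, peak_kb=300.0, valley_kb=342.0)
```

Because the profile plots time against coordinate, a shallow slope (few minutes per kilobase) corresponds to a fast fork, which is why the rate is computed as distance over time rather than as the plotted slope itself.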

Figure 7

Replication fork migration rates. Fork migration rates were calculated for regions between peaks and valleys in the replication profiles. The histogram shows the maximum estimate for fork rate within each peak-valley interval.

Differences in fork rates throughout the genome could result from some local property of the DNA sequence or chromatin structure being replicated by a fork. Alternatively, qualitative differences in proteins assembled at different origins (such as minichromosome maintenance protein composition differences) could result in different rates of fork movement. However, models proposing qualitative differences between forks at different origins would have to accommodate the observation that there are some origins where the leftward and rightward forks move at different rates (for example, compare the left and right slopes at the origin labeled “a” in Fig. 3, located at ∼300 kb on chromosome X, where the leftward fork is estimated to be moving at 3.9 kb/min and the rightward fork at 1.4 kb/min). We cannot rule out the possibility that some origins are unidirectional in a subset of the cells (32). If such origins exist, they could account for asymmetry in fork rates between leftward and rightward forks from an origin.

Replication and transcription.

A general correlation between transcription and replication has been observed in mammalian cells, where genes that are actively transcribed are often replicated early in S phase (34), and has led to the suggestion that replication timing is one way for the cell to control transcription (35, 36). With one exception, we found no such correlation between transcription and replication time in yeast (37). The lone exception was the family of eight histone genes, which are replicated on average ∼10 min earlier than the genome average of 31 min (P = 6 × 10⁻⁶). A limitation of our analysis is that we have only examined one growth regimen; it is possible that the program of replication changes when cells modify their transcription profile to adapt to altered growth conditions. We anticipate that some version of the method described here, in combination with single molecule studies (38) and genome-wide analysis of transcription, will be especially useful in examining cellular responses to altered growth conditions (including the response to drugs such as hydroxyurea or DNA-damaging agents such as methylmethane sulfonate).

For the future, we envision at least two immediate benefits of this approach. First, we are compiling a database of origin locations and activation times that will help identify consensus sequence elements affecting origin choice and activation time. Second, this method provides a powerful way of comparing the topography of chromosome replication in different cell cultures: between cells of different genotypes (wild type versus mutant) and for cells of the same genotype under different growth conditions.

We see no reason why this method [or related methods (39)] cannot be applied to other organisms, including cultured human cells. There is no requirement for sequenced genomes; all that is required is that an ordered set of unique-sequence clones spaced at reasonable intervals be available for constructing microarrays and that some method of synchronizing cell populations and of density-labeling the DNA be available.

  • * These authors contributed equally to this work.

  • To whom correspondence should be addressed. E-mail: raghu{at}u.washington.edu

  • Present address: Aventa Biosciences, 4757 Nexus Centre Drive, Suite 200, San Diego, CA 92121, USA.

  • § Present address: The Salk Institute for Biological Studies, Laboratory of Genetics, 10010 North Torrey Pines Road, La Jolla, CA 92037, USA.
