Report

Genome-Scale Identification of Nucleosome Positions in S. cerevisiae

Science  22 Jul 2005:
Vol. 309, Issue 5734, pp. 626-630
DOI: 10.1126/science.1112178

Abstract

The positioning of nucleosomes along chromatin has been implicated in the regulation of gene expression in eukaryotic cells, because packaging DNA into nucleosomes affects sequence accessibility. We developed a tiled microarray approach to identify at high resolution the translational positions of 2278 nucleosomes over 482 kilobases of Saccharomyces cerevisiae DNA, including almost all of chromosome III and 223 additional regulatory regions. The majority of the nucleosomes identified were well-positioned. We found a stereotyped chromatin organization at Pol II promoters consisting of a nucleosome-free region ∼200 base pairs upstream of the start codon flanked on both sides by positioned nucleosomes. The nucleosome-free sequences were evolutionarily conserved and were enriched in poly-deoxyadenosine or poly-deoxythymidine sequences. Most occupied transcription factor binding motifs were devoid of nucleosomes, strongly suggesting that nucleosome positioning is a global determinant of transcription factor access.

Nucleosomes prevent many DNA binding proteins from approaching their sites (1–3), whereas appropriately positioned nucleosomes can bring distant DNA sequences into close proximity to promote transcription (4). Current understanding of the primary structure of chromatin and its effects on gene expression comes from a handful of well-characterized loci (see examples below). High-resolution measurements of nucleosome positions over chromosome-scale distances would enhance our understanding of chromatin structure and function.

To measure nucleosome positions on a genomic scale, we developed a DNA microarray method (5) to identify nucleosomal and linker DNA sequences on the basis of susceptibility of linker DNA to micrococcal nuclease (fig. S1). Nucleosomal DNA was isolated, labeled with Cy3 fluorescent dye (green), and mixed with Cy5-labeled total genomic DNA (red). This mixture was hybridized to microarrays printed with overlapping 50-mer oligonucleotide probes tiled every 20 base pairs across chromosomal regions of interest (fig. S1B). A graph of green:red ratio values for spots along the chromosome is expected to show nucleosomes as peaks about 140 base pairs long (6), or six to eight microarray spots, surrounded by lower ratio values corresponding to linker regions (fig. S1C).
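The array geometry described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure from the Materials and Methods: the function names (`tile_probes`, `log2_ratio`, `probes_per_footprint`) and the use of raw Cy3 and Cy5 intensities as inputs are assumptions made here for clarity.

```python
import math

def tile_probes(sequence, probe_len=50, step=20):
    """Return (start, probe) pairs for probes of probe_len tiled every `step` bases."""
    last_start = len(sequence) - probe_len
    return [(start, sequence[start:start + probe_len])
            for start in range(0, last_start + 1, step)]

def log2_ratio(cy3, cy5):
    """log2 of nucleosomal (Cy3, green) over genomic (Cy5, red) intensity."""
    return math.log2(cy3 / cy5)

def probes_per_footprint(footprint=140, step=20):
    """Number of probe start positions that fit in one nucleosome footprint."""
    return footprint // step

# Hypothetical 110-base region: probes start at 0, 20, 40, and 60.
probes = tile_probes("ACGT" * 27 + "AC")
starts = [start for start, _ in probes]
```

With a 20 base pair step, a footprint of about 140 base pairs corresponds to seven probe start positions, consistent with the six to eight spots expected per nucleosome peak.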

To objectively compare our data to published nucleosome positions, we developed a hidden Markov model (HMM) (7) to determine nucleosome/linker boundaries (fig. S1, D to G). HMMs use observable data to infer hidden states responsible for generating the signal. Here, observable signals are hybridization values of the tiled probes (fig. S1, C and D), and the hidden states are nucleosomal and linker states (fig. S1E). Well-positioned nucleosomes should cover ∼140 base pairs or six to eight probes (fig. S1E, N1 to N8) and have a high green:red ratio, whereas stretches of nine or more probes were classified as “fuzzy” or delocalized nucleosomes (fig. S1E, DN1 to DN9). Linkers are expected to have lower ratios (fig. S1D) and may have variable length (fig. S1E, return arrow on node L). The model calculates the probability that a given probe on the array corresponded to nucleosomal, fuzzy nucleosomal, or linker DNA (fig. S1F) and identifies the most likely nucleosome positions (fig. S1G). To capture nucleosome peaks with low maxima (see arrow, Fig. 1B), we detrended the data by using a peak-to-trough measure and analyzed them with the HMM. Nucleosomes identified exclusively from the detrended data were annotated as “low” nucleosomes and may correspond to nucleosomes found only in a subpopulation of cells (Fig. 1B and Materials and Methods).
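The idea of inferring hidden nucleosomal and linker states from probe ratios can be illustrated with a much simpler model than the one in fig. S1E. The sketch below is a two-state HMM decoded with the Viterbi algorithm. It omits the chained nodes (N1 to N8, DN1 to DN9) that encode nucleosome length in our model, and its Gaussian emissions, state means, standard deviation, and transition probability are illustrative assumptions rather than fitted values.

```python
import math

STATES = ("linker", "nucleosome")

def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution (assumed emission model)."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def viterbi(log_ratios, means, sd=0.5, stay=0.9):
    """Most likely state path for a two-state HMM over probe log2 ratios."""
    log_stay, log_switch = math.log(stay), math.log(1 - stay)

    def transition(prev, cur):
        return log_stay if prev == cur else log_switch

    # score[s]: best log-probability of any path ending in state s
    score = {s: math.log(0.5) + gaussian_logpdf(log_ratios[0], means[s], sd)
             for s in STATES}
    backpointers = []
    for x in log_ratios[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + transition(p, s))
            new_score[s] = (score[best_prev] + transition(best_prev, s)
                            + gaussian_logpdf(x, means[s], sd))
            pointers[s] = best_prev
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical probe ratios: two linker probes, seven nucleosomal, two linker.
ratios = [-1, -1, 1, 1, 1, 1, 1, 1, 1, -1, -1]
path = viterbi(ratios, means={"linker": -1.0, "nucleosome": 1.0})
```

Because this simplified model has no explicit duration states, it cannot by itself distinguish well-positioned from delocalized nucleosomes; that distinction would have to be made afterwards from the length of each nucleosomal run.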

Fig. 1.

Microarray data reproduce multiple sources of published data. The y axis represents the log2 ratio of the hybridization values for Cy3 (nucleosomal DNA) versus Cy5 (genomic DNA), whereas the x axis represents the chromosomal coordinates of the microarray oligonucleotides. Yellow ovals represent nucleosomes inferred by HMM; red ovals give literature positions. The x axis in (A) to (C) shows distance to ATG, whereas the x axis in (D) to (F) shows chromosome III coordinate. (A) MFA2 promoter. Blue line indicates data from BY4741 (MATa), whereas the red line shows data from BY4742 (MATα). Data are taken from a proof-of-concept array, where tiling was every 25 base pairs, and represent the average of three experiments. (B) HIS3 promoter. Thick blue line shows the median of eight independent microarrays for BY4741. Individual replicates are shown as thin lines. Arrow indicates a “low” nucleosome identified only in detrended data. (C) CHA1 promoter as in (B). Delocalized nucleosomes are shown as overlapping pale ovals. (D) Chromosome III centromere as in (B). (E and F) Published DNase I hypersensitive sites (11) are indicated with arrows. Red arrows represent strong bands in the published study, whereas the black arrow represents a dubious band.

We characterized nucleosome positions by using two microarrays. We first used a proof-of-concept array covering the MFA2 and PHO5 promoters (Fig. 1A and fig. S4). With information gained from this array, we designed a microarray to measure nucleosome positions over half a megabase of the S. cerevisiae genome. Chromosome III was tiled, except for regions of extensive cross-hybridization, with 50-mers overlapping every 20 base pairs, leaving 30 overlapping continuous sequences (contigs) covering 278,960 base pairs. In addition, one kilobase of promoter sequence was tiled for 223 genes on other chromosomes.

Nucleosomal DNA from eight independent cultures of log-phase yeast was hybridized to this microarray. We validated the microarray by comparing nucleosome positions determined with our approach to published nucleosome positions. We correctly identified nucleosomes at the MFA2, HIS3, PHO5, and CHA1 promoters, over the chromosome III centromere, and over the silent mating type loci (Fig. 1 and figs. S4 and S5) (8–10). We also found that 27 of 32 deoxyribonuclease (DNase) hypersensitive sites on chromosome III (11) fall in long linkers identified by our method (Fig. 1, E and F, and fig. S4). Lastly, a coarse-grained view of our high-resolution data reproduced recent genome-wide chromatin immunoprecipitation studies at ∼1-kb resolution (12, 13) (fig. S6). Thus, our method faithfully reproduced high- and low-resolution characteristics of chromatin previously described with the use of three distinct assay types: micrococcal nuclease sensitivity, DNase I sensitivity, and histone occupancy.

The availability of thousands of nucleosome positions facilitates the elucidation of global chromatin properties, such as the fraction of nucleosomes that are well-positioned (Fig. 2 and table S1). Nucleosomes might be expected to occupy multiple positions in ensemble measurements, because there is little thermodynamic preference of the histone octamer for most genomic DNA (14). In addition, yeast growing exponentially are a heterogeneous mixture of cells in different cell cycle and epigenetic states (15, 16). However, examination of our data revealed pervasive examples of well-positioned nucleosomes. Global HMM identification of well-positioned and delocalized nucleosomes revealed that 65 to 69% of nucleosomal DNA was found in well-positioned nucleosomes (table S1).
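The fraction of nucleosomal DNA in well-positioned nucleosomes can be approximated from per-probe calls, as in the following sketch. It assumes a list of booleans (one per probe, `True` for nucleosomal) and applies the length rule stated earlier, treating any run of nine or more nucleosomal probes as delocalized. The function names and the treatment of runs shorter than six probes as well-positioned are assumptions of this example, not part of the HMM output format.

```python
from itertools import groupby

def nucleosome_runs(calls):
    """Yield (start_index, length) for each run of consecutive nucleosomal probes."""
    index = 0
    for is_nucleosomal, group in groupby(calls):
        length = len(list(group))
        if is_nucleosomal:
            yield index, length
        index += length

def well_positioned_fraction(calls, fuzzy_min=9):
    """Fraction of nucleosomal probes lying in runs shorter than fuzzy_min probes."""
    runs = list(nucleosome_runs(calls))
    total = sum(length for _, length in runs)
    if total == 0:
        return 0.0
    well = sum(length for _, length in runs if length < fuzzy_min)
    return well / total

# Hypothetical calls: one 7-probe nucleosome and one 10-probe delocalized stretch.
calls = [False] * 3 + [True] * 7 + [False] * 2 + [True] * 10 + [False]
fraction = well_positioned_fraction(calls)
```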

Fig. 2.

Local and global views of delocalized nucleosomes. (A and B) Well-positioned and delocalized nucleosomes. Data graphed as in Fig. 1. (C) Global nucleosome occupancy on chromosome III. Nucleosome density was calculated from HMM calls, and a 500–base pair running average was plotted. Red rectangles indicate regions not tiled. (D) Delocalized nucleosomes are inhomogeneously distributed on chromosome III. Fraction of nucleosomal probes found in delocalized nucleosomes plotted as in (C).

Although average nucleosome density is relatively constant over chromosome III, delocalized nucleosomes are inhomogeneously distributed (Fig. 2, C and D). Passage of RNA polymerase through coding regions temporarily disrupts nucleosomes (17, 18), which rapidly reassemble behind the polymerase (19). High transcription rates may cause nucleosomes to appear delocalized, because polymerases (and transiently disassembled nucleosomes) occupy distinct positions in the ensemble (see CHA1 above). We found that highly expressed genes were enriched for delocalized nucleosomes (P = 0.007) (fig. S7). Furthermore, delocalized nucleosomes were found farther from transcriptional start sites than were well-positioned nucleosomes (fig. S8).

Nucleosome occupancy has been proposed to exclude transcription factors from a subset of their specific consensus motifs (13, 20), and recent work demonstrated that promoters bound by many transcription factors are grossly nucleosome-depleted (13). To investigate this phenomenon at high resolution, we compared our data to a database of transcription factor motifs bound under a variety of conditions (21). We plotted the fractions of bound and unbound motifs that were in nucleosomes or in linkers (Fig. 3). A total of 47% of unbound motifs were found in linker DNA sequences, very close to the baseline measurement of 48% for all intergenic sequences. In contrast, over 87% of the motifs that are associated with transcription factors under our growth conditions were depleted of nucleosomes. Thus, functional transcription factor binding sites are predominantly nucleosome-free in vivo. Furthermore, the set of functional motifs that are unbound under our assay conditions showed the same linker enrichment as motifs that are bound (Fig. 3). For example, the promoter upstream of YCL050C (Fig. 4D), which is not bound by any transcription factors at standard growth temperatures, is bound by Hsf1 in heat-stressed yeast (21). Our measurements indicated the Hsf1 binding motif was located in a linker (and thus accessible for factor binding), even in the absence of heat stress.
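The comparison of bound and unbound motifs with linker calls amounts to an interval-containment count, sketched below. The half-open `(start, end)` coordinate convention, the requirement that a motif lie entirely within a linker, and the function name are assumptions of this example; the coordinates shown are invented.

```python
def fraction_in_linkers(motifs, linkers):
    """Fraction of motif intervals fully contained in some linker interval.

    Both arguments are lists of half-open (start, end) intervals on one contig.
    """
    if not motifs:
        return 0.0

    def in_linker(motif):
        m_start, m_end = motif
        return any(l_start <= m_start and m_end <= l_end
                   for l_start, l_end in linkers)

    return sum(in_linker(m) for m in motifs) / len(motifs)

# Hypothetical coordinates for illustration only.
linkers = [(100, 250), (400, 420)]
motifs = [(120, 130), (240, 260), (405, 415), (300, 310)]
share = fraction_in_linkers(motifs, linkers)
```

Running the same function separately on bound motifs, unbound motifs, and all intergenic probes would give the three quantities compared in the text.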

Fig. 3.

Functional transcription factor binding motifs are more accessible than unbound motifs. Oligonucleotide probes were separated into all intergenic probes (background), unbound motifs under all conditions tested, motifs bound in YPD, and motifs bound in any other condition but unbound in YPD (21). The percentage of each group of probes determined in this work to be nucleosomal is shown. Error bars indicate SEM.

Fig. 4.

Long NFRs are common in promoters. (A to D) Top graphs show microarray data. Middle images show gene annotation (Crick genes are red and Watson genes are blue), bound transcription factor motifs from (21) (colored rectangles), and inferred nucleosome positions (ovals). Bottom graphs show an aggregate sequence conservation score (29). Green rectangles highlight upstream NFRs. Individual examples: (A) Chr XVI 833335-834655, (B) Chr XVI 29155-30195, (C) Chr III 21845-22925, and (D) Chr III 38745-39785. (E) Expanded view of conserved sequence. Sequences are shown for S. cerevisiae, S. mikatae, S. paradoxus, and S. bayanus. Hsf1 binding site is outlined in purple; polyA stretches are highlighted in orange.

Intergenic DNA in yeast is nucleosome-depleted relative to coding DNA (12, 13). This could correspond to decreased population occupancy of several nucleosomes or high population occupancy of sparse nucleosomes. Consistent with the latter possibility, nucleosome-free regions (NFRs) of ∼150 base pairs were found about 200 base pairs upstream of many annotated coding sequences (Fig. 4, A to D). The pervasiveness of this signal can be seen by averaging data for all tiled genes (Fig. 5A): a long linker dominates the average. We iteratively aligned promoters by correlation to the average profile, thus aligning NFRs, and selected 90% of promoters to eliminate the noise introduced by rare promoters lacking an NFR. This averaged alignment shows several regularly spaced nucleosomes surrounding the NFR (Fig. 5B), with an internucleosome distance between 160 and 170 base pairs [an average internucleosomal distance of 160 to 165 base pairs was previously measured in yeast (6)].
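The iterative alignment of promoters to the average profile can be sketched as follows. This is one plausible reading of the procedure, not the implementation used for Fig. 5B: each profile is shifted by whole probes to maximize its Pearson correlation with the current average, the average is recomputed, and the process repeats until the shifts stop changing. The shift range, padding with the profile mean, number of rounds, and ranking by final correlation to choose the retained 90% are all assumptions.

```python
from math import ceil
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5

def shifted(profile, shift):
    """Move a profile right by `shift` probes, padding gaps with its mean."""
    n, pad = len(profile), mean(profile)
    return [profile[i - shift] if 0 <= i - shift < n else pad for i in range(n)]

def average_profile(profiles, shifts):
    columns = zip(*(shifted(p, s) for p, s in zip(profiles, shifts)))
    return [mean(column) for column in columns]

def align_profiles(profiles, max_shift=3, rounds=10, keep=0.9):
    """Return (shifts, kept_indices) after iterative alignment to the average."""
    shifts = [0] * len(profiles)
    candidates = range(-max_shift, max_shift + 1)
    for _ in range(rounds):
        average = average_profile(profiles, shifts)
        new_shifts = [max(candidates, key=lambda s: pearson(shifted(p, s), average))
                      for p in profiles]
        if new_shifts == shifts:
            break
        shifts = new_shifts
    average = average_profile(profiles, shifts)
    scores = [pearson(shifted(p, s), average) for p, s in zip(profiles, shifts)]
    ranked = sorted(range(len(profiles)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:ceil(keep * len(profiles))])
    return shifts, kept

def dip_profile(dip_start, length=12, width=2):
    """Hypothetical promoter profile: flat signal with a low-ratio NFR."""
    return [0.0 if dip_start <= i < dip_start + width else 1.0 for i in range(length)]

profiles = [dip_profile(5), dip_profile(5), dip_profile(7)]
shifts, kept = align_profiles(profiles)
```

In this toy input the third profile has its low-ratio region two probes to the right of the others, so the alignment moves it two probes to the left.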

Fig. 5.

NFRs are pervasive in promoters and occur over conserved sequences. (A) Promoters were aligned by start codon, and microarray data were averaged and plotted. The x axis shows distance from ATG. (B) After alignment by NFR, data from 90% of tiled promoters were averaged. The x axis shows distance from NFR. (C) Chromosome III promoters were aligned as in (B), and conservation values were averaged for one kilobase around NFR. Top graph has the same x axis scale as (B); bottom graph shows expanded view. (D) Chromosome III probes were separated into groups as indicated, and sequence conservation was averaged and plotted. For intergenic regions, linkers were separated into NFRs and all other linkers. Error bars indicate SEM.

Noting that functional transcription factor motifs are similarly found ∼100 to 500 base pairs upstream of start codons (21), we identified NFRs as sites of 51% of bound motifs found on our array (Fig. 4). This suggested that NFRs are transcriptional start sites, predicting that the extent of the 5′ untranslated region of the genes assayed in this study could be identified by using these data. RNA (total RNA and mRNA) was isolated and hybridized to our microarray. As expected, 5′ ends of transcripts coincided with NFRs (fig. S9), identifying these regions as transcriptional start sites.

The conservation of functional transcription factor motifs between related Saccharomyces species (22) led us to investigate sequence conservation in NFRs. The bottom graphs in Fig. 4, A to D, show an aggregate conservation score from seven sequenced Saccharomyces species. Coding DNA was highly conserved, but there was also marked conservation of intergenic sequence surrounding, but not limited to, transcription factor binding sites (Fig. 4E). To investigate this conservation globally, we aligned promoters by the NFR and averaged the conservation scores. Notable here is a peak of average sequence conservation in the NFR surrounded by valleys of poor conservation (Fig. 5C). We also investigated sequence conservation by partitioning probes containing coding or intergenic sequences into bins defined by the HMM output (Fig. 5D). Coding regions were highly conserved regardless of nucleosomal context. Conversely, intergenic sequences found in nucleosomes were poorly conserved, whereas those in NFRs were more highly conserved across evolution. Thus, biologically meaningful regulatory information in intergenic sequences falls into clusters that are accessible to the cell. This is not only due to conservation of transcription factor binding sites, because the region of conservation often includes a great deal of sequence beyond the transcription factor binding motif (Fig. 4E).

Conserved nucleosome-free sequences included not only transcription factor binding sites but also multiple stretches of poly-A or poly-T (Fig. 4E). Poly(dA-dT) stretches incorporate poorly into nucleosomes because of their relative rigidity (23–25). Globally, we found that NFRs were enriched for poly(dA-dT) (fig. S10A), as expected from the prevalence of these elements in yeast promoters (26). Conversely, these homopolymer stretches globally had increased likelihood of being in linkers (fig. S10B). This was not caused by sequence specificity of micrococcal nuclease (27), because hybridizations of micrococcal nuclease-treated naked DNA showed little correlation with nucleosomal data (r = 0.09). Our results suggest that poly(dA-dT) stretches play a causal role in establishing many NFRs.
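Homopolymer stretches of the kind discussed here can be located with a simple regular expression, as sketched below. The minimum run length of five bases is an arbitrary threshold chosen for illustration and is not the criterion used for fig. S10; the function names are likewise hypothetical.

```python
import re

def poly_da_dt_runs(sequence, min_len=5):
    """Return half-open (start, end) spans of A-only or T-only runs of at least min_len."""
    pattern = re.compile(r"A{%d,}|T{%d,}" % (min_len, min_len))
    return [(m.start(), m.end()) for m in pattern.finditer(sequence.upper())]

def run_coverage(sequence, min_len=5):
    """Fraction of bases in the sequence that fall inside such runs."""
    if not sequence:
        return 0.0
    covered = sum(end - start for start, end in poly_da_dt_runs(sequence, min_len))
    return covered / len(sequence)

# Invented 21-base example: a 6-base A run, a 5-base T run, and a short AAAT tail.
runs = poly_da_dt_runs("GCAAAAAAGCTTTTTGCAAAT")
coverage = run_coverage("GCAAAAAAGCTTTTTGCAAAT")
```

Comparing `run_coverage` between NFR sequences and other intergenic sequences would give a rough measure of the enrichment described above.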

Chromatin in yeast is well-ordered (65 to 69% of nucleosomal DNA was found in well-positioned nucleosomes), and delocalized nucleosomes are found distant from NFRs (fig. S8). Taken together, our results are consistent with a modified “statistical positioning” (28) mechanism underlying this global order, where nucleosomes are prevented from association with promoter regions either by sequence characteristics such as poly(dA-dT) elements or by nucleosomal eviction by recruited proteins, and nucleosomes are subsequently well-positioned between nearby NFRs because of structural constraints imposed by packaging short stretches of sequence with nucleosomes.

It will be interesting to determine whether the accessible transcription factor binding sites, highly positioned nucleosomes, and stereotyped promoter architecture found in yeast chromatin will be conserved features of metazoan chromatin.

Supporting Online Material

www.sciencemag.org/cgi/content/full/1112178/DC1

Materials and Methods

SOM Text

Figs. S1 to S10

Table S1

References and Notes
