Report

Coordinated Effects of Sequence Variation on DNA Binding, Chromatin Structure, and Transcription


Science  08 Nov 2013:
Vol. 342, Issue 6159, pp. 744-747
DOI: 10.1126/science.1242463

DNA Differences

The extent to which genetic variation affects an individual's phenotype has been difficult to predict because the majority of variation lies outside the coding regions of genes. Now, three studies examine the extent to which genetic variation affects the chromatin of individuals with diverse ancestry and genetic variation (see the Perspective by Furey and Sethupathy). Kasowski et al. (p. 750, published online 17 October) examined how genetic variation affects differences in chromatin states and their correlation to histone modifications, as well as more general DNA binding factors. Kilpinen et al. (p. 744, published online 17 October) document how genetic variation is linked to allelic specificity in transcription factor binding, histone modifications, and transcription. McVicker et al. (p. 747, published online 17 October) identified how quantitative trait loci affect histone modifications in Yoruban individuals and established which specific transcription factors affect such modifications.

Abstract

DNA sequence variation has been associated with quantitative changes in molecular phenotypes such as gene expression, but its impact on chromatin states is poorly characterized. To understand the interplay between chromatin and genetic control of gene regulation, we quantified allelic variability in transcription factor binding, histone modifications, and gene expression within humans. We found abundant allelic specificity in chromatin and extensive local, short-range, and long-range allelic coordination among the studied molecular phenotypes. We observed genetic influence on most of these phenotypes, with histone modifications exhibiting strong context-dependent behavior. Our results implicate transcription factors as primary mediators of sequence-specific regulation of gene expression programs, with histone modifications frequently reflecting the primary regulatory event.

Functional genomic elements have been linked to specific chromatin signatures in different cell types (1), illustrating control of transcriptional processes through multiple layers of genome organization. Although allele-specific gene expression is widespread (2), it has been difficult to pinpoint the upstream cis-regulatory variants and how they affect chromatin states. We performed chromatin immunoprecipitation (ChIP) of five histone posttranslational modifications (hPTMs) (H3K4me1, H3K4me3, H3K27ac, H3K27me3, and H4K20me1), three transcription factors (TFs) (TFIIB, PU.1, and MYC), and the second largest RNA polymerase II subunit RPB2 (POLR2B) (fig. S1) in lymphoblastoid cell lines (LCLs) derived from two parent-offspring trios (3). A subset of the ChIP assays was also performed in eight additional unrelated individuals. We further profiled one of the trios with global run-on sequencing (GRO-seq), which measures nascent transcription at all transcribed regions (fig. S2), and examined available deoxyribonuclease I (DNase I)–seq and CCCTC-binding factor (CTCF) ChIP-seq data (4). All 14 individuals were additionally profiled for mRNA expression (5). Clustering of the molecular phenotypes along promoters and enhancers was consistent with published reports (1) (figs. S3 to S5).

We identified sites of allele-specific (AS) TF binding, hPTM, and transcription for all assays (5), ranging from 11 to 12% for TFs (4, 6) to 6 to 30% for hPTMs at heterozygous sites accessible for the analysis (median across all individuals) (Fig. 1A and fig. S6). Notably, in the two trios, fewer AS effects were observed in mRNA (mRNA-seq, 5%) than in nascent transcripts (GRO-seq, 27 to 28%) (5), likely reflecting posttranscriptional modifications.
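As an illustration of how a single heterozygous site could be called allele-specific, the sketch below applies an exact two-sided binomial test to reference and alternate read counts. This is an assumption made for illustration only: the actual procedure is defined in the supplementary materials (5) and may differ, and the function names, the null ratio of 0.5, and the significance threshold are all hypothetical.

```python
from math import comb


def binomial_two_sided_p(ref_reads: int, alt_reads: int, p_ref: float = 0.5) -> float:
    """Exact two-sided binomial P value for an allelic imbalance.

    Assumption: under the null hypothesis, reads come from the reference
    allele with probability p_ref (0.5 means no allelic bias).
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0
    # Probability of every possible reference-read count from 0 to n.
    probs = [comb(n, k) * p_ref**k * (1 - p_ref) ** (n - k) for k in range(n + 1)]
    observed = probs[ref_reads]
    # Sum the outcomes that are at least as unlikely as the observed one.
    # The small tolerance guards against floating-point ties.
    p_value = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


def fraction_allele_specific(sites, alpha: float = 0.05) -> float:
    """Proportion of sites with P < alpha.

    `sites` is a list of (ref_reads, alt_reads) pairs; alpha is a
    hypothetical threshold with no multiple-testing correction.
    """
    if not sites:
        return 0.0
    hits = sum(1 for ref, alt in sites if binomial_two_sided_p(ref, alt) < alpha)
    return hits / len(sites)
```

A real analysis would also need to handle reference mapping bias and multiple testing, which this sketch deliberately omits.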

Fig. 1 Allele-specific (AS) activity within transcriptional and chromatin layers.

(A) Proportion of accessible heterozygous SNP sites showing significant AS activity (median across all individuals, n = 3 to 14). (B) Consistency of allelic effects within genomic regions of TF binding and histone modification. Bars represent the proportion of peaks with a consistent allelic direction at two or more SNP sites.

Multiple heterozygous single-nucleotide polymorphisms (SNPs) overlapping regions of TF activity showed high consistency in allelic direction within individuals (Fig. 1B and fig. S7, A and B). Allelic consistency in nascent transcripts and histone modifications was high even with sites several kilobases apart and decreased with genomic distance (logistic regression, P < 0.05; fig. S7C). The strongest AS effects were enriched at promoters, whereas the allelic signals of marks of enhancer activity (PU.1, H3K4me1, H3K27ac) or heterochromatin (H3K27me3) showed a more dispersed distribution (fig. S8). We also analyzed all accessible heterozygous SNPs overlapping known expression quantitative trait loci (eQTLs) from the 1000 Genomes phase 1 populations (5, 7) and observed an enrichment of allelic bias at eQTLs relative to non-eQTLs for TFs (P = 0.016, Mann-Whitney U test) but not for hPTMs (fig. S9); this finding suggests that a TF binding change is often causal to the gene expression change.
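The consistency measure summarized in Fig. 1B (the proportion of peaks whose SNP sites agree in allelic direction) can be sketched as follows. The input layout, the function name, and the use of a reference allele ratio above 0.5 to define direction are assumptions made for illustration, not the authors' implementation.

```python
from collections import defaultdict


def consistent_peak_fraction(as_sites):
    """Fraction of peaks in which all SNP sites share one allelic direction.

    `as_sites` is an iterable of (peak_id, ref_allele_ratio) pairs
    (hypothetical layout). Only peaks with two or more SNP sites are
    testable; returns None if there are none.
    """
    directions = defaultdict(list)
    for peak_id, ref_ratio in as_sites:
        # True means the reference allele is favored. A ratio of exactly
        # 0.5 is counted as not reference-favoring in this sketch.
        directions[peak_id].append(ref_ratio > 0.5)

    testable = [d for d in directions.values() if len(d) >= 2]
    if not testable:
        return None
    consistent = sum(1 for d in testable if all(d) or not any(d))
    return consistent / len(testable)
```

For example, a peak with ratios 0.8 and 0.7 counts as consistent, a peak with ratios 0.9 and 0.2 does not, and a peak with a single SNP site is excluded from the denominator.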

Linking hPTM signatures with specific DNA sequence features has proven difficult (8), but for sequence-specific TFs it is possible to assess whether the observed AS effects are due to motif-disrupting variants (fig. S10). Categorization of significant AS binding sites, with respect to predicted TF motifs, revealed three classes of binding SNPs (B-SNPs): B-SNPs located either within (class I) or adjacent to (class II) predicted PU.1 and MYC consensus TF motifs, or B-SNPs in motif-devoid peaks (class III). Class I sites were enriched for B-SNPs relative to the other two classes (fig. S11, A and B, for PU.1; fig. S12, A and B, for MYC), which suggests that SNP-mediated disruption of the TF motif is likely causal to the observed AS binding activity. However, most TF AS binding events (70%, PU.1; 97%, MYC) appear triggered through TF consensus motif–independent mechanisms (figs. S11A and S12A) (6, 9). For example, allelic binding cooperativity tests (5) revealed four additional motifs (NFKB1, POU2F2, PRDM1, and STAT2), located proximal to the PU.1-bound site, that show covariance with AS PU.1 binding activity [false discovery rate (FDR) = 5%; Fig. 2A and fig. S13] and collectively explain another 7.5% of AS PU.1 binding activity.

Fig. 2 DNA sequence properties at allele-specific PU.1 binding sites.

(A) SNPs in PU.1-bound and cooperative TF motifs are predictive of AS PU.1 binding (5% FDR) (5). (B) PU.1-bound regions (peaks) with homotypic PU.1 motifs show a weak response toward motif-disrupting SNPs. Motif-disrupting SNPs were split into two classes (one or two PU.1 motifs per peak) and grouped according to their motif impact (1, lowest; 10, highest).

Despite a strong correlation between motif score differences and AS binding (figs. S11C and S12C; >90% expected direction), we observed that the majority of motif-disrupting SNPs do not show significant allelic effects (figs. S11A and S12A). Therefore, we tested whether homotypic TF motifs (i.e., multiple motifs for the same TF) located within PU.1-bound regions might buffer the effects of motif-disrupting SNPs (5, 10, 11) and found that TF-bound regions with homotypic motifs exhibit fewer allelic effects (41% versus 25%; P = 0.0087, Mann-Whitney U test). In addition, the impact of SNPs on TF motifs scales with the likelihood of observing significant AS effects (Fig. 2B and fig. S12D), but this trend is not significant if a second, unaffected homotypic TF motif is located nearby (Fig. 2B and fig. S11D). These results suggest that homotypic motif clusters buffer the effect of genetic variation over several similar binding sites.
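The notion of a motif score difference between two alleles can be illustrated with a toy position weight matrix. The sketch below is an assumption-laden example: the four-column matrix, the uniform background, and the function names are hypothetical and do not reproduce the PU.1 or MYC motif models used in the study.

```python
from math import log2

# Hypothetical four-position probability matrix for a GGAA-like core.
# The values are illustrative only, not a published motif model.
TOY_PPM = [
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
]
BACKGROUND = 0.25  # assumed uniform base frequency


def motif_score(seq: str) -> float:
    """Log-odds score of `seq` against the toy matrix."""
    if len(seq) != len(TOY_PPM):
        raise ValueError("sequence length must match the motif length")
    return sum(log2(col[base] / BACKGROUND) for col, base in zip(TOY_PPM, seq))


def score_difference(ref_seq: str, alt_seq: str) -> float:
    """Positive values mean the reference allele matches the motif better."""
    return motif_score(ref_seq) - motif_score(alt_seq)


# A SNP changing the second position from G to C lowers the score by log2(17).
delta = score_difference("GGAA", "GCAA")
```

Under this toy model, the sign of `delta` gives the expected direction of allelic binding, and its magnitude is a simple stand-in for the motif impact used to group SNPs in Fig. 2B.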

We investigated the genetic component of allele-specific chromatin and binding signals by (i) comparing the direction of allelic bias at shared significant AS sites across 10 unrelated individuals (Fig. 3A), and (ii) testing for transmission of allelic effects from parents to children (Fig. 3B and fig. S17) (4). Allelic directions at shared significant AS sites in the unrelated individuals were significantly correlated (P < 0.05, Spearman correlation; fig. S16A), with mRNA showing the highest degree of consistency in allelic directions between individuals, followed by TF binding and histone modification, respectively (Fig. 3A and figs. S14 to S16). We observed evidence of significant parental transmission with all three regulatory TFs (ρ = 0.44 to 0.75, P ≤ 0.02, Spearman correlation; Fig. 3C and fig. S17), consistent with their strong sequence dependence (4, 6). For hPTMs, evidence of transmission was detected for the active histone marks H3K4me1, H3K4me3, and H3K27ac (ρ = 0.12 to 0.21; P ≤ 0.02), but their level of transmission was lower than for TFs. Transmission signals for mRNA levels and nascent transcription were significant and comparable to TFs (ρ = 0.46 and 0.50; P = 0.0008 and P = 1.3 × 10⁻⁷, respectively). We observed only weak transmission for POLR2B (fig. S17), possibly due to the distinct activity states of the polymerase (12). We determined the genetic control of the transmission signal of histone marks at known eQTLs (7) and DNase I sensitivity QTLs (dsQTLs) (13), respectively, because the former are enriched within TF binding sites (13). Transmission of the active marks H3K4me1, H3K4me3, and H3K27ac was stronger near eQTLs and dsQTLs (ρ = 0.31 to 0.57) than genome-wide (Fig. 3D and fig. S20), which suggests that the transmission behavior of the overall chromatin state depends on the properties of the underlying sequence. Collectively, these findings indicate coordinated and genetically driven changes between TF binding and histone modifications, and suggest that TFs are the primary determinants of regulatory interactions (14–16).
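The transmission test can be sketched as a Spearman rank correlation between the child's paternal allele ratio and a ratio inferred from the two parents at sites where the parents are opposite homozygotes. How the parental ratio is inferred is an assumption here (father signal divided by the sum of father and mother signal). The function names are hypothetical, and the supplementary materials (5) define the actual procedure.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length inputs with at least 2 points")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def paternal_ratio(paternal_signal, maternal_signal):
    """Share of the signal attributed to the paternal allele."""
    return paternal_signal / (paternal_signal + maternal_signal)


def transmission_rho(sites):
    """`sites` holds (child_paternal_reads, child_maternal_reads,
    father_signal, mother_signal) tuples (hypothetical layout)."""
    child = [paternal_ratio(s[0], s[1]) for s in sites]
    inferred = [paternal_ratio(s[2], s[3]) for s in sites]
    return spearman_rho(child, inferred)
```

A positive coefficient from `transmission_rho` would indicate that the allele favored in the parents tends to be favored in the child as well. Assessing significance would require a P value, which this sketch does not compute.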

Fig. 3 Genetic component of allele-specific transcriptional and chromatin activity.

(A) Distribution of pairwise correlation coefficients of significant AS sites between all unrelated individuals of European origin (CEU, n = 10) for each molecular phenotype. Correlation of the reference allele ratio at shared significant AS SNP sites was calculated using Spearman rank correlation. (B to D) Correlation of the paternal allele ratio of the child and that inferred from the parents at SNP sites where parents are opposite homozygotes and the child has a significant allelic effect. (B) Examples of transmitted PU.1 and H3K27ac SNP sites. (C) Genome-wide transmission results. GRO-seq signal was analyzed separately for each strand (filled and empty points, forward and reverse strand, respectively; P value represents combined data). (D) Transmission results of H3K4me1 and H3K27ac near dsQTLs (±1 kb window around the dsQTL).

To further assess the extent of allelic coordination (AC) between distinct genomic regulatory layers, we calculated the correlation between AS effects across pairs of molecular phenotypes (fig. S21). We observed that each testable phenotype exhibits significant correlation in allelic ratios with one or multiple phenotypes (Spearman correlation, P < 0.05). The majority of AC events reflect relationships between distinct regulatory layers that have also been observed quantitatively [e.g., POLR2B-H3K4me3 at promoters (1, 17, 18); GRO-seq–H3K4me1–H3K27ac at putative enhancers (19)] (Fig. 4A and fig. S21). These results support a strong allelic (i.e., local) interconnectivity among regulatory and general TFs, histone modifications, and transcription.

Fig. 4 Local, short- and long-range coordination between transcriptional and chromatin layers.

Results of allelic coordination (A) and haplotypic coordination (B) analysis at gene regions (genes ± 50 kb) (5). Coordination of the allelic effect was considered between all pairs of assays. SNP sites within genomic regions were required to show a significant AS effect in both assays. Only assay pairs with ≥20 SNPs were considered for the analysis. Significant Spearman rank correlation coefficients (P < 0.05) between the paternal allele ratios of the SNP pairs are indicated with colored lines ranging in intensity from ρ = –1.0 (blue) to ρ = 1.0 (red). Nonsignificant correlations are indicated with gray lines; missing lines indicate lack of sufficient data points for analysis.

eQTLs are often located distal to their target genes (20), indicating that allelic signals within regulatory layers might extend over short and long distances. We examined haplotypic coordination (HC), defined as long-range coordination in allelic direction on the same chromosome, of AS effects at nonoverlapping heterozygous sites (5) (Fig. 4B and fig. S21), and found that every TF and histone mark exhibits HC with one or more regulatory layer(s) around genes and their flanking regions (fig. S21; Spearman correlation, P < 0.05). The degree of coordination varied between regulatory layers, ranging from –0.24 (GRO-seq–CTCF; P = 0.03) to 0.64 (MYC-mRNA; P = 2.9 × 10⁻⁸). The majority (>90%) of significant HC events were positive; that is, the allelic bias co-occurred on the same haplotype (Fig. 4B and fig. S21). For 25% of assay pairs tested, the strength of HC was significantly correlated with the genomic distance between SNP pairs (logistic regression, P < 0.05; odds ratio = 0.19 to 2.2) (fig. S22). For example, the enhancer-associated histone marks H3K4me1 and H3K27ac showed allelic consistency up to 200 kb with the TF PU.1. Thus, a single or few variant(s) likely trigger long-distance allelic effects over many of the regulatory layers acting on a genomic region.

Our work has revealed abundant allele-specific activity across all regulatory layers. Parental transmission of the allelic effects suggests that DNA sequence variations affecting transcription, TF binding, and histone modifications are largely transmitted from parents to children, with allelic histone effects showing more context-dependent behavior compared to TFs. Coordinated allelic and haplotypic behavior at different functional elements of the genome suggests that TF binding, histone modifications, and transcription operate within the same allelic framework. This is consistent with the fact that a few TFs can induce cellular reprogramming and massive changes in the chromatin landscape (21), and that the maintenance of a transcription-permissive environment and transcriptional memory are independent of histone modifications (22). Both histone modifications and TF binding are under genetic control, but histone modifications are more prone to stochastic, possibly transient effects and likely reflect (23), rather than define, coordinated regulatory interactions.

Supplementary Materials

www.sciencemag.org/content/342/6159/744/suppl/DC1

Materials and Methods

Figs. S1 to S31

Tables S1 and S2

References (24–39)

References and Notes

  1. See supplementary materials on Science Online.
  2. Acknowledgments: Supported by Swiss National Science Foundation grants CRSI33_130326 (E.T.D., B.D., A.R., N.H.), 31003A_132958 (N.H.), 31003A_130342 (E.T.D.), and 31003A_129835 (A.R.); the European Research Council (E.T.D.); the Louis-Jeantet Foundation (E.T.D.); European Molecular Biology Organization fellowship ALTF 2010-337 (H.K.); a fellowship from the doctoral school of the Faculty of Biology and Medicine, University of Lausanne (R.M.W.); the NCCR Frontiers in Genetics Program (M.G.-A., J.B., E.T.D., B.D.); the Japanese-Swiss Science and Technology Cooperation Program (Japan Science and Technology Agency/ETH Zürich) (B.D.); École Polytechnique Fédérale de Lausanne (B.D.); and NIH grants HG004845 and GM25232 (J.T.L.). The computations were performed at the Vital-IT (www.vital-it.ch) Center for High-Performance Computing of the Swiss Institute of Bioinformatics. All data in this publication are available through ArrayExpress (www.ebi.ac.uk/arrayexpress/) under accession numbers E-MTAB-1883 (RNA-seq), E-MTAB-1884 (ChIP-seq), and E-MTAB-1885 (GRO-seq). The authors declare no competing financial interests.