Epigenomic Enhancer Profiling Defines a Signature of Colon Cancer


Science  11 May 2012:
Vol. 336, Issue 6082, pp. 736-739
DOI: 10.1126/science.1217277


Cancer is characterized by gene expression aberrations. Studies have largely focused on coding sequences and promoters, even though distal regulatory elements play a central role in controlling transcription patterns. We used the histone mark H3K4me1 to analyze gain and loss of enhancer activity genome-wide in primary colon cancer lines relative to normal colon crypts. We identified thousands of variant enhancer loci (VELs) that comprise a signature that is robustly predictive of the in vivo colon cancer transcriptome. Furthermore, VELs are enriched in haplotype blocks containing colon cancer genetic risk variants, implicating these genomic regions in colon cancer pathogenesis. We propose that reproducible changes in the epigenome at enhancer elements drive a specific transcriptional program to promote colon carcinogenesis.

Although noncoding functional elements play a central role in establishing gene expression patterns that drive normal development, cell-type identity, and evolutionary processes, their potential involvement in the context of common cancers remains unknown. The mono- and dimethylated forms of histone 3 lysine 4 (H3K4me1/2) broadly mark multiple classes of gene enhancer elements (1–3). Here, we present an epigenomic comparison of H3K4me1-marked gene enhancer elements in a cohort of colorectal cancer (CRC) cell lines and normal colon epithelial crypt cells, from which colon cancer is derived.

We performed H3K4me1 chromatin immunoprecipitation sequencing (ChIP-seq) analysis on three preparations of normal epithelial crypts as well as primary CRC cell lines derived from two early-stage tumors (V432 and V703), two late-stage tumors (V8 and V9P), and five liver metastases (V400, V457, V481, V503, V9M). On average, we detected ~71,000 peaks significantly enriched for H3K4me1 at a false discovery rate of less than 5% (table S1). The distribution of H3K4me1 relative to annotated genes is similar between colon cancer samples and crypt controls, with most H3K4me1 sites mapping to intergenic and intronic regions located distal to transcription start sites (fig. S1). We compared H3K4me1 patterns between all 12 colon samples and 9 unrelated human cell types (4). H3K4me1 patterns in tumors are more similar to those of colon crypt than of noncolon cells, consistent with the notion that colon tumors are derived from colon crypts (fig. S2). Moreover, there is less variation between the colon samples than between unrelated cell types.

We identified thousands of H3K4me1 sites, or variant enhancer loci (VELs), that are differentially enriched (lost or gained) in each of the CRC samples compared to normal colon crypts (Fig. 1A). On average, less than 0.05% of VELs map to regions altered in DNA copy number, and thus, the vast majority of VELs are unlikely to be the result of copy-number variations related to malignant transformation. VELs comprise 28 to 61% of all putative enhancers present in a given CRC sample (Fig. 1B). ChIP-seq analysis of H3K27ac, an epigenetic mark of active enhancer elements, revealed that ~40% of gained VELs acquire H3K27ac in CRC. Of lost VELs, 70% are enriched for H3K27ac in normal crypts and show virtually no detectable levels of H3K27ac in CRC (fig. S3). We also performed global mapping of deoxyribonuclease I (DNase I)–hypersensitive sites in two CRC lines using DNase-seq (5). Consistent with acquisition and loss of enhancer marks, virtually all gained VELs map to open chromatin sensitive to DNase I digestion, and lost VELs map to DNase I–insensitive regions (fig. S3, D and E). Collectively, the data indicate that multiple changes in chromatin state and function accompany the changes in H3K4me1 at VELs. Lastly, we verified that H3K4me1 sites are functionally active using luciferase reporter assays (fig. S4).

Fig. 1

H3K4me1 ChIP-seq identifies variant enhancer loci (VELs). (A) UCSC browser views of H3K4me1 profiles from three normal crypts and a CRC cell line (V400), illustrating an example of a gained (left) and a lost (right) VEL. Heatmaps show the corresponding H3K4me1 ChIP-seq signals ±5 kb of VEL midpoints. (B) Number of VELs and unchanged H3K4me1 sites in CRC samples relative to normal controls. (C) Number of unique and common VELs. (D) Distribution of VELs among CRC lines. VELs are shown in blue. (E) Percentage of control enhancers and VELs that overlap with H3K4me1 sites in any of nine noncolon cell types. All comparisons are significant by χ² test (P < 10e-10).

A higher number of VELs than expected by random chance are common to multiple CRC samples. Specifically, we detected 2604 gained VELs common to five or more lines, and 2047 lost VELs common to six or more CRC lines (P < 0.001). Both unique and common VELs are distributed relatively evenly among the CRC samples (Fig. 1D); 197 VELs are shared between all 9 CRC samples. The universally common VELs are dispersed throughout the genome on multiple chromosomes and do not appear to cluster in any meaningful way (fig. S5).
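The recurrence counts above (VELs common to five or more lines, six or more lines, or all nine) amount to tallying, for each locus, the number of CRC samples in which it was called. The following is a minimal sketch of that tally, not the authors' pipeline: the function names, the string locus identifiers, and the toy sample contents are hypothetical, and it assumes that VELs from different samples have already been reduced to shared identifiers.

```python
from collections import Counter


def recurrence_counts(vels_by_sample):
    """Map each VEL identifier to the number of samples in which it appears."""
    counts = Counter()
    for vel_ids in vels_by_sample.values():
        # set() guarantees a locus is counted at most once per sample
        counts.update(set(vel_ids))
    return counts


def common_vels(vels_by_sample, min_samples):
    """Return the VELs present in at least `min_samples` samples."""
    counts = recurrence_counts(vels_by_sample)
    return {vel for vel, n in counts.items() if n >= min_samples}


# Toy data (hypothetical loci; sample names borrowed from the text for readability)
toy = {
    "V400": {"chr1:1000", "chr2:5000", "chr3:700"},
    "V457": {"chr1:1000", "chr2:5000"},
    "V481": {"chr1:1000", "chr9:42"},
}
shared_by_all = common_vels(toy, min_samples=len(toy))
shared_by_two = common_vels(toy, min_samples=2)
```

Whether the observed recurrence exceeds chance is a separate question that requires a null model; this sketch covers only the counting step.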

We ranked VELs by their level of specificity in crypts and nine unrelated cell types. Compared to a control set of H3K4me1 sites invariant between CRC samples and crypts, lost VELs are highly crypt specific, whereas gained VELs are relatively noncrypt specific (fig. S6A). These relationships also held true for common VELs (fig. S6B). We also determined that 67 to 92% of gained VELs map to H3K4me1-marked loci in any one of the nine noncolon cell types, compared to 9 to 11% for lost VELs and 24 to 31% for control enhancers (Fig. 1E). Collectively, these data indicate that in colon cancer, the chromatin configuration is altered by acquisition of putative enhancer marks that are normally found in noncolon cell types, and by loss of putative enhancer marks that typify normal crypt differentiation status; the net effect leading to a less colon-specific phenotype.
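The percentages reported for Fig. 1E can be read as the share of loci in a query set that overlap an H3K4me1 site in at least one of the noncolon cell types. A minimal sketch of that calculation follows, under stated assumptions: intervals are half-open `(chrom, start, end)` tuples, any shared base counts as an overlap, and all names and coordinates are hypothetical. The overlap criterion actually used in the study is not specified in this text.

```python
def overlaps(a, b):
    """True if half-open intervals a and b, given as (chrom, start, end), share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def percent_overlapping(query, reference_sets):
    """Percentage of query intervals that overlap a site in any reference set."""
    if not query:
        raise ValueError("query must contain at least one interval")
    # Pool sites from every reference cell type: "any one of" the cell types
    reference = [iv for sites in reference_sets.values() for iv in sites]
    hits = sum(any(overlaps(q, r) for r in reference) for q in query)
    return 100.0 * hits / len(query)


# Toy data (hypothetical coordinates and cell-type labels)
gained = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200), ("chr3", 10, 20)]
noncolon = {
    "cellA": [("chr1", 150, 250)],
    "cellB": [("chr2", 50, 120), ("chr1", 590, 700)],
}
pct = percent_overlapping(gained, noncolon)
```

The nested scan is quadratic and only suitable for small examples; genome-scale peak sets would normally be handled with a sorted sweep or an interval index.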

Multiple approaches were used to assess the relationship between VELs and gene expression. Compared to control genes not linked to gained VELs, genes linked to gained VELs are generally more highly expressed in CRC samples than in crypts, and genes linked to lost VELs are expressed at a lower level in CRC samples than in crypts (Fig. 2A and fig. S7). For all CRC samples, the effect of lost VELs on gene repression is more pronounced than the effect of gained VELs on gene overexpression, indicating that lost VELs are more likely than gained VELs to confer a functional effect. Overexpressed genes are 1.6 to 6.2 times more likely than randomly selected control genes to have gained VELs (Fig. 2B). Repressed genes are 2.8 to 8.7 times more likely than controls to have lost VELs (Fig. 2C). One VEL is generally sufficient to confer an effect on gene expression, and additional VELs confer more marked changes in a relatively quantitative fashion (Fig. 2, D and E, and fig. S8). Genes associated with gained VELs are generally expressed at high levels in crypt controls, and their expression is further elevated in CRC (Fig. 2F and fig. S9). Genes associated with lost VELs are expressed at mid to high levels in crypt controls and generally become either attenuated or silenced in CRC (Fig. 2F and fig. S9). These results are consistent with the above findings indicating that the majority of lost VELs lose the active H3K27ac enhancer mark, whereas only a minority of gained VELs acquire H3K27ac. We also found that correlations of global gene expression between CRC samples and crypts improved when VEL genes were not considered (fig. S10A). Common VELs are also enriched for genes frequently dysregulated in the CRC cell lines (fig. S10B). Collectively, the data indicate that gained and lost VELs are highly predictive of local cancer-specific overexpressed and repressed genes, respectively.
Consistent with these positive correlations, lost VEL gene promoters often show decreases in H3K4me3 and/or H3K27ac in CRC relative to crypts, and gained VEL gene promoters show increases in H3K4me3 and/or H3K27ac in CRC relative to crypts (fig. S11). However, there is also a class of VEL genes that do not show measurable differences in promoter-associated H3K4me3/H3K27ac between normal crypts and CRC, but clearly show expression changes (fig. S11, B and C).
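Statements such as "overexpressed genes are 1.6 to 6.2 times more likely than randomly selected control genes to have gained VELs" describe a ratio of two proportions. As an illustration only, that ratio could be computed as below; the function name, gene labels, and set contents are hypothetical, and how genes were linked to VELs or how control genes were sampled in the study is not covered here.

```python
def fold_enrichment(target_genes, control_genes, vel_genes):
    """Ratio of the VEL-linked fraction among target genes to that among control genes."""
    target, control, vel = set(target_genes), set(control_genes), set(vel_genes)
    if not target or not control:
        raise ValueError("target and control gene sets must be non-empty")
    target_frac = len(target & vel) / len(target)
    control_frac = len(control & vel) / len(control)
    if control_frac == 0:
        raise ValueError("no control gene is linked to a VEL; ratio is undefined")
    return target_frac / control_frac


# Toy data (hypothetical gene labels)
overexpressed = {"g1", "g2", "g3", "g4"}
controls = {"g5", "g6", "g7", "g8"}
genes_with_gained_vel = {"g1", "g2", "g3", "g5"}

# 3 of 4 overexpressed genes versus 1 of 4 control genes carry a gained VEL
ratio = fold_enrichment(overexpressed, controls, genes_with_gained_vel)
```

A ratio above 1 means the target genes are linked to VELs more often than the controls; on its own it does not establish statistical significance.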

Fig. 2

VELs correlate with aberrant gene expression. (A) Fold change in expression of VEL and control genes for a representative CRC line (V400). Number of (B) gained and (C) lost VELs associated with overexpressed and repressed genes, respectively. Fold change in expression of genes associated with variable numbers of (D) gained VELs and (E) lost VELs in CRC sample V400. (F) Levels of all genes (gray) and aberrantly expressed genes (>1.5-fold relative to crypts) associated with VELs in CRC sample V400.

If VELs are indeed cancer-related events, then aberrantly expressed genes associated with common VELs ought to validate as aberrantly expressed in primary tumors. We determined that overexpressed genes associated with gained VELs common to five to nine lines, and repressed genes associated with lost VELs common to six to eight lines, validated as aberrantly expressed in primary tumors at a rate two to eight times higher than that determined when the VEL was not considered (Fig. 3, A and B). These results suggest that VELs are a signature predictive of the in vivo colon cancer transcriptome more robustly than the aberrant gene expression patterns associated with the colon cancer cell lines from which the VELs themselves were identified. 15-Hydroxyprostaglandin dehydrogenase (PDGH), a colon tumor suppressor gene associated with the VEL signature and repressed in CRC, is shown in Fig. 3C (6).

Fig. 3

Common VELs predict aberrant gene expression in primary tumors. (A) (Left) Red bars represent the percentage of overexpressed genes associated with gained VELs common to five or more lines (G5 to G9) that validate as overexpressed in primary tumors. Black bars represent the baseline predictive power when the VEL is not considered, i.e., the percentage of overexpressed genes in five or more cell lines that validate as overexpressed in primary tumors. G9 genes that validated as overexpressed in primary tumors are listed in brackets. (Right) Same as left, but for lost VELs common to six or more lines (L6 to L9). (B) Heatmap of expression of VEL-associated genes in (A) (red and blue bars) in normal colon tissue (n = 16) and primary CRC tumors (n = 120). (C) UCSC Browser view of H3K4me1 ChIP-seq signals across the PDGH locus, associated with a lost VEL common to six CRC samples (highlighted in yellow).

Twenty single-nucleotide polymorphisms (SNPs) have been identified through genome-wide association studies to confer risk to CRC (7–18). We used variant set enrichment analyses (VSEs) to test whether enhancers and VELs were significantly enriched among the 20 CRC-risk SNPs [or variants in linkage disequilibrium (LD) with the CRC risk SNPs (clusters), designated as the annotated variant set (AVS)]. Among the 20 clusters of SNPs comprising the AVS, 16 (80%) overlapped at least one H3K4me1 site in colon crypt (Fig. 4A). Similar analyses in nine other cell types indicated that the CRC AVS association was specific to H3K4me1 enhancers in colon crypt and HepG2 cells (Fig. 4B). Furthermore, significant associations were detected between the AVS and low-frequency lost VELs (L1 and L2, Fig. 4B), and not common gained or lost VELs. An example is shown in Fig. 4C. Of the eight SNPs associated with unique lost VELs, five (rs719725, rs6983267, rs10505477, rs7014346, and rs3802842) were associated with enhancers in crypt and HepG2 cells, and not in any other cell types, indicating that SNP/enhancer associations exclusive to the disease-relevant tissue are particularly important. Although VSE tests for enrichment of enhancers in linkage disequilibrium with the CRC AVS as a whole, we did detect multiple instances in which individual risk SNPs (or variants in strong LD with the risk SNP) overlapped VELs, despite the lack of significance with the entire AVS. For example, rs4444235 was significantly associated with gains common to seven CRC lines (P = 0.004). Rs4444235 maps to the enhancer of BMP4 and increases its expression (19). Accordingly, gained VELs at this locus correlate with increased BMP4 expression in CRC cell lines. Furthermore, lost VELs associated with risk SNPs rs719725 and rs9929218 were associated with reduced expression of potential target genes, JMJD2C and TMED6, in CRC samples containing the lost VELs.
Collectively, these findings provide further evidence that enhancers and VELs are relevant to CRC pathogenesis.
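The logic of comparing an observed overlap count (for example, 16 of 20 SNP clusters) with a null distribution can be illustrated with a generic permutation test. The sketch below is not the VSE implementation used in the study. It assumes a single toy chromosome, non-overlapping half-open enhancer intervals, and a null built by moving each cluster to a random position while preserving its internal spacing; a real analysis would instead draw matched random variant sets. All names and coordinates are hypothetical.

```python
import random
from bisect import bisect_right


def build_index(intervals):
    """Sort non-overlapping half-open intervals and return parallel start/end lists."""
    ordered = sorted(intervals)
    return [s for s, _ in ordered], [e for _, e in ordered]


def hits_interval(pos, starts, ends):
    """True if position `pos` falls inside one of the indexed intervals."""
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]


def clusters_overlapping(clusters, starts, ends):
    """Number of clusters with at least one variant inside an interval."""
    return sum(any(hits_interval(p, starts, ends) for p in c) for c in clusters)


def permutation_p_value(clusters, intervals, genome_length, n_perm=1000, seed=0):
    """Return (observed count, empirical P) against randomly relocated clusters."""
    rng = random.Random(seed)
    starts, ends = build_index(intervals)
    observed = clusters_overlapping(clusters, starts, ends)
    at_least = 0
    for _ in range(n_perm):
        shifted = []
        for c in clusters:
            lo, span = min(c), max(c) - min(c)
            new_lo = rng.randrange(0, genome_length - span)  # keep cluster in bounds
            shifted.append([new_lo + (p - lo) for p in c])
        if clusters_overlapping(shifted, starts, ends) >= observed:
            at_least += 1
    # Add-one correction so the empirical P is never exactly zero
    return observed, (at_least + 1) / (n_perm + 1)


# Toy data (hypothetical): three short enhancers, four SNP clusters
enhancers = [(1000, 1200), (5000, 5300), (9000, 9100)]
snp_clusters = [[1050, 1500], [5100], [9050, 9500], [20000]]
observed, p_value = permutation_p_value(snp_clusters, enhancers, genome_length=100_000)
```

The design choice worth noting is that the cluster, not the individual SNP, is the unit of counting, which mirrors the way the text reports overlaps per SNP cluster of the AVS.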

Fig. 4

Colon enhancers and VELs are associated with genetic risk variants for CRC. (A) Results of VSE analysis showing that 16 of 20 CRC-risk SNP clusters map to H3K4me1-marked enhancers in a colon crypt sample (C101, red diamond), compared to a null distribution (gray). (B) (Left) The results of VSE analyses testing the association between CRC-risk SNPs and H3K4me1 sites in 10 cell types. The red line represents the significance threshold. The lower horizontal line represents the unadjusted significance threshold. The individual CRC SNPs found to be associated with H3K4me1 enhancers in each cell type are indicated in boxes, above each boxplot. (Right) VSE analysis of the CRC-risk SNPs and VELs. Control enhancers are H3K4me1 sites that are unchanged between CRC samples and crypts. L1 corresponds to unique lost VELs; L2 to L9 correspond to losses common to two to nine lines. G1 corresponds to unique gained VELs; G2 to G9 correspond to gains shared between two and nine lines. (C) Example of a lost VEL directly overlapping a CRC risk SNP shown within the relevant haplotype block structure (red).

Our epigenomic comparison of H3K4me1-marked gene enhancer elements in colon cancer cells suggests that central changes at enhancers drive a specific transcriptional program to promote colon carcinogenesis. Lost VELs appear to contribute to this signature more than gained VELs, as lost VELs confer a greater functional effect on expression than gained VELs, are better predictors of gene expression in primary tumors than gained VELs, typify colon crypt identity, are far more concordant across tumors than gained VELs, and are more robustly associated with CRC-risk SNPs than gained VELs. Most, but not all, VELs are linked to changes in promoter-associated H3K4me3 and H3K27ac. Thus, VELs capture novel and global information about the chromatin state that is related to gene expression. Moreover, these findings suggest that some of the VEL genes identified in this study would likely remain undiscovered through analysis of these promoter marks alone. Lastly, most VELs are common to at least two of nine (>20%) CRC samples. The commonality of the epigenetic colon cancer signature captured by VELs contrasts with the marked heterogeneity in mutations in colon cancer candidate driver genes revealed by genome sequencing and suggests either that VELs capture pathway outputs that are downstream of sets of gene mutations or that they capture epigenetic alterations that are independent of and more common than gene mutations (20–22). The number of enhancers consistently altered across multiple CRC tumors is likely far greater than the number of genes commonly mutated in colon cancer. These findings, even when adjusted for the notion that enhancers are two to five times more prevalent than genes, suggest that the epigenetic terrain at gene enhancer elements in colon cancer is less heterogeneous than the genetic landscape of protein-coding genes.

Supplementary Materials

Materials and Methods

Figs. S1 to S14

Table S1


References and Notes

  1. Acknowledgments: We thank A. Ting and K. Guda for helpful comments and discussion; Z. Zhang for providing Perl scripts for data analysis; P. Manaenkov for assistance with data visualization; and S. Edelheit, N. Beckloff, and N. Molyneaux from the Case Western Reserve University Genomics Core for sequencing and informatics assistance. This work was supported in part by NIH grants R01HD056369 and R01CA160356 (to P.C.S.), R01CA1555004 (to M.L.), R01-LM009012 and R01-LM010098 (to J.H.M. and R.C.-S.), 1P50CA150964 and NIH UO1 CA152756 (to S.M.), and 5T32GM008056-29 (to O.C.). B.A.-Z. is a predoctoral student in the Molecular Medicine Ph.D. Program of Cleveland Clinic and Case Western Reserve University, funded in part by the Med into Grad initiative of the Howard Hughes Medical Institute. All data are available in Gene Expression Omnibus through the accession numbers GSE36401, GSE36204, and GSE36400.
