Perspective: Gene Expression

Transcription factors read epigenetics


Science  05 May 2017:
Vol. 356, Issue 6337, pp. 489-490
DOI: 10.1126/science.aan2927

Decoding precisely how sequence-specific DNA binding proteins (called transcription factors) recognize, access, and act at their genomic binding sites is challenging. One shortcoming is the lack of knowledge about DNA binding specificities (motifs) for hundreds of the estimated ∼1600 human transcription factors. Another is how transcription factor binding is modulated by “epigenetics”—a contentious term that refers to heritable states of both cells and organisms, as well as the covalent chemical modifications of DNA and protein that often provide the underlying mechanism (1). DNA methylation at cytosine and guanine dinucleotides (mCG) satisfies most views of epigenetics, as it is inherited across cell divisions and functions in imprinting (parent-of-origin-dependent gene expression). On page 502 of this issue, Yin et al. (2) provide a comprehensive look at the extent to which human transcription factor binding is affected by mCG, and make a striking finding: Many homeodomain transcription factors—perhaps the best-characterized developmental regulators in biology (3)—can bind to specific methylated DNA sequences.

Most CG dinucleotides in mammalian cells are methylated, but mCG is depleted at active regulatory sequences (4). mCG is generally thought to occlude transcription factor binding, as the methyl groups protrude into the major groove (see the figure), where many transcription factors bind. There are dedicated CG and mCG binding factors (CXXC and MBD domain-containing proteins, respectively), and on page 503 of this issue, Takahashi et al. (5) allude to their importance in regulating de novo DNA methylation. mCG binding proteins can also mediate silencing of gene expression (6), but whether methylation generally controls or reacts to transcription factor binding remains largely unexplored (7). Regardless, activation of methylated DNA (e.g., in cell differentiation) presumably requires “pioneer” transcription factors that can jump-start methylated regions. Some additional transcription factors and other proteins that can bind to methylated DNA have been identified in screens of protein function (8, 9), but their precise sequence specificity and/or roles in mechanisms such as development and cell-type specification are generally unknown.

Yin et al. adapted a method for determining transcription factor sequence specificities, called high-throughput systematic evolution of ligands by exponential enrichment (HT-SELEX), by comparing results from methylated and unmethylated DNA (methyl-HT-SELEX). To further quantify the effect of methylation, the authors also developed a clever adaptation that employs bisulfite sequencing to discriminate between methylated and unmethylated DNA in the same pool. mCG preferences identified by methyl-HT-SELEX agree with a previous microarray approach (10), but the new method should enable discovery of larger motifs, as libraries of random sequences are much more complex than microarray probes.
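
The comparison at the heart of this approach can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the k-mer length, the pseudocount, and the toy reads are all assumptions made for illustration, and a real analysis would also normalize against the input library. The sketch counts k-mers in reads selected from a methylated and an unmethylated pool and reports, for each CG-containing k-mer, the log2 ratio of its frequency in the two pools.

```python
from collections import Counter
from math import log2


def count_kmers(reads, k):
    """Count every overlapping k-mer across a list of read sequences."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def methylation_log_ratios(meth_reads, unmeth_reads, k=4, pseudocount=1.0):
    """Return log2(methylated frequency / unmethylated frequency)
    for each CG-containing k-mer seen in either pool.

    Hypothetical helper: a positive value suggests the k-mer is favored
    when methylated, a negative value that it is disfavored.
    """
    meth = count_kmers(meth_reads, k)
    unmeth = count_kmers(unmeth_reads, k)
    meth_total = sum(meth.values())
    unmeth_total = sum(unmeth.values())

    ratios = {}
    for kmer in set(meth) | set(unmeth):
        if "CG" not in kmer:
            continue  # methylation can only matter where a CG is present
        # Pseudocounts keep the ratio finite for k-mers absent from one pool.
        meth_freq = (meth[kmer] + pseudocount) / (meth_total + pseudocount)
        unmeth_freq = (unmeth[kmer] + pseudocount) / (unmeth_total + pseudocount)
        ratios[kmer] = log2(meth_freq / unmeth_freq)
    return ratios


# Toy data: ACGT is recovered more often from the methylated pool.
toy_meth = ["ACGT", "ACGT", "ACGT", "TTTT"]
toy_unmeth = ["ACGT", "TTTT", "TTTT", "TTTT"]
print(methylation_log_ratios(toy_meth, toy_unmeth))
```

With these toy reads, the only CG-containing k-mer is ACGT, and its positive log ratio reflects its higher frequency in the methylated pool.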

New binding sites

A methylated DNA structure (PDB:5EF6) is shown. “Methyl-plus” binding may be a pioneering mechanism for transcription factors such as HOXB13, which can also bind to unmethylated DNA in cancer cells.


Using this method, Yin et al. obtained DNA methylation preferences for almost half of all human transcription factors, ∼60% of which are affected by mCG. There are nuances to analyzing the data. Relative preferences for millions of individual sequences, with and without methylation (many of them lacking CG dinucleotides), are compared simultaneously. The data should spur development of methods to describe this new dimension. Nonetheless, the classifications presented by Yin et al., which are based partly on manual data analysis, indicate that transcription factor binding is more often positively affected by methylation (“methyl-plus”) (34%) than negatively affected (“methyl-minus”) (23%). Intriguingly, methyl-plus binding is often restricted to sequences that are not bound when the DNA is unmethylated (51% of cases), indicating that DNA methylation creates new transcription factor binding sites.

Methyl-plus binding is more prevalent for some transcription factor classes than others. The most striking case is homeodomain transcription factors, which have clearly established functions in laying out the body plan and controlling the identity of dozens of cell types (3). A majority of these factors, and nearly all of the particular subclasses [including the Pit-Oct-Unc (POU), myeloid ecotropic viral integration site (MEIS), and posterior subclasses], displayed methyl-plus binding. Yin et al. observed, using co-crystal structures, that mCG binding by several homeodomain transcription factors is mediated by direct hydrophobic interactions between amino acid residues and the methylated cytosines. Identity of these residues correlates with mCG binding across homeodomain proteins; thus, mCG binding can potentially be predicted for proteins not analyzed in this study.

To confirm that mCG binding is used in vivo, Yin et al. employed two very different cell culture models. In mouse embryonic stem cells either devoid of or with increased DNA methylation, chromatin immunoprecipitation sequencing (ChIP-seq) confirmed that endogenously expressed regulators of stem cell fate—octamer-binding transcription factor 4 (Oct4) (methyl-plus) and v-myc avian myelocytomatosis viral related oncogene, neuroblastoma derived (Mycn) (methyl-minus)—are attracted and repelled by DNA methylation, respectively. In a model of prostate cancer (11), a more complex relationship is observed. Here, exogenously expressed homeobox B13 (HOXB13) (methyl-plus) binds methylated DNA sites in prostate epithelial cells. Most of these sites have lost DNA methylation in prostate cancer cells, however, while still binding HOXB13 (see the figure). This change in DNA state could reflect a new type of pioneer transcription factor activity.

The findings of Yin et al. should fuel further exploration of the role of mCG binding in regulating cell identity and development by homeodomains and other transcription factors. The study also contributes to the goal of obtaining a binding motif for every human transcription factor. Together with two recent ChIP-seq-based studies (12, 13) that focused on the large and mostly uncharacterized Cys2-His2 zinc finger class of transcription factors, Yin et al. have cut the number of human transcription factors lacking binding data roughly in half.

More generally, the study of Yin et al. contributes a unique perspective to evaluating how transcription factors bind DNA. Transcription factor motif modeling is critical for the study of global gene regulation, allowing us to predict potential binding sites in the genome. Given that so many transcription factors are affected by chemical modifications to DNA, we are now faced with a clear necessity to incorporate DNA methylation into motif models, and a new type of data from which to learn to read the genome the way transcription factors do.
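
One simple way to picture a methylation-aware motif model is a position weight matrix whose alphabet is extended with a symbol for methylated cytosine. The sketch below is only an illustration under stated assumptions: the symbol M for 5-methylcytosine, the three-position matrix, and all of its weights are invented here and are not taken from Yin et al. or from any published motif.

```python
# Extended alphabet: the four bases plus M for 5-methylcytosine.
# (M is an illustrative convention assumed for this sketch.)
ALPHABET = "ACGTM"

# Hypothetical log-odds weights for a three-position "methyl-plus" motif:
# position 2 scores methylated C (M) higher than unmethylated C.
TOY_PWM = [
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.0, "M": -1.0},
    {"A": -2.0, "C": 0.0, "G": -2.0, "T": -2.0, "M": 2.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0, "M": -1.0},
]


def score_site(pwm, site):
    """Sum the per-position weights for a site as long as the matrix."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal motif width")
    return sum(column[base] for column, base in zip(pwm, site))


def best_site(pwm, sequence):
    """Scan a sequence and return (score, start) of the top-scoring window.

    If several windows tie, the one with the largest start index is returned.
    """
    width = len(pwm)
    windows = [
        (score_site(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(windows)


# The same site scores higher when its cytosine is methylated.
print(score_site(TOY_PWM, "TMG"))     # methylated site
print(score_site(TOY_PWM, "TCG"))     # unmethylated site
print(best_site(TOY_PWM, "AATMGAA"))  # best window in a longer sequence
```

Encoding methylation as an extra symbol keeps the scoring code identical to that of an ordinary matrix. The cost is that the sequence being scanned must already be annotated with methylation calls, for example from bisulfite sequencing.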

