Review

Developmental enhancers and chromosome topology


Science  28 Sep 2018:
Vol. 361, Issue 6409, pp. 1341-1345
DOI: 10.1126/science.aau0320

Abstract

Developmental enhancers mediate on/off patterns of gene expression in specific cell types at particular stages during metazoan embryogenesis. They typically integrate multiple signals and regulatory determinants to achieve precise spatiotemporal expression. Such enhancers can map quite far—one megabase or more—from the genes they regulate. How remote enhancers relay regulatory information to their target promoters is one of the central mysteries of genome organization and function. A variety of contrasting mechanisms have been proposed over the years, including enhancer tracking, linking, looping, and mobilization to transcription factories. We argue that extreme versions of these mechanisms cannot account for the transcriptional dynamics and precision seen in living cells, tissues, and embryos. We describe emerging evidence for dynamic three-dimensional hubs that combine different elements of the classical models.

Transcriptional enhancers are short segments of DNA that activate gene expression in an orientation-independent manner in response to intrinsic and external signals. A typical human protein-coding gene contains multiple enhancers, each bound by a specific combination of sequence-specific transcription factors (TFs). These factors can activate appropriate target genes over long distances, and sometimes even across chromosomes. In this review, we discuss insights provided by emerging technologies in imaging and genomics. These methods are beginning to provide elegant visualization of enhancer-promoter communication and are transforming our understanding of how enhancers work.

Enhancers work within the context of chromatin domains

Developmental enhancers mediate localized patterns of gene expression in space and time. For example, the ZRS enhancer regulates expression of the Sonic hedgehog (Shh) gene within a specific region (the ZPA) of developing limb buds in vertebrate embryos (1). Dominant mutations in the ZRS cause familial forms of polydactyly, in which individuals have supernumerary toes or fingers. The ZRS maps approximately one megabase from the Shh transcription start site, but the basis for such long-range gene control remains unknown (1). Even in the compact Drosophila genome, there are examples of enhancers working over distances of 70 to 100 kb, including the cut wing margin enhancer and the svb epidermal enhancer (2). These examples are likely not exceptional: recent large-scale approaches suggest that 20 to 30% of Drosophila enhancers may act as distal elements, skipping intervening genes (3–5). However, the average enhancer is not so remote. It typically maps ~20 to 50 kb from its target gene in vertebrates and 4 to 10 kb in Drosophila. Even at these distances, it seems unlikely that TFs directly touch the RNA polymerase II (Pol II) machinery at core promoters without being brought into physical proximity. How is this achieved?


Data from both imaging and Hi-C (genome-wide chromosome conformation capture assays) indicate that metazoan genomes are organized into a series of topological associating domains [TADs; discussed in detail in (6, 7)]. These domains bring distant cis-regulatory elements, such as enhancers and promoters, into proximity. The boundaries of TADs are often delineated by clusters of binding motifs for insulator proteins, such as CTCF, and by the promoters of actively transcribed genes, such as tRNA genes (8–10). How TADs are formed and how they affect gene expression remain unclear. In vertebrates, loop extrusion is the currently prevailing model for TAD formation (11). In this model, one or more cohesin complexes are loaded onto chromatin, where they form tripartite rings. They then actively extrude a chromatin loop from one direction until they reach a barrier that blocks their activity. CTCF is likely the main barrier protein in vertebrates, although its binding is quite dynamic, with residence times of 1 to 2 min (12). In flies, embryos lacking CTCF develop normally (13), suggesting either that other barrier proteins exist or that TADs can be formed by other mechanisms. The yeast condensin complex can also extrude DNA loops, at the remarkable rate of ~1.5 kb per second in vitro (14). Although in vivo rates have yet to be determined, this rate suggests that even the remote ZRS enhancer could be brought into proximity with the Shh promoter in about 10 min. As discussed below, loop extrusion incorporates features of two classical models of enhancer activity. The first is looping (e.g., TAD loop domains). The second is tracking (e.g., convergent movements of cohesin complexes loaded at flanking insulators).
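The ~10 min estimate follows from dividing distance by rate. This is a minimal sketch under the following assumptions:

- a single extruder moving at a constant rate equal to the in vitro condensin value (~1.5 kb/s);
- a ZRS-Shh separation of exactly 1 Mb.

Loading position, one- versus two-sided extrusion, pausing and in vivo compaction are not modeled. The function name is our own.

```python
def extrusion_time_minutes(distance_bp: float, rate_bp_per_s: float) -> float:
    """Time for one extruder at a constant rate to reel in distance_bp of DNA."""
    if rate_bp_per_s <= 0:
        raise ValueError("rate must be positive")
    return distance_bp / rate_bp_per_s / 60.0


ZRS_TO_SHH_BP = 1_000_000    # ~1 Mb, rounded from the text
CONDENSIN_RATE_BP_S = 1_500  # ~1.5 kb/s, yeast condensin in vitro

t = extrusion_time_minutes(ZRS_TO_SHH_BP, CONDENSIN_RATE_BP_S)
print(f"{t:.1f} min")
```

The result, roughly 11 min, is an upper-level order-of-magnitude figure only. Any real in vivo rate could shift it substantially in either direction.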

TADs impose regulatory constraints on developmental enhancers

How TADs and TAD boundaries impinge on enhancer function remains an open question. There is considerable evidence that TAD boundaries act as insulators to preclude inappropriate enhancer-promoter interactions. An enhancer located in one TAD preferentially interacts with “local” promoters rather than those located in neighboring TADs (15). Compelling evidence comes from genetic studies in mice that removed TAD boundaries (16) or CTCF binding sites within boundaries (17, 18), or created chromosomal inversions or duplications that fuse adjacent TADs (19). These manipulations can cause developmental disorders as a result of inappropriate enhancer-promoter interactions (16, 19). Such a scenario is seen for a rare genetic disorder affecting human limb development (16). A large TAD containing the Epha4 gene is flanked by two smaller TADs containing Wnt6 and Ihh genes on one side and Pax3 on the other. Chromosomal rearrangements that disrupt the TAD boundaries result in new interactions of Wnt6, Ihh, and Pax3 with enhancers located in the Epha4 TAD (16).

Depletion of CTCF (20) or cohesin (21, 22) leads to a dramatic diminishment of TAD structures in vertebrates; however, this has only modest effects on gene expression. Hundreds of genes, not thousands, are affected. Fewer than half of these exhibit elevated expression, which would suggest spurious gains in enhancer-promoter interactions in the absence of a boundary (21). It therefore appears that TADs constrain the action of just a subset of enhancers. We note, however, that many genes may still display less precise spatial or temporal expression in the absence of TADs when measured with quantitative single-cell methods.

Promoter specificity, distance, and competition influence enhancer interactions

As discussed above, diminishment of TAD organization in trans, via depletion of CTCF or cohesin, leads to relatively mild changes in gene expression. Yet the removal of individual TAD boundaries in cis can cause dramatic phenotypes. What is the basis for this apparent discrepancy? The dramatic phenotypes might arise from promiscuous enhancers located near TAD boundaries (Fig. 1A). Once a boundary is removed, such an enhancer is likely to activate neighboring genes. Indeed, genetic studies indicate that some enhancers can interact with any promoter in their vicinity (15). Others have an inherent preference for particular promoter types, e.g., TATA or DPE (23) (Fig. 1B). For example, a shared enhancer in the mouse Mrf4-Myf5 locus differentially regulates the linked Mrf4 and Myf5 promoters in intercalated myotome and fetal muscle fibers, respectively (24). This enhancer mediates both patterns when linked to a heterologous promoter, suggesting that promoter specificity operates within the endogenous locus. Thus, it appears that some enhancers have the capacity to interact with many sequences throughout an entire TAD but productively engage just a subset of available promoters. How is this specificity achieved?

Fig. 1 Properties of enhancers influence the impact of TAD boundaries.

After boundary deletion (dashed lines), enhancers (E) differ in their ability to regulate promoters in neighboring TADs. This depends on their (A) promiscuity, (B) promoter specificity, and (C) distance to other promoters.

One potential mechanism is promoter competition, whereby shared enhancers are sequestered by the closer or stronger promoter when presented with multiple choices (Fig. 1C). Competition has been documented in a variety of developmental processes. A notable recent example concerns the proto-oncogene c-Myc (25). c-Myc overexpression is a major cause of many human cancers. Expression is normally attenuated by PVT1, a long noncoding RNA located just downstream. The PVT1 promoter is located between the c-Myc promoter and a number of remote 3′ enhancers that regulate expression in different tissues, including lymphocytes. The proximal PVT1 promoter sequesters these shared enhancers to attenuate c-Myc expression.

Proximal promoters generally have a competitive advantage over more distal promoters, but this can be negated by insulators and TAD boundaries. The higher-order topologies present in TADs increase the proximity, and therefore probability, of long-range enhancer-promoter interactions. For example, both the ZRS enhancer and distant Shh promoter are located within a common TAD, which reduces the effective distance separating them. When the TAD structure is altered by a genetic inversion, ZRS-Shh interactions are diminished (26). However, Shh expression was restored by reducing the genomic distance separating the enhancer and promoter within the inversion chromosome. Thus, promoter specificity, distance, and competition all influence the functional consequences of disrupting TAD boundaries (Fig. 1).
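The three properties in Fig. 1 (promiscuity, specificity and distance-driven competition) can be combined in a toy model. This is an illustrative sketch, not a published model. We assume, hypothetically, that:

- contact probability decays as a power law of genomic distance (exponent 1);
- each TAD boundary lying between enhancer and promoter multiplies contact by an insulation factor of 0.1;
- a compatibility weight encodes promoter specificity;
- competing promoters divide the enhancer's output in proportion to their weights.

All names, coordinates and parameters are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Promoter:
    name: str
    position: int         # genomic coordinate in bp (hypothetical)
    compatibility: float  # hypothetical specificity weight; 0 means incompatible


def enhancer_shares(enh_pos, promoters, boundaries, insulation=0.1, exponent=1.0):
    """Divide one enhancer's output among competing promoters (toy model)."""
    weights = {}
    for p in promoters:
        distance = max(abs(p.position - enh_pos), 1)
        contact = distance ** -exponent              # closer promoters win (Fig. 1C)
        lo, hi = sorted((enh_pos, p.position))
        crossings = sum(1 for b in boundaries if lo < b < hi)
        contact *= insulation ** crossings           # boundaries insulate
        weights[p.name] = contact * p.compatibility  # specificity (Fig. 1B)
    total = sum(weights.values())
    return {name: (w / total if total else 0.0) for name, w in weights.items()}


promoters = [
    Promoter("target", 50_000, 1.0),     # same TAD, compatible
    Promoter("neighbor", 130_000, 1.0),  # adjacent TAD, compatible, closer
    Promoter("mismatch", 60_000, 0.0),   # same TAD, incompatible core promoter
]
with_boundary = enhancer_shares(100_000, promoters, boundaries=[120_000])
without_boundary = enhancer_shares(100_000, promoters, boundaries=[])
print(with_boundary)
print(without_boundary)
```

With these invented numbers, the intact boundary keeps about 86% of the enhancer's output on its own-TAD target. Deleting the boundary hands 62.5% to the closer promoter next door, while the incompatible promoter receives nothing in either case.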

Models of enhancer-promoter communication

Even with the contraction in space afforded by TADs, the distances between enhancers and target promoters can be large, as seen for ZRS-Shh interactions. How can enhancers convey regulatory information across such distances? Various models for enhancer-promoter communication have been proposed over the years (Fig. 2), including tracking (or scanning), linking (or chaining), looping, and mobilization to transcription factories. Although the field has largely converged on looping, it is interesting to consider other models in light of genome topologies such as TADs.

Fig. 2 Models of enhancer-promoter communication.

(A) Pol II binds to an enhancer and tracks along chromatin (synthesizing RNA), pulling the enhancer with it. (B) TFs bound to a regulatory element oligomerize, chaining to the promoter. (C) Looping (in bacteria, lambda) requires protein-protein interactions between factors on the same face of the helix. (D) Long-range loops can bring enhancers close to a promoter, but not in direct proximity. Tracking or linking could bridge the distance.

The tracking model proposed that Pol II bound to upstream regulatory elements could move along DNA, pulling the enhancer with it until coming into contact with a proximal promoter. The motor force of Pol II elongation was seen as the key mediator of this enhancer mobilization (Fig. 2A). Proposed evidence for tracking came from inserting “road blocks” (insulators) between enhancers and promoters. Long noncoding RNAs (lncRNAs) emanating from enhancers directed toward their target promoters were suggested to be a manifestation of this process within the β-globin locus (27). It is conceivable that loop extrusion—cohesin-driven spooling of DNA—could foster tracking of distant enhancers.

The linking model is an extension of observations seen for the lambda repressor, whereby protein-protein oligomers bridge distal elements and target promoters (Fig. 2B). One proposed linking factor in metazoans is the Drosophila Chip protein (28), which was proposed to oligomerize from the enhancer to the promoter. However, more recent studies of the vertebrate ortholog, Ldb1, suggest that it might instead form targeted loops through homodimerization when bound at enhancers and promoters (29).

The looping model proposes that factors bound to two distinct sites physically interact with each other, looping out the intervening DNA (Fig. 2C). As TF binding is dynamic, so too is loop formation. In this mechanism, the intervening DNA is passive during loop formation. In tracking or linking models, by contrast, it could play an active role. The first evidence for looping came from Escherichia coli over 30 years ago (30), where optimal interactions required proteins to be located on the same side of the helix (Fig. 2C). Looping and tracking share a reliance on an adenosine 5′-triphosphate (ATP)–driven motor that can move along DNA. For loops and TADs, this motor is the cohesin and/or condensin complex; for tracking, it is Pol II elongation.
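The same-face requirement can be expressed as arithmetic on helical phase. This is a minimal sketch, assuming canonical B-DNA with ~10.5 bp per helical turn. The ±60° tolerance for "same face" is a hypothetical parameter chosen for illustration, not a measured value.

```python
BP_PER_TURN = 10.5  # approximate helical repeat of B-DNA


def phase_offset_degrees(spacing_bp: float, bp_per_turn: float = BP_PER_TURN) -> float:
    """Angular offset between two sites spacing_bp apart, folded into [0, 180]."""
    angle = (spacing_bp / bp_per_turn % 1.0) * 360.0
    return min(angle, 360.0 - angle)


def same_face(spacing_bp: float, tolerance_deg: float = 60.0) -> bool:
    """True if two sites lie on roughly the same face of the helix."""
    return phase_offset_degrees(spacing_bp) <= tolerance_deg


for spacing in (21, 26, 63, 68):
    print(spacing, round(phase_offset_degrees(spacing), 1), same_face(spacing))
```

Spacings of whole numbers of turns (21 or 63 bp) place the sites in phase. Adding about half a turn (26 or 68 bp) puts them on opposite faces, which is the kind of phasing effect the bacterial looping experiments detected.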

Mechanisms of loop formation

In vertebrates, a typical TAD contains several transcription units and dozens of enhancers. How do the right enhancers interact with the right promoters within the context of TADs? Sequence-specific binding of CTCF and the recruitment of cohesin can contribute to targeted enhancer-promoter interactions. The first evidence for this came from a genetic screen in Drosophila for mutants that perturb long-range activation of cut and Ubx (31). This screen identified Nipped-B (NIPBL in vertebrates), a protein required for cohesin loading. CTCF-cohesin complexes have since been implicated in the formation of long-range loops in many contexts, acting positively or negatively to regulate enhancer-promoter interactions (32, 33) (Fig. 2C). However, there are also many examples where enhancers skip over CTCF-bound regions to regulate specific target genes (8). Thus, CTCF can function as a barrier or insulator, forming loops that block inappropriate enhancer-promoter interactions, but it does not always do so.

One explanation for this apparent specificity is the orientation dependence of CTCF binding sites: the formation of loops is facilitated by a convergent arrangement of such sites (10). The first hints of this directionality came from two CTCF-dependent elements: the 5′ boundary element in the chicken β-globin locus and the ICR in the mouse H19-Igf2 locus. Both function in an orientation-dependent manner (34). Inverting the orientation of specific CTCF sites can alter enhancer-promoter loops, causing changes in gene expression (18, 35). This orientation-dependent requirement for CTCF interactions is reminiscent of the requirement for helical phasing in bacterial, bacteriophage, and SV40 elements. In principle, CTCF orientation can help ensure that the correct enhancers loop to the right promoters, but such a mechanism cannot account for all aspects of enhancer specificity within TADs.
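The convergent-orientation rule can be illustrated with a one-dimensional toy model of loop extrusion. This is a deliberately simplified sketch under hypothetical assumptions:

- a single cohesin is loaded at one position and extrudes in both directions, one site per step;
- its left arm stops only at a forward-oriented CTCF site (">");
- its right arm stops only at a reverse-oriented site ("<").

This yields loops anchored at convergent pairs. Real extrusion is stochastic, and cohesin can unload or bypass sites. Site positions and the function name are invented for illustration.

```python
def extrude(ctcf: dict[int, str], load: int, length: int) -> tuple[int, int]:
    """Return the (left, right) anchors where a two-sided extruder stalls.

    ctcf maps position -> '>' (forward) or '<' (reverse). The left arm is
    blocked by '>' sites and the right arm by '<' sites, so loops are anchored
    at convergent pairs (> ... <). Chromosome ends also stop an arm.
    """
    left, right = load, load
    left_stuck = ctcf.get(left) == ">"
    right_stuck = ctcf.get(right) == "<"
    while not (left_stuck and right_stuck):
        if not left_stuck:
            if left == 0:
                left_stuck = True
            else:
                left -= 1
                left_stuck = ctcf.get(left) == ">"
        if not right_stuck:
            if right == length - 1:
                right_stuck = True
            else:
                right += 1
                right_stuck = ctcf.get(right) == "<"
    return left, right


wild_type = {10: ">", 40: "<", 60: "<"}
inverted = {10: ">", 40: ">", 60: "<"}  # the site at position 40 is flipped
print(extrude(wild_type, load=25, length=100))
print(extrude(inverted, load=25, length=100))
```

Flipping one site lets the right arm run past it to the next reverse-oriented site, enlarging the loop. This mirrors, in cartoon form, how CTCF site inversions can rewire enhancer-promoter loops.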

CTCF and cohesin are expressed in most or all tissues and are generally involved in constitutive looping interactions (9, 32). However, cohesin can partner with other factors to mediate tissue-specific enhancer-promoter proximity. Sox2 expression in embryonic stem (ES) cells, for example, depends on tissue-specific interactions between its promoter and a distal enhancer. These interactions are mediated by cohesin-Mediator complexes (9). Moreover, the relatively mild changes in gene expression resulting from CTCF and cohesin depletion suggest that additional proteins also mediate dynamic enhancer-promoter loops within TADs. YY1, which is broadly if not ubiquitously expressed, is one example (36). Tissue-specific enhancer-promoter loops generally depend on tissue-specific TFs, such as GATA1 and Klf1 in erythrocytes (37, 38). At the β-globin locus, GATA1 mediates promoter loops independently of cohesin by interacting with Ldb1 (38).

Given the density of enhancers, insulators, and TF binding sites throughout the genome, it is difficult to envisage how tracking or linking models could produce specific activation of the appropriate target gene. Enhancers often skip genes to interact with preferred target promoters, and many enhancers (~50% in Drosophila) are located in introns or 3′ regulatory regions. In such cases, enhancer tracking or oligomerization of linking proteins would likely occlude transcription of the associated coding sequences. Strict versions of the tracking model also fail to explain transvection (gene activation across paired chromosomes). In principle, tracking or linking mechanisms could work over short distances, particularly in the context of TADs and specific loops within TADs (Fig. 2D). This would bring enhancers into close proximity with other enhancers and promoters, generating a three-dimensional “hub” for the propagation of short-range signals. In such a scenario, the enhancer need not physically touch the promoter; rather, physical proximity may be sufficient for the regulation of promoter activity (Fig. 2D, see below).

Are loops sufficient for gene activation?

The emerging picture from both chromatin-capture and imaging techniques is that multiple enhancers and promoters are organized in complex nonbinary topologies (4, 8, 10). This view clashes with the simplest models of TF-directed looping of a specific enhancer to a specific promoter. Here, we consider different types of enhancer-promoter topologies and their roles in transcription activation.

Enhancer-promoter proximity is globally correlated with gene activity; for example, active enhancers are generally found near active promoters (and other active enhancers) (4, 8, 10, 39). However, proximity appears to be sufficient for activating some genes but not others (Fig. 3A). The locus control region (LCR), for example, forms contacts with the globin promoters in erythroid cells, where the genes are active, but not in the brain, where they are inactive (40). Moreover, forcing a loop between the LCR enhancer and β-globin promoter is sufficient to activate gene expression (29), indicating that proximity acts as a trigger. By contrast, enhancers at other loci are already in proximity with their target promoter prior to gene expression (4, 41–43). For example, the T helper type 2 locus contains three genes (interleukin-3, -4, and -5) that are coordinately expressed in a subset of T lymphocytes. The T helper type 2 LCR comes into proximity with the IL-3, -4, and -5 promoters during the specification of different T cell lineages, before the genes are activated (41). Activation occurs only at later stages, yet the LCR-promoter topology remains unchanged. Similarly, a study of 100 loci during Drosophila development indicates that many embryonic enhancers are in preformed topologies with their target promoters prior to gene activation (4). Most of these genes contain promoter-proximal paused Pol II (4), suggesting that they are primed for rapid induction.

Fig. 3 Two types of topologies at complex loci.

(A) Left: Enhancer (E)–promoter proximity at the time of gene expression. Right: Preformed (compacted) topologies prior to gene expression. (B) Insulator-insulator pairing brings the transgenic eve promoter and endogenous enhancers (E) into proximity (from ~700 to ~400 nm) without lacZ reporter transcription. (C) Further compaction (~335 nm) occurs during reporter transcription.

What triggers activation? In some cases, this may be due to subtle, dynamic movements of an enhancer or promoter within a preformed topology (4, 42). At other loci, the recruitment of a TF to a prelooped enhancer may trigger activation, as seen at the T helper type 2 locus in mammals (41). These preformed loops, or “hubs,” might be assembled by TFs present at earlier developmental stages preceding activation. GATA3 and STAT6 are thought to perform this role in the T helper type 2 locus.

We suggest that gene activation is a two-step process at many loci. First, enhancers and promoters come into proximity (local compaction) to prime expression. Then subtle topological changes trigger activation (Fig. 3). Recent live imaging supports this view. The Drosophila eve locus contains a series of 5′ and 3′ enhancers, located close to the promoter, that mediate expression in segmentation stripes along the embryo. In one experiment, a lacZ reporter gene containing MS2 RNA stem loops was linked to the eve promoter and the Homie insulator. This construct was positioned ~140 kb from the endogenous eve locus (44). In the absence of looping (i.e., without insulator-insulator pairing), the lacZ reporter gene is located ~700 nm from the endogenous locus. This distance is reduced to less than 400 nm upon insulator pairing (Fig. 3B). However, this proximity is not sufficient for activation in the majority of cells. Active foci of transcription exhibit an even tighter association, ~330 nm rather than ~400 nm (Fig. 3C). It therefore appears that looping is necessary but not sufficient for transcriptional activation of the reporter gene. Similar results were obtained in a transvection assay across homologous chromosomes, in which only half of the paired alleles display expression (45).
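The two-step view (pairing primes, a further change triggers) can be sketched as a three-state chain: unpaired ⇌ paired ⇌ active. Because the active state can only be reached through the paired state, pairing is necessary by construction. Whether it is sufficient depends on the second step. The sketch below is illustrative only. The rate constants are hypothetical numbers chosen to show how a locus can be paired in most cells yet active in a minority; they are not fitted to the eve data.

```python
def steady_state(k_pair, k_unpair, k_activate, k_deactivate):
    """Steady-state occupancies of the linear chain unpaired <-> paired <-> active.

    A linear chain satisfies detailed balance at steady state, so each state's
    weight is the previous state's weight times the forward/backward rate ratio.
    """
    w_unpaired = 1.0
    w_paired = w_unpaired * k_pair / k_unpair
    w_active = w_paired * k_activate / k_deactivate
    total = w_unpaired + w_paired + w_active
    return {
        "unpaired": w_unpaired / total,
        "paired": w_paired / total,
        "active": w_active / total,
    }


# Hypothetical rates (per minute): pairing is favored, activation is not.
occ = steady_state(k_pair=1.0, k_unpair=0.25, k_activate=0.1, k_deactivate=0.2)
print({state: round(p, 3) for state, p in occ.items()})
```

With these made-up rates, about 86% of loci are looped (paired or active), but only about 29% are transcribing.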

These observations are reminiscent of the Shh locus, where the long-range ZRS enhancer and Shh promoter are contained within a common looped topology throughout the limb bud. Super-resolution microscopy indicates that the enhancer and promoter are in even closer proximity in the posterior limb, where Shh is active. This closer proximity is not seen in anterior regions, where Shh is inactive (46). In principle, preformed topologies should reduce the search time and space of enhancer-promoter interactions. Spatial confinement, for example, is the major parameter that determines interaction frequency between distal V(D)J regions comprising immunoglobulin genes in B lymphocytes (47).
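One way to see why confinement shortens the search comes from a standard diffusion-limited estimate. For a small target of radius a, found by diffusion with coefficient D inside a volume V, the mean first-encounter time is approximately τ ≈ V / (4πDa).

This sketch relies on several assumptions:

- The approximation assumes free diffusion and a target much smaller than the confining volume. Chromatin motion is subdiffusive, so the absolute times are only indicative.
- The values of D and a are hypothetical.
- Treating the ~700 nm and ~400 nm separations reported for the eve reporter as confinement radii is our own assumption.

Because D and a cancel in the ratio, the speedup depends only on the change in volume.

```python
import math


def sphere_volume(radius_um: float) -> float:
    """Volume of a sphere in cubic micrometers."""
    return 4.0 / 3.0 * math.pi * radius_um ** 3


def mean_search_time(volume_um3: float, diffusion_um2_s: float, target_radius_um: float) -> float:
    """Approximate mean first-encounter time tau = V / (4*pi*D*a), in seconds."""
    return volume_um3 / (4 * math.pi * diffusion_um2_s * target_radius_um)


D = 0.01  # hypothetical locus diffusion coefficient, um^2/s
a = 0.05  # hypothetical contact radius, um (50 nm)

tau_loose = mean_search_time(sphere_volume(0.70), D, a)  # ~700 nm confinement
tau_tight = mean_search_time(sphere_volume(0.40), D, a)  # ~400 nm confinement
speedup = tau_loose / tau_tight
print(f"{tau_loose:.0f} s vs {tau_tight:.0f} s, speedup {speedup:.2f}x")
```

Shrinking the confinement radius from ~700 to ~400 nm cuts the volume, and hence the expected search time, by roughly fivefold under these assumptions.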

Transcription hubs

In addition to tracking, linking, and looping, “transcription factories” was an influential model for enhancer activity (48). According to this view, TFs bound to distal enhancers mobilize the associated gene to discrete foci that reside in fixed locations within the nucleus. Most of the evidence for factories was obtained by using fixed tissues and cultured cells, including embryonic blood cells. These assays revealed discrete nuclear foci containing the phosphorylated form of Pol II, together with many coexpressed genes located in different chromosomal positions and even on different chromosomes (48, 49). Far fewer Pol II foci were detected [40 to 200 per cell (48)] than there are actively transcribed genes per nucleus. The factory model therefore proposed that multiple coexpressed genes move in and out of preassembled factories. With advances in live imaging, we now know that the system is much more dynamic. For example, super-resolution live imaging revealed highly dynamic and transient clusters of Pol II (50). These clusters do not reside in fixed locations within the nucleus. Instead, they form de novo upon transcriptional stimulation and persist for short periods, on the order of a minute.

A dynamic variant of the transcription factory model (hubs) is gaining momentum as it incorporates features of all classical models of enhancer-promoter interactions, explains many observations reported for transcription factories, and accounts for more contemporary observations such as transcriptional bursting. According to this model, prelooped topologies serve as hubs or traps for the accumulation of Pol II and other complexes required for gene expression (Fig. 4). Liquid-liquid phase transitions were proposed to facilitate this process (51) because many TFs, coactivators, and components of the basal transcription machinery contain intrinsically disordered domains that can foster such interactions. Studies of the assembly of germline determinants (P-granules) in Caenorhabditis elegans indicate that different RNA and protein subunits associate through such phase transitions (52). Live-imaging assays permitted direct visualization of coalescing P-granule “droplets” in early C. elegans embryos.

Fig. 4 Hub and condensates model.

Preformed topologies increase local TF, coactivator, and Pol II concentration (hubs or microenvironments), where different enhancers (E) dynamically share common resources.

According to this “hub and condensate” model for enhancer-promoter communication, enhancers need not directly touch their target promoters but merely come into proximity, within 100 to 300 nm (45). The coalescence or aggregation of multiple Mediator complexes, preinitiation complexes, and Pol II could serve to bridge enhancers to their target promoters over such distances (Fig. 4). This model remains somewhat speculative, although recent studies (53–55) provide direct visualizations of dynamic Pol II, TF, and Mediator condensates at sites of active transcription. Moreover, active enhancers have a higher diffusion rate than inactive enhancers (56). This could be interpreted as less-restricted movement of TFs within liquid condensates.
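To see why gathering molecules into a hub matters, it helps to convert molecule counts into concentrations. This is a minimal sketch. The count of 50 Pol II molecules and the 300 nm hub diameter are hypothetical round numbers chosen for illustration, not measurements from the cited studies. The diameter sits at the upper end of the 100 to 300 nm range discussed above.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def concentration_uM(molecules: int, diameter_nm: float) -> float:
    """Micromolar concentration of `molecules` confined to a sphere of the given diameter."""
    radius_dm = diameter_nm / 2 * 1e-8                # 1 nm = 1e-8 dm
    volume_l = 4.0 / 3.0 * math.pi * radius_dm ** 3   # 1 dm^3 = 1 L
    return molecules / AVOGADRO / volume_l * 1e6


hub = concentration_uM(50, 300)
print(f"{hub:.1f} uM")
```

Under these assumptions, a few dozen molecules in a 300 nm hub reach a local concentration of several micromolar. Packing the same molecules into a 100 nm hub would raise it 27-fold, since volume scales with the cube of the diameter.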

Conclusion

Single-cell genomics, genome editing, single-molecule live imaging, and super-resolution methods are animating classical “snapshots” of enhancer-promoter communication. The first “movies” suggest that each of the classical mechanisms—tracking, linking, looping, factories—could contribute to the overall process; however, none is sufficient. A holistic understanding will require the simultaneous visualization of enhancers, promoters, nascent transcripts, TFs, Pol II, and associated cofactors at specific loci within higher-order chromosomal topologies. In addition, complex genetic loci contain multiple enhancers and often use alternative promoters. Limitations in detection methods and resolution currently restrict the visualization of all the moving parts. However, the remarkable advances that we have witnessed in recent years give great promise for a new synthesis in our understanding of complex developmental and disease processes.

References and Notes

Acknowledgments: We are grateful to members of the Furlong and Levine labs, and to J. Jaynes, for comments and discussions. We apologize to colleagues whose work we could not discuss. Funding: We thank the European Union’s Horizon 2020 research and innovation programme for financial support [grant agreement 664918 (MRG-GRammar) to E.E.M.F., NIH GM118147 to M.L.]. Competing interests: The authors declare no competing interests.