Special Viewpoints

Gene Order and Dynamic Domains


Science  22 Oct 2004:
Vol. 306, Issue 5696, pp. 644-647
DOI: 10.1126/science.1103864


When considering the daunting complexity of eukaryotic genomes, some comfort can be found in the fact that the human genome may contain only 30,000 to 40,000 genes. Moreover, growing evidence suggests that genomes may be organized in such a way as to take advantage of space. A gene's location in the linear DNA sequence and its position in the three-dimensional nucleus can both be important in its regulation. Contrary to prevailing notions in this postgenomic era, the bacteriophage λ, a paragon of simplicity, may still have a few things to teach us with respect to these facets of nonrandom genomes.

Nearly 40 years have passed since Jacob and Monod received the Nobel Prize for their contributions to elucidating the transcriptional regulation of the bacteriophage λ. The paradigm of gene regulation offered by λ stresses the importance of gene proximity (in the form of operons), gene order, and competitive DNA binding by regulatory factors, largely dictated by their effective concentrations (1). These strategies used by the phage during the switch between lysogeny and lysis may seem to have little bearing in this postgenomic age. Yet the basic principles of phage λ regulation appear to be taking shape at the level of the coordinated regulation of transcriptomes—gene networks dedicated to eukaryotic cell fates such as death, division, and differentiation. Emerging evidence indicates that eukaryotic genes may be organized in the form of genomic domains that allow for their coexpression through mechanisms both similar to and different from λ.

Tandem Gene Arrays

Size has always seemed to matter for molecular biologists attempting to understand the coordinated regulation of eukaryotic genes. Given the vast difference in relative length, eukaryotic genomes have long been thought to operate under a set of principles different from those of lower organisms, such as phage λ. However, analysis of the growing list of sequenced eukaryotic genomes is beginning to yield commonalities in gene regulation regardless of genome size (2). Even before the availability of whole genome sequences, experimental evidence suggested that the first principle of λ regulation, gene proximity, plays an important role in the regulation of tandem gene arrays, colinear series of individual genes often formed through duplication events. Tandem gene arrays are common features of eukaryotic genomes, and they are both ubiquitously expressed [e.g., small nuclear RNA, histones, ribosomal DNA (rDNA), major histocompatibility complex (MHC), etc.] and tissue specific [e.g., Hox, immunoglobulin (Ig), T cell receptor (TCR), β-globin, cadherin, olfactory, etc.]. The size of tandem gene arrays varies: the mouse Ig heavy chain (IgH) locus spans ∼3 Mb, whereas β-globin spans ∼120 kb (λ itself is 48 kb). Nevertheless, the prevalence of these loci suggests that proximity in the linear gene order is important in coordinate gene regulation. The actual role that linear proximity plays in transcriptional regulation is still being established. The rDNA arrays may represent one potential mechanism in that a shared linear position facilitates the nucleolar localization of the relevant ribosomal genes (3). Additionally, the Ig and β-globin loci, as shown by fluorescence in situ hybridization (FISH), are specifically positioned in the nucleus during cellular differentiation, and their nuclear localization has been correlated with their state of activity (4, 5). The clustering of genes in arrays may therefore allow the simultaneous localization of genes to compartments of the nucleus important in their regulation.

Tandem gene arrays also demonstrate the second principle of λ regulation: the activation of genes according to their linear order in the genome. With phage λ, the position of the cI gene (the repressor) to the left of the operator and of the cro gene (Cro) to the right are requisite features of the genetic switch that determines the choice between lysogenic and lytic cell fates (1). Similarly, the β-globin, Hox, and Ig gene arrays demonstrate a sequential and developmentally regulated activation of genes according to their order. Not unlike phage λ, the sequential activation of these tandem gene arrays may be due to the proximity of the “early” genes to the dominant regulatory element of a given locus. The β-globin locus undergoes sequential activation of globin genes during embryogenesis, with those genes closest to the locus control region (LCR) being upregulated first (6). With the Ig heavy chain locus (IgH), the genes first to undergo germline transcription are those closest to the intronic and 3′ enhancers (7). For the Ig and TCR gene arrays, this type of consecutive activation is correlated with the sequential events of V-D-J recombination. The vertebrate Hox gene clusters, however, uniquely illustrate this phenomenon with linear gene order. In addition to the chronological activation of vertebrate Hox genes, the proteins themselves demonstrate a corresponding activity along the anteroposterior axis (8). These genes are therefore uniquely linked in their order, activation, and function.

The third property of λ regulation is the competitive binding of operator sequences by either the repressor or Cro (1). The cooperative binding of the dimerized repressor protein ensures transcription of the cI gene and continuation of the lysogenic cell fate. In the event of external influences (such as irradiation), the effective concentration of the repressor is diminished, allowing for the up-regulation of cro and the subsequent activation of λ early genes. Although they have not been elucidated in such regulatory detail, tandem gene arrays also demonstrate sensitivity to both protein availability and concentration, and in some cases are involved in an autoregulatory feedback mechanism like phage λ. For example, the Hox gene clusters encode homeodomain transcription factors critical in decisions of cell fate during metazoan development, and in vertebrates some of these same proteins also contribute to the regulation of the clusters themselves (8).

With the β-globin locus, an exchange in heterodimerization partners of a small Maf protein, MafK, leads to the switch from repression of the locus in committed but undifferentiated erythroid cells to activation of transcription in differentiated cells (9). Before differentiation, a heterodimer composed of MafK and the repressor Bach1 recruits transcriptional corepressor complexes to the locus, resulting in repression of globin gene expression. Upon induction of erythroid differentiation, an exchange of MafK-binding partners occurs: Bach1 is replaced by the transcriptional activator p45. This, in turn, leads to displacement of corepressor complexes from the locus and recruitment of coactivators, resulting in globin gene expression. The mechanism behind this exchange involves the relocation of MafK in the nucleus (10). Before the induction of differentiation, MafK colocalizes with centromeric heterochromatin, whereas p45 is restricted to euchromatic nuclear compartments. Terminal differentiation is accompanied by the relocation of MafK (and the β-globin locus) to euchromatic regions and formation of the MafK/p45 heterodimer (10). Differentiation is also associated with a relative increase in p45 and a decrease in Bach1 (11). Thus, similar to phage λ, the availability and concentration of transcription factors play a key role in effecting a switch between repressed and activated states.
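
The dependence of such a switch on relative partner concentrations can be illustrated with a toy equilibrium calculation. The sketch below is not a model taken from the cited studies: it assumes simple competitive binding of two partners to MafK, with both partners in excess, and the function name, the dissociation constants, and the concentration values are hypothetical choices made only for illustration.

```python
def partner_fractions(p45, bach1, kd_p45=1.0, kd_bach1=1.0):
    """Equilibrium fractions of MafK that are free, p45-bound, or Bach1-bound.

    Assumes textbook competitive binding with both partners in excess of MafK.
    Concentrations and dissociation constants share arbitrary units; the
    default constants are placeholders, not measured values.
    """
    a = p45 / kd_p45        # binding weight of the activator partner
    b = bach1 / kd_bach1    # binding weight of the repressor partner
    z = 1.0 + a + b         # sum over the three possible MafK states
    return {"free": 1.0 / z, "p45": a / z, "bach1": b / z}


# Hypothetical "before" and "after" differentiation states: Bach1 in excess
# first, then p45 in excess, mirroring the direction of change described above.
before = partner_fractions(p45=1.0, bach1=9.0)
after = partner_fractions(p45=9.0, bach1=1.0)
print(before)
print(after)
```

Under these assumptions, shifting the ratio of the two partners is enough to move most of the MafK pool from the repressive to the activating heterodimer, without any change in MafK itself. The sketch ignores the nuclear relocation of MafK described above, which would additionally alter which partner MafK can encounter.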

Gene Clusters

The availability of genomic sequences has now allowed researchers to determine whether, beyond tandem gene arrays, eukaryotes possess clusters of coexpressed genes that are related in cellular function or fate. That is, do genomes show a level of organization in which genes that are part of a transcriptome are also proximal in the linear genome? Given the examples of tandem gene arrays, it is perhaps not surprising that all eukaryotic genomes examined to date display a tendency for the organization of coexpressed genes, often related in function, to be linearly clustered (2). Therefore, as opposed to displaying a homogeneous distribution of genes along chromosomes, genomes appear to be specifically organized for gene regulation.

The nonrandom nature of gene distribution along chromosomes is pronounced in vertebrate genomes. By integrating the human genomic sequence with SAGE (serial analysis of gene expression) data for genome-wide mRNA expression patterns from 12 tissue types, a human transcriptome map has been determined (12). The linear distribution of expression along chromosomes depicted in this map reveals that the human genome is organized into regions of high and low levels of gene activity. The highly active regions, RIDGEs (regions of increased gene expression), are separated by large regions of low activity, anti-RIDGEs.
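
One way to picture how regions of high and low activity emerge from such a map is to smooth per-gene expression values along the chromosome and then group adjacent genes by whether the smoothed value exceeds a cutoff. The sketch below illustrates that general idea only; it is not the procedure used in (12), and the function names, the window size, the threshold, and the expression values are all hypothetical.

```python
from statistics import median


def moving_median(expression, window=5):
    """Median expression in a centered window of adjacent genes.

    `expression` lists one value per gene in chromosomal order; windows are
    truncated at the chromosome ends.
    """
    half = window // 2
    smoothed = []
    for i in range(len(expression)):
        lo = max(0, i - half)
        hi = min(len(expression), i + half + 1)
        smoothed.append(median(expression[lo:hi]))
    return smoothed


def call_domains(expression, window=5, threshold=None):
    """Split a chromosome into runs of 'high' and 'low' smoothed expression.

    Returns (first_gene_index, last_gene_index, label) tuples. If no threshold
    is given, the chromosome-wide median is used (an arbitrary choice).
    """
    if not expression:
        return []
    if threshold is None:
        threshold = median(expression)
    labels = ["high" if s > threshold else "low"
              for s in moving_median(expression, window)]
    domains = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of the list or when the label flips.
        if i == len(labels) or labels[i] != labels[start]:
            domains.append((start, i - 1, labels[start]))
            start = i
    return domains


# Made-up expression values for 15 genes in chromosomal order.
toy = [1, 1, 1, 1, 1, 50, 60, 55, 70, 65, 1, 1, 1, 1, 1]
print(call_domains(toy, window=3, threshold=10))
```

In this toy input the five highly expressed genes in the middle are reported as a single "high" domain flanked by two "low" domains, which is the qualitative pattern that the RIDGE and anti-RIDGE description refers to.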

RIDGEs and anti-RIDGEs coincide with gene-dense and gene-poor chromosomal domains, respectively. Therefore, gene activity is inherently compartmentalized along the chromosome, which may indicate a higher order genomic structure. A more recent analysis of SAGE data suggests that RIDGEs form because they may be composed of ubiquitously and highly expressed housekeeping genes (13). Nonetheless, it is interesting to consider these highly expressed regions in light of the synteny between the genomes of human and mouse. The syntenic blocks of DNA, which constitute ∼90% of both genomes, vary in length, but half are ∼20 Mb, similar to the size of RIDGEs (14). It may be that evolution deals with a currency much larger than a single gene; evolutionary forces may impinge on genomic domains because of the importance of proximity in gene regulation.

Beyond the organization at the level of RIDGEs, genomic approaches have uncovered clusters of coregulated genes encompassing 2 to 30 genes. In the budding yeast Saccharomyces cerevisiae, chromosome correlation maps, which plot the expression patterns from various conditions or cell stages along the linear gene order of the chromosomes, reveal that genes from transcriptomes dedicated to cell cycle phases, sporulation, and the pheromone response are found in pairs throughout the genome (15). These pairs of genes are often functionally related as well. Similarly, in Caenorhabditis elegans, mRNA-tagging and microarray approaches have uncovered clusters of two to five genes that are coregulated in specific cell lineages, including muscle, sperm, oocytes, and the germ line (16). An analysis of gene expression in Drosophila, using microarray data determined under 80 different experimental conditions, has revealed an organization of coexpressed genes in groups of 10 to 30, covering 20 to 200 kb (17). The clustering of genes within the Drosophila genome has also been demonstrated by expressed sequence tag (EST) database analyses of tissue-specific expression from the testis, head region, and embryo. In each cell type, coregulated genes are organized into clusters of three or more genes, with a trend toward large groupings (18). In humans, several metabolic pathways are also shown to have their protein components encoded in genes that are proximal in the genome (19). Therefore, the clustering of coregulated, lineage-restricted genes indicates a functional organization of transcriptomes that define a given cell type.
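
The logic behind searching for coexpressed neighbors can be sketched as a correlation of expression profiles between genes that are adjacent in the linear gene order. The example below is a minimal illustration under stated assumptions, not the method of (15) or (17): the function names, the correlation cutoff, and the expression profiles are hypothetical, and a real analysis would also need to test whether the observed number of correlated neighbors exceeds that expected by chance.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / sqrt(var_x * var_y)


def adjacent_correlations(profiles):
    """Correlation between each gene and its immediate downstream neighbor.

    `profiles` holds one expression profile per gene, in chromosomal order,
    with one value per condition or time point.
    """
    return [pearson(profiles[i], profiles[i + 1])
            for i in range(len(profiles) - 1)]


def coexpressed_pairs(profiles, cutoff=0.9):
    """Index pairs of neighboring genes whose correlation reaches the cutoff."""
    return [(i, i + 1)
            for i, r in enumerate(adjacent_correlations(profiles))
            if r >= cutoff]


# Made-up profiles for four neighboring genes across four conditions.
toy_profiles = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],
    [4, 3, 2, 1],
    [1, 5, 2, 7],
]
print(coexpressed_pairs(toy_profiles))
```

Only the first two genes in this toy input rise and fall together, so they are the only pair reported. Extending the comparison from immediate neighbors to a window of several genes would be the natural way to look for the larger clusters described above.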

The regulatory mechanisms that govern the expression of the tandem gene arrays described above offer an attractive model for the tissue-specific coregulation of gene clusters. Like phage λ, the gene arrays use shared regulatory elements and fluctuating regulatory protein concentrations to ensure their proper expression patterns. In large part these tools are successful because of the proximity of the tandem genes. Gene clusters that are involved in the differentiation or maintenance of a cell type may therefore be selected for the same regulatory purposes. In S. cerevisiae, for example, several of the tandem genes observed in the chromosome correlation maps share a common upstream activating sequence (15). Whether all of the gene clusters identified have a shared enhancer seems unlikely. However, a universal property that proximity bestows is an increased local concentration of the pools of regulatory proteins that coexpressed genes likely have in common. Specifically, because of the number of binding sites for regulatory proteins in colinear genes, gene proximity may ensure an effective concentration of these proteins, creating a localized transcriptional center or “expression hub” (2). This idea is supported by examples of definitive nuclear subcompartments with increased localized protein concentrations that a given gene cluster may associate with (or, indeed, may help to form) to direct its regulation. The nucleolus represents a paradigm of this phenomenon, being essentially a concentration of proteins that guarantees ribosomal biogenesis as well as regulation of other genes required for protein translation (3, 20).

On the basis of the evidence presented above, it is attractive to consider lineage-specific transcription not as the coregulation of hundreds of disparate genes, but rather as the coordinated regulation of various gene domains. Beyond the examples of tandem gene arrays and gene clusters, genomic approaches have yielded evidence for another feature of domain-based genomes. Analysis of the histone modifications associated with active and inactive genes indicates that related modifications are shared by neighboring genes. For example, in Drosophila, a very tight correlation between gene expression and five different euchromatic histone modifications was observed (21). The pattern of euchromatic histone modification is “binary,” with active genes showing all such marks and inactive genes lacking such marks. These modifications are restricted to the transcribed region, and the degree of the euchromatic histone marks correlates with transcript abundance, suggesting a process of chromatin modification that is intrinsically coupled to transcription. Regardless of the basis of these marks, however, genes with these histone modifications exist in neighborhoods of two to three, similar to coexpressed gene clusters of the same size. A genomic analysis of repressive markers such as heterochromatin protein 1 (HP1) has revealed a similar association with sets of coregulated genes (22). Therefore, the organization of the genome into expression hubs enriched (or depleted) in such modifications and chromatin-binding proteins may facilitate the formation of nuclear subcompartments.

Dynamic Domains

The nucleus is a dynamic organelle, embodied in its great Houdini act of mitosis, in which we see the nucleus duplicate its chromosomal content, disappear, and then reappear in two places at once. In parallel, the protein content of the interphase nucleus is in constant flux (23). Fluorescence recovery after photobleaching (FRAP) studies have shown that regulatory proteins (such as transcription factors) have rapid diffusion rates, as do structural components of the nucleus. HP1, for example, colocalizes with constitutive heterochromatin and had long been thought to statically associate at these structures during interphase. However, FRAP analysis indicates that most of the HP1 is in fact highly diffusible (24).

This dynamic nature of the nucleus seems to belie the presence of nuclear bodies (such as Cajal bodies, interchromatin granule clusters, PML bodies, etc.). But these bodies are themselves dynamic (their integral components are constantly exchanged), and their structure in the nucleus is inseparable from their function: If you disturb the structure of a body you abrogate its function, and vice versa (2). Many genes, such as the β-globin and Ig loci, are also localized to specific nuclear compartments according to their state of transcriptional activity. Moreover, recent studies suggest that loci have the ability to mobilize to specific locations within the interphase nucleus (25). Although the extent to which loci can sample the nuclear environment appears to differ among organisms, evidence indicates that genes have the capacity to move beyond their segregation during mitosis.

During interphase, chromosomes are maintained as discrete structures termed chromosome territories (CTs). The majority of evidence indicates that genes are preferentially positioned at the territory surface, including contours and invaginations, of their respective chromosomes (2). Tandem gene arrays that are highly transcribed have been shown to loop away from their territories, as identified with whole-chromosome paints in FISH analysis. Territory looping may indicate an association of the looped gene array with the nuclear bodies that are involved in transcription and are found in the interchromatin compartment that runs among the CTs. The MHC, for example, has recently been shown to localize near PML bodies and is looped from its CT when being transcribed (26, 27). For the vertebrate Hox cluster, territory looping has also been correlated with the sequential activation of particular genes (28). Studies of wild-type and mutant β-globin loci have helped to clarify the role of transcriptional activity in territory looping (29). In erythroid cells, the β-globin locus is looped away from its CT even before transcriptional induction. However, in the absence of the LCR, which is required for high-level globin gene transcription, the locus is positioned at the CT surface. Furthermore, if the β-globin LCR is replaced by elements from a B cell–specific LCR that represses transcription of reporters in erythroid cells, looping is partially restored but is now correlated with localization of the looped locus to centromeric heterochromatin of another CT. These results indicate that territory looping is not simply a consequence of activity, but may play an important role in cell type–specific transcriptional activation or repression of a locus.

Although the basis for looping of a locus from its chromosome territory has yet to be established, the β-globin locus may offer some general clues. The β-globin LCR contains several binding sites for small Maf proteins, and, as described above, the MafK/p45 heterodimer is essential for relocation and activation of the locus during differentiation. Interestingly, addition of the transactivation domain of p45 is sufficient to relocate MafK from heterochromatic to euchromatic nuclear compartments (10). Because the p45 activation domain interacts with coactivator complexes that also bind other erythroid-specific genes, it is tempting to speculate that such complexes comprise expression hubs to which erythroid-specific genes may relocate (via mass action, for example). A similar scenario is possible for repression of the locus by MafK/Bach1-recruited corepressor complexes before erythroid differentiation and in nonerythroid cells. Moreover, the on-off rates of chromatin-associated proteins, including the small Maf heterodimers, are likely influenced by the pattern of histone modifications (30).

The observations highlighted above have led many to suggest that nuclear organization may be cell type–specific: The topological organization of the interphase nucleus may be involved in the establishment and propagation of cell-specific patterns of gene expression. Dynamic domains comprising both tandem gene arrays and gene clusters may represent the means by which a nucleus is configured. Given that a transcriptome includes hundreds of genes, it may be expedient for transcriptional regulation to have these domains compartmentalized together within the nucleus. Coregulated gene clusters (including the two alleles of the same cluster) may therefore be proximally positioned to facilitate their regulation through the creation of expression hubs with shared concentrations of regulatory proteins (Fig. 1). The combined effect of the association of domains from throughout the genome would yield a cell-specific nuclear topology, resulting in definable patterns of chromosome organization.

Fig. 1.

What is the underlying nuclear organization of gene activity? At left is a spectral karyotype (SKY) image of an interphase nucleus from a hematopoietic progenitor. SKY allows for the simultaneous visualization of all chromosomes and is primarily applied to metaphase chromosomes to detect abnormalities. Growing evidence supports the idea that a nuclear topology exists that ensures the transcriptional program (or transcriptome) that gives rise to or maintains a given cell type. Therefore, the SKY image may represent the appropriate nuclear topology for the progenitor's transcriptome. However, the actual chromosomal organization of an entire genome has yet to be established. Nonetheless, tandem gene arrays and the linear clustering of genes that are coexpressed and dedicated to a particular cell fate argue that spatial proximity is important in coordinate gene regulation. Given the evidence for chromatin mobility and the looping of gene loci from chromosome territories, it is an attractive possibility to consider that coregulated gene clusters from different genomic regions may also be proximal in the nuclear volume, as depicted in the theoretical magnification at right. The regulation of these localized gene clusters may take advantage of protein concentrations (or may be the basis for them), as exemplified by the various types of nuclear bodies found in the nucleus.

Recent evidence has indicated that the chromosomal organization within a nucleus is maintained upon cell division. FRAP analysis with histone–fluorescent protein fusions has revealed that chromosomes appear to remain in their relative nuclear positions (31). It is therefore possible that the nucleus of a particular cell type does have its genome specifically organized for the expression of its relevant transcriptome. The spectral karyotype (SKY, which allows the simultaneous detection of all chromosomes) (32) of an interphase nucleus in Fig. 1 would therefore represent the topology that ensures the overall regulation of that cell type. Evidence supports the idea that chromosomes have specific positions within the nucleus. For example, gene-dense and gene-poor chromosomes have been shown to preferentially localize to the nuclear center and periphery, respectively (33). Additionally, a recent study focusing on a subset of chromosomes has shown that lineage-specific associations of certain chromosomes can occur (34). What remains to be established, however, is the organization of an entire genome at the level of the chromosome. Furthermore, whether a given nuclear topology changes upon cell differentiation has yet to be demonstrated.

With the idea of cell-specific nuclear topology, we have clearly moved beyond the realm in which bacteriophage λ can lead by example. The dynamic regulation of hundreds to thousands of genes requires a level of coordination unnecessary for a simple phage. Still, it is important to remember that the functions of the nucleus, such as transcription, are intertwined with its structure. If the principles of eukaryotic gene regulation find basic parallels with a less complicated example, then there is hope that these principles, aided by the merging of technologies such as SKY and 3D microscopy, may allow us to fully appreciate the dynamic organization of a genome within its nucleus.

References and Notes
