To convert the encoded genetic information from eukaryotic DNA into proteins, base sequences of genes are first transcribed into RNA by RNA polymerase II. To produce functional RNA molecules, dozens of accessory factors are needed to define the proper locations for RNA polymerase II to begin and end transcription. Although we have some basic knowledge about how these factors work, it is still not possible to take a eukaryotic genome sequence and accurately predict what RNA species it will produce. Recent efforts to map and sequence “transcriptomes” have only increased the challenge by revealing a much more complex set of RNAs than expected, including many that do not produce proteins. The latest surprise, described in four papers in this issue (1-4), is a new class of transcripts that initiate near the expected transcription start sites upstream of protein-encoding sequences. However, these RNAs are short, present at low abundance, and often occur in the direction opposite to that of the protein-coding region (see the figure). It remains to be seen whether these RNAs have a function, but their existence challenges our simplistic models about how the DNA sequences known as “promoters” define transcription start sites.
In current textbook models, promoters comprise two interacting parts. Basal promoter elements bind accessory transcription initiation factors that position RNA polymerase II in the right place and direction. Enhancer elements bind regulatory factors that specify the physiological conditions or cell types where the gene will be expressed. The enhancer and basal promoter complexes functionally and physically interact to determine how often an RNA transcript is produced. The problem with this model is that enhancers can work over large distances of DNA in both directions, whereas basal elements are made up of short, low-complexity sequences (such as the TATA element) that appear frequently by chance. Transcription would likely initiate promiscuously throughout the genome without some way to suppress most transcription start sites.
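The claim that short basal elements "appear frequently by chance" can be made concrete with a back-of-the-envelope estimate. The minimal sketch below computes the expected number of exact matches to a short motif in random DNA. Everything specific in it is an illustrative assumption rather than something taken from the papers discussed here: the bases are treated as independent and equally frequent (probability 0.25 each), the six-base sequence TATAAA stands in for the TATA element, the genome length of 3 × 10^9 bases is a nominal round figure, and the function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def expected_chance_matches(motif: str, genome_length: int,
                            both_strands: bool = True) -> float:
    """Expected number of exact motif matches in random DNA.

    Assumes independent, uniformly distributed bases (an idealization;
    real genomes have biased and correlated base composition).
    """
    motif = motif.upper()
    k = len(motif)
    # Number of positions where a k-base window fits on one strand.
    positions = max(genome_length - k + 1, 0)
    # Probability that a random k-base window matches the motif exactly.
    per_position = 0.25 ** k
    expected = positions * per_position
    # A motif equal to its own reverse complement marks the same site on
    # both strands, so only non-palindromic motifs are counted twice.
    if both_strands and reverse_complement(motif) != motif:
        expected *= 2
    return expected


if __name__ == "__main__":
    n = expected_chance_matches("TATAAA", 3_000_000_000)
    print(f"Expected chance matches to TATAAA: {n:,.0f}")
```

Under these simplified assumptions, a six-base motif is expected on the order of a million times in a genome of this size, which illustrates why such elements cannot by themselves specify where transcription should begin.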
The major mechanism for suppressing widespread transcription is to sequester potential transcription start sites by wrapping most of the genome in nucleosomes, subunits of chromatin in which DNA is coiled around histone proteins (5). Indeed, “real” transcription start sites are typically found in nucleosome-free regions generated by DNA sequences intrinsically resistant to nucleosome wrapping or by targeted modification and removal of nucleosomes to expose the underlying promoter sequences (6). Because transcription by RNA polymerase II through protein-coding sequences requires nucleosome displacement, eukaryotic cells even have mechanisms to quickly replace repressive nucleosomes in the wake of RNA polymerase II. Loss of this repression leads to cryptic RNAs initiating within transcribed regions (7). Therefore, functional eukaryotic promoters must not only attract RNA polymerase II but also evade nucleosomal repression.
So what are we to make of these new, promoter-associated transcripts? The simplest view is that they arise from random, weak basal promoter elements that escape suppression. This idea is supported by the fact that the new RNAs are largely derived from DNA in nucleosome-free regions. Interestingly, Preker et al. (3) show that non-eukaryotic DNA placed next to a strong mammalian promoter will also produce short divergent transcripts, indicating minimal sequence specificity. He et al. (2) also report antisense transcription (the synthesis of RNA complementary to the protein-coding RNA) initiating near 3′ ends of genes, another region that often has reduced nucleosome occupancy.
Although “backward” transcripts are less abundant than the “forward” (coding, full-length) transcripts produced by nearby transcription start sites, a clear correlation in their expression suggests that both types of RNAs respond to the same inducers of gene expression. Interestingly, the new papers find similar amounts of RNA polymerase II associated with the upstream and downstream transcription start sites. The mystery is why RNA polymerase II traveling in one direction can produce RNAs thousands of nucleotides long, whereas polymerases moving in the opposite direction don't get very far. However, as noted by Core et al. (1), even at many of the transcription start sites that produce full-length RNAs, much of the RNA polymerase II that starts at the promoter does not effectively make it to the end of the gene. The RNA polymerase II molecules that accumulate near promoters are often referred to as “paused,” but it is unclear whether they await a positive signal to continue elongating (synthesizing RNA) or instead quickly terminate transcription soon after initiation. In either case, it appears that a rate-limiting step is escape of RNA polymerase II into processive, long-range elongation.
Even if the short promoter-associated RNAs simply result from incomplete suppression of cryptic initiation, it would be a mistake to assume that there is no associated function. The RNAs produced from these unanticipated transcription start sites may have some undiscovered role, but it is perhaps more likely that the act of transcription itself affects expression of the nearby gene. As suggested by several of the new papers, this could be mediated by transcription-coupled changes in the DNA topology or local chromatin structure. Recent studies in yeast have uncovered interesting regulatory relationships between closely spaced transcription start sites. In the case of the SER3 gene, RNA polymerase II initiating at an upstream transcription start site reads through the SER3 promoter to repress synthesis of the full-length messenger RNA (mRNA) (8). At several genes involved in nucleotide biosynthesis, the concentration of available nucleotides influences the choice between several possible transcription start sites. A subset of these is used when the cell needs full-length mRNA, whereas the other transcription start sites produce short noncoding transcripts (9). In yeast, these and other cryptic unstable transcripts use an alternative transcription termination pathway that preferentially acts during early elongation (10, 11). This pathway targets cryptic unstable transcripts for rapid degradation by a complex of nucleases called the exosome. Interestingly, Preker et al. (3) report that the mammalian promoter-associated RNAs are also exosome substrates, which contributes to their low abundance.
As complete transcriptomes of cells are cataloged at increasingly fine levels of detail, the hope is that we will be able to discern the rules that determine where RNAs are made and how they are processed. However, we should remain open to the idea that expression of the genome may be rather sloppy, with many, perhaps even most (12), initiation events generating nonproductive transcripts that are rapidly degraded. This “noise” could provide abundant raw material for evolution. A cryptic transcription start site upstream of the “correct” initiation site might produce an RNA with additional protein-coding sequence or altered translation efficiency. A minor transcription start site within a gene could produce a truncated protein variant that is targeted to a different subcellular location. If any of these events provides some selective advantage, over the course of time, the cryptic transcription start site could become an alternative one and eventually the real transcription start site. If a low amount of bidirectional transcription around transcription start sites were harmful, cells probably would have evolved additional mechanisms for further suppression. The prevalent nature of the short promoter-associated transcripts suggests that their synthesis may serve some functional role, but this remains to be proven.