Special Perspectives

The Eukaryotic Genome as an RNA Machine


Science  28 Mar 2008:
Vol. 319, Issue 5871, pp. 1787-1789
DOI: 10.1126/science.1155472


The past few years have revealed that the genomes of all studied eukaryotes are almost entirely transcribed, generating an enormous number of non–protein-coding RNAs (ncRNAs). In parallel, it is increasingly evident that many of these RNAs have regulatory functions. Here, we highlight recent advances that illustrate the diversity of ncRNA control of genome dynamics, cell biology, and developmental programming.

RNAs are an integral component of chromosomes and contribute to their structural organization (1, 2). It is now becoming apparent that chromatin architecture and epigenetic memory are regulated by RNA-directed processes that, although the exact mechanisms are yet to be understood, involve the recruitment of histone-modifying complexes and DNA methyltransferases to specific loci (3). Whereas long non–protein-coding RNAs (ncRNAs) have been classically implicated in the regulation of dosage compensation and genomic imprinting in animals (4), they seem to play a much broader role in the epigenetic control of developmental trajectories (3). For example, 231 long ncRNAs associated with human HOX gene clusters were recently shown to be colinearly expressed along developmental axes (5). One of these, termed HOTAIR and transcribed from the HOXC locus, was studied in detail and found to recruit Polycomb complexes to repress gene expression in trans at the HOXD cluster (5) (Fig. 1). Other ncRNAs, such as the intergenic transcripts from globin and antigen receptor loci, which have been associated with complex epigenetic phenomena (6, 7), will likely perform similar functions.

Fig. 1.

Recent examples of the various levels of regulation of eukaryotic gene expression and cell biology by ncRNAs. dsRNA, double-stranded RNA; HMT, histone methyltransferases; HP1, heterochromatin protein 1; PARs, promoter-associated RNAs; PcG, Polycomb group proteins; RISC, RNA-induced silencing complex; RITS, RNA-induced initiation of transcriptional gene silencing; siRNA, small interfering RNA; TFIIB, transcription factor IIB; and UCE, ultraconserved element. See text for details, other acronyms, and references. For additional examples, see (3, 58).

Small ncRNAs have been consistently linked with heterochromatin formation via the RNA interference (RNAi) pathway (8), including Piwi-interacting RNAs (piRNAs) (9), which guide PIWI family proteins to control transposon activity from flies to vertebrates (10). However, piRNAs might also regulate euchromatin formation, given that PIWI is required for establishing euchromatin in certain subtelomeric regions in Drosophila (11).

Higher-level nuclear organization and chromosome dynamics are also regulated by ncRNAs in a variety of systems. For example, the formation of the kinetochore and centromeric heterochromatin in fission yeast is dependent on cell cycle–regulated centromeric repeat–derived RNAs and the RNAi pathway, whereas kinetochore assembly and chromosome segregation require the ribonuclease activity of a component of the exosome (12–15). These findings reveal an RNA-based mechanistic link between these processes in mitosis. In Tetrahymena, RNAs direct heterochromatin formation and DNA elimination via RNAi-dependent recruitment of Polycomb complexes and histone methylation (16). The RNAi pathway along with directed histone modifications also regulates the organization of the nucleolus in Drosophila (17).

Likewise, long ncRNAs direct programmed whole-genome rearrangements during ciliate differentiation (18). In mammals, transcription of long ncRNAs contributes to various processes, including T cell receptor recombination (7), maintenance of telomeres (19, 20), X-chromosome pairing required for dosage compensation (21), and perinucleolar localization of the inactive X chromosome (22).

The functional organization of chromatin can also be regulated by ncRNAs derived from repetitive elements. In mice, bidirectional transcription of a retrotransposed SINE B2 sequence by RNA polymerase (RNAP)II and RNAPIII relocates the associated growth hormone locus into nuclear compartments and locally defines the heterochromatin-euchromatin boundary, regulating the expression of the gene during organogenesis (23) (Fig. 1). Given the abundance of transcribed repetitive sequences, this may represent a genome-wide strategy for the control of chromatin domains that may be conserved throughout eukaryotes (23–25). Moreover, such observations and others suggest that a large portion of the genome may, in fact, be functionally active and that transposon-derived sequences may not be reliable indices of the rate of neutral evolution (26).


Noncoding RNAs can regulate transcription by interacting with transcription factors, RNAP, or DNA itself (Fig. 1). The small double-stranded RNA NRSE directs the activation of neuronal genes containing the upstream conserved NRSE sequence by triggering the REST transcriptional machinery (27), components of which are also targets of microRNAs (miRNAs) (28). Analogously, the ncRNA Evf-2 is transcribed from an ultraconserved enhancer associated with the Dlx-5/6 locus and interacts with the Dlx-2 transcription factor to induce enhancer activity (29). Indeed, it appears that most ultraconserved elements in the human genome (sequences that, in many cases, act as enhancers and have remained essentially unchanged throughout amniote evolution) (30) are transcribed in a regulated manner, and their aberrant expression may be involved in pathological processes such as cancer (31).

In addition, promoter-directed sequence-specific RNAs, including miRNAs (32), have been shown to induce (32, 33) or repress transcription (34, 35), the latter involving epigenetic modification and targeting of promoter-associated low-copy RNA (34, 35) [in one case by triplex formation (35), which appears common in human chromatin (36)]. The abundance of promoter-associated transcripts in the human (37, 38), Arabidopsis (39), and yeast (40) genomes suggests that such examples may represent widespread phenomena.

RNAPIII can also produce various types of regulatory RNAs. In mice, heat shock induces RNAPIII transcription of a B2 SINE element, producing a structured modular RNA that represses RNAPII in trans at specific loci (41). The same properties and function were identified in human cells under heat shock for ncRNAs derived from Alu elements, primate-specific SINEs that comprise 10.5% of our genome, revealing a striking convergent evolution of SINE function as trans-repressors of gene expression (42).

RNA Processing and Translation

MiRNAs regulate a wide range of processes in animals and plants by directing translational repression or degradation of mRNAs (Fig. 1), with consequent effects on global regulatory circuits (28). However, miRNAs can also promote activation of gene expression under conditions such as stress, depending on the association with regulatory factors (43), and even activate translation during cell cycle arrest (44).

The formation and target specificity of miRNAs can be modulated by tissue-specific A-I editing (45) and RNA binding proteins (46, 47). Furthermore, miRNAs may be differentially processed from both sense and antisense strands of the same hairpin or from sense and antisense transcripts from the same locus (48, 49), which expands the potential of a single genomic locus to generate multiple miRNAs with different targets.

In addition, many miRNAs derive from introns of protein-coding genes, in some cases by splicing rather than the canonical Drosha pathway (50–52). Together with small nucleolar RNAs (snoRNAs), they make up parallel intronic outputs of gene expression (3), which may be much more extensive than expected.

Other classes of ncRNAs can also act post-transcriptionally. A recent report found that a number of RNAPIII-transcribed short ncRNAs have sequence complementarity to protein-coding genes, providing evidence for a sense-antisense–based regulatory network wherein RNAPIII transcripts control their RNAPII counterparts (Fig. 1) (53).

A World of Noncoding RNAs

The examples above provide proof-of-principle that RNA can regulate gene expression at many levels and by using a wide array of mechanisms (Fig. 1). The ENCODE project showed that at least 93% of analyzed human genome nucleotides are transcribed in different cells (54), with similar findings in mouse (55) and other eukaryotes, indicating that there may be a vast reservoir of biologically meaningful RNAs that could greatly exceed the ∼1.2% encoding proteins. A fraction of RNAs with short open reading frames (ORFs) potentially encodes peptides (56); on the other side of the ledger, many currently annotated ORFs are not conserved and may be false, which could reduce the number of protein-coding genes in the human genome (57).

There has been debate about whether these ncRNAs are (in the main) functional or simply noise. In some cases, it may be the transcript, merely the act of transcription, or both (35) that is relevant. Nevertheless, many observations indicate that substantial numbers of ncRNAs are intrinsically functional. These include the fact that many loci produce spliced (and alternatively spliced) transcripts that are developmentally regulated (5, 55, 58) and that at least some antisense and intergenic ncRNAs can function in trans (5, 41, 53, 59). A large fraction of ncRNAs are expressed in specific regions of the brain, exhibiting precise cellular locations (60). Some mark new domains within the cell (61, 62), which means that ncRNAs are also set to have a major impact in cell biology.

Comparative analyses indicate that ncRNA promoters are, on average, more conserved than those of protein-coding genes (55) and that ncRNA sequences, secondary structures, and splice site motifs have been subject to purifying selection (63, 64). Moreover, many ncRNAs are evolving quickly (65), and some have undergone recent positive selection, as exemplified by HAR1 RNA expressed in the human brain, which contains the sequence conserved in mammals that most rapidly diverged after the human-chimpanzee separation (66).

Given the functional versatility of RNAs, it is plausible that ncRNAs have represented a rich substrate for evolutionary innovations in eukaryotes. In support of this idea, regulatory RNAs are centrally involved in the ontogeny of many organisms, from unique developmental pathways in protozoa (18) to the control of conserved or clade-specific developmental regulators in multicellular animals (5, 52). In addition, there is mounting evidence that RNA can transmit information intergenerationally, so as to mediate non-Mendelian inheritance of epigenetic changes in mice (67) and plants (68).

Although the need for large-scale approaches to explore the function of ncRNAs is evident, a glance at the genome browser will show noncoding expressed sequence tags associated with most genes of interest that may have regulatory functions. As an example, a recent study of ncRNAs associated with tumor suppressor genes focused on an RNA antisense to the p15 gene and showed that it acts to alter histone methylation to silence expression of this gene, with important implications for cell differentiation and tumorigenesis (59).

Indeed, ncRNAs are already being identified as markers for cancer (69, 70) and associated with other complex diseases such as coronary disease and diabetes (71, 72). The elucidation of their function may significantly contribute to the understanding and treatment of such conditions. It may also transform our understanding of the genetic programming of multicellular organisms, particularly as it appears that regulation dominates the information content of complex systems (3).

References and Notes
