Special Viewpoints

An Expanding Universe of Noncoding RNAs

Science  17 May 2002:
Vol. 296, Issue 5571, pp. 1260-1263
DOI: 10.1126/science.1072249


Noncoding RNAs (ncRNAs) have been found to have roles in a great variety of processes, including transcriptional regulation, chromosome replication, RNA processing and modification, messenger RNA stability and translation, and even protein degradation and translocation. Recent studies indicate that ncRNAs are far more abundant and important than initially imagined. These findings raise several fundamental questions: How many ncRNAs are encoded by a genome? Given the absence of a diagnostic open reading frame, how can these genes be identified? How can all the functions of ncRNAs be elucidated?

Over the years, a number of RNAs that do not function as messenger RNAs (mRNAs), transfer RNAs (tRNAs), or ribosomal RNAs (rRNAs) have been discovered, mostly fortuitously. The non-mRNAs have been given a variety of names (1, 2); the term small RNAs (sRNAs) has been predominant in bacteria, whereas the term noncoding RNAs (ncRNAs) has been predominant in eukaryotes and will be used here. ncRNAs range in size from 21 to 25 nt for the large family of microRNAs (miRNAs) that modulate development in Caenorhabditis elegans, Drosophila, and mammals (3–8), up to ∼100 to 200 nt for sRNAs commonly found as translational regulators in bacterial cells (9, 10) and to >10,000 nt for RNAs involved in gene silencing in higher eukaryotes (11–13). The functions described for ncRNAs thus far are extremely varied (Table 1).

Table 1

Processes affected by ncRNAs.

Some ncRNAs affect transcription and chromosome structure. The Escherichia coli 6S RNA binds to the bacterial σ70 holoenzyme and modulates promoter use (14), and the human 7SK RNA binds and inhibits the transcription elongation factor P-TEFb (15, 16). Another human ncRNA, SRA RNA, was identified as interacting with progestin steroid hormone receptor and may serve as a coactivator of transcription (17). Several extremely long ncRNAs detected in insect and mammalian cells have been implicated in silencing genes and changing chromatin structure across large chromosomal regions (11–13). Examples include the human Xist RNA required for X chromosome inactivation and mouse Air RNA required for autosomal gene imprinting. The Xist RNA is produced by the inactive X chromosome and spreads in cis along the chromosome (13). The chromosome-associated RNA has been proposed to recruit proteins that affect chromatin structure; however, much remains to be learned about the mechanism by which Xist and other long ncRNAs establish and/or maintain gene silencing. Another eukaryote-specific RNA that is required for proper chromosome replication and structure is the telomerase RNA. This ncRNA is an integral part of the telomerase enzyme and serves as the template for the synthesis of the chromosome ends (18).

ncRNAs play roles in RNA processing and modification. The catalytic ribonuclease P (RNase P) RNA, found in organisms from all kingdoms, is responsible for processing the 5′ end of precursor tRNAs and some rRNAs (19). In eukaryotes, small nuclear RNAs (snRNAs) are central to splicing of pre-mRNAs (20), and small nucleolar RNAs (snoRNAs) direct the 2′-O-ribose methylation (C/D-box type) and pseudouridylation (H/ACA-box type) of rRNA, tRNA, and ncRNAs by forming base pairs with sequences near the sites to be modified (21). Homologs of the two classes of snoRNAs have been found in archaea (22); however, counterparts have not yet been identified in bacteria, even though the rRNAs are modified. The less ubiquitous guide RNAs (gRNAs) present in kinetoplasts direct the insertion or deletion of uridine residues into mRNA (RNA editing) by mechanisms that involve base-pairing as well (23, 24).

ncRNAs also regulate mRNA stability and translation. The first discovered miRNAs, C. elegans lin-4 and let-7, repress translation by forming base pairs with the 3′ end of target mRNAs (7, 8). Many of the recently identified miRNAs are likely to act in a similar fashion. However, it is conceivable that some members of this large family target mRNAs for degradation, as is the case for the similarly sized small interfering RNAs (siRNAs) that are processed and amplified from exogenously added, double-stranded RNA and lead to gene suppression in a process termed RNA interference (25, 26). As yet there is no evidence for miRNAs in bacteria, archaea, or fungi, but it might be fruitful to search for RNAs of <25 nt in these organisms. Several ncRNAs have been found to regulate translation and possibly mRNA stability in E. coli (9, 10, 27). These sRNAs form base pairs at various positions with their target mRNAs, and they have been shown to repress translation by occluding the ribosome binding site and to activate translation by preventing the formation of inhibitory mRNA structures.
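As a rough illustration of the base-pairing idea, the sketch below locates places in a target mRNA where a short stretch of an ncRNA could pair. It is a deliberate simplification and not a method from the studies cited: it assumes perfect, contiguous Watson-Crick pairing, whereas real sRNA and miRNA interactions are often short, discontinuous, and may include G-U pairs. The function names and the example sequences are hypothetical.

```python
# Complement table for RNA bases (Watson-Crick pairs only; G-U wobble is ignored).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the sequence that would pair antiparallel with `rna`."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def find_pairing_sites(ncrna_segment: str, target_mrna: str) -> list[int]:
    """Return 0-based start positions in `target_mrna` that are perfectly
    complementary to `ncrna_segment` (overlapping matches are included)."""
    site = reverse_complement(ncrna_segment)
    target = target_mrna.upper()
    positions = []
    start = target.find(site)
    while start != -1:
        positions.append(start)
        start = target.find(site, start + 1)
    return positions


# Hypothetical sequences, chosen only to show the call pattern.
sites = find_pairing_sites("AUGC", "AAGCAUAAGCAU")
```

A realistic search would score imperfect duplexes with a thermodynamic model rather than require exact matches.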

Finally, ncRNAs affect protein stability and transport. One unique bacterial sRNA is recognized as both a tRNA and an mRNA by stalled ribosomes (tmRNA) (28). Alanylated tmRNA is delivered to the A site of a stalled ribosome; the nascent polypeptide is transferred to the alanine-charged tRNA portion of tmRNA. The problematic transcript then is replaced by the mRNA portion of tmRNA, which encodes a tag for degradation of the stalled peptide. It is not yet clear whether there is a counterpart to this coding RNA in archaeal and eukaryotic cells. In contrast, a small cytoplasmic RNA that forms the core of the signal recognition particle (SRP) required for protein translocation across membranes is found in organisms from all kingdoms (29).

The mechanisms of action for the characterized ncRNAs can be grouped into several general categories (Fig. 1). There are ncRNAs where base-pairing (often <10 base pairs and discontinuous) with another RNA or DNA molecule is central to function. The snoRNAs that direct RNA modification, the bacterial RNAs that modulate translation by forming base pairs with specific target mRNAs, and probably most of the miRNAs are examples of this category. Some ncRNAs mimic the structures of other nucleic acids; the 6S RNA structure is reminiscent of an open bacterial promoter, and the tmRNA has features of both tRNAs and mRNAs. Other ncRNAs, such as the RNase P RNA, have catalytic functions. Although synthetic RNAs have been selected to have a variety of biochemical functions, the number of natural ncRNAs shown to have catalytic function is limited. Most, if not all, ncRNAs are associated with proteins that augment their functions; however, some ncRNAs, such as the snRNAs and the SRP RNA, serve key structural roles in RNA-protein complexes. Several ncRNAs fit into more than one mechanistic category; the telomerase RNA provides the base-pairing template for telomere synthesis and is an integral part of the telomerase ribonucleoprotein complex. The mechanisms of action for a number of ncRNAs (such as the 7SK RNA) are not known, and it is probable that some ncRNAs act in ways that have not yet been established. Some investigators have suggested that many ncRNAs are vestiges of a world in which RNA carried out all of the functions in a primitive cell. However, given the versatility of RNA and the fact that the properties of RNA provide advantages over peptides for some mechanisms, it is likely that a number of ncRNAs have evolved more recently (30, 31).

Figure 1

Different mechanisms of ncRNA (red) action. (A) Direct base-pairing with target RNA or DNA molecules is central to the function of some ncRNAs: Eukaryotic snoRNAs direct nucleotide modifications (green star) by forming base pairs with flanking sequences, and the E. coli OxyS RNA represses translation by forming base pairs with the Shine-Dalgarno sequence (green box) and occluding ribosome binding. (B) Some ncRNAs mimic the structure of other nucleic acids: Bacterial RNA polymerase may recognize the 6S RNA as an open promoter, and bacterial ribosomes recognize tmRNA as both a tRNA and an mRNA. (C) ncRNAs also can function as an integral part of a larger RNA-protein complex, such as the signal recognition particle, whose structure has been partially determined (49).

How Many ncRNAs Exist?

The first ncRNAs were identified in the 1960s on the basis of their high expression; these RNAs were detected by direct labeling and separation on polyacrylamide gels. Others were later found by subfractionation of nuclear extracts or by association with specific proteins. A few were identified by mutations or phenotypes resulting from overexpression. The serendipitous discoveries of many of these ncRNAs were the first glimpses of their existence, but this work did not presage the vast numbers that appear to be encoded by a genome.

Several systematic searches for ncRNA genes have been carried out in the past 4 years. Among the computation-based searches, there have been screens of the yeast Saccharomyces cerevisiae and archaeal Pyrococcus genomes for the short conserved motifs present in snoRNAs (32, 33). In other searches, the intergenic regions of S. cerevisiae, E. coli, Methanococcus jannaschii, and Pyrococcus furiosus chromosomes have been scanned for properties indicative of an ncRNA gene. Criteria for identifying candidate intergenic regions have included large gaps between protein-coding genes (34), extended stretches of conservation between species with the same gene order (35, 36), orphan promoter or terminator sequences (34, 36, 37), presence of GC-rich regions in an organism with a high AT content (38), and conserved RNA secondary structures (39, 40). Other searches for ncRNAs have involved large-scale cloning efforts that have taken into account specific ncRNA properties. In studies of mouse (41, 42) and the archaeon Archaeoglobus fulgidus (22), total RNA of 50 to 500 nt was isolated, and arrays of cDNA clones obtained from the RNA were screened with oligonucleotides corresponding to the most abundant known RNAs. Clones showing the lowest hybridization signal then were randomly sequenced. In recent screens for C. elegans, Drosophila, and human miRNAs, RNA molecules of less than 30 nt were isolated, and cDNA clones were generated upon the ligation of primers to the 5′ and 3′ ends of the RNA (3, 4) or upon RNA tailing (5). Other miRNAs were isolated and cloned on the basis of their association with a complex composed of the human Gemin3, Gemin4, and IF2C proteins (6). In most studies, Northern blots have been carried out to confirm that the cloned genes are expressed as small transcripts. These blots also have provided information about spatial and temporal expression patterns as well as potential precursor and degradation products.
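Two of the computational criteria listed above, large gaps between protein-coding genes and GC-rich regions in an AT-rich genome, can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the procedure used in the cited screens: gene coordinates are assumed to be 1-based inclusive (start, end) pairs on a linear sequence, and the function names, window size, and thresholds are hypothetical choices.

```python
def intergenic_gaps(genes, genome_length, min_gap):
    """Return (start, end) regions of at least `min_gap` nt not covered by any gene.

    `genes` holds 1-based inclusive (start, end) pairs, in any order; overlaps are allowed.
    """
    gaps = []
    prev_end = 0  # end of the furthest-reaching gene seen so far
    for start, end in sorted(genes):
        if start - prev_end - 1 >= min_gap:
            gaps.append((prev_end + 1, start - 1))
        prev_end = max(prev_end, end)
    if genome_length - prev_end >= min_gap:
        gaps.append((prev_end + 1, genome_length))
    return gaps


def gc_fraction(seq):
    """Fraction of G and C bases in `seq` (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_rich_windows(seq, window, step, threshold):
    """Return 0-based half-open (start, end) windows whose GC fraction is >= threshold."""
    hits = []
    for i in range(0, len(seq) - window + 1, step):
        if gc_fraction(seq[i:i + window]) >= threshold:
            hits.append((i, i + window))
    return hits


# Toy inputs, invented for illustration only.
candidate_gaps = intergenic_gaps([(10, 20), (15, 30), (100, 120)], 200, 50)
candidate_windows = gc_rich_windows("ATATATGCGCGCATAT", 6, 2, 0.8)
```

Regions flagged this way are only candidates; as the text notes, expression still has to be confirmed experimentally, for example by Northern blots.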

Despite the success of the recent systematic efforts, it is certain that not all ncRNAs have been detected. Estimates for the number of sRNAs in E. coli range from 50 to 200 (1, 35), and estimates for the number of miRNAs in C. elegans range from hundreds to thousands (7). There also are many non–protein-coding regions of the bacterial and eukaryotic chromosomes for which transcription is detected (43, 44), but it is not known how many of these regions encode defined, functional ncRNAs. Extensions of the various systematic searches should lead to the identification of more ncRNAs. However, limitations of the current approaches should be noted. Most of the computation methods have focused on the intergenic regions. It has already been shown that some of the ncRNAs are processed from longer protein- or rRNA-encoding transcripts (42). It also is quite possible that ncRNAs are expressed from the opposite strand of protein-coding genes. On the other hand, expression-based methods may miss ncRNAs that are synthesized under very defined conditions, such as in response to a specific environmental signal, during a specific stage in development, or in a specific cell type. Much attention has been focused on characterizing the “proteome” of a sequenced organism. The recent discovery of hundreds of new ncRNAs illustrates that the “RNome” also will need to be characterized before a complete tally of the number of genes encoded by a genome can be achieved.

What Are All the Functions of ncRNAs?

An astonishing variety of ncRNA functions have already been found, but there are many ncRNAs for which the cellular roles are still unknown. For instance, Y RNAs, small cytoplasmic RNAs associated with the Ro autoantigen in several different organisms, are still enigmatic even after many years of study (45). With the more systematic identification of increasing numbers of ncRNAs, the question of how to elucidate the functions of all ncRNAs is becoming more and more prominent.

Approaches that have succeeded previously are an obvious place to start in answering the question of function, but it is likely that new approaches also will need to be developed. For genetically tractable organisms, ncRNA knockout or overexpression strains can be screened for differences in phenotypes (such as viability) or whole-genome expression patterns. The functions of several ncRNAs were identified by the biochemical identification of associated proteins, and the development of more systematic methods for characterizing ncRNA-associated proteins should be fruitful. As the knowledge base of what sequences are critical for the formation of specific structures or for base-pairing expands, and as computer programs for predicting structures improve, computational approaches should become an increasingly important avenue for elucidating the functions of ncRNAs. The three-dimensional structures of only a limited number of RNAs and RNA-protein complexes have been solved. An increase in the structural database may bring to light recognizable RNA or RNA-protein domains associated with specific functions.

Information about when ncRNAs are expressed and where ncRNAs are localized is useful for all experiments aimed at probing function. Many of the C. elegans miRNAs are synthesized only at very specific times in development, and thus they have also been called small temporal RNAs (stRNAs). Among the snoRNAs, some are expressed exclusively in the brain (41), and one of the bacterial sRNAs is only detected upon oxidative stress (9, 10). It is likely that other ncRNAs will be found to have very defined expression and localization patterns and that these will be critical to function.

There are many more ncRNAs than was ever suspected. A big challenge for the future will be to identify the whole complement of ncRNAs and to elucidate their functions. This is an exciting time for investigators whose work has focused on ncRNAs. However, scientists studying all aspects of biology should keep ncRNAs in mind. The phenotypes associated with specific mutations may be due to defects in an ncRNA rather than in a protein, as is usually expected. Investigators developing purification schemes for specific proteins or activities should be aware of the possible presence of an ncRNA component; many purification procedures are designed to remove nucleic acids. There may be ncRNAs lurking behind many an unexplained phenomenon.

