An Abundant Class of Tiny RNAs with Probable Regulatory Roles in Caenorhabditis elegans


Science 26 Oct 2001:
Vol. 294, Issue 5543, pp. 858-862
DOI: 10.1126/science.1065062


Two small temporal RNAs (stRNAs), lin-4 and let-7, control developmental timing in Caenorhabditis elegans. We find that these two regulatory RNAs are members of a large class of 21- to 24-nucleotide noncoding RNAs, called microRNAs (miRNAs). We report on 55 previously unknown miRNAs in C. elegans. The miRNAs have diverse expression patterns during development: a let-7 paralog is temporally coexpressed with let-7; miRNAs encoded in a single genomic cluster are coexpressed during embryogenesis; and still other miRNAs are expressed constitutively throughout development. Potential orthologs of several of these miRNA genes were identified in Drosophila and human genomes. The abundance of these tiny RNAs, their expression patterns, and their evolutionary conservation imply that, as a class, miRNAs have broad regulatory functions in animals.

Two types of short RNAs, both about 21 to 25 nucleotides (21–25 nt) in length, serve as guide RNAs to direct posttranscriptional regulatory machinery to specific mRNA targets. Small temporal RNAs (stRNAs) control developmental timing in Caenorhabditis elegans (1–3). They pair to sites within the 3′ untranslated region (3′ UTR) of target mRNAs, causing translational repression of these mRNAs and triggering the transition to the next developmental stage (1–5). Small interfering RNAs (siRNAs), which direct mRNA cleavage during RNA interference (RNAi) and related processes, are the other type of short regulatory RNA (6–12). Both stRNAs and siRNAs are generated by processes requiring Dicer, a multidomain protein with tandem ribonuclease III (RNase III) domains (13–15). Dicer cleaves within the double-stranded portion of precursor molecules to yield the 21–25 nt guide RNAs.

lin-4 and let-7 have been the only two stRNAs identified, and so the extent to which this type of small noncoding RNA normally regulates eukaryotic gene expression is only beginning to be understood (1–5). RNAi-related processes protect against viruses or mobile genetic elements, yet these processes are known to normally regulate only one other mRNA, that of Drosophila Stellate (16–20). To investigate whether RNAs resembling stRNAs or siRNAs might play a more general role in gene regulation, we isolated and cloned endogenous C. elegans RNAs that have the expected features of Dicer products. Tuschl and colleagues showed that such a strategy is feasible when they fortuitously cloned endogenous Drosophila RNAs while cloning siRNAs processed from exogenous double-stranded RNA (dsRNA) in an embryo lysate (12). Furthermore, other efforts focusing on longer RNAs have recently uncovered many previously unknown noncoding RNAs (21, 22).

Dicer products, such as stRNAs and siRNAs, can be distinguished from most other oligonucleotides that might be present in C. elegans by three criteria: a length of about 22 nt, a 5′-terminal monophosphate, and a 3′-terminal hydroxyl group (12, 13, 15). Accordingly, a procedure was developed for isolating and cloning C. elegans RNAs with these features (23). Of the clones sequenced, 330 matched C. elegans genomic sequence, including 10 representing lin-4 RNA and 1 representing let-7 RNA. Another 182 corresponded to the Escherichia coli genomic sequence. E. coli RNA clones were expected because the worms were cultured with E. coli as the primary food source.
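The bookkeeping behind these counts (which clones match the worm genome and which match the E. coli food source) can be sketched as exact matching of each clone against reference sequences on both strands. This is a minimal illustration only, assuming exact, ungapped matches; the actual matching procedure is described in the cited methods (23, 24), and the function names and toy reference sequences below are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# DNA complement table; U in a cloned RNA is read as T before matching.
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def to_dna(seq: str) -> str:
    """Convert an RNA sequence to the DNA alphabet (U -> T), upper-cased."""
    return seq.upper().replace("U", "T")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return "".join(_COMPLEMENT[base] for base in reversed(dna))


def assign_clone(clone: str, references: Dict[str, str]) -> Optional[str]:
    """Return the first reference that contains the clone on either strand, else None."""
    dna = to_dna(clone)
    rc = reverse_complement(dna)
    for name, ref in references.items():
        ref_dna = to_dna(ref)
        if dna in ref_dna or rc in ref_dna:
            return name
    return None


def tally_clones(clones: Iterable[str], references: Dict[str, str]) -> Counter:
    """Count clones per reference; clones matching nothing are counted as 'unassigned'."""
    counts: Counter = Counter()
    for clone in clones:
        counts[assign_clone(clone, references) or "unassigned"] += 1
    return counts


# Toy references (hypothetical; not real genome excerpts).
refs = {
    "C. elegans": "AAATGAGGTAGTAGGTTGTATAGTTCCC",
    "E. coli": "GGGCCCAAATTT",
}
print(tally_clones(["UGAGGUAGUAGGUUGUAUAGUU", "AAAUUUGGG", "ACGUACGUAC"], refs))
```

Searching both strands matters because a cloned RNA may derive from either strand of the genome.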

Three hundred of the 330 C. elegans clones have the potential to pair with nearby genomic sequences to form fold-back structures resembling those thought to be needed for Dicer processing of lin-4 and let-7 stRNAs (Fig. 1) (24). These 300 clones with predicted fold-backs represent 54 unique sequences: lin-4, let-7, and 52 other RNAs (Table 1). Thus, lin-4 and let-7 RNAs appear to be members of a larger class of noncoding RNAs that are about 20–24 nt in length and are processed from fold-back structures. We and the two other groups reporting in this issue of the journal refer to this class of tiny RNAs as microRNAs, abbreviated miRNAs, with individual miRNAs and their genes designated miR-# and mir-#, respectively (25, 26).
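One way to make the fold-back criterion concrete is to start from a predicted secondary structure in dot-bracket notation (the output format of RNAfold, the folding program used for Fig. 1) and ask whether the cloned segment lies in a stem, paired with the opposite arm rather than with itself. The sketch below assumes a precomputed dot-bracket string and a hypothetical 70% paired-fraction cutoff; it is an illustration, not the criterion applied in the paper (24).

```python
from typing import List


def pair_table(structure: str) -> List[int]:
    """Map each position to its partner's index (-1 if unpaired) from dot-bracket notation."""
    stack: List[int] = []
    partners = [-1] * len(structure)
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            partners[i], partners[j] = j, i
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return partners


def segment_in_stem(structure: str, start: int, end: int, min_paired: float = 0.7) -> bool:
    """True if segment [start, end) pairs only outside itself and is >= min_paired paired.

    min_paired = 0.7 is a hypothetical cutoff chosen for illustration.
    """
    if end <= start:
        raise ValueError("segment must be non-empty")
    partners = pair_table(structure)
    paired = [i for i in range(start, end) if partners[i] != -1]
    # A segment that pairs with itself forms its own hairpin rather than one arm of a stem.
    if any(start <= partners[i] < end for i in paired):
        return False
    return len(paired) / (end - start) >= min_paired


hairpin = "((((((...))))))"
print(segment_in_stem(hairpin, 0, 6))   # True: the 5' arm of the stem
print(segment_in_stem(hairpin, 3, 12))  # False: spans the loop and pairs with itself
```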

Figure 1

Fold-back secondary structures involving miRNAs (red) and their flanking sequences (black), as predicted computationally using RNAfold (35). (A) miR-84, an miRNA with similarity to let-7 RNA. (B) miR-1, an miRNA highly conserved in evolution. (C) miR-56 and miR-56*, the only two miRNAs cloned from both sides of the same fold-back. (D) The mir-35–mir-41 cluster.

Table 1

miRNAs cloned from C. elegans. The 300 RNA clones represented 54 different miRNAs. Also included are miR-39, miR-65, and miR-69, three miRNAs predicted based on homology and/or proximity to cloned miRNAs. miR-39 and miR-69 have been validated by Northern analysis (Fig. 3), whereas miR-65 is not sufficiently divergent to be readily distinguished by Northern analysis. All C. elegans sequence analyses relied on WormBase, release WS45 (33). Some miRNAs were represented by clones of different lengths, owing to heterogeneity at the miRNA 3′ terminus. The observed lengths are indicated, as is the sequence of the most abundant length. Comparison to C. briggsae shotgun sequencing traces revealed miRNA orthologs with 100% sequence identity (+++) and potential orthologs with >90% (++) and >75% (+) sequence identity (24, 34). Five miRNA genomic clusters are indicated with square brackets. Naming of miRNAs was coordinated with the Tuschl and Ambros groups (25, 26).
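The conservation symbols in Table 1 amount to thresholds on percent identity with the C. briggsae sequence. The mapping below is a minimal sketch that assumes the identity value has already been computed; how identity was measured against the shotgun traces is described in (24, 34), and the function name is hypothetical.

```python
def conservation_symbol(identity_percent: float) -> str:
    """Map percent identity to Table 1's symbols: 100% '+++', >90% '++', >75% '+'."""
    if not 0.0 <= identity_percent <= 100.0:
        raise ValueError("identity must be between 0 and 100")
    if identity_percent == 100.0:
        return "+++"
    if identity_percent > 90.0:
        return "++"
    if identity_percent > 75.0:
        return "+"
    return ""  # below the thresholds used in Table 1


for value in (100.0, 95.5, 90.0, 75.0):
    print(value, repr(conservation_symbol(value)))
```

Note that the thresholds are strict inequalities, so exactly 90% falls in the "+" class and exactly 75% receives no symbol.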


We propose that most of the miRNAs are expressed from independent transcription units, previously unidentified because they do not contain an open reading frame (ORF) or other features required by current gene-recognition algorithms. No miRNAs matched a transcript validated by an annotated C. elegans expressed sequence tag (EST), and most were at least 1 kb from the nearest annotated sequences (Table 1). Even the miRNA genes near predicted coding regions or within predicted introns are probably expressed separately from the annotated genes. If most miRNAs were expressed from the same primary transcript as the predicted protein, their orientation would be predominantly the same as the predicted mRNA, but no such bias in orientation was observed (Table 1). Likewise, other types of RNA genes located within C. elegans intronic regions are usually expressed from independent transcription units (27).

Whereas both lin-4 and let-7 RNAs reside on the 5′ arm of their fold-back structures (1, 3), only about a quarter of the other miRNAs lie on the 5′ arm of their proposed fold-back structures, as exemplified by miR-84 (Table 1 and Fig. 1A). All the others are on the 3′ arm, as exemplified by miR-1 (Table 1 and Fig. 1B). This implies that the stable product of Dicer processing can reside on either arm of the precursor and that features of the miRNA or its precursor—other than the loop connecting the two arms—must determine which side of the fold-back contains the stable product.
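The 5′-arm versus 3′-arm distinction can be read directly off a predicted hairpin. The sketch below assumes a single unbranched hairpin in dot-bracket notation, so that the first ")" closes the terminal loop; branched precursors would need a fuller treatment. The function name and coordinates are illustrative.

```python
def arm_of(structure: str, start: int, end: int) -> str:
    """Classify segment [start, end) of a simple hairpin as '5p', '3p', or 'ambiguous'.

    Assumes one unbranched hairpin: the last '(' before the first ')' borders the
    terminal loop on its 5' side, and that first ')' borders it on the 3' side.
    """
    first_close = structure.index(")")
    last_open = structure.rindex("(", 0, first_close)
    reaches_5p_stem = start <= last_open      # segment includes 5' stem nucleotides
    reaches_3p_stem = end - 1 >= first_close  # segment includes 3' stem nucleotides
    if reaches_5p_stem and not reaches_3p_stem:
        return "5p"
    if reaches_3p_stem and not reaches_5p_stem:
        return "3p"
    return "ambiguous"  # spans the loop, or lies entirely within it


hairpin = "((((((...))))))"
print(arm_of(hairpin, 0, 6))   # 5p, the arm carrying lin-4, let-7, and miR-84
print(arm_of(hairpin, 9, 15))  # 3p, the arm carrying miR-1
```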

When compared with the RNA fragments cloned from E. coli, the miRNAs had distinctive length and sequence features (Fig. 2). The E. coli fragments had a broad length distribution, ranging from 15 to 29 nt, which reflects the size-selection limits imposed during the cloning procedure (23). In contrast, the miRNAs had a much tighter length distribution, centering on 21–24 nt, coincident with the known specificity of Dicer processing (Fig. 2A). The miRNA sequence composition preferences were most striking at the 5′ end, where there was a strong preference for U and against G at the first position and then a deficiency of U at positions 2 through 4 (Fig. 2B). miRNAs were also generally deficient in C, except at position 4. These composition preferences were not present in the clones representing E. coli RNA fragments.
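The two comparisons in Fig. 2 reduce to simple counting: a histogram of clone lengths (Fig. 2A) and the frequency of each nucleotide at a given position from the 5′ end (Fig. 2B). A minimal sketch, assuming clones are given as RNA strings; the toy sequences are hypothetical, and the paper's additional step of collapsing each set of paralogs to a consensus sequence is omitted here.

```python
from collections import Counter
from typing import Dict, Iterable, List


def length_histogram(seqs: Iterable[str]) -> Counter:
    """Count clones by length in nucleotides (cf. Fig. 2A)."""
    return Counter(len(s) for s in seqs)


def position_frequencies(seqs: List[str], position: int) -> Dict[str, float]:
    """Fraction of A, C, G, U at a 1-based position counted from the 5' end (cf. Fig. 2B)."""
    if position < 1:
        raise ValueError("position is 1-based")
    bases = [s[position - 1].upper() for s in seqs if len(s) >= position]
    if not bases:
        raise ValueError("no sequence is long enough for this position")
    counts = Counter(bases)
    return {b: counts[b] / len(bases) for b in "ACGU"}


toy = ["UGAGGUAGUAGGUUGUAUAGUU", "UCCCUGAGAAUUCUCGAACAGCUU", "GAUCGAUCGAUCGAUC"]
print(length_histogram(toy))
print(position_frequencies(toy, 1))
```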

Figure 2

Unique sequence features of the miRNAs. (A) Length distribution of the clones representing E. coli RNA fragments (white bars) and C. elegans miRNAs (black bars). (B) Sequence composition of the unique clones representing C. elegans miRNAs and E. coli RNA fragments. The height of each letter is proportional to the frequency of the indicated nucleotide. Solid letters correspond to specific positions relative to the ends of the clones; outlined letters represent the aggregate composition of the interior of the clones. To avoid overrepresentation from groups of related miRNAs in this analysis, each set of paralogs was represented by its consensus sequence.

The expression of 20 cloned miRNAs was examined, and all but two (miR-41 and miR-68) were readily detected on Northern blots (Fig. 3). For these 18 miRNAs with detectable expression, the dominant form was the mature 20–24 nt fragment(s), though for most, a longer species was also detected at the mobility expected for the fold-back precursor RNA. Fold-back precursors for lin-4 and let-7 have also been observed, particularly at the stage in development when the stRNA is first expressed (1, 14, 15).

Figure 3

Expression of newly found miRNAs and let-7 RNA during C. elegans development. Northern blots probed total RNA from mixed-stage worms (Mixed), worms staged as indicated, and glp-4 (bn2) adult worms (24). Specificity controls ruled out cross-hybridization among probes for miRNAs from the mir-35–mir-41 cluster (24). Other blots indicate that miR-46 or -47, miR-56, miR-64 or -65, miR-66, and miR-80 are expressed constitutively throughout development (30).

Because the miRNAs resemble stRNAs, their temporal expression was examined. RNA from wild-type embryos, the four larval stages (L1 through L4), and young adults was probed. RNA from glp-4 (bn2) young adults, which are severely depleted in germ cells (28), was also probed because miRNAs might have critical functions in the germ line, as suggested by the finding that worms deficient in Dicer have germ line defects and are sterile (14, 29). Many miRNAs have intriguing expression patterns during development (Fig. 3). For example, the expression of miR-84, an miRNA with 77% sequence identity to let-7 RNA, was found to be indistinguishable from that of let-7 (Fig. 3). Thus, it is tempting to speculate that miR-84 is an stRNA that works in concert with let-7 RNA to control the larval-to-adult transition, an idea supported by the identification of plausible binding sites for miR-84 in the 3′ UTRs of appropriate heterochronic genes (30).
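Identity figures such as the 77% quoted for miR-84 and let-7 RNA are ratios of matching positions to compared positions. The sketch below computes an ungapped identity with the two sequences aligned at their 5′ ends, an assumption made here for simplicity; the paper does not state which alignment produced its value, and the toy sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity, aligning both sequences at the 5' end.

    Positions beyond the end of the shorter sequence are ignored (an assumption;
    gapped alignments or other length conventions give different numbers).
    """
    n = min(len(a), len(b))
    if n == 0:
        raise ValueError("empty sequence")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / n


print(percent_identity("UGAGGUAG", "UGAGGUAA"))  # 7 of 8 positions match: 87.5
```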

Nearly all of the miRNAs appear to have orthologs in other species, as would be expected if they had evolutionarily conserved regulatory roles. About 85% of the newly found miRNAs had recognizable homologs in the available C. briggsae genomic sequence, which at the time of our analysis included about 90% of the C. briggsae genome (Table 1). Over 40% of the miRNAs appeared to be identical in C. briggsae, as seen with the lin-4 and let-7 RNAs (1, 3). Those miRNAs not absolutely conserved between C. briggsae and C. elegans might still have important functions; they may have co-varied more readily with their target sites because, for instance, they might have fewer target sites. When the sequence of the miRNA differs from that of its homologs, there is usually a compensatory change in the other arm of the fold-back to maintain pairing, which provides phylogenetic evidence for the existence and importance of the fold-back secondary structures. let-7, but not lin-4, has discernible homologs in more distantly related organisms, including Drosophila and human (31). At least seven other miRNA genes (mir-1, mir-2, mir-34, mir-60, mir-72, mir-79, and mir-84) appear to be conserved in Drosophila, and most of these (mir-1, mir-34, mir-60, mir-72, and mir-84) appear to be conserved in humans (24). The most highly conserved miRNA found, miR-1, is expressed throughout C. elegans development (Fig. 3) and therefore is unlikely to control developmental timing but may control tissue-specific events.
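The compensatory-change argument can be checked mechanically: for each paired position where two aligned hairpin sequences differ, ask whether the substituted nucleotide can still pair with its partner in the same sequence. The sketch below assumes both sequences fold into the same dot-bracket structure and counts Watson-Crick and G·U wobble pairs as pairing; the helper names and toy sequences are hypothetical.

```python
from typing import Dict, List

# Watson-Crick pairs plus G-U wobble pairs.
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pair_table(structure: str) -> List[int]:
    """Partner index for each position (-1 if unpaired) from dot-bracket notation."""
    stack: List[int] = []
    partners = [-1] * len(structure)
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            partners[i], partners[j] = j, i
    if stack:
        raise ValueError("unbalanced structure")
    return partners


def pairing_preserved(reference: str, variant: str, structure: str) -> Dict[int, bool]:
    """For each paired position where variant differs from reference, report whether
    the variant nucleotide can still pair with its partner within variant."""
    if not len(reference) == len(variant) == len(structure):
        raise ValueError("sequences and structure must have equal length")
    partners = pair_table(structure)
    result: Dict[int, bool] = {}
    for i, (ref_nt, var_nt) in enumerate(zip(reference, variant)):
        j = partners[i]
        if ref_nt != var_nt and j != -1:
            result[i] = (var_nt, variant[j]) in ALLOWED_PAIRS
    return result


structure = "(((...)))"
print(pairing_preserved("GGGAAACCC", "GAGAAACUC", structure))  # G->A offset by C->U
print(pairing_preserved("GGGAAACCC", "GAGAAACCC", structure))  # G->A, no compensation
```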

The distribution of miRNA genes within the C. elegans genome is not random (Table 1). For example, clones for six miRNA paralogs clustered within an 800–base pair (800-bp) fragment of chromosome II (Table 1). Computer folding readily identified the fold-back structures for the six cloned miRNAs of this cluster and predicted the existence of a seventh paralog, miR-39 (Fig. 1D). Northern analysis confirmed the presence and expression of miR-39 (Fig. 3). The homologous cluster in C. briggsae appears to have eight related miRNAs. Some of the miRNAs in the C. elegans cluster are more similar to each other than to those of the C. briggsae cluster, and vice versa, indicating that the size of the cluster has been quite dynamic over a short evolutionary interval, with expansion and perhaps also contraction since the divergence of these two species.
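Grouping genes into genomic clusters such as the mir-35–mir-41 cluster can be sketched as single-linkage grouping of loci on the same chromosome: sort by position and start a new group whenever the gap to the previous locus exceeds a cutoff. The cutoff, locus names, and coordinates below are hypothetical; the paper does not state a distance threshold for its clusters.

```python
from typing import List, Tuple

Locus = Tuple[str, str, int, int]  # (name, chromosome, start, end)


def find_clusters(loci: List[Locus], max_gap: int) -> List[List[str]]:
    """Return groups of two or more loci on one chromosome separated by gaps <= max_gap bp."""
    groups: List[dict] = []
    for name, chrom, start, end in sorted(loci, key=lambda locus: (locus[1], locus[2])):
        last = groups[-1] if groups else None
        if last is not None and last["chrom"] == chrom and start - last["end"] <= max_gap:
            last["names"].append(name)
            last["end"] = max(last["end"], end)
        else:
            groups.append({"chrom": chrom, "end": end, "names": [name]})
    return [g["names"] for g in groups if len(g["names"]) > 1]


toy_loci = [
    ("gene-a", "II", 1000, 1090),
    ("gene-b", "II", 1150, 1240),
    ("gene-c", "II", 1300, 1390),
    ("gene-d", "II", 90000, 90090),
    ("gene-e", "X", 1100, 1190),
]
print(find_clusters(toy_loci, max_gap=500))
```

Because only the gap to the previous locus is checked, a cluster can grow to any total length as long as neighbors stay close; that is the single-linkage design choice assumed here.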

Northern analysis of the miRNAs of the mir-35–mir-41 cluster showed that these miRNAs are highly expressed in the embryo and in young adults (with eggs), but not at other developmental stages (Fig. 3). For the six detectable miRNAs of this cluster, longer species with mobilities expected for the respective fold-back RNAs also appear to be expressed in the germ line; these longer RNAs were observed in wild-type L4 larvae (which have proliferating germ cells) but not in germ line–deficient mutant animals (Fig. 3) (30).

The close proximity of the miRNA genes within the mir-35–mir-41 cluster (Fig. 1D) suggests that they are all transcribed and processed from a single precursor RNA, an idea supported by the coordinate expression of these genes (Fig. 3). This operon-like organization and expression brings to mind several potential models for miRNA action. For example, each miRNA of the operon might target a different member of a gene family for translational repression. At the other extreme, they all might converge on the same target, just as lin-4 and let-7 RNAs potentially converge on the 3′ UTR of lin-14 (3).

Another four clusters were identified among the sequenced miRNA clones (Table 1). Whereas the clones from one cluster were not homologous to clones from other clusters, the clones within each cluster were usually related to each other, as seen with the mir-35–mir-41 cluster. The last miRNA of the mir-42–mir-44 cluster is also represented by a second gene, mir-45, which is not part of the cluster. This second gene appears to enable more constitutive expression of this miRNA (miR-44/45) as compared with the first two genes of the mir-42–mir-44 cluster, which are expressed predominantly in the embryo (Fig. 3).

Dicer processing of stRNAs differs from that of siRNAs in its asymmetry: RNA from only one arm of the fold-back precursor accumulates, whereas the remainder of the precursor quickly degrades (15). This asymmetry extends to nearly all the miRNAs. For the 35 miRNAs yielding more than one clone, RNAs were cloned from both arms of a hairpin in only one case, miR-56 (Fig. 1C and Table 1). The functional miRNA appears to be miR-56 and not miR-56*, as indicated by sequence conservation between C. elegans and C. briggsae orthologs, analogy to the other constituents of the mir-54–mir-56 cluster, and Northern blots detecting RNA from only the 3′ arm of the fold-back (30).
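The asymmetry tally can be sketched as grouping clones by the hairpin they came from and checking which arms are represented. The input format and toy records below are hypothetical, and arm assignments are assumed to have been made beforehand (for example, from the predicted fold-back).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def arm_summary(clones: Iterable[Tuple[str, str]]) -> Tuple[int, List[str]]:
    """clones: one (hairpin name, arm) pair per sequenced clone, arm in {'5p', '3p'}.

    Returns the number of hairpins with more than one clone and the sorted names
    of those hairpins cloned from both arms.
    """
    counts: Dict[str, int] = defaultdict(int)
    arms: Dict[str, Set[str]] = defaultdict(set)
    for name, arm in clones:
        counts[name] += 1
        arms[name].add(arm)
    multi = [name for name, n in counts.items() if n > 1]
    both = sorted(name for name in multi if arms[name] == {"5p", "3p"})
    return len(multi), both


toy_clones = [("hp1", "3p"), ("hp1", "3p"), ("hp2", "5p"), ("hp2", "3p"), ("hp3", "5p")]
print(arm_summary(toy_clones))
```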

We were surprised to find that few, if any, of the cloned RNAs had the features of siRNAs. No C. elegans clones matched the antisense of annotated coding regions. Of the 30 C. elegans clones not classified as miRNAs, 15 matched fragments of known RNA genes, such as transfer RNA (tRNA) and ribosomal RNA. Of the remaining 15 clones, the best candidate for a natural siRNA is GGAAAACGGGUUGAAAGGGA. It was the only C. elegans clone perfectly complementary to an annotated EST, hybridizing to the 3′ UTR of gene ZK418.9, a possible RNA-binding protein. Even if this and a few other clones do represent authentic siRNAs, they would still be greatly outnumbered by the 300 clones representing 54 different miRNAs. Our cloning protocol is not expected to preferentially exclude siRNAs; it was similar to the protocol that efficiently cloned exogenous siRNAs from Drosophila extracts (12). Instead, we propose that the preponderance of miRNAs among our clones indicates that in healthy, growing cultures of C. elegans, regulation by miRNAs normally plays a more dominant role than does regulation by siRNAs.

Regardless of the relative importance of miRNAs and siRNAs in the normal regulation of endogenous genes, our results show that small RNA genes like lin-4 and let-7 are more abundant in C. elegans than previously appreciated. Results from a parallel effort that directly cloned small RNAs from Drosophila and HeLa cells demonstrate that the same is true in other animals (25), a conclusion further supported by the orthologs to the C. elegans miRNAs that we identified through database searching. Many of the miRNAs that we identified are represented by only a single clone (Table 1), suggesting that our sequencing has not reached saturation and that there are over 100 miRNA genes in C. elegans.
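The inference from singleton clones to unsampled genes can be made quantitative with a species-richness estimator. The paper uses no estimator; the sketch below swaps in the standard Chao1 lower-bound estimate, S_obs + f1^2 / (2 f2), where S_obs is the number of distinct miRNAs observed and f1 and f2 are the numbers seen exactly once and exactly twice. The per-miRNA clone counts used here are toy values, not the Table 1 data.

```python
from collections import Counter
from typing import Iterable


def chao1(clone_counts: Iterable[int]) -> float:
    """Chao1 richness estimate from the number of clones observed per distinct miRNA.

    Uses S_obs + f1^2 / (2 f2); when no miRNA was seen exactly twice, falls back to
    the bias-corrected form S_obs + f1 (f1 - 1) / 2.
    """
    counts = [c for c in clone_counts if c > 0]
    freq = Counter(counts)
    s_obs, f1, f2 = len(counts), freq[1], freq[2]
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2


print(chao1([1, 1, 1, 2, 2, 5]))  # 6 observed + 3*3/(2*2) = 8.25
```

The more singletons relative to doubletons, the larger the estimated number of unseen genes, which is the intuition behind reading many single-clone miRNAs as a sign of unsaturated sequencing.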

We presume that there is a reason for the expression and evolutionary conservation of these small noncoding RNAs. Our favored hypothesis is that these newly found miRNAs, together with lin-4 and let-7 RNAs, constitute an important and abundant class of riboregulators, pairing to specific sites within mRNAs to direct the posttranscriptional regulation of these genes (32). The abundance and diverse expression patterns of miRNA genes imply that they function in a variety of regulatory pathways, in addition to their known role in the temporal control of developmental events.

* To whom correspondence should be addressed. E-mail: dbartel{at}

