Identification of Novel Genes Coding for Small Expressed RNAs


Science  26 Oct 2001:
Vol. 294, Issue 5543, pp. 853-858
DOI: 10.1126/science.1064921


In Caenorhabditis elegans, lin-4 and let-7 encode 22- and 21-nucleotide (nt) RNAs, respectively, which function as key regulators of developmental timing. Because the appearance of these short RNAs is regulated during development, they are also referred to as small temporal RNAs (stRNAs). We show that many 21- and 22-nt expressed RNAs, termed microRNAs, exist in invertebrates and vertebrates and that some of these novel RNAs, similar to let-7 stRNA, are highly conserved. This suggests that sequence-specific, posttranscriptional regulatory mechanisms mediated by small RNAs are more general than previously appreciated.

Two distinct pathways exist in animals and plants in which 21- to 23-nt RNAs function as posttranscriptional regulators of gene expression. Small interfering RNAs (siRNAs) act as mediators of sequence-specific mRNA degradation in RNA interference (RNAi) (1–5), whereas stRNAs regulate developmental timing by mediating sequence-specific repression of mRNA translation (6–11). siRNAs and stRNAs are excised from double-stranded RNA (dsRNA) precursors by Dicer (12–14), a multidomain ribonuclease III protein, thus producing RNA species of similar sizes. However, siRNAs are believed to be double-stranded (2, 5, 12), whereas stRNAs are single-stranded (8).

We previously developed a directional cloning procedure to isolate siRNAs after processing of long dsRNAs in Drosophila melanogaster embryo lysate (2). Briefly, 5′ and 3′ adapter molecules were ligated to the ends of a size-fractionated RNA population, followed by reverse transcription polymerase chain reaction (PCR) amplification, concatamerization, cloning, and sequencing. This method, originally intended to isolate siRNAs, led to the simultaneous identification of 16 novel 20- to 23-nt short RNAs, which are encoded in the D. melanogaster genome and are expressed in 0- to 2-hour embryos (Table 1). The method was adapted to clone RNAs in a similar size range from HeLa cell total RNA (15), which led to the identification of 21 novel human microRNAs (Table 2), thus providing further evidence for the existence of a large class of small RNAs with potential regulatory roles. Because of their small size, and in agreement with the authors of two related papers in this issue (16, 17), we refer to these novel RNAs as microRNAs (miRNAs). The miRNAs we studied are abbreviated as miR-1 to miR-33, and the genes encoding miRNAs are named mir-1 to mir-33. Highly homologous miRNAs are referred to by the same gene number, but followed by a lowercase letter; multiple genomic copies of a mir gene are annotated by adding a dash and a number.
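The naming convention just described can be illustrated with a small parser. This is only a sketch of the rule as stated in the text; the `MirName` type, the `parse_mir_name` function, and the regular expression are hypothetical helpers introduced for illustration and are not part of any published nomenclature tool.

```python
import re
from typing import NamedTuple, Optional


class MirName(NamedTuple):
    prefix: str             # "mir" (gene), "miR" (mature RNA), or "let"
    number: int             # gene number, e.g. 7 in let-7
    variant: Optional[str]  # lowercase letter marking a highly homologous miRNA
    copy: Optional[int]     # number after the dash for one of several genomic copies


# Assumed pattern: prefix, dash, number, optional letter, optional "-copy".
_NAME = re.compile(r"^(mir|miR|let)-(\d+)([a-z])?(?:-(\d+))?$")


def parse_mir_name(name: str) -> MirName:
    """Split a name such as 'let-7a-1' into its convention-defined parts."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognized miRNA or gene name: {name!r}")
    prefix, number, variant, copy = match.groups()
    return MirName(prefix, int(number), variant, int(copy) if copy else None)


if __name__ == "__main__":
    for example in ("mir-6", "let-7d", "let-7a-1"):
        print(example, parse_mir_name(example))
```

Under this reading, let-7a-1 and let-7a-3 denote two genomic copies of the same let-7a gene, whereas let-7a and let-7b denote distinct but highly homologous miRNAs.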

Table 1

D. melanogaster miRNAs. The sequences given represent the most abundant, and typically longest, miRNA sequence identified by cloning; miRNAs frequently vary in length by one or two nucleotides at their 3′ termini. From 222 short RNAs sequenced, 69 (31%) corresponded to miRNAs, 103 (46%) to already characterized functional RNAs (rRNA, 7SL RNA, and tRNA), 30 (14%) to transposon RNA fragments, and 20 (10%) sequences had no database entry. The frequency for cloning a particular miRNA as a percentage relative to all identified miRNAs is indicated. Results of Northern blotting of total RNA isolated from staged populations of D. melanogaster are summarized. E, embryo; L, larval stage; P, pupa; A, adult; S2, Schneider-2 cells. The strength of the signal within each blot is represented from strongest (+++) to undetected (−). let-7 stRNA was probed as the control. GenBank accession numbers and homologs of miRNAs identified by database searching in other species are provided in (21).

Table 2

Human miRNAs. From 220 short RNAs sequenced, 100 (45%) corresponded to miRNAs, 53 (24%) to already characterized functional RNAs (rRNA, snRNA, and tRNA), and 67 (30%) of the sequences had no database entry. Results of Northern blotting of total RNA isolated from different vertebrate species and S2 cells are indicated. For legend, see Table 1.
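The percentages quoted for the human clones follow directly from the counts. A minimal sketch of the arithmetic is given below; the function name and the dictionary keys are illustrative choices, and rounding to the nearest whole percent is an assumption about how the figures were reported.

```python
from typing import Dict


def category_percentages(counts: Dict[str, int]) -> Dict[str, int]:
    """Return each category's share of all sequenced clones, in whole percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


# Counts taken from the Table 2 legend (220 short RNAs sequenced from HeLa cells).
human_clones = {
    "miRNA": 100,
    "characterized functional RNA": 53,
    "no database entry": 67,
}

if __name__ == "__main__":
    print(category_percentages(human_clones))
```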


The expression and size of the cloned, endogenous short RNAs were also examined by Northern blotting (Fig. 1 and Tables 1 and 2). For analysis of D. melanogaster RNAs, total RNA was prepared from different developmental stages, as well as from cultured Schneider-2 (S2) cells, which were originally derived from 20- to 24-hour D. melanogaster embryos (18) (Fig. 1 and Table 1). miR-3 to miR-7 are expressed only during embryogenesis and not at later developmental stages. The temporal expression of miR-1, miR-2, and miR-8 to miR-13 was less restricted. These miRNAs were observed at all developmental stages, and significant variations in the expression levels were sometimes observed. Interestingly, miR-1, miR-3 to miR-6, and miR-8 to miR-11 were completely absent from cultured S2 cells, whereas miR-2, miR-7, miR-12, and miR-13 were present in S2 cells, therefore indicating cell type–specific miRNA expression. miR-1, miR-8, and miR-12 expression patterns are similar to those of lin-4 stRNA in C. elegans, as their expression is strongly up-regulated in larvae and sustained to adulthood (19). miR-9 and miR-11 are present at all stages but are strongly reduced in the adult, which may reflect a maternal contribution from germ cells or expression in one sex only.

Figure 1

Expression of miRNAs. Representative examples of Northern blot analysis are depicted (21). The position of 76-nt val-tRNA is indicated on the blots; 5S rRNA serves as a loading control. (A) Northern blots of total RNA isolated from staged populations of D. melanogaster, probed for the indicated miRNA. E, embryo; L, larval stage; P, pupa; A, adult; S2, Schneider-2 cells. (B) Northern blots of total RNA isolated from HeLa cells, mouse kidneys, adult zebrafish, frog ovaries, and S2 cells, probed for the indicated miRNA.

The mir-3 to mir-6 genes are clustered (Fig. 2A), and mir-6 is present as a triple repeat with slight variations in the mir-6 precursor sequence but not in the miRNA sequence itself. The expression profiles of miR-3 to miR-6 are highly similar (Table 1), which suggests that a single embryo-specific precursor transcript may give rise to the different miRNAs or that the same enhancer regulates miRNA-specific promoters. Several other fly miRNAs are also found in gene clusters (Fig. 2A).

Figure 2

Genomic organization of miRNA gene clusters. The precursor structure is indicated as a box, and the location of the miRNA within the precursor is shown in black; the chromosomal location is also indicated to the right. (A) D. melanogaster miRNA gene clusters. (B) Human miRNA gene clusters. The cluster of let-7a-1 and let-7f-1 is separated by 26,500 nt from a copy of let-7d on chromosomes 9 and 17. A cluster of let-7a-3 and let-7b, separated by 938 nt on chromosome 22, is not illustrated.

The expression of HeLa cell miR-15 to miR-33 was examined by Northern blotting using HeLa cell total RNA, in addition to total RNA prepared from mouse kidney, adult zebrafish, Xenopus laevis ovary, and D. melanogaster S2 cells (Fig. 1B and Table 2). miR-15 and miR-16 are encoded in a gene cluster (Fig. 2B) and are detected in mouse kidney, adult zebrafish, and very weakly in frog ovary, which may result from miRNA expression in somatic ovary tissue rather than in oocytes. mir-17 to mir-20 are also clustered (Fig. 2B) and are expressed in HeLa cells and adult zebrafish, but undetectable in mouse kidney and frog ovary (Fig. 1 and Table 2), and therefore represent a likely case of tissue-specific miRNA expression.

The majority of vertebrate and invertebrate miRNAs identified in this study are not related by sequence, but a few exceptions do exist and are similar to results previously reported for let-7 RNA (8). Sequence analysis of the D. melanogaster miRNAs revealed four such instances of sequence conservation between invertebrates and vertebrates. miR-1 homologs are encoded in the genomes of C. elegans, C. briggsae, and humans and are found in cDNAs from zebrafish, mice, cows, and humans. The expression of mir-1 was detected by Northern blotting in total RNA from adult zebrafish and C. elegans, but not in total RNA from HeLa cells or mouse kidney (Table 2) (20). Interestingly, although mir-1 and let-7 are both expressed in adult flies (Fig. 1A) (8) and are both undetected in S2 cells, only let-7 is detectable in HeLa cells. This represents another case of tissue-specific expression of an miRNA and indicates that miRNAs may play a regulatory role not only in developmental timing but also in tissue specification. miR-7 homologs were found by database searches of the mouse and human genomes and of expressed sequence tags (ESTs). Two mammalian miR-7 variants are predicted by sequence analysis in mice and humans and were detected by Northern blotting in HeLa cells and adult zebrafish, but not in mouse kidney (Table 2). Similarly, we identified mouse and human miR-9 and miR-10 homologs by database searches but only detected mir-10 expression in mouse kidney.

The identification of evolutionarily related miRNAs, which have already acquired multiple sequence mutations, was not possible by standard bioinformatic searches. Direct comparison of the D. melanogaster miRNAs with the human miRNAs identified an 11-nt segment shared between D. melanogaster miR-6 and HeLa miR-27, but no further relationships were detected. It is possible that most miRNAs only act on a single target and therefore allow for rapid evolution by covariation. Highly conserved miRNAs may act on more than one target sequence and therefore have a reduced probability for evolutionary drift by covariation (8). An alternative interpretation is that the sets of miRNAs from D. melanogaster and humans are fairly incomplete and that many more miRNAs remain to be discovered, which will provide the missing evolutionary links.
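A direct comparison of the kind described above can be approximated by searching for the longest contiguous segment shared by two sequences. The sketch below uses a standard longest-common-substring dynamic program; it is not the comparison procedure actually used in this study, and the two example sequences are short made-up strings, not the miR-6 or miR-27 sequences.

```python
def longest_shared_segment(a: str, b: str) -> str:
    """Return the longest contiguous segment present in both sequences.

    If several segments tie for the maximum length, the one that ends
    earliest in `a` is returned.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)  # match lengths ending at the previous position of a
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1  # extend the match ending one step back
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


if __name__ == "__main__":
    # Hypothetical toy sequences, chosen only to exercise the function.
    print(longest_shared_segment("AAGGCUCAUGG", "UUGCUCAUCC"))
```

Because this search requires exact contiguous identity, it would miss related miRNAs that have accumulated scattered mutations, which is the limitation noted in the text.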

lin-4 and let-7 stRNAs were predicted to be excised from longer transcripts that contain stem-loop structures about 30 base pairs in length (6, 8). Database searches for newly identified miRNAs revealed that all miRNAs are flanked by sequences that have the potential to form stable stem-loop structures (Figs. 3 and 4). In many cases, we were able to detect the predicted precursors (about 70 nt) by Northern blotting (Fig. 1). Some miRNA precursor sequences were also identified in mammalian cDNA (EST) databases (21), indicating that primary transcripts longer than 70-nt stem-loop precursors also exist. We never cloned a 22-nt RNA complementary to any of the newly identified miRNAs, and it is as yet unknown how the cellular processing machinery distinguishes between an miRNA and its complementary strand. Comparative analysis of the precursor stem-loop structures indicates that the loops adjacent to the base-paired miRNA segment can be located on either side of the miRNA sequence (Figs. 3 and 4), suggesting that neither the 5′ nor the 3′ location of the stem-closing loop is the determinant of miRNA excision. It is also unlikely that the structure, length, or stability of the precursor stem is the critical determinant because the base-paired structures are frequently imperfect and interspersed by G/U wobbles and less stable, non–Watson-Crick base pairs such as G/A, U/U, C/U, and A/A. Therefore, a sequence-specific recognition process is a likely determinant for miRNA excision, perhaps mediated by members of the Argonaute (RDE-1/AGO1/PIWI) protein family. Two members of this family, ALG-1 and ALG-2, have recently been shown to be critical for stRNA processing in C. elegans (13). Members of the Argonaute protein family are also involved in RNAi and posttranscriptional gene silencing. In D. melanogaster, these include Argonaute2, a component of the siRNA-endonuclease complex (RISC) (22), and its relative Aubergine, which is important for silencing of repeat genes (23). In other species, these include RDE-1 in C. elegans (24); Argonaute1 in Arabidopsis thaliana (25); and QDE-2 in Neurospora crassa (26). In addition to the RNase III Dicer (12, 13), the Argonaute family represents another evolutionary link between RNAi and miRNA maturation.
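The imperfect pairing of precursor stems can be made concrete by tallying base-pair types along two opposing arms. The following is a simplified sketch that assumes a bulge-free stem with arms of equal length; the function names and the example arms are hypothetical, and it is not a substitute for a secondary-structure prediction program such as mfold.

```python
from collections import Counter

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_pair(x: str, y: str) -> str:
    """Label a pair of opposing bases as Watson-Crick, G/U wobble, or mismatch."""
    if (x, y) in WATSON_CRICK:
        return "watson_crick"
    if (x, y) in WOBBLE:
        return "wobble"
    return "mismatch"  # includes G/A, U/U, C/U, A/A and similar pairs


def summarize_stem(arm_5p: str, arm_3p: str) -> Counter:
    """Count pair types for a stem whose arms are both written 5' to 3'.

    Assumes no bulges: base i of the 5' arm faces base i of the reversed 3' arm.
    """
    if len(arm_5p) != len(arm_3p):
        raise ValueError("this sketch only handles arms of equal length")
    return Counter(classify_pair(x, y) for x, y in zip(arm_5p, reversed(arm_3p)))


if __name__ == "__main__":
    # Hypothetical toy arms, not taken from any precursor in Figs. 3 or 4.
    print(summarize_stem("GGAUCAA", "AUGGUCU"))
```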

Figure 3

Predicted precursor structures of D. melanogaster miRNAs. RNA secondary structure prediction was performed using mfold version 3.1 (32) and manually refined to accommodate G/U wobble base pairs in the helical segments. The miRNA sequence is underlined. The actual size of the stem-loop structure is not known experimentally and may be slightly shorter or longer than represented. Multicopy miRNAs and their corresponding precursor structures are also shown.

Figure 4

Predicted precursor structures of human miRNAs. For legend, see Fig. 3.

Despite advanced genome projects, computer-assisted detection of genes encoding functional RNAs remains problematic (27). Cloning of expressed, short functional RNAs, similar to EST approaches (RNomics), is a powerful alternative and probably the most efficient method for identification of such novel gene products (28–31). The number of functional RNAs has been widely underestimated and is expected to grow rapidly because of the development of new functional RNA cloning methodologies.

The challenge for the future is to define the function and the potential targets of these novel miRNAs by using bioinformatics as well as genetics and to establish a complete catalog of time- and tissue-specific distribution of the already identified and yet to be uncovered miRNAs. lin-4 and let-7 stRNAs negatively regulate the expression of proteins encoded by mRNAs in which 3′ untranslated regions contain sites of complementarity to the stRNA (9–11). Because these interaction domains are only 6 to 10 base pairs long and often contain small bulges and G/U wobbles (9–11), the prediction of miRNA target mRNAs represents a challenging bioinformatic and/or genetic task. A profound understanding of the expression, processing, and action of miRNAs may enable the development of more general methods to direct the regulation of specific gene targets and may also lead to new ways of reprogramming tissues.
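To illustrate why target prediction is difficult, the simplest possible search can be sketched: scanning a 3′ untranslated region for short stretches that are perfectly complementary to part of an miRNA. This is a deliberately naive sketch, not a method used in this study. The function names, the default site length of 6, and the example sequences are assumptions, and the search ignores the bulges and G/U wobbles that real interaction sites often contain.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the Watson-Crick reverse complement of an RNA sequence."""
    return rna.translate(_COMPLEMENT)[::-1]


def find_complementary_sites(mirna: str, utr: str, site_len: int = 6) -> List[Tuple[int, int]]:
    """List (utr_start, mirna_start) for every perfectly complementary site.

    Each window of `site_len` nt in the miRNA is reverse-complemented and
    searched for in the UTR; overlapping occurrences are all reported.
    """
    hits = []
    for m_start in range(len(mirna) - site_len + 1):
        site = reverse_complement(mirna[m_start:m_start + site_len])
        u_start = utr.find(site)
        while u_start != -1:
            hits.append((u_start, m_start))
            u_start = utr.find(site, u_start + 1)
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical toy sequences chosen only to exercise the function.
    print(find_complementary_sites("ACGUGA", "GGUCACGUCC"))
```

With sites this short, exact matches are expected to occur by chance in long transcripts, which is one reason a purely sequence-based search of this kind is insufficient on its own.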

  • * To whom correspondence should be addressed. E-mail: ttuschl{at}

