Report

Genome-Wide Identification of Human RNA Editing Sites by Parallel DNA Capturing and Sequencing

Science  29 May 2009:
Vol. 324, Issue 5931, pp. 1210-1213
DOI: 10.1126/science.1170995

Abstract

Adenosine-to-inosine (A-to-I) RNA editing leads to transcriptome diversity and is important for normal brain function. To date, only a handful of functional sites have been identified in mammals. We developed an unbiased assay to screen more than 36,000 computationally predicted nonrepetitive A-to-I sites using massively parallel target capture and DNA sequencing. A comprehensive set of several hundred human RNA editing sites was detected by comparing genomic DNA with RNAs from seven tissues of a single individual. Specificity of our profiling was supported by observations of enrichment with known features of targets of adenosine deaminases acting on RNA (ADAR) and validation by means of capillary sequencing. This efficient approach greatly expands the repertoire of RNA editing targets and can be applied to studies involving RNA editing–related human diseases.

Adenosine-to-inosine (A-to-I) RNA editing converts a genomically encoded adenosine (A) into inosine (I), which in turn is read as guanosine (G), and increases transcriptomic diversity (1, 2). It is critical for normal brain function (3–7) and is linked to various disorders (8). To date, a total of 13 edited genes have been identified within nonrepetitive regions of the human genome (table S1). The limiting factor in the identification of RNA editing targets has been the number of locations that could be profiled by the sequencing of DNA and RNA samples. Even with recent developments in massively parallel DNA sequencing technologies (9), it still remains expensive to sequence whole genomes and transcriptomes, both of which are required to identify RNA editing targets. Here, we report an efficient and unbiased genome-wide approach to identify RNA editing sites that uses tailored target capture followed by massively parallel DNA sequencing.

We first compiled a set of 59,437 genomic locations enriched with RNA editing sites, excluding repetitive regions such as Alu (fig. S1) (10). To reduce biases in detection, the key criteria for previous predictions of editing targets (conservation, coding potential, and RNA secondary structure) (11–15) were not taken into account. Over 90% of the previously identified editing targets are present in this data set (table S1). We designed padlock probes (16) for 36,208 sites that best satisfied our criteria for probe design (table S2) (10). Sites near splicing junctions required two different probes [targeting genomic DNA (gDNA) and cDNA], giving rise to a total of 41,046 probes designed for 36,208 sites (table S2).

To identify RNA editing sites, we used gDNA and cDNA from seven different tissues (cerebellum, frontal lobe, corpus callosum, diencephalon, small intestine, kidney, and adrenal), all derived from a single individual so as to rule out polymorphisms among populations. The pool of probes was hybridized to gDNA and cDNA in separate reactions (Fig. 1A and fig. S2). We sequenced the amplicons and identified sites where an A allele was observed in gDNA, whereas at least a fraction of G reads were present in the cDNA samples. A majority of sites were covered with multiple reads (Fig. 1B). Two independent technical replicates were well correlated for both gDNA (Fig. 1C) and frontal lobe cDNA (Fig. 1D). In addition, the editing levels were highly correlated between the two replicates (Fig. 1E).

Fig. 1

Screening for RNA editing sites using padlock capture and massively parallel DNA sequencing technologies. (A) Schematic diagram of the padlock technology. The candidate RNA editing sites are specifically targeted by padlocks in both gDNA and cDNA samples from a single individual. Circles form when polymerase, deoxynucleotide triphosphates, and ligase are added; the circles are subsequently amplified and sequenced with an Illumina genome analyzer (Illumina, San Diego, CA). (B) Uniformity of target abundance distributions in sequences obtained for all samples. Each graph shows the abundance of captured target sequences for each target over all samples, in which targets are given in ranked order. Abundance is represented by the log10 of the target coverage normalized to the mean of the target coverage for the sample. The abundance of different sites is nonuniform because of capturing biases and expression-level variations. The gDNA and frontal lobe replicates were combined in this analysis. (C to E) Target capture is highly reproducible for technical replicates. (C) Correlation of coverage of sites for gDNA replicates (Pearson correlation, r = 0.962); (D) correlation of coverage of sites for frontal-lobe cDNA replicates (r = 0.998); and (E) correlation of RNA editing level in frontal-lobe replicates (r = 0.964). Editing level is the number of G reads divided by the total number of A and G reads when that total is ≥10.

A total of 57.8 million reads were obtained, among which 55.5 million sequences were mapped to the target regions (Table 1) (10). To identify RNA editing sites, we searched for positions where a homozygous A was seen in gDNA and more than 5% of reads were G in at least two of the seven cDNA samples with a log likelihood score of ≥2 (10). A total of 239 such sites (in 207 targets) with stringent thresholds were identified and referred to as class I (table S3), including 10 of all 13 known edited genes (tables S1 and S3).
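The read-count part of the class I criteria can be sketched as follows. This is a hedged illustration only. Here "homozygous A" is assumed to mean that every gDNA read at the site is A, the ≥10-read depth is borrowed from the editing-level definition in the Fig. 1 legend, and the log likelihood score (described in the supporting material, not here) is left out entirely. The function name and input layout are hypothetical.

```python
def passes_read_thresholds(gdna_counts, cdna_counts,
                           min_level=0.05, min_tissues=2, min_depth=10):
    """Check the read-count criteria for a candidate editing site.

    gdna_counts: dict mapping base -> read count in genomic DNA.
    cdna_counts: dict mapping tissue -> (A reads, G reads) in cDNA.
    The log likelihood filter from the paper is NOT implemented here.
    """
    # Assumption: homozygous A means A reads only, and at least one of them.
    if gdna_counts.get("A", 0) == 0:
        return False
    if any(n > 0 for base, n in gdna_counts.items() if base != "A"):
        return False

    edited_tissues = 0
    for a_reads, g_reads in cdna_counts.values():
        depth = a_reads + g_reads
        # "More than 5% of reads were G" is taken as a strict inequality.
        if depth >= min_depth and g_reads / depth > min_level:
            edited_tissues += 1
    return edited_tissues >= min_tissues


# Toy counts, invented for illustration.
site_cdna = {"cerebellum": (80, 20), "kidney": (95, 5), "frontal lobe": (50, 50)}
print(passes_read_thresholds({"A": 40}, site_cdna))
```

Setting `min_tissues=1` or lowering `min_level` relaxes the filter, which mirrors how less stringent candidate sets could be derived from the same counts.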

Table 1

Statistics of sequencing of samples used in this study.

To validate the class I set, we randomly selected 18 different sites, successfully amplified them with polymerase chain reaction, and sequenced them using the dideoxynucleotide (Sanger) method. We also tested gDNA and frontal lobe cDNA from two additional donors (a total of 12 samples per site). Fourteen of the 18 sites were clearly edited, most of them in all three donors (Fig. 2A and fig. S3). One of the four remaining sites, ZNF7, was edited at a 1.1% level (2 of 187 individually sequenced clones). The false discovery rate of the set is thus up to 17% (3 of 18 sites).
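The arithmetic behind these validation figures can be checked in a few lines. The variable names are illustrative; the counts are the ones given in the paragraph above.

```python
validated_sites = 18
clearly_edited = 14
low_level_edited = 1  # ZNF7, confirmed only by clone sequencing

# Sites with no editing confirmed by either method.
unconfirmed = validated_sites - clearly_edited - low_level_edited
fdr_upper_bound = unconfirmed / validated_sites

# ZNF7 editing level from individually sequenced clones.
znf7_level = 2 / 187

print(f"unconfirmed sites: {unconfirmed}")
print(f"FDR upper bound: {fdr_upper_bound:.1%}")   # 16.7%
print(f"ZNF7 editing level: {znf7_level:.1%}")     # 1.1%
```

The 17% figure is an upper bound because a site that fails Sanger validation may still be edited at a level too low for chromatogram traces to show, as the ZNF7 case illustrates.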

Fig. 2

Validation of RNA editing sites with conventional Sanger sequencing. (A) Sequencing chromatogram traces of an exemplary site, chr1:212596363, in gDNA and all seven tested cDNAs of the first donor and in gDNA and frontal lobe cDNA from two unrelated donors. Some nearby sites are also edited. A complete list of validated sites is in fig. S3. (B) At site chr8:145550000 [in the F-box and leucine-rich repeat protein 6 (FBXL6) gene], the genomic A in the stop codon (TAG) is highly edited in all three donors, allowing the addition of 29 amino acids to the protein. (C) The CADPS site, chr3:62398847, is edited in human (shown is the frontal-lobe cDNA), and the conserved site is edited in mouse as well (shown is the brain cDNA). The editing event leads to an amino acid change from glutamic acid (GAG) to glycine (GGG).

RNA editing occurs when ADARs (adenosine deaminases acting on RNA) bind to an extended RNA duplex within target RNAs (17, 18). Indeed, the class I set is significantly enriched, as compared with the 36,208-candidates set, with sites that are located in RNA double-stranded regions (Table 2 and table S4) (10). Previous studies have indicated that ADARs have a sequence preference for strong G depletion in the nucleotide 5′ to the editing site (19). This observation is in agreement with our findings (Table 2 and fig. S4).
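One simple way to look at the 5′ neighbor preference is to tabulate the base immediately upstream of each edited adenosine. The sketch below is only an illustration of that tabulation: the input format (transcript-strand sequence plus the index of the edited A) and the toy sequences are assumptions, not data from this study.

```python
from collections import Counter


def upstream_base_frequencies(sites):
    """Frequency of the base 5' of each edited A.

    sites: iterable of (sequence, index) pairs, where sequence is read on
    the transcript strand and index is the 0-based position of the edited A.
    Entries with no upstream base, or with no A at the index, are skipped.
    """
    counts = Counter()
    for seq, idx in sites:
        seq = seq.upper()
        if idx > 0 and seq[idx] == "A":
            counts[seq[idx - 1]] += 1
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()} if total else {}


# Toy input, invented for illustration.
toy_sites = [("CUAG", 2), ("AAAG", 1), ("GUAC", 2), ("GAC", 1)]
print(upstream_base_frequencies(toy_sites))
```

Comparing these frequencies between a called set and the full candidate set would show whether G is depleted at the −1 position.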

Table 2

Features of class I RNA editing sites.

Of the 239 class I sites, 55 (23%) are located in coding regions, 38 of which change amino acids (table S3), including one that adds 29 amino acids to the protein by changing a stop codon (UAG) to a tryptophan codon (UGG) (Fig. 2B). There is a clear bias against the coding regions (Table 2), where changes are less likely to be tolerated. Similarly, possible microRNA target sequences are significantly reduced in our set (Table 2).
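Because inosine is read as guanosine, the coding consequence of an edit can be worked out by substituting G for the edited A and translating with the standard genetic code. The following is a minimal sketch of that lookup; the function name and the compact table construction are illustrative choices.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered with U, C, A, G at each position.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def edit_codon(codon, position):
    """Return (unedited residue, edited residue) for an A-to-I edit.

    position is the 0-based index of the edited A within the codon;
    '*' denotes a stop codon.
    """
    codon = codon.upper().replace("T", "U")
    if codon[position] != "A":
        raise ValueError("edited position must be an adenosine")
    edited = codon[:position] + "G" + codon[position + 1:]
    return CODON_TABLE[codon], CODON_TABLE[edited]


print(edit_codon("UAG", 1))  # ('*', 'W'): stop codon read through as tryptophan
print(edit_codon("GAG", 1))  # ('E', 'G'): glutamic acid to glycine
```

The two calls correspond to the FBXL6 stop codon and the CADPS site described in the Fig. 2 legend.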

Sequence conservation has been the main criterion in various attempts to identify new RNA editing sites. However, it has been shown that editing is enriched in the primate lineage, mainly because of widespread editing in Alu repetitive elements (20–24). In the class I set, sites with flanking sequences conserved between human and mouse are significantly underrepresented (Table 2) (10). Of those sites that are highly conserved (fig. S5), we sequenced one located in the CADPS (Ca²⁺-dependent secretion activator) gene in mouse gDNA and cDNA samples and observed an editing signal. This site is probably edited in all vertebrates based on A-to-G changes in supporting expressed sequence tags. Fourteen of the 50 editing sites located in conserved regions harbor a G in at least one of eight other vertebrate genomes (table S5), a phenomenon previously observed in flies (25). From an evolutionary perspective, RNA editing may thus play a role similar to genetic mutation in creating genetic diversity. In contrast to mutation, however, RNA editing provides a much wider spectrum of “genetic dosage”; our data demonstrate that the level ranges from very low to full editing (fig. S6).

In agreement with previous observations that targets of RNA editing are involved in nervous-system function (7, 11–15, 26–28), we found that the class I sites were enriched with functions such as synapse, cell trafficking, and membrane. Furthermore, many sites are located within genes that are implicated in human brain-related diseases (table S6). In addition to class I sites, many more sites are likely to be edited. When we relaxed our criteria to require only one tissue to be edited, we identified an additional set of 330 potential candidate editing sites as the class II set (table S7). We validated a selected candidate from this set, GLI1 (Glioma-associated oncogene homolog 1, at site chr12:56150891), which was highly edited in the frontal lobe of all three donors (fig. S3). An additional set of 141 sites was identified as class III when the editing level threshold was reduced to 2% (table S8), which suggests that many targets may be edited at very low levels. By sequencing 118 clones of the class III site chr11:74994333 in MAP6 (microtubule-associated protein 6), we found 13 clones with a G at the editing site.
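The three tiers can be read as successively relaxed thresholds on per-tissue editing levels, sketched below. Several details are assumptions rather than statements from the text: that the 5% and 2% cutoffs are strict inequalities, that class III needs only one tissue above 2% (exposed as a parameter because the text does not say), and that the homozygous-A and log likelihood filters have already been applied upstream.

```python
def assign_class(levels, class3_min_tissues=1):
    """Assign a site to class 'I', 'II', 'III', or None.

    levels: per-tissue editing levels as fractions (G / (A + G)), with
    None for tissues lacking enough coverage to estimate a level.
    """
    observed = [x for x in levels if x is not None]
    above_5_percent = sum(x > 0.05 for x in observed)
    above_2_percent = sum(x > 0.02 for x in observed)

    if above_5_percent >= 2:
        return "I"    # edited in at least two tissues
    if above_5_percent == 1:
        return "II"   # criteria relaxed to a single tissue
    if above_2_percent >= class3_min_tissues:
        return "III"  # editing-level threshold lowered to 2%
    return None


# Toy levels, invented for illustration.
print(assign_class([0.30, 0.10, 0.00, None]))  # I
print(assign_class([0.30, 0.01]))              # II
print(assign_class([0.03, 0.04]))              # III
```

Each site lands in the most stringent class it satisfies, consistent with classes II and III being described as additional sets.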

Although it is unclear if the extensive editing of primate Alu sites has any biological role, it may require an increased expression of ADAR proteins in humans, which in turn may lead to the editing of non-Alu RNAs. In support of this scenario, most of the nonrepetitive sites we identified do not seem to be conserved beyond the primate lineage and may play roles in primate-specific functions. Many of the identified editing sites are located in noncoding RNAs that have recently been linked to brain function (29).

The approach described herein can be readily extended to a wider variety of tissues in normal and diseased individuals in order to identify additional RNA editing sites and measure their editing levels. The enlarged set of nonrepetitive RNA editing targets may help unravel rules of RNA editing in human diseases and behavior.

Supporting Online Material

www.sciencemag.org/cgi/content/full/324/5931/1210/DC1

Materials and Methods

Figs. S1 to S10

Tables S1 to S12

  • * These authors contributed equally to this work.

  • Present address: College of Medicine, Seoul National University, Seoul 110-799, Korea.

  • Present address: Department of Bioengineering, University of California, San Diego, CA 92093, USA.

References and Notes

  1. Materials and methods are available as supporting material on Science Online.
  2. We thank R. Emeson, M. P. Ball, and F. Isaacs for critical reading of the manuscript; P. Wang and Z. Liu (BioChain Institute) for helping collect human samples; M. Higuchi and P. Seeburg for providing ADAR2−/− mouse brain cDNA; Harvard Biopolymers Facility for help with Illumina sequencing; and A. Ahlford, H. Ebling, and J. Santosuosso for assistance with Sanger sequencing. E.Y.L. was supported by the Machiah foundation. Funding came from National Human Genome Research Institute Centers of Excellence in Genomic Science grant to G.M.C. The Illumina sequencing data are deposited at the National Center for Biotechnology Information Short Read Archive under accession number SRA008181.