The Antisense Transcriptomes of Human Cells

Science  19 Dec 2008:
Vol. 322, Issue 5909, pp. 1855-1857
DOI: 10.1126/science.1163853


Transcription in mammalian cells can be assessed at a genome-wide level, but it has been difficult to reliably determine whether individual transcripts are derived from the plus or minus strands of chromosomes. This distinction can be critical for understanding the relationship between known transcripts (sense) and the complementary antisense transcripts that may regulate them. Here, we describe a technique that can be used to (i) identify the DNA strand of origin for any particular RNA transcript, and (ii) quantify the number of sense and antisense transcripts from expressed genes at a global level. We examined five different human cell types and in each case found evidence for antisense transcripts in 2900 to 6400 human genes. The distribution of antisense transcripts was distinct from that of sense transcripts, was nonrandom across the genome, and differed among cell types. Antisense transcripts thus appear to be a pervasive feature of human cells, which suggests that they are a fundamental component of gene regulation.

The DNA in each normal human cell is virtually identical. The key to cellular differentiation therefore lies in understanding the gene products (transcripts and proteins) that are derived from the genome. For more than a decade, it has been possible to measure the levels of transcripts in a cell at the whole-genome level (1). The word “transcriptome” was coined to denote this genome-wide assessment (2). However, it has been difficult to determine, in a global fashion, which of the two strands of the chromosome (plus or minus) serves as the template for transcripts. Sense transcripts of protein-encoding genes produce functional proteins, whereas antisense transcripts are often thought to have a regulatory role (3–7).

Several unequivocal examples of antisense transcripts, such as those corresponding to imprinted genes, have been described [reviewed in (3–7)]. However, estimates of the fraction of genes associated with antisense transcripts in mammalian cells vary from less than 2% to more than 70% of the total genes (8–18). We have developed a technique called asymmetric strand-specific analysis of gene expression (ASSAGE) that allows unambiguous assignment of the DNA strand coding for a transcript. The key to this approach is the treatment of RNA with bisulfite, which changes all cytidine residues to uridine residues. The sequence of a bisulfite-treated RNA molecule can only be matched to one of the two possible DNA template strands (fig. S1). After generating cDNA from bisulfite-treated RNA with reverse transcriptase (RT), sequencing of the reverse transcription polymerase chain reaction (RT-PCR) product can be used to establish whether a particular RNA was transcribed from the plus or minus strand. To identify the DNA strands of origin for the entire transcriptome, we ligate cDNA fragments derived from bisulfite-treated RNA to adapters and then determine the sequence of one end of each fragment through sequencing-by-synthesis. The number and distribution of the sequenced tags provide information about the level of transcription of each gene in the analyzed cell population as well as the strand from which each transcript was derived.
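The strand-assignment logic can be illustrated with a minimal Python sketch. It is not the ASSAGE pipeline itself: the function names and the toy sequence are hypothetical, conversion is assumed to be complete, and tags are assumed to be reported in the orientation of the original RNA, using the DNA alphabet (so a converted C reads as T).

```python
# Hypothetical helper names; complete C->T conversion is assumed.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def bisulfite_convert(seq: str) -> str:
    """Model complete bisulfite conversion in the DNA alphabet (C read as T)."""
    return seq.replace("C", "T")


def strand_of_origin(tag: str, plus_strand: str) -> str:
    """Report which converted strand a converted tag matches.

    Returns "plus" or "minus" (the strand whose sequence the tag carries,
    so the template was the opposite strand), "ambiguous" if both
    converted strands contain the tag, or "unmapped" if neither does.
    """
    in_plus = tag in bisulfite_convert(plus_strand)
    in_minus = tag in bisulfite_convert(reverse_complement(plus_strand))
    if in_plus and in_minus:
        return "ambiguous"
    if in_plus:
        return "plus"
    if in_minus:
        return "minus"
    return "unmapped"


plus = "ACGTCCGATTGCAAGC"  # toy locus, not real genomic sequence
plus_tag = bisulfite_convert(plus[1:9])
minus_tag = bisulfite_convert(reverse_complement(plus)[2:10])
print(strand_of_origin(plus_tag, plus))
print(strand_of_origin(minus_tag, plus))
```

For this toy locus the two calls report plus and minus, respectively. Because conversion removes C from the transcript, a tag and its reverse complement no longer look alike, which is what makes the match one-sided.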

We used ASSAGE to study transcription in normal human peripheral blood mononuclear cells (PBMCs). Several quality controls were performed to evaluate the library of tags derived from this RNA source. First, we calculated the bisulfite conversion efficiency from the sequences of the tags and found that 95% of the C residues in the original RNA had been converted to U residues (19). Second, we determined whether the bisulfite treatment altered the distribution of tags by preparing libraries without bisulfite treatment. We found a good correlation between the number of sense tags per gene in the ASSAGE data and the number of tags per gene obtained by sequencing cDNA synthesized, without bisulfite treatment, from the same RNA (R² = 0.59). We also found a correlation between the relative expression levels determined by ASSAGE and those assessed by hybridization to microarrays [R² = 0.45 (19)].
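One way to estimate conversion efficiency from tag sequences is sketched below. This is an assumption about how such a calculation could be done, not the procedure described in (19): each tag is assumed to be already aligned, without gaps, to the unconverted reference in the orientation of the original RNA, and the function name and data are made up for illustration.

```python
def conversion_efficiency(alignments):
    """Fraction of reference C positions that read as T in the aligned tags.

    `alignments` is an iterable of (reference, tag) pairs of equal length,
    both written in the orientation of the original RNA.
    """
    converted = unconverted = 0
    for reference, tag in alignments:
        if len(reference) != len(tag):
            raise ValueError("reference and tag must have the same length")
        for ref_base, tag_base in zip(reference, tag):
            if ref_base != "C":
                continue
            if tag_base == "T":
                converted += 1
            elif tag_base == "C":
                unconverted += 1
            # any other base is treated as a sequencing error and ignored
    total = converted + unconverted
    if total == 0:
        raise ValueError("no informative C positions")
    return converted / total


toy_alignments = [("ACCGT", "ATTGT"), ("CCAC", "TCAT")]  # made-up data
print(conversion_efficiency(toy_alignments))
```

In the toy data, four of the five reference C positions read as T, giving an efficiency of 0.8.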

From the PBMC tag library, 4 million experimental tags could be unambiguously assigned to a specific genomic position in the converted genome (table S1). Of the 4 million tags, 47.5% had the sequence of the plus strand (that is, the template of these transcripts had been the minus strand), and 52.5% had the sequence of the minus strand. This is consistent with the expected equal distribution of sense transcripts from the two strands (20). As shown in table S1, 90.3% of the 4 million tags could be assigned to known genes; the remaining tags were in unannotated regions of the genome. The fraction of unannotated tags (9.7%) is consistent with data from other sources indicating the likely existence of actively transcribed genes in human cells that have not yet been discovered or annotated (6, 21–24). Of the informative tags in annotated regions, 11% were antisense and 89% were sense (table S1).

We next assessed the expression of each gene by counting the total number of tags matching a gene or by counting tags with identical sequence matching a gene only once (distinct tags). On average, there were three total tags for each distinct tag, but this number varied widely and reflected the level of expression of the corresponding transcript. With respect to antisense transcription, genes could be divided into three main classes. S genes were defined as those with a ≥5:1 ratio of distinct sense tags to distinct antisense tags; AS genes were defined as those with a ≥5:1 ratio of distinct antisense tags to distinct sense tags. The SAS class included the remaining genes, all of which contained both sense and antisense tags. In PBMCs, we identified 329 (2.5%) AS genes, 2061 (15.9%) SAS genes, and 10,586 (81.6%) S genes among the 12,976 Ensembl genes in which at least five distinct tags were observed (Table 1 and table S2). There were 6457 genes in which at least two distinct antisense tags were found.
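The classification rule can be written as a short Python sketch. The thresholds follow the text and the Table 1 legend (at least five distinct tags in total and a 5:1 ratio); the function name is hypothetical, and integer comparisons are used so that genes with tags on only one strand need no division by zero. The last lines recompute the reported PBMC percentages from the reported class sizes.

```python
MIN_DISTINCT_TAGS = 5  # genes with fewer distinct tags are left unclassified
RATIO = 5              # the 5:1 cutoff from the text


def classify_gene(sense, antisense):
    """Return "S", "AS", "SAS", or None for a gene's distinct tag counts."""
    if sense + antisense < MIN_DISTINCT_TAGS:
        return None
    if sense >= RATIO * antisense:   # includes genes with no antisense tags
        return "S"
    if antisense >= RATIO * sense:   # includes genes with no sense tags
        return "AS"
    return "SAS"


# Class sizes reported for PBMCs in the text.
pbmc_counts = {"AS": 329, "SAS": 2061, "S": 10586}
total = sum(pbmc_counts.values())
percentages = {k: round(100 * v / total, 1) for k, v in pbmc_counts.items()}
print(total, percentages)
```

The three class sizes sum to 12,976, and the recomputed percentages (2.5, 15.9, and 81.6) agree with those given in the text.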

Table 1.

Classification of genes with respect to antisense tags. We classified only those genes whose sum of distinct sense and antisense tags was 5 or more. S genes contained only sense tags or had a sense/antisense tag ratio of 5 or more; AS genes contained only antisense tags or had a sense/antisense tag ratio of 0.2 or less; SAS genes contained both sense and antisense tags and had a sense/antisense ratio between 0.2 and 5. Samples were derived from the following sources: PBMC, peripheral blood mononuclear cells isolated from a healthy volunteer; Jurkat, a T cell leukemia line; HCT116, a colorectal cancer cell line; MiaPaCa2, a pancreatic cancer line; MRC5, a fibroblast cell line derived from normal lung.

When normalized by length, there was an obvious concentration of antisense tags in exons relative to the entire genome or to introns (P < 0.0001; Fig. 1). Within promoter regions, there was a concentration of antisense tags near the transcription initiation site of the sense transcripts, which gradually tapered off upstream (P < 0.01; Fig. 1 and fig. S2). We also found clear differences between the relative distributions of sense and antisense tags, with a higher proportion of antisense tags than sense tags within promoter and terminator regions of genes (P < 0.0001; Fig. 1). Examples of the distribution of sense and antisense tags derived from S and AS genes are shown in Fig. 2 and fig. S3. The predicted AS transcripts could be confirmed by ASSAGE using gene-specific primers (fig. S4).

Fig. 1.

ASSAGE tag densities in PBMCs. The densities of distinct sense and antisense tags in the indicated regions were normalized to the overall genome tag density. The promoter and terminator regions were defined as the 1 kb of sequence that was upstream or downstream, respectively, of the transcript start and end sites.
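The region definitions and the normalization in this legend can be sketched as follows. The coordinate convention (0-based, half-open, plus-strand coordinates), the function names, and the example numbers are assumptions made for illustration only; the actual procedure is described in the Supporting Online Material.

```python
def flanking_regions(start, end, strand, flank=1000):
    """Promoter and terminator windows for a transcript.

    Coordinates are 0-based, half-open, on the plus strand (an assumed
    convention). For minus-strand transcripts "upstream" lies at higher
    coordinates, so the two windows swap sides.
    """
    left = (max(0, start - flank), start)
    right = (end, end + flank)
    if strand == "+":
        return {"promoter": left, "terminator": right}
    if strand == "-":
        return {"promoter": right, "terminator": left}
    raise ValueError("strand must be '+' or '-'")


def normalized_density(region_tags, region_bp, genome_tags, genome_bp):
    """Tags per bp in a region divided by tags per bp genome-wide."""
    return (region_tags * genome_bp) / (region_bp * genome_tags)


print(flanking_regions(5000, 8000, "-"))
print(normalized_density(300, 1000, 1000, 10000))  # made-up counts
```

A normalized density above 1 means the region carries more distinct tags per base pair than the genome as a whole; the made-up counts above give a threefold enrichment.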

Fig. 2.

Tag distribution in the indicated S (A) and AS (B) genes in PBMCs.

To determine whether the patterns described above were particular to PBMCs, we used ASSAGE to study four additional human cell types. In all cases, the patterns observed—including the proportions of S, AS, and SAS genes—were similar to those in PBMCs (Table 1 and table S1). However, the identity of the S, AS, and SAS genes varied among the cell lines, which suggests that the expression of antisense tags may be regulated in a cell- or tissue-specific manner (fig. S5 and tables S2 and S3). These differences were not related to interexperimental variation, as repeat experiments performed with independently generated ASSAGE libraries from the same RNA sample were highly correlated (fig. S6 and table S2) and differential expression could be confirmed by strand-specific PCR from RNA (fig. S7). In every sample, there was a concentration of both sense and antisense tags within exons (relative to the whole genome or to intronic regions) and a preferential concentration in promoter and terminator regions (P < 0.01; figs. S2 and S8).

To determine whether splicing of antisense transcripts occurred, we constructed new libraries from Jurkat and MRC5 cells and determined the sequences of both ends of each cDNA fragment (“paired-end sequencing”). As expected, transcript levels assessed with this paired-end ASSAGE and the original ASSAGE were highly correlated (fig. S9). The size-selected transcript fragments used to construct these libraries were, on average, ∼175 base pairs in length. A cDNA fragment whose ends were located at genomic positions more than 3 times this distance (>600 base pairs apart) would be expected to represent spliced transcripts. By this criterion, more than 20% of sense-strand cDNA fragments were spliced (fig. S10). In contrast, only ∼1% of antisense fragments exhibited this spliced pattern. Sequencing of five putative spliced antisense transcripts confirmed the splicing, and comparison with genomic DNA revealed the splice site consensus sequences at the expected locations (figs. S11 and S12).
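The span criterion for calling a fragment spliced can be expressed in a few lines of Python. This is a sketch under the stated 600-base-pair cutoff; the function names and the coordinate pairs are hypothetical.

```python
SPLICE_SPAN_BP = 600  # cutoff quoted in the text


def is_spliced(left_end, right_end, cutoff=SPLICE_SPAN_BP):
    """True if the two mapped ends lie more than `cutoff` bp apart."""
    return abs(right_end - left_end) > cutoff


def fraction_spliced(end_pairs, cutoff=SPLICE_SPAN_BP):
    """Fraction of (left_end, right_end) pairs that pass the span cutoff."""
    pairs = list(end_pairs)
    if not pairs:
        raise ValueError("no fragments supplied")
    return sum(is_spliced(a, b, cutoff) for a, b in pairs) / len(pairs)


toy_pairs = [(100, 270), (100, 900), (5000, 5601), (200, 800)]  # made-up
print(fraction_spliced(toy_pairs))
```

The toy pairs span 170, 800, 601, and 600 base pairs, so two of the four exceed the cutoff and the function returns 0.5. Applying the same function separately to sense and antisense fragments would give the two fractions compared in the text.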

Our results raise many questions about the genesis and metabolism of antisense transcripts. It has been hypothesized that antisense transcripts are widely and promiscuously expressed, perhaps because of weak promoters distributed throughout the genome [reviewed in (25, 26)]. Our data argue against this hypothesis in human cells: Promiscuous expression would lead to a uniform distribution of antisense tags across the genome, whereas the observed distribution was nonrandom, localized to genes and within particular regions of genes, much like sense transcripts (Fig. 1 and figs. S2 and S8). This distribution is consistent with a model wherein many antisense transcripts initiate and terminate near the terminators and promoters, respectively, of the sense transcripts. Some of the apparent antisense transcripts from a gene on the plus strand could actually be sense transcripts originating from unterminated transcription of a downstream gene on the minus strand (or vice versa). However, this idea is not generally supported because there was a poor correlation between antisense tag density within a gene and the density of sense tags from the closest downstream gene (fig. S13). One explanation for the higher density of antisense tags in transcribed regions is that transcription of the sense transcripts from correct initiation sites would reduce nucleosome density throughout the entire transcribed region, thereby increasing DNA accessibility and hence the likelihood of nonspecific transcription (26). This explanation is unlikely, given that genes with high sense tag densities did not generally have high antisense densities.

There is substantial evidence that sense transcripts can be negatively regulated by antisense transcripts (3–7). Such regulation can occur either by transcriptional interference or through posttranscriptional mechanisms involving splicing or RNA-induced silencing complexes (RISCs). Our data support the possibility that antisense-mediated regulation affects a large number of genes.

Supporting Online Material

Materials and Methods

Figs. S1 to S13

Tables S1 to S3


References and Notes
