The Birth of an Alternatively Spliced Exon: 3' Splice-Site Selection in Alu Exons


Science  23 May 2003:
Vol. 300, Issue 5623, pp. 1288-1291
DOI: 10.1126/science.1082588


Alu repetitive elements can be inserted into mature messenger RNAs via a splicing-mediated process termed exonization. To understand the molecular basis and the regulation of the process of turning intronic Alus into new exons, we compiled and analyzed a data set of human exonized Alus. We revealed a mechanism that governs 3′ splice-site selection in these exons during alternative splicing. On the basis of these findings, we identified mutations that activated the exonization of a silent intronic Alu.

Alu elements are short (about 300 nucleotides long) interspersed elements that amplify in primate genomes through retroposition (1–3). They have reached a copy number of about 1.4 million in the human genome, making up more than 10% of it (4). A typical Alu is a dimer of two similar sequence elements (left and right arms) separated by a short A-rich linker. Most Alus have a long poly-A tail of about 20 to 100 bases (5).

Parts of Alu elements, predominantly in the antisense orientation, can be inserted into mature mRNAs by way of splicing (“exonization”). The exonization process is presumably facilitated by sequence motifs within the Alu sequence that resemble splice sites (6–9) (see fig. S1 for a model of exonization). Because Alus are found only in primate genomes, Alu-derived exons might contribute to some of the characteristically unique features of primates.

We have previously shown that more than 5% of human alternatively spliced exons are Alu-derived and that most, if not all, Alu-containing exons are alternatively spliced (9). We therefore hypothesized that mutations causing constitutive splicing of intronic Alus could cause genetic diseases, and indeed we found several instances in the literature in which a constitutive Alu insertion caused a genetic disorder (10–12).

To study the regulation of alternative splicing of exonized Alus, we compiled a data set of exonized Alus from the human genome. Analysis of this data set revealed that two positions along the inverted Alu sequence are most commonly used as 3′ splice sites (3′SSs) in Alu exonizations: position 279 (“proximal AG”) and position 275 (“distal AG”). The relationship between two nearby AGs in a 3′SS has previously been well characterized in the context of constitutive splicing (13, 14). To pinpoint the sequence determinants by which the spliceosome selects one of the two possible AGs in the context of alternative splicing, we aligned the exonized Alus that use either of these AGs to their ancestral consensus sequences.

The 3′SS regions of these instances are shown in Fig. 1. Figure 1 also shows that the proximal AG is selected mostly in exonized Alus of the S subfamilies (9 of 13 cases), whereas the distal AG is selected mainly in exonized Alus of the J subfamilies (12 of 16 cases). This differential AG usage probably reflects the polymorphism between the J and S subfamilies at position 277 (Fig. 1, colored yellow), which eliminates the distal AG in Alus of the S subfamilies, so that the proximal AG is selected. Although another polymorphism, at position 275, creates a new distal AG in the S subfamilies, this new AG lies six nucleotides downstream of the proximal AG, a distance shown to be outside the effective range for selecting a distal AG in constitutive splicing (14). Indeed, the cases in which Alus of the S subfamilies used the distal AG involved mutations that shortened the distance between the AGs back to four nucleotides (Fig. 1, colored green). This indicates that when the two AGs are four or fewer nucleotides apart, the distal AG is preferred, whereas when they are six or more nucleotides apart, the proximal AG is preferred.

Fig. 1.

The selection of AGs in the 3′SSs of Alu-derived exons. The alignment covers the region near the two most prevalent 3′SSs in the right arm of exonized Alu sequences (in the antisense orientation). Data are shown for 29 exonized Alus, compiled from the results of our previous study (9) as well as newly collected data from the literature (22–26). The 20 nucleotides presented are positions 290 to 271 in the Alu sequence, according to the numbering in (27). The two possible AG dinucleotides (distal and proximal to the PPT) are marked in red. The selected AG dinucleotide, defining the end of the intron, is underlined for each exonized Alu. Selected AG dinucleotides were inferred from alignments of expressed sequences to the human genome (9) (table S1). An asterisk next to the gene name marks additional Alu exons found in the literature scan (22–26). Consensus sequences of subfamilies S and J appear in the first two rows, with positions differing between subfamilies marked in yellow. Rows 30 to 32 represent the 3′SSs of Alu sequences whose constitutive exonization was shown to cause a genetic disease [Alport syndrome (COL4A3), Sly syndrome (GUSB), and OAT deficiency (OAT)]. The mutation causing Alport syndrome is marked in light blue (position –7, G to T); exonization in Sly syndrome and OAT deficiency resulted from mutations in the 5′SS. Numbers on top mark the position relative to the distal 3′SS, as referred to in this article. Gene names follow RefSeq conventions, Alu exon number is the serial number of the Alu-containing exon in the gene, and subfam is the Alu subfamily type, inferred with RepeatMasker (28).

However, in five cases (Fig. 1, rows 25 to 29), the proximal AG was selected even though a distal AG lay fewer than six nucleotides downstream; in all of these cases, the G at position –7 (colored purple) was mutated to either A (two cases) or T (three cases). Remarkably, a mutation at the same position in intron 5 of the COL4A3 gene leads to exonization of a silent intronic Alu. This Alu exon is constitutively spliced, resulting in an Alport syndrome phenotype (10). This implies that the G at position –7 suppresses selection of the proximal AG, shifting selection toward the distal AG; when this G is mutated, the proximal AG is preferred. This interpretation is supported by the finding that GAG triplets at the ends of introns are poorly cleaved in vitro and extremely rare in vivo (15).
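The AG-selection tendencies described above (the spacing between the two AGs, and whether a G sits at position –7 just upstream of the proximal AG) can be condensed into a simple decision rule. The Python sketch below is a hypothetical illustration of that rule, not the authors' analysis pipeline: the function names, the 0-based indexing, the example sequences (which are invented, not the real ADAR2 sequence), and the choice to return None for a five-nucleotide spacing (which the text does not address) are all assumptions. It encodes only the Fig. 1 tendencies and does not model exon skipping, hSlu7 levels, or the other 3′SS features examined in the minigene experiments.

```python
from typing import List, Optional

# Hypothetical heuristic distilled from the text's description of Fig. 1;
# not the authors' code. Sequences may be DNA or RNA (U is read as T).


def find_ag_positions(seq: str) -> List[int]:
    """Return 0-based start indices of every AG dinucleotide in seq."""
    s = seq.upper().replace("U", "T")
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "AG"]


def predict_selected_ag(seq: str, proximal: int, distal: int) -> Optional[str]:
    """Predict which of two candidate AGs is used as the 3' splice site.

    proximal: start index of the upstream AG (closer to the PPT).
    distal:   start index of the downstream AG.
    Returns "distal", "proximal", or None if the text gives no rule.
    """
    s = seq.upper().replace("U", "T")
    if s[proximal:proximal + 2] != "AG" or s[distal:distal + 2] != "AG":
        raise ValueError("both indices must point at an AG dinucleotide")
    if distal <= proximal:
        raise ValueError("the distal AG must lie downstream of the proximal AG")

    spacing = distal - proximal  # 4 for an AG-AC-AG arrangement
    # With spacing 4, the base just upstream of the proximal AG is position -7.
    preceded_by_g = proximal > 0 and s[proximal - 1] == "G"

    if spacing <= 4:
        # A GAG triplet suppresses the proximal AG, favoring the distal one.
        return "distal" if preceded_by_g else "proximal"
    if spacing >= 6:
        # Distal AG treated as out of effective range: proximal preferred.
        return "proximal"
    return None  # a spacing of 5 nucleotides is not addressed in the text


wt = "CTTTCTTTGAGACAG"      # hypothetical fragment: G, AG, AC, AG at -7..-1
mutant = "CTTTCTTTAAGACAG"  # same fragment with the -7 G changed to A
prox, dist = find_ag_positions(wt)[-2:]
print(predict_selected_ag(wt, prox, dist))      # distal
print(predict_selected_ag(mutant, prox, dist))  # proximal
```

In this hypothetical fragment the two AGs start at indices 9 and 13 (four nucleotides apart), so the G at index 8 makes the rule favor the distal AG, and replacing it with A flips the prediction to the proximal AG, mirroring the –7 mutants in Fig. 1.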

To test these hypotheses, we cloned a minigene of the ADAR2 gene, which encodes an adenosine deaminase involved in RNA editing (Fig. 1, row 1). Exon 8 of this gene [denoted exon 5a in (16)] was previously found to be an alternatively spliced Alu-derived exon that adds 40 amino acids in frame to the protein (17). In this exon, the distal AG is used as the 3′SS. To characterize the relationship between the proximal and distal AGs in the context of alternative splicing, we generated a set of mutations within the 3′SS.

Whereas the Alu exon of wild-type ADAR2 was included in 40% of the transcripts (Fig. 2B, lane 3), replacing the G at position –7 with A, U, or C (Fig. 2B, lanes 10 to 12) had two effects. First, as predicted from Fig. 1, the replacement shifted selection from the distal AG to the proximal one. Second, it shifted splicing of the Alu exon from alternative splicing toward nearly constitutive inclusion in the mature transcript. These results point to an important role for the G at position –7 in shifting selection toward the distal AG, thereby maintaining alternative splicing of the Alu-containing exon. Mutation of that G is likely to result in constitutive inclusion of the Alu exon and thus might cause disease, as occurs in Alport syndrome (10).

Fig. 2.

Splicing assays on ADAR2 minigene mutants. (A) The 3′SS sequence of the wild type and 13 mutants of ADAR2. The proximal and distal AGs are in bold, mutations are shaded, and the selected AGs are boldfaced and underlined. (B) The indicated plasmid mutants were introduced into 293T cells by transfection, total cytoplasmic RNA was extracted, and splicing products were separated on a 2% agarose gel after reverse transcription polymerase chain reaction (RT-PCR) (18). Lane 1, DNA size marker; lane 2, vector only (pEGFP); lane 3, splicing products of wild-type (wt) ADAR2; lanes 4 to 16, splicing products of mutated ADAR2 minigenes, corresponding to the sequences in (A). The two possible minigene mRNA isoforms are shown on the right. Numbers in parentheses indicate the percentage of the Alu-containing mRNA isoform as determined by quantified RT-PCR (100% corresponds to the total of both mRNA isoforms). Identical results were also obtained with HeLa cells.
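The inclusion percentages in Fig. 2B are defined relative to the total of both mRNA isoforms. A minimal Python sketch of that arithmetic follows; the function name and the optional length correction are assumptions added for illustration, since the caption does not say how band signals were normalized. Dividing each signal by product length is appropriate only if the signal scales with DNA mass, as with intercalating-dye staining of agarose gels.

```python
from typing import Optional

# Hypothetical helper illustrating the Fig. 2B definition of % inclusion;
# not the authors' quantification procedure.


def percent_inclusion(included: float, skipped: float,
                      included_len: Optional[int] = None,
                      skipped_len: Optional[int] = None) -> float:
    """Percentage of the Alu-containing isoform, where 100% = both isoforms.

    included, skipped: quantified signals of the two RT-PCR products.
    If both product lengths (bp) are supplied, each signal is divided by its
    length first: an assumed correction for stains whose signal scales with
    DNA mass, so that the ratio approximates a ratio of molecules.
    """
    if included < 0 or skipped < 0:
        raise ValueError("signals must be non-negative")
    if included_len is not None and skipped_len is not None:
        if included_len <= 0 or skipped_len <= 0:
            raise ValueError("product lengths must be positive")
        included /= included_len
        skipped /= skipped_len
    total = included + skipped
    if total == 0:
        raise ValueError("no signal for either isoform")
    return 100.0 * included / total


print(percent_inclusion(40.0, 60.0))  # 40.0 (illustrative numbers)
```

The input values here are illustrative, not measured data; the same ratio underlies statements such as the shift from 40% to more than 85% inclusion in the GA mutant.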

To check whether the proximal AG affects selection of the distal AG, we mutated the proximal AG to UC or GA (Fig. 2B, lanes 8 and 9, respectively). The GA mutation increased exon inclusion to more than 85%, compared with 40% in the wild type. The UC mutation made splicing of the Alu exon constitutive, possibly because it strengthened the polypyrimidine tract (PPT), which was originally 18 bases long (the average PPT length in exonized Alus was 19 ± 3 bases). These findings suggest that the proximal AG weakens selection of the distal AG and is therefore required for maintaining alternative rather than constitutive splicing of the Alu exon. In summary, when the distal 3′SS is used, the G at position –7 suppresses selection of the proximal AG, and the proximal AG maintains alternative splicing.

We then asked whether the nucleotide composition between the two adjacent AGs affects 3′SS selection and the ratio of alternative splicing. The two AGs are separated by an AC dinucleotide (Fig. 2A). Deleting both of these nucleotides (positions –4 and –3) or only the C (Fig. 2B, lanes 5 and 7) resulted in exon skipping, pointing to the importance of the C at position –3. Deleting the A at position –4, or mutating it to G or C, changed the ratio between the two isoforms (Fig. 2B, lanes 6, 13, and 14), indicating that position –4 also affects the inclusion ratio.

To test whether increasing the distance between the two AGs shifts selection toward the proximal AG, we inserted additional nucleotides between them (Fig. 3A). Increasing the distance between the proximal and distal 3′SSs to six or eight nucleotides resulted in Alu exon skipping (Fig. 3B, lanes 7 and 8). However, when the distance grew to 10 nucleotides, residual exon inclusion was recovered in slightly more than 3% of the spliced transcripts (Fig. 3B, lane 9). In these transcripts, the proximal AG was selected even though it was preceded by a G (Fig. 3A).

Fig. 3.

The effect of hSlu7 on AG selection. (A) The 3′SS sequence of ADAR2 and the three insertion mutants. Both potential AGs are in bold, and the selected AG is boldfaced and underlined. bp, base pair. (B) The indicated plasmid mutants were treated as described for Fig. 2B (18). Lanes 4 to 6, cotransfection of the insertion mutants with a plasmid expressing hSlu7; lanes 7 to 9, the insertion mutants without additional hSlu7.

We further examined whether hSlu7 (human synergistic lethal with U5 small nuclear RNA), a second-step splicing factor, might be involved in activating the proximal AG. This protein is known to be required for correct AG identification when more than one possible AG exists in the 3′SS region (13). Cotransfection of 293T cells with plasmids containing the insertion mutants and with hSlu7 (at 10-fold higher than endogenous hSlu7 concentrations) (18) increased selection of the proximal AG 10-fold, reaching 32% inclusion when the proximal and distal AGs were 10 bases apart (Fig. 3B, lane 6). hSlu7 activation of the weak splice site presumably depends on the existence of a distal AG, because eliminating the distal AG (mutant –2G, Fig. 2) resulted in exon skipping that was not reversed by increasing the hSlu7 concentration (19). These results suggest that the distal AG can negatively affect selection of the proximal AG when the proximal AG is preceded by a G. The proximal 3′SS can be selected when hSlu7 is present, and this selection becomes more efficient when the distal AG lies far enough from the splice site (in our case, 10 nucleotides into the exon). This observation therefore indicates that activation of the weak 3′SS (GAG) depends on hSlu7 concentration and suggests a possible role for hSlu7 concentration in regulating alternative splicing.

Rows 17 to 22 in Fig. 1 show instances in which the proximal AG is selected even though the distal AG lies six nucleotides downstream. However, in the ADAR2 minigene, the mutant in which the two AGs are six nucleotides apart resulted in total exon skipping (Fig. 3B, lane 7). These results suggest that such exonization events might occur at high hSlu7 concentrations, either within certain cell types or locally within a subregion of the nucleus. We therefore predicted that under normal conditions these Alu exons would be skipped. To test this, we chose one of these genes, which encodes a putative glucosyltransferase (PGT) (Fig. 1, row 17), and cloned a minigene of its exons 11 to 13, including the intervening introns (the Alu exon being exon 12). Indeed, when the PGT minigene was transfected into HT1080 and 293T cell lines, only a single mRNA isoform appeared, corresponding to Alu-exon skipping (Fig. 4B, lane 3). Repeating the experiment with endogenous PGT mRNA also showed Alu-exon skipping (19).

Fig. 4.

Splicing assays on wt and mutated PGT. (A) Sequences, from top to bottom: wt Alu 3′SS of ADAR2; wt Alu 3′SS of PGT; mutant Alu 3′SS of PGT; and the mutant COL4A3 sequence that causes Alport syndrome. Both potential AGs are in bold, and the selected 3′SS is boldfaced and underlined. The mutated position is shaded. (B) Transfection was performed in HT1080 cells. RNA extraction and RT-PCR were performed as described for Fig. 2B (18). Lane 1, DNA size marker; lane 2, vector only (pEGFP); lane 3, splicing product of wt PGT; lanes 4 and 5, splicing products of mutated PGT minigenes, corresponding to the sequences in (A). The two possible minigene mRNA isoforms are shown on the right. The results were also reproducible in 293T cells (19).

To test whether, as predicted from our results, a mutation at position –7 of a completely silent intronic Alu element would result in exonization, we mutated this position in the PGT minigene. As seen in Fig. 4B (lanes 4 and 5), this point mutation was sufficient to activate nearly constitutive inclusion of the Alu exon in the mature transcript. As noted above, the same mutation in the COL4A3 gene activates constitutive exonization of a silent intronic Alu, resulting in Alport syndrome (10). To assess the broader significance of these findings, we analyzed the entire Alu content of the human genome and found at least 238,000 antisense Alus located within introns (20). Of these, 52,935 carry a potential ADAR2-like 3′SS and 23,012 carry a potential PGT-like 3′SS. Our results suggest that many of these silent intronic Alu elements might be susceptible to exonization by the same single point mutation and are thus under strict selective pressure. Such point mutations in human genomic antisense Alus may therefore be the molecular basis for predisposition to as-yet uncharacterized genetic diseases.

Because most, if not all, Alu-containing exons are alternatively spliced (9), they add splice variants to our transcriptome while keeping the original proteins intact. Exonized Alus can thus acquire functionality and become exapted, i.e., adapted to a function different from their original one (21). When the splicing of an Alu exon is constitutive, however, the transcript encoding the original protein is permanently disrupted, which could underlie a genetic disorder. Identifying genomic Alus that are one point mutation away from exonization might therefore enable screening for predisposition to genetic diseases that involve Alu exonization.

Supporting Online Material

Materials and Methods

Fig. S1

Table S1

References and Notes

  • * These authors contributed equally to this work.
