Exon Shuffling by L1 Retrotransposition

Science  05 Mar 1999:
Vol. 283, Issue 5407, pp. 1530-1534
DOI: 10.1126/science.283.5407.1530


Long interspersed nuclear elements (LINE-1s or L1s) are the most abundant retrotransposons in the human genome, and they serve as major sources of reverse transcriptase activity. Engineered L1s retrotranspose at high frequency in cultured human cells. Here it is shown that L1s insert into transcribed genes and retrotranspose sequences derived from their 3′ flanks to new genomic locations. Thus, retrotransposition-competent L1s provide a vehicle to mobilize non-L1 sequences, such as exons or promoters, into existing genes and may represent a general mechanism for the evolution of new genes.

The human genome is littered with noncoding DNA, often disparaged as “junk DNA.” Much “junk DNA” results from the reverse transcription of cellular RNAs and insertion of the cDNAs into new genomic locations by retrotransposition. L1s make up about 15% of human DNA (1). The majority of L1s cannot retrotranspose, but an estimated 30 to 60 full-length L1s remain retrotransposition-competent (2). These L1s contain a 5′ untranslated region (UTR), two nonoverlapping open reading frames (ORF1 and ORF2), and a 3′ UTR that ends in a polyadenylic acid [poly(A)] tail (3, 4). ORF1 encodes an RNA-binding protein (5), whereas ORF2 encodes an endonuclease (EN) activity (6), a reverse transcriptase (RT) activity (7), and a cysteine-rich (C) domain of unknown function (8) (Fig. 1A). L1 retrotransposition is ongoing because recent insertions have caused diseases in humans and mice (4). L1s also are thought to mobilize Alus and processed pseudogenes, which make up another 10% of human DNA (4, 9). Thus, either directly or through the promiscuous mobilization of cellular RNAs, L1s may be evolutionarily responsible for one-fourth of human DNA (1).

Figure 1

L1s retrotranspose into genes. (A) The retrotransposition assay. Structural features of L1 are described in the text. The 3′ UTR of a retrotransposition-competent L1 (L1.3) was tagged with indicator cassettes designed to detect retrotransposition events into transcribed genes (mneoIRF1 to RF3). The construct pJM101/L1.3 RF1 is shown. P indicates the pCMV, and the arrow denotes the transcription start site. SD and SA indicate the splice donor and splice acceptor sites, respectively, of γ-globin intron 2, which interrupts the backward copy of the neo indicator gene (10). Native L1 poly (A)+ (L1pA) and the SV40 poly (A)+ (SV40pA) are depicted by the gray and black lollipops, respectively. A_n represents the poly(A) tail at the end of the L1 mRNA. The asterisk indicates the position of the artificial splice acceptor, and A′ indicates the poly (A)+ sequence that flanks the neo indicator gene. The predicted structure of an L1 retrotransposition event into a cellular gene is shown. The fusion mRNA is generated by splicing of the preceding exons of a cellular gene to the artificial splice acceptor in the neo indicator gene. Translation of the fusion mRNA results in the production of a functional protein that can transform G418-sensitive (G418S) cells to G418 resistance (G418R). The lowercase letters in the intron represent the sequence of the mneoIRF1 artificial splice acceptor site. The mutated ATA that begins the neo reading frame (gray box) is also depicted. Constructs were assayed for retrotransposition as described (10). (B) Results of the retrotransposition assay. G418R foci were fixed to flasks and stained with Giemsa. Flasks containing 2 to 4 × 10^6 cells transfected with pJM101/L1.3, pJM101/L1.3 RF1, pJM101/L1.3 RF2, pJM101/L1.3 RF3, and pJM105 are shown. JM105 is an RT-defective allele of L1.2 (10). We also plated 1/10 and 1/100 dilutions of the pJM101/L1.3 assays as additional controls. (C) Sequences at the junctions of the fusion cDNAs. The cellular cDNA-neo fusion junction sequences of clones pJET1 to pJET7 were determined. The approximate size of each cDNA clone is shown. In every instance, the upstream cellular exons spliced to the artificial splice acceptor sequence in the predicted manner. The bold letters indicate the beginning of the neo sequence in each cDNA. The nonbold sequence is derived from the exons to which the neo sequence is fused. The putative ATG initiation codons of pJET1, pJET3, pJET4, pJET5, and pJET6 are underlined. Other details concerning the characterization of the cDNAs are in the text and notes (14–16). (D) Neo-containing fusion cDNAs can confer G418 resistance to cultured HeLa cells. 5 × 10^5 cells were transfected with the cDNAs from pJET1, pJET3, pJET4, pJET5, and pJET6 (17) in six-well dishes as described (10). At 72 hours, G418 (300 μg/ml) was added, and 10 to 14 days later, G418R foci were fixed to wells and stained with Giemsa. A single representative well from each dish is shown. (a to e) cDNAs of pJET1, pJET3, pJET4, pJET5, and pJET6, respectively. (f) pJM105, an RT-defective allele of L1.2 (10). In identical transient transfections with pCEP4, no foci were ever observed.

We previously tagged candidate L1s with an indicator cassette (mneoI) that could be activated upon retrotransposition to confer G418 resistance (G418R) to transfected human cells. These engineered L1s undergo high-frequency retrotransposition in HeLa cells, and the characterization of four retrotransposition events revealed that they structurally resemble endogenous L1s (10). However, this analysis also yielded other interesting data. First, one insertion occurred into an expressed sequence tag (EST). Second, all four insertions resulted from the retrotransposition of a read-through transcript and transduced 138 base pairs (bp) of 3′ flanking sequence.

To test directly whether L1 retrotransposes into genes, we created a series of indicator cassettes (mneoIRF1 to RF3) that are activated only when the tagged L1 retrotransposes into a transcribed gene (Fig. 1A). We mutated the initiation codon of mneoI from ATG to ATA and replaced the simian virus 40 (SV40) promoter, which drives expression of the retrotransposed mneoI gene, with an artificial splice acceptor (11). To ensure capture of exons in all three reading frames, we inserted zero, one, or two cytosines between the artificial splice acceptor and the mutated ATA initiation codon. Each cassette was subcloned into the 3′ UTR of JM101/L1.3 to create JM101/L1.3 RF1 to RF3. Our strategy predicts that G418R colonies should only arise if L1.3 retrotransposes into a cellular gene and the neo gene product is expressed as a COOH-terminal fusion protein with the preceding exons of that gene. In the retrotransposition assay, pJM101/L1.3 RF1 to RF3 each yielded G418R colonies at about 1% of the level of pJM101/L1.3 G418R colonies (Fig. 1B and Table 1). Genomic Southern (DNA) blots demonstrated that the majority of G418R colonies (16 out of 20) resulted from single retrotransposition events to different genomic locations (12).
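The reading-frame logic behind the three cassettes can be illustrated with a minimal sketch. It is not part of the reported experiments: the function name, the upstream coding lengths, and the base spacer length between the splice acceptor and the ATA codon are hypothetical values chosen only to show the modulo-3 reasoning. Under these assumptions, exactly one of the three cytosine counts leaves the neo codons in frame for any upstream exon.

```python
def in_frame_cassettes(upstream_coding_nt: int, base_spacer_nt: int = 0) -> list[int]:
    """Return the cytosine counts (0, 1, or 2) that keep the neo codons in frame.

    upstream_coding_nt: hypothetical number of coding nucleotides contributed
        by the cellular exons spliced onto the artificial splice acceptor.
    base_spacer_nt: hypothetical spacer length between the splice acceptor
        and the mutated ATA codon before any cytosines are added.
    """
    return [
        extra_c
        for extra_c in (0, 1, 2)
        if (upstream_coding_nt + base_spacer_nt + extra_c) % 3 == 0
    ]


if __name__ == "__main__":
    # Three illustrative upstream lengths, one for each possible phase.
    for upstream in (300, 301, 302):
        print(upstream, in_frame_cassettes(upstream))
```

Because each upstream phase is rescued by a single cassette, any one construct is expected to capture only about one-third of same-strand insertions, which is why all three frames were tested.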

Table 1

Retrotransposition frequencies of constructs. Individual constructs tested are listed in the first column. N indicates the number of independent transfections. NA indicates not applicable. Retrotransposition frequency, experimental range, and percentage of wild-type activity are reported. Retrotransposition activity was normalized to the appropriate positive control for each different set of experiments.

To verify the structure of the predicted in-frame fusion mRNAs, we pooled G418R colonies derived from JM101/L1.3 RF1, isolated polyadenylated RNA, and constructed a cDNA library (13). We screened that library with a neo probe, isolated 13 positive plaques, and sequenced the cDNA inserts. In seven clones (pJET1 to pJET7), the neo cDNA sequence was preceded by a non-L1 ORF that spliced precisely to the mutated ATA initiation codon, indicating use of the artificial splice acceptor (Fig. 1C). In pJET1, L1.3 inserted into intron 7 of the DOC2 tumor suppressor gene (14). In pJET2, L1.3 inserted into the intron preceding or immediately following an alternatively spliced exon in the polypyrimidine tract-binding protein-associated splicing factor (PSF) gene (15). In pJET3 to pJET5, L1.3 inserted into either known genes or genes that shared homology to human ESTs (16). In pJET6 and pJET7, L1.3 inserted into uncataloged sequences that likely represent the partial cDNA sequences of two uncharacterized genes. Because the cDNAs of pJET1, pJET3, pJET4, pJET5, and pJET6 contain putative initiation codons, we asked whether their expression could confer G418 resistance to HeLa cells. We PCR amplified each cDNA (17), subcloned the product into pCEP4, and showed that each cDNA conferred G418 resistance to transfected cells (Fig. 1D). The cDNA clones from pJET8 to pJET13 either represented L1 mRNA expressed from pJM101/L1.3 RF1 or were truncated within the neo sequence and were uninformative (18).

L1s can retrotranspose into either the sense or antisense strand of a gene, and the indicator cassette can join the preceding exons in any of three reading frames. Thus, each pJM101/L1.3 RF1 to RF3 retrotransposition event into a gene has a one in six probability of splicing the indicator cassette in frame to the preceding exons. Because each of these constructs retrotransposed at roughly 1% of the frequency of pJM101/L1.3, we estimate that about 6% of L1 retrotransposition events occur into genes. This is a minimum estimate because our assay will not detect insertions that (i) are severely 5′ truncated, (ii) occur into poorly transcribed genes, or (iii) result in the synthesis of enzymatically inactive neo fusion proteins. Because about 15% of the human genome consists of genes (exons plus introns) (19), our results suggest little, if any, bias against genes as sites of L1 retrotransposition. Others have reported that L1s predominate in gene-poor heterochromatin (20), but those analyses reflect selective pressures that have affected L1 accumulation during genome evolution. Our assay, in contrast, detects only new retrotransposition events in cultured cells and so more accurately reflects L1 integration preferences.
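The estimate above is a simple scaling argument, sketched here as a short calculation. The constant and function names are illustrative, and the input is the rounded figure of about 1% quoted in the text rather than the measured values in Table 1.

```python
# Assumption from the text: an in-frame fusion needs the correct strand
# (1 of 2) and the correct reading frame (1 of 3), so 1 in 6 events scores.
STRANDS = 2
READING_FRAMES = 3


def fraction_into_genes(relative_frequency: float) -> float:
    """Scale the observed relative colony frequency by the 1-in-6 detection odds."""
    return relative_frequency * STRANDS * READING_FRAMES


if __name__ == "__main__":
    observed = 0.01  # each RF construct scored at roughly 1% of pJM101/L1.3
    estimate = fraction_into_genes(observed)
    print(f"Estimated fraction of insertions into genes: {estimate:.0%}")
```

The result is a lower bound for the reasons listed in the text: truncated insertions, poorly transcribed target genes, and inactive fusion proteins all go undetected.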

To determine how efficiently L1 read-through transcripts retrotranspose, we subcloned mneoI downstream of the L1.3 native polyadenylation [poly (A)+] signal to create pJM130/L1.3 (21). Here, G418R colonies result only from the retrotransposition of L1 RNAs that bypass the L1 poly (A)+ and use the SV40 poly (A)+ signal located 2.1 kb downstream of the 3′ end of L1.3 (Fig. 2A). In the retrotransposition assay, pJM101/L1.3 and pJM130/L1.3 yielded G418R colonies at similarly high frequencies (Fig. 2B and Table 1), indicating efficient bypass of the L1.3 poly (A)+.

Figure 2

L1s can transduce flanking DNA. (A) The transduction assay. The mneoI indicator cassette was subcloned 3′ of the L1.3 poly (A)+ signal to create pJM130. G418R colonies result only if an L1 transcript uses the SV40 poly (A)+ signal and retrotransposes. L1 transcripts that use the L1 poly (A)+ and retrotranspose will lack neo sequences and therefore will not confer G418R to HeLa cells. Features of the L1 are described in Fig. 1A. (B) Results of the transduction assay. G418R colonies were visualized as described in Fig. 1B. Flasks containing cells transfected with pJM101/L1.3, pJM130/L1.3, and pJM130/L1.3 ΔCMV are shown. (C) The SV40 poly (A)+ is dispensable for L1-mediated transduction. Native L1.3 elements lacking pCMV and SV40 poly (A)+ and containing the mneoI gene either in the 3′ UTR [pJM140/L1.3 ΔCMVΔSV40 poly (A)+] or 3′ of the L1 poly (A)+ [pJM141/L1.3 ΔCMVΔSV40 poly (A)+] were assayed for retrotransposition. Flasks contain (a) pJM140ΔCMVΔSV40 poly (A)+, (b) 1/10 dilution of cells plated in (a), (c) 1/100 dilution of cells plated in (a), (d) JM141/L1.3ΔCMVΔSV40 poly (A)+, and (e) JM105. The retrotransposition frequency of pJM141/L1.3 ΔCMVΔSV40 poly (A)+ is about 1% of that of pJM140/L1.3 ΔCMVΔSV40 poly (A)+.

To test whether the cytomegalovirus promoter (pCMV) is responsible for artifactual retrotransposition of read-through L1 transcripts, we deleted it from pJM101/L1.3 and pJM130/L1.3 and drove L1 expression from its 5′ UTR (21). The resultant constructs, pJM101/L1.3ΔCMV and pJM130/L1.3ΔCMV, again retrotransposed at high frequencies (Fig. 2B and Table 1). Thus, L1 can efficiently transduce 3′ flanking sequences when transcription is driven from its native promoter.

We next determined if L1.3 could retrotranspose a flanking exon into a transcribed gene. We subcloned mneoIRF1 to RF3 downstream of the L1.3 poly (A)+ to create pJM130/L1.3 RF1 to RF3 (22). Here, G418R colonies arise only if a read-through transcript retrotransposes into an expressed gene [as in Fig. 1A, except that mneoIRF1 to RF3 are 3′ to the L1 poly (A)+]. pJM130/L1.3 RF1 to RF3 retrotransposed at 0.1 to 2.7% of the level of pJM130/L1.3 (Table 1). Thus, L1 retrotransposition can mediate the mobilization and duplication (shuffling) of exons in cultured cells.

We hypothesized that a poly(A) tail, and not sequences in the L1 3′ UTR, is critical for retrotransposition (10). To determine whether the length of the L1 poly(A) tail affects transduction, we subcloned the natural 3′ end from another active human L1, LRE1 (3), onto pJM101/L1.3 and pJM130/L1.3 (23). The poly(A) tail in the resultant constructs (pJM140/L1.3 and pJM141/L1.3) was lengthened from 4 to 23 residues. Both pJM140/L1.3 and pJM141/L1.3 retrotransposed at similar frequencies (Table 1), and again deletion of pCMV had little effect (12). Thus, lengthening the L1.3 poly(A) tail neither affects transduction efficiency nor allows substantial competition with the SV40 poly (A)+ for polyadenylation or L1 RT binding.

To determine whether transduction depends on the presence of a consensus downstream poly (A)+ signal, we deleted the SV40 poly (A)+ signal from pJM140/L1.3 ΔCMV and pJM141/L1.3 ΔCMV (23). The pJM141/L1.3 ΔCMVΔSV40 poly (A)+ construct could retrotranspose, but the frequency was reduced to about 0.4 to 1% of that of pJM140/L1.3 ΔCMVΔSV40 poly (A)+ (Fig. 2C and Table 1). We propose that L1 poly (A)+ is a low-affinity poly (A)+ site and can be bypassed if fortuitous, higher affinity poly (A)+ sites are present in 3′ flanking sequences (24). A “weak” poly (A)+ may allow L1s to reside within introns and not wreak havoc on gene expression.

L1 retrotranspositions derived from read-through transcripts have been found in vivo. These retrotranspositions include two mutagenic insertions (25, 26), an ancient insertion into an intron of the dystrophin gene (27), and an ancient insertion that transduced an exon with sequence similarity to exon 9 of the CFTR gene (28). A full-length L1 transcript isolated from ribonucleoprotein particles of mouse F9 cells also contained 1 kb of 3′ flanking sequence (29).

We conclude that (i) L1 can retrotranspose into genes, (ii) L1 can readily transduce DNA from its 3′ flank to new genomic locations, and (iii) L1-mediated transduction can create new genes.

Mechanisms to generate new genes include point mutation, DNA-based exon shuffling (30), large-scale DNA duplication (31), unequal crossing over (32), DNA translocation (33), site-specific recombination (34), and functional processed pseudogene formation (35–37). L1-mediated transduction is an additional potential source of genomic diversification (Fig. 3). It occurs through an RNA intermediate, does not require homologous recombination, and allows dispersal of non-L1 sequences to new genomic sites. Shuffled sequences could be promoters, enhancers, or exons, and their dispersal could lead to the creation of new genes or alter the expression of existing genes. Furthermore, because L1 retrotranspositions are often 5′ truncated, some transductions may lack an L1 sequence.

Figure 3

A model for how L1 retrotransposition can create genomic diversity. An active L1 (top) resides at the chromosomal location shown by the white rectangle and is flanked by target site duplications (gray arrows). Native L1 poly (A)+ and a fortuitous poly (A)+ present in the flanking DNA sequence are depicted by the gray and black lollipops, respectively. The predicted structure of three transduction events is shown. They are (from left to right) (i) non-L1 plus truncated L1, (ii) non-L1 only, and (iii) non-L1 plus full-length L1. Different chromosomes are indicated by different shading patterns. The chromosomal location of the transduced elements is indicated by a black rectangle. The black arrows depict the newly formed target site duplications. The newly formed poly(A) tails are depicted in black. Notably, because most L1 retrotranspositions are 5′ truncated, it is possible that some transduction events are not accompanied by L1 sequences (for example, the middle event shown).

The magnitude of L1-mediated transduction likely depends on the number of active L1s in a genome, their genomic location, and their rate of retrotransposition. The amount of flanking DNA that can be transduced, the extent to which transduction affects genome evolution, and the timing of this process are subject to debate. However, the finding that about 3000 active L1s exist in the present-day mouse genome suggests that L1-mediated transduction may still occur in some organisms (38).

  • * Present address, Departments of Human Genetics and Internal Medicine, University of Michigan Medical School, Ann Arbor, MI, 48109–0650 USA.

  • To whom correspondence should be addressed. E-mail: moranj{at} or kazazian{at}

