Group II Introns Designed to Insert into Therapeutically Relevant DNA Target Sites in Human Cells


Science  21 Jul 2000:
Vol. 289, Issue 5478, pp. 452-457
DOI: 10.1126/science.289.5478.452


Mobile group II intron RNAs insert directly into DNA target sites and are then reverse-transcribed into genomic DNA by the associated intron-encoded protein. Target site recognition involves modifiable base-pairing interactions between the intron RNA and a >14-nucleotide region of the DNA target site, as well as fixed interactions between the protein and flanking regions. Here, we developed a highly efficient Escherichia coli genetic assay to determine detailed target site recognition rules for the Lactococcus lactis group II intron Ll.LtrB and to select introns that insert into desired target sites. Using human immunodeficiency virus–type 1 (HIV-1) proviral DNA and the human CCR5 gene as examples, we show that group II introns can be retargeted to insert efficiently into virtually any target DNA and that the retargeted introns retain activity in human cells. This work provides the practical basis for potential applications of targeted group II introns in genetic engineering, functional genomics, and gene therapy.

Group II introns are catalytic RNAs that function as mobile genetic elements by inserting directly into target sites in double-stranded DNA (1, 2). This mobility is mediated by a multifunctional intron-encoded protein (IEP) that has reverse transcriptase (RT), RNA splicing (maturase), and DNA endonuclease activities (2–5). After translation, the protein promotes RNA splicing, presumably by facilitating formation of the catalytically active intron RNA structure. It then remains associated with the excised intron to form a ribonucleoprotein (RNP) complex, which has DNA endonuclease/integrase activity. In homing, the major mobility pathway, the excised intron RNA in this complex reverse-splices into a specific target site in double-stranded DNA (6–8). The associated IEP then cleaves the opposite strand in the 3′ exon of the DNA target, 9 or 10 nucleotides (nt) downstream of the intron insertion site, and uses the 3′ end of the cleaved strand as a primer to reverse-transcribe the inserted intron RNA. The resulting cDNA copy of the intron is incorporated into the recipient DNA primarily by recombination mechanisms in yeast mitochondria (7) and by repair mechanisms in bacteria (8). Homing frequencies approach 100% for both fungal mitochondrial and bacterial introns (7, 8).

To initiate mobility, the intron-encoded RNP complex uses both its RNA and protein components to recognize specific sequences in its DNA target site (9–11). For the well-studied Lactococcus lactis Ll.LtrB intron, the DNA target site extends from position −26 in the 5′ exon (E1) to position +9 in the 3′ exon (E2; positions numbered from the intron insertion site) (Fig. 1A) (11). A 14-nt region of the DNA target site (E1 −13 to E2 +1) is recognized primarily by base pairing with the intron RNA. This region includes short sequence elements denoted IBS2, IBS1, and δ′, which are complementary to intron sequences EBS2, EBS1, and δ (IBS and EBS refer to intron and exon binding sites, respectively) (Fig. 1, A and B). These same sequence elements are involved in base-pairing interactions required for RNA splicing (1). The regions of the DNA target site flanking the IBS and δ′ sequences are recognized by the IEP. The protein first recognizes a small number of nucleotide residues in the distal 5′ exon region (E1 −26 to −11) and appears to cause local DNA unwinding, enabling the intron to form base pairs with the IBS and δ′ sequences for reverse splicing. Antisense-strand cleavage occurs after reverse splicing and requires additional interactions between the protein and 3′ exon. The finding that at least a 14-nt region of the DNA target site is recognized by base pairing with the intron RNA raises the possibility that group II introns can be retargeted to recognize any 14-nt DNA sequence, juxtaposed to the fixed positions recognized by the IEP. By using crude target site recognition rules deduced from biochemical experiments, the Ll.LtrB intron could in fact be retargeted to specific sites in a plasmid-borne E. coli thyA gene. At best, however, these retargeted introns were very inefficient, presumably reflecting the cumulative effect of multiple changes from the normal target site sequence and/or additional constraints that must be satisfied for optimal base-pairing interactions (11).
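The retargeting idea described above, changing the intron RNA so that it pairs with a chosen 14-nt stretch of target DNA, can be illustrated with a minimal sketch. It assumes strict Watson-Crick pairing between RNA and the sense strand of the DNA, and it treats the pairing region as one contiguous string, whereas in the intron the EBS2, EBS1, and δ elements are separate. The function name and the example sequence are hypothetical and are not taken from the paper.

```python
# Watson-Crick partner in RNA for each DNA base (assumption: no wobble pairs).
DNA_TO_RNA_PARTNER = {"A": "U", "C": "G", "G": "C", "T": "A"}


def complementary_rna(dna_5to3: str) -> str:
    """Return the RNA, written 5'->3', that pairs with a DNA strand written 5'->3'."""
    dna = dna_5to3.upper()
    unknown = set(dna) - set(DNA_TO_RNA_PARTNER)
    if unknown:
        raise ValueError(f"unexpected characters in DNA sequence: {sorted(unknown)}")
    # Pairing strands are antiparallel, so read the DNA backwards.
    return "".join(DNA_TO_RNA_PARTNER[base] for base in reversed(dna))


if __name__ == "__main__":
    # Hypothetical 14-nt target region (positions -13 to +1); not a real target site.
    target_region = "ACGTTGCAATGCCA"
    print(complementary_rna(target_region))
```

A sketch like this only proposes a pairing partner. It says nothing about the fixed positions that the protein recognizes, nor about the additional constraints on efficiency that the text goes on to describe.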

Figure 1

Escherichia coli genetic assay based on the Ll.LtrB intron for analyzing group II intron–DNA target site interactions. (A) Natural Ll.LtrB DNA target sequence from position −30 to +15 and base-pairing interactions with the intron RNA. Sequence elements IBS2 and IBS1 in the 5′ exon and δ′ in the 3′ exon of the DNA target are recognized primarily by base pairing with sequence elements EBS2, EBS1, and δ located in domain I of the intron RNA. The intron insertion site in the top (sense) strand and the endonuclease cleavage site in the bottom (antisense) strand are indicated by arrows. (B) Schematic of the Ll.LtrB intron showing base-pairing interactions EBS1-IBS1, EBS2-IBS2, and δ-δ′ between the intron and flanking exons. The inset shows the location of the LtrA ORF and the T7 promoter introduced into intron domain IV in donor plasmids. (C) Genetic assay. The donor plasmid pACD-LtrB is a CamR pACYC184 derivative containing the full-length Ll.LtrB intron and flanking exons, with a phage T7 promoter inserted downstream of the LtrA ORF in intron domain IV (12). The intron and flanking exon sequences (E1 and E2) are cloned behind a T7lac promoter, and E. coli rrnB T1 and T2 transcription terminators are positioned downstream of the intron. The recipient pUCR-LtrB/Tet is a compatible AmpR plasmid with an Ll.LtrB target sequence (ligated ltrB exons E1 and E2) cloned upstream of a promoterless tetR gene (13). An E. coli rrnB T1 transcription terminator, which terminates both E. coli and T7 RNA polymerase, is inserted upstream of the target site, and an rrnB T2 terminator, which terminates E. coli but not T7 RNA polymerase, is inserted between the target site and the tetR gene. A phage T7 TΦ terminator is inserted downstream of the tetR gene to terminate T7 RNA polymerase. Movement of the intron carrying the phage T7 promoter into the DNA target site activates expression of the tetR gene. (D) Mobility assay using pACD-ΔORF+ORF1. This plasmid has a deletion in the loop of intron domain IV, which removes most of the LtrA ORF, and the LtrA protein is expressed separately from a position downstream of the 3′ exon. This configuration gives higher mobility frequencies approaching 100%.

To determine more detailed target site recognition rules, it is necessary to test a large number of different nucleotide combinations. For this purpose, we developed a new E. coli genetic assay (Fig. 1C) in which a modified Ll.LtrB intron containing a phage T7 promoter near its 3′ end is expressed from a T7lac promoter in a chloramphenicol-resistant (CamR) donor plasmid (pACD-LtrB) (12). A compatible ampicillin-resistant (AmpR) recipient plasmid (pUCR-LtrB/Tet) contains the Ll.LtrB target site (ligated E1-E2 sequence) inserted upstream of a promoterless tetracycline resistance (tetR) gene (13), so that movement of the intron into the target site activates the expression of that gene. To assay mobility, we cotransformed the donor and recipient plasmids into an E. coli (DE3) strain, which contains an isopropyl-β-D-thiogalactopyranoside (IPTG)–inducible T7 RNA polymerase (14). After induction with 2 mM IPTG, cells cotransformed with the wild-type donor and recipient plasmids gave 10 to 40% AmpR TetR colonies indicative of mobility events, compared to 0.001% for cells transformed with the AmpR recipient plasmid alone. Correct integration of the intron was confirmed by DNA sequencing of 10 mobility events.

As expected, efficient mobility was abolished by mutations that delete a large segment of the intron open reading frame (ΔORF), inhibit the RT (YAAA) or DNA endonuclease (ΔZn or ΔConZn) activities of the IEP, or inhibit the ribozyme activity of the intron RNA (ΔD5) (15). Further, in experiments using a “twintron” construct, in which a self-splicing group I intron (the td intron) was inserted into the group II intron (8), 95% of the mobility products had spliced the td intron; this result confirmed that mobility occurs through an RNA intermediate (16). Deletion analysis showed that a target site extending from positions −25 to +9 was sufficient for maximal mobility, whereas further 5′ deletions to −13 reduced mobility by a factor of ∼7000 and 3′ deletions to +4 reduced mobility by a factor of ∼240.

Use of the donor plasmid pACD-ΔORF+ORF1 led to a marked increase in mobility frequency (Fig. 1D) (17). This plasmid has a large deletion in the “loop” of intron domain IV, which removes most of the LtrA ORF, and expresses the LtrA protein separately from a position downstream of the 3′ exon. This configuration gave very high mobility frequencies (∼70% TetR colonies), even without IPTG induction to stimulate donor plasmid transcription, and the frequencies increased to 100% with a low concentration of IPTG (100 μM). The increased mobility frequencies appear to be due to greater resistance of the ΔORF intron to nucleolytic cleavage in domain IV rather than increased expression of the LtrA protein (18).

The very high mobility frequencies enabled us to determine detailed target site recognition rules for the wild-type Ll.LtrB intron. We did so by performing mobility assays with recipient plasmids in which positions −30 to +15 of the Ll.LtrB target site were partially randomized (30% “doped” with non–wild-type nucleotide residues) (19). The data showed that the wild-type nucleotide residues between positions −24 and +7 were selected to different degrees. In the protein-recognition regions, the most critical positions were readily identified as G −21 and T +5, which were shown previously to be stringently required for reverse splicing into the DNA target site and antisense-strand cleavage, respectively (11). In the region of the DNA target site potentially recognized by base pairing with the intron RNA, the data showed strong or moderate selection against nucleotide substitutions at positions −13 to −8 in IBS2, −6 to −1 in IBS1, and +1 to +4 in the δ′ region. Comparison of the number of potential base pairs in selected target sites and the original recipient pool showed some selection for base pairing at each position between −13 and +4, except for −7, which instead showed clear selection against base pairing (Fig. 2A). Over the 16 positions potentially recognized by base pairing, 99% of the selected target sites have 13 or more potential base pairs, and none has fewer than 12 potential base pairs (Fig. 2B). Because δ-δ′ is not essential for RNA splicing in vitro or in vivo (20), the extended δ-δ′ interaction, potentially involving positions +1 to +4, is presumably required primarily for reverse splicing into DNA.
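To put the 99% figure in context, the sketch below gives a rough back-of-the-envelope estimate of how often an unselected, 30%-doped site would retain 13 or more base pairs over 16 positions. It is a simplified binomial model and not the measured pool distribution of Fig. 2B. It assumes that the wild-type nucleotide pairs at every one of the 16 positions, that any substitution abolishes the pair (no wobble pairs), and that positions are doped independently. The function name is hypothetical.

```python
from math import comb


def prob_at_least(k_min: int, n: int, p: float) -> float:
    """Probability of at least k_min successes in n independent trials of probability p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Values from the text: 16 pairing positions, 70% wild-type nucleotide per position.
N_POSITIONS = 16
P_WILD_TYPE = 0.70

if __name__ == "__main__":
    fraction = prob_at_least(13, N_POSITIONS, P_WILD_TYPE)
    print(f"Model pool fraction with >=13 potential base pairs: {fraction:.1%}")
```

Under these assumptions only about a quarter of the unselected pool would keep 13 or more pairs, which is consistent with the enrichment for base pairing reported in the selected sites.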

Figure 2

Base-pairing requirements for different positions of the DNA target site. A mobility assay was performed with the wild-type donor plasmid pACD-ΔORF+ORF1 (17) and a recipient plasmid pool in which DNA target site positions between −30 and +15 were partially randomized (“doped”) to contain 70% of the wild-type nucleotide and 10% each of the three mutant nucleotides (19). The number of potential base pairs with the intron RNA was compared in active target sites (black bars) and the original recipient pool (white bars). (A) Percentage of target sites having a potential base pair with the intron RNA at each position between −13 and +4. (B) Percentage of target sites having the indicated number of potential base pairs over the indicated interval. Selection for base pairing in the active target sites is evident at all positions except −7.

As an initial test of the targeting rules, we designed group II introns targeted to different regions of HIV-1LAI provirus and the human gene encoding the CCR5 chemokine receptor. The latter is required together with CD4 for infection of macrophages by HIV-1, and it has been shown that individuals homozygous for CCR5 mutations are resistant to HIV-1 infection while having no other pathologies (21, 22). Consequently, disabling CCR5 has been considered a means to block HIV-1 infection and the progression of acquired immunodeficiency syndrome.

For targeting, the HIV-1LAI and CCR5 DNA sequences were scanned for the best matches to the fixed positions recognized by the IEP, and the intron RNA was then modified to form base pairs with the adjacent sequences for the EBS-IBS and δ-δ′ interactions (positions −11 to −8 and −6 to +3 or +4). The data from the initial selection experiment with partially randomized DNA target sites (19) were used to obtain a quantitative measure of the ability to substitute a nucleotide residue at a particular position. This “mutability value” was calculated by comparing the ratios (R) of mutant to wild-type nucleotide residues in active target sites and the initial recipient pool, using the expression [(R_mut/wt)_active / (R_mut/wt)_pool] − 1 (23). To select target sites in the HIV-1LAI and CCR5 DNAs, we initially used a mutability value of −0.6 as a lower limit for nucleotide substitutions at positions recognized by the IEP (taken as −30 to −12 and +4 to +7 in initial experiments). The resulting search sequence, 5′-N7(G,T,C)N(G,A)(A,T,C)N2(G,A)(A,T,C)(G,A,C)N2CN11↓N3(G,A,C)T(A,T,C)N, where N represents any of the four bases, a number after N gives the count of consecutive N positions, and the downward arrow indicates the intron insertion site, gave 18 matches in HIV-1LAI and two in CCR5. The HIV-1 target sites were ordered by using successively more stringent cutoff values, and introns targeted to the two best sites along with the two CCR5 sites were tested for their ability to insert into HIV-1LAI and CCR5 DNA targets in the E. coli genetic assay (Fig. 3) (24). Also tested was HIV1-4069s, in which position −12 did not meet the −0.6 cutoff, to determine whether the GC base pair at this position, which was highly conserved in the initial selection experiment (19), could be replaced with a compensatory AT base pair. Because the retargeted introns have modified EBS and δ sequences, complementary IBS and +1 sequences were introduced into the donor plasmid to ensure efficient splicing.
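The two computational steps in this paragraph, the mutability value and the scan for the search sequence, can be sketched as follows. This is an illustrative reconstruction and not the authors' software. The mutability expression is read as a ratio of ratios minus 1, so the −0.6 cutoff corresponds to a mutant-to-wild-type ratio in active sites that is 40% of the ratio in the pool. The function names, the element table, and the choice to report a 0-based index are assumptions made for this example.

```python
import re


def mutability_value(mut_active: float, wt_active: float,
                     mut_pool: float, wt_pool: float) -> float:
    """[(R_mut/wt)_active / (R_mut/wt)_pool] - 1, with R the mutant:wild-type ratio."""
    return (mut_active / wt_active) / (mut_pool / wt_pool) - 1.0


# Search sequence from the text, 5'->3', as (allowed bases, repeat count).
# "N" means any base. The first 30 positions are -30 to -1, the last 7 are +1 to +7.
SEARCH_ELEMENTS = [
    ("N", 7), ("GTC", 1), ("N", 1), ("GA", 1), ("ATC", 1),
    ("N", 2), ("GA", 1), ("ATC", 1), ("GAC", 1),
    ("N", 2), ("C", 1), ("N", 11),
    ("N", 3), ("GAC", 1), ("T", 1), ("ATC", 1), ("N", 1),
]
UPSTREAM_LENGTH = 30  # number of positions before the insertion site


def build_pattern(elements):
    """Turn the element table into a regex; the lookahead reports overlapping sites."""
    parts = []
    for bases, count in elements:
        char_class = "[ACGT]" if bases == "N" else f"[{bases}]"
        parts.append(char_class if count == 1 else f"{char_class}{{{count}}}")
    return re.compile("(?=(" + "".join(parts) + "))")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq: str):
    """Return (strand, index) pairs; index is the 0-based position of the +1
    nucleotide on the scanned strand ('s' = given strand, 'a' = its reverse complement)."""
    pattern = build_pattern(SEARCH_ELEMENTS)
    hits = []
    for strand, scanned in (("s", seq.upper()), ("a", reverse_complement(seq))):
        for match in pattern.finditer(scanned):
            hits.append((strand, match.start() + UPSTREAM_LENGTH))
    return hits
```

A scan of this kind only filters on the protein-recognized positions. Ranking the matches with stricter cutoffs, as the text describes, would require the per-position mutability values from the selection data, which are not reproduced here.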

Figure 3

Design and selection of group II introns that insert into specific DNA target sites. (A) Maps showing group II intron insertion sites in the HIV-1 provirus and human CCR5 gene. Insertion sites in the top (sense) and bottom (antisense) strands are indicated by arrows above and below the target DNA, respectively. Introns are identified by the position number in the target site (HIV-1 sequence, GenBank accession number K02013; CCR5 sequence, GenBank accession number AF031237), followed by “s” or “a” indicating the sense or antisense strand, respectively. The numbers in parentheses indicate mobility frequencies in the E. coli genetic assay in the presence of 100 μM IPTG. The intron HIV1-54a/9186a has integration sites in each long terminal repeat (LTR). (B) DNA target site sequences and base-pairing interactions for designed and selected introns. The wild-type Ll.LtrB target site and base-pairing interactions are shown above for comparison. Nucleotide residues in the HIV-1 and CCR5 target sites that match the wild-type sequence are boxed. Mobility frequencies in the presence or absence of 100 μM IPTG (mean ± SD for at least two experiments) are shown to the right. Mobility events were confirmed by sequencing a region extending from a position downstream of the intron's EBS sequences through the 5′ junction with the target DNA, using primer LtrBA2 (complementary to intron positions 301 to 326). For determination of mobility frequencies, the selected introns were reconstructed in the donor plasmid by PCR and tested in the E. coli genetic assay with the indicated recipient plasmids containing the HIV-1 or CCR5 target sites (26). The selected introns HIV1-54a/9186a and HIV1-2654a have mismatches in the EBS-IBS and δ-δ′ interactions, and their mobility frequencies increased substantially when these were “corrected.”

All of the retargeted introns inserted at precisely the correct positions in the HIV-1 and CCR5 target sites, as confirmed by sequencing multiple events. Two introns, HIV1-4021s and HIV1-4069s, inserted at high frequencies (>60% after IPTG induction); the remaining three introns (CCR5-1019s, CCR5-759a, and HIV1-3994s) inserted at lower frequencies (0.16 to 10.6%). The two most efficient introns have compensatory changes at EBS-IBS and δ-δ′ positions −12, −6, or +1, where the wild-type nucleotide was strongly conserved in the initial selection (19); this finding indicates that protein recognition at these positions is not essential for efficient integration. The less efficient introns presumably have deleterious combinations of nucleotides that are not readily predicted at this stage.

To alleviate the necessity of predicting such deleterious combinations, we developed an alternate, selection-based approach in which the desired DNA target site is simply cloned in the recipient vector upstream of the promoterless tetR gene, and introns that insert into that site are selected from a combinatorial library having randomized target site recognition sequences (EBS and δ) (25, 26). The corresponding IBS sequences in the 5′ exon of the donor plasmid library were also randomized to eliminate selection for the wild-type EBS sequences during RNA splicing in vivo. Although the requirement for base pairing between the two sets of randomized sequences in unspliced precursor RNA reduces the complexity of the spliced intron pool, the approach was successful because of the very high integration efficiency in this system.

Single transformations with the combinatorial library yielded 13 introns that inserted at different positions in the HIV-1LAI and CCR5 target sites. The introns were retested individually and were shown to integrate into their target sites at frequencies ranging from <0.001% to 53%. Data for a subset of the selected introns are summarized in Fig. 3. Most of the efficient introns insert into target sites having the G −21 and T +5 residues, which were found to be critically required for protein recognition [see above and (11)]. However, introns HIV1-54a/9186a and CCR5-24s deviate at these positions but still insert at frequencies of 3 to 5%, possibly reflecting partial compensation by other target site nucleotide residues. The most efficient CCR5 target site has a disfavored nucleotide residue at position −16, which excluded it from the initial computer search for potential target sites. Two of the selected HIV-1 introns have mismatches in EBS-IBS and δ-δ′ interactions, and their integration efficiencies increased substantially when these were “corrected.” Such correction can be effected routinely with the use of appropriate polymerase chain reaction (PCR) primers in the process of recloning the selected intron into the donor plasmid. Although the selected introns have a range of mobility frequencies, it should be relatively straightforward to enrich for the most efficient introns by carrying out multiple rounds of selection.

Ultimately, we hope to impede HIV-1 replication in patients by disrupting the HIV-1 coreceptor CCR5 gene or HIV-1 proviral DNA through targeted intron insertion into DNA. To determine whether group II intron RNPs can function in a human cellular environment, we cotransfected 293 embryonic kidney cells or CEM T cells with plasmids containing either CCR5 or HIV-1 DNA targets and RNP particles containing the retargeted introns (CCR5-332s or HIV1-4069s), which had been packaged separately into liposomes (27). PCR analysis of DNA isolated from the transfected cells gave products expected for integration of the introns into the DNA target sites (Fig. 4A) (28). By contrast, such products were not detected with DNA from mock-transfected cells, from the transfection mix incubated without cells, or from cells that were transfected separately with either target DNA or RNPs and mixed before DNA extraction. Restriction enzyme analysis and sequencing confirmed that the retargeted introns had inserted at the correct locations in the CCR5 and HIV-1 DNAs (Fig. 4, A and B).

Figure 4

Intron insertion into CCR5 and HIV-1 pol genes in human cell lines. (A) PCR amplification of integration junctions. Integration events into the CCR5 and HIV-1 pol genes are shown schematically at the top. Primer sites are designated by arrows, and restriction sites are labeled. PCR products were analyzed in a 2% agarose gel. Solid arrowheads indicate PCR products corresponding to 5′ integration junctions in the CCR5 (left) and HIV-1 pol (right) genes. Open arrowheads indicate the restriction fragments of integration products. The left lane shows molecular mass markers. (B) Representative sequencing gels for intron integration into the CCR5 (left) and HIV-1 pol (right) genes. The arrows denote integration junctions.

Our results establish that group II introns can be retargeted to insert efficiently into desired DNA target sites and that the intron RNP particles retain activity in human cells. Because the number of obligatory fixed positions is relatively small (<5), potential target sites should exist in most genes, and the overall target sequence should be unique even in large genomes. On the basis of the present results, target site recognition appears to be sufficiently malleable to obtain group II introns that insert into any desired gene. Attractive features of this system for genetic manipulation are the wide host range of the Ll.LtrB intron, an insertion mechanism that is independent of host-cell recombination functions, and the ability to readily introduce additional genetic markers into intron domain IV (5, 8). Introns inserted in the antisense orientation cannot splice and yield unconditional disruptions, whereas those inserted in the sense orientation could be used for conditional disruptions by linking intron splicing to synthesis of the LtrA protein under the control of an inducible promoter. Intron libraries with randomized target site recognition sequences would generate insertions at random chromosomal locations. In other work, we have used group II introns for efficient chromosomal gene disruption in bacteria (29), and experiments are in progress to determine whether eukaryotic chromosomal genes can be targeted similarly by optimizing the introduction or intracellular expression of group II intron RNP particles. Such methods would greatly facilitate analysis of eukaryotic organisms that lack efficient homologous recombination systems (flies, worms, mice, plants, and human cells) and potentially have direct therapeutic applications.

  • * To whom correspondence should be addressed. E-mail: lambowitz{at}

