Extrachromosomal MicroDNAs and Chromosomal Microdeletions in Normal Tissues


Science  06 Apr 2012:
Vol. 336, Issue 6077, pp. 82-86
DOI: 10.1126/science.1213307

This article has a correction.


We have identified tens of thousands of short extrachromosomal circular DNAs (microDNA) in mouse tissues as well as mouse and human cell lines. These microDNAs are 200 to 400 base pairs long, are derived from unique nonrepetitive sequence, and are enriched in the 5′-untranslated regions of genes, exons, and CpG islands. Chromosomal loci that are enriched sources of microDNA in the adult brain are somatically mosaic for microdeletions that appear to arise from the excision of microDNAs. Germline microdeletions identified by the “Thousand Genomes” project may also arise from the excision of microDNAs in the germline lineage. We have thus identified a previously unknown DNA entity in mammalian cells and provide evidence that their generation leaves behind deletions in different genomic loci.

Single-nucleotide polymorphisms and copy-number variations are known sources of genetic variation between individuals (1–5), but there is also great interest in variations that arise during the generation of somatic tissues like the mammalian brain, leading to genetic mosaicism between somatic cells. To identify sites of intramolecular homologous recombination during brain development, we searched for extrachromosomal circular DNA (eccDNA) derived from excised chromosomal regions in normal mouse embryonic brains.

We purified eccDNA from nuclei of embryonic day 13.5 (ED13.5) mouse brain and removed linear DNA by digestion with an adenosine 5′-triphosphate (ATP)–dependent exonuclease (6) (fig. S1, table S1, and SOM methods). Multiple displacement amplification (MDA) with random primers (7, 8) enriched circular DNA by rolling-circle amplification. The linear products of MDA were sheared to 500–base pair (bp) fragments and cloned into a plasmid, and clones were sequenced. Out of 93 clones, 73 contained direct repeats of several hundred base pairs (fig. S2), as would be expected from rolling-circle amplification of circles that are a few hundred base pairs long. Only one copy of the repeat sequence was present in the mouse genome (figs. S2 and S3), indicating that the direct repeats were derived from unique nonrepetitive DNA in the genome and could have been generated by rolling-circle amplification of a circularized form of genomic DNA.
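The expectation that rolling-circle amplification of a small circle produces tandem direct repeats can be checked computationally. The following is a minimal sketch and not the procedure used in this study: the function name `tandem_repeat_unit` and its parameters `min_unit` and `min_overlap` are hypothetical, and exact matching is assumed (real clones would need some tolerance for sequencing errors).

```python
from typing import Optional


def tandem_repeat_unit(read: str, min_unit: int = 20, min_overlap: int = 10) -> Optional[str]:
    """Return the shortest repeat unit (>= min_unit bases) that tiles the read.

    A read copied from a rolling-circle product of a circle of length p
    satisfies read[i] == read[i + p] for every valid i. At least
    `min_overlap` bases of a second copy must be present to call a repeat.
    Assumption: exact matches only, no sequencing errors.
    """
    n = len(read)
    for period in range(min_unit, n - min_overlap + 1):
        if all(read[i] == read[i + period] for i in range(n - period)):
            # The unit is reported in the rotation in which the read begins.
            return read[:period]
    return None


# Toy example: a 30-base "circle" read out 2.5 times from an arbitrary start.
circle = "ACGTTGCAAGGCTTAACCGGTATCGATCGA"
clone = (circle * 3)[7:82]
unit = tandem_repeat_unit(clone)
print(len(unit) if unit else None)
```

The unit returned is a rotation of the circle sequence, because a clone can begin anywhere on the circle; mapping the unit to the genome would then show whether it occurs only once, as reported for the clones above.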

Three sequences that appeared more than 2 times in the 73 clones were chosen to confirm the circular nature of the extrachromosomal DNA before any MDA. Outward-directed primers yielded polymerase chain reaction (PCR) products from 10% of total extrachromosomal DNA (without any MDA), but not from linear genomic DNA for two out of the three sequences (Fig. 1A). The PCR products from outward-directed primers had the same junctions as seen between repeats in the MDA products of the extrachromosomal DNA (Fig. 1B). These results are consistent with the circularization of linear genomic DNA to produce eccDNA.

Fig. 1

Tiny circular DNA sequences are detected in the extrachromosomal DNA fraction. (A) Outward-directed PCR primers (Out) amplified DNA fragments from extrachromosomal DNA (E), but not from genomic DNA (G). DNA was amplified by inward-directed PCR primers (In) from both E and G. (B) Sequencing of fragments amplified by Out primers on the extrachromosomal fraction. Underlined sequences indicate primers. Junctions between red and blue sequences were the same as those observed in clones in fig. S2. (C) Length distribution of microDNAs from various tissues and cell lines. The library abbreviations are explained in the SOM. (D) Electron microscopy (EM) of double-stranded microDNA examined by the cytochrome c drop-spreading method (18) (50 nm = 150 bp). (E) EM of single-stranded microDNA after binding with the T4 gene 32 single-stranded DNA binding protein (19).

To determine the number, size, nature, and source of these short eccDNA sequences, we isolated eccDNA from ED13.5 mouse brain, heart, and liver; adult mouse brain; and mouse (NIH3T3) and human (HeLaS3 and U937) cell lines (table S1). After MDA of the eccDNA, ~500-bp fragments of the amplified DNA were subjected to paired-end sequencing. As a negative control, chromosomal DNA from embryonic mouse brain nuclei was treated identically to the eccDNA fraction. We also examined the eccDNA fraction from Saccharomyces cerevisiae by exactly the same procedure (SOM text). Circular DNAs were identified by two different algorithms that depended on the identification of junctional tags created by the circularization (fig. S4 and SOM methods). Tens of thousands of unique sequences in the genome were identified as yielding eccDNA (table S2), and their total yield was 0.1 to 0.2% by weight of chromosomal DNA in normal tissue. By contrast, the negative control mouse chromosomal DNA yielded only 114 circles, all arising from contamination by extrachromosomal DNA, because the same circles were abundant in the ecc libraries. No circles were detected in the S. cerevisiae extrachromosomal DNA.
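The two algorithms used here are described in fig. S4 and the SOM methods and are not reproduced in this text. As a hedged illustration of why a circle junction leaves a recognizable signature in paired-end data, the sketch below flags read pairs that map in an outward-facing orientation, which is what a fragment spanning the junction looks like on the linear reference. The names `Mate` and `candidate_circle`, the coordinate convention, and the `max_len` cutoff are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Mate:
    chrom: str
    start: int   # 0-based leftmost mapped coordinate
    end: int     # exclusive rightmost mapped coordinate
    strand: str  # "+" or "-"


def candidate_circle(m1: Mate, m2: Mate, max_len: int = 2000) -> Optional[Tuple[str, int, int]]:
    """Return (chrom, start, end) if the pair faces outward, else None.

    On a linear fragment the "+" mate maps upstream of the "-" mate.
    A fragment that crosses a circle junction maps the other way round:
    the "-" mate lies entirely upstream of the "+" mate. The returned
    interval is only a lower bound on the extent of the circle.
    """
    if m1.chrom != m2.chrom or {m1.strand, m2.strand} != {"+", "-"}:
        return None
    fwd, rev = (m1, m2) if m1.strand == "+" else (m2, m1)
    if rev.end > fwd.start:
        return None  # inward-facing or overlapping pair: consistent with linear DNA
    if fwd.end - rev.start > max_len:
        return None  # too far apart to be a small circle under this assumed cutoff
    return (fwd.chrom, rev.start, fwd.end)


pair = (Mate("chr5", 1300, 1350, "+"), Mate("chr5", 1000, 1050, "-"))
print(candidate_circle(*pair))
```

A read that itself crosses the junction would instead map as two split segments; handling that case is omitted here for brevity.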

The circular DNAs from mouse tissues and cell lines were 80 to 2000 bp long, although >50% were in the 200- to 400-bp range, with clear peaks in the brain and liver at ~200 and ~400 bp (Fig. 1C). In the two human cancer cell lines, where we identified many more circular DNAs, the length distribution also peaked at 200 and 400 bp but had additional peaks with a periodicity of 150 bp (Fig. 1C). The circular DNAs were uniquely mapped to the genome and were not derived from repetitive sequences. These DNAs were therefore different from previously reported eccDNAs that were a few hundred to millions of bases long and derived from chromosomal repetitive sequences, intermediates of mobile elements, or viral genomes (9–11). On the basis of their small size and derivation from unique genomic sequence, we named this family of DNA “microDNA.”

To detect the 200- to 400-base-long microDNAs in cells by a fourth method, we directly examined by electron microscopy the eccDNA fraction from mouse brain, after exonuclease digestion but without rolling-circle amplification. Double-stranded microDNAs that are several hundred bp long were easily detected (Fig. 1D and fig. S5, A and B). We also found single-stranded microDNA visualized after the treatment of DNA by single-stranded DNA binding protein, gp32 (Fig. 1E and fig. S5, A and B). The double- and single-stranded microDNAs were equivalent in number. More than 98% of the circular DNA from mouse brain was small (<1 kb) (SOM text), making this the dominant population of eccDNA in normal somatic tissue.

Thus, PCR with outward-directed primers (Fig. 1, A and B) or electron microscopy (Fig. 1, D and E) on the extrachromosomal DNA fraction without MDA confirmed the presence of the short circles that were revealed by Sanger sequencing (figs. S2 and S3) or ultrahigh-throughput sequencing (Fig. 1C and fig. S4) of MDA products.

The sources of the microDNAs from the embryo mouse brain (EMB1) were highly enriched in genic regions, especially 5′ regions of genes, exons, and CpG islands (Fig. 2A). A similar trend was also observed in microDNA from other mouse tissues and mouse and human cell lines (fig. S6). Furthermore, the 55% GC content of microDNAs is higher than the 50% GC content of the immediate upstream or downstream flanking regions and the 45% GC composition of the entire genome (Fig. 2B and figs. S7 and S8). The starts and ends of the circles revealed 2- to 15-bp direct repeats of microhomology (Fig. 2C and fig. S9). In the EMB1 library, 37% of the microDNA has this microhomology, whereas in the random model (SOM methods), <3% of the shuffled microDNAs had microhomology of ≥2 bp near the ends (P < 0.0001) (Fig. 2D). Direct repeats were similarly present at the ends of the microDNA from all mouse tissues and human cell lines (Fig. 2D).
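The two sequence properties described above, GC content and microhomology at the circle boundaries, are simple to compute once a circle has been placed on the reference. The sketch below is illustrative only: `gc_fraction` and `junction_microhomology` are hypothetical names, the toy sequence is invented, and the 15-bp cap simply mirrors the 2- to 15-bp range reported here rather than any documented parameter of the analysis.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def junction_microhomology(genome: str, start: int, end: int, max_len: int = 15) -> int:
    """Length of the direct repeat shared by the two boundaries of genome[start:end].

    The breakpoint of a segment flanked by a direct repeat is ambiguous, so
    the repeat is measured by sliding both boundaries rightward and then
    leftward for as long as the bases at the two boundaries agree.
    """
    right = 0
    while (right < max_len and start + right < end and end + right < len(genome)
           and genome[start + right] == genome[end + right]):
        right += 1
    left = 0
    while (left < max_len - right and left < end - start and start - 1 - left >= 0
           and genome[start - 1 - left] == genome[end - 1 - left]):
        left += 1
    return right + left


# Invented toy locus: a 15-base segment flanked by the direct repeat ACGGC.
toy = "TTTT" + "ACGGC" + "AAAAAAAAAA" + "ACGGC" + "TTTT"
print(junction_microhomology(toy, 4, 19), round(gc_fraction(toy[4:19]), 2))
```

Because both directions are scanned, the same repeat length is returned wherever within the repeat the boundary happens to have been called.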

Fig. 2

Properties of the loci that give rise to microDNAs. (A) Enrichment of microDNAs observed in the indicated genomic region relative to the expected percentage based on random distribution. (B) Distribution of GC composition in microDNAs in the EMB1 library and their up- and downstream regions (of same length as microDNA). Vertical line: the genomic average GC content. (C) Presence of microhomology near the start and end of a microDNA. “MicroDNA island (blue curve)” is a contiguous stretch of the genome to which the PE-tags map uniquely and correctly. Direct repeats of 2 to 15 bp (red letters) were observed at the junction of the circle (uppercase) with flanking genomic DNA (lowercase). (D) Direct repeats are enriched in different microDNA libraries compared to the random model (RM), generated from the EMB1 sequences. (E) Intersection of microDNAs from EMB1 with positioned nucleosome-occupied regions in the mouse liver (14). Obs: observed overlap with nucleosome-occupied DNA; Exp: expected overlap of 1000 randomizations of each microDNA in the library (P < 0.0001). A similar enrichment is seen with other microDNA libraries (fig. S10). Right: Sequence of a microDNA with A/TA/T periodicity.

The lengths of microDNAs from cancer cell lines show a pronounced periodicity of 150 bp (Fig. 1C), consistent with the possibility that nucleosome wrapping of DNA may contribute to microDNA generation. In addition, although microDNAs are rich in GC content, AA, AT, or TT dinucleotides were found along the length of many circles with a periodicity of 9 to 11 bp (example in Fig. 2E). GC richness periodically punctuated by AA, AT, or TT dinucleotides is a feature of sequences preferentially assembled into nucleosomes (12, 13). Around 50 to 60% of microDNAs in the different libraries overlapped by ≥15 bases with 25-nucleotide tags marking the locations of positioned nucleosomes determined in the mouse liver (14) (Fig. 2E and fig. S10) (P < 0.001 in “t” test from random distribution).
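One simple way to look for the 9- to 11-bp spacing of AA, AT, or TT dinucleotides is to tabulate the distances between all pairs of such dinucleotides within a sequence and see which distances dominate. The sketch below assumes this pairwise-distance approach; it is not taken from the SOM, and `ww_lag_counts` and `max_lag` are hypothetical names.

```python
from collections import Counter

WW_DINUCLEOTIDES = {"AA", "AT", "TT"}


def ww_lag_counts(seq: str, max_lag: int = 30) -> Counter:
    """Count pairs of AA/AT/TT dinucleotides separated by each distance <= max_lag."""
    seq = seq.upper()
    positions = [i for i in range(len(seq) - 1) if seq[i:i + 2] in WW_DINUCLEOTIDES]
    counts: Counter = Counter()
    for idx, first in enumerate(positions):
        for second in positions[idx + 1:]:
            lag = second - first
            if lag > max_lag:
                break  # positions are sorted, so later ones are farther away
            counts[lag] += 1
    return counts


# Synthetic GC-rich sequence with an AA dinucleotide every 10 bases.
synthetic = ("AA" + "GCGCGCGC") * 10
lags = ww_lag_counts(synthetic)
print(lags.most_common(3))
```

In a real sequence the counts would be compared with those from shuffled sequences of the same composition, since runs of A or T inflate the shortest distances.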

The features of these microDNAs are completely different from those of the sequences obtained from chromosomal DNA, suggesting that the specific characteristics of microDNA are not an artifact of random sampling of cellular DNA by high-throughput sequencing (fig. S11, a to c, and SOM text).

Cells that release a double-stranded circular DNA may be expected to suffer a microdeletion in the source genomic locus. A search for such microdeletions is complicated by the likelihood that different cells will yield different microDNAs, so that a tissue will be mosaic for microdeletions. We therefore selected two genomic loci that yielded microDNAs in multiple brain libraries. One was 20 kb at the 5′ end of the KCNK3 gene on chromosome 5 (30,890,697 to 30,910,805, NCBI37/mm9) enriched by PCR (Fig. 3B), and another was 160 kb on chromosome 10 (80,213,587 to 80,372,454, NCBI37/mm9) enriched by Anchored ChromPET (15). The strategy for finding microdeletions in the selected loci is given in Fig. 3A and the SOM methods. A total of 30 deletions were detected (23 from the KCNK3 locus and 7 from the chromosome 10 locus) (Fig. 3B and fig. S13). Direct repeats were observed at both ends of 25 of the 30 microdeletions (Fig. 3C and fig. S13). The GC composition, length distribution, and AA, AT, or TT periodicity of the microdeletions were also similar to those observed for the microDNA (Fig. 3D and figs. S12 and S13). The results suggest that microdeletions occur in an average of 1 in 2000 chromosomal DNA molecules (SOM text) at susceptible genomic loci in somatic tissues, giving rise to genetic variability between individual normal somatic cells.

Fig. 3

Microdeletions in genomic loci known to yield microDNAs. (A) Algorithm for finding microdeletions in genomic DNA. Details are in the SOM. (B) Microdeletions found in the KCNK3 locus. DNA spanning the indicated locus was amplified from 200,000 copies of 6-month-old mouse brain genomic DNA, and paired-end-sequenced. White square is KCNK3 exon1, and solid line is KCNK3 intron1. Blue squares are positions of microDNAs identified in three independent embryonic brain libraries, and red squares are microdeletions found in the genome in this study. (C) Direct repeats observed near the junctions of microdeletions. (D) GC composition of the microdeletions identified in the two loci. The deleted sequences were rich in GC content compared to the genomic average of 46%.

Fig. 4

Germline deletions of <1000 bp in the Thousand Genomes Project have properties similar to those of microDNAs. (A) Length distribution peaks at 100 and 350 bp. (B) Deletions in genic areas are enriched in 5′UTRs, exons, CpG islands, and regions 200 bp upstream from genes. (C) GC content of deletion and upstream and downstream regions is greater than the genomic average. The upstream and downstream sequence was of the same length as the deletions. (D) Seventy percent of the microdeletions had flanking direct repeats. Length distribution of the direct repeats is shown. Direct repeats ≥15 bp are shown at 15 bp.

The widespread occurrence of microDNAs led us to consider whether microdeletions in germline sequence could also result from the excision of microDNAs. Indeed, the germline deletions of <1000 bp reported in the Thousand Genomes project (16) had features similar to those of microDNAs (Fig. 4, A to D, and SOM text). Briefly, the germline microdeletions peaked in length at 100 and 350 bp; were enriched in exons, 5′-untranslated regions (5′UTRs), and CpG islands; were rich in GC content; and had a high frequency of short direct repeats flanking the deleted fragments. This close overlap between the nature of the sequences lost in germline microdeletions and the microDNAs reported here suggests that these deletions are also generated by the excision and loss of microDNAs.

Unlike formerly described eccDNA (9–11), microDNAs are small, map to unique DNA sequence, and arise from genes. Very short direct repeats at the starts and ends of microDNAs suggest that fork stalling or template switching during replication repair or microhomology-mediated repair may produce microDNAs. Circularization of microDNAs could be facilitated by the wrapping of DNA around positioned nucleosomes. The known correspondence of positioned nucleosomes with 5′ ends of genes could explain the enrichment of microDNAs from the 5′ ends of genes. MicroDNAs could also originate as displaced Okazaki fragments from replication forks collapsed at strongly bound nucleosomes or GC-rich DNA. Single-stranded microDNAs may arise from such ligated Okazaki fragments, from deletion of excess DNA produced by replication slippage, or from nuclease digestion of nicked double-stranded circles. However, the microdeletions detected in genomic loci most likely arise from excision of double-stranded circles. The generation of microDNAs and microdeletions may produce a large pool of individual-specific or somatic-clone–specific copy-number variations of small segments of the genome. The genetic mosaicism in somatic tissues may lead to functional differences between cells in a tissue. Finally, persistent microDNAs may provide the extrachromosomal genetic “cache” that has been postulated to account for non-Mendelian genetics in plants (17).

Supplementary Materials

Materials and Methods

SOM Text

Figs. S1 to S13

Tables S1 and S2

References (20–30)

References and Notes

  1. Acknowledgments: This work was supported by R01 CA60499 and GM84465 (to A.D.), and GM31819 and ESO13773 (to J.D.G.). We thank all members of the Dutta Lab for helpful discussions and A. Prorock for assistance with DNA sequencing. Accession number for the sequence data submitted to Gene Expression Omnibus: GSE36088.
