Reports: Genetics

Single-cell whole-genome analyses by Linear Amplification via Transposon Insertion (LIANTI)


Science  14 Apr 2017:
Vol. 356, Issue 6334, pp. 189-194
DOI: 10.1126/science.aak9787

Making an unbiased library

Sequencing the genome of single cells gives insight into issues such as cell-to-cell heterogeneity and genome instability. Key to single-cell sequencing techniques are whole-genome amplification (WGA) methods that provide sufficient DNA for next-generation sequencing. Current WGA methods have been hampered by low accuracy and spatial resolution of gene copy numbers and by low amplification fidelity. Chen et al. report an improved single-cell WGA method, Linear Amplification via Transposon Insertion (LIANTI). The DNA is randomly fragmented by Tn5 transposition of a transposon that includes a T7 promoter, which allows linear amplification. The authors used the method to determine the spectrum of single-nucleotide variations in a single human cell after ultraviolet radiation.

Science, this issue p. 189

Abstract

Single-cell genomics is important for biology and medicine. However, current whole-genome amplification (WGA) methods are limited by low accuracy of copy-number variation (CNV) detection and low amplification fidelity. Here we report an improved single-cell WGA method, Linear Amplification via Transposon Insertion (LIANTI), which outperforms existing methods, enabling micro-CNV detection with kilobase resolution. This allowed direct observation of stochastic firing of DNA replication origins, which differs from cell to cell. We also show that the predominant cytosine-to-thymine mutations observed in single-cell genomics often arise from the artifact of cytosine deamination upon cell lysis. However, identifying single-nucleotide variations (SNVs) can be accomplished by sequencing kindred cells. We determined the spectrum of SNVs in a single human cell after ultraviolet radiation, revealing their nonrandom genome-wide distribution.

Rapid advances in DNA sequencing have led to a wealth of knowledge about genomes of various species, including humans, most of which have been derived from bulk measurements from a large number of cells. However, a single cell has a unique genome even within an individual human being. For example, each germ cell is distinct, carrying different combinations of paternal and maternal genes. Somatic cells have spontaneous genomic changes that take place stochastically in time and genomic position. These include single-nucleotide variations (SNVs), copy-number variations (CNVs), and structural variations. Such genomic changes can lead to cancer and other diseases. As such, characterization of single-cell genomes has attracted increasing attention in recent years (1, 2). The importance of single-cell genomics becomes more apparent in the case of highly valued and rare samples, such as embryonic cells and circulating tumor cells (3, 4), or when probing stochastic changes and cell-to-cell heterogeneity (5–9).

Because of the trace amount of genomic DNA in a cell, single-cell genome sequencing has relied on whole-genome amplification (WGA). Among previous WGA methods, degenerate oligonucleotide–primed polymerase chain reaction (DOP-PCR) is an exponential PCR reaction with degenerate priming (10). Multiple-displacement amplification (MDA) uses a strand-displacing DNA polymerase to exponentially amplify single-stranded DNA into a hyperbranched structure (11, 12). Multiple annealing and looping-based amplification cycles (MALBAC) uses quasi-linear amplification through looping-based amplicon protection followed by PCR (5). All these methods involve nonspecific priming and exponential amplification that create amplification bias and errors.

To reduce such bias and errors, we have developed a new WGA method, Linear Amplification via Transposon Insertion (LIANTI), which combines Tn5 transposition (13) and T7 in vitro transcription (14) for single-cell genomic analyses. Random fragmentation and tagging of genomic DNA by Tn5 transposition has been used to prepare DNA sequencing libraries by introducing priming sites for PCR amplification (15). However, such exponential amplification is associated with amplification bias and errors, limiting its applications in single-cell genomics (16, 17). Here we demonstrate linear amplification, whose advantage over exponential amplification is illustrated in Fig. 1A.

In LIANTI, genomic DNA from a single cell is randomly fragmented by Tn5 transposition of a specially designed LIANTI transposon that includes a T7 promoter (Fig. 1B). Genomic DNA fragments tagged by T7 promoters are linearly amplified into thousands of copies of RNAs through in vitro transcription, followed by reverse transcription and second-strand synthesis into double-stranded LIANTI amplicons ready for DNA library preparation (Fig. 1C). LIANTI eliminates nonspecific priming and exponential amplification used in other single-cell WGA methods, greatly reducing amplification bias and errors.

Fig. 1 LIANTI single-cell whole-genome amplification scheme and amplification uniformity.

(A) Comparison of exponential and linear amplification, assuming the DNA fragments A and B have replication yields of 100 and 70% per round, respectively. For a final amplification factor of ~10,000 of fragment A, exponential amplification results in a ratio of 8:1, hampering the accuracy of CNV detection. In contrast, linear amplification exhibits a ratio of 1:0.7, which is much closer to unity. Linear amplification is also superior to exponential amplification in fidelity. In exponential amplification, a polymerase of the highest fidelity (10⁻⁷) replicating the human genome (3 × 10⁹ bp) in the first cycle would give ~300 errors, which will be propagated permanently in the next replication cycles, leading to false-positive SNVs. In contrast, in linear amplification, the errors would appear randomly at different locations in the amplicons and can be easily filtered out. (B) LIANTI transposon and transposome. LIANTI transposon consists of a 19-bp double-stranded transposase binding site and a single-stranded T7 promoter loop. Equal molar amounts of LIANTI transposon and Tn5 transposase are mixed and dimerized to form LIANTI transposome. (C) LIANTI scheme. Genomic DNA from a single cell is randomly fragmented and tagged by LIANTI transposon, followed by DNA polymerase gap extension to convert single-stranded T7 promoter loops into double-stranded T7 promoters on both ends of each fragment. In vitro transcription overnight is performed to linearly amplify the genomic DNA fragments into genomic RNAs, which are capable of self-priming on the 3′ end. After reverse transcription, RNase digestion, and second-strand synthesis, double-stranded LIANTI amplicons tagged with unique molecular barcodes are formed, representing the amplified product of the original genomic DNA from a single cell, and ready for DNA library preparation and next-generation sequencing.
(D) Read depths across the genome with 1-Mb bin size and a zoom-in to a 10-Mb region [chromosome 1 (Chr1): 60,000,000 to 70,000,000] with 10-kb bin size. The MALBAC data are normalized by the average of two other MALBAC cells to remove the sequence-dependent bias reproducible from cell to cell. (E) Coefficient of variation for read depths along the genome as a function of bin sizes from 1 b to 100 Mb, showing amplification noise on all scales for single-cell WGA methods, including DOP-PCR, MALBAC, MDA, and LIANTI. The normalized MALBAC data (dashed line) are shown together with the unnormalized MALBAC data. Only the unnormalized data of the other methods are shown as no substantial improvement by normalization was observed. Poisson curve is the expected coefficient of variation for read depth assuming only Poisson noise. LIANTI exhibits a much improved amplification uniformity over the previous methods on all scales.
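
The arithmetic behind panel (A) can be checked with a short calculation. The sketch below is illustrative only. The function names are hypothetical, and the model is an assumption about how the per-round yields apply: in exponential amplification each round multiplies the copy number by (1 + yield), whereas in linear amplification the copy number grows in proportion to the yield.

```python
import math


def exponential_ratio(yield_a, yield_b, target_gain):
    """A:B ratio after exponential amplification until A reaches target_gain.

    Assumes each round multiplies the copy number by (1 + yield).
    """
    rounds = math.log(target_gain) / math.log(1.0 + yield_a)
    gain_b = (1.0 + yield_b) ** rounds
    return target_gain / gain_b


def linear_ratio(yield_a, yield_b):
    """A:B ratio after linear amplification.

    Copies accumulate in proportion to the per-round yield, so the ratio
    does not depend on how many rounds are run.
    """
    return yield_a / yield_b


def first_cycle_errors(error_rate, genome_size_bp):
    """Expected number of polymerase errors when copying the genome once."""
    return error_rate * genome_size_bp


# Values taken from the caption: yields of 100% and 70%, ~10,000-fold gain,
# polymerase error rate 1e-7, human genome of 3e9 bp.
exp_ratio = exponential_ratio(1.0, 0.7, 10_000)
lin_ratio = linear_ratio(1.0, 0.7)
errors = first_cycle_errors(1e-7, 3e9)
```

Under these assumptions the exponential ratio comes out between 8 and 9, in line with the 8:1 quoted in the caption, while the linear ratio stays at 1:0.7 and the expected first-cycle error count is about 300.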

We used LIANTI to amplify genomic DNA from single BJ cells, a human diploid cell line from skin fibroblasts chosen for no aneuploidy. Single-cell genomic DNA was randomly fragmented by Tn5 transposition to give an average fragment size of ~400 base pairs (bp) (figs. S1, A and B, and S2, A and B). After overnight in vitro transcription, we routinely acquired ~20 ng of LIANTI amplicons for DNA library preparation. We sequenced BJ cells at ~30× depth and performed a systematic comparison between LIANTI and previous WGA methods [data from (1)]. LIANTI achieves 97% genome coverage and a 17% allele dropout rate, outperforming other WGA methods (table S1).

To evaluate amplification uniformity, we plotted the average read depths in 1-Mb bins across the genome for LIANTI, MDA, MALBAC, and DOP-PCR, together with a zoom-in to a 10-Mb region on chromosome 1 with 10-kb bins (Fig. 1D). On both scales, LIANTI exhibits the highest amplification uniformity compared to the other methods. To better quantify amplification bias on all scales, we plotted the coefficient of variation (CV) of the read depth along the genome as a function of the bin size (Fig. 1E and fig. S3A), which is more reproducible and informative than power spectra and Lorenz curves (fig. S3B) used previously (see supplementary materials). LIANTI achieves the lowest CV values with respect to all bin sizes, offering the highest accuracy for CNV detection.
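
As a rough illustration of the CV-versus-bin-size analysis, the following sketch averages a per-position depth track in bins of increasing size and reports the CV at each size. It is a minimal sketch, not the analysis code used in the study: the function names and the toy depth track are hypothetical, and the Poisson reference assumes a bin expected to contain a given mean number of reads.

```python
import math
from statistics import mean, pstdev


def bin_depths(depths, bin_size):
    """Average per-position depths in consecutive bins (partial tail bin dropped)."""
    n_bins = len(depths) // bin_size
    return [mean(depths[i * bin_size:(i + 1) * bin_size]) for i in range(n_bins)]


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean depth is zero; CV is undefined")
    return pstdev(values) / m


def cv_by_bin_size(depths, bin_sizes):
    """Map each bin size to the CV of the binned depth track."""
    return {b: coefficient_of_variation(bin_depths(depths, b)) for b in bin_sizes}


def poisson_cv(mean_reads_per_bin):
    """CV expected from Poisson counting noise alone: 1 / sqrt(mean count)."""
    return 1.0 / math.sqrt(mean_reads_per_bin)


# Toy track (hypothetical): depth alternates between 10 and 30 at every position,
# so the noise is entirely short-range and averages out in bins of 2 or more.
depths = [10, 30] * 8
cv = cv_by_bin_size(depths, [1, 2, 4])
```

In this toy case the CV is 0.5 at single-position resolution and drops to zero once each bin spans a full 10/30 pair. Real amplification noise persists across scales, which is why the curve is plotted over many bin sizes.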

The spatial resolution of CNV detection in a single cell has been limited to ~1 Mb owing to the amplification noise of previous WGA methods. In LIANTI, amplification noise, though much reduced, still exists on account of different amplification factors for each fragment, preventing accurate detection of micro-CNVs (<100-kb CNVs). To further reduce this noise, instead of relying on read depths, we carried out digital counting of the inferred fragment numbers, as shown in Fig. 2A. This is done by taking advantage of the fact that LIANTI amplicons mapped to the reference genome with the same ends should originate from the same genomic DNA fragment, hence allowing more accurate inference of the fragment numbers at each genomic position. For example, in Fig. 2B, the unamplified bulk (top panel) shows a 2-to-1 copy-number loss. However, the LIANTI single-cell read-depth raw data (middle panel) obscures this micro-CNV. The inferred fragment number by the digital-counting analysis (bottom panel) better resolved the micro-CNV. Digital counting improves the resolution of micro-CNV detection to ~10 kb. We characterized the false positives and false negatives for micro-CNV detection in a single BJ cell (fig. S4, A and B). Results differ for copy-number gains, 2-to-0 copy-number losses, and 2-to-1 copy-number losses (fig. S4, C and D), none of which were possible by previous WGA methods at this resolution.
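
The grouping idea behind digital counting can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the inference procedure used in the study: amplicons are represented as hypothetical (chromosome, start, end) tuples, amplicons with identical coordinates are collapsed into one inferred fragment, and each fragment is assigned to a bin by its start coordinate.

```python
from collections import Counter


def infer_fragments(amplicons):
    """Collapse amplicons with identical (chrom, start, end) into one fragment."""
    return set(amplicons)


def fragments_per_bin(fragments, bin_size):
    """Count inferred fragments per bin, assigning each by its start coordinate."""
    counts = Counter()
    for chrom, start, _end in fragments:
        counts[(chrom, start // bin_size)] += 1
    return counts


# Hypothetical mapped amplicons: three original fragments amplified very unevenly.
amplicons = (
    [("chr1", 100, 500)] * 50
    + [("chr1", 600, 950)] * 3
    + [("chr1", 10_200, 10_640)] * 20
)

fragments = infer_fragments(amplicons)
counts = fragments_per_bin(fragments, bin_size=10_000)
```

Here the first 10-kb bin holds 53 amplicons and the second holds 20, but the inferred fragment numbers are 2 and 1. Counting fragments instead of reads removes the per-fragment amplification factor from the signal.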

Fig. 2 Genome-wide detection of micro-CNVs and replication origin–firing events in single BJ cells.

(A) Principle for the inference of fragment numbers by LIANTI. Single-cell LIANTI amplicons mapped to the same starting and ending coordinates on the reference genome are grouped, as they originated from one fragment of the genomic DNA. This allows for the correction of the different amplification efficiency, often size dependent, for each fragment. The digital counting of the inferred fragment number across the genome is shown here for 2-to-1 copy-number loss. (B) Example of a 57-kb 2-to-1 micro-CNV detected in a single BJ cell, plotted with 100-bp bin size. Top panel is the read depth from unamplified bulk sequencing showing the existence of the micro-CNV. Middle panel is the read depth of the single-cell LIANTI amplicons, which obscures the micro-CNV because of amplification noise at this resolution. Bottom panel shows the inferred fragment number by LIANTI digital-counting analysis, which recovers the micro-CNV in the single cell. (C) Genome-wide detection of replication origin firing and replicon formation based on the copy-number gain in 11 single cells with 10-kb bin size (~250-Mb region of Chr1 shown in the plot). (D) Correlation plots of single-cell replicon copy numbers with the bulk readouts of the Repli-Seq assay and the DNase I–hypersensitive assay using 100-kb bin size. (E) Correlation plots of replicon copy numbers between pairs of single cells close in replication progress in S phase using 100-kb bin size. The diagonal signal represents replicon copy numbers shared by both cells, and the off-diagonal signal suggests stochastic origin firing and replicon formation, which is different from cell to cell.

We took advantage of LIANTI’s capability to detect micro-CNVs to probe DNA replication, an important research area in biology. In particular, whether the firing of replication origins and replicon formation (~50 to 120 kb) are stochastic has been a subject of intensive investigation (18–21) and can be best answered by single-cell measurements. Recently, MDA was used to probe single-cell DNA replication (22), but was unable to resolve individual replicons as a result of its poor spatial resolution.

We show in Fig. 2C whole-genome sequencing with LIANTI for 11 BJ cells picked from a synchronized population in early S phase. The genome-wide replication origin–firing and replicon-formation events were detected by the copy-number gain from 2 to 3 and from 3 to 4 with kilobase resolution (Fig. 2C and fig. S6). The genome-wide replicon copy numbers of a single cell correlate well with the conventional bulk readouts of the Repli-Seq assay (23) (Fig. 2D and fig. S7) and the deoxyribonuclease I (DNase I)–hypersensitive assay (24) (Fig. 2D and fig. S8), suggesting that a subset of replication origins are used in individual cells. Figure 2E shows the correlation plots of replicon copy numbers between pairs of single cells close in replication progress in S phase (Fig. 2E and fig. S9). Whereas the diagonal signal represents replicons shared by both cells (deterministic), the strong off-diagonal signal suggests a large degree of stochasticity in terms of replication origin firing, which is different from cell to cell.

In terms of SNV detection accuracy, among all WGA methods, LIANTI gives the lowest false-positive rate of 5.4 × 10⁻⁶ for single–BJ cell SNV detection (Fig. 3A and fig. S10A), which is still higher than expected for linear amplification. We further characterized the mutation spectra of false positives and found that both LIANTI and MDA exhibit a C-to-T false-positive predominance, which is not seen in the unamplified bulk (Fig. 3B). Such “de novo” C-to-T mutations have been reported in many previous single-cell genomic studies (25, 26) and most recently in nondividing neurons (26).

Fig. 3 Detection of SNVs in single BJ cells.

(A) False-positive rates of SNV detection in a single BJ cell. The error bars were calculated from three different BJ cells. (B) Spectra of SNV false positives in unamplified bulk, single-cell LIANTI, single-cell MDA, and single-cell uracil–DNA glycosylase (UDG)–treated LIANTI samples. The number of false positives is shown in the bracket for each sample. Both LIANTI and MDA results exhibit predominant C-to-T false positives not seen in the unamplified bulk. Similar C-to-T SNVs have been reported in previous single-cell MDA studies and attributed to de novo mutations (26). We attribute the phenomenon to the spontaneous C-to-U deamination upon cell lysis, which is often seen in ancient DNA bulk samples. We show that such C-to-U deamination accounts for the observed SNV false positives by WGA of the cell lysate treated with UDG, which eliminates cytosine-deaminated uracil bases and hence restores the lower C-to-T false-positive fraction seen in the bulk.

We instead attribute this predominant C-to-T observation to the experimental artifact of C-to-U deamination after cell lysis, which is well known as the most common cause of point mutations (27, 28) and is especially prominent in ancient DNA (29). Deamination of C to U is a natural process that occurs at a low rate randomly in the genome (30), and hence would be difficult to see in bulk sequencing because of the extremely low allele frequency. To test whether C-to-T false-positive predominance in LIANTI is caused by C-to-U deamination, we treated genomic DNA from a lysed cell before LIANTI amplification with uracil–DNA glycosylase, which functions as part of the DNA repair system in live cells, to eliminate cytosine-deaminated uracil bases (31). Indeed, a significant reduction of C-to-T SNVs was observed (Fig. 3B and fig. S10B), showing that the commonly observed C-to-T SNV predominance in the field of single-cell genomics is caused by an in vitro cytosine-deamination artifact and is a false positive.

Likewise, the second most frequent false positive is A-to-G (Fig. 3B), which happens to be the second most common spontaneous mutation of DNA bases due to adenine deamination (27). Another common type of false positive is G-to-T (Fig. 3B), which is likely caused by guanine oxidation to 8-hydroxyguanine (32, 33). We concluded that the accuracy of single-cell SNV detection for any WGA method is fundamentally limited by chemical instability of DNA bases in the absence of cellular DNA repair systems. As a result, sequencing two kindred cells (5), which are a pair of cells derived from the division of a single cell, is necessary to filter out such false positives occurring randomly in the genome.
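
The logic of kindred-cell filtering can be expressed as a set intersection. The sketch below is a simplified illustration, assuming each SNV call is a hypothetical (chromosome, position, reference base, alternate base) tuple. It ignores coverage gaps and allele dropout, which a real analysis would need to handle, and the example calls are invented for illustration.

```python
def shared_snvs(cell_a, cell_b):
    """SNVs called in both kindred cells; candidates for true variants."""
    return set(cell_a) & set(cell_b)


def private_snvs(cell, other):
    """SNVs called in one cell only; candidates for random chemical artifacts."""
    return set(cell) - set(other)


# Hypothetical calls from a pair of kindred cells.
cell_a = {
    ("chr1", 1000, "C", "T"),
    ("chr2", 500, "T", "A"),
    ("chr3", 42, "C", "T"),   # seen in cell A only
}
cell_b = {
    ("chr1", 1000, "C", "T"),
    ("chr2", 500, "T", "A"),
    ("chr5", 77, "G", "T"),   # seen in cell B only
}

confirmed = shared_snvs(cell_a, cell_b)
artifacts_a = private_snvs(cell_a, cell_b)
```

Because deamination and oxidation events occur at random positions after lysis, the chance that the same artifact appears at the same site in both cells is small, so the intersection retains variants inherited from the parent cell.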

We further demonstrate the use of LIANTI for the study of mutations generated by ultraviolet (UV) radiation. It is well known that exposure to UV radiation in sunlight leads to DNA damage and potential skin cancer, attracting many mechanistic studies. UV radiation generates cyclobutane pyrimidine dimers (CPDs) and (6-4) photoproducts (PPs) on genomic DNA (34), which are subject to nucleotide excision repair (NER) (35). If the damage is not repaired before DNA replication, error-prone translesion synthesis DNA polymerase is recruited to the damaged region, giving rise to de novo SNVs (36). However, these mutations are different from cell to cell because of the randomness of UV damage along the genome, which necessitates single-cell whole-genome amplification and sequencing.

To characterize UV-induced genome-wide mutations, we exposed BJ cells to different UV doses. After propagating several cell cycles without UV, a single cell under investigation was cultured to generate a pair of kindred cells, which were subject to LIANTI and sequencing in order to eliminate false-positive SNVs (Fig. 4A).

Fig. 4 Genome-wide profiling of UV-induced mutations in single BJ cells.

(A) Experimental design. BJ cells cultivated in dishes are exposed to UV radiation at doses of 5, 15, and 30 J/m². Single cells that survived cell-cycle arrest and apoptosis were picked and allowed to divide into multiple kindred cells (fig. S11), from which a pair of kindred cells were picked for LIANTI. (B) Spectra of UV-induced SNVs in a representative cell exposed to 15 J/m² UV radiation. (C) Depletion of UV-induced SNVs within transcribed regions, DNase I–hypersensitive sites, and early-replicating regions. “Expected” column is the percentage of SNVs simulated assuming random distribution along the genome. “Observed” column is the percentage of SNVs observed in UV-radiated samples, with the error bars calculated from four kindred pairs. (D) Overlay of the density of UV-induced SNVs (red) and the minus Repli-Seq signal (blue) reflecting the replicated genomic regions, as well as the minus DNase I–hypersensitive signal (blue) throughout the genome (~250-Mb region of Chr1 shown in the plot). Both signals were calculated in 2-Mb moving windows with 100-kb increments. (E) Nontemplate-to-template ratio of UV-induced C-to-T and T-to-A mutations within transcribed regions and the sequence context of such mutations. “Expected” column is the ratio simulated assuming random distribution of SNVs on both strands. “Observed” column is the ratio observed in UV-radiated samples, with the error bars calculated from four kindred pairs. Sequence context is plotted based on the frequency of each base next to the corresponding type of mutation.

We detected 4700 to 9300 UV-induced SNVs throughout the genome from each pair of kindred cells (fig. S12). The SNV spectra show a C-to-T predominance (Fig. 4B and fig. S13), in good agreement with the previously reported SNV spectra of sun-exposed normal human skin and melanomas (37–39). While examining the point mutation distribution along the genome, we discovered a depletion of mutations within transcribed regions (Fig. 4C), which can be explained by the involvement of transcription-coupled NER (40, 41). We also observed a significant depletion within DNase I–hypersensitive sites and early-replicating regions (Fig. 4C). When plotting throughout the genome, we observed a strong anticorrelation between the density of UV-induced SNVs and Repli-Seq signal reflecting the replicated genomic regions, as well as the DNase I–hypersensitive signal (Fig. 4D and fig. S14). Similar phenomena have also been observed in cancer genomes without UV radiation (42–44), which was attributed to NER impairment by DNA-bound proteins (43, 45).
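
The moving-window density used for the genome-wide overlay (2-Mb windows advanced in 100-kb increments, per the Fig. 4D caption) can be sketched as follows. This is a minimal illustration with a hypothetical function name and invented SNV positions, counting SNVs in half-open windows along a single chromosome.

```python
from bisect import bisect_left


def sliding_window_counts(positions, region_length, window, step):
    """Count positions falling in each half-open window [start, start + window).

    Windows advance by `step` and must fit entirely within region_length.
    Returns a list of (window_start, count) pairs.
    """
    pos = sorted(positions)
    out = []
    start = 0
    while start + window <= region_length:
        n = bisect_left(pos, start + window) - bisect_left(pos, start)
        out.append((start, n))
        start += step
    return out


# Hypothetical SNV coordinates on one chromosome segment of 2.2 Mb.
snv_positions = [50_000, 150_000, 2_050_000]
density = sliding_window_counts(
    snv_positions, region_length=2_200_000, window=2_000_000, step=100_000
)
```

The same windowing can be applied to any other per-position signal, so that the two tracks share coordinates before they are compared.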

We further examined the propensity of mutations for the two strands within transcribed regions and observed a C-to-T enrichment in the nontemplate strand (Fig. 4E). The same enrichment was also observed in UV-associated cancer genomes (46), which can be explained by the preferred CPD and PP removal by transcription-coupled NER on the template strand (40, 47). When plotting the sequence context of C-to-T mutations, the adjacent base is mostly T on the 5′ side (Fig. 4E), consistent with the well-known mechanism of UV-induced CPD and PP formation of T:C dimers, followed by error-prone translesion synthesis (36). Notably, we also observed an enrichment of T-to-A in the nontemplate strand (Fig. 4E and fig. S15), suggesting the involvement of transcription-coupled NER as well. We further plotted the sequence context of T-to-A mutations and found the adjacent base is mostly T on both sides (Fig. 4E and fig. S16), suggesting that T-to-A mutations may be caused by UV-induced CPD and PP formation of T:T, followed by a different kind of error-prone translesion synthesis.

High-throughput sequencing of many single cells can be easily achieved by adding combinatorial cellular barcodes in the LIANTI transposon and primer. In addition to the fundamental investigations illustrated here, the high precision of micro-CNV detection and the ability to call individual SNVs in a single cell will allow better genetic screening in reproductive medicine and provide valuable information about how genome variation takes place in cancer and other diseases.

Supplementary Materials

www.sciencemag.org/content/356/6334/189/suppl/DC1

Materials and Methods

Figs. S1 to S16

Table S1

References (48–64)

References and Notes

  1. Acknowledgments: The LIANTI development was supported by the NIH Director’s Pioneer Award (5DP1CA186693), and the study of DNA damage by UV radiation was supported by a National Cancer Institute grant (5R33CA174560). The comparison with other methods and the study of micro-CNVs were supported by Beijing Municipal Science and Technology Commission grants (D1511000024150002 to X.S.X.), National Key Technologies Research and Development program (2016YFC0900100 to L.H.), and funding from Beijing Advanced Innovation Center for Genomics at Peking University. L.T. was supported by a Howard Hughes Medical Institute International Student Research Fellowship. We thank Y. Yin, P. Cui, A. Chapman, Y. Tang, and other members in the group for their assistance and helpful discussions. X.S.X., C.C., and D.X. are inventors on a patent application filed by Harvard University that covers the single-cell whole-genome sequencing by LIANTI technology. Raw sequencing data were deposited at the National Center for Biotechnology Information with accession number SRP102259 at www.ncbi.nlm.nih.gov/sra/SRP102259.