Report

Single-Cell RNA-Seq Reveals Dynamic, Random Monoallelic Gene Expression in Mammalian Cells


Science  10 Jan 2014:
Vol. 343, Issue 6167, pp. 193-196
DOI: 10.1126/science.1245316

Expressing One Allele at a Time

Although genetic traits are often dominant or recessive, the impact of the same heterozygous genotype can vary considerably between individuals. Deng et al. (p. 193) analyzed global gene expression in hundreds of individual mouse cells and found that a substantial fraction of the genes expressed only one of the alleles, chosen randomly, at any given point in time. Such stochasticity in transcription increases the heterogeneity among cells and likely contributes to the phenotypic variance among individuals of identical genotype.

Abstract

Expression from both alleles is generally observed in analyses of diploid cell populations, but studies addressing allelic expression patterns genome-wide in single cells are lacking. Here, we present global analyses of allelic expression across individual cells of mouse preimplantation embryos of mixed background (CAST/EiJ × C57BL/6J). We discovered abundant (12 to 24%) monoallelic expression of autosomal genes and that expression of the two alleles occurs independently. The monoallelic expression appeared random and dynamic because there was considerable variation among closely related embryonic cells. Similar patterns of monoallelic expression were observed in mature cells. Our allelic expression analysis also demonstrates the de novo inactivation of the paternal X chromosome. We conclude that independent and stochastic allelic transcription generates abundant random monoallelic expression in the mammalian cell.

In diploid organisms, the zygote inherits one set of autosomal chromosomes from each parent. Although it is widely believed that transcription of autosomal genes occurs from both parental alleles, specific classes of genes have been shown to express only one, randomly selected, allele (allelic exclusion) (1–3). Analyses of clonally amplified lymphocytes by using single-nucleotide polymorphism (SNP)–sensitive microarrays revealed that 8% of human autosomal genes and 16% of mouse genes showed a type of random, monoallelic expression that was stably maintained during clonal expansion (4, 5). Furthermore, parental-specific (imprinted) expression has been demonstrated for 1% of autosomal genes (6, 7) and, perhaps most strikingly, in the inactivation of one X chromosome in female cells (8). Although RNA fluorescent in situ hybridization (RNA-FISH) has been used to study a few individual genes (9, 10), little is known about general patterns of allelic expression in single cells.

To investigate allele-specific gene expression at single-cell resolution, we isolated 269 individual cells dissociated from in vivo F1 embryos (CAST/EiJ × C57BL/6J, hereafter abbreviated as CAST and C57, respectively) from oocyte to blastocyst stages of mouse preimplantation development (PD) (11). We generated transcriptome profiles with Smart-seq (12) or Smart-seq2 (13) from each individual cell (table S1 and fig. S1, A and B). Principal component analysis (PCA) clustered the cells by developmental stage and embryo, effectively reconstructing the dynamics of PD (Fig. 1A). Next, using strain-specific SNPs (14) to distinguish transcription from the maternal and paternal chromosomes (15), we observed that 82% of all genes expressed during PD contained ≥1 informative SNP (fig. S1C) and that different SNPs within the same gene gave coherent allelic calls (fig. S2). Because maternal RNA lingers from the oocyte (16), we expected the maternal genotype to dominate the zygotic transcriptome. Indeed, the zygote, and also the early two-cell embryo, contained essentially only maternal RNA, but in the subsequent stages, the maternal fraction gradually declined to reach parity with paternal transcripts at the four-cell stage (Fig. 1B), which is consistent with rapid maternal transcript clearance and zygotic genome activation. As a control for the accuracy of alignments and SNP annotation, we analyzed individual cells of pure C57 or CAST background and found 99.4 and 99.7% correctly classified reads, respectively (Fig. 1B).
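As an illustration of the maternal-fraction measure discussed above, the following Python sketch computes the share of SNP-informative reads assigned to the maternal (CAST) allele in one cell. It is a read-count proxy under stated assumptions, not the authors' pipeline: the function name, the input layout, and the example counts are all hypothetical.

```python
def maternal_read_fraction(snp_reads):
    """Return the fraction of SNP-informative reads assigned to the maternal allele.

    snp_reads maps a gene name to a (maternal_reads, paternal_reads) tuple.
    Returns None when the cell has no informative reads at all.
    """
    maternal = sum(m for m, _ in snp_reads.values())
    paternal = sum(p for _, p in snp_reads.values())
    total = maternal + paternal
    if total == 0:
        return None  # no SNP-containing reads, so the fraction is undefined
    return maternal / total


# Hypothetical counts for one cell (CAST = maternal, C57 = paternal).
example_cell = {"geneA": (30, 10), "geneB": (0, 20), "geneC": (45, 15)}
fraction = maternal_read_fraction(example_cell)
```

A value near 1 would correspond to the zygote and early two-cell stages described above, and a value near 0.5 to parity between maternal and paternal transcripts.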

Fig. 1 Single-cell transcriptomes reconstruct preimplantation development.

(A) Single-cell gene expression profiles projected onto the first two principal components. Cells from different stages and embryos are designated by colors and symbols. (B) The percentage (by mass) of maternal RNA observed in single-cell transcriptomes (black dots; median is in green) at different stages of development and in controls from pure maternal (CAST, red) and paternal (C57, blue) backgrounds. The eight-cell stage outlier cells with maternal bias are all from one embryo (supplementary text).

We next investigated the gene activation across paternal chromosomes. We found that genes on the paternal X chromosome (Xp) of female embryos were indeed transcriptionally activated similarly to those on paternal autosomes during a defined time window of the PD. Subsequent re-inactivation occurred first beyond the four-cell stage (Fig. 2A), demonstrating de novo Xp inactivation (17, 18) rather than inheritance and propagation of a pre-inactivated Xp (19). X chromosome inactivation initiates from the X-inactivation center (Xic), from which Xist is transcribed, and spreads in cis (18). Our data provided a high-resolution map of silencing over Xp at the four-cell, 16-cell, and early blastocyst stages (Fig. 2B) that substantiates the observation of a silencing gradient (19) and demonstrated that the spread of Xp silencing is not a simple function of the distance to Xic (Fig. 2B, and escapee genes in fig. S3 and table S2).

Fig. 2 De novo paternal X-chromosome inactivation.

(A) (Top) Average ratio of SNP-containing reads (paternal, C57; maternal, CAST) per autosome or X chromosome in cells of embryos (male and female, separately) at stages of preimplantation development. Error bars denote SEM (n = 3 to 28 cells), and P values were calculated using Student’s t-test. (Bottom) Allele-resolution gene expression (RPKM) of Xist in cells from different developmental stages. Colors indicate embryos according to sex and allele-specific expression. Only stages for which female embryos were available are shown. Error bars denote SEM (n = 3 to 28 cells). (B) Spread of Xp inactivation, shown as the fraction of maternal reads for genes on the X chromosome by using a moving window average of 10 genes. Xic is marked by a dashed line.
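The moving window average mentioned in panel (B) can be sketched as follows. This is a minimal illustration that assumes genes are already ordered by chromosomal position and that only full windows are reported; the function name and edge handling are assumptions, not details taken from the paper.

```python
def moving_window_mean(values, window=10):
    """Average each run of `window` consecutive values (full windows only).

    values: per-gene maternal read fractions, ordered along the chromosome.
    Returns a list with len(values) - window + 1 entries, or [] if too short.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(values) < window:
        return []
    running = sum(values[:window])
    means = [running / window]
    for i in range(window, len(values)):
        # Slide the window one gene to the right.
        running += values[i] - values[i - window]
        means.append(running / window)
    return means


# Toy input with a window of 2 to keep the example readable.
smoothed = moving_window_mean([1, 2, 3, 4], window=2)
```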

These findings gave us confidence in inferring biological signals from the allelic information in single-cell RNA sequencing (RNA-seq). To further explore allelic expression of autosomal genes, we classified their expression as biallelic, maternal monoallelic, or paternal monoallelic according to SNP-containing reads. Surprisingly, this revealed a great degree of monoallelic expression (on average, 54% of genes) across all stages of PD (figs. S4 to S24). The monoallelic calls were similar to or more abundant than those in available RNA-FISH data (fig. S25) (9, 10). Because single-cell transcriptome methods suffer from stochastic losses of RNA species, it was necessary to determine to what extent random sampling effects inflate observed monoallelic calls. We therefore lysed individual cells (from 8- or 16-cell embryos) and split the lysate into two equal volume fractions that were independently processed into sequencing libraries. Using the allelic calls from the split-pairs, we modeled the stochastic losses and inferred the underlying levels of biallelic and monoallelic expression in sets of genes binned by expression level (figs. S26 and S27). We estimated that 60% of all polyadenylated [poly(A)+] RNA molecules are lost in the Smart-seq2 protocol (Fig. 3A) (13) because inferred losses stabilized at levels equal to a single RNA molecule. This analysis uncovered coherent monoallelic expression estimates across independent split-cell experiments with a median of 17% of genes (Fig. 3B). Although technical losses of RNA contributed as much as 66% of observed monoallelic expression, this strategy allowed us to determine the underlying amount of biological monoallelic expression in single cells.
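To see why molecule loss inflates monoallelic calls, consider a truly biallelic gene with the same number of RNA molecules from each allele. The Python sketch below assumes every molecule is lost independently with the same probability, using the 60% loss estimate quoted above. This independent-loss model is a simplification introduced here for illustration and is not the authors' split-cell model (figs. S26 and S27); the function name is hypothetical.

```python
def apparent_call_probabilities(molecules_per_allele, loss_rate=0.60):
    """Probabilities of the observed allelic call for a truly biallelic gene.

    Assumes each RNA molecule is lost independently with probability
    `loss_rate`, so an allele drops out only if all its molecules are lost.
    """
    if molecules_per_allele < 1:
        raise ValueError("need at least one molecule per allele")
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be between 0 and 1")
    allele_dropout = loss_rate ** molecules_per_allele
    return {
        "biallelic": (1 - allele_dropout) ** 2,
        "monoallelic": 2 * allele_dropout * (1 - allele_dropout),
        "undetected": allele_dropout ** 2,
    }


low_expression = apparent_call_probabilities(1)
high_expression = apparent_call_probabilities(10)
```

Under these assumptions a gene with one molecule per allele would be miscalled as monoallelic almost half the time, whereas with ten molecules per allele the false monoallelic rate falls to roughly 1%, which matches the intuition that highly expressed genes are less affected by sampling.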

Fig. 3 Quantitative characterization of allelic transcription patterns.

(A) Estimated levels of allelic losses (z) for genes binned by expression level from split-cell experiments (n = 4 cells) on cells from 8- or 16-cell stage embryos. In (A) and (B), error bars denote 95% CIs (bootstrap over genes). (B) Inferred fraction of genes with monoallelic expression for genes binned by expression level, using split-cell experiments as in (A). (C) Percent of genes with monoallelic expression in 8- or 16-cell stage embryo cells, from raw observations, individual split-cell experiments, and their median. Error bars denote 95% CIs. (D) Mean percent of genes with monoallelic expression across cells at each developmental stage for genes with mean expression ≥90 RPKM per stage (fig. S28). Individual cell percentages are shown as black dots. (E) Mean percent of genes with monoallelic expression per embryonic stage, when pooling cells from the same embryo stage. Individual embryo percentages are shown as black dots. Colors and RPKM threshold are as in (D). (F) Mean fraction of cells (eight-cell stage) with biallelic expression for genes binned by their fraction of nondetected (silent) cells; error bars give SEM (n ≥ 3 genes). The expected biallelic fraction was computed on the basis of either a model of independent allele activation or coordinated activation (11). (G) Mean ratio of total expression between cells with biallelic or monoallelic expression of the same genes. Error bars give SEM (n = 120 genes).

In subsequent analyses of monoallelic expression, we focused only on transcripts expressed at sufficient abundances to be little influenced by random sampling, as determined in our control experiments (fig. S28). Exploring the levels of monoallelic expression among cells from the four-cell stage to the late blastocyst stage, we observed similar levels throughout PD, with an average of 12 to 24% monoallelic expression for mRNAs (Fig. 3D) and 19 to 26% for noncoding poly(A)+ RNAs. In contrast, consistent biallelic expression was observed for only a few hundred genes (table S3), often with housekeeping functions (4, 9). Allele classification of single-cell data of pure C57 and CAST background gave 97.3 and 99.5% correct monoallelic calls, respectively (Fig. 3D). Pooling cells by embryo removed essentially all monoallelic expression (Fig. 3E), demonstrating a high degree of cell-specific randomness in monoallelic expression. We therefore concluded that a dynamic type of random monoallelic expression is abundant in blastomeres.
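The effect of pooling can be illustrated with a toy Python example. The classification rule below (an allele counts as expressed if it has at least one SNP-containing read) is an assumed simplification, not the criteria used in the paper, and the read counts are invented for illustration.

```python
def classify_allelic_expression(maternal_reads, paternal_reads):
    """Assign an allelic call from SNP-containing read counts (assumed rule)."""
    if maternal_reads == 0 and paternal_reads == 0:
        return "not detected"
    if paternal_reads == 0:
        return "maternal monoallelic"
    if maternal_reads == 0:
        return "paternal monoallelic"
    return "biallelic"


# Hypothetical (maternal, paternal) read counts for one gene in four cells
# of the same embryo; each cell happens to express a single allele.
cells = [(12, 0), (0, 9), (15, 0), (0, 11)]
per_cell_calls = [classify_allelic_expression(m, p) for m, p in cells]

# Pooling sums the reads across cells before making the call.
pooled_maternal = sum(m for m, _ in cells)
pooled_paternal = sum(p for _, p in cells)
pooled_call = classify_allelic_expression(pooled_maternal, pooled_paternal)
```

Every individual cell is called monoallelic here, yet the pooled embryo is called biallelic, which is the pattern expected when the choice of allele is random from cell to cell rather than fixed per embryo.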

Pioneer studies on interleukin-4 (20, 21) found transcription of the two alleles to occur independently, but it remained unknown whether such independence applies on the genomic scale. Using our comprehensive allelic expression data for thousands of genes, we therefore investigated whether expression from the two alleles occurred independently of one another. Independent allelic expression would yield a specific relationship between the fraction of cells with biallelic, monoallelic, or no expression (10), which is different from scenarios of coordinated allelic expression. Markedly, the observed fraction of cells with biallelic expression followed the fraction expected under the independence model (Fig. 3F, eight-cell stage data, and fig. S29, all stages), demonstrating independent allelic transcription for genes across all expression levels, substantially extending results on individual genes (20, 21) to a global principle. Under independent allelic expression, biallelic expression of a particular gene would on average result in twofold higher RNA copy numbers than would the same gene in cells in which the gene is monoallelically expressed because cells with biallelic expression have transcriptional output from two alleles. To test this hypothesis, we analyzed mean gene expression levels in cells with biallelic expression and indeed observed them to be 2.0 ± 0.1 times higher [95% confidence interval (CI), bootstrap] than the levels in cells with monoallelic expression at all developmental stages (Fig. 3G). Thus, both the allelic expression patterns and the expression levels point to independent allelic transcription.
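The relationship implied by independence can be made concrete with a short Python sketch. It assumes that both alleles of a gene are detected with the same probability p in any cell, so that the silent fraction is (1 - p)^2, and it takes "coordinated" to mean that both alleles are either on or off together. Both assumptions, and the function names, are introduced here for illustration; the authors' exact models are described in their supplementary materials.

```python
import math


def expected_fractions_independent(silent_fraction):
    """Expected cell fractions if the two alleles are expressed independently.

    With equal per-allele probability p, silent = (1 - p)**2, so
    p = 1 - sqrt(silent), biallelic = p**2 and monoallelic = 2*p*(1 - p).
    """
    if not 0.0 <= silent_fraction <= 1.0:
        raise ValueError("silent_fraction must be between 0 and 1")
    p = 1.0 - math.sqrt(silent_fraction)
    return {
        "biallelic": p * p,
        "monoallelic": 2.0 * p * (1.0 - p),
        "silent": silent_fraction,
    }


def expected_fractions_coordinated(silent_fraction):
    """Expected cell fractions if both alleles always switch on and off together."""
    if not 0.0 <= silent_fraction <= 1.0:
        raise ValueError("silent_fraction must be between 0 and 1")
    return {
        "biallelic": 1.0 - silent_fraction,
        "monoallelic": 0.0,
        "silent": silent_fraction,
    }


independent = expected_fractions_independent(0.25)
coordinated = expected_fractions_coordinated(0.25)
```

For a gene that is silent in a quarter of cells, this independence sketch predicts biallelic expression in 25% of cells and monoallelic expression in 50%, whereas the coordinated sketch predicts biallelic expression in 75% of cells and no monoallelic cells, so the two scenarios are readily distinguishable from observed fractions.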

Because embryonic cells are uncommitted progenitors, it was important to determine whether abundant random monoallelic expression also occurs in mature cells. To this end, we investigated single-cell transcriptomes of in vivo liver cells (C57 × CAST), together with a control dilution series of RNA extracted from liver tissue (C57 × CAST) (Fig. 4A and fig. S30). The monoallelic expression levels in single liver cells were higher than sampling effects measured in control dilutions across all expression levels (Fig. 4B), and the fraction of true monoallelic calls increased with the expression threshold. We also profiled 10 individual adult mouse fibroblasts (five from each reciprocal cross) and detected similar random monoallelic expression, on average 24% of expressed genes per cell. We therefore conclude that random monoallelic expression is abundant in both embryonic and mature cells.

Fig. 4 Monoallelic expression in different mature cell types.

(A) The cumulative fraction of monoallelic expression as a function of the expression levels for individual liver cells and at different dilutions. (B) Percentage of genes with monoallelic expression in individual cultured adult fibroblasts (C57 × CAST and CAST × C57) for genes with mean expression ≥20 RPKM per stage (fig. S28).

In this study, we uncovered a stochastic pattern of monoallelic expression that differs from the stable allelic regulation of genomic imprinting and allelic exclusion (22, 23). It also differs from the stably maintained monoallelic expression observed in clonal lymphoid cell populations (4, 5). Instead, the rapid expression dynamics that we uncovered in individual cells are consistent with models of transcriptional bursting (24). In each cell, independent bursts of transcription occur from both alleles over time, but RNA from only one allele is often present at any given time. Because stochastic losses of RNA substantially inflate naive estimates of allelic expression, stringent controls such as split-cells and dilution series are of critical importance for accurate allelic expression analyses in single cells. It is likely that stochastic transcription of heterozygous alleles contributes to variable expressivity (phenotypic variation among cells and individuals of identical genotypes), which may have fundamental implications for variable disease penetrance and severity (25–28).

Supplementary Materials

www.sciencemag.org/content/343/6167/193/suppl/DC1

Materials and Methods

Supplementary Text

Figs. S1 to S30

Tables S1 to S3

References (29–32)

References and Notes

  1. Materials and methods are available as supplementary materials on Science Online
  2. Acknowledgments: Q.D. designed and performed mouse work and generated RNA-seq libraries. D.R. designed and performed computational analyses and prepared figures and tables. B.R. analyzed X chromosome data, prepared figures, and generated RNA-seq libraries. R.S. coordinated and designed the study and wrote the manuscript with input from other authors. We are grateful to C. Burge, T. Perlmann, S. Linnarsson, G. Winberg, and other members of our laboratory for comments on the manuscript and A. Johnsson for managing sequencing. This work was supported by the Swedish Research Council grants 2011-965 (Q.D.) and 2008-4562 (R.S.), by the European Research Council Starting Grant 243066 (R.S.), by the Foundation for Strategic Research (R.S.), and Åke Wiberg Foundation grant 756194131 (R.S.). Sequence data have been deposited in National Center for Biotechnology Information Gene Expression Omnibus (GSE45719) and Sequence Read Archive (SRP020490).