Research Article

Comprehensive single-cell transcriptional profiling of a multicellular organism

See allHide authors and affiliations

Science  18 Aug 2017:
Vol. 357, Issue 6352, pp. 661-667
DOI: 10.1126/science.aam8940
  • Fig. 1 sci-RNA-seq enables multiplex single-cell transcriptome profiling.

    (A) Schematic of the sci-RNA-seq workflow. AAAAA, polyadenosine tail; NVTTTTT, polythymidine primer. (B) Schematic of sci-RNA-seq library amplicons for Illumina sequencing. bp, base pairs; R, annealing sites for Illumina sequencing primers; P, Illumina P5 or P7 adaptor sequence. (C) Scatter plot of unique molecular identifier (UMI) counts from human and mouse cells, determined by 384 × 384 sci-RNA-seq. Blue, inferred mouse cells (n = 5953). Red, inferred human cells (n = 3967). Gray, collisions (n = 884). (D) Scatter plot of UMI counts from human and mouse cells, determined by 96 × 96 sci-RNA-seq with an optimized protocol. Blue, inferred mouse cells (n = 129). Red, inferred human cells (n = 160). Gray, collisions (n = 5). In (C) and (D), only cells originating from wells containing mixed human and mouse cells are shown. (E) Correlation between gene expression measurements in aggregated sci-RNA-seq profiles of NIH/3T3 cells (n = 238) and nuclei (n = 124). (F) t-SNE plot of cells originating in wells containing HEK293T (red; n = 60), HeLa S3 (blue; n = 69), or a mixture (gray; n = 321). (G) Correlation between gene expression measurements from aggregated sci-RNA-seq data and bulk RNA-seq data obtained using a related protocol (29). In (E) and (G), the red line is the linear regression, and the black line is y = x.

  • Fig. 2 sci-RNA-seq shows robust gene expression measurements.

    (A) Scatter plot of UMI counts from human and mouse cells, determined by a 16 × 84 sci-RNA-seq experiment on mixed HEK293T and NIH/3T3 cells (table S1). Blue, inferred mouse cells (n = 109). Red, inferred human cells (n = 168). Gray, collisions (n = 19). (B) Box plots showing the number of UMIs detected per cell (thick horizontal lines, medians; upper and lower box edges, first and third quartiles, respectively; whiskers, 1.5 times the interquartile range; circles, outliers). (C) Correlation between gene expression measurements in aggregated sci-RNA-seq profiles from two experiments performed 2 months apart on independently grown and fixed cells. (D) Correlation between gene expression measurements in aggregated sci-RNA-seq profiles of fixed-fresh and fixed-frozen cells. In (C) and (D), the red line is the linear regression, and the black line is y = x.

  • Fig. 3 A single sci-RNA-seq experiment highlights the single-cell transcriptomes of the C. elegans larva.

    (A) t-SNE visualization of the high-level cell types identified. (B) Bar graph showing the percentage of somatic cells profiled in the first sci-RNA-seq C. elegans experiment that could be identified as belonging to each cell type (red), compared with the percentage of cells from that type expected in an L2 C. elegans individual (blue). (C) Scatter plots showing the log-scaled transcripts per million (TPM) values of genes in the aggregation of all sci-RNA-seq reads (x axis) or in bulk RNA-seq (y axis; geometric mean of three experiments). Red line, y = x; blue line, linear regression. The top plot includes only the first sci-RNA-seq experiment. The bottom plot also includes intestine cells from the second sci-RNA-seq experiment. (D) Number of genes that are at least five times as highly expressed in a specific tissue as in the second-highest-expressing tissue, excluding genes for which the differential expression between the first- and second-highest expressing tissues is not significant (q > 0.05). (E) Same as (D), except comparing cell types instead of tissues. (F) Heat map showing the relative expression of genes in consensus transcriptomes for each cell type, estimated by sci-RNA-seq. Genes are included if they have a size factor–normalized mean expression of >0.05 in at least one cell type (8613 genes in total). The raw expression data (UMI count matrix) is log-transformed, column-centered, and scaled (using the R function scale), and the resulting values are clamped to the interval (–2, 2). GABA, γ-aminobutyric acid.

  • Fig. 4 sci-RNA-seq reveals the transcriptomes of fine-grained anatomical classes of C. elegans neurons.

    (A) t-SNE visualization of high-level neuronal subtypes. Cells identified as neurons from the t-SNE clustering shown in Fig. 3A were reclustered with t-SNE. NA, not assigned. (B) Clusters in the neuron t-SNE that can be identified as corresponding to one, two, or four specific neurons in an individual C. elegans larva. The number of neurons of each type is shown in parentheses. (C) Heat map showing the relative expression of high-neuronal-expression genes across 40 neuron clusters identified by t-SNE and density peak clustering. Genes are included if their expression in the aggregate transcriptome of all neurons in our data is more than five times that of their expression in any other tissue, excluding cases where the differential expression is not significant (q > 0.05). (D) Distribution for each neuron cluster of the number of genes in that cluster whose expression is more than five times that in the second-highest expressing neuron cluster (q for differential expression < 0.05). (E) Cartoon illustrating the position of the left and right ASE neurons (pink) relative to the pharynx (green). [From www.wormatlas.org (56)] (F) Volcano plot showing differentially expressed genes between the left and right ASE neurons. Points in red correspond to genes that are differentially expressed (q < 0.05) with more than a threefold difference between the higher- and lower-expressing neuron(s). (G) The left AWA and ASG neurons arise from the embryonic cell AB plaapapa; the right AWA and ASG neurons arise from AB praapapa. (H) Volcano plot showing differentially expressed genes between the AWA and ASG neurons.

  • Fig. 5 Cell type–specific expression profiles from sci-RNA-seq enable the deconvolution of whole-animal transcription factor ChIP-seq data.

    For each of 27 cell types, a regularized regression model was fit to predict log-transformed gene expression levels in that cell type on the basis of ChIP-seq peaks in gene promoters (28). The ChIP-seq data were generated by the modENCODE (41) and modERN (42) consortia, profiling transcription factor binding in whole C. elegans animals. “EM” next to a TF label indicates that the ChIP-seq data for the TF are from an embryonic stage; “PE” indicates that the data are from a postembryonic stage. Colors in the heat map show the extent to which having a ChIP-seq peak for a given TF in a gene promoter correlates with increased expression in a given cell type. Peaks in “HOT” regions (28) are excluded. Gray cells in the heat map correspond to cases where a TF is not expressed in a cell type (<10 TPM), in which case ChIP-seq data for that TF are not considered by the regression model.

Supplementary Materials

  • Comprehensive single-cell transcriptional profiling of a multicellular organism

    Junyue Cao, Jonathan S. Packer, Vijay Ramani, Darren A. Cusanovich, Chau Huynh, Riza Daza, Xiaojie Qiu, Choli Lee, Scott N. Furlan, Frank J. Steemers, Andrew Adey, Robert H. Waterston,‡ Cole Trapnell, Jay Shendure

    Materials/Methods, Supplementary Text, Tables, Figures, and/or References

    Download Supplement
    • Materials and Methods
    • Figs. S1 to S24
    • Captions for Tables S1 to S14
    • References
    Tables S1-14
    Table S1. Summary of experiments Table S2: Summary statistics for cell type consensus expression profiles constructed in this study. Table S3: Tissue-level consensus expression profiles. Values listed are transcripts per million. Table S4: Cell type consensus expression profiles. Values listed are transcripts per million. Table S5: Neuron cluster consensus expression profiles. Values listed are transcripts per million. Table S6: Differential expression test results for the identification of tissue-enriched genes. See Methods. There is a row for each gene that is expressed in at least 10 cells in the analysis dataset. “Max tissue� is the tissue that the gene is expressed highest in. “Tissue 2� is the tissue the gene is expressed second highest in. “q-val� is the false detection rate at which the differential expression between the highest and second-highest expressing tissues can be called as non-zero. Table S7: Differential expression test results for the identification of cell type enriched genes. See Methods. There is a row for each gene that is expressed in at least 10 cells in the analysis dataset. “Max cell type� is the cell type that the gene is expressed highest in. “Cell type 2� is the cell type the gene is expressed second highest in. “q-val� is the false detection rate at which the differential expression between the highest and second-highest expressing cell types can be called as non-zero. Table S8: Differential expression test results for the identification of neuron cluster enriched genes. See Methods and Fig. 4. There is a row for each gene that is highly enriched (>5-fold) in rons relative to other tissues (as reported in Table S6). “Max cluster� is the neuron cluster that the gene is expressed highest in. “Cluster 2� is the neuron cluster the gene is expressed second highest in. “q-val� is the false detection rate at which the differential expression between the highest and second-highest expressing neuron clusters can be called as non-zero. Table S9: Differential expression test results for anterior vs. posterior body wall muscle. “moderated log2(anterior / posterior)� is equal to log2(anterior TPM+1) - log2(posterior TPM+1). Table S10: Differential expression test results for posterior vs. other intestine. “moderated log2� is defined as in Table S9. Table S11: Differential expression test results for amphid 1 vs. phasmid sheath cells. “moderated log2� is defined as in Table S9. Table S12: Differential expression test results for the ASEL vs. ASER neuron. “moderated log2� is defined as in Table S9. Table S13: Differential expression test results for AWA vs. ASG neurons. “moderated log2� is defined as in Table S9. Table S14: List of genes used in heatmaps in Fig. 3F and Fig. 4C.

Navigate This Article