Massively Parallel Single-Cell RNA-Seq for Marker-Free Decomposition of Tissues into Cell Types

See allHide authors and affiliations

Science  14 Feb 2014:
Vol. 343, Issue 6172, pp. 776-779
DOI: 10.1126/science.1247651


In multicellular organisms, biological function emerges when heterogeneous cell types form complex organs. Nevertheless, dissection of tissues into mixtures of cellular subpopulations is currently challenging. We introduce an automated massively parallel single-cell RNA sequencing (RNA-seq) approach for analyzing in vivo transcriptional states in thousands of single cells. Combined with unsupervised classification algorithms, this facilitates ab initio cell-type characterization of splenic tissues. Modeling single-cell transcriptional states in dendritic cells and additional hematopoietic cell types uncovers rich cell-type heterogeneity and gene-modules activity in steady state and after pathogen activation. Cellular diversity is thereby approached through inference of variable and dynamic pathway activity rather than a fixed preprogrammed cell-type hierarchy. These data demonstrate single-cell RNA-seq as an effective tool for comprehensive cellular decomposition of complex tissues.

Introducing MARS-Seq

Immune cells are typically differentiated by surface markers; however, this designation is somewhat crude and does not allow for fine distinctions that might be characterized by their RNA transcripts. Jaitin et al. (p. 776) used massively parallel single-cell RNA-sequencing (MARS-Seq) analysis to explore cellular heterogeneity within the immune system by assembling an automated experimental platform that enables RNA profiling of cells sorted from tissues using flow cytometry. More than 1000 cells could be sequenced, and unsupervised clustering analysis of the RNA profiles revealed distinct cellular groupings that corresponded to B cells, macrophages, and dendritic cells. This approach provides the ability to perform a bottom-up characterization of in vivo cell-type landscapes independent of cell markers or prior knowledge.

Understanding the heterogeneous and stochastic nature of multicellular tissues is currently approached through a priori defined cell types that are used to dissect cell populations along developmental and functional hierarchies (13). This methodology heavily relies on enumeration of cell types and their precise definition, which can be controversial (47) and is based in many cases on indirect association of function with cell-surface markers (58). Perhaps the best understood model for cellular differentiation and diversification is the hematopoietic system. The developmental tree branching from hematopoietic stem cells toward distinct immunological functions was carefully worked out through many years of study, and effective cell-surface markers are available to quantify and sort the major hematopoietic cell types. Even in this well-explored system, however, it is becoming increasingly difficult to explain modern genome-wide and in vivo data with refined cell types’ hierarchy and functions that extend beyond the classical myeloid and lymphoid cell types. For example, dendritic cells (DCs) are antigen-presenting cells that were originally characterized through their morphology (9) but are now understood to represent a highly heterogeneous group (10) with multiple functions, regulatory circuits, and phenotypes (6, 7, 9). Despite considerable efforts and progress by use of the marker-based approach, much of the known functional heterogeneity within the DC group is not truly compatible with any of the DC subclassification schemes (6, 7, 11). Such lack of definitive models for cell types and states is common in many fields of biology.

An attractive alternative to marker-based cellular dissection of complex tissues is to characterize in vivo cell-type compositions through unsupervised sampling and modeling of transcriptional states in single cells. This natural approach was so far difficult to implement because of many technical limitations that are being progressively alleviated with the advent of single-cell RNA sequencing (RNA-seq) (1220). Sampling and sequencing RNA from dozens of single cells was recently used to estimate stochastic transcriptional variation in stationary cultured cells (14) or during a dynamic process (1214, 16, 19). An unsupervised framework for dissecting transcriptional heterogeneity within complex tissues may therefore be envisioned, provided that many thousands of cells can be assayed routinely by using single-cell RNA-seq and that data from such experiments can be normalized and modeled effectively even when cells represent highly diverse cell types and states.

We developed an automated massively parallel RNA single-cell sequencing framework (MARS-Seq) (figs. S1 to S6) (21) that is designed for in vivo sampling of thousands of cells by multiplexing RNA-seq while maintaining tight control over amplification biases and labeling errors. The method is based on fluorescence-activated cell sorting (FACS) of single cells into 384-well plates and subsequent automated processing that is done mostly on pooled and labeled material, leading to a dramatic increase in throughput and reproducibility. To explore the new technique, we sequenced RNA from more than 4000 mouse spleen single cells (table S1), focusing initially on a heterogeneous cell population enriched for expression of the CD11c surface marker. We hypothesized that this strategy for cell acquisition will sample a diverse collection of splenic cell types while focusing on the challenging DC populations (6, 7).

Our methodology uses three levels of barcoding (molecular-, cellular-, and plate-level tags) to facilitate molecule counting with a high degree of multiplexing. The strategy is to characterize cell subpopulations first by classifying single cells on the basis of low-depth RNA sampling and then studying transcriptional profiles at high resolution by integrating data from dozens to hundreds of cells within each unsupervised class. As shown in Fig. 1A, multiplexing 1536 cells in one sequencing lane provided an average of 22,000 aligned reads per cell, and after extensive normalization, these can be used to unambiguously define 200 to 1500 distinct RNA molecules from each cell. Our labeling and filtering scheme ensures that spiked-in technical controls show cell-to-cell variance that is compatible with the theoretical (binomial) sampling noise, comparing favorably with previously reported techniques (Fig. 1B) (18). This technical stability substantially increases the information content of the sampled transcriptional states, which can be directly modeled as unbiased samples of the cells’ mRNA pool. In contrast to technical spike-in controls or the bulk of detected genes, we observed high cellular variance for a large number of genes, many of which are well known cell type–specific markers, suggesting that this attests for the high degree of heterogeneity within the splenic cell population (Fig. 1B) and promoting the idea of classifying cells into subpopulations on the basis of covariation of such heterogeneous markers.

Fig. 1 Massively parallel single-cell RNA-seq.

(A) Distribution of mapped reads per cell in a multiplexed 1536-cell experiment. (B) Mean and variance in mRNA (blue) and spike-in controls (red). (C) Mean mRNA counts in replicated pooled population of homogeneous (FACS-sorted) pDCs.

To test how sensitive our strategy can be for characterizing the transcriptional state of subpopulations in the sample, we estimated coverage and mean mRNA molecule count reproducibility for groups of 10 to 40 single-cell profiles, representing 0.6 to 2.4% of the cells on one sequencing lane. Analysis of single-cell mRNA profiles from FACS-sorted plasmacytoid DCs (pDCs) (Fig. 1C and fig. S6) confirmed that pooling of homogeneous cell populations provides rich and highly reproducible transcriptional profiles. For a subpopulation at a frequency of 2.5%, the assay report on 1255 genes with a standard deviation of less than 35% of the mean, and on 324 genes with a standard deviation of 20% of the mean. Together, the availability of high-variance marker genes and the dynamic range provided by pooled single-cell transcriptional profiles enable unsupervised dissection and characterization of heterogeneous cell populations, opening the way for ab initio cell-type decomposition of splenic populations at a high level of detail.

We have implemented a probabilistic strategy for unsupervised classification of cells into “idealized types.” Hierarchical clustering (Fig. 2A) defined seeds of highly correlated cells, leading to the initialization of a probabilistic mixture model and classification of single cells into types or families of homogeneous states. Visualization of the multiclass data by using a new circular a posteriori projection technique (Fig. 2B) represented the splenic cell population as a combination of several molecular behaviors, five of which (classes I to V) being distinctively separated from a group of more loosely defined classes (classes VI and VII). The frequencies of classes I to V range between 3.7 and 17%, allowing in all cases to infer rich transcriptional states through in silico pooling of single-cell mRNA profiles within each class. Analysis of gene enrichment (table S2 and figs. S7 and S8) and comparison of these profiles with existing transcriptional profiles of classical hematopoietic populations (, unambiguously linked classes I to V to B cells, natural killer (NK) cells, macrophages, monocytes, and pDCs (Fig. 2C). The remaining classes were all linked to DCs. FACS analysis using classical surface markers confirmed our in silico estimations of the frequency of B cells and pDCs within the CD11c-enriched splenic cell population (fig. S9). Further analysis and additional single-cell quantitative polymerase chain reaction experiments confirmed that “marker” genes are robustly enriched in their relevant subpopulations (figs. S10 and S11). Using classical marker-based sorting, we further validated our approach with additional single-cell RNA-seq data from FACS-sorted B cells, NK cells, pDCs, and monocytes. Projection of the new data onto the model we generated from the splenic population showed remarkable compatibility between the traditional marker-based cell-type definition and the marker-free single-cell RNA-seq technique (Fig. 2D). Analysis of splenic cell populations therefore showcased single-cell RNA-seq as a direct and unsupervised way for identifying and characterizing subpopulations within heterogeneous tissues.

Fig. 2 Single-cell dissection of immune cell types.

(A) Color-coded correlation matrix of single-cell mRNA profiles. Groups of strongly correlated cells that are used to initialize a probabilistic mixture model are numbered and marked with white frames. (B) Circular a posteriori projection (CAP) plot summarizing the predictions of the probabilistic mixture model for the CD11c+ cells. Each cell is projected onto the two-dimensional sphere according to the posterior probability of its association with the model’s classes. The dimensions of the CAP plot should not be interpreted linearly or as principal components. (C) Bar plots depicting similarities of mean RNA counts in inferred types and Immgen expression profiles. The most correlated group of Immgen profiles is colored specifically as indicated for each type. (D) Shown are CAP plots depicting single-cell RNA-seq data sets acquired from marker-based FACS sorting for single pDCs, B cells, NK cells, and monocytes. Sorted cells are shown in red; density of the CD11c+ pool is shown in gray.

We profiled additional 1536 single cells from spleens that were exposed to lipopolysaccharide (LPS) for 2 hours (22), aiming to test how an immediate response to an infection-mimicking stimulus can be deciphered across the heterogeneous splenic cell population. We found that the LPS-treated cells are broadly classified into similar cell types to those observed in untreated cells, with some changes in the relative representation of different types (Fig. 3A). Using the non-LPS mixture model, we classified the non-LPS and LPS-exposed cells into classes and inferred a rich transcriptional profile within each class before and after treatment. Clustering 1575 variable genes identified groups of cell type–specific response genes (such as Tnf and Marco in macrophages and Xcl1 and Gzmb in NK cells), and a large group of type I interferon response genes (Irf7, Stat2, Ifit1, Cxcl10, and hundreds more) activated pervasively in all or almost all cell types (Fig. 3B, fig. S12, and tables S3 and S4).

Fig. 3 Response to LPS across multiple cell types.

(A) Inferred cell-type frequencies before and after LPS treatment. (B) Clustering of more than 1300 genes give mean inferred transcriptional mean in each cell type before and after LPS infection (±). Full gene list is provided in table S4.

With thousands of samples readily available, single-cell RNA-seq is poised to go beyond the classical cell-types hierarchies that are outlined by current marker-based approaches, examining complex relations between cell subpopulations or continuous spectra of types. Analysis of 1031 single cells that were associated with DC-related classes (VI and VII) in our unsupervised CD11c+ model (Fig. 4A) indicated that although 15% of these cells (class DC1) are strongly linked together, the remaining bulk of DCs could not be organized along a clear clustering hierarchy (11). Nevertheless, we found strong support for substantial internal organization within the remaining DC population (DC2 to DC4) (table S5), including a group of cells coexpressing Relb, Nfkbia, and additional associated genes (DC2) (fig. S13). More generally, we have identified several gene modules that represent combinatorial pathway activity within the DC bulk (fig. S14), indicating that despite the lack of a clear hierarchy, the DC cell population is governed by a high degree of transcriptional organization. Additional single-cell sequencing of CD8+ CD86+, CD8int CD86, and CD4+ FACS-sorted populations (Fig. 4B) showed that this organization can be approached to a limited extent with existing marker-based classification. Remarkably, exposure to LPS reorganizes the DC population substantially, with a large number of gene modules being activated in a highly heterogeneous fashion (Fig. 4C and fig. S15). According to our analysis, certain specific CD4+ DC subpopulations are activating the Irf4, tumor necrosis factor, and transforming growth factor–β pathways (fig. S16 and table S6), whereas other pathways (such as Irf7) are activated pervasively (table S5). This combinatorial activity of pathways within the LPS-exposed DC pool is not represented in preexisting DC subtypes according to our data. Committed and developmentally stable myeloid and lymphoid cell types maintain their identity during immediate response to infection while responding through generic and cell type–specific pathways. These pathways create substantial cell-to-cell variance and define new cell subpopulations within each of these cell types (fig. S17), forming diversity that may have functional implications. Observation of transcriptional subpopulations, however, does not necessarily imply the existence of further committed and preprogrammed cell subtype hierarchy.

Fig. 4 Gene modules and the distribution and redistribution of DC states.

(A) Single-cell correlation matrix for cells classified as DCs, showing detected subclasses using white frames. (B) Type/class distributions of single-cell RNA-seq data from three different FACS-sorted DC (CD11c enriched) populations: CD8a+ CD86+; CD8a intermediate (int) CD86; and CD8a CD4+ ESAM+ (fig. S13A). (C) Gene correlation matrix is depicting potential LPS-dependent interactions between 225 genes. Key genes are indicated, with the complete list available in fig. S15.

We presented this framework for broad sampling of single-cell transcriptional states from tissues and demonstrated how it can be used to dissect complex functions in a bottom-up fashion. MARS-seq can be readily applied to tissues and organs in normal and disease states to redefine their cell-type and cell-state compositions and link it to detailed genome-wide transcriptional profiling. Given the inherent stochasticity and heterogeneity of multicellular tissues, this approach can prove essential for understanding how in vivo biological function emerges from complex cell ensembles.

Supplementary Materials

Materials and Methods

Figs. S1 to S17

Tables S7 to S9

References (2325)

Tables S1 to S6

References and Notes

  1. Materials and methods are available as supplementary materials on Science Online.
  2. Acknowledgments: Research was supported by the European Research Council and Israel Science Foundation (1782/11, 1050/12) grants to I.A. and A.T. RNA-seq data are deposited in Gene Expression Omnibus, accession no. GSE54006.
View Abstract

Stay Connected to Science

Navigate This Article