Research Article

Cell type transcriptome atlas for the planarian Schmidtea mediterranea

See allHide authors and affiliations

Science  25 May 2018:
Vol. 360, Issue 6391, eaaq1736
DOI: 10.1126/science.aaq1736

Mapping the planarian transcriptome

A cell type's transcriptome defines the active genes that control its biology. Two groups used single-cell RNA sequencing to define the transcriptomes for essentially all cell types of a complete animal, the regenerative planarian Schmidtea mediterranea. Because pluripotent stem cells constantly differentiate to rejuvenate any part of the body of this species, all developmental lineages are active in adult animals. Fincher et al. determined the transcriptomes for most, if not all, planarian cell types, including some that were previously unknown. They also identified transition states and genes governing positional information. Plass et al. used single-cell transcriptomics and computational algorithms to reconstruct a lineage tree capturing the developmental progressions from stem to differentiated cells. They could then predict gene programs that are specifically turned on and off along the tree, and they used this approach to study how the cell types behaved during regeneration. These whole-animal transcriptome “atlases” are a powerful way to study metazoan biology.

Science, this issue p. eaaq1736, p. eaaq1723

Structured Abstract

INTRODUCTION

The complete sequence of animal genomes has had a transformative impact on biological research. Whereas the genome sequence of an organism contains the information for its development and physiology, the transcriptomes (the sets of actively transcribed genes) of the cell types in an organism define how the genome is used for the unique functions of its cells. Cell number and complexity have historically made the identification of all cell types, much less their transcriptomes, an extreme challenge for most multicellular organisms. Recent advances in single-cell RNA sequencing (SCS) have greatly enhanced the ability to determine cell type transcriptomes, with SCS of thousands of cells readily achievable.

RATIONALE

We reasoned that it might be possible, given these advances, to determine the transcriptomes of essentially every cell type of a complete organism possessing an unknown number of cell types. The planarian Schmidtea mediterranea, famous for its regeneration ability, is an attractive case study for such an undertaking. Planarians possess a complex anatomy with diverse differentiated cell types, including many found across animals. Furthermore, planarians contain a proliferating cell population called neoblasts that includes pluripotent stem cells. Neoblasts mediate regeneration and constitutive tissue turnover. Consequently, lineage precursors for essentially all differentiated cells types are also present in adults. Finally, planarians constitutively express positional information guiding tissue turnover. Therefore, comprehensive SCS at a single time point (the adult) could allow transcriptome determination for all differentiated cell types and for lineage precursors, and could identify patterning information that guides new cell production and organization. Capturing this information in most organisms would require sampling adults and many transient embryonic stages.

RESULTS

We used the SCS method Drop-seq to determine the transcriptomes for 66,783 individual cells from adult planarians. We locally saturated cell type coverage by iteratively sequencing distinct body regions and assessing the frequency of known rare cell types in the data. Clustering the cells by shared gene expression grouped cells into broad tissue classes. Subclustering of each broad tissue type in isolation enabled separation of cells into the cell populations constituting each tissue. These analyses enabled the identification of a previously unidentified tissue group and the classification of poorly characterized tissues into their constituent cell types, including numerous previously unknown cell types. Transcriptomes were identified for many rare cell types, including those that exist as rarely as ~10 cells in an animal that has 105 to 106 cells, which suggests that near-to-complete cellular saturation was reached. In addition, transcriptomes for known and novel lineage precursors, from pluripotent stem cell to differentiated cell types, were generated. Precursor transcriptomes identified transcription factors required for maintenance of associated differentiated cells during homeostatic cell turnover. Finally, the data were used to identify genes regionally expressed in muscle, which is the site of planarian patterning gene expression.

CONCLUSION

We successfully used SCS to generate transcriptomes for most to all cells of a complete organism. This resource provides a wealth of data regarding the cellular site of expression of thousands of conserved genes and the transcriptomes for cell types widely used in animals. These data will inform studies of these genes and cell types broadly, and will provide a resource for the fields of planarian biology and comparative evolutionary biology. This work also provides a template for the generation of cell type transcriptome atlases, which can be applied to a large array of organisms.

An atlas of planarian cell type transcriptomes.

High-throughput single-cell RNA sequencing of adult planarians reveals a cell type transcriptome atlas that includes rare cell types and many novel cell populations, cellular transition states, and patterning information, as demonstrated by two regionally expressed genes in muscle.

Abstract

The transcriptome of a cell dictates its unique cell type biology. We used single-cell RNA sequencing to determine the transcriptomes for essentially every cell type of a complete animal: the regenerative planarian Schmidtea mediterranea. Planarians contain a diverse array of cell types, possess lineage progenitors for differentiated cells (including pluripotent stem cells), and constitutively express positional information, making them ideal for this undertaking. We generated data for 66,783 cells, defining transcriptomes for known and many previously unknown planarian cell types and for putative transition states between stem and differentiated cells. We also uncovered regionally expressed genes in muscle, which harbors positional information. Identifying the transcriptomes for potentially all cell types for many organisms should be readily attainable and represents a powerful approach to metazoan biology.

The complete sequence of animal genomes, such as that of Caenorhabditis elegans reported in 1998 and humans in 2001, has had an immeasurable impact on research (13). Whereas the genome sequence of an organism contains the information for its development and physiology, the transcriptomes (the sets of actively transcribed genes) of the cell types in an organism define how the genome is used for the unique functions of its cells. Recent advances in RNA sequencing of individual cells have greatly enhanced the ability to determine cell type transcriptomes (4, 5), and single-cell RNA sequencing (SCS) of thousands of cells has become readily achievable (6). For example, the transcriptomes of most cell types of complete C. elegans L2 larvae and numerous mouse cells were recently reported with this approach (7, 8). We reasoned that it might be possible, given these advances, to determine the transcriptomes of essentially every cell type of a complete adult organism possessing an unknown number of cell types.

Multicellular organisms can have many millions of cells and hundreds of different cell types, and the cellular composition of organisms varies markedly over the course of development. This complexity has historically made the identification of all cell types, much less their transcriptomes, for most multicellular organisms an extreme challenge. The planarian Schmidtea mediterranea is an attractive case study organism for which to generate the transcriptomes for all cells in an animal. Planarians are famous for their ability to regenerate essentially any missing body part, and they possess a complex body plan containing many characterized cell types (9, 10). Despite this complexity, with an average planarian possessing ~105 to 106 cells (11), planarians are smaller with simpler anatomy than humans and many other model systems such as mice. Planarians are also easily dissociated into single-cell suspensions, allowing potential characterization of all cells. Because some planarian cell types, such as glia (12, 13), have only recently been defined with molecular markers, it is probable that undescribed planarian cell types exist. The combination of known and potentially unknown cell types is attractive for developing approaches that can apply to diverse organisms with varying amounts of available cell type information. Planarians possess a population of proliferative cells called neoblasts that contain pluripotent stem cells, enabling their ability to regenerate and replace aged cells in tissue turnover (14). Neoblasts are the only cycling somatic cells and the source of all new tissue. Neoblasts contain multiple classes of specialized cells, with transcription factors expressed to specify cell fate (15, 16). Because of the constant turnover of planarian tissues, essentially all stages of all cell lineages, from pluripotent stem cell to differentiated cell, are anticipated to be present in the adult (9, 17).

Planarians also constitutively and regionally express dozens of genes that have roles in positional information (18). These genes, referred to as positional control genes (PCGs), are expressed in a complex spatial map spanning anterior-posterior (AP), medial-lateral (ML), and dorsal-ventral (DV) axes (18), and their expression is largely restricted to muscle (19). PCGs are hypothesized to constitute instructions for the maintenance and regeneration of the body plan. Because of these features, comprehensive SCS at a single time point (the adult) could allow transcriptome identification for all differentiated cell types, lineage precursors for these cells, and the patterning information that guides new cell production and organization. To capture this information in most organisms would require sampling the adult and many transient stages of embryogenesis.

Single-cell RNA sequencing of 50,562 planarian cells

Planarians have a complex internal anatomy including a brain, ventral nerve cords, peripheral nervous system, epidermis, intestine, muscle, an excretory system (the protonephridia), and a centrally located pharynx (10). These major tissues are composed of multiple different cell types that, together with other gland and accessory cells, constitute the planarian anatomy.

To detect planarian cell types and states in an unbiased manner, including rare cell types, we used the SCS method Drop-seq (6) to determine the transcriptomes for 50,562 individual cells from adults (Fig. 1A, fig. S1A, and table S1). Planarians contain 105 to 106 cells (11), and yet some cell types are extremely rare, such as the ~100 photoreceptor neurons of eyes (20). Given such rarity, sequencing random cells from entire animals might not reach cell type saturation with even 105 cells sequenced. Therefore, we divided animals into five sections (head, prepharyngeal region, trunk with pharynx removed, tail, and the pharynx itself) and cells from each region were dissociated, sorted by flow cytometry, and sequenced (Fig. 1A, fig. S1A, and table S1). Sequences were aligned to a previously assembled transcriptome (21). We targeted cell type saturation by assessing coverage of known, rare cell types during iterative rounds of cell isolation and sequencing in a region-by-region approach. In total, 25 separate Drop-seq runs were completed, yielding cells with an average of 3020 unique molecular identifiers (UMIs) and 1404 genes (~13% of the estimated detection limit) (fig. S1, A to C, table S1, and supplementary materials).

Fig. 1 Drop-seq of 50,562 planarian cells.

(A) Schematic illustrating the workflow used to isolate and cluster single cells. (B) t-SNE representation of 44 clusters generated from the data. (C and D) Upper panels: t-SNE plots colored according to gene expression (red, high; blue, low) for highly enriched genes from nine planarian tissue classes. Red outlines denote clusters assigned to that tissue class. Lower panels: FISH images for tissue-enriched genes. Scale bars: whole-animal images, 200 μm; insets, 50 μm.

Genes with high variance and expression across cells were used to generate informative principal components using Seurat (6, 22). Cells were clustered using Seurat into 44 distinct major clusters using a graph-based clustering approach and were visualized by applying t-distributed stochastic neighbor embedding on transcriptomes (t-SNE) (Fig. 1B and fig. S1D). Cells from different regions were largely interspersed in the t-SNE plots, except for cells from the pharynx, which contains many unique cell types (fig. S2A). Cell doublets were scarce within the data and did not affect clustering results (fig. S2, B to D). To determine the identity of each cluster, we identified cluster-specific genes by means of a receiver operating characteristic curve analysis and a likelihood ratio test based on zero-inflated data (table S2) (23). Expression of established cell type markers within each cluster and fluorescence in situ hybridization (FISH) with cluster-specific markers enabled cluster assignment to one of eight previously identified planarian tissue classes: protonephridia, neural, epidermis, intestine, pharynx, muscle, neoblast, and parenchymal (Fig. 1C). The parenchymal class was previously termed “parapharyngeal” because of localization of some enriched markers around the pharynx (24). However, most cell populations within this class exhibit broader localization in the planarian parenchyma. We also identified a ninth group of clusters marked by CTSL2 (dd175) expression (Fig. 1D). CTSL2 (dd175) FISH revealed cells with long processes distributed broadly. We designated this group of clusters the cathepsin+ class. Hierarchical clustering of a subset of 5000 cells by Euclidean distance, independently of Seurat, recapitulated assignment of cells into these nine tissue classes (fig. S3).

Clusters representing the major planarian tissue classes were generally heterogeneous in terms of gene expression. For example, neural clusters contained a large number of known neuronal cell types, which suggests that multiple distinct cell types could be identified within each major cluster (fig. S4). Therefore, we systematically subclustered each major cluster group (Figs. 2 to 6), identifying >150 subclusters, and determined genes with enriched expression in cells of each subcluster (table S2). Subclustering proved a powerful approach to defining the collection of cell types that constituted each major cluster and identified candidate transition states between stem cells and differentiated cells.

Fig. 2 Subclustering identifies neoblast subpopulations.

(A) t-SNE representation of 22 clusters generated from subclustering cells with smedwi-1 expression ≥ 2.5 [ln(UMI-per-10,000 + 1)]. Identity of numbered clusters unknown. PP, parenchymal; PN, protonephridia. Intestine cluster is indicated by lower expression of smedwi-1 and enriched gata4/5/6 and hnf-4 expression. (B) smedwi-1+ t-SNE plots colored by prox-1 and zfp-1 expression. (C) Double FISH image for dd_10988 and smedwi-1. Yellow arrows highlight coexpression; white arrows denote absence of coexpression. (D) smedwi-1+ t-SNE plots colored by expression of differentiated tissue-enriched genes. Arrows indicate gene expression sites. (E) Left: smedwi-1+ t-SNE plots colored by gata4/5/6 and hnf-4 expression. Right: All cluster t-SNE plots colored by gata4/5/6 and hnf-4 expression. C, cathepsin+ cells; I, intestine; γ-Nb, γ-neoblasts. (F) smedwi-1+ t-SNE plots colored by ETS1 (dd2092) and FOXF1 (dd6910) expression. (G) Epidermal cell maturation stages. (H) t-SNE representation of epidermal subclusters. FISH images labeled by their associated cluster(s) are shown. (I) Epidermal t-SNE plots colored by epidermal lineage marker expression from (G). (J) dd_554 FISH. (K) Pharynx t-SNE plots colored by smedwi-1 and dd_554 expression. (L) Pharynx t-SNE plot colored by the body region from which each cell was isolated. (M and N) t-SNE plots, colored by smedwi-1 expression, generated by subclustering cells identified as (M) protonephridia, intestine, muscle, cathepsin+, neural, and (N) parenchymal. (O) Parenchymal t-SNE plots colored by expression of eight transcription factor–encoding genes enriched in (N). Arrows indicate gene expression sites. Scale bars, 50 μm (C), 200 μm [(H) and (J)].

Progenitors in planarian cell lineages

Neoblasts are abundant and express canonical marker genes such as smedwi-1 (25), vasa (26), and bruli (27) (Fig. 1C and fig. S5, A and B). Neoblasts are cycling cells and consequently show enrichment in expression of S/G2/M cell cycle markers (fig. S5C). To identify the transcriptomes of potential neoblast subpopulations, we selected in silico and subclustered 12,212 cells with smedwi-1 expression of ≥2.5 [ln(UMI-per-10,000 + 1)] (Fig. 2A and fig. S6A). Resulting clusters on the left of the plot were enriched in S/G2/M cell cycle markers (fig. S6B). These clusters included the previously characterized major specialized neoblast classes, including γ-neoblasts (intestine progenitors) and ζ-neoblasts (epidermis progenitors) (28) (Fig. 2B). A number of other subclusters were also identified, including one marked by expression of the contig dd_10988 (fig. S6, C and D). FISH confirmed that dd_10988 was expressed in a neoblast subset as well as in a number of smedwi-1 cells (Fig. 2C).

The large number of subclustered neoblasts facilitated transcriptome determination for candidate progenitors for many planarian tissues. Clusters to the right of the plot were marked by a G1/G0 cell cycle status and displayed expression of various tissue markers (Fig. 2D and fig. S6B). These included a population defined by expression of POU2/3, a marker for protonephridia-specialized neoblasts (29), and a number of subclusters expressing markers also expressed in specific differentiated tissues or their postmitotic precursors, such as ChAT for the nervous system, prog-1 for the epidermis, ASCL4 (dd1854) for parenchymal cells, and COL4A6A (dd2337) for muscle (Fig. 2D and fig. S7A). Expression of these markers in smedwi-1+ cells suggests that these cells could be transition states for those lineages. Several markers enriched in the dd_10988+ subcluster, including dd_10988, were also expressed in cells of the two smedwi-1+ neural subclusters, as well as in neural cells of the initial clustering (figs. S6C and S7, B and C), which suggests that the dd_10988+ subcluster is enriched in neural progenitors. Likewise, many markers enriched in the PLOD1 (dd3457)+ subcluster were also expressed in the smedwi-1+ muscle subcluster, which suggests that the PLOD1 (dd3457)+ subcluster is enriched in muscle progenitors (fig. S7D). prox-1, hnf-4, nkx2.2, and gata4/5/6 encode transcription factors expressed in intestinal progenitors (28), and in these data all four genes were expressed in γ-neoblasts (Fig. 2, B and E, and fig. S7E), with hnf-4, nkx2.2, and gata4/5/6 also expressed in intestinal clusters (Fig. 2E and fig. S7F). hnf-4, but not prox-1, nkx2.2, and gata4/5/6, was also expressed in a smedwi-1+ cell cluster enriched in CTSL2 (dd175) expression (the cathepsin+ cell marker) and in differentiated cathepsin+ cells (Fig. 2E and fig. S7G). The additional transcription factor–encoding genes ETS1 (dd2092) and FOXF1 (dd6910) were expressed with hnf-4 in these cells and also displayed expression patterns similar to that of CTSL2 (dd175) in the animal (Fig. 2F and fig. S8, A and B) and have recently been shown to regulate the planarian pigment cell lineage (30). Pigment cells clustered within the cathepsin+ cell class in our data (see below). By FISH, hnf-4 was indeed coexpressed with nkx2.2 and gata4/5/6 in the intestine, but was also coexpressed with cathepsin+ cell markers (fig. S8, C to E), which suggests that hnf-4 is expressed in two distinct lineages. These data demonstrate the utility of this approach for identifying potentially novel neoblast progenitor populations and the transcription factors that define them.

Some planarian neoblasts display pluripotency in clonal assays and are hypothesized to generate all lineage-committed neoblast subpopulations, and are called clonogenic neoblasts (14). We selected cells expressing high levels of smedwi-1 but that excluded ζ- and γ-neoblasts [including subclusters 2, 9, dd_10988+, dd_6998+, dd_17796+, SAMD15 (dd19710)+, dd_11221+, dd_13666+, and PLOD1 (dd3457)+] and subclustered this set of neoblasts in isolation (fig. S9, A and B). A remnant ζ-neoblast population (clusters 4 and 6), as well as protonephridia progenitors (cluster 10) and the putative neural (clusters 2 and 5) and muscle (clusters 1 and 9) progenitor populations described above in the smedwi-1+ cell subclustering, were identified (fig. S9C and table S2). Clusters 0, 3, 7, and 8 were largely devoid of specifically enriched markers (table S2). It is therefore possible that clonogenic neoblasts are defined by an absence of any tissue-specific markers, as opposed to the unique expression of specific genes.

When all cells were clustered together, numerous smedwi-1+ cells were present regionally within each of the other eight major planarian tissue clusters (Fig. 1C). We reasoned that these smedwi-1+ cells could represent progenitors for the cell types within each associated tissue cluster. We therefore examined these smedwi-1+ cells after taking each tissue class in isolation and subclustering the data.

The planarian epidermis contains ciliated and nonciliated cells as well as dorsal-ventral boundary epidermis (10, 28, 31), and the lineage from ζ-neoblasts to epidermal cells is well characterized (3133) (Fig. 2G). SCS reveals gene expression transitions during neoblast epidermal differentiation (31); subclustering 11,021 epidermal lineage cells (Fig. 1C) produced subclusters associated with each epidermal lineage stage (Fig. 2H and fig. S10, A and B). Plotting gene expression onto this t-SNE map showed a continuous progression from ζ-neoblast to differentiated cells (Fig. 2I and fig. S10C).

The gene dd_554 [SmedASXL_059179 in (34)] is expressed in candidate pharynx progenitors (34) (smedwi-1+ cells at the pharynx base) and in smedwi-1 cells within the pharynx (34) (Fig. 2J and fig. S11). Subclustering the 1083 nonmuscle, non-neuronal pharynx cluster cells (Fig. 1C) revealed that smedwi-1+ cells sequenced from nonpharynx midbody tissue clustered with pharynx cells, despite not being part of the pharynx itself (Fig. 2, K and L). Because the pharynx lacks neoblasts, pharynx-specialized neoblasts must be outside of the pharynx. This clustering of neoblasts with pharynx cells clearly demonstrates the ability of SCS data clustering to associate lineage precursors with differentiated cells. Similarly, many dd_554+ cells sequenced from outside of the pharynx clustered with pharynx cells (Fig. 2, K and L). Plotting smedwi-1/dd_554 expression onto pharyngeal subclusters revealed a progression from smedwi-1+ cells isolated outside the pharynx to dd_554+ cells isolated outside the pharynx to dd_554+ cells isolated inside the pharynx to pharyngeal cells (Fig. 2, K and L). These epidermis and pharynx examples demonstrate how precursor stages within cell lineages can be identified from subclustering cells within a major tissue class. Because planarians constantly generate new differentiated cells for essentially all tissue types (17, 20), transcriptomes for lineage precursors for essentially every cell type in the body could in principle be studied with this approach.

Cell lineages for many planarian cell types are largely uncharacterized. After tissue type subclustering, smedwi-1+ cells were present with locally high expression in resultant t-SNE plots; smedwi-1 expression level gradually declined in cells across subclusters (Fig. 2, M and N). These smedwi-1+ cells, similar to the epidermis and pharynx cases, could represent transition states between pluripotent neoblasts and differentiated cells for the various cells of the protonephridia, intestine, muscle, nervous system, parenchymal, and cathepsin+ cells (Fig. 2, M and N). The smedwi-1+ cells found within subclusters of the major tissue type classes generally displayed enriched expression of at least one transcription factor. For example, smedwi-1 expression was high within cells at the center of the parenchymal cell t-SNE plot and displayed a graded decrease projecting in all directions into seven major parenchymal subclusters (Fig. 2N). Each projection was associated with enriched expression of one or more distinct transcription factors, identifying candidate transcription factors associated with the specification of different parenchymal cell types (Fig. 2O).

Subclustering cells by tissue type uncovers rare cell types

The protonephridia, the planarian excretory and osmoregulatory system, contains flame cells for filtering fluids, proximal and distal tubule cells, and a collecting duct (29, 35, 36). The protonephridia is a model tissue for studying organ regeneration and the evolution of kidney-like excretory systems. Subclustering of 890 protonephridia cells (Fig. 1C) identified each known protonephridia cell type as a separate subcluster, revealing the complete transcriptomes of these cells (Fig. 3A and fig. S12, A to C). Furthermore, two protonephridia subclusters with smedwi-1+ cells were identified (Fig. 2M). One was enriched in flame cell gene expression (e.g., dd_2920) and the other in a proximal tubule marker (dd_10830), which suggests that they might be flame and tubule cell precursors, respectively (fig. S12, D and E).

Fig. 3 Subclustering of tissues reveals transcriptomes for known and novel cell populations.

(A) t-SNE representation of the protonephridial subcluster. FISH images are labeled by their associated cluster. (B) t-SNE representation of intestinal subclusters. (C) Double FISH images of genes enriched in separate intestinal subclusters. Numbers indicate the associated subcluster for each marker. (D) Top: Cell trajectory of enterocyte and outer intestinal cell lineages produced by Monocle. Cells are colored by identity. Bottom: Heat map of branch dependent genes (q value < 10–145) across cells plotted in pseudo-time (41). Cells, columns; genes, rows. Beginning of pseudo-time is at center of heat map. “Cl.” annotation indicates a log-fold enrichment ≥ 1 of the gene in that intestine Seurat cluster. (E) Top left: Intestine t-SNE plot colored by expression of PTF1A (dd6869). Top right: Illustration of cutting scheme used to generate fragments. Bottom: dd_115 and dd_75 FISH of control and PTF1A (dd6869) RNAi animals. Animals were cut and fixed 23 days after the start of double-stranded RNA (dsRNA) feedings. Scale bars: whole-animal/fragment images, 200 μm; insets, 50 μm.

Less is known regarding the full complement of cell types in other planarian tissues. Ultrastructural studies suggested that the planarian intestine contains two cell types: absorptive enterocytes and secretory goblet cells (37, 38). Subclustering of 3025 intestinal cells (Fig. 1C) revealed three distinct cell populations (clusters 4, 5, and 8) (Fig. 3B and fig. S13, A and B). FISH with subcluster-enriched markers (table S2) revealed distinct intestine components. Cluster 4 represented an inner intestine cell layer (Fig. 3C) and was enriched for absorptive enterocyte markers (39). Cluster 8 cells were largely present within the primary intestine branches, resembling the pattern of goblet cells (40). A third group (cluster 5) represented an outer intestine cell layer and displayed a set of enriched genes different from that of clusters 4 and 8 (Fig. 3C and table S2). In addition to these three main intestine components, clusters representing putative transition states were also identified. Clusters 1, 3, and 6 included many smedwi-1+ cells (Fig. 2M). Genes with enriched expression in clusters 0 and 7 displayed expression spanning into the enterocyte cluster (cluster 4), suggesting these might be enterocyte transition states (fig. S14A). Genes with enriched expression in clusters 2 and 3 displayed expression spanning into the outer intestine cluster (cluster 5) and might reflect transition or variant states of these cells (fig. S14B). The Monocle toolkit can be used to predict cellular transitions in lineages (41) and was used to build single-cell trajectories for the enterocyte and outer intestine cell lineages, closely recapitulating the candidate transition states identified by Seurat (Fig. 3D, fig. S15, and table S3).

Several transcription factors required for the specification of various planarian cell types have been identified with RNA interference (RNAi) and gene expression studies. Because of constant tissue turnover, RNAi of transcription factor–encoding genes expressed in specific classes of specialized neoblasts in adult planarians can lead to steady depletion of the cell type generated by that specialized neoblast class (29, 42, 43). The transcriptomes identified here generate a resource of enriched gene expression for different cell types, including transcription factor–encoding genes. Accordingly, inhibition of the transcription factor–encoding PTF1A (dd6869) gene, which had enriched expression in candidate transition states for the outer intestine cluster, strongly reduced this cell population while not affecting absorptive enterocytes of the intestine (Fig. 3E).

The nervous system displays by far the greatest known cell type composition complexity of the major planarian tissues. By subclustering 11,907 neuronal cells (Fig. 1C), we identified 61 distinct subclusters representing a diversity of cell types and states (Fig. 4A and fig. S16A). Twelve subclusters had high smedwi-1 expression, which suggested that they represent neuronal precursors (fig. S16B). Cluster 10 contained cells of the brain branches, as determined by expression of gpas (44) and pds (45) (Fig. 4B and fig. S16C). Three subclusters (clusters 3, 7, and 8) were defined by expression of pc2 (encoding a neuropeptide-processing proprotein convertase) as well as an assortment of markers for rare neuron classes in the cephalic ganglia and ventral nerve cords (Fig. 4C and fig. S16, D and E). We also sequenced an additional 7766 cells from the brain region to expand the number of cells in these clusters (fig. S16, F to H). In addition to these large clusters, there existed a number of smaller, compact, and well-separated subclusters. These could be further divided into ciliated and nonciliated neurons according to the expression of rootletin (dd6573), which encodes a ciliary rootlet component (Fig. 4D and fig. S17A). Because of further heterogeneity within these clusters (e.g., opsin+ presumptive photoreceptors were present together, but not as a separate cluster), data from these two cell sets (ciliated, not ciliated) were each taken in isolation for further subclustering. This yielded 37 nonciliated neuron subclusters (Fig. 4E and figs. S17B, S18, A and B, and S19 to S21) and 25 putatively ciliated neuron subclusters (Fig. 4F and figs. S17B, S22, A and B, and S23B). We assessed the localization of cells associated with 46 of 62 of these subclusters by FISH using subcluster-specific markers. The observed cell types had a wide range of patterns including rare cell types such as photoreceptor neurons (Fig. 4, E and F, and figs. S18 to S23). Many genes had enriched expression in multiple clusters; the distribution of neural cell types they represented was defined by a combinatorial set of markers (figs. S18 to S23). A number of identified cell types from different subclusters displayed similar localization patterns. However, FISH demonstrated no overlap in subcluster-specific markers, consistent with the SCS data (Fig. 4G). For several neural subtypes, we found smedwi-1+ candidate precursor cells. Four nonciliated neuron subclusters (subclusters 1, 2, 4, and 12) and a single ciliated neuron subcluster (subcluster 1) were enriched in smedwi-1 expression (fig. S24A). Nonciliated neuron subcluster 4 also expressed gata4/5/6, as did six smedwi-1 clusters (clusters 14,16/33, 24, 26, and 32) that radiated out from central smedwi-1+ cells, raising the possibility that these smedwi-1+ cells constitute precursors for these populations (fig. S24B).

Fig. 4 Subclustering of neural cells reveals known and novel cell populations.

(A) t-SNE representation of the neural subcluster. (B and C) Top: t-SNE plots colored by expression of gpas (B) and pc-2 (C). Bottom: FISH for gpas (B) and pc-2 (C) labeled with the associated neural subcluster. (D) t-SNE plot in (A) overlaid with outlines indicating the ascribed identity of each subcluster as ciliated or nonciliated. (E and F) t-SNE representation of subclustered cells identified in (D) as nonciliated (E) or ciliated (F). (G) Double FISH images of three sets of nonciliated neuron genes enriched in separate subclusters. Numbers indicate the associated nonciliated neuron subcluster(s) for each marker. Scale bars: whole-animal images, 200 μm; insets, 50 μm.

The pharynx is a muscular tube used for feeding and defecation (10). It is contained within an epithelial cavity and connects to the intestine at its anterior end via an esophagus. Pharyngeal muscle cells and pharyngeal neurons clustered together with the other muscle cells and neurons of the body (figs. S25A and S26A). Other pharynx-associated cells, including cells from isolated pharynges and surrounding tissue, constituted the other major pharynx clusters. These non-neural, nonmuscle pharynx and pharynx-associated cells (Fig. 1C, n = 1083 cells) were subclustered, and FISH was performed on cluster-enriched markers (Fig. 5A and fig. S25, B and C). Subclusters included pharyngeal cavity epithelium cells (clusters 7 and 8), the epithelial pharynx lining (clusters 1 and 5), the mouth and esophagus (cluster 9), cells near the pharynx opening (cluster 6), and cells that constitute the connection to the planarian body (cluster 4). FISH confirmed nonoverlapping expression patterns for markers of tested separate cell populations (Fig. 5B).

Fig. 5 Tissue subclustering identifies cell populations of poorly characterized tissues.

(A) t-SNE representation of the pharynx subcluster. FISH images are labeled by their associated cluster(s). (B) Double FISH images of pharynx markers enriched in separate subclusters. Numbers indicate the associated pharynx subcluster(s) for each marker. (C) t-SNE representation of the muscle subcluster. (D) Double FISH images of two muscle markers enriched in separate subclusters. Numbers indicate the associated muscle subcluster for each marker. (E) t-SNE representation of the parenchymal subcluster. (F) Double FISH images of three sets of parenchymal markers enriched in separate subclusters. Numbers indicate the associated parenchymal subcluster for each marker. (G) Top left: Parenchymal t-SNE plot colored by expression of nkx6-like. Top right: Illustration of cutting scheme used to generate fragments. Bottom: dd_515 and dd_385 FISH of control and nkx6-like RNAi animals. Animal sections were cut and fixed 23 days after the start of dsRNA feedings. Scale bars: whole-animal/fragment images, 200 μm; insets, 50 μm.

Planarian muscle expresses collagen in addition to canonical muscle genes such as troponin and tropomyosin (19). Muscle exists in a subepidermal body wall layer, in the pharynx, surrounding the intestine, and in a DV domain (46). Subclustering 5014 muscle cells (Fig. 1C) revealed seven smedwi-1+ candidate precursor subclusters (clusters 0, 1, 3, 4, 5, 10, and 11) (Fig. 2M), as well as subclusters containing body wall muscle (cluster 7), pharyngeal muscle (cluster 2, 8, 9, and 12) (fig. S26A), a population of muscle cells enriched around the intestine (cluster 6), and an unidentified population (cluster 13) (Fig. 5C and fig. S26, B and C). Markers for body wall muscle (cluster 7) and cluster 13 were expressed in nonoverlapping cells by FISH (Fig. 5D).

Whereas some molecular characterization existed for the seven broad planarian tissue classes previously mentioned, very little is known regarding the cellular composition of the two remaining classes. The parenchymal class (Fig. 1C) (24) was highly heterogeneous, with subclustering of 2120 cells identifying many distinct cell populations (Fig. 5E and figs. S27, A and B, and S28B). In addition to eight smedwi-1+ putative precursor subclusters (clusters 0, 1, 2, 3, 4, 6, 8, and most of 9) (Fig. 2N), parenchymal cell subclustering revealed 13 well-separated differentiated cell subclusters. FISH showed that each of these differentiated cell populations were present as scattered cells, presumably within a mesenchymal tissue layer called the parenchyma that surrounds major planarian organs (10). Previous morphological studies determined that the parenchyma is composed of multiple gland cells, neoblasts, and “fixed parenchymal cells” characterized through histological and electron microscopy studies as a likely phagocytic cell with long cellular processes filling most of the parenchymal space (10, 47, 48). Some identified parenchymal subclusters appeared to be gland cells, displaying processes extending to the epidermis, defining transcriptomes for these cells. Candidate gland cell types included two that were exclusively dorsal (clusters 16 and 17), two exclusively lateral (clusters 10 and 13, including marginal adhesive gland cells and an unknown cell population), four present both dorsally and ventrally (clusters 7, 11, 14, and 15), and one present ventrally near the brain (cluster 19). Three subclusters (clusters 5, 12, and 18) contained cells with patterns similar to those of planarian neoblasts, but were not neoblasts. Finally, a single subcluster contained large cells surrounding the pharynx (small group of cluster 9 cells) and were enriched for expression of previously identified metalloprotease-encoding genes (49). Three pairs of parenchymal subclusters (six subclusters total) were confirmed to exist in nonoverlapping populations by FISH (Fig. 5F).

The transcription factor–encoding gene nkx6-like was expressed in a parenchymal cell population marked by dd_515. Inhibition of nkx6-like ablated dd_515 cells, while not affecting a distinct, non-enriched parenchymal cell population marked by dd_385 (Fig. 5G). These results further highlight the potential to use the data to ablate many specific cell types in the animal.

The final major class of cells, the cathepsin+ group, contained 7034 cells (Fig. 1D). This group of clusters contained recently described glia and pigment cells (12, 13, 50). Subclustering of cathepsin+ cells identified four subclusters expressing smedwi-1 that represented putative precursor cells (clusters 0, 1, 3, and 6) (Fig. 2M), a glial subcluster (cluster 15), and two pigment cell populations (clusters 11 and 14), identifying transcriptomes for these cell types (Fig. 6A and figs. S29, A and B, and S30B). Eight cathepsin+ subclusters represented previously unidentified cell populations. FISH revealed striking, elaborate morphologies for most of these cells, involving long processes and unique distributions (Fig. 6A and figs. S29B and S30B). Cells from subclusters 5 and 10 were spread throughout the planarian body, with long processes filling substantial parenchyma space. Subcluster 8 represented cells specific to the pharynx. Subcluster 9 cells were scattered throughout the animal. Subclusters 4 and 16 identified cells with dense aggregated foci of elaborate processes at scattered locations throughout the animal that lacked definitive positions—an unusual and unanticipated cell type distribution. FISH identified markers labeling cell bodies of these cells, revealing that the aggregates comprised many cells (Fig. 6B). Subclusters 12 and 13 also exhibited processes with visible cell bodies. Subcluster 12 cells were largely subepidermal. The most elaborate of these newly identified cells (subclusters 5 and 10) were excluded from the intestine and brain, but had processes around the branches of the intestine and protonephridia and interspersed within the cephalic ganglia (Fig. 6, C to E, and fig. S31, A and B). FISH confirmed nonoverlapping expression patterns for two tested subclusters (fig. S31C).

Fig. 6 Tissue subclustering reveals a previously unidentified class of cells.

(A) t-SNE representation of the cathepsin+ cell subcluster. FISH images are labeled by their associated cluster(s). Images associated with subclusters 5/10 and 8 are single slices in the animal. All other images are maximum intensity projections. (B) Double FISH for two cathepsin+ cell markers enriched in the same subclusters, 4 and 16. (C to E) FISH for dd_9 and mat (C), ChAT (D), and dd_7742 (E). (F) Top: Cell trajectory of dd_1831+ and dd_9+ cathepsin+ cell lineages produced by Monocle. Cells are colored by identity. Bottom: Heat map of branch dependent genes (q value < 10–175) across cells plotted in pseudo-time (41). Cells, columns; genes, rows. Beginning of pseudo-time is at center of heat map. “Cl.” annotation indicates a log-fold enrichment ≥ 1 of the gene in that cathepsin+ cell Seurat cluster. Scale bars: whole-animal images, 200 μm; insets and (B) to (E), 50 μm.

Subcluster 7 of the cathepsin+ group of cells was enriched in expression of genes with expression spanning into clusters 5 and 10 (fig. S32A). Similarly, expression of cluster 2 marker genes spanned into clusters 4 and 16 (fig. S32B). These cells might reflect transition or variant states of cells for clusters 5/10 and 4/16, respectively. SMEDWI-1 protein perdures in neoblast progeny after loss of smedwi-1 mRNA, allowing detection of newly produced neoblast progeny (27). MAP3K5 (dd4849)+ cells, which were predicted to be expressed in cells transitioning from the smedwi-1+ state in the cathepsin+ cell plot, were SMEDWI-1+/smedwi-1, supporting the interpretation that these cells are progenitors in the cathepsin+ cell lineage (fig. S32, C and D). The Monocle toolkit was also used to build single-cell trajectories for these clusters, with data closely recapitulating the transition states identified by Seurat (Fig. 6F, fig. S33, and table S3).

A near complete discovery of planarian cell type transcriptomes

The number of cell types identified in this study vastly exceeded prior planarian SCS data (24). Within the neuronal subclusters, a 17-cell subcluster represented photoreceptor neurons (Fig. 4E), which are present at ~100 total cells in a medium-sized (~2 to 3 mm) animal (20). Therefore, our data should have readily included unknown cell types as rare as photoreceptor neurons. Similarly, an average-sized planarian has ~60 cintillo+ neurons, and our data included 10 cintillo+ neurons (fig. S34A). These cells were grouped within a larger subcluster (cluster 3) of nonciliated neurons (fig. S34B), suggesting that even further subclustering of this “subcluster 3” could reveal additional distinct cell types. Indeed, cintillo+ cells emerged as a unique cluster from such additional (fourth tier) subclustering of original data (fig. S34C and table S2). Esophagus cells, connecting pharynx to intestine, clustered with mouth cells in the pharynx subclustering data (fig. S34D). About 50 of these cells exist in an average-sized animal, and three such cells were present in the data (fig. S34, A and D). Several known rare cell types did not separate into individual clusters, although most could still be identified in the data, which suggests that the data are largely saturated for rare cell types. These include anterior pole cells, which function as an anterior organizer (5153); notum+ neurons in the brain (54); and posterior pole cells (45), each of which are among the rarest known cell types in the animal, with only ~10 each present in an average animal (fig. S34A). Five anterior pole cells, 10 notum+ neurons, and one posterior pole cell were identified in the data (fig. S34, E to G). Similarly, ~25 ovo+ eye progenitors (42) and ~90 nanos+ germ cells (55) are present in an average animal (fig. S34A). Two eye progenitors and 19 germ cells were identified in the data (fig. S34, H and I). In addition to the asexual strain of S. mediterranea used in this study, a sexual strain of cross-fertilizing hermaphrodites exists. We sequenced 8455 cells from this strain, adding sexual strain cells to this resource as well, including seven yolk cells and seven testes cells in addition to the 19 germ cells described above (fig. S35, A to D). Further sequencing of sexual cell types could be a target for future studies. Together, our data indicate that we have essentially reached saturation for determining the cell type transcriptomes of asexual planarians.

Discovery of novel patterning genes

Planarians constitutively express dozens of genes associated with patterning (PCGs) in complex spatial patterns across body axes (18). PCGs are almost exclusively expressed in muscle (19). AP-axis PCGs are well established, including with muscle SCS (56). Muscle cells did not subcluster according to their anatomical positions (Fig. 7A and fig. S36, A to C). However, we reasoned that expression of known PCGs could ascribe locations to muscle cells in the data. Because of variability in the expression of any one PCG, muscle cell regional identity was determined on the basis of expression of at least two PCGs. For example, posterior muscle cells were identified by coexpression of at least two of the four posterior PCGs wnt11-1, wnt11-2, fz4-1, and wntP-2, yielding 163 cells (Fig. 7A and fig. S36D). Differential expression analysis using the algorithm SCDE (57) was performed on these 163 cells against the 4851 other muscle cells (Fig. 7A and table S4). Strikingly, nine of the differentially expressed genes were identified by Scimone et al. (56) as posterior-enriched; eight of these were within the top 26 genes identified by differential expression analysis (Fig. 7B, hypergeometric P = 2.75 × 10–9). A similar analysis on 837 anterior muscle cells was also performed (fig. S36, A and D, and table S4). Nine of the differentially expressed genes were identified by Scimone et al. (56); four of the genes were within the top 25 genes identified by SCDE (Fig. 7B, hypergeometric P = 5.80 × 10–5). We also applied this approach to the less well studied ML axis. We identified 62 lateral muscle cells, and FISH with 15 of the top genes identified seven with lateral muscle expression (Fig. 7C and fig. S36, B and D to G) (58, 59). We isolated 90 medial muscle cells, and the top-ranked gene displayed a striking thin stripe of expression down the dorsal midline (Fig. 7C and fig. S36, C, D, and H). Together, these results demonstrate the power of deep SCS for identifying regional gene expression, such as that involved in patterning, in adult animal tissues.

Fig. 7 Identification of new regionally expressed genes in muscle.

(A) Top: t-SNE plot colored by muscle cells positive for expression ≥ 0.5 [ln(UMI-per-10,000 + 1)] of two of the four posterior PCGs wnt11-1, wnt11-2, fz-4, and wntP-2. Pink, positive cells; gray, negative cells. Bottom: Transcriptomes for posterior muscle cells were compared to all other muscle cells by SCDE. (B) List of differentially expressed genes in posterior and anterior muscle cells that were identified in Scimone et al. (56). Rank indicates the rank of the gene in our analysis. (C) FISH images of one lateral and one medial expressed gene ranked highly in this analysis (59). Number indicates gene rank in the list generated by SCDE. Scale bars: whole-mount images, 200 μm; insets, 50 μm. (D) Illustration highlighting the capacity of the data set to identify almost all cell types in the planarian, as well as specialized neoblast progenitors and novel patterning information from the adult animal.

Discussion

RNA sequencing of >50,000 cells (in total, 66,783 cells were sequenced) of the planarian S. mediterranea allowed the identification of transcriptomes for most to all cell types of an adult animal. This includes transcriptomes for cell types present as rarely as 10 cells in an animal with 105 to 106 cells, which strongly suggests that we have reached near-saturation. Sequencing of different body regions and assessment of rare cell type coverage in an iterative process enabled us to reach this saturation level. Some cell types might escape detection by this technique if they are exceptionally rare or hard to dissociate from the animal. Our data did indicate that some cell types were preferentially recovered according to the abundance of that cell type by FISH, whereas others were less represented (fig. S37, A and B). In particular, prog-1+ epidermal progenitor cells were highly overrepresented in the data relative to their prevalence in the animal, perhaps because their small size made their isolation easier (fig. S37A). Absent prog-1+ cells, most other cell types analyzed were represented similarly to their relative abundance in the animal (fig. S37B). Regardless of differences in ease of dissociation between cell types, we recovered data from all known cell types assessed. Not every known rare cell type emerged as a separable cluster; that is, these cells were sometimes embedded within a larger cluster. In some instances, further rounds of subclustering based on such knowledge resulted in splitting of subclusters into additional subclusters. Therefore, further subclustering analyses and even deeper sequencing will likely continue to enhance the capacity to computationally isolate rare cell types from other clusters. Nonetheless, the transcriptomes for such rare cell types are present in our data and can be studied by searching for the desired cells. Another challenge inherent in assessing saturation of cell type sequencing is ambiguity with the term cell type. Gene expression heterogeneity exists within well-defined clusters and could reflect differences attributable to technical sampling error, cell type state differences, or robust differences in biological function. Further in vivo morphological and functional studies with identified cell clusters, further computational analyses, and even more sequencing data can continue to refine the knowledge of biologically important cell type differences.

Cell types have been previously identified largely through morphological descriptions and perhaps a few marker genes. Determining cell type transcriptomes with large-scale SCS is a powerful new approach to defining the cell type constitution of a tissue, an organ, or even a complete animal. In our study, we identified a large number of previously uncharacterized planarian cell populations across multiple tissues. This included multiple cell populations (in the cathepsin+ group) previously undescribed at the molecular level. One cell population, defined by dd_9 expression, had long processes filling parenchymal space and surrounding, but excluded from, other planarian tissues. This pattern is reminiscent of “fixed parenchymal cells,” a largely uncharacterized cell population described by histology and electron microscopy (EM) (48). Previous EM work suggested that fixed parenchymal cells are likely phagocytic, with clearly observed lysosomes; dd_9+ cells highly expressed genes encoding a variety of digestive enzymes and endocytosis proteins, providing further support for this hypothesis (table S2). The biology of these cathepsin+ cells and all the other diverse cell types identified in this work can now be studied in depth using identified transcriptomes and the tools of planarian biology research. For instance, we show for two case studies above that RNAi of a gene encoding a transcription factor with enriched expression in a candidate cell lineage leads to ablation of the predicted differentiated cell.

Generating transcriptomes for most to all cell types in an animal will be invaluable for studying gene function and the biology and evolution of a large range of important cell types. Because of their phylogenetic position within the Spiralian superphylum (60), major cell types found across diverse bilaterians (e.g., shared between humans and Drosophila, C. elegans, molluscs, annelids, and/or other bilaterians) should have been present in the last common ancestor of planarians and humans. As such, studying the transcriptomes and associated genes with cell type–enriched expression in this data set can allow characterization of the gene function underlying the biology of these cells.

Planarian biology presents many features that made this organism attractive for comprehensive SCS. Planarians are a model for studying numerous important problems in regeneration, stem cell biology, patterning, and evolution. At a single time point—the adult—there exist progenitors for essentially all cell types and the patterning information for guiding new cell type production. We identified the transcriptomes of numerous candidate transition states in lineages from pluripotent stem cell to diverse differentiated cell types. Furthermore, we used the data to identify novel regionally expressed genes in planarian muscle (the site of patterning gene expression). Together, these results illustrate the capacity of our data set to define cell type transcriptomes, identify lineage transition states, and ascertain novel patterning information, all from a single time point (Fig. 7D). We propose that this atlas-like data set of cell type transcriptomes, much like the genome sequence of an animal, can serve as a resource fueling an immense amount of research, not only in planarians but in other bilaterians with similar cell types. To facilitate such study, we developed an online resource that generates cluster expression data, for any gene, across all clusters and subclusters (digiworm.wi.mit.edu). Case study model organisms have proved to be valuable testing grounds for developing approaches to complete genome sequencing; these planarian SCS data demonstrate an approach to near-to-complete cell type transcriptome identification that could be applied broadly to diverse organisms with varying degrees of information about cell type composition. The remarkable ability of single-cell RNA sequencing to reach nearly complete saturation of transcriptome identification for all the cell types of an animal represents a powerful approach for describing the anatomy of complete organisms at the molecular level.

Supplementary Materials

www.sciencemag.org/content/360/6391/eaaq1736/suppl/DC1

Materials and Methods

Figs. S1 to S37

Tables S1 to S5

References (6193)

References

Acknowledgments: We thank M. L. Scimone, C. McQuestion, and K. D. Atabay for their help in the targeted dissociation of tissue from the planarian brain; all Reddien Lab members for valuable comments and discussion; and E. Z. Macosko, M. Goldman, and S. A. McCarroll for making protocols available. Funding: We acknowledge NIH (R01GM080639) support. P.W.R. is an Investigator of the Howard Hughes Medical Institute and an associate member of the Broad Institute of Harvard and MIT. We thank the Eleanor Schwartz Charitable Foundation for support. Author contributions: P.W.R. supervised. C.T.F. and P.W.R. designed experiments and wrote the manuscript. C.T.F., P.W.R., and O.W. analyzed data. C.T.F. and T.H. built and optimized the Drop-seq setup. C.T.F. developed the data processing pipeline. C.T.F. and K.M.K. developed the pipeline for clustering analysis. C.T.F. and P.W.R. performed planarian tissue extractions. C.T.F. performed all other planarian experiments. O.W. generated the online resource. Competing interests: The authors declare no competing interests. Data and materials availability: All raw and processed data files associated with this study have been deposited to Gene Expression Omnibus (GEO) under the accession number GSE111764.
View Abstract

Subjects

Navigate This Article