Review

The promise of spatial transcriptomics for neuroscience in the era of molecular cell typing


Science 06 Oct 2017:
Vol. 358, Issue 6359, pp. 64-69
DOI: 10.1126/science.aan6827

Abstract

The stereotyped spatial architecture of the brain is both beautiful and fundamentally related to its function, extending from gross morphology to individual neuron types, where soma position, dendritic architecture, and axonal projections determine their roles in functional circuitry. Our understanding of the cell types that make up the brain is rapidly accelerating, driven in particular by recent advances in single-cell transcriptomics. However, understanding brain function, development, and disease will require linking molecular cell types to morphological, physiological, and behavioral correlates. Emerging spatially resolved transcriptomic methods promise to fill this gap by localizing molecularly defined cell types in tissues, with simultaneous detection of morphology, activity, or connectivity. Here, we review the requirements for spatial transcriptomic methods toward these goals, consider the challenges ahead, and describe promising applications.

The study of cellular complexity in the nervous system based on anatomy has its roots in the pioneering work of Santiago Ramón y Cajal and his contemporaries. Using Camillo Golgi’s “black reaction” to stain sparse cell populations in histological sections, they identified a diversity of anatomical forms (1), their variation across brain regions, and general conservation across species. Subsequent research has refined and extended these studies with other techniques, yet a clear quantitative understanding of the cell types and circuit motifs that make up most brain regions remains elusive.

Molecular techniques have offered a new way to stratify neuron types by the genes they express, initially by detecting proteins with antibodies and subsequently by detecting RNA transcripts. Genome sequencing provided the full complement of protein-coding genes and allowed their systematic mapping, one gene at a time, across the brain with in situ hybridization (2). Transcriptomic methods that measure all transcripts simultaneously, first with microarrays and later with RNA sequencing, then allowed the analysis of gross or finely microdissected regions in mouse, monkey, and human brain (3–7). These studies revealed highly stereotyped spatial variation in transcript usage across brain regions and temporal variation across development, recapitulating prior anatomical and developmental work.

However, the analysis of bulk tissue samples obscures the heterogeneity of cell types present even in the smallest brain regions. Single-cell transcriptomics is a major advance: it combines the information content of the full transcriptome with single-cell resolution, offers throughput sufficient to tackle the cellular diversity of complex brain regions, and is applicable across species, including human. Applications of these methodologies in the brain (8–15) revealed that cellular complexity is high but finite and generally similar to that observed with morphoelectric classification (16). Individual transcriptomic types are rarely defined by single genes; rather, cell types have distinct combinatorial expression signatures (8, 9). Transcriptomic profiles reflect known cell classes, anatomical location and cytoarchitectonic organization (such as structure and layer), and developmental origin. These observations suggest that the transcriptome, which reflects the gene-regulatory state responsible for cellular phenotype, could be a central measure and unifying framework for understanding both cell-type diversity and lineage (14, 17).

The success of single-cell transcriptomics suggests that a complete inventory of molecularly defined cell types is now achievable in mouse and human brain, and efforts toward this goal are under way through the NIH Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative, the European Human Brain Project, and the Human Cell Atlas initiative (18). Such molecular classification has clear appeal because it is practically achievable and reflects the causal genetic foundation of cellular phenotypes. Genetic cell-type signatures can provide cell type–specific genetic tools and inform gene therapy and molecular diagnostics (Fig. 1). Furthermore, a molecular classification could serve as a genetic scaffold onto which other cellular phenotypes are attached. Conversely, it may be of limited value until a clear relationship is demonstrated to the anatomical and functional cellular properties that have traditionally been used to define cell types.

Fig. 1 Spatial mapping of transcriptomic cell types in healthy and diseased tissues.

(A) Single-cell transcriptome analysis involves tissue dissociation, single-cell isolation, and RNA-seq to generate a molecular classification of cell types. Distinct combinatorial gene panels can be identified that are sufficient to discriminate between types. Multiplexed RNA smFISH based on these gene panels can be used to map the spatial locations and proportions of transcriptomic cell types. (B) Spatial distribution phenotypes can be associated with human disease.

A crucial challenge concerns this correspondence problem: aligning cell types defined by molecular phenotypes with those defined by morphology, electrophysiology, connectivity, and function (19). One approach has been to measure gene expression in cells after patch clamp slice physiology and biocytin filling, by applying single-cell polymerase chain reaction or RNA sequencing (Patch-seq) to the aspirated cell contents (20, 21). However, Patch-seq is labor-intensive, requires highly trained electrophysiologists, and is limited to a few tens of neurons per scientist per day. On the other hand, many techniques have been developed to label larger numbers of neurons and study their functional properties in vitro and in vivo, including sparse genetic cell filling, multipatching, optogenetics, and optical measurements of neuronal activity with genetically encoded calcium indicators (22). Methods are now needed that also identify the transcriptional identity of cells in the same sample, linking these measurements to a common cell-type reference (Fig. 2).

Fig. 2 Multiplexed RNA smFISH applications to study correspondence of cellular phenotypes to transcriptomic cell types.

Post hoc analysis with spatial transcriptomics after in vitro or in vivo assays allows identification of the morphological, physiological, connectional, and functional properties of transcriptomic cell types.

Enormous progress has been made in the development of spatial transcriptomics methods that allow multiplexed analysis of cellular transcripts [reviewed in (23)]. Initial work focused on proof of concept that a high degree of multiplexing can be achieved through serial hybridizations or more complex barcoding schemes. A number of these methods now work on brain tissue sections and can reveal the spatial organization of gene expression and the corresponding transcriptional types in tissues (12, 24). Thus, there is clear potential to exploit transcriptional cell-type classifications and map their spatial distributions in tissues, creating censuses of cell types and their developmental trajectories (Fig. 1). However, these methods and their analysis are not yet standardized and validated, and various technical hurdles remain before they can be applied as post hoc assays to thicker tissue sections. A clear articulation of the target applications and their associated challenges will help define requirements and guide further optimization of these methods. Our Review summarizes the current state of the field and lays out the needs and challenges of applying spatial transcriptomics to two goals: mapping the spatial distributions of cell types defined by dissociated single-cell transcriptomics, and correlating other cellular phenotypes with those types.

State of the field

The detection of specific gene expression by means of colorimetric or fluorescent in situ hybridization (FISH) in tissue has a long history, although standard methods are limited to a handful of colors. In single-molecule FISH (smFISH), multiple fluorescently labeled oligonucleotide probes are hybridized to each target RNA molecule, which can then be imaged as a diffraction-limited spot; counting these spots gives the expression level of the target gene in its native cellular context. Recently, combinatorial labeling of single RNA molecules has increased the number of genes that smFISH can detect (Fig. 3). smFISH is quantitative, with near-100% detection sensitivity, and can be applied to tissue sections (25–28). In its simplest form, the number of genes detected is limited by the number of spectrally resolvable fluorophores, but multiple rounds of stripping and rehybridization can increase the number of detected genes linearly (12).


Fig. 3 Spatially resolved transcriptomics achieve high levels of multiplexing through multiple rounds of probing, imaging, and stripping.

In additive smFISH, a small number of spectrally resolved probes is used in each round. seqFISH uses temporal barcodes, in which the combination of signal across all cycles is specific to each target. MERFISH uses an error-corrected barcode to reduce the effect of false-positive and -negative signals. In situ sequencing uses padlock probes to target specific mRNAs, with cDNA synthesis and rolling-circle amplification in situ, followed by sequencing by ligation. FISSEQ uses a similar principle, but reverse-transcribes RNA in an unbiased manner.

Multiplexing by barcoding

To increase the number of genes probed with smFISH, combinatorial barcodes can be used. In spatial barcoding, fluorophores targeted to different segments along the length of an RNA are resolved with superresolution microscopy (29). In spectral barcoding, the targets are labeled with a specific combination of fluorophores (29, 30). In temporal barcoding, multiple cycles of smFISH hybridization and stripping repeatedly label the same RNA molecules in a predefined color sequence (31). With temporal barcoding, the number of resolvable targets scales as the number of fluorophores raised to the power of the number of rounds, as long as individual molecules remain distinguishable.
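To make this scaling concrete, the sketch below compares serial (additive) readout with temporal barcoding. It assumes, purely for illustration, 4 spectrally distinct fluorophores and perfect detection; the function names are hypothetical.

```python
def additive_capacity(n_colors: int, n_rounds: int) -> int:
    """Serial smFISH: each round reads out n_colors new genes, so capacity grows linearly."""
    return n_colors * n_rounds


def temporal_barcode_capacity(n_colors: int, n_rounds: int) -> int:
    """Temporal barcoding: each molecule shows one of n_colors in every round,
    so each gene can be assigned a distinct color sequence of length n_rounds."""
    return n_colors ** n_rounds


if __name__ == "__main__":
    n_colors = 4  # assumed number of spectrally resolvable fluorophores
    for n_rounds in (1, 2, 4, 8):
        print(n_rounds, additive_capacity(n_colors, n_rounds),
              temporal_barcode_capacity(n_colors, n_rounds))
```

With four colors, four rounds give 16 genes additively but 256 possible color sequences. In practice the usable capacity is lower, because redundancy is needed to tolerate errors and all barcoded molecules compete for optical space.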

In contrast to spectral barcodes, temporal barcodes are sensitive to losses (false-negative hybridizations) and to tissue movement between wash cycles, which can transform one valid barcode into another or produce invalid barcodes. Multiplexed error-robust FISH (MERFISH) uses more complex, redundant barcodes to detect and correct such errors (32). MERFISH uses an indirect labeling approach: nonfluorescent encoding probes are first hybridized to all targets, and each carries two tails that do not bind the target but to which fluorescent readout probes can hybridize. Each gene is assigned a barcode from a modified Hamming code (a classic error-correcting code from information theory), which allows errors in the temporal barcode to be corrected. Reliable identification of 130 RNA species was demonstrated with error-robust coding, and even an impressive 985 species when the error-correcting properties of the code were sacrificed, although the identification rate then dropped to ~25%. More recent adaptations have made MERFISH faster (33) and applicable to tissue sections (34). To overcome tissue autofluorescence, tissue-clearing methods were applied after mRNA was cross-linked to a scaffold (34–38). One current limitation of MERFISH is that a large number of encoding probes must fit on each target RNA molecule, so only long transcripts (>3 kb) can be detected (32).
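The error-correcting idea can be illustrated with a toy codebook. The sketch below assumes 16-bit barcodes (one bit per readout), exactly 4 "on" bits per codeword, and a minimum Hamming distance of 4 between codewords; these parameters, the greedy construction, and the function names are illustrative choices, not the published MERFISH codebook or pipeline.

```python
from itertools import combinations
from typing import List, Optional

N_BITS = 16    # assumed: one bit per hybridization/readout round
ON_BITS = 4    # assumed: every codeword has exactly four "on" bits
MIN_DIST = 4   # minimum pairwise Hamming distance between codewords


def hamming(a: int, b: int) -> int:
    """Number of bit positions at which two barcodes differ."""
    return bin(a ^ b).count("1")


def build_codebook() -> List[int]:
    """Greedily keep constant-weight words that stay >= MIN_DIST from all kept words."""
    book: List[int] = []
    for on_positions in combinations(range(N_BITS), ON_BITS):
        word = sum(1 << pos for pos in on_positions)
        if all(hamming(word, kept) >= MIN_DIST for kept in book):
            book.append(word)
    return book


def decode(observed: int, book: List[int]) -> Optional[int]:
    """Return the index of the codeword at distance <= 1, or None.

    Because codewords are >= 4 apart, a single dropped or spurious bit leaves the
    observation 1 bit from its true codeword and >= 3 bits from all others, so it
    is corrected; anything 2 or more bits from every codeword is discarded."""
    for index, word in enumerate(book):
        if hamming(observed, word) <= 1:
            return index
    return None


if __name__ == "__main__":
    book = build_codebook()
    true_word = book[7]
    print(len(book), decode(true_word, book), decode(true_word ^ (1 << 3), book))
```

A codebook with a smaller minimum distance would fit more genes into the same number of bits but lose this single-bit correction, mirroring the trade-off between gene number and identification rate described above.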

Similarly, highly multiplexed RNA detection in large areas of the mouse brain was achieved with single-molecule hybridization chain reaction (smHCR), resulting in a 20-fold signal increase over smFISH and overcoming tissue background (24, 39, 40). A hybrid approach, seqFISH, was used to detect either 125 or 249 RNA species in the cortex and hippocampus, where lowly expressed genes were resolved with barcoding and highly expressed genes with serial hybridizations (24).

Sequencing in situ

An alternative to FISH is to sequence RNA directly (Fig. 3). In situ sequencing (41–43) uses target-specific padlock probes to create rolling-circle amplification products for a predetermined set of transcripts. In one approach, four barcode nucleotides were introduced into the padlock probe and sequenced in situ to resolve 32 targets in a breast cancer tumor section, revealing distinct regionalization within the tumor (41). In contrast, the fluorescent in situ sequencing (FISSEQ) method sequences the amplified cellular RNA content directly. For example, more than 8000 different RNA species were detected across cultured fibroblasts by reading 27 bases of each transcript (42). However, to our knowledge, FISSEQ has not been demonstrated on tissue sections, and the number of transcripts currently detected in each cell is low. It is estimated that at most ~90 amplification products fit in the optical space of a cell, limiting the total number of reads per cell (41).
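Decoding targeted in situ sequencing reads against a predefined codebook can be sketched as a lookup. The sketch assumes one base call per sequencing cycle and four barcode positions, as in the padlock design above (4^4 = 256 possible barcodes). The 32-entry codebook and its target names are hypothetical; for simplicity it uses the first 32 barcodes, whereas a more robust panel would space barcodes apart so that a single miscalled base cannot turn one target into another.

```python
from itertools import product
from typing import Dict, List, Optional

BASES = "ACGT"
BARCODE_LEN = 4  # four barcode nucleotides per padlock probe

# All 4**4 = 256 possible barcodes; a hypothetical panel uses the first 32.
ALL_BARCODES = ["".join(p) for p in product(BASES, repeat=BARCODE_LEN)]
CODEBOOK: Dict[str, str] = {bc: f"target_{i:02d}" for i, bc in enumerate(ALL_BARCODES[:32])}


def call_read(cycle_calls: List[str]) -> Optional[str]:
    """Join the base called in each cycle and look up the target, if any.

    Incomplete reads and barcodes outside the codebook return None; the rate of
    such unused barcodes gives a rough estimate of sequencing or imaging errors."""
    if len(cycle_calls) != BARCODE_LEN:
        return None
    return CODEBOOK.get("".join(cycle_calls))


if __name__ == "__main__":
    print(call_read(list("AAAA")), call_read(list("ACTT")), call_read(list("TTTT")))
```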

An alternative approach to in situ sequencing uses patterned microarrays, carrying barcoded oligo-dT primers, to capture the mRNA from a tissue section (44). During cDNA synthesis on the array, barcodes indicating the location of each spot are incorporated into the cDNA, which is pooled and sequenced. This allows each read to be mapped to the correct spatial coordinates, limited only by the microarray spot size and spacing.

Sensitivity and breadth of transcriptome coverage differ greatly among methods. Single-cell mRNA-seq detects between 5 and 40% of all mRNA molecules present in a cell, depending on the technology and sequencing depth. smFISH detects nearly 100% of all mRNA molecules, although for a limited set of targeted genes. In situ barcode sequencing and microarray-based sequencing can capture ~5% of mRNAs. FISSEQ in principle allows a totally unbiased analysis of all cellular transcripts at subcellular resolution but currently has sensitivity as low as 0.01%.
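The practical consequence of these sensitivity differences can be estimated with a simple binomial model. The sketch assumes that each mRNA molecule is detected independently with a fixed probability; the efficiencies are the approximate figures quoted above, with 0.95 standing in for "nearly 100%", and the 10-copy gene is hypothetical.

```python
from typing import Dict


def expected_detected(copies: int, efficiency: float) -> float:
    """Mean number of detected molecules under independent capture."""
    return copies * efficiency


def p_at_least_one(copies: int, efficiency: float) -> float:
    """Probability that a gene with `copies` molecules yields >= 1 detection."""
    return 1.0 - (1.0 - efficiency) ** copies


EFFICIENCIES: Dict[str, float] = {
    "smFISH (assumed 0.95)": 0.95,
    "scRNA-seq, high end": 0.40,
    "scRNA-seq low end / in situ barcode sequencing": 0.05,
    "FISSEQ, current low end": 0.0001,
}

if __name__ == "__main__":
    copies = 10  # hypothetical lowly expressed gene
    for method, eff in EFFICIENCIES.items():
        print(f"{method}: mean {expected_detected(copies, eff):.3f}, "
              f"P(>=1) {p_at_least_one(copies, eff):.3f}")
```

For a 10-copy transcript, a 5% efficiency detects nothing in roughly 60% of cells (0.95^10 ≈ 0.60), which is one reason dropouts must be modeled when cell types are called from sparse counts.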

Multiplexing is currently constrained by the limited optical and physical space inside cells. When combinatorial barcodes are used, RNA molecules from multiple genes are imaged simultaneously and thus compete for optical space. As the level of multiplexing increases, molecules start to overlap, and barcodes become difficult to read. However, the physical volume of each cell can be increased with expansion microscopy, so that RNA molecules that would otherwise fall within the same diffraction-limited spot are pulled apart (35, 45). Expansion microscopy is compatible with multiple rounds of smFISH, and because highly multiplexed methods exist for embedded RNA, we expect the two approaches to be combined to study large numbers of transcripts (34). However, expansion microscopy also increases the total volume to be imaged and thus reduces imaging throughput.
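This trade-off follows from simple geometry. Assuming isotropic expansion by a linear factor (the 4x used below is illustrative), volume grows with the cube of that factor, so molecule density, and hence crowding, drops by the same factor by which the volume to be imaged grows.

```python
from typing import Dict


def expansion_tradeoff(linear_factor: float) -> Dict[str, float]:
    """Geometric consequences of isotropic physical expansion of a sample."""
    volume_factor = linear_factor ** 3
    return {
        "spacing_factor": linear_factor,          # distances between molecules grow linearly
        "density_factor": 1.0 / volume_factor,    # molecules per unit volume fall cubically
        "imaging_volume_factor": volume_factor,   # volume to image for the same tissue
    }


if __name__ == "__main__":
    print(expansion_tradeoff(4.0))
```

At 4x, the average spacing between molecules grows fourfold, but imaging the same piece of tissue requires covering 64 times the volume.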

Technical and computational challenges

The goal of identifying molecularly defined cell types in situ places specific demands on the degree of multiplexing required, the necessary accuracy and sensitivity of measurements, and the scalability of the method for large tissue sections or high-throughput analysis. In the context of the correspondence problem, the goal is to reliably identify the types of cells in situ and align those cells with measurements of other modalities (Fig. 2). This will require advances in tissue preparation, optimized methods for probe selection, computational tools, and imaging scale.

Tissue preparation

Morphology, electrophysiology, and connectivity are usually studied in thick tissue blocks in vitro, whereas studying behavioral responses requires in vivo analysis. Such measurements need to be reliably correlated with post hoc spatial transcriptomic analysis of the same tissue. One strategy is to image these thicker tissue blocks to record the position (and potentially the morphology) of every recorded cell, and then resection them into thin sections. Multiplexed in situ profiling of the resectioned tissue could then determine the transcriptomic identities of recorded neurons, once the stained sections are aligned with the original recorded volume. An analogous alignment problem has already been solved for electron microscopy reconstructions of tissue resectioned after in vivo calcium imaging (46, 47). A related challenge is combining smFISH with antibody labeling, both to delineate cell membranes for cell segmentation and accurate assignment of signal to cells and to identify genetic labels in the cells of interest.
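The alignment step can be sketched, under strong simplifying assumptions, as a least-squares rigid fit between matched landmarks. The code assumes that a few landmark cells (for example, recorded neurons visible in both datasets) have already been paired, that the problem is two-dimensional, and that the section is only rotated and translated, with no shrinkage or warping; real registration generally also needs scaling and nonrigid deformation. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def fit_rigid_2d(src: List[Point], dst: List[Point]) -> Tuple[float, Point]:
    """Least-squares rotation angle and translation that map src onto dst.

    src and dst must list the same landmarks in the same order."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(p[0] for p in dst) / n
    dy = sum(p[1] for p in dst) / n
    cross = dot = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        ax, ay, bx, by = ax - sx, ay - sy, bx - dx, by - dy  # center both point sets
        cross += ax * by - ay * bx
        dot += ax * bx + ay * by
    theta = math.atan2(cross, dot)  # rotation that best superimposes the centered sets
    c, s = math.cos(theta), math.sin(theta)
    return theta, (dx - (c * sx - s * sy), dy - (s * sx + c * sy))


def apply_rigid_2d(theta: float, t: Point, p: Point) -> Point:
    """Rotate p by theta about the origin, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1])
```

Mapping each recorded cell's position with the fitted transform then gives its expected location in the section, where its smFISH-derived identity can be looked up.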

Alternatively, thick slices can be cleared and imaged with either confocal or light-sheet microscopy. Unamplified in situ transcriptomic methods are unsuitable here because their faint signal can only be imaged with epifluorescence microscopes, which imposes a maximum imaging depth of ~30 μm owing to the short working distance of high-magnification objectives (38) and the inability to reject out-of-focus signal. However, single-molecule detection after amplification has been demonstrated at depths of up to 500 μm (35, 40). Multiple rounds of labeling are possible (34), enabling barcoding strategies, but they require more time for reagents to penetrate thick samples, which will be especially challenging for enzyme-based methods such as in situ sequencing and FISSEQ.

Autofluorescence can render some parts of the spectrum unusable. In particular, in human brain the accumulation of strongly autofluorescent pigmented lipid granules called lipofuscin can block fluorescent signal across a wide spectrum. Lipofuscin tends to be unevenly distributed among cell types and brain regions and can therefore also cause misleading underrepresentation of particular classes of cells. Lipofuscin is difficult to remove physically, but its fluorescence can be partially quenched (for example, using Sudan Black dye); however, it remains an issue especially for unamplified smFISH, in which the true fluorescent signal is weak.

Optimal gene selection for combinatorial analysis

The first applications of single-cell RNA-seq to survey transcriptomic cell types in the brain help bound the problem of cellular diversity. For example, 49 and 47 cell types were identified in the mouse visual and somatosensory cortex, respectively (8, 9), 25 types in the human developing ventral midbrain (12), 62 and 45 in the mouse hypothalamus (48, 49), 16 neuron types in the human cortex (14), and 39 types in the mouse retina (11). Deeper sampling and methodological improvements will undoubtedly increase these numbers but are unlikely to do so by orders of magnitude. Extrapolating from these observations, it seems likely that the adult mammalian brain contains up to a few thousand molecularly distinct cell types, although there are certain to be additional variables such as spatial gradients and cell state–dependent signatures. Furthermore, during brain development, cellular differentiation unfolds through dynamic intermediate states whose complexity probably rivals that of the adult. Distinguishing among these types and states in situ with a limited number of genes requires careful probe selection based on high-quality reference data sets (atlases) of single-cell RNA-seq data matched for brain region and developmental stage [a recent discussion between Cai and Cembrowski highlights this point (24, 50)]. Cell types usually cannot be defined by a single specific marker, and a combinatorial approach, using genes that are expressed more broadly across several cell types but with high selectivity for those types, is more efficient. If the type space can be successively bisected, only log2(n) genes are needed to distinguish n cell types, whereas a panel of cell type–specific markers would need at least n genes. Furthermore, the natural hierarchy of cell types will facilitate the construction of a combinatorial probe set. Although in practice optimal bisection might not be possible and technical constraints restrict gene selection, a panel of several hundred genes may be sufficient to detect and identify most brain cell types.
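The bisection argument can be checked numerically. In the idealized case assumed below, each gene is either clearly on or clearly off in each type and genes split the types independently, so every type receives a unique on/off pattern; the helper names are hypothetical.

```python
from typing import Dict, List, Tuple


def min_binary_genes(n_types: int) -> int:
    """Smallest k with 2**k >= n_types, i.e. ceil(log2(n_types)) for n_types >= 1."""
    return (n_types - 1).bit_length()


def assign_binary_codes(cell_types: List[str]) -> Dict[str, Tuple[int, ...]]:
    """Give type i the on/off pattern of the binary digits of i across k genes."""
    k = min_binary_genes(len(cell_types))
    return {t: tuple((i >> g) & 1 for g in range(k)) for i, t in enumerate(cell_types)}


if __name__ == "__main__":
    for n in (49, 1000, 3000):
        print(n, "types:", min_binary_genes(n), "binary genes vs", n, "specific markers")
```

Even a few thousand types need only about a dozen ideal binary genes; the gap to the several hundred genes estimated above reflects the non-ideal expression patterns, detection noise, and redundancy that real panels must absorb.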

Computational challenges

The computational analysis of spatially resolved transcriptomics data remains in its infancy (51). With its origin in microscopy, FISH and related methods were initially analyzed as images, relying on human interpretation. However, to realize the full potential of spatial methods, more recent work uses automatic workflows to transform images into computable data.

A key step is feature detection, the process of discovering relevant spatial objects such as cells, nuclei, vessels, and tissue borders. Although there are effective feature-detection algorithms for dispersed cells in culture, tissues and tissue sections are much more challenging. A promising avenue of research is the use of deep neural networks for feature detection and classification, especially for the discovery of complex higher-order spatial features beyond cell bodies, such as cell-cell interaction motifs, subcellular localization of mRNAs, or transport into axons. Deep learning has been highly successful at image analysis tasks in other domains, including unsupervised feature discovery (52), and is beginning to have an impact in genomics (53).
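As a baseline against which more sophisticated feature detection can be compared, the simplest spot-to-cell assignment is a nearest-nucleus rule. The sketch assumes that nuclei have already been detected (for example, from a DAPI stain) and that any spot farther than a chosen radius from every nucleus is left unassigned; the radius, coordinates, and gene labels are illustrative. This deliberately crude rule ignores cell shape, which is exactly where membrane stains and learned segmentation are expected to help.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Spot = Tuple[float, float, str]   # (x, y, gene) for one decoded RNA molecule
Nucleus = Tuple[float, float]     # nucleus centroid


def assign_spots(spots: List[Spot], nuclei: Dict[str, Nucleus],
                 max_dist: float) -> Tuple[Dict[str, Counter], int]:
    """Count each spot toward its nearest nucleus if within max_dist.

    Requires at least one nucleus. Returns per-cell gene counts and the
    number of spots left unassigned."""
    counts: Dict[str, Counter] = {cell_id: Counter() for cell_id in nuclei}
    unassigned = 0
    for x, y, gene in spots:
        dist, nearest = min((math.hypot(x - cx, y - cy), cid)
                            for cid, (cx, cy) in nuclei.items())
        if dist <= max_dist:
            counts[nearest][gene] += 1
        else:
            unassigned += 1
    return counts, unassigned
```

The resulting per-cell count table is the input on which cell-type calling then operates.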

Last, new computational methods are needed for making probabilistic assignments of transcriptomic types to each cell analyzed with smFISH, similar to those demonstrated for tissue-level expression patterns (54, 55). Ideally, algorithms will be developed that can both design an optimal set of target genes based on reference transcriptome atlases and use those genes to call cell types in situ. These will likely need to be tailored to each specific method and application to account for the distinct technical properties of these data sets, such as detection sensitivity and error rate, as well as the number of genes and the redundancy needed to make high-confidence cell-type calls.
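One simple form such a probabilistic assignment could take is a naive Bayes classifier with Poisson counts. The sketch assumes that each type's expected in situ count for every gene on the panel can be predicted from a reference atlas (scaled by a detection factor), that genes are independent given the type, and that all types are scored on the same gene panel. It is a generic textbook model, not the method of the works cited above, and all names and numbers are hypothetical.

```python
import math
from typing import Dict, Optional

Profile = Dict[str, float]  # gene -> expected count in situ


def type_posteriors(counts: Dict[str, int], reference: Dict[str, Profile],
                    scale: float = 1.0, prior: Optional[Dict[str, float]] = None,
                    background: float = 0.01) -> Dict[str, float]:
    """Posterior probability of each reference type given one cell's spot counts.

    Each gene's count is modeled as Poisson(scale * mean + background); the small
    background term keeps genes with zero reference expression from vetoing a type."""
    types = list(reference)
    if prior is None:
        prior = {t: 1.0 / len(types) for t in types}
    log_post: Dict[str, float] = {}
    for t in types:
        total = math.log(prior[t])
        for gene, mean in reference[t].items():
            lam = scale * mean + background
            x = counts.get(gene, 0)
            total += x * math.log(lam) - lam - math.lgamma(x + 1)  # Poisson log-pmf
        log_post[t] = total
    top = max(log_post.values())  # subtract the max before exponentiating for stability
    norm = sum(math.exp(v - top) for v in log_post.values())
    return {t: math.exp(v - top) / norm for t, v in log_post.items()}
```

Returning probabilities rather than hard labels lets low-confidence cells, such as those with few detected molecules, be flagged instead of being forced into a type.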

Scaling to the size of the brain

The main limitation for applying multiplexed in situ techniques on full tissue sections is imaging speed. Especially for unamplified smFISH, imaging demands a high optical magnification and the acquisition of Z stacks because of the thin focal plane. Acquisition can be optimized to improve throughput, and signal amplification enables imaging with lower-magnification objectives, but even then, larger samples such as human brain sections are a challenge. Therefore, microscopy methods need to be developed that can image at high resolution while retaining high spatial throughput.
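A back-of-the-envelope estimate shows why imaging speed dominates. The sketch multiplies the number of fields of view by the time per field, assuming square, non-overlapping tiles and a fixed exposure per z plane and channel; every parameter value is a placeholder to be replaced with those of a real setup, not a measurement.

```python
import math


def imaging_hours(area_mm2: float, fov_um: float, z_planes: int, channels: int,
                  rounds: int, exposure_s: float, overhead_s: float = 0.0) -> float:
    """Total acquisition time for tiling an area with square fields of view.

    Ignores tile overlap, autofocus failures, and fluidics time between rounds."""
    tiles = math.ceil(area_mm2 * 1e6 / fov_um ** 2)  # 1 mm^2 = 1e6 um^2
    seconds_per_tile = z_planes * channels * exposure_s + overhead_s
    return tiles * seconds_per_tile * rounds / 3600.0


if __name__ == "__main__":
    # Placeholder parameters; doubling the field width (lower magnification)
    # cuts the tile count, and hence the time, fourfold.
    high_mag = imaging_hours(area_mm2=10, fov_um=100, z_planes=10, channels=3, rounds=8, exposure_s=0.1)
    low_mag = imaging_hours(area_mm2=10, fov_um=200, z_planes=10, channels=3, rounds=8, exposure_s=0.1)
    print(f"{high_mag:.1f} h vs {low_mag:.1f} h")
```

Because time scales with area, z planes, channels, and rounds multiplicatively, the amplification-enabled switch to lower magnification mentioned above is one of the few levers that reduces several of these factors at once.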

Applications

Once these technical challenges have been solved, spatial transcriptomic methods are well poised to make major contributions to a number of topics even more challenging than normal adult cellular brain diversity. Three fields in particular seem well suited to benefit from the use of spatial methods with cell-type discriminatory power: dynamic processes of development, complexities of neuronal connectivity, and alterations in tissue organization in disease.

Spatial transcriptomics of brain development

Development is governed by three key processes: the creation of shape by morphogenesis and migration, the creation of function by cell differentiation, and the interplay of form and function through patterning and cell-cell interaction. Understanding this incredibly complex sequence of events will require methods that are scalable (to the size of the developing brain), comprehensive, and spatially resolved. Although differentiation lineages can be inferred to some extent from single-cell RNA-seq data (56), spatial methods will be required to place differentiation in the proper context of morphogen gradients, cell migration, and cell-cell interactions. Conversely, the spatial dimension provides useful constraints that can help infer lineages. For example, smFISH was used to show compartmentalization of embryonic radial glia subtypes at multiple developmental stages of the mouse ventral midbrain (12). There is thus an opportunity for computational advances that combine single-cell RNA-seq with spatial transcriptomic information (both location and environment) for lineage inference.


However, both single-cell RNA-seq and spatial transcriptomics are limited to providing snapshots of the current state of cells. Recently, a general strategy was described for recording past cellular events, called MEMOIR (memory by engineered mutagenesis with optical in situ readout) (57). The system is based on a set of barcoded recording elements, termed scratchpads, which can be irreversibly marked via CRISPR/Cas9–targeted mutagenesis. By inducing specific guide RNAs in response to cellular events, those events can be recorded and preserved for readout later in the cell’s history. This system can be used to record physical lineage information but more generally to preserve a record of signals and perturbations experienced by a cell during development. MEMOIR scratchpads are detected in situ by use of smFISH and can be combined with probes for cellular identity and other molecular phenotypes.
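The logic of reading lineage from such irreversible records can be illustrated with a deliberately simplified model. Assume that each scratchpad is read out by smFISH as either unedited (0) or edited (1), that edits are never reversed and are inherited by daughter cells, and that the same edit rarely arises twice independently; cells that share more edits then probably share a more recent common ancestor. Real scratchpads have richer states and call for proper phylogenetic inference; the names and data here are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple

State = Tuple[int, ...]  # one 0/1 entry per scratchpad, as read out by smFISH


def shared_edits(a: State, b: State) -> int:
    """Edits present in both cells; under the stated assumptions these were
    inherited from a common ancestor."""
    return sum(1 for x, y in zip(a, b) if x == 1 and y == 1)


def closest_pair(states: Dict[str, State]) -> Tuple[str, str]:
    """Pair of cells sharing the most edits, the best candidate for the most recent split."""
    return max(combinations(sorted(states), 2),
               key=lambda pair: shared_edits(states[pair[0]], states[pair[1]]))
```

Repeatedly merging the closest pair in this way would build a rough lineage tree, and the same smFISH round can read out cell-identity genes alongside the scratchpads.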

Ultimately, a systematic effort to map the developing brain by means of single-cell RNA-seq and spatial transcriptomics will enable the discovery of key principles of brain development and contribute to understanding the origins of developmental neurological disease.

Spatial transcriptomics of neuronal connectivity

Brain function is fundamentally determined by the patterns of connectivity between neurons, often at long distances from cell body location. Many studies have explored the developmental molecular logic of early events in pathfinding and target selection, whereas fine-tuning of synaptic connections is generally considered to be an activity-dependent process. However, our understanding of cell type–specific patterns of connectivity—or even cell type–specific patterns of long-range projection target specificity—remains limited. Analysis of single-cell transcriptomics data suggests that connectivity will vary between types because cell types express complex and distinct combinatorial sets of genes associated with connectivity, such as adhesion molecules. Therefore, the molecular classification of cell types may dramatically assist understanding of detailed wiring diagrams, provided that connectivity studies can be linked to transcriptomics.

Spatial transcriptomics as a post hoc assay provides a means to characterize the connectional properties of transcriptomic types. New technologies allow the tracing of axonal projections from single neurons across the entire brain, and the molecular identity of the traced neurons could subsequently be determined. Similarly, trans-synaptic rabies-based labeling from single cells or defined cell populations identifies afferent cellular inputs whose molecular identity could be established. The traditional approach to studying neuronal connectivity is paired (or higher-order) patch clamp electrophysiology. For example, Markram et al. (16) used this approach extensively to measure connectivity among different cortical cell types, and a recent study by Jiang et al. (58) used an octopatch approach to identify recurrent connectivity motifs between cortical cell types. Coupling spatial transcriptomics to such an approach would identify each recorded cell and allow the generation of a highly detailed connectivity matrix among types. The same approach, with technical modifications suited to specific tissue preparations, could be applied to a variety of other methods for studying connectivity, from tractography to ultrastructure, using techniques ranging from tracers to array tomography (59).
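Once each recorded cell has a transcriptomic identity, building the type-level connectivity matrix is a matter of aggregation. The sketch assumes each tested pair is recorded as (presynaptic type, postsynaptic type, connected or not), with types assigned post hoc by spatial transcriptomics; the type labels are hypothetical. It keeps the number of pairs tested alongside the number of connections, because a connection probability is only as reliable as the number of pairs behind it.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[str, str, bool]  # (presynaptic type, postsynaptic type, connected?)


def connection_matrix(records: Iterable[Record]) -> Dict[Tuple[str, str], Tuple[int, int, float]]:
    """Map each (pre, post) type pair to (connections found, pairs tested, probability).

    Direction matters: (A, B) and (B, A) are tallied separately."""
    tested: Dict[Tuple[str, str], int] = defaultdict(int)
    found: Dict[Tuple[str, str], int] = defaultdict(int)
    for pre, post, connected in records:
        tested[(pre, post)] += 1
        found[(pre, post)] += int(connected)
    return {key: (found[key], n, found[key] / n) for key, n in tested.items()}
```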

Spatial methods also invite exploration of novel genetic strategies that exploit the power of DNA barcoding for higher-throughput connectomic analysis. One such strategy, BOINC (Barcoding Of Individual Neuronal Connections) (60), relies on generating random barcodes in single cells and physically joining the barcodes of adjacent cells connected by synapses. A variation of this idea based on injection of barcoded viral particles [multiplexed analysis of projections by sequencing (MAPseq)] (61) can map thousands of long-range projections but cannot link them to molecular cell identities or to specific target cells. An imaging-based approach, FISSEQ-BOINC (62), envisaged direct in situ sequencing of barcodes located in the pre- and postsynaptic compartments across a synaptic cleft. Although none of these strategies presently links connectivity to molecular identity, this could in principle be achieved through simultaneous detection of gene expression in the pre- and postsynaptic cells. However, any mapping of complete circuits would require imaging very large volumes, compounding the imaging throughput challenge.

Spatial transcriptomics of brain disease

Although the greatest short-term scientific impact of spatial transcriptomics may be on the fundamental question of brain architecture, the greatest long-term societal impact may instead come from applying integrated spatial transcriptomics to human neurological disease. Most brain disorders are either developmental or degenerative, and in both cases, the spatial organization of the diseased brain is of fundamental importance. With a detailed understanding of brain cell types, and their localization, comes an opportunity to develop new molecular pathology to diagnose and understand brain disease. For example, a type of disease-associated activated microglia was recently discovered by using single-cell RNA-seq in a mouse Alzheimer’s disease model, and the authors used spatial methods to show that these activated microglia are specifically located near β-amyloid plaques (Fig. 1B) (63). This immediately suggests a plausible mechanism for the origin of Alzheimer’s disease in humans (insufficient microglia activation) and an avenue for targeted treatment with microglia-activating drugs.
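A first-pass quantification of such a spatial association compares how often each cell population lies near the pathological feature of interest. The sketch assumes 2D centroids for cells and plaques and an arbitrary radius; it reports only the observed fractions, and a real analysis would compare them against a null model (for example, by randomly relabeling cells) before drawing conclusions. Names and coordinates are illustrative.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def distance_to_nearest(p: Point, targets: List[Point]) -> float:
    """Euclidean distance from p to the closest target (targets must be non-empty)."""
    return min(math.hypot(p[0] - q[0], p[1] - q[1]) for q in targets)


def fraction_near(cells: List[Point], plaques: List[Point], radius: float) -> float:
    """Fraction of cells whose centroid lies within `radius` of any plaque."""
    if not cells:
        return 0.0
    near = sum(1 for c in cells if distance_to_nearest(c, plaques) <= radius)
    return near / len(cells)
```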

Many brain diseases similarly show strong spatial organization, including multiple sclerosis (white matter lesions) and Parkinson’s disease (Lewy bodies). Brain tumors, in particular, are complex cellular environments, and single-cell RNA-seq has been used to show how multiple parallel tumor clones are intermingled with normal neurons and glia as well as activated immune and vascular cells (64–66). Spatial transcriptomics will be invaluable for fully understanding the interplay of cell-cell interactions and complex differentiation in tumors.

Transcriptomics has provided a new paradigm for understanding the nervous system in terms of its genetically defined cell types. Spatial transcriptomics methods will contribute to a deep understanding of cell types and neuronal circuitry in developing, adult, and diseased brains, provided that technical hurdles can be overcome to make them broadly applicable in combination with other analytical methods. The clear definition of specific use cases and their associated requirements, together with standardization of experimental and analytical methods, will help this field realize its remarkable potential for discovery.

Acknowledgments

The authors thank the Knut and Alice Wallenberg Foundation, the Wellcome Trust, and the Swedish Foundation for Strategic Research for their support (S.L. and L.E.B.) and the Allen Institute for Brain Science founders, P. G. Allen and J. Allen, for their vision, encouragement, and support (E.L.).