Research Article

Single-cell reconstruction of developmental trajectories during zebrafish embryogenesis

See allHide authors and affiliations

Science  01 Jun 2018:
Vol. 360, Issue 6392, eaar3131
DOI: 10.1126/science.aar3131

Mapping the vertebrate developmental landscape

As embryos develop, numerous cell types with distinct functions and morphologies arise from pluripotent cells. Three research groups have used single-cell RNA sequencing to analyze the transcriptional changes accompanying development of vertebrate embryos (see the Perspective by Harland). Wagner et al. sequenced the transcriptomes of more than 90,000 cells throughout zebrafish development to reveal how cells differentiate during axis patterning, germ layer formation, and early organogenesis. Farrell et al. profiled the transcriptomes of tens of thousands of embryonic cells and applied a computational approach to construct a branching tree describing the transcriptional trajectories that lead to 25 distinct zebrafish cell types. The branching tree revealed how cells change their gene expression as they become more and more specialized. Briggs et al. examined whole frog embryos, spanning zygotic genome activation through early organogenesis, to map cell states and differentiation across all cell lineages over time. These data and approaches pave the way for the comprehensive reconstruction of transcriptional trajectories during development.

Science, this issue p. 981, p. eaar3131, p. eaar5780; see also p. 967

Structured Abstract


During embryogenesis, pluripotent cells gradually become specialized and acquire distinct functions and morphologies. Because much of the specification process is controlled through changes in gene expression, the identification of the transcriptional trajectories underlying cell fate acquisition is paramount to understanding and manipulating development.


Traditional approaches have studied specific fate decisions by analyzing the transcription of a few selected marker genes or by profiling isolated, predefined cell populations. The advent of large-scale single-cell RNA sequencing (scRNA-seq) provides the means to comprehensively define the gene expression states of all embryonic cells as they acquire their fates. This technology raises the possibility of identifying the molecular trajectories that describe cell fate specification by sampling densely during embryogenesis and connecting the transcriptomes of cells that have similar gene expression profiles. However, the numerous transcriptional states and branch points, as well as the asynchrony in developmental processes, pose major challenges to the computational reconstruction of developmental trajectories from scRNA-seq data.


We generated single-cell transcriptomes from 38,731 cells during early zebrafish embryogenesis at high temporal resolution, spanning 12 stages from the onset of zygotic transcription through early somitogenesis. We took two complementary approaches to identify the transcriptional trajectories in the data. First, we developed a simulated diffusion-based computational approach, URD, which identified the trajectories describing the specification of 25 cell types in the form of a branching tree. Second, we identified modules of coexpressed genes and connected them across developmental time. Combining the reconstructed developmental trajectories with differential gene expression analysis uncovered gene expression cascades leading to each cell type, including previously unidentified markers and candidate regulators. Combining these trajectories with Seurat, which infers the spatial positions of cells on the basis of their transcriptomes, connected the earlier spatial position of progenitors to the later fate of their descendants.

Inspection of the developmental tree led to new insights about molecular specification in zebrafish. For example, the first branch point in the tree indicated that the first molecular specification event may not only separate the germ layers but also define the axial versus nonaxial mesendoderm. Additionally, some developmental branch points contained intermediate cells that expressed genes characteristic of multiple downstream cell fates. Gene expression analysis at one such branch point (the axial mesoderm) suggested that the intermediate cells switch their specification from one fate (notochord) to another (prechordal plate). Last, analysis of single-cell transcriptomes from a Nodal-signaling mutant revealed that even at the whole-transcriptome level, mutant cells were canalized into a subset of wild-type states and did not adopt any transcriptional states not observed in wild type, despite abnormal developmental signaling.


These findings reconstruct the gene expression trajectories during the embryogenesis of a vertebrate and highlight the concurrent canalization and plasticity of cell type specification. The scRNA-seq data and developmental tree provide a rich resource for future studies in zebrafish: The raw and processed data and the URD software are available for download, and the data can be browsed interactively online. Last, this approach provides a broadly applicable framework with which to reconstruct complex developmental trajectories from single-cell transcriptomes.

Developmental tree of early zebrafish embryogenesis.

Single-cell transcriptomes were generated from zebrafish embryos at 12 developmental stages (six of which are shown). The transcriptional trajectories that describe the fate specification of 25 cell types were reconstructed from the data. Molecular specification is visualized with a force-directed layout, in which each cell is represented by a point (colored by developmental stage), proceeding from pluripotent cells (at the bottom center) outward to 25 distinct cell types. A subset of the identified trajectories are labeled in groups.


During embryogenesis, cells acquire distinct fates by transitioning through transcriptional states. To uncover these transcriptional trajectories during zebrafish embryogenesis, we sequenced 38,731 cells and developed URD, a simulated diffusion-based computational reconstruction method. URD identified the trajectories of 25 cell types through early somitogenesis, gene expression along them, and their spatial origin in the blastula. Analysis of Nodal signaling mutants revealed that their transcriptomes were canalized into a subset of wild-type transcriptional trajectories. Some wild-type developmental branch points contained cells that express genes characteristic of multiple fates. These cells appeared to trans-specify from one fate to another. These findings reconstruct the transcriptional trajectories of a vertebrate embryo, highlight the concurrent canalization and plasticity of embryonic specification, and provide a framework with which to reconstruct complex developmental trees from single-cell transcriptomes.

During embryogenesis, a single totipotent cell gives rise to numerous cell types with distinct functions, morphologies, and spatial positions. Because this process is primarily controlled through transcriptional regulation, the identification of the transcriptional states underlying cell fate acquisition is paramount to understanding and manipulating development. Previous studies have presented different views of cell fate specification. For example, artificially altering transcription factor expression (such as in reprogramming) has revealed remarkable plasticity of cellular fates (13). Conversely, classic embryological studies have indicated that cells are canalized to adopt perduring fates separated by epigenetic barriers. Technological limitations necessitated that traditional embryological studies focus on specific fate decisions with selected marker genes, but the advent of single-cell RNA sequencing (scRNA-seq) raises the possibility of fully defining the transcriptomic states of embryonic cells as they acquire their fates (48). However, the large number of transcriptional states and branch points, as well as the asynchrony in developmental processes, pose major challenges to the comprehensive identification of cell types and the computational reconstruction of their developmental trajectories. Pioneering computational approaches to uncover developmental trajectories (57, 911) were either designed to address stationary or steady-state processes or accommodate only small numbers of branch points and thus are insufficient for addressing the complex branching structure of time-series developmental data. We addressed these challenges by combining large-scale single-cell transcriptomics during zebrafish embryogenesis with the development of a new simulated diffusion-based computational approach to reconstruct developmental trajectories, called URD (named after the Norse mythological figure who nurtures the world tree and decides all fates).

High-throughput scRNA-seq from zebrafish embryos

We profiled 38,731 cells from 694 embryos across 12 closely spaced stages of early zebrafish development using Drop-seq, a massively parallel scRNA-seq method (12). Samples spanned from high blastula stage (3.3 hours postfertilization, just after transcription from the zygotic genome begins), when most cells are pluripotent, to six-somite stage (12 hours postfertilization, shortly after the completion of gastrulation), when many cells have differentiated into specific cell types (Fig. 1A and table S1). In a t-distributed stochastic neighbor embedding (tSNE) plot (13) of the entire dataset based on transcriptional similarity, it is evident that developmental time was a strong source of variation in the data, but the underlying developmental trajectories were not readily apparent (Fig. 1B). Consistent with the understanding that cell types become more transcriptionally divergent over time, cells from early stages formed large continuums in the tSNE plot, whereas more discrete clusters emerged at later stages (Fig. 1C).

Fig. 1 Generation of a developmental specification tree for early zebrafish embryogenesis using URD.

(A) Single-cell transcriptomes were collected from zebrafish embryos at 12 developmental stages (colored dots) spanning 3.3 to 12 hours postfertilization (hpf). (B) tSNE plot of the entire data, colored by stage [as in (A)]. Developmental time is a strong source of variation, and the underlying developmental trajectories are not immediately apparent. (C) tSNE plot of data from two stages (top, 50% epiboly; bottom, six-somite). Clusters are more discrete at the later stage. (D) URD’s approach for finding developmental trajectories. 1: Transition probabilities are computed from the distances between transcriptomes and used to connect cells with similar gene expression. 2: From a user-defined “rootuu” (such as cells of the earliest time point), pseudotime is calculated as the average number of transitions required to reach each cell from the root. 3: Trajectories from user-defined “tips” (such as cell clusters in the final time point) back to the root are identified by simulated random walks that are biased toward transitioning to cells younger or equal in pseudotime. 4: To recover an underlying branching tree structure, trajectories are joined agglomeratively at the point where they contain cells that are reached from multiple tips. 5: The data are visualized with a force-directed layout based on cells’ visitation frequency by the random walks from each tip. (E) Force-directed layout of early zebrafish embryogenesis, optimized for 2D visualization (supplementary materials, materials and methods, fig. S2, and movie S1), colored by stage [as in (A)] with terminal populations labeled. EVL, enveloping layer; P, placode; Adeno., adenohypophyseal; Trig., trigeminal; Epi., epibranchial; Panc.+Int., pancreatic + intestinal; RBI, rostral blood island; ICM, intermediate cell mass. (F to I) Cell populations downstream of early and intermediate branch points recovered by URD.

URD reconstructs complex branching developmental trajectories

Acquisition of many single-cell embryonic transcriptomes with high temporal resolution created the possibility of reconstructing developmental trajectories through similarity in gene expression profiles. Such an approach would allow the investigation of the gene expression dynamics and the timing of molecular specification—;when progenitor populations become transcriptionally distinct from each other and begin to express the regulators that will drive their future fates. Therefore, we developed URD, an approach to uncover complex developmental trajectories as a branching tree. URD extends diffusion maps [originally presented for single-cell differentiation analysis in pioneering work by Haghverdi et al. and their R package, destiny (9, 10)] through several advances: It introduces a new way to order cells in pseudotime, finds developmental trajectories, discovers an underlying branching tree that abstracts specification, and visualizes the data (Fig. 1D and supplementary materials, materials and methods).

In general, URD operates by means of “simulating diffusion,” using discrete random walks and graph searches to approximate the continuous process of diffusion (supplementary materials, materials and methods). Briefly, URD constructs a k-nearest-neighbor graph between transcriptomes in gene expression space; graph edges are assigned transition probabilities that are used as weights in later simulations and describe the chance a random walk would move along each edge (Fig. 1D, 1) (9, 10). The user identifies the root(s) (starting points) and tips (end points) of the developmental process. Cells are next assigned a pseudotime—;an ordering that should reflect their developmental progress rather than absolute time—;in order to compensate for developmental asynchrony. URD calculates pseudotime by simulating diffusion from the root to determine each cell’s distance from the root (as the average number of diffusive transitions needed to reach it across several simulations) (Fig. 1D, 2). Next, the developmental trajectory (the path in gene expression) between each tip and the root is determined by identifying which cells are visited by simulated biased random walks initiated in that tip; the walks are biased to only transition to cells of equal or earlier pseudotime so that when they reach developmental branch points, they proceed toward the root and do not explore other cell types (Fig. 1D, 3). Then, URD reconstructs a branching tree structure by joining pairs of trajectories where they pass through the same cells (Fig. 1D, 4; black and purple edges, for example). Last, the data are visualized with a force-directed layout based on cells’ visitation frequency by the random walks from each tip (Fig. 1D, 5) (14). The developmental trajectories identified with URD are akin to cell lineages but differ from classical definitions of cell lineage because they are reconstructed from observed gene expression and do not measure mother-daughter relationships between cells. URD does not require any prior knowledge of the developmental trajectories it seeks to find (such as the number of branch points or definition of intermediate states).

Fig. 2 Developmental trajectories, genes, and connected gene modules overlaid on the force-directed layout.

From top to bottom, (i) the trajectories identified by URD from the root to a given population (or group of populations), (ii) gene expression of a classical marker of that population, and (iii) expression of a six-somite gene module active in the population and its connected modules from earlier stages. The remainder are presented in fig. S3.

Fig. 3 Association of developmental trajectories with temporal gene expression patterns.

(A) The underlying branching structure found by URD. Pink bars demarcate collections of cell types used in Fig. 4A. (B) The structure of connected gene modules. Each circular node represents a module and is colored by the developmental stage from which the module was computed (as in Fig. 1A). Blue bars demarcate collection of modules downstream of each 50% epiboly (5.3 hpf) gene module used in Fig. 4B. (C) Gene expression cascades during specification of the prechordal plate and notochord. Expression is displayed as a moving-window average in pseudotime (along the x axis), scaled to the maximum observed expression. Selected genes are labeled along the y axis. Genes are annotated with whether they were identified as a differentially expressed gene, as a top ranking member of a differentially expressed connected gene module, or both. Cascades for all trajectories (with all genes labeled) are presented in fig. S5.

Fig. 4 Molecular specification maps relate cell position at 50% epiboly to cell fate at six-somite.

(A) Visitation by random walks from given tip(s) (as proportion of visitation from all tips) and the spatial location of visited 50% epiboly cells (ventral side to the left). The six tip groups are marked in Fig. 3A. (B) Spatial expression of 50% epiboly gene modules. Expression of connected gene modules is plotted on the force-directed layout to highlight populations that will emerge from the 50% epiboly module’s expression domain. The six groups of connected gene modules are marked in Fig. 3B.

Fig. 5 Characterization of Nodal signaling mutant with scRNA-seq and the developmental specification tree.

(A) Spatial assignment of wild-type and MZoep transcriptomes according to a wild-type landmark map indicates an absence of wild-type marginal fates in MZoep (ventral, left). Shown at random are 311 wild-type transcriptomes (to match MZoep cell number). (B) (Top) wild-type expression domain of spatially restricted gene modules identified in Smart-seq data (ventral to left). (Bottom) Violin plot of the maximum-scaled gene module levels in wild-type and MZoep mutant cells. The marginal dorsal, dorsal, and marginal gene modules are absent or strongly reduced in MZoep. (C) Expression of gene modules connected to those missing in MZoep (marginal dorsal, dorsal, and marginal; red) and connected to those remaining in MZoep (blue). (D) Hierarchical clustering of wild-type and MZoep mutant transcriptomes, based on the scaled expression of gene modules. Number of clusters is determined by the Davies-Bouldin index. Genotype is indicated beneath the heatmap (wild type, green; MZoep, red). Clusters 3 and 8 contain only wild-type cells. All other clusters contain a mixture of wild-type and MZoep cells. This clustering analysis was sufficiently sensitive to detect computationally simulated altered states (fig. S10).

Reconstructed developmental tree recapitulates molecular specification during zebrafish embryogenesis

Application of URD to the early zebrafish embryogenesis scRNA-seq data generated a tree whose branches reflected embryonic specification trajectories. To define the final cell populations (the tips of the tree), we clustered cells from the final stage of our time course and determined cluster identity through the expression of known marker genes (fig. S1). The recovered tree followed the specification of 25 final cell populations across 16 branch points (Fig. 1E, fig. S2, and movie S1). The reconstructed tree substantially recapitulated the developmental trajectories expected from classical embryological studies (15–;17). For example, the primordial germ cells, the enveloping layer cells (EVLs), and the deep-layer blastomeres already formed separate trajectories by high stage (Fig. 1F). Unexpectedly, the first branch point within the blastoderm not only separated the ectoderm from the mesendoderm but also divided the axial mesoderm from the remainder of the mesendoderm (Fig. 1G). Later branching events were also recovered, such as the separation of paraxial, lateral, and intermediate mesoderm (Fig. 1H); the separation of the non-neural and neural ectoderm (Fig. 1I); and the eventual branching of the non-neural ectoderm into epidermis, neural plate border, and multiple preplacodal ectoderm trajectories (Figs. 1E and 2, trajectory, and fig. S3). Displaying the expression of classic marker genes on the developmental tree highlighted expected trajectories and confirmed their annotation (Fig. 2, gene expression, and fig. S3). For example, consistent with its known expression, the notochord marker gene noto was restricted primarily to two trajectories (the entire notochord trajectory and the later stages of the tailbud trajectory) and confirmed that the branch point between the notochord and prechordal plate was correctly placed (18). These results show that URD can reconstruct the highly complex branching trajectories of early zebrafish embryogenesis solely on the basis of large-scale scRNA-seq data.

Connected gene modules support reconstructed developmental trajectories

To complement the specification tree, and in order to find groups of genes that are coexpressed within cell populations, we applied non-negative matrix factorization (NMF) to the single-cell transcriptomes (19). This approach produced modules of covarying genes and described cells in terms of module expression (which is more robust than individual gene expression measurements) (4, 20). Modules were annotated post hoc according to their highly ranked classic marker genes, and similar modules from adjacent stages were linked to each other according to the overlap of their most highly ranked genes. This approach created chains of connected gene modules that provided an alternative way to track developmental trajectories (fig. S4 and table S2). For example, the prechordal plate chain of connected gene modules extended from 50% epiboly to six-somite stage, during which the top ranking genes gradually changed from early to late markers for the prechordal plate (table S3).

The URD-generated developmental tree and the chains of connected gene modules provided two different ways to analyze the scRNA-seq data and define developmental trajectories. To determine how congruent these approaches are, we highlighted cells in the developmental tree according to their expression of connected gene modules. Cells that express connected gene modules occupied specific URD-recovered developmental trajectories, further supporting the structure of the developmental tree reconstructed by URD (Fig. 2, module expression, and fig. S3).

Gene cascades reveal expression dynamics along developmental trajectories

Gene expression and gene module analysis were combined in order to find candidate regulators and markers along each trajectory uncovered with URD. Genes and connected gene modules were associated with developmental trajectories by testing for their differential expression downstream of each branch point of URD’s recovered branching structure (Fig. 3, A and B; and supplementary materials, materials and methods). Gene expression dynamics were then fit with an impulse model (21) so as to determine the onset and offset time of their expression, which was then used to order genes. As an example, sequential expression was observed for 197 genes enriched in the prechordal plate during its specification, including several well-known transcription factors or signaling molecules that confirm the validity of our approach (such as gsc, foxa3, klf17, and frzb) (Fig. 3C and figs. S5 to S7) (2225). This cascade contained several regulatory factors (such as fzd8a, fzd8b, mllt1b, and inhbaa) without described roles in the prechordal plate that would be candidates for reverse genetic screens. This cascade also contained more than 40 genes that were not previously annotated as associated with the prechordal plate (fig. S6), and those tested by means of in situ hybridization were indeed expressed in the prechordal plate (fig. S7). Thus, combining URD and gene module analysis uncovered the transcriptional cascades that accompanied the development of progenitors into differentiated cells and highlighted both previously characterized and newly identified trajectory-enriched genes.

Combining developmental trajectories with spatial analysis infers progenitor locations

The URD-generated tree is a powerful way to visualize developmental trajectories but lacks spatial information. We therefore asked whether the trajectories could be traced to their spatial origin at the late blastula stage (16, 17). First, a spatial map of the Drop-seq 50% epiboly transcriptomes was generated by using Seurat, a method we previously developed to infer the spatial locations of single-cell transcriptomes by comparing the genes expressed in each transcriptome with the spatial expression patterns of a few landmark genes obtained from RNA in situ hybridization (20). Second, Seurat’s spatial map was combined with either URD or connected gene module analysis (as parallel, independent approaches) in order to associate cell populations at six-somite stage with the location of their “pseudoprogenitors” at 50% epiboly. In one approach, we used URD’s simulated random walks from cell populations at six-somite (Fig. 3A, pink bars) to infer their pseudoprogenitor cells and then plotted the spatial location of the 50% epiboly pseudoprogenitors using Seurat (Fig. 4A). In the other approach, we plotted the spatial expression of each 50% epiboly gene module and identified its connected gene modules at later stages (Fig. 3B, blue bars on right); the cells that express these connected gene modules are highlighted on the developmental tree (Fig. 4B) and are the “pseudodescendents” that arise from the spatial domain identified by the 50% epiboly gene module.

The results from the two approaches were highly concordant and agreed with classic fate-mapping experiments (16, 17). For example, both approaches associated the dorsal margin of the 50% epiboly embryo with axial mesodermal fates (prechordal plate and notochord). Likewise, in both cases the animal pole was associated with ectodermal fates, with the ventral animal side biased to non-neural fate and the dorsal animal side to neural fate. Overall, these results show that scRNA-seq data can be used not only to reconstruct specification trajectories and their associated gene cascades but also to connect the earlier spatial position of progenitors to the later fate of their descendants.

Nodal signaling mutant cells are canalized into a subset of wild-type transcriptional states

Previous studies of mutant embryos have provided important insights into embryonic fate specification, but scRNA-seq raises the possibility of rapidly phenotyping mutants both genome-wide and at single-cell resolution. To test this idea, we profiled maternal-zygotic one-eyed pinhead mutants (MZoep), which lack the coreceptor for the mesendoderm inducer Nodal (26, 27). We first asked whether previous MZoep embryological results could be reconstructed simply from scRNA-seq data. Transcriptomes were generated with Smart-seq2 (28), with deeper mRNA coverage than that of Drop-seq (fig. S8); 325 MZoep and 1047 wild-type transcriptomes were collected at 50% epiboly, when Nodal signaling is normally active at the blastoderm margin. Using Seurat, we inferred the spatial origin of both wild-type and MZoep transcriptomes based on a wild-type landmark map (Fig. 5A). As predicted, no MZoep cells mapped to the margin of a wild-type embryo, where mesendodermal progenitors arise. Applying NMF revealed that in MZoep mutants, expression of the marginal dorsal, dorsal, and marginal gene modules is greatly reduced or absent, but no mutant-specific modules were found (Fig. 5B). NMF recovered the same 50% epiboly gene modules from Smart-seq and Drop-seq data (Fig. 5B and table S4), which allowed us to use the chains of connected Drop-seq gene modules to predict the MZoep mutant phenotype at later stages. Namely, our approach successfully predicted the loss of paraxial mesoderm, ventrolateral mesoderm, axial mesoderm, and endoderm (Fig. 5C), as well as the continued presence of the tailbud, as found in previous embryological studies (26). Analysis of additional single-cell transcriptomes from wild-type and MZoep mutants at six-somite stage largely verified our predictions (fig. S9 and table S5). Together, these results show that combining modest-scale scRNA-seq in mutants with the large-scale developmental reference tree constructed for wild type can rapidly provide phenotypic insights: In the absence of Nodal signaling, marginal blastomeres that would normally become mesendodermal progenitors instead become ectodermal and tail progenitors, resulting in the absence of mesendodermal cell types and an altered fate map, as shown in previous cell-tracing experiments (27).

We next asked whether the mutant scRNA-seq dataset could provide novel insights into cell identity in the absence of Nodal signaling. Morphological analysis and fate mapping of MZoep mutants has found that all cells appear to adopt wild-type fates (26, 27). However, intricate interactions exist between the several signaling pathways active during early development, and the elimination of Nodal signaling changes levels and domains of other developmental signals in the embryo. Thus, MZoep mutant cells could potentially perceive signaling input combinations that do not occur in wild-type embryos, which may result in gene expression states not observed in wild type. We therefore wondered whether mutant cells expressed previously unknown combinations of gene modules under this altered signaling landscape or were transcriptionally equivalent to a subset of wild-type states. Coclustering of wild-type and MZoep transcriptomes according to their gene module expression revealed that while some mesendodermal cell types were absent in MZoep at 50% epiboly, the remaining cells clustered with wild-type states (Fig. 5D and fig. S10). This result indicates that even on the whole-transcriptomic and single-cell level, mutant cells are canalized into a subset of wild-type fates after the loss of an essential signaling pathway.

Hybrid gene expression states reveal developmental plasticity

Inspection of the developmental tree revealed that most cells fell along tight trajectories, but some cells were located in intermediate zones between branches. This observation seemed at odds with the view that embryonic cells traverse a developmental trajectory until a branch point funnels them cleanly and irreversibly into one of multiple downstream branches. For example, at the axial mesoderm branch point, most cells fell along the classic bifurcation from progenitor into notochord or prechordal plate fates (15, 29), with waves of gene expression corresponding to their specification and differentiation status (Fig. 6A, highlighted). However, ~5.4% were intermediate cells that expressed both notochord and prechordal plate markers (Fig. 6, A and B). The intermediate cells and the completely bifurcated axial mesoderm cells with similar pseudotimes came from embryos at the same developmental stage (Fig. 6C); moreover, they no longer expressed genes characteristic of the progenitors (such as nanog and mex3b). These observations eliminated models in which intermediate cells retained their progenitor state and delayed specification. Instead, we noticed that these cells expressed early markers of both programs (gsc, frzb, ta, and noto) but later markers of only the notochord program (ntd5 and shha). This observation raised two possible models: (i) Cells initially express both programs, then shut off the prechordal plate program, and produce only notochord markers later (“dual-specification” model); or (ii) cells specified as notochord first express early and late notochord markers and then change their specification by shutting off notochord expression and initiating early prechordal plate marker expression (“trans-specification” model).

Fig. 6 Hybrid state of cells in the axial mesoderm.

(A) Branch point plot, showing pseudotime (y axis) and random walk visitation preference from the notochord (N, left) and prechordal plate (P, right) tips (x axis), defined as the difference in visitation from the two tips divided by the sum of visitation from the two tips. Direct trajectories to notochord (green) and prechordal plate (pink) are highlighted, and intermediate cells are circled. (B) Gene expression of notochord markers (top row) and prechordal plate markers (bottom row) at the axial mesoderm branch point. Intermediate cells express early (ta/ntl and noto) and late (ntd5 and shha) notochord markers but only early prechordal plate markers (gsc and frzb). (C) Cells at the branch point, colored by developmental stage. Intermediate cells have the same developmental stage as that of fully bifurcated cells with similar pseudotimes. (D) Illustration of the prechordal plate (P) and notochord (N) in the 75% epiboly embryo. (E and F) Double fluorescent in situ expression of the early prechordal plate marker gsc (red) and either the early notochord marker ta/ntl [(E), green] or the late notochord marker ntd5 [(F), green] at 75% epiboly (8 hpf). Most cells contain only prechordal plate marker mRNA (for example, “1”) or notochord marker mRNA (for example, “5”). Cells with both prechordal plate and notochord marker mRNA are observed at the boundary of the two tissues, with red nuclear transcription foci, indicating active transcription of gsc (for example, “2” to “4”) (fig. S11). Cells that contained both ntd5 and gsc mRNA were identified and scored for their nuclear transcription foci, which indicate active transcription (supplementary materials, materials and methods); 56% had one or more observable nuclear transcription dots, of which 80% showed only active gsc transcription, 7% showed both active gsc and active ntd5 transcription, and 13% showed active ntd5 transcription alone.

To distinguish between these two models, we investigated their properties and location using fluorescent RNA in situ hybridization. At 75% epiboly, some cells located in the border region between the two tissues coexpressed an early prechordal plate marker (gsc) with either an early (ta/ntl) or late (ntd5) notochord marker (Fig. 6, D to F, and fig. S11). The existence of these intermediate cells in situ confirms that they are not an artifact of scRNA-seq (such as cell doublets), and their defined spatial localization near the border of the two tissues suggests that they are not merely biological or technical noise. Instead, the location of coexpressing cells raises the possibility that the boundary between the notochord and prechordal plate territories is refined during gastrulation, after the two populations become transcriptionally distinct. Many of the coexpressing cells exhibited bright nuclear foci that denoted sites of active transcription of the probed genes. Most of the cells with nuclear transcription foci for a single gene exhibited transcription of the early prechordal plate marker, gsc. By contrast, cells with active transcription of the notochord markers ta/ntl or ntd5 were rare (Fig. 6, E and F). The active transcription of the prechordal plate program supports the trans-specification model—;the intriguing possibility that some axial mesoderm cells move down the notochord specification path but then trans-specify into prechordal plate cells.


We describe a molecular specification tree of an early vertebrate embryo by exploring large-scale single-cell transcriptomic data with URD, a computational approach to reveal developmental trajectories in transcriptional space. Our study lays the foundation for multiple areas of future exploration.

First, URD is a powerful tool to reveal the transcriptional trajectories of complex developmental processes such as zebrafish embryogenesis. Combining URD with connected gene module analysis uncovered the transcriptional cascades underlying fate specification. Moreover, combining URD with Seurat enabled the anchoring of developmental trajectories to their spatial origins. We anticipate that similar augmentation of URD with information about lineage relationships (30, 31), chromatin dynamics (32), and signaling will further deepen insights into developmental processes. Additionally, some of URD’s limitations could be addressed by future enhancements. For instance, branching events driven by single genes (such as sox32 in the endoderm) (33, 34) are not captured with URD until additional transcriptional differences arise, which could potentially be improved with more aggressive, iterative branch calling. Also, improvements to the throughput and quality of scRNA-seq may drive improved performance from URD: More closely spaced time points could enable detection of rapidly changing fate decisions and could reveal sequential bifurcations at branch points currently thought to enter multiple trajectories; larger numbers of cells could enable reconstruction of rare, transcriptionally indistinct populations (such as the floor plate and hypochord); and improved sequencing depth could enable detection of important but lowly expressed regulators that are not found in the current data (such as npas4l/cloche). In the future, URD could be used to analyze additional systems, such as in vitro differentiation, tissue regeneration, cancer, and disease progression.

Second, the scRNA-seq data and developmental tree provide a rich resource for future studies of zebrafish embryogenesis. The presented data describe embryonic gene expression with unparalleled temporal and cellular resolution, and the reconstructed tree thus provides an atlas of the expression pattern and dynamics for nearly all genes. This allows inspection of gene coexpression with much greater ease than with multicolor in situ hybridization, reveals markers for cell types of interest, and associates uncharacterized genes with particular cell types. Moreover, the data reveal the progressive nature of cell fate specification and suggest potentially redundant regulators of developmental decisions with overlapping spatial and temporal expression, which would be missed in forward genetic screens.

Third, this work begins to illustrate the trajectories, canalization, and potential plasticity of fate specification. With respect to trajectories, our analysis suggests that not all cell fate decisions are binary; multiple trajectories can arise simultaneously from a pool of equipotent progenitors. For example, at the molecular level, the earliest specification of blastomeres separates the axial mesoderm, nonaxial mesendoderm, and ectoderm, rather than just separating the germ layers. This conclusion is compatible with the observation that the axial mesoderm progenitors reside at the dorsal blastula margin, where maternal factors that specify the Mangold-Spemann organizer simultaneously affect the fate of those progenitors (15). Concerning canalization, the analysis of a Nodal signaling mutant reveals that even on the whole-transcriptomic level, mutant cells adopt a subset of wild-type transcriptomic states, but not new ones, demonstrating the extensive canalization during embryogenesis even after abnormal developmental signaling. With respect to plasticity, the identification of intermediate cells at the axial mesoderm branch point demonstrates hybrid developmental states. In situ hybridization experiments suggest that these cells trans-specify from one cell type (notochord) to another (prechordal plate), and their border zone spatial localization suggests refinement of the boundary between these two regions. These results support an alternative view of developmental fate choice to the common interpretation of Waddington’s model: Downstream of some branch points, some cells still transition across the “ridges” between different lineages even after they are canalized into well-defined transcriptional states. This suggests a developmental plasticity that has been observed after perturbation (13, 35) but has not been well established in normal developmental processes. We propose that in a continuous morphogen gradient that resolves into discrete states, some cells will receive an amount of signal that is on the cusp between the two states; these cells would end up at the boundary between the two tissues. It is conceivable that continued signaling could provide a “push” over a shallow ridge, such as continued Nodal signaling in the axial mesoderm (22) or continued retinoic acid signaling in the hindbrain (36, 37). Future studies combining lineage tracing and in vivo imaging of gene expression will be needed to directly observe such transitions.

Collectively, our approach provides a framework with which to reconstruct the specification trajectories of many developmental systems without the need for prior knowledge of gene expression patterns, fate maps, or lineage trajectories. The generation of developmental trees for different species will enable comparative studies, and developmental statistics generated from such comparisons will help reveal conserved pathways and species-specific idiosyncracies.

Supplementary Materials

Materials and Methods

Figs. S1 to S11

Tables S1 to S5

References (3851)

Movie S1

Supplementary Analysis

References and Notes

Acknowledgments: We thank B. Raj and J. Gagnon for assistance collecting samples when two hands were insufficient, M. Rabani for assistance with the impulse model, M. Haesemeyer for the critical suggestions of directly simulating diffusion and of using visitation frequency as a method of dimensionality reduction, the Harvard Center for Biological Imaging and Bauer Core Facility for support, and members of the Schier and Regev laboratories for helpful discussions. We thank B. Raj, J. Gagnon, S. Pandey, N. Lord, M. Rabani, A. Carte, M. Haesemeyer, I. Whitney, and T. Montague for comments on the manuscript. Funding: This research was supported by the NIH (A.F.S., J.A.F., and A.R.), Allen Discovery Center for Cell Lineage Tracing (A.F.S.), Jane Coffin Childs Memorial Fund (J.A.F.), Charles A. King Trust (J.A.F.), Howard Hughes Medical Institute (A.R.), and Klarman Cell Observatory (A.R.). Author contributions: J.A.F., Y.W., A.R., and A.F.S, conceived the study; J.A.F., Y.W., and A.F.S. wrote the paper with revisions by A.R. and S.J.R.; J.A.F. and Y.W. collected the data; S.J.R. performed preliminary analyses; J.A.F., Y.W., and A.F.S. performed the data analysis presented in the manuscript; J.A.F. developed URD with input from A.R, S.J.R., and Y.W.; K.S. developed the variable gene identification method. S.J.R. and J.F. developed the 5′ UMI Smart-seq2 mapping pipeline; Y.W. performed the connected gene modules and spatial analyses. Competing interests: A.R. is a SAB member of ThermoFisher Scientific, Syros Pharmaceuticals, and Driver Group. Data and materials availability: The raw data reported in this paper are archived at NCBI GEO (accession no. GSE106587) and in processed and interactively browseable forms in the Broad Single-Cell Portal ( URD is available from GitHub (

Stay Connected to Science

Navigate This Article