"Stemness": Transcriptional Profiling of Embryonic and Adult Stem Cells


Science  18 Oct 2002:
Vol. 298, Issue 5593, pp. 597-600
DOI: 10.1126/science.1072530


The transcriptional profiles of mouse embryonic, neural, and hematopoietic stem cells were compared to define a genetic program for stem cells. A total of 216 genes are enriched in all three types of stem cells, and several of these genes are clustered in the genome. When compared to differentiated cell types, stem cells express a significantly higher number of genes (represented by expressed sequence tags) whose functions are unknown. Embryonic and neural stem cells have many similarities at the transcriptional level. These results provide a foundation for a more detailed understanding of stem cell biology.

Stem cells (SCs) have the capacity to self-renew as well as the ability to generate differentiated cells. Recently, the field of SC biology has attracted increasing attention because of the isolation of human embryonic SCs (1, 2) and the suggestion that adult SCs may have a broader potential or plasticity than was previously thought [(3), reviewed in (4), but see (5, 6)]. Understanding the genes that govern the special properties of SCs has implications for both embryology and basic cell biology. Despite this interest and the potential for use of SCs in cell replacement therapy, relatively little is known about the genetic programs for SCs.

The three best characterized types of SCs in vertebrates are embryonic (ESC), neural (NSC), and hematopoietic (HSC) stem cells. There is indirect evidence for SCs in intestine, skin, muscle, and liver, but their isolation has remained elusive (4, 7). Few genes are known to play roles in SCs or to be useful for SC isolation (4, 7). Genes expressed in ESCs have been identified with cDNA arrays containing ∼600 genes (8), and genes enriched in NSCs (9) or HSCs (10, 11) have been identified by subtractive hybridization. However, many genes expected to be enriched in SCs were not identified by these methods (12), and the use of different methods precludes a direct comparison of results from different stem cells (8–11).

We established transcriptional profiles for ESCs, NSCs, HSCs, and the differentiated cells from lateral ventricles of the brain and from the main cell population of the bone marrow (Fig. 1). This protocol is intended to first identify genes enriched in each individual stem cell population and then compare those sets of genes to one another. The methodological details (12) can be summarized as follows. Replicates of mouse stem and differentiated cell samples were isolated, and amplified probes were prepared by in vitro transcription and then hybridized to DNA microarrays (Affymetrix U74Av2) containing about 12,000 genes. Scanned arrays were analyzed with Affymetrix MAS 4.0 software to identify transcripts absent in differentiated cells but present in SCs, and with dChip software, which implements a statistical method for model-based expression analysis (13), to obtain expression indices that identify transcripts enriched in SCs. In each comparison, a 90% confidence interval was calculated for the fold change in gene expression, and the lower limit of this interval, the lower confidence bound (LCB), was used as a measure of enrichment in gene expression. Li and Wong (14) have shown that the LCB is more reliable than fold change as a ranking statistic for changes in gene expression. Yuen et al. (15) compared data from Affymetrix chips and real-time polymerase chain reaction and concluded that chip analyses are accurate and reliable but underestimate differences in gene expression. In view of their work, our criterion of selecting genes with LCBs above 1.2 (which corresponds to an estimated fold change of 1.9 in gene expression) most likely corresponds to a fold change of at least 3 in gene expression.
The reproducibility of our results is underscored by the high correlation coefficients of the replicates (which have a mean of 0.98 and a range of 0.96 to 1.00); 7786 genes, 63% of the array, were reproducibly detected (called “present”), indicating that a substantial portion of the mouse genome was assayed.
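
The selection criterion described above can be illustrated with a minimal sketch. It assumes that the LCB of the fold change has already been computed for each gene (for example by the expression-analysis software) and only shows the filtering and ranking step; it does not reproduce the dChip model. The gene names and numbers are invented for illustration.

```python
# Hypothetical records: (gene, estimated fold change, lower confidence bound).
# All values are invented; in practice the LCB would be supplied by the
# expression-analysis software, not computed in this script.
LCB_THRESHOLD = 1.2  # cutoff stated in the text


def select_enriched(records, threshold=LCB_THRESHOLD):
    """Return genes whose LCB is strictly above the threshold, ranked by LCB."""
    kept = [r for r in records if r[2] > threshold]
    kept.sort(key=lambda r: r[2], reverse=True)
    return [gene for gene, _fold, _lcb in kept]


records = [
    ("geneA", 4.0, 2.5),
    ("geneB", 6.0, 1.1),  # large fold change but wide interval: rejected
    ("geneC", 1.9, 1.2),  # LCB equal to the cutoff: rejected (not above it)
    ("geneD", 2.4, 1.6),
]
enriched = select_enriched(records)
print(enriched)
```

Ranking by the LCB rather than by the fold change itself penalizes genes whose fold change is large but poorly supported, as with the hypothetical geneB.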

Figure 1

Transcriptional profiling of stem cells. Embryonic, neural, and hematopoietic stem cells, as well as the corresponding differentiated cell populations (lateral ventricle brain and bone marrow main population), were isolated and their mRNAs were amplified and hybridized to oligonucleotide arrays. The arrays were analyzed by a combination of the Microarray Suite (Affymetrix) and dChip (Wong lab, Harvard) software, followed by public database searches and functional annotation. Indicated at the bottom are well-known genes that were expected to be enriched in each of the three stem cells and were indeed detected as enriched.

Lateral ventricles of the brain and the main cell population of the bone marrow were used as baselines for NSCs and HSCs, respectively, as they correspond to differentiated cell types for these SCs. ESCs, which give rise to all mouse cell types, were compared to both lateral ventricles of the brain and the main cell population of the bone marrow, and the two comparisons were then intersected (i.e., we selected only genes that showed a significant enrichment in both comparisons). This method proved to be effective for detecting genes expected from the literature to be enriched in ESCs (Fig. 1). Genes enriched in ESCs, NSCs, and HSCs were assigned to functional categories with the use of public databases, including those of the National Center for Biotechnology Information (NCBI). Gene lists were intersected by Unigene number to determine overlaps (Fig. 2A). Functionally annotated data were organized into fully searchable spreadsheets (12); all raw data are also available online so that others can design alternative criteria for analyzing the data.
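
The intersection logic in this paragraph can be sketched with ordinary set operations. The identifiers below are hypothetical stand-ins for Unigene numbers, and the variable names are assumptions made for illustration; this is not the authors' actual pipeline.

```python
# Hypothetical Unigene-style identifiers; real lists would hold the genes
# that passed the enrichment criterion in each comparison.
esc_vs_brain = {"Mm.1", "Mm.2", "Mm.3", "Mm.4"}
esc_vs_marrow = {"Mm.2", "Mm.3", "Mm.4", "Mm.5"}
nsc_enriched = {"Mm.3", "Mm.4", "Mm.6"}
hsc_enriched = {"Mm.4", "Mm.7"}

# ESC-enriched genes must be enriched relative to both baselines.
esc_enriched = esc_vs_brain & esc_vs_marrow

# Pairwise and three-way overlaps, as drawn in a Venn diagram.
esc_nsc = esc_enriched & nsc_enriched
esc_hsc = esc_enriched & hsc_enriched
all_three = esc_enriched & nsc_enriched & hsc_enriched

print(sorted(esc_enriched), sorted(esc_nsc), sorted(all_three))
```

Keying the sets on a gene-level identifier rather than on probe sets means that a gene recognized by more than one probe set is counted once.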

Figure 2

Overlaps between genes enriched in stem cells. (A) Venn diagram of the number of genes enriched in each stem cell population, and their overlaps. Note the high overlap between ESC- and NSC-enriched genes. In a small percentage of the cases, the same gene is recognized by more than one probe set in the Affymetrix array; hence, the 230 probe sets enriched in all SCs actually correspond to 216 unique genes. (B) ESC and NSC samples cluster together in hierarchical clustering trees, irrespective of the Gene Ontology category used to generate the trees. Several other categories were assayed (not shown). In all cases the hematopoietic samples constituted a separate cluster.

Figure 1 shows that most previously known SC markers were detected as enriched in their respective SC population, providing a strong validation of the protocol. Moreover, most of these genes are enriched only in the expected SCs, attesting to the specificity of the results. Recent studies [(3), reviewed in (4)] suggest that the potential of adult SCs may be broader than previously thought, and raise the question of whether all SCs are similar at the transcriptional level. Our results show that SCs are distinct in that each SC type can clearly be identified by highly enriched genes that are not present (or not enriched) in other SCs. An extensive commentary on the genes enriched in each of the three SCs is provided in (12). In addition to identifying genes specific to each stem cell population, the data show that there is a subset of genes commonly enriched in all SCs. Figure 2A shows numbers of genes enriched in each type of SC, as well as the overlaps.

Relative to differentiated cell types, SCs express a significantly higher number of expressed sequence tags (ESTs); 34% of genes enriched in the differentiated cell samples are ESTs, but that number increases to about 45% in SC samples (12). The overlap of genes enriched in both differentiated cell samples (lateral ventricles of the brain and the main cell population of the bone marrow) contains only 16% ESTs, whereas the overlap of genes enriched in all three SC samples (ESCs, NSCs, and HSCs) contains more than 50% ESTs (12). Given that ESTs represent genes about which little is known, SCs evidently express a higher proportion of genes and functions that remain to be investigated.

One question that arises is whether SCs are more similar to one another or to their differentiated counterparts. We find that HSCs are more similar to the main cell population of the bone marrow than to any other sample. However, NSCs are more similar to ESCs than to the lateral ventricles of the brain or any other sample (Fig. 2B, see below). One might have expected the greatest overlap to be between NSCs and HSCs, given that they are both cell populations taken from adult mice, and in view of their reported capacity to transdifferentiate [reviewed in (4)]. Nonetheless, there is greater overlap between genes enriched in ESCs and NSCs (Fig. 2). The ESC-enriched genes overlap in 1101 genes (61.6%) with NSC-enriched genes but only overlap in 431 genes (24.1%) with HSC-enriched genes. ESCs and NSCs also show a higher overlap in depleted genes (i.e., changing in the opposite direction) (16).
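
The overlap percentages quoted above are simple ratios of shared genes to the total number of ESC-enriched genes. That total is not quoted in this passage, so the sketch below back-calculates it from the stated counts and percentages; the result is an inference from rounded figures, not a number taken from the paper.

```python
def percent_overlap(shared, total):
    """Percentage of a gene list that is shared with another list."""
    return 100.0 * shared / total


shared_with_nsc = 1101  # ESC-enriched genes also enriched in NSCs (61.6%)
shared_with_hsc = 431   # ESC-enriched genes also enriched in HSCs (24.1%)

# Back-calculate the ESC-enriched total implied by each rounded percentage.
implied_total_from_nsc = round(shared_with_nsc / 0.616)
implied_total_from_hsc = round(shared_with_hsc / 0.241)
print(implied_total_from_nsc, implied_total_from_hsc)

# Either implied total reproduces both quoted percentages after rounding.
for total in (implied_total_from_nsc, implied_total_from_hsc):
    print(total,
          round(percent_overlap(shared_with_nsc, total), 1),
          round(percent_overlap(shared_with_hsc, total), 1))
```

The two implied totals differ by one gene only because the published percentages are rounded to one decimal place.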

ESCs and NSCs are similar not only in enriched and depleted genes, but in the overall pattern of gene expression values. We assayed similarity of gene expression by calculating the correlation coefficients (CCs) between data sets (12) and by hierarchical clustering, a method that arranges genes according to their similarity (17). In the first method, the CC between the ESC data set and any other data set except itself (1.00) and NSC (0.87) is always below 0.78. Conversely, the NSC data set shows an even higher CC to ESC (0.87) than to lateral ventricles of the brain (0.82). In the second method, hierarchical clustering trees reveal a strong similarity between the ESC and NSC samples, regardless of the functional category of genes used to generate the trees (Fig. 2B) (18).
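
As an illustration of the first method, the sketch below computes a Pearson correlation coefficient between expression profiles and reports which sample is most similar to a given one. The profiles are short invented vectors, and the choice of the Pearson coefficient is an assumption, since the text does not specify which correlation measure was used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Invented expression values for five hypothetical genes per sample.
profiles = {
    "ESC": [8.0, 7.5, 2.0, 1.0, 6.0],
    "NSC": [7.0, 7.0, 3.0, 1.5, 5.0],
    "HSC": [2.0, 3.0, 8.0, 7.0, 4.0],
}


def most_similar(name, profiles):
    """Name of the other sample with the highest correlation to `name`."""
    others = (k for k in profiles if k != name)
    return max(others, key=lambda k: pearson(profiles[name], profiles[k]))


print(most_similar("ESC", profiles))
```

Hierarchical clustering builds on the same kind of pairwise similarity, repeatedly joining the most similar samples or clusters into a tree.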

The global overlap between genes expressed in ESCs and NSCs supports a default model for neural development (19). This model is based on results showing that embryonic cells of both frogs (19) and mice (20) become neural cells in the absence of cell-to-cell signaling. Simply put, it may be that ESCs do not require numerous signals to become neural cells because they already express many regulators of NSC function. This similarity between ESCs and NSCs is also encouraging for efforts aimed at using ESCs to generate neurons for neurodegenerative disorders.

The main finding of this study is that the expression of 216 genes is enriched in all three SCs (Fig. 2A and Table 1). [See (12) for a fully annotated list and commentary on SC-enriched genes.] These genes are likely to reveal core stem cell properties (or “stemness”) that underlie self-renewal and the ability to generate differentiated progeny. We propose that the essential attributes of stemness include (i) active JAK/STAT (Janus kinase/signal transducers and activators of transcription), TGF-β (transforming growth factor–β), Yes (Yamaguchi sarcoma) kinase, and Notch signaling; (ii) capacity to sense growth hormone and thrombin; (iii) interaction with the extracellular matrix via integrin α6/β1, Adam9, and bystin; (iv) engagement in the cell cycle, either arrested in G1 or cycling; (v) high resistance to stress, with up-regulated DNA repair, protein folding, ubiquitin system, and detoxifier systems; (vi) a remodeled chromatin, acted upon by DNA helicases, DNA methylases, and histone deacetylases; and (vii) translation regulated by RNA helicases of the Vasa type. Only four of the 216 genes enriched in all SCs had absolute calls of “absent” in all the differentiated cell samples and “present” in all the SC samples, namely Uridine phosphorylase, Suppressor of Lec15, and two ESTs (16). Thus, most if not all of the SC-enriched genes are not expressed exclusively in SCs. Rather, it is their combined enrichment relative to differentiated cells that underlies the common properties of SCs.

Table 1

The 216 genes enriched in all three stem cells. A few genes are present in more than one category. See database S4 for a full list of genes by general functional category, and table S8 for detailed functional annotations (12).


One theme emerging from these data is that SCs have characteristics of cells under stress. A similar up-regulation of chaperones, ubiquitin/proteasome genes for protein degradation, DNA repair, and detoxifying enzymes has been reported in yeast under oxidative stress (21, 22). It is possible that DNA replication and protein folding need to be well executed in SCs, given the potential for cellular amplification if genetic or epigenetic mutations occur (23). An alternative explanation is that SCs may be protected from aging due to oxidative stress. Caenorhabditis elegans mutants that have an extended life-span have increased levels of molecular chaperones and enzymes that process oxidative free radicals, and appear to be resistant to environmental stresses (24).

Although the physiological role of Abc transporters in SCs is unclear, they are known to clear cells of toxins, and transporters confer drug resistance to tumor cell lines. The two adult SCs included in this study share Abc transporters, and all three SCs are enriched for Abcb1/Mdr1. Dye efflux mediated by Abc transporters has indeed been used to isolate cells with SC properties from a variety of organs (25, 26), including in this study. Interestingly, an Abc transporter activity has been shown to maintain Dictyostelium prespore cells undifferentiated by expelling a small-molecule differentiation-inducing factor, DIF (27).

Several members of the JAK/STAT and TGF-β pathways are enriched in all three SCs. The JAK/STAT pathway promotes self-renewal of ESCs (28) and is required for self-renewal of Drosophila sperm stem cells (29, 30). The growth hormone receptor, enriched in all three SCs, signals via the JAK/STAT pathway (31). The TGF-β pathway maintains quiescence of hematopoietic precursors (32) and is required for development of mouse and fly germ cells (33). Moreover, the JAK/STAT and TGF-β pathways have been shown to interact, and integration of these signals may be a theme common to SCs.

Chromatin-remodeling helicases of the SNF2/SWI2 family, of which two are enriched in all SCs, are also a feature worth noting. These enzymes promote DNA unwinding and maintain an open chromatin structure. Because chromatin remodeling plays a role in transcription, replication, and DNA repair (34), we speculate that the ability to modulate local chromatin states may be necessary for SC pluripotency.

Genes involved in posttranscriptional regulation are also enriched in SCs, including those with roles in alternative splicing and translational control. Several DEAD-box RNA helicases are enriched in each SC, and an EST with high homology to Ddx1 is common to all of them. Ddx4/Vasa, which we detected as enriched in ESCs, is a conserved regulator of translation required for germline development across animal phyla (35).

Related but not identical genes may perform the same function in different SCs. Indeed, each SC is enriched for members of the Notch pathway, DNA methylases, or transcriptional repressors of the histone deacetylase and Groucho families, but none of them overlaps in all three SCs.

Of the 216 genes enriched in all three SCs, only 60 have been mapped to a chromosomal location in LocusLink (NCBI). Twelve of these genes are on chromosome 17, which means that this chromosome contains 3.7 times the number of SC-enriched genes that would be present if these genes were randomly distributed (Fig. 3). The probability that this would happen randomly is 4 × 10^−4. The t-complex on chromosome 17 contains genes involved in embryonic development and spermatogenesis (36), and four SC-enriched genes map in the t-complex. It is possible that some of the clustered SC-enriched genes are coregulated at a local chromatin level, and their proximity in the genome may reflect an ancestral clustering of SC genes.
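
The enrichment arithmetic in this paragraph can be sketched as follows. The expected count and the per-gene probability of mapping to chromosome 17 are back-calculated from the 3.7-fold figure. The binomial tail is one plausible way to estimate how unlikely 12 or more hits would be by chance, and it is an assumption here, because the passage does not state which test was used; the value it yields therefore need not match the probability reported above.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


mapped = 60               # SC-enriched genes with a chromosomal location
on_chr17 = 12             # of those, genes on chromosome 17
fold_over_expected = 3.7  # enrichment stated in the text

# Expected count under a random distribution, implied by the fold figure.
expected = on_chr17 / fold_over_expected
# Implied chance that a randomly placed mapped gene lands on chromosome 17.
p_chr17 = expected / mapped

p_value = binomial_tail(on_chr17, mapped, p_chr17)
print(round(expected, 2), p_value)
```

Under these assumptions roughly three of the 60 mapped genes would be expected on chromosome 17, so observing 12 is a rare event under the binomial model.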

Figure 3

Chromosomal locations of stem cell–enriched genes. The x axis refers to the set of mouse chromosomes; the y axis indicates how many SC-enriched genes map to each chromosome. Because mapping information was obtained from LocusLink (NCBI), values on the y axis were normalized for each chromosome to the total number of genes per chromosome mapped in LocusLink. Of the 60 stem cell–enriched genes that have been mapped in the mouse genome, 12 map to chromosome 17. A number of genes map to the t-complex on chromosome 17, including four within 1.1 cM (insert).

There are limitations to the use of oligonucleotide arrays to characterize cells. The analysis is obviously limited to genes present on the microarray, and some RNA transcripts may not be translated into proteins. Another limitation of this study is that it is currently impossible to purify SCs to absolute homogeneity. Genes expressed by contaminating progenitor or differentiated cells may therefore be detected in the SC samples. However, the comparisons with the corresponding differentiated cell samples should militate against these “contaminating” genes in subsequent analyses.

Further insights into the biology of SCs will be gained by extending our approach to other SCs and other organisms. The genes identified here may now be tested in functional assays and may be useful in isolating new SCs. Characterization of the genomic regions that regulate their expression will also further an understanding of the genetic regulatory networks that stem cells use.

Supporting Online Material

Materials and Methods

SOM Text

Tables S1 to S10

Fig. S1

Databases S1 to S5


  • * To whom correspondence should be addressed. E-mail: dmelton{at}

