The Genetic Program of Hematopoietic Stem Cells

Science  02 Jun 2000:
Vol. 288, Issue 5471, pp. 1635-1640
DOI: 10.1126/science.288.5471.1635


Blood cell production originates from a rare population of multipotent, self-renewing stem cells. A genome-wide gene expression analysis was performed in order to define regulatory pathways in stem cells as well as their global genetic program. Subtracted complementary DNA libraries from highly purified murine fetal liver stem cells were analyzed with bioinformatic and array hybridization strategies. A large percentage of the several thousand gene products that have been characterized correspond to previously undescribed molecules with properties suggestive of regulatory functions. The complete data, available in a biological process–oriented database, represent the molecular phenotype of the hematopoietic stem cell.

Single hematopoietic stem cells can give rise to at least eight distinct blood cell lineages and can maintain lifelong blood production in mice (1,2). Their hallmark property is the ability to strike a balance between self-renewal and a commitment to differentiation (2). The mechanisms that govern these stem cell fate decisions must be under tight yet flexible control. Despite extensive functional and physical stem cell characterization, almost nothing is known about the molecular nature of these regulatory mechanisms. Several molecules have been shown to play roles in early aspects of hematopoietic development, but it has not been possible to elucidate regulatory pathways that function at the level of self-renewing hematopoietic stem cells. Key aspects of stem cell regulation are likely to be emergent properties of interacting pathways and networks, the elucidation of which requires an extensive description of the molecular components available to the stem cell, that is, its genetic program. Herein, we describe a large number of gene products that represent such a program and extend the known properties of stem cells.

During mouse development, the fetal liver is the first tissue from which hematopoietic stem cells can be purified [(3,4); see (5) for stem cell transplantation data and a description of the hematopoietic hierarchy] and is the site where they expand in number under normal conditions (6). These stem cells are Scapos AA4.1pos Kitpos Linneg/lo (henceforth referred to as “FLHSC”) (3,4). The FLHSC and stem cell–depleted AA4.1neg populations represent the two endpoints of the fetal liver hematopoietic hierarchy. Sufficient numbers of cells were purified to construct non–polymerase chain reaction (PCR)–based cDNA libraries (7). A second FLHSC library was constructed with a PCR-based technique (8). Sequences present in AA4.1neg cells were subtracted from each of the FLHSC libraries to remove “housekeeping” gene products and to enrich for transcripts expressed in primitive cells (9). Our overall approach encompasses high-throughput sequence acquisition and bioinformatics, as well as high-density array, reverse transcriptase–PCR (RT-PCR), and other hybridization analyses. The primary data and the results of computational analyses reside in the Stem Cell Database (SCDb) (5).

The non-PCR–based subtracted stem cell library (designated SA) was the source for most analyses. The efficacy of the subtraction was verified by the absence of β-actin and the retention or enrichment of gene products such as flk2/flt3 and CD34 (5). 5′-end sequences obtained from 5735 clones (representing 2119 nonredundant gene products) were characterized by the bioinformatic analyses shown in Fig. 1A. The sequences were compared (using the BLAST algorithm) to seven databases: SwissProt, GenBank protein and nucleotide collections, expressed sequence tags (ESTs), murine and human EST contigs (10), and SCDb itself (a measure of internal redundancy). Statistical analysis and limited empirical data suggest that these sequences represent a substantial portion (conservatively, 50 to 55%) of the library complexity (5). The nonredundant sequences were categorized by homology as depicted in Fig. 1B (upper panel). Novel sequences were extended with ESTs, and the predicted amino acid sequences were evaluated for motifs, hydrophobicity, and the presence of signal peptides (5). A functional categorization of informative protein sequences is shown in Fig. 1B (lower panel). Each entry in SCDb is accompanied by an “executive summary”: a distillation of predicted structural properties, physiological roles, tissue distribution, and other features that may suggest functions in stem cell biology.
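
The clone counts and coverage range quoted above imply a rough redundancy and library size, which can be sketched as simple arithmetic. This is only an illustration: it assumes that "library complexity" means the number of distinct gene products and that the 50 to 55% figure applies to the 2119 nonredundant products. Neither assumption is stated in the text, and the function names are hypothetical.

```python
# Counts taken from the text. Treating "library complexity" as a count of
# distinct gene products is an assumption made for this illustration.
CLONES_SEQUENCED = 5735
NONREDUNDANT_PRODUCTS = 2119


def mean_clones_per_product(clones, products):
    """Average number of sequenced clones per nonredundant gene product."""
    return clones / products


def implied_complexity(products, fraction_covered):
    """Library size implied if `products` covers `fraction_covered` of it."""
    return products / fraction_covered


redundancy = mean_clones_per_product(CLONES_SEQUENCED, NONREDUNDANT_PRODUCTS)
low_estimate = implied_complexity(NONREDUNDANT_PRODUCTS, 0.55)
high_estimate = implied_complexity(NONREDUNDANT_PRODUCTS, 0.50)

print(f"mean clones per gene product: {redundancy:.2f}")
print(f"implied library complexity: {low_estimate:.0f} to {high_estimate:.0f}")
```

Under these assumptions each gene product was sampled a little under three times on average, and the library would hold roughly 3900 to 4200 distinct gene products.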

Figure 1

(A) Automated bioinformatics analysis includes queries of six public databases and SCDb itself. Novel open reading frames (ORFs) are examined for protein motifs, transmembrane helices, and signal sequences. These data, along with the results of virtual Northerns, nearest neighbor analysis, and PubMed queries, are incorporated into an executive summary for each entry by the annotator. (B) Categorization of informative sequences by homology (top) or, if known, by protein type (bottom). See (5) for a detailed breakdown of protein families and access to sequences.

In-depth analyses of the entire subtracted FLHSC sequence set have revealed interesting molecules and molecular relationships. At least 161 transcription factors, 174 cell-surface or membrane-associated molecules, 28 secreted proteins, and 147 signaling molecules have been identified. Many of these are previously undescribed or are identified in stem cells for the first time. Examples of sequences in these categories are presented in Table 1 [for complete lists, alignments, and peptide motif analyses, see (5)]. Full-length sequences for five molecules with characteristics suggestive of regulatory roles were obtained (5). Sequence alignments and Flag epitope-tag subcellular localization data are presented in Fig. 2. Clone SA61 (Fig. 2, first panel), recently published as DOKL (11), has been suggested to function as a modulator of Abl signaling. SA49P01 or Hemp-1 (hematopoietic expressed mammalian polycomb; Fig. 2, second panel) contains a Cys2-Cys2 zinc finger domain, as well as two Lethal-3 Malignant Brain Tumor [l(3)mbt] repeats, and is a member of the Polycomb group of transcriptional repressors. The predicted SMC34 protein contains eight transmembrane domains and is localized to the endoplasmic reticulum (ER) (Fig. 2, third panel). LL5-96 (Fig. 2, fourth panel) encodes a LIM-domain protein. Clone C3-65 (Fig. 2, fifth panel) encodes a putative protein methyltransferase.

Figure 2

Alignments of important motifs (right) and subcellular localization (above) for five proteins identified in hematopoietic stem cells. An alignment of SA61 with the two closely related molecules p62Dok and p56Dok-2 establishes it as a member of the Dok family of tyrosine kinase substrates. Subcellular localization studies show accumulation of SA61-GFP fusion protein in the cytoplasm. Alignment of the mbt repeats of Hemp with those of the Polycomb-family transcriptional repressor proteins Sex Comb on Midleg (Scm) and Malignant Brain Tumor (Mbt) indicates that Hemp is a novel Polycomb group member. FLAG epitope-tagged Hemp protein localizes to the nuclei of transfected cells (red nuclei). A region of the eight-transmembrane protein SMC34 beginning in the first transmembrane domain is highly conserved in an ORF from Caenorhabditis elegans as well as mouse and human homologs of an apparently related protein. Tagged SMC34 protein (red) shows accumulation in the ER and overlap (yellow) with ER-staining calnexin antibody (green). Clone LL5-96 contains a single LIM domain that aligns with those of several transcription factors. An LL5-96–green fluorescent protein (LL5-96-GFP) fusion protein preferentially localizes to Cos-1 nuclei. Clone C3-65 shows homology to prokaryotic and mammalian protein methyltransferases in three canonical methyltransferase domains. A C3-65-GFP fusion protein localizes to Cos-1 cytoplasm. Nuclei in the second, third, and fourth photographs were counterstained with Hoechst 33342 (blue).

Table 1

Examples of known and novel sequences that have been placed into functional categories. Complete lists for each category can be found in SCDb (5). Asterisks denote sequences conserved in both Drosophila and C. elegans. HLH, helix loop helix; CD, cytoplasmic domain; LDL, low density lipoprotein; TM, transmembrane; EGF, epidermal growth factor; IL, interleukin; GTP, guanosine triphosphate; ATP, adenosine triphosphate.

The overall developmental similarities of these different cell populations suggest that functionally important regulatory molecules should be expressed in multiple sources of stem cells. A comparison of fetal and adult populations should also uncover molecules whose expression is not simply a function of proliferative status. Accordingly, the expression of four of these transcripts was analyzed in fetal liver cells, different compartments of the adult bone marrow hematopoietic hierarchy (Fig. 3A), and embryonic stem (ES) cell–derived hemangioblasts, as well as their hematopoietic and endothelial progeny (12). The expression complexity in the hematopoietic hierarchy was further demonstrated by hybridization of several molecules to pools of cDNA populations representative of single progenitor cells with defined differentiation abilities (Fig. 3B) (13). The expression differences among these very closely related progenitor cells suggest a high degree of precision in the transcriptional control mechanisms functioning at distinct stages of the hematopoietic hierarchy.

Figure 3

(A) Expression of four transcripts in several different hematopoietic subpopulations determined by semi-quantitative PCR. Rows 1 through 6, cDNA populations I and II were made from independently isolated fetal liver cells as described (4): AA4.1pos Linneg/lo Sca-1pos c-Kitpos (designated Scapos); AA4.1pos Linneg/lo Sca-1neg c-Kitpos (Scaneg); cells negative for AA4.1 by immunopanning (AA4.1neg). Rows 7 through 10, PCR-amplified cDNA from adult bone marrow: Scapos Kitpos Linneg Rh123low (Rhlo); Scapos Kitpos Linneg Rh123high (Rhhi); Scaneg Kitpos Linneg (Scaneg). Rows 11 and 12, Scapos Kitpos Linneg CD34neg/low [CD34neg; see (30)]; Scapos Kitpos Linneg CD34pos (CD34pos). Rows 13 through 15, ES cell-derived populations (a gift from G. Keller): Blast, hemangioblasts; Hemat., hematopoietic embryoid bodies derived from blast cells; Endo., endothelial embryoid bodies derived from blasts (12). PCR cycle numbers ranged from 18 to 30 in three-cycle increments. Thus, each lane represents approximately an eightfold increase in PCR product over the previous one. Products were Southern-blotted and hybridized with specific cDNA probes. (B) Blots of cDNA from pools of individual hematopoietic progenitors (a gift of N. Iscove, Toronto, Canada) were used to examine expression of two transcripts along the hematopoietic hierarchy. The relative darkness of each circle represents the hybridization intensity of the corresponding slot on the blot. See (5) for actual blot images. The labels designate the measured colony-forming potential of each cell type. E, erythroid; Meg, megakaryocyte; Mac, macrophage; Neut, neutrophil; Mast, mast cell; B, B cell; T, T cell; BFU-E and CFU-E, erythroid blast-forming and colony-forming units, respectively.
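
The eightfold step between lanes in (A) follows from product doubling over three additional cycles. A minimal sketch of that arithmetic is given here, assuming ideal amplification in which product doubles every cycle; the `efficiency` parameter and the function names are hypothetical additions used only to show how a less-than-ideal reaction would shrink the step.

```python
def cycle_series(start=18, stop=30, step=3):
    """Cycle numbers sampled for successive lanes (values from the legend)."""
    return list(range(start, stop + 1, step))


def fold_increase(cycles_added, efficiency=1.0):
    """Fold change in product after `cycles_added` extra cycles.

    Assumption: each cycle multiplies product by (1 + efficiency), so
    efficiency = 1.0 is perfect doubling. Real reactions are less efficient.
    """
    return (1.0 + efficiency) ** cycles_added


lanes = cycle_series()
ideal_step = fold_increase(3)
reduced_step = fold_increase(3, efficiency=0.9)

print("cycles per lane:", lanes)
print("ideal fold increase per lane:", ideal_step)
print(f"fold increase per lane at 90% efficiency: {reduced_step:.2f}")
```

With perfect doubling the step is exactly eightfold; any loss of efficiency lowers it, which is consistent with the legend describing the increase as approximate.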

In order to more extensively explore the molecular similarities between fetal and adult stem cells, we used nylon membrane arrays containing over 18,000 SA clones, including the entire sequenced set. Adult bone marrow stem cells are Rhodamine-123lo Scapos Kitpos Linneg (Rhlo), whereas mature cells are Linpos (14). Bidirectionally subtracted (15) probe populations from fetal liver or bone marrow were each hybridized to the SA arrays. Table 2 depicts differentially hybridizing genes from the fetal liver and adult bone marrow ranked by differential hybridization intensity. Four genes, three of which were previously undescribed, were found in both screens. We have shown that one of these, CD27, is a marker for a subset of purified bone marrow stem cells (16). Clone LL2in20044 contains coiled-coil and COOH-terminal transmembrane domains and is homologous to many members of the syntaxin and SNARE vesicle docking protein families [see (5) for additional hybridization data].

Table 2

Examples of the known and novel genes that show differential hybridization with intensity ratios greater than five. Upper, FLHSC versus AA4.1neg populations; Lower, Rhlo versus Linpos populations. Mean ratio is the average hybridization intensity ratio observed between the two probes. Count is the number of times a particular sequence was observed in the same screen. More data from the analyses as well as array images are presented in (5). The asterisks denote those transcripts that are highly differentially expressed in both screens.
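
The mean ratio and count columns described in this legend can be illustrated with a short sketch. It is not the authors' analysis pipeline: the input layout, the function name, and the example intensities are all hypothetical, and applying the greater-than-five cutoff to the mean ratio (rather than to each observation) is an assumption.

```python
from collections import defaultdict


def summarize_screen(observations, min_ratio=5.0):
    """Group array spots by sequence and report (sequence, mean ratio, count).

    `observations` holds (sequence_id, stem_intensity, depleted_intensity)
    tuples, one per spot. Depleted-probe intensities are assumed nonzero.
    """
    ratios = defaultdict(list)
    for sequence_id, stem, depleted in observations:
        ratios[sequence_id].append(stem / depleted)

    rows = [(seq, sum(vals) / len(vals), len(vals)) for seq, vals in ratios.items()]
    kept = [row for row in rows if row[1] > min_ratio]
    # Rank by differential hybridization intensity, highest first.
    return sorted(kept, key=lambda row: row[1], reverse=True)


# Hypothetical intensities, for illustration only.
example = [
    ("geneA", 1200.0, 100.0),
    ("geneA", 800.0, 100.0),
    ("geneB", 300.0, 100.0),
    ("geneC", 700.0, 100.0),
]
table = summarize_screen(example)
for sequence_id, mean_ratio, count in table:
    print(sequence_id, mean_ratio, count)
```

In this made-up example, geneA appears twice with ratios of 12 and 8, so it is reported with a mean ratio of 10 and a count of 2, while geneB falls below the cutoff and is dropped.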

A comparison of the fetal and adult screens also reveals molecules that appear more predominant in either fetal or adult hematopoietic stem cells. The top two differential transcripts in the fetal liver screen are macroH2A1.2 and dnmt3b. The variant histone macroH2A1.2 may affect developmental changes in chromatin structure, and has been shown to associate with inactive chromatin on the X chromosome, as well as certain cell type–specific non-X chromosome sites (17–19). The cytosine methyltransferase Dnmt-3b is essential for mammalian development, as is Dnmt-3a, also identified in SCDb (20, 21). Both of these chromatin-modifying proteins are likely to play roles in stem cell biology. All clones hybridizing with the adult probe population are likely to be expressed in both fetal and adult stem cells. One of these is the homeotic transcriptional repressor TGIF, which is a corepressor [along with the histone deacetylase HD-1 and c-Ski (22), and in competition with p300/CBP] of the transforming growth factor–β (TGF-β)–mediated Smad2-Smad4 pathway. These regulators (TGIF, HD-1, Ski, and p300/CBP, as well as the Ski-interacting protein and Drosophila Bx42 homolog Skip) have all been identified in the SA library, although TGIF is the most highly represented. The expression of these specific molecules in stem cells further supports a role for TGF-β in hematopoietic control (23). The p300/CBP protein is known to acetylate histone H2A (24) and thus may also be involved in functional regulation of macroH2A1.2. The predicted protein encoded by clone LL2in14463, which contains a sequence signature indicative of a histone deacetylase, may also interact with these and other chromatin modifiers.

Other gene products identified in our studies can be placed into developmental pathways. For example, the importance of the Notch pathway in hematopoietic development (25) is underscored by the identification of Notch-1, Manic Fringe, nuclear factor kappa B (NFκB), and Dishevelled-1 in SCDb. Other molecules represented in SCDb include members of the Fos serum response pathway (c-Fos, Serum Response Factor, SAP1a, Ets-related protein) and the associated Ets family members FLI-1, FLI-1–associated protein, and PU.1. Among the many cell adhesion proteins uncovered in these studies are Semaphorin B and Neuropilin-1, as well as two apparently novel members of the plexin family. Neuropilin-1 and some members of the plexin family are known to bind semaphorins (26,27). Given the presence of all three types of molecules in the SA library, it is likely that semaphorins play a role in stem cell adhesion and homing behavior.

Our studies identify numerous individual candidate regulatory molecules, but they also pave the way for more global approaches to stem cell biology. In particular, the production of stem cell microarrays will permit the analysis of fluctuations in the genetic program as a function of permutations in self-renewal, commitment, or other stem cell properties (3, 4, 28). The large collection of gene products will also facilitate proteomic strategies to uncover protein interaction networks (29). We anticipate that the SCDb will be a resource for the stem cell community and will foster the collaborative and consortial interactions necessary for global approaches to important biological questions.

* To whom correspondence should be addressed. E-mail: ilemischka{at}

