Defining an essential transcription factor program for naïve pluripotency

See allHide authors and affiliations

Science  06 Jun 2014:
Vol. 344, Issue 6188, pp. 1156-1160
DOI: 10.1126/science.1248882

Predicting stem cell renewal or differentiation

Predicting complex mammalian cell behavior is extremely challenging. Dunn et al. developed a computational model that predicts when embryonic stem cells will self-renew or differentiate. The model revealed an essential program governing pluripotency and identifies a minimal set of components and interactions that accurately predict responses to genetic perturbation.

Science, this issue p. 1156


The gene regulatory circuitry through which pluripotent embryonic stem (ES) cells choose between self-renewal and differentiation appears vast and has yet to be distilled into an executive molecular program. We developed a data-constrained, computational approach to reduce complexity and to derive a set of functionally validated components and interaction combinations sufficient to explain observed ES cell behavior. This minimal set, the simplest version of which comprises only 16 interactions, 12 components, and three inputs, satisfies all prior specifications for self-renewal and furthermore predicts unknown and nonintuitive responses to compound genetic perturbations with an overall accuracy of 70%. We propose that propagation of ES cell identity is not determined by a vast interactome but rather can be explained by a relatively simple process of molecular computation.

Mouse embryonic stem (ES) cells exhibit the capacity to self-renew indefinitely, the plasticity to generate all somatic lineages and germ cells, and the ability to reenter embryogenesis after blastocyst injection. Collectively these properties are described as ground-state pluripotency (1). The pluripotent state of mouse ES cells has been considered as controlled by a vast network of genetic interactions (26). However, only a limited number of transcription factors (TFs) have been validated rigorously. Two have been found to be indispensable: Oct4 and Sox2. In contrast, factors such as Nanog, Klf4, and Esrrb are individually dispensable, yet their overexpression can support self-renewal (79).

Culture environments for efficiently sustaining mouse ES cells have been progressively refined. Currently optimized conditions comprise the cytokine leukemia inhibitory factor (LIF), together with two selective inhibitors (2i) of glycogen synthase kinase 3 (Chiron99021, CH) and mitogen-activated protein kinase kinase (Mek) (PD0325901, PD) (10). A number of direct transcriptional targets downstream of LIF, CH, and PD have been identified (fig. S1A) (8, 9, 1114). Although there is evidence of extensive cross-regulation between the TFs, it is not understood how the environmental signals are processed to stabilize the network and to ensure self-renewal. Characterizing this circuitry should allow us to explain and predict the self-renewal behavior of ES cells.

ES cells can be stably cultured as a substantially homogeneous population in four different environments: 2i+LIF, 2i, LIF+CH, and LIF+PD (Fig. 1A and fig. S2, A to C). Under these conditions, cells show no signs of differentiation, self-renew at clonal density, homogeneously express pluripotency markers, and form germline-competent chimeras after blastocyst injection (fig. S2E). This system permits us to study gene expression of pluripotency factors at steady state in different culture conditions with similar abilities to maintain ES cells. Furthermore, cells can be efficiently transitioned from one culture condition to another without affecting pluripotency or cell survival (fig. S2D), allowing adjustments in gene expression circuitry to be measured over time.

Fig. 1 Flexible culture conditions for mouse ES cells allow deduction of possible interactions between pluripotency factors.

(A) Under different combinations of LIF, CH, and PD, ES cells show homogeneous expression of Rex1, Oct4, Esrrb, and Nanog. See also fig. S2, A to C. Scale bar indicates 50 μm. (B) Gene expression for the set of 17 pluripotency-associated TFs (gastrulation brain homeobox 2, GBX2; transcription factor CP2–like 1, Tfcp2I1; sal-like 4, Sall4) from five experiments under different culture conditions. Columns 1 to 4 and 5 to 8 are biological replicates in the indicated steady-state conditions, whereas columns 9 to 23 illustrate the change in gene expression over time from cells in LIF+PD, 2i, or LIF+CH switched to 2i+LIF. Gene expression was measured by quantitative reverse transcription polymerase chain reaction and normalized to mean expression level. β-actin was used as an internal control. (C) Examples showing negative correlation between Esrrb and Tcf3 and positive correlation between Esrrb and Klf4 under time-series experiments. The Pearson correlation coefficient is indicated on each plot. (D) The meta-model derived from a Pearson coefficient threshold of 0.7, with the expected gene expression under 2i+LIF conditions. Gene expression is discretized to high (blue) or low (white). Positive regulation indicated by a black arrow; negative regulation indicated by a red circle-headed line. Dashed lines indicate optional interactions; solid lines indicate definite interactions. ERK, extracellular signal–regulated kinase.

We analyzed gene expression for 17 factors implicated in ES cell maintenance and inferred to be cross-regulatory (Fig. 1B) (1). To identify plausible TF interactions, we examined expression correlation between all possible TF pairs across the different culture conditions. Steady-state data were obtained from ES cells grown in each condition for 10 days, and time-course data were obtained from cells cultured for 10 days in each of 2i, LIF+CH, and LIF+PD before adding the third component to reconstitute 2i+LIF (Fig. 1B). By way of example, the strong negative correlation observed between estrogen-related receptor β (Esrrb) and transcription factor 7–like 1 (Tcf7l1, also known as Tcf3) (Fig. 1C) suggests a repressive interaction.

We used the Pearson coefficient as a metric to quantify the correlation between any two TFs (fig. S3A). Accordingly, we constructed a “meta-model” of the pluripotency network (Fig. 1D), defined as a set of possible undirected interactions, each suggested by the Pearson correlation data (supplementary materials section S1J). The meta-model therefore consists of a set of models, each a unique instantiation of possible interactions that might explain ES cell behavior. Inferred interactions are assumed to be functional but not necessarily direct.

The possible combinations of interactions between the TFs exceed the billions of billions. Distillation to only those models that capture observed behavior, and can explain ES cell information processing and decision-making, necessitated development of a new computational approach.

We adopted Boolean network formalism to abstract gene expression levels as low or high within the endogenous expression range. Computational analyses were performed by using a bespoke software tool designed to encode possible genetic interactions and behavioral constraints (supplementary materials section S1I). The tool implements formal verification procedures (15) to synthesize those interaction combinations and gene-regulation functions that provably satisfy experimental data, thereby constraining the set of possible models. Moreover, the tool enables interrogation of the entire constrained meta-model to generate predictions of behavior under genetic and environmental perturbations.

We defined a set of 23 experimental constraints, comprising conversion between culture conditions, loss of signals, and gain- or loss-of-gene-function experiments (Fig. 2A and supplementary materials section S1K). Each constraint consists of an initial and final gene expression pattern that must be reached in a finite number of steps. Starting from the meta-model, we identified a subset of models that can reproduce all 23 constraints simultaneously.

Fig. 2 Iteration with experimental observations reduces model complexity.

(A) Twenty-three experimental constraints are defined, each with initial (left column) and final (right column) conditions (supplementary materials section 1K). (B) The meta-model derived from a Pearson correlation threshold of 0.792, constrained to satisfy the expected experimental behavior (A). Eleven of the possible interactions are present in all possible models, and the remaining possible interactions are shown by dashed gray lines. All of the constraints can be satisfied without including five components of the network (gray).

To reduce model complexity and to maximize the predictive capability of the meta-model (supplementary materials section S1M), we sought only those components and interactions that are essential. A high Pearson correlation threshold is desirable to select only possible interactions between genes that show strong correlations in expression. The maximum Pearson correlation threshold that identified the smallest set of interactions needed to satisfy all experimental constraints simultaneously is 0.792. This gave a set of 70 possible interactions. We found that, in addition to known signal targets, 11 of the possible interactions were present in all of these models. We conclude that these are essential interactions for the pluripotency network (Fig. 2B). We further found that five components are not required and can be eliminated: zinc finger protein 42 (Zfp42, also known as Rex1); nuclear receptor subfamily 0, group B, member 1 (Nr0b1); methyl-CpG binding domain protein 3 (Mbd3); chromodomain helicase DNA binding protein 4 (Chd4, also known as Mi-2β; and Kruppel-like factor 5 (Klf5) (Fig. 2B, in gray). We thus derived a first set of minimal models to explain the 23 known behaviors of ES cells.

The set of possible interactions were compared with those inferred from open-source microarray data after genetic perturbations and also with chromatin immunoprecipitation sequencing data. We found that our set of possible interactions are supported by these independent data but that a set of interactions derived only from perturbation data was insufficient to satisfy experimental observations (supplementary materials section S2A and figs. S4 and S5). Second, we compared our computational approach with Bayesian network inference (16). The latter yielded fewer interactions and only one model that was inconsistent with experimental observations (fig. S5B).

ES cells cultured in 2i+LIF show a remarkable degree of robustness because they can tolerate the singular loss of key TFs such as Nanog homeobox (Nanog) or Esrrb (8, 9). However, the response of ES cells to compound perturbations is unknown, difficult to predict, and time-consuming to investigate by unsupervised experimental tests. We used the complete set of data-constrained models to predict the response of ES cells to single and double knockdowns in 2i+LIF.

We determined whether a given knockdown was predicted to result in maintenance or collapse of self-renewal (supplementary materials section S1L) (1719). We accepted predictions only when all of the candidate models were in agreement. This led to 11 predictions from the 24 possible double knockdowns and one single knockdown prediction (Fig. 3A, left prediction column).

Fig. 3 Experimental validation of model predictions.

(A) Summary of model predictions and experimental validation. Incorrect predictions are highlighted with an asterisk. Each box in the Exp. Validation column indicates the number of alkaline phosphate–positive (AP+) colonies obtained in a clonal assay. See also fig. S6. (B) Clonal assay of ES cells in the indicated culture conditions. For each sample, the number of AP+ colonies, relative to nontargeting siRNA-transfected cells, is indicated. Each bar represents the mean and SEM of at least four independent experiments. (C) Rex1GFPd2 flow profile of cells withdrawn from either 2i+LIF or 2i conditions for the indicated time. Cells taken from 2i down-regulate the reporter more rapidly. (D) The simplest model of 16 interactions and 12 components required to satisfy all experimental constraints. The complete set of constrained models is shown in fig. S7.

To test the predictions, we used small interfering RNA (siRNA) transfection followed by clonal assay, which provides a quantitative measure of self-renewal capacity (Fig. 3A, left exp. validation column). A subset of predictions was also tested by using knockout ES cells (fig. S6, B and C). The experimental results confirmed all predictions with the sole exception of the Tbx3/Klf2 double knockdown. Several predictions were nonintuitive: for example, the effect of Klf2/Sall4 double knockdown could not be inferred from single knockdown results (Fig. 3B).

No single model within the meta-model is capable of producing the observed behavior after Tbx3/Klf2 double knockdown. We therefore used this incorrect prediction as a new experimental constraint to derive a more accurate set of models by incrementally lowering the Pearson coefficient threshold to identify the most-supported subset of interactions (supplementary materials section S1M). We found that including an optional positive interaction between octamer-binding transcription factor 4 (Oct4) and sex-determining region Y–box 2 (Sox2) was sufficient to introduce the necessary flexibility to the network.

With the refined meta-model, we sought predictions of the response to genetic perturbations in different conditions (Fig. 3A, right, and fig. S6, F and G). In 2i, LIF+CH, and LIF+PD conditions, we found that 17 of 28 tested predictions were correct (Fig. 3A and fig. S6D).

The outcomes of some double knockdowns, including Esrrb/Nanog, were predicted to be strictly dependent on the culture environment: impeding self-renewal in 2i but not in 2i+LIF (Fig. 3B). Thus, our modeling approach can accurately recapitulate the behavior of the network in response to input signals.

The meta-model predictions suggest that the pluripotency network is less resistant to genetic perturbation in 2i than in 2i+LIF (compare charts in Fig. 3A). We tested whether a similar difference in robustness could be observed upon environmental perturbation. We transferred ES cells from either 2i+LIF or 2i only into unsupplemented medium then, after different periods, performed a clonogenicity assay by replating in 2i+LIF. The number of self-renewing cells declined more rapidly for 2i than 2i+LIF cultures (fig. S6, H and I). Rex1GFP (GFP, green fluorescent protein) was also down-regulated faster from 2i (Fig. 3C). These results suggest that the pluripotent state collapses more readily in 2i ES cells, consistent with the network being less stable than in 2i+LIF.

To identify the simplest among the set of equally valid candidate models constrained using the additional experimental data from knockdowns in 2i+LIF, we explicitly sought those with the fewest interactions. Only 16 interactions between 12 components may be required, because T-box 3 (Tbx3) is dispensable as a regulator of any other component (supplementary materials section S1N). There is only one valid model of this size, representing the minimal possible transcriptional program governing ES cell self-renewal (Fig. 3D). Of note, a repressive effect of Esrrb on Oct4 is required in all of the constrained models.

Our results reveal that ES cell decision-making is not necessarily dependent on a vast gene network, as widely considered, but instead can be explained by a program that, in its simplest version, consists of just 16 interactions, 12 components, and three inputs. This example model from the meta-model identifies the most parsimonious network that could maintain naïve pluripotency under four different culture conditions. Environmental signals are processed via interactions between pluripotency factors to stabilize the gene expression pattern that defines a self-renewing, pluripotent cell. The program itself is unchanging under the action of different inputs but computes the appropriate stable state as a consequence of these inputs. This conclusion is consistent with the proposition that individual cell states are attractors (20).

It will be instructive to learn how the current meta-model can be refined further to explain how cells exit from the pluripotent state for differentiation and how this state is generated in the developing embryo or by reprogramming. Furthermore, the general design of our computational approach means that it can be applied to understand biological computation in other cell systems.

Supplementary Materials

Materials and Methods

Supplementary Text

Figs. S1 to S6

Tables S1 to S3

References (2132)

References and Notes

  1. Acknowledgments: We thank T. Kalkan, H. Kugler, and S. Dupont for stimulating discussions and comments on this manuscript and B. Simons for reading the manuscript. G.M. was a recipient of a Human Frontier Science Program fellowship. A.G.S. is a Medical Research Council Professor and is also funded by the Wellcome Trust.
View Abstract

Navigate This Article