Report

Systematic Identification of Signal-Activated Stochastic Gene Regulation

See allHide authors and affiliations

Science  01 Feb 2013:
Vol. 339, Issue 6119, pp. 584-587
DOI: 10.1126/science.1231456

Abstract

Although much has been done to elucidate the biochemistry of signal transduction and gene regulatory pathways, it remains difficult to understand or predict quantitative responses. We integrate single-cell experiments with stochastic analyses, to identify predictive models of transcriptional dynamics for the osmotic stress response pathway in Saccharomyces cerevisiae. We generate models with varying complexity and use parameter estimation and cross-validation analyses to select the most predictive model. This model yields insight into several dynamical features, including multistep regulation and switchlike activation for several osmosensitive genes. Furthermore, the model correctly predicts the transcriptional dynamics of cells in response to different environmental and genetic perturbations. Because our approach is general, it should facilitate a predictive understanding for signal-activated transcription of other genes in other pathways or organisms.

A central goal of systems biology is to understand and predict the complex, stochastic dynamics of gene regulation (13). Although biochemical studies have identified many regulatory proteins in these processes, this typically does not enable construction of quantitatively predictive models of transcriptional dynamics. One challenge lies in the fact that gene regulation is a dynamic multistate process with largely unknown reaction rates. For example, a two-state system may represent closed and open chromatin (46) or the presence or absence of a transcription factor (79). Including more states or regulatory reactions results in a combinatorial increase in the number of possible model structures (10) and leads to a complicated trade-off between overfitting and predictive power.

We propose a data-driven comprehensive approach to identify and validate predictive, quantitative models of transcriptional dynamics through the integration of single-cell experiments and discrete stochastic analyses within a system identification framework. We apply this approach to the well-characterized high-osmolarity glycerol (HOG) mitogen-activated protein kinase (MAPK) pathway in Saccharomyces cerevisiae and focus on the regulation of STL1, CTT1, and HSP12 (11, 12) genes. Upon osmotic shock, the Hog1p kinase quickly enters the nucleus (Fig. 1A, and figs. S2 to S4, and S6) (1316) and activates STL1, CTT1, and HSP12 gene expression (figs. S6 and S9) (17). We find that the Hog1p translocation dynamics is homogeneous (14, 15, 17), yet downstream gene activation is heterogeneous among cells (17). To quantify STL1 expression directly, we developed a single-molecule fluorescent in situ hybridization (smFISH) (18, 19, 20) assay, which captures the stochastic nature of mRNA transcription with high temporal and single-molecule resolution (Fig. 1B) (21, 22, 23).

Fig. 1

Quantitative analysis of single-cell stochastic gene regulation. (A) Schematic of a generic signaling cascade in which a kinase enters the nucleus and interacts with transcription factors (TF) and chromatin modifiers (CM) to regulate gene expression. (B) Rapid, stochastic, and bimodal activation of endogenous STL1 mRNA expression is detected with single-molecule RNA-FISH [yeast cell: gray circle; DAPI (4′,6-diamidino-2-phenylindole)–stained nucleus: blue; STL1 mRNA: green dots]. Scale bar: 2 μm.

In addition to the kinase Hog1p, we consider the effects of the transcription factor Hot1p and the chromatin modifiers Arp8p and Gcn5p that modulate STL1 transcription (17, 24). For this system, we seek to find and validate a model that predicts the system's dynamic mRNA expression for several genes (STL1, CTT1, and HSP12) in response to environmental and genetic perturbations. We propose a range of model structures, each with a discrete number of states, {S1, S2,…,SN} (Fig. 2A). Each haploid cell occupies one state at a time, and transitions among states are discrete, stochastic events. At least two states are required to explain bimodality, but additional states allow for more complex mechanisms, such as chromatin remodeling or transcription factor binding or release (79, 17). Because activated mRNA transcription and degradation rates are constant throughout different conditions (fig. S5), only transition rates can be variable and are assumed to be constant or linearly dependent on the kinase. After identifying the model structure and Hog1p dependency, we validate the model structure for several mutants and different Hog1p-dependent genes.

Fig. 2

Identifying a maximally predictive model structure. (A) Two- and multistate model structures that allow for kinase, transcription factor, and chromatin modifier–dependent activation of gene expression. (B) Relative likelihoods of best fit for different model structures at 0.4 M NaCl (red, left axis) and the resulting predictions at 0.2 M NaCl (green, right axis). Cross-validation at 0.4 M NaCl (27) is used to quantify predictive uncertainty (gray region, left axis) and yields excellent a priori knowledge of predictive power (compare blue and green lines). (C) mRNA expression distributions at two NaCl concentrations (black and blue lines) and best fit at 0.4 M (red line) and the corresponding prediction at 0.2 M NaCl (green line). The fit and predictions correspond to the four-state structure with one Hog1p dependency identified at 0.4 M NaCl in (fig. S7). The black arrow indicates the similar mRNA expression levels after an osmotic shock of 0.2 and 0.4 M NaCl. The purple star indicates the time point of gene expression deactivation.

To choose the best number of states needed to match STL1 gene expression dynamics, we allow every state transition rate to be Hog1p-dependent. For two-, three-, four-, and five-state model structures with any parameter set, we use the finite state projection (FSP) approach (25) to formulate a finite set of linear ordinary differential equations that predicts the time-varying probability distributions. We adjust the model parameters until the FSP analysis fits the bimodal mRNA distributions at all times (26). As expected, the fit improves as the model complexity increases (Fig. 2B, red line, and fig. S11). However, increased complexity leads to greater parametric uncertainty and may diminish predictive power. Applying cross-validation analyses to replicate experiments at 0.4 M NaCl (27), we score all models according to their estimated predictive power (Fig. 2B, blue line). This prediction estimate is validated with additional experiments conducted at 0.2 M NaCl, and we find that cross-validation provides an excellent estimate of predictive power (Fig. 2B, compare blue and green lines, and figs. S11 and S12). We find that the two- and three-state models are too simple, whereas the more complex five-state model structure is prone to overfitting (Fig. 2B and figs. S11 and S12).

We now concentrate on the four-state model structures and determine which reactions depend upon Hog1p. To identify a Hog1p-model structure with enough flexibility to match the data while avoiding overfitting, we allow one or two Hog1p dependencies. We then rank the corresponding maximum likelihoods and cross-validate the top ranked Hog1p-model structures. The fit improves with increasing complexity (Fig. 2B, red line, and fig. S11), while constraining the number of Hog1p dependencies reduces uncertainty (Fig. 2B and fig. S11). One notable feature of the identified model structure and its corresponding parameters is that in the absence of Hog1p, a fast reaction from S2 → S1 keeps all cells in the inactive S1 state (fig. S8, red line). When Hog1p exceeds a certain threshold, the gene can transition among the active S2, S3, and S4 states (fig. S8, blue, green, and black lines). Our final model captures all qualitative and quantitative features of STL1 mRNA expression dynamics after a 0.4 M NaCl osmotic shock (Fig. 2C, top). These features include a constant time delay, t0, between Hog1p translocation and mRNA expression; slow activation of gene expression; transient bimodality in RNA populations; conserved maximal mRNA expression between different conditions; and Hog1p-dependent modulation of gene expression duration. In addition, the model makes the best predictions for the mRNA expression after osmotic shock with 0.2 M NaCl (Fig. 2C, bottom).

To test the generality of this model's predictive power, we collect new data sets at 0.4 M NaCl for several different mutant strains and for different Hog1p-activated genes. The different mutant strains include a fivefold Hot1p-overexpression strain and gene knockouts of the chromatin modifiers ARP8 or GCN5. We also consider two additional stress response genes: CTT1 and HSP12. The model identified above fits equally well to the mRNA expression dynamics for STL1 in the Hot1p-overexpression strain as well as the arp8Δ and gcn5Δ mutants (Fig. 3A and figs. S15, S18, and S19). The same structure also fits the CTT1 and HSP12 mRNA expression dynamics (figs. S9, S15, S18, and S19) with relatively few parameter changes between the different genes and mutations (table S2) (27). The resulting model makes excellent predictions for the dynamics of CTT1 and HSP12 mRNA expression at 0.2 M NaCl (Fig. 3, B and C, and figs. S16, S18, and S19). Combining the relative changes in the rates measured for STL1 in the mutant ARP8 strain with the rate changes for the CTT1 and HSP12 expression measured in wild-type strain results in a very good prediction of the CTT1 and HSP12 mRNA expression in the ARP8 mutant strain (Fig. 3C and figs. S17 to S19) (27).

Fig. 3

Model structure validation. (A) Combined fit of the model structure identified (fig. S7) to different genetic mutations affecting STL1 expression at 0.4 M NaCl: wild-type (WT) (red), Hot1p 5× (blue), arp8Δ (black), and gcn5Δ (green). (B) Model prediction of CTT1 (cyan) and HSP12 (magenta) expression at 0.2 M NaCl. (C) Model prediction for HSP12 expression at 0.4 M in the arp8Δ strain.

Having determined that the model structure identified above can fit and predict STL1, CTT1, and HSP12 mRNA expression dynamics in different mutant strains, we examine which rates vary most for each mutant and gene in comparison to wild-type STL1 (Fig. 4A and table S2). The most variable rates between different mutations are the k12 and k21 transition rates, which indicate that Hot1p, Gcn5p, and Arp8p all modulate the transition rates into and out of the S1 state but result in different Hog1p-activation and -deactivation thresholds (fig. S10). Other transition rates are affected to a much lower degree.

Fig. 4

Relating model structure to biological function. (A) Mutant and gene-specific rate changes relative to STL1. (B) Final model, in which Hog1p, Hot1p, Gcn5p, and Arp8p regulate transitions between states S1 and S2.

The identified model structure and parameters quantitatively capture and/or predict all of the observed experimental data (Figs. 2 to 4 and figs. S15 to S19). The model also yields several qualitative and quantitative insights, including (i) a switchlike mechanism that activates each gene and stabilizes its activity when Hog1p exceeds a gene-specific threshold and (ii) gene-specific production and degradation rates that are independent of the Hog1p-kinase dynamics. The four-state model structure is essential to explain the temporal dynamics in gene expression observed in all of the experiments. This structure describes an Off state, S1, which is the default state in the absence of osmotic shock, and three On states with different transcription rates and reaction rates between the states. Activation occurs when nuclear Hog1p represses the deactivation rate, k21, subject to the interplay of gene- and mutant-specific (de)activation thresholds (fig. S10 and table S2). This interplay provides the main knob by which the duration of mRNA expression is finely tuned in response to different environmental conditions (e.g., different salt concentrations) or to different genetic mutations.

In this study, we have identified a single quantitative model to understand and predict STL1, CTT1, and HSP12 gene expression dynamics in response to various environmental and genetic perturbations. We generated a large range of possible model structures and developed a dynamic single-cell assay with which to discriminate among these model structures. We combined this experimental assay with discrete stochastic analyses and parameter identification approaches. Our cross-validation analyses systematically eliminated oversimplified and overcomplex model structures. We eventually selected the model structure and parameters for a single best model to predict STL1, CTT1, and HSP12 dynamics. Furthermore, the identified model provides detailed insight into the biophysical dynamics of signal-activated gene regulation. Because the presented experimental and computational tools are applicable to any gene or signaling pathway, this integrated identification approach can lead to insights into complex cellular networks for other organisms.

Supplementary Materials

www.sciencemag.org/cgi/content/full/339/6119/584/DC1

Materials and Methods

Figs. S1 to S19

Tables S1 and S2

References (2938)

  • Co-senior authors.

References and Notes

  1. Using full mRNA distributions for fitting yields substantially improved predictions (fig. S13) compared to fitting with the procedure in (28) that uses only means and variances.
  2. Materials and methods are available as supplementary materials on Science Online.
  3. Acknowledgments: This work was funded by the NSF (ECCS-0835847) and Human Frontier Science Program (RGP0061/2011) to M.K.; the NSF (ECCS-0835623), National Institutes of Health–National Cancer Institute Physical Sciences Oncology Center at Massachusetts Institute of Technology (U54CA143874), and an NIH Pioneer award (1DP1OD003936) to A.v.O.; Los Alamos National Laboratory–Laboratory Directed Research and Development to B.M.; the Deutsche Forschungsgemeinschaft (Forschungs Stipendium) to G.N.; and the A*STAR program, Singapore, to R.Z.T. We thank F. van Werven for the yeast crosses and N. Hengartner, B. Pando, S. Klemm, J. van Zon, and M. Wall for discussions on the model. We also thank M. Bienko, N. Crosetto, C. Engert, S. Itzkovitz, J. P. Junker, S. Klemm, S. Semrau, J. van Zon, and H. Youk for comments on the manuscript.
View Abstract

Navigate This Article