Research Article

Causal Protein-Signaling Networks Derived from Multiparameter Single-Cell Data

See allHide authors and affiliations

Science  22 Apr 2005:
Vol. 308, Issue 5721, pp. 523-529
DOI: 10.1126/science.1105809

This article has a correction. Please see:


Machine learning was applied for the automated derivation of causal influences in cellular signaling networks. This derivation relied on the simultaneous measurement of multiple phosphorylated protein and phospholipid components in thousands of individual primary human immune system cells. Perturbing these cells with molecular interventions drove the ordering of connections between pathway components, wherein Bayesian network computational methods automatically elucidated most of the traditionally reported signaling relationships and predicted novel interpathway network causalities, which we verified experimentally. Reconstruction of network models from physiologically relevant primary single cells might be applied to understanding native-state tissue signaling biology, complex drug actions, and dysfunctional signaling in diseased cells.

Extracellular cues trigger a cascade of information flow, in which signaling molecules become chemically, physically, or locationally modified; gain new functional capabilities; and affect subsequent molecules in the cascade, culminating in a phenotypic cellular response. Mapping of signaling pathways typically has involved intuitive inferences arising from the aggregation of studies of individual pathway components from diverse experimental systems. Although pathways are often conceptualized as distinct entities responding to specific triggers, it is now understood that interpathway cross-talk and other properties of networks reflect underlying complexities that cannot be explained by the consideration of individual pathways or model systems in isolation. To properly understand normal cellular responses and their potential disregulation in disease, a global multivariate approach is required (1). Bayesian networks (2), a form of graphical models, have been proffered as a promising framework for modeling complex systems such as cell signaling cascades, because they can represent probabilistic dependence relationships among multiple interacting components (35). Bayesian network models illustrate the effects of pathway components on each other (that is, the dependence of each biomolecule in the pathway on other biomolecules) in the form of an influence diagram. These models can be automatically derived from experimental data through a statistically founded computational procedure termed network inference. Although the relationships are statistical in nature, they can sometimes be interpreted as causal influence connections when interventional data are used; for example, with the use of kinase-specific inhibitors (6, 7).

There are several attractive properties of Bayesian networks for the inference of signaling pathways from biological data sets. Bayesian networks can represent complex stochastic nonlinear relationships among multiple interacting molecules, and their probabilistic nature can accommodate noise that is inherent to biologically derived data. They can describe direct molecular interactions as well as indirect influences that proceed through additional unobserved components, a property crucial for discovering previously unknown effects and unknown components. Therefore, very complex relationships that likely exist in signaling pathway architectures can be modeled and discovered. The Bayesian network inference algorithm constructs a graph diagram in which nodes represent the measured molecules, and arcs (drawn as lines between nodes) represent statistically meaningful relations and dependencies between these molecules. When inferring a Bayesian network from experimental data, the network inference algorithm aims to discern a model that closely predicts the observations made. The algorithm approximates the most likely models by traversing the space of possibilities via single-arc changes that improve the score. There is a trade-off between simple models and those that accurately capture the empirical distribution observed in the data. The employed Bayesian scoring metric captures this trade-off; thus, a high-scoring model is a both simple and accurate representation of the data (8, 9). Bayesian networks have been applied to gene expression data for the study and discovery of genetic regulatory pathways (4, 6, 10). However, because of the probabilistic nature of the Bayesian modeling approach, effective inference requires many observations of the system. Thus, such studies have often been limited by data sets of insufficient size; for instance, those made up of measurements based on averaged samples derived from heterogeneous cell populations (a necessary limitation when using lysates from large numbers of cells) (5, 11).

In contrast to lysate-based methods, intracellular multicolor flow cytometry (12, 13) allows more quantitative simultaneous observations of multiple signaling molecules in many thousands of individual cells. Hence, it is an especially appropriate source of data for Bayesian network modeling of signaling pathways; for instance, because it allows for simultaneous measurement of biological states in more native contexts, as well as for large sample sets. Flow cytometry can be used to quantitatively measure a given protein's expression level and can also include measures of protein-modification states such as phosphorylation (1315). Because each cell is treated as an independent observation, flow cytometric data provide a statistically large sample that could enable Bayesian network inference to accurately predict pathway structure (Fig. 1A). As demonstrated in this article, interrogating signaling networks in populations of single cells provides a robust source of statistically powerful dependencies that can be used to automatically infer signaling causality using Bayesian network computation.

Fig. 1.

Bayesian network modeling with single-cell data. (A) Schematic of Bayesian network inference using multidimensional flow cytometry data. Nine different perturbation conditions were applied to sets of individual cells (Table 1). A multiparameter flow cytometer simultaneously recorded levels of 11 phosphoproteins and phospholipids in individual cells in each perturbation data set (Table 2). This data conglomerate was subjected to Bayesian network analysis, which extracts an influence diagram reflecting dependencies and causal relationships in the underlying signaling network. (B) Bayesian networks for hypothetical proteins X, Y, Z, and W. (a) In this model, X influences Y, which, in turn, influences both Z and W. (b) The same network as (a), except that Y was not measured in the data set. (C) Simulated data that could reconstruct the influence connections in (B) (this is a simplified demonstration of how Bayesian networks operate). Each dot in the scatter plots represents the amount of two phosphorylated proteins in an individual cell. (a) Scatter plot of simulated measurements of phosphorylated X and Y shows correlation. (b) Interventional data determine directionality of influence. X and Y are correlated under no manipulation (blue dots). Inhibition of X affects Y (yellow dots) and inhibition of Y does not affect X (red dots). Together, this indicates that X is consistent with being an upstream parent node. (c) Simulated measurements of Y and Z. (d) A noisy but distinct correlation is observed between simulated measurements of X and Z.

Modeling Bayesian networks with multivariable individual-cell data.Fig. 1B (panel a) presents a sample Bayesian network representing four hypothetical biomolecules. A directed arc from X to Y is interpreted as a causal influence from X onto Y; in this case, we say X is Y's “parent” in the network. In the case that X activates Y, where activation can be read out by phosphorylation status, we expect and observe correlation in levels of phosphorylation as measured by flow cytometry (simulated data in Fig. 1C, panel a). Critical to causal interpretation of Bayesian network models is the inclusion of interventional cues (whether activating or inhibiting) that directly perturb the states of the measured molecules (Fig. 1C, panel b) and strengthen inference directionality. For instance, inhibition of molecule X might lead to inhibition of both X and Y, whereas inhibition of molecule Y leads only to inhibition of Y. Thus, we would infer X to be upstream of Y as shown in Fig. 1B, panel a. Moreover, because flow cytometry can measure multiple molecules within each cell, it is possible to identify complex causal influence relationships involving multiple proteins. Consider the signaling cascade from X onto Y onto Z (Fig. 1B, panel a), where correlation exists between the measured activities of each pair, including between X and Z (Fig. 1C, panel d). Bayesian network inference yields the most concise model, automatically excluding arcs basedondependenciesalready explained by the model. Thus, despite the correlation between them, the arc between X and Z is omitted, because the X-Y and the Y-Z relationships explain the X-Z correlation. Similarly, because Z and W are both activated by their common cause Y, we expect their activities to be correlated, but no arc appears between them because their respective arcs from Y mediate this dependency. Finally, consider a scenario in which molecule Y was not measured. The statistical correlation between the observed activities of X and Z does not depend on observing Y; therefore, their correlation would still be detected. An indirect arc would be detected from X onto Z (Fig. 1B, panel b).

Expanding this concept to a real data set, we applied Bayesian network analysis to multivariate flow cytometry data. Data were collected after a series of stimulatory cues and inhibitory interventions (Table 1), with cell reactions stopped at 15 min after stimulation by fixation, to profile the effects of each condition on the intracellular signaling networks of human primary naïve CD4+ T cells, downstream of CD3, CD28, and LFA-1 activation (Fig. 2 shows a currently accepted consensus network). We made flow cytometry measurements of 11 phosphorylated proteins and phospholipids [Raf phosphorylated at position S259, mitogen-activated protein kinases (MAPKs) Erk1 and Erk2 phosphorylated at T202 and Y204, p38 MAPK phosphorylated at T180 and Y182, Jnk phosphorylated at T183 and Y185, AKT phosphorylated at S473, Mek1 and Mek2 phosphorylated at S217 and S221 (both isoforms of the protein are recognized by the same antibody), phosphorylation of protein kinase A (PKA) substrates [cAMP response element–binding protein (CREB), PKA, calcium/calmodulin-dependent protein kinase II (CaMKII), caspase-10, and caspase-2] containing a consensus phosphorylation motif, phosphorylation of phospholipase C–γ (PLC-γ) on Y783, phosphorylation of PKC on S660, phosphatidylinositol 4,5-bisphosphate (PIP2), and phosphatidylinositol 3,4,5-triphosphate (PIP3)] (Table 2) (8, 16). Each independent sample in this data set consists of quantitative amounts of each of the 11 phosphorylated molecules, simultaneously measured from single cells [data sets are downloadable (8)]. For purposes of illustration, examples of actual fluorescence-activated cell sorter (FACS) data plotted in prospective corelationship form are shown in fig. S1. In most cases, this reflects the activation state of the kinases monitored, or in the cases of PIP3 and PIP2, the levels of these secondary messenger molecules in primary cells, under the condition measured. Nine stimulatory or inhibitory interventional conditions were used (Table 1) (8). The complete data sets were analyzed with the Bayesian network structure inference algorithm (6, 9, 17).

Fig. 2.

Classic signaling network and points of intervention. This is a graphical illustration of the conventionally accepted signaling molecule interactions, the events measured, and the points of intervention by small-molecule inhibitors. Signaling nodes in color were measured directly. Signaling nodes in gray were not measured, but are presented to place the signaling nodes that were measured within contextual cellular pathways. The interventions classified as activators are colored green and inhibitors are colored red. Intervention site of action is indicated in the figure. Arcs are used to illustrate connections between signaling molecules; in some cases, the connections may be indirect and may involve specific phosphorylation sites of the signaling molecules (see Table 3 for details of these connections). This figure contains a synopsis of signaling in mammalian cells and is not representative of all cell types, with inositol signaling corelationships being particularly complex.

Table 1.

Known biological effects of perturbations employed. The left-hand column lists the specific reagents used in each perturbation condition, and the right-hand column classifies the reagent class into either a general perturbation that overall stimulated the cell or a specific perturbation that acted on a defined set of molecules. The conditions used in the study were as follows: (i) anti-CD3 + anti-CD28, (ii) anti-CD3/CD28 + ICAM-2 (intercellular adhesion molecule–2), (iii) anti-CD3/CD28 + U0126, (iv) anti-CD3/CD28 + AKT inhibitor, (v) anti-CD3/CD28 + G06976, (vi) anti-CD3/CD28 + psitectorigenin, (vii) anti-CD3/CD28 + LY294002, (viii) phorbol 12-myristate 13-acetate (PMA), and (ix) β2 cyclic adenosine 3′,5′-monophosphate (β2cAMP).

Reagent Reagent class
Anti-CD3/CD28 General perturbation: Activates T cells and induces proliferation and cytokine production. Induced signaling through the T cell receptor (TCR), activated ZAP70, Lck, PLC-γ, Raf, Mek, Erk, and PKC. The TCR signaling converges on transcription factors NFκB, NFAT, and AP-1 to initiate IL-2 transcription.
ICAM-2 General perturbation: Induces LFA-1 signaling and contributes to CD3/CD28 signaling that converges on AP-1 and NFAT transcriptional activity.
β2cAMP Specific perturbation: cAMP analog that activates PKA. PKA can regulate NFAT activation and T cell commitment processes.
AKT inhibitor Specific perturbation: Binds inositol pleckstrin domain of AKT and blocks AKT translocation to the membrane where normally AKT becomes phosphorylated and active [median inhibitory concentration (IC50) = 5 mM]. Inhibition of AKT and phosphorylation of AKT substrates are needed to enhance cell survival.
U0126 Specific perturbation: Inhibits MEK1 (IC50 = 72 nm) and MEK2 (IC50 = 58 nm) in a noncompetitive manner (ATP and Erk substrates). Inhibits activation of Erk, arresting T cell proliferation and cytokine synthesis.
PMA Specific perturbation: PMA activates PKC and initiates some aspects of T cell activation.
G06976 Specific perturbation: Inhibits PKC isozymes (IC50 < 8 nM). Inhibits PKC and arrests T cell activation.
Psitectorigenin Specific perturbation: Inhibits phosphoinositide hydrolysis. Inhibits PIP2 production and disrupts phosphoinositol turnover.
LY294002 Specific perturbation: Phosphatidylinosital 3-kinase (PI3K inhibitor. Inhibits PI3K and subsequent activation of AKT.
Table 2.

Nodes measured in pathway and specificity antibodies used. The left-hand column shows target molecules measured in this study that were assayed using monoclonal antibody to the target residues (site of phosphorylation or phosphorylated product as described) (16).

Measured molecule Antibody specificity
Raf Phosphorylation at S259
Erk1 and Erk2 Phosphorylation at T202 and Y204
p38 Phosphorylation at T180 and Y182
Jnk Phosphorylation at T183 and Y185
AKT Phosphorylation at S473
Mek1 and Mek2 Phosphorylation at S217 and S221
PKA substrates Detects proteins and peptides containing a phospho-Ser/Thr residue with arginine at the -3 position
PKC Detects phosphorylated PKC-α, -βI, -βII, -δ, -ϵ, -η, and -θ isoforms only at C-terminal residue homologous to S660 of PKC-βII
PLC-γ Phosphorylation at Y783
PIP2 Detects PIP2
PIP3 Detects PIP3

A high-accuracy human primary T cell signaling causality map. The resulting de novo causal network model was inferred (Fig. 3A) with 17 high-confidence causal arcs between various components. To evaluate the validity of this model, we compared the model arcs (and absent potential arcs) with those described in the literature. Arcs were categorized as the following: (i) expected, for connections well-established in the literature that have been demonstrated under numerous conditions in multiple model systems; (ii) reported, for connections that are not well known, but for which we were able to find at least one literature citation; and (iii) missing, which indicates an expected connection that our Bayesian network analysis failed to find. Of the 17 arcs in our model, 15 were expected, all 17 were either expected or reported, and 3 were missed (Fig. 3A and table S1) (8, 1822). Table 3 enumerates the probable paths of influence corresponding to model arcs determined by surveying published reports.

Fig. 3.

Bayesian network inference results. (A) Network inferred from flow cytometry data represents expected outcomes. This network represents a model average from 500 high-scoring results. High-confidence arcs, appearing in at least 85% of the networks, are shown. For clarity, the names of the molecules are used to represent the measured phosphorylation sites (Table 2). (B) Inferred network demonstrates several features of Bayesian networks. (a) Arcs in the network may correspond to direct events or (b) indirect influences. (c) When intermediate molecules are measured in the data set, indirect influences rarely appear as an additional arc. No additional arc is added between Raf and Erk because the dependence between Raf and Erk is dismissed by the connection between Raf and Mek, and between Mek and Erk (for instance, see Fig. 1C). (d) Connections in the model contain phosphorylation site–specificity information. Because Raf phosphorylation on S497 and S499 was not measured in our data set, the connection between PKC and the measured Raf phosphorylation site (S259) is indirect, likely proceeding via Ras. The connection between PKC and the undetected Raf phosphorylation on S497 and S499 is seen as an arc between PKC and Mek.

Table 3.

Possible molecular pathways of influence represented by arcs in the model. Shown are the possible pathways of influence inferred from the data, with the connection shown in Fig. 3A and the unmeasured molecules (in bold) that might mediate indirect influences. E, expected; R, reported. See main text for further discussion. Specific phosphorylation sites are included as subscripts. See table S1 for citations that support the inferences.

Connection Influence path Type Category
PKC→Raf PKC→Ras→RafS259 Indirect E
PKC→Mek PKC→RafS497/S499→Mek Indirect E
PKC→Jnk PKC→→MKKs→Jnk Indirect E
PKC→p38 PKC→→MKKs→p38 Indirect E
PKA→Raf PKA→RafS259 Direct E
PKA→Mek PKA→RafS621→Mek Indirect E
PKA→Erk PKA→HePTP→Erk Indirect E
PKA→Jnk PKA→→MKKs→Jnk Indirect E
PKA→p38 PKA→→MKKs→p38 Indirect E
Raf→Mek Direct phosphorylation Direct E
PKA→Akt PKA→CaMKK→AktT308→AktS473 Indirect E
Mek→Erk Direct phosphorylation Direct E
Plc-γ→PIP2 Direct hydrolysis to IP3 Direct E
Plc-γ→PIP3 Recruitment leading to phosphorylation Reversed E
PIP3→PIP2 Precursor-product E
Erk→Akt Direct or indirect R

Several of the known connections from our model are direct enzyme-substrate relationships (Fig. 3B) (PKA to Raf, Raf to Mek, Mek to Erk, and Plc-γ to PIP2), and one has a relationship of recruitment leading to phosphorylation (Plc-γ to PIP3). In almost all cases, the direction of causal influence was correctly inferred (an exception was Plc-γ to PIP3, in which case the arc was inferred in the reverse direction). All the influences are contained within one global model; thus, the causal direction of arcs is often compelled so that these are consistent with other components in the model. These global constraints allowed detection of certain causal influences from molecules that were not perturbed in our assay. For instance, although Raf was not perturbed in any of the measured conditions, the method correctly inferred a directed arc from Raf to Mek, which was expected for the well-characterized Raf-Mek-Erk signal transduction pathway. In some cases, the influence of one molecule on another was mediated by intermediate molecules that were not measured in the data set. In the results, these indirect connections were detected as well (Fig. 3B, panel b). For example, the influence of PKA and PKC on the MAPKs p38 and Jnk likely proceeded via their respective (unmeasured) MAPK kinase kinases. Thus, unlike some other approaches used to elucidate signaling networks [for example, protein-protein interaction maps (23, 24)] that provide static biochemical association maps with no causal links, our Bayesian network method can detect both direct and indirect causal connections and therefore provide a more contextual picture of the signaling network.

Another feature demonstrated in our model is the ability to dismiss connections that are already explained by other network arcs (Fig. 3B, panel c). This is seen in the Raf-Mek-Erk cascade. Erk, also known as p44/42, is downstream of Raf and therefore dependent on Raf, yet no arc appears from Raf to Erk, because the connection from Raf to Mek and the connection from Mek to Erk explain the dependence of Erk on Raf. Thus, an indirect arc should appear only when one or more intermediate molecules is not present in the data set, otherwise the connection will proceed via this molecule. The intervening molecule may also be a shared parent. For example, the phosphorylation statuses of p38 and Jnk are correlated (fig. S2), yet they are not directly connected, because their shared parents (PKC and PKA) mediate the dependence between them. Although we cannot know whether an arc in our model represents a direct or indirect influence, it is unlikely that our model contains an indirect arc that is mediated by any molecule observed in our measurements. Correlation exists between most molecule pairs in this data set [per Bonferroni corrected P value (fig. S2)], which can occur with closely connected pathways. Therefore, the relative lack of arcs in our model (Fig. 3A) contributed greatly to the accuracy and interpretability of the inferred model.

A more complex example is the influence of PKC on Mek, which is known to be mediated by Raf (Fig. 3B, panel d). PKC is known to affect Mek through two paths of influence, each mediated by a different active phosphorylated form of the protein Raf. Although PKC phosphorylates Raf directly at S499 and S497, this event is not detected by our measurements, because we use only an antibody specific to Raf phosphorylation at S259 (Table 2) (16). Therefore, our algorithm detects an indirect arc from PKC to Mek that is mediated by the presumed unmeasured intermediate Raf phosphorylated at S497 and S499 (18). The PKC-to-Raf arc represents an indirect influence that proceeds via an unmeasured molecule, presumed to be Ras (19, 20). We discussed above the ability of our approach to dismiss redundant arcs. In this case, there are two paths leading from PKC to Mek, because each path corresponds to a separate means of influence from PKC to Mek: one via Raf phosphorylated at S259 and the other through Raf phosphorylated at S497 and S499. Thus, neither path is redundant. This result demonstrates the distinction that this analysis is sensitive to specific phosphorylation sites on molecules and is capable of detecting more than one route of influence between molecules.

Three well-established influence connections do not appear in our model: PIP2 to PKC, PLC-γ to PKC, and PIP3 to Akt. Bayesian networks are constrained to be acyclic, so if the underlying network contains feedback loops, we cannot necessarily expect to uncover all connections (fig. S3). Availability of suitable temporal data could possibly permit this limitation to be overcome using dynamic Bayesian networks (25, 26).

Experimental confirmation of predicted network causality. Two influence connections in our model are not well established in the literature: PKC on PKA and Erk on Akt. To probe the validity of these proposed causal influences, we searched for reports in the literature. Both connections have previously been reported: the PKC-to-PKA connection in rat ventricular myocytes and the Erk-to-Akt connection in colon cancer cell lines (21, 22). An important goal of our work was to test the ability of Bayesian network analysis of flow cytometry data to correctly infer causal influences from unperturbed molecules within a network. For example, Erk was not directly acted on by any activator or inhibitor in the sample sets, yet Erk showed an influence connection to Akt. Our model thus predicts that direct perturbation of Erk would influence Akt (Fig. 4A). On the other hand, although Erk and PKA are correlated (fig. S2), the model predicts that perturbation of Erk should not influence PKA.

Fig. 4.

Validation of model prediction. (A) The model predicts that an intervention on Erk will affect Akt, but not PKA. (B) To test the predicted relationships, Erk1 and Erk2 were inhibited using siRNA in cells stimulated with antibody to CD3 (anti-CD3) and anti-CD28. (C) Amounts of Akt phosphorylation in transfected CD4+ cells [enhanced green fluorescent protein (EGFP+) cells] were assessed, and amounts of phosphorylated PKA are included as a negative control. When Erk1expression is inhibited, phosphorylated Akt is reduced to amounts similar to those in unstimulated cells, confirming our prediction (P = 0.000094). PKA is unaffected (P = 0.28).

As a test of these predictions (Fig. 3A), we used small interfering RNA (siRNA) inhibition of either Erk1 or Erk2, and the amounts of S473-phosphorylated Akt and phosphorylated PKA were then measured. In accord with the model predictions, Akt (P < 9.4 × 10–5) phosphorylation was reduced after siRNA inhibition of Erk1 but the activity of PKA (P < 0.28) was not (Fig. 3, B and C). Akt phosphorylation was not affected by the inhibition of Erk2. The connection between Erk1 and Akt may be direct or indirect, involving mediatory molecules yet to be understood, but the connection is supported by both the model and the validation experiment.

Enablers of accurate inference: network interventions and sufficient numbers of single cells. Three features distinguish our data from the majority of currently attainable biological data sets. First, we simultaneously measured multiple protein states in individual cells, eliminating population-averaging effects that could obscure interesting correlations. Second, because the measurements were on single cells, thousands of data points were collected in each experiment. This feature constitutes a tremendous asset for Bayesian network modeling, because the large number of observations allows for accurate assessment of underlying probabilistic relationships, and therefore allows for the extraction of complex relationships from noisy data. Third, interventional assays generated hundreds of individual data points per intervention (because flow cytometry measures single cells in population), allowing for an increase in inferences of causality. To evaluate the importance of these features, we created the following variations on our original data set: (i) an observation-only data set (that is, without any interventional data) of 1200 data points; (ii) a population-averaged (that is, a simulated Western blot) data set; and (iii) a truncated individual-cell data set of size comparable to the simulated Western blot data set (that is, the original data set with most of the data randomly excluded to reduce its size) (8).

Bayesian network inference was performed on each set of data. The network inferred from 1200 observational data points included only 10 arcs, all undirected, of which 8 were expected or reported, and 10 arcs were missing (fig. S4A). This result demonstrates that interventions are critical for effective inference, particularly to establish directionality of the connections (Fig. 1B). The truncated single-cell data set (420 data points) shows a large (11-arc) decline in accuracy, missing more connections and reporting more unexplained arcs than its larger (5400 data points) counterpart (fig. S4B). This result emphasizes the importance of sufficiently large data set size in network inference. The network inferred from averaged data (fig. S4C) shows a further five-arc decline in accuracy relative to that inferred from an equal number of single-cell data points, emphasizing the importance of single-cell data. The fact that population averaging destroys some of the signals present in the data may reflect the presence of heterogeneous cellular subsets that are masked by averaging techniques.

Discussion and summary. As shown, we correctly reverse-engineered and rapidly inferred the basic structure of a classically understood signaling network that connects a number of key phosphorylated proteins in human T cell signaling, a map built by classical biochemistry and genetic analysis over the past two decades. The network was automatically constructed with no a priori knowledge of pathway connectivity. The application of Bayesian networks to single-cell flow cytometry has distinct advantages, including an ability to measure events in primary cells after in vivo interventions (thus measuring context-specific signaling biology in tissues), inference of directed arcs and causality therein, and the ability to detect indirect as well as direct connections. This latter point is a powerful feature when the known list of participating molecules may not be exhaustive, and can be especially important when networks are used to assess the effects of system perturbations (as in a pharmaceutical context). A limiting step in the experiment is the availability of suitable reagents; currently, there are about 80 antibodies to phosphorylated molecules that are compatible with flow cytometry, but this number is expected to rapidly increase (27, 28).

Application of this approach to other sets of molecules, cell types, disease states, and interventions (for example, siRNA and dominant negative screens, or pharmaceutical agents) should enhance our understanding of signaling networks, especially with respect to complex nonlinear cross-talk between pathways. Another important experimental issue that this approach can address is the differences among specific primary cell types and cell subpopulations. The traditional understanding of pathway structures as collated from diverse model cell types and organisms demonstrates the essential congruity of basic signaling networks, but does not easily reveal the subtle differences that exist in different primary cell subtypes. It is now possible to appreciate pathway intricacies in primary cell subsets, including those with previously uncharacterized signaling molecules. The application of this approach during biochemical interrogation of cellular subset–specific signaling networks in the course of a disease state or in the presence of pharmaceutical agents can potentially provide important mechanistic information of clinical relevance. For example, this method could identify sets of signaling molecules that explain differences between responses to chemotherapy in patients with cancer (15).

Concerning the computational aspect, a key advantage of Bayesian networks is that they are relatively robust to the existence of unobserved variables; for example, their ability to detect indirect influences via unmeasured molecules. At the forefront of Bayesian network research is the development of methods to automatically infer the existence and location of such hidden variables. Although our results were restricted to 11 phosphomolecular measurements per cell, the number of simultaneous parameters measured by flow cytometry is steadily growing (27, 28). As measurement systems improve, and the ability to readily and accurately measure greater numbers of internal signaling events increases, additional opportunities to discover novel influences and pathway structures become possible.

One of the caveats in the use of Bayesian networks for the elucidation of signaling pathways is that they are restricted to be acyclic, whereas signaling pathways are known to be rich in feedback loops. Indeed, our inference missed three classic arcs, most likely for this reason. Given time series data, dynamic Bayesian networks could potentially capture these feedback loops. To measure the amounts of internal phosphorylated proteins, the cells must be fixed. Therefore, continuous, real-time, simultaneous, multiparameter, single-cell time-series data cannot be collected with the current technology. Because Bayesian networks belong to a more general class of probabilistic graphical models, within the formalism of these models it is possible to develop a model that could handle feedback loops, given a series of static time points using the current technology.

Although there is much to be developed both computationally and experimentally, by extending the concepts derived here it is clear that simultaneous multivariate analysis of biological states in multiple discrete entities, such as cells, offers a useful approach for rapidly deriving signaling network hierarchies and structures. Extension of this approach to biological systems involving multiple cell populations, such as solid tissues and organs, or whole-animal studies such as in whole-body fluorescence imaging of phosphorylation states in staged Caenorhabditis elegans or Drosophila larva, or thin-slice tissue sections from mammalian organs, could allow automated construction of signaling network influences not only within but also across cell boundaries in an increasing number of physiological contexts.

Supporting Online Material

Materials and Methods

Figs. S1 to S4

Table S1

Data Sets S1 to S14


References and Notes

View Abstract

Navigate This Article