Perspective: Systems Biology

Systems biology (un)certainties


Science  23 Oct 2015:
Vol. 350, Issue 6259, pp. 386-388
DOI: 10.1126/science.aac9505

Systems biology, some have claimed (1), attempts the impossible and is doomed to fail. Possible definitions abound, but systems biology is widely understood to be an approach for studying the behavior of systems of interacting biological components that combines experiments with computational and mathematical reasoning. Modeling complex systems occurs throughout the sciences, so it may not be immediately clear why it should attract greater skepticism in molecular and cell biology than in other scientific disciplines. The way in which biological models are often presented and interpreted (and overinterpreted) may be partly to blame. As with experimental results, the key to successfully reporting a mathematical model is to provide an honest appraisal and representation of uncertainty in the model's predictions, parameters, and (where appropriate) in the structure of the model itself.

Deriving mathematical models in biology is rarely straightforward. Although biology is, of course, subject to the same fundamental physical laws (for example, conservation of mass, energy, and momentum) as the other sciences, these laws often do not provide a good starting point for understanding how biological organisms and systems work. Biological modeling is instead an example of the so-called inverse problem (1): it emphasizes context-specific levels of abstraction and relies upon experimental observations to decide whether a particular model is useful.

Typically, not all terms or parameters in a biological model are known or observable directly, and, except for some highly specific systems, the abundances of all the key players (molecules, cells, or individuals) cannot be measured simultaneously and continuously. Thus the challenge, although often overlooked, is not only to identify suitable mathematical descriptions of biological systems (such as gene regulatory networks) and mathematical representations of such systems that provide mechanistic insight, but also to communicate the inevitable uncertainty in the model's (possibly many) unknowns. Although replacing these unknowns with estimates obtained from exogenous experimental assays that are related to the biological system in question has been advocated (2), this is usually fraught with problems: It is often impossible to perform all of the experiments needed to measure missing parameters directly; frequently these measurements are themselves subject to considerable uncertainty, or can only be made under very different conditions; and the approach misses an opportunity to extract this information from the original, and potentially most relevant, experimental data. In this context, Bayesian inference procedures (3), which naturally permit the integration of external prior knowledge or beliefs with newly observed data, may provide the most natural framework for expressing and reporting uncertainty, and for this reason have become increasingly popular in systems biology. For example, a recent application provides insight into a debated mechanism for the way in which organisms may ameliorate dangerous damage to mitochondrial DNA (3).
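The way Bayesian inference combines prior knowledge with new data, and reports the remaining uncertainty, can be illustrated with a deliberately simple sketch. It assumes a single unknown parameter, a normal prior, and normally distributed measurement noise with known standard deviation. All numbers and function names are hypothetical and are not taken from the studies cited above.

```python
import math

def normal_posterior(prior_mean, prior_sd, observations, noise_sd):
    """Conjugate update for the mean of a normal likelihood with known noise_sd."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = len(observations) / noise_sd ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_mean * prior_precision
                 + sum(observations) / noise_sd ** 2) / post_precision
    return post_mean, math.sqrt(1.0 / post_precision)

def credible_interval(mean, sd, z=1.96):
    """Central interval covering roughly 95% of a normal posterior."""
    return mean - z * sd, mean + z * sd

# Hypothetical prior belief about a rate parameter (e.g., from an external assay).
prior_mean, prior_sd = 0.5, 0.2
# Hypothetical new measurements of the same quantity in the system of interest.
observations = [0.81, 0.74, 0.90, 0.78]
noise_sd = 0.1  # assumed known measurement noise

post_mean, post_sd = normal_posterior(prior_mean, prior_sd, observations, noise_sd)
low, high = credible_interval(post_mean, post_sd)
print(f"posterior mean {post_mean:.3f}, 95% interval ({low:.3f}, {high:.3f})")
```

The posterior mean lies between the prior mean and the sample mean, and the posterior standard deviation is smaller than the prior one. Reporting the interval rather than a single point estimate is the step that conveys the uncertainty.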

The number of unknowns in a model is partly determined by the scale of the system being studied. A biological system may range in scale from a few interacting molecules to whole populations of organisms, and this can have a huge impact on both the modeling approach and the associated assessment of uncertainty (see the figure). For small systems, it might be possible to rely solely on strong prior knowledge and specify a model structure that reflects known interactions. For larger systems, automated “network inference” algorithms have been employed (4, 5). These aim to identify statistical dependencies, such as those between messenger RNA expression measurements, to highlight potential (co)regulatory interactions. But here, too, information about known transcription factor targets can be used as prior information to guide network inference. These data-driven, hypothesis-generating approaches for creating a biological model from data alone are often opposed, because as the biological systems in question become larger and more complex, it becomes increasingly infeasible to learn the “true” and complete underlying network (1). However, this opposition seems to be predicated upon misconceptions of the aims of large-scale network inference, which are typically to highlight and explore dependencies in data sets, and thereby help to generate new hypotheses worthy of further investigation, rather than to uncover a single grand unifying model of the system. Unfortunately, not all network inference algorithms are equally effective, and the most popular approach, using the correlation among expression levels of genes (6), provides a particularly poor basis for further studies, let alone for mechanistic models. Several alternative network inference approaches exist (4, 5) that provide better, more robust candidate networks, and can incorporate expert or domain knowledge. These methods may be used to investigate gene-regulatory relationships, and how interactions between genes change over time or differ between disease cases and healthy controls.
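One reason plain correlation can mislead is that it also links genes that interact only indirectly. The toy simulation below assumes a hypothetical linear chain A → B → C with Gaussian noise. Partial correlation is used here only as a simple stand-in for a dependency measure that conditions on other genes; it is not one of the specific methods cited above.

```python
import math
import random

def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def partial_corr(x, y, z):
    """Correlation between x and y after linearly accounting for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_samples = 2000

# Hypothetical chain: A drives B, B drives C; A never acts on C directly.
gene_a = [rng.gauss(0, 1) for _ in range(n_samples)]
gene_b = [a + rng.gauss(0, 0.5) for a in gene_a]
gene_c = [b + rng.gauss(0, 0.5) for b in gene_b]

r_ac = pearson(gene_a, gene_c)
r_ac_given_b = partial_corr(gene_a, gene_c, gene_b)
print(f"corr(A, C) = {r_ac:.2f}; partial corr(A, C | B) = {r_ac_given_b:.2f}")
```

In this constructed case A and C are strongly correlated although they share no direct interaction, and the dependence largely vanishes once B is accounted for. A correlation-only network would therefore draw an A–C edge that a mechanistic model should not contain.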

Figure: Abstraction and simplification.

Mathematical models usually represent elements of a biological system at one scale. Choosing the appropriate degree of abstraction and simplification can be influenced by current knowledge about the system, the quality and quantity of experimental data, the computational demands of a particular modeling approach, and the modeling aims. For example, when studying a signaling pathway, the scope of the model may be restricted to a single pathway or it might include the influence of parts of a wider interconnected network of signaling pathways.


Although large-scale grand unifying biological models have occasionally been sought (for example, whole-cell models, which are designed to be comprehensive models of cells that are expressed in terms of each cell's molecular components) (7), these efforts remain the exception and only exist in early draft forms. Many challenges remain for models of such scope, including how to validate their quality and adequately report the uncertainty in both their overall global structure and their implied submodels. Whatever the scale of the system and data set, many different models are likely to provide plausible fits, while still remaining consistent with current knowledge. Even for a small-scale (five-gene) regulatory network, it can be possible to find tens of thousands of models that provide qualitatively perfect fits even to dense, low-noise data sets, but that yield a variety of (often contradictory) insights into the regulatory relationships between genes (8). Assessing robustness of predictions and inferences across multiple alternative models can therefore be illuminating (8–10), provided the conclusions are still understood to be dependent on the particular set of models that are specified, and influenced by the experiments that are chosen (11, 12).

Models should not be expected to work in every conceivable context, and the most exciting results are frequently those that break existing models. Model uncertainty, as characterized by the ability to identify multiple models that explain current observations, therefore calls for the design of maximally discriminative experiments that will break as many models as possible (12, 13). For example, dose-response curves typically provide too little discriminatory power, whereas carefully designed time-resolved analyses allow the study of even complicated models, such as proteasomal dynamics (14).
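The difference in discriminatory power between the two experiment types can be made concrete with a constructed example. The sketch assumes two hypothetical models, a one-step activation and a two-step cascade, that share the same saturating steady state and therefore the same dose-response curve. The closed-form solutions below hold for zero initial conditions and equal rate constants; neither model is taken from the cited studies.

```python
import math

def saturation(dose, half_max=1.0):
    """Steady-state response shared by both hypothetical models."""
    return dose / (half_max + dose)

def one_step(dose, t, rate=1.0):
    """Solution of dx/dt = rate * (saturation(dose) - x), with x(0) = 0."""
    return saturation(dose) * (1.0 - math.exp(-rate * t))

def two_step(dose, t, rate=1.0):
    """Output x of dy/dt = rate * (saturation(dose) - y), dx/dt = rate * (y - x),
    with y(0) = x(0) = 0."""
    kt = rate * t
    return saturation(dose) * (1.0 - math.exp(-kt) * (1.0 + kt))

# Dose-response experiment: measure only at a late time, across several doses.
doses = [0.1, 0.3, 1.0, 3.0, 10.0]
late_time = 50.0
dose_response_gap = max(abs(one_step(d, late_time) - two_step(d, late_time))
                        for d in doses)

# Time-resolved experiment: fix one dose, sample the approach to steady state.
times = [0.25 * i for i in range(1, 21)]
time_course_gap = max(abs(one_step(1.0, t) - two_step(1.0, t)) for t in times)

print(f"largest dose-response difference: {dose_response_gap:.2e}")
print(f"largest time-course difference:   {time_course_gap:.3f}")
```

Here the late-time dose-response measurements are numerically indistinguishable between the two models, whereas the time course separates them clearly. Real measurement noise would blur the second difference, so the sampling times would still have to be chosen with care.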

Many of the criticisms of model development in systems biology stem from a lack of appreciation of the variety of roles that can be played by mathematical modeling (1). This is partly driven by overstating the implications and consequences of models, perhaps due to poor understanding of the power of modern statistical approaches and machine learning. Overstating can also be attributed to a lack of understanding of the value of reporting uncertainties. Frequent fallacies and bad practices continue to thrive, including the use of correlation to capture causal relationships; failure to address the multiple comparison problem (for example, when testing for the presence of potential regulatory interactions among all pairs of genes, one cannot apply standard hypothesis testing, which would result in excessive numbers of false positives; to prevent this, the threshold for significance must be adjusted to account for the large number of tests carried out); and a lack of error bars or more general confidence sets/intervals (8) for parameters and models.
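The scale of the multiple comparison problem for pairwise gene tests is easy to quantify. The sketch below uses a hypothetical 1000-gene data set and two standard adjustments, the Bonferroni correction and the Benjamini-Hochberg step-up procedure, as examples of how a significance threshold can be adapted. The text above does not prescribe either method, and the p-values are made up for illustration.

```python
def n_gene_pairs(n_genes):
    """Number of unordered gene pairs, i.e., pairwise tests to run."""
    return n_genes * (n_genes - 1) // 2

def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that bounds the family-wise error rate by alpha."""
    return alpha / n_tests

def benjamini_hochberg(p_values, fdr=0.05):
    """Indices of tests rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank  # keep the largest rank that passes
    return sorted(order[:cutoff_rank])

n_tests = n_gene_pairs(1000)
# If no pair truly interacts, an unadjusted 0.05 threshold still flags about 5%.
expected_false_positives = 0.05 * n_tests
threshold = bonferroni_threshold(0.05, n_tests)
print(f"{n_tests} tests, ~{expected_false_positives:.0f} false positives expected")
print(f"Bonferroni per-test threshold: {threshold:.1e}")

# Made-up p-values for six candidate interactions.
p_values = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
naive_hits = [i for i, p in enumerate(p_values) if p <= 0.05]
adjusted_hits = benjamini_hochberg(p_values)
print(f"unadjusted: {naive_hits}; Benjamini-Hochberg: {adjusted_hits}")
```

Bonferroni is the more conservative of the two, while Benjamini-Hochberg controls the expected fraction of false discoveries instead of the chance of any false positive. Which is appropriate depends on how costly a false lead is in the follow-up experiments.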

Models are simplified (but not simplistic) representations of real systems, and this is precisely the property that makes them attractive to explore the consequences of our assumptions, and to identify where we lack understanding of the principles governing a biological system. Models are tools to uncover mechanisms that cannot be directly observed, akin to microscopes or nuclear magnetic resonance machines (15). Used and interpreted appropriately, with due attention paid to inherent uncertainties, the mathematical and computational modeling of biological systems allows the exploration of hypotheses. But the relevance of these models depends on the ability to assess, communicate, and, ultimately, understand their uncertainties.

* All authors contributed equally to this work.

