Special Reviews

Reverse Engineering of Biological Complexity


Science  01 Mar 2002:
Vol. 295, Issue 5560, pp. 1664-1669
DOI: 10.1126/science.1069981


Advanced technologies and biology have extremely different physical implementations, but they are far more alike in systems-level organization than is widely appreciated. Convergent evolution in both domains produces modular architectures that are composed of elaborate hierarchies of protocols and layers of feedback regulation, are driven by demand for robustness to uncertain environments, and use often imprecise components. This complexity may be largely hidden in idealized laboratory settings and in normal operation, becoming conspicuous only when contributing to rare cascading failures. These puzzling and paradoxical features are neither accidental nor artificial, but derive from a deep and necessary interplay between complexity and robustness, modularity, feedback, and fragility. This review describes insights from engineering theory and practice that can shed some light on biological complexity.

The theory and practice of complex engineering systems have progressed so radically that they often embody Arthur C. Clarke's dictum, “Any sufficiently advanced technology is indistinguishable from magic.” Systems-level approaches in biology have a long history (1, 2) but are just now receiving renewed mainstream attention (3–13), whereas systems-level design has consistently been at the core of modern engineering, motivating its most sophisticated theories in controls, information, and computation. The hidden nature of complexity (“magic”) and discipline fragmentation within engineering have been barriers to a dialog with biology. A key starting point in developing a conceptual and theoretical bridge to biology is robustness, the preservation of particular characteristics despite uncertainty in components or the environment (14).

Biologists and biophysicists new to studying complex networks often express surprise at a biological network's apparent robustness (15). They find that “perfect adaptation” and homeostatic regulation are robust properties of networks (16,17), despite “exploratory mechanisms” that can seem gratuitously uncertain (18–20). Some even conclude that these mechanisms and their resulting features seem absent in engineering (20, 21). However, ironically, it is in the nature of their robustness and complexity that biology and advanced engineering are most alike (22). Good design in both cases (e.g., cells and bodies, cars and airplanes) means that users are largely unaware of hidden complexities, except through system failures. Furthermore, the robustness and fragility features of complex systems are both shared and necessary. Although the need for universal principles of complexity and corresponding mathematical tools is widely recognized (23), sharp differences arise as to what is fundamental about complexity and what mathematics is needed (24). This article sketches one possible view, using experience and theoretical insights from engineering complexity that are relevant to biology. We hope to dispel some common misconceptions and to renew a dialog between engineering theorists and their biologist and clinician colleagues.

Complexity, Optimality, and Convergent Evolution

The differences between biology and technology (and between organisms) are obvious, particularly at the molecular and device level. Nevertheless, convergent evolution, a well-established concept in both engineering and evolutionary biology, yields remarkable similarities at higher levels of organization. Recently, engineering systems have begun to have almost biological levels of complexity. For example, a Boeing 777 is fully “fly-by-wire” with 150,000 different subsystem modules, organized via elaborate protocols into complex control systems and networks, including roughly 1000 computers that can automate all vehicle functions. In terms of cost and complexity, the 777 is essentially a vast control system and computer network that just happens to fly. The consequence of good design is that its regulatory complexity is hidden from passengers (except when they use entertainment systems). The internal activity level is staggering, however (e.g., the data rate recorded on the internal state during final production testing is nearly equivalent to one human genome every minute). Commercial aircraft are not the only systems undergoing such explosions in complexity as a result of advanced controls and embedded networking; virtually all technologies are evolving similarly (25). We claim that this technological evolution of complexity is convergent with that of biology.

A striking example of convergent evolution is Fig. 1, comparing cruise speed to mass M over 12 orders of magnitude, from the 747 and 777 to fruit flies (26). The essential assumption in allometric scaling theory is that convergent evolution leads to nearly optimal systems with similar gross characteristics. It follows that simple arguments based on optimal design can explain functional relations between variables across many scales (27, 28). Here, a well-known elementary argument (29) shows good correspondence with the data and yields explanations for deviations. The popular allometric scaling theories (connecting, say, efficiency and geometry) are appealing: They are simple, accessible, suggestive evidence confirming convergent evolution and engineering optimality. Such theories are largely irrelevant to complexity directly, but an understanding of them leads to what is relevant. The scaling theory described by Fig. 1 does not distinguish between flight in the atmosphere and in a laboratory wind tunnel. In the latter context, a much simpler “mutant” 777 with nearly all of its 150,000-count “aeronome” knocked out would have roughly the same lift, mass, and cruise speed, and thus (from an allometric scaling viewpoint) would exhibit no deleterious laboratory “phenotype.” Redundancy does not explain this finding (30). Rather, the mutant has lost control systems and robustness required for real flight outside the lab. Allometric scaling emphasizes the essential similarities between these 777 variants and a toy scale model (and a fruit fly), whereas our interest is their huge differences in complexity. Similarly, minimal cellular life requires a few hundred genes (31), yet even Escherichia coli have ∼4000 genes, less than 300 of which have been classified as “essential” (32). The likely reason for this “excess” complexity is also the presence of complex regulatory networks for robustness. In technology as well as in organisms, such robustness tradeoffs drive the evolution of spiraling complexity.

Figure 1

Optimal cruise speed at sea level versus mass (log-log) for organisms and airplanes. Line is theoretical prediction (12) with V = cM^α and α = 1/6 (29). Shorter wings for speed and maneuverability (triangles) yield higher cruise speeds than those optimized for soaring (diamonds). Most systems (circles) are compromises. Humans are not selected for powering flight and are far from optimal (square). Data and theory are from (26).
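As a rough illustration of how weakly cruise speed depends on mass under the power law V = cM^α with α = 1/6 quoted in the Fig. 1 caption, the short sketch below computes the speed ratio implied by a given mass ratio. The function name is hypothetical and chosen for this example only; the constant c cancels in the ratio, so it is not needed.

```python
def cruise_speed_ratio(mass_ratio, alpha=1.0 / 6.0):
    """Speed ratio V2/V1 implied by V = c * M**alpha for a mass ratio M2/M1.

    The prefactor c cancels, so only the exponent alpha matters.
    """
    return mass_ratio ** alpha


if __name__ == "__main__":
    # Twelve orders of magnitude in mass (roughly fruit fly to airliner).
    print(cruise_speed_ratio(1e12))
```

Under this scaling, the 12 orders of magnitude in mass spanned by Fig. 1 correspond to only about two orders of magnitude in cruise speed, since (10^12)^(1/6) = 10^2.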

As an example of spiraling complexity to battle fragility, consider our use of O2 as a nutrient (electron acceptor), which obligates us to use complex feedback control mechanisms to ensure both sufficient O2 and protection from O2 toxicity. Distributed, multiscaled networks maintain precise internal, local O2 concentrations throughout the body, both acutely and chronically. Dependency on such regulation makes its failure lethal, of course, but an additional fragility created by this exquisitely controlled environment is that it creates an attractive ecosystem for parasites, whose systems can thus be more streamlined. Host robustness to parasites then requires a separate complex immune control system.

In the developing immune system, T cells are educated to recognize self from nonself in the thymus. They are then selected to proceed on to the periphery (positive selection) or, if they are inaccurate sensors, they self-destruct. A fragility of this exceedingly complex immune system is autoimmune disease, an example being primary biliary cirrhosis (PBC) in which self-reactive lymphocytes slip through the immunity education program. Autoimmune injury to bile ducts causes toxic bile acids to accumulate. Injured hepatocytes fail to clear hormones, contributing to increased pressures in the liver circulation and even to rupture of connected venous systems. Pressure-induced distortion of blood vessels in the spleen traps platelets, and damaged hepatocytes undersynthesize blood-clotting proteins, both exaggerating blood loss after trauma. Unable to capitalize on the usual homeostatic feedback interactions with the liver, virtually every organ—including brain and kidney—can fail, all initiated by superficially minuscule autoimmune damage.

Medical interventions for PBC (i.e., drugs and transplantation) are further control systems adding to spiraling complexity, robustness, and fragility. Genetic variation in the P450 enzyme family leads to considerable interindividual variation in drug handling and side effects. Imbalance of the P450 network can lead to accumulation of toxins, including carcinogens. Polypharmacy—a common necessity—results in even more unpredictable interactions because drugs modulate P450 activity. Liver transplantation is now standard therapy for PBC. Immunosuppression must be sufficient to quash the immune mechanism that recognizes a foreign invader, but too much immunosuppression allows infection and tumors to go unchecked. Hence, the fragilities of transplantation are infection and tumors.

Modularity and Protocols

What emerges from these examples is that spiraling complexity, feedback regulation, robustness, fragility, and cascading failures are heavily intertwined, as is well known to biologists and engineers alike. Equally important and well known is the obvious role that modularity plays at every level, from base pairs and amino acids to genes and proteins, from organelles and membranes to pathways and networks, and finally to organs and organ axes (4–8)—and in every complex process, from development (11) to evolution (18). Although their meaning varies, modules generally are components, parts, or subsystems of a larger system that contain some or all of the following features: (i) identifiable interfaces (usually involving protocols) to other modules, (ii) can be modified and evolved somewhat independently, (iii) facilitate simplified or abstract modeling, (iv) maintain some identity when isolated or rearranged, yet (v) derive additional identity from the rest of the system.

The organization and design of advanced technologies suggest universal principles, relevant to biology, linking modularity with the robust yet fragile nature of complex systems. Truly universal principles should manifest themselves in at least limited ways in scale-model (toy) systems, just as allometric scaling does. Consider the ubiquitous Lego toy system (33, 34). The signature feature of Lego is the patented snap connection for easy but stable assembly of components. The snap is the basic Lego protocol, and Lego bricks are its basic modules.

We claim that protocols are far more important to biologic complexity than are modules. They are complementary and intertwined but are important to distinguish. In everyday usage, protocols are rules designed to manage relationships and processes smoothly and effectively. If modules are ingredients, parts, components, subsystems, and players, then protocols describe the corresponding recipes, architectures, rules, interfaces, etiquettes, and codes of conduct (35). Protocols here are rules that prescribe allowed interfaces between modules, permitting system functions that could not be achieved by isolated modules. Protocols also facilitate the addition of new protocols and organization into collections of mutually supportive protocol suites. Like modules, they simplify modeling and abstraction, and as such may often be largely “in the eye of the beholder.” A good protocol is one that supplies both robustness and evolvability.

Lego exhibits multilayer robustness, from components and toys to the product line. Lego bricks and toys are reusable and robust to trauma, and the snap is versatile, permitting endless varieties of toys from an array of components. This makes both a given Lego collection and the entire toy system evolvable to changes in what one chooses to build, to the addition of new Lego-compatible parts, and to novel toy designs. Evolution here is simply robustness to (possibly large) changes on long time scales. The low cost of modules and the popularity of the system confer other forms of robustness and evolvability; lost parts are easily replaced, and enthusiasts constantly design new modules and toys. The Lego protocol also creates fragilities at every level. Superficially minuscule damage to the snap at a key interface may cause an entire toy to fail, yet noninterfacing parts of bricks may be heavily damaged with minimal impact. The success of Lego means that any new snap, even a superior one, would not be easily adopted. Selection pressures thus preserve a protocol in two ways: Protocols facilitate evolution and are difficult to change.

It is instructive to compare the robustness properties (basic performance, ability to withstand trauma, versatility of allowed interconnections, reusability of modules, cost of parts and labor, and evolvability) of the standard Lego snap protocol (called the wild type, WT) with those of other hypothetical protocols (denoted Smooth, Glue, and Mold). Smooth bricks without snaps have unconstrained interconnections, but the results are much less robust to trauma, severely limiting the range of toys. Glue, in addition to the WT snap, increases ability to withstand trauma but sharply decreases component reusability. Injection Molding entire toys goes even further. Thus, each “mutation” offers advantages, with both different robustness and fragility, but none uniformly improves on WT's overall robustness. WT is “fine-tuned” for robustness. We claim that this kind of optimality and robustness is most important to biological complexity.

As systems become more complex, protocols facilitate the layering of additional protocols, particularly involving feedback and signaling. Suppose we want to make a Lego structure incrementally more useful and versatile by “evolving” it to be (i) mobile, then (ii) motorized, then (iii) able to avoid collisions in a maze of obstacles. The first increment is easy to achieve, with Lego protocol–compatible axles and wheels. Motorizing toys involves a second increment in complexity, requiring protocols for motor and battery interconnection as well as a separate protocol for gears. All can be integrated into a motorized protocol suite to make modular subassemblies of batteries, motors, gears, axles, and wheels. These are available, inexpensive additions. The third increment increases cost and complexity by orders of magnitude, requiring layers of protocols and modules for sensing, actuation, and feedback controls plus subsidiary but essential ones for communications and computing (34). All are available, but it is here that we begin to see the true complexity of advanced technologies. Unfortunately, we also start to lose the easily described, intuitive story of the basic protocols. Minimal descriptions of advanced Lego features enabling sensing and feedback control literally fill books, but the protocols also facilitate the building of elaborate, robust toys, precisely because this complexity is largely hidden from users. This is consistent with the claim that biological complexity too is dominated not by minimal function, but by the protocols and regulatory feedback loops that provide robustness and evolvability.

This added complexity also creates new and often extreme fragilities. Removing a toy's control system might cause reversion to mere mobility, but a small change in an otherwise intact control system could cause wild, catastrophic behavior. For example, a small software bug might easily lead to collision seeking, a fragility absent in simpler toys. Similarly, large multicellular organisms are unaffected by the death of a single cell, but failure of one cell's control system can lead to fatal autoimmune diseases or cancer.

The snap protocol is concretely instantiated only in Lego modules, but it is also easy to identify the protocol itself as a useful and informative abstraction. The snap protocol is more fundamental to Lego than are any individual modules. Similarly, we have no trouble distinguishing the many higher level protocols that organize sensing and feedback from the hardware modules themselves. In biology, the identification of protocols is easiest when shared by many different modules, as in Lego. Thus, abstractions such as gene regulation (11), covalent modification, membrane potentials, metabolic and signal transduction pathways, action potentials, and even transcription-translation, the cell cycle, and DNA replication could all be reasonably described as protocols (36), with their attendant modular implementations in various activators and repressors, kinases and phosphatases, ion channels, receptors, heterotrimeric guanine nucleotide binding proteins (G proteins), and so on. The cardiovascular system has protocols for gas and nutrient exchange and transport, implemented in heart, lung, vascular networks, and blood modules. The immune system involves elaborate protocols for complement and cell-mediated activation, implemented in modules such as T cells, natural killer cells, major histocompatibility complex molecules, and antibodies. Metazoan development has highly conserved protocols (18). Appropriate temporal and spatial expression during development (11) is regulated by enormous numbers of feedback strategies (9). These and many other protocols facilitate robust development and function in ways similar to Lego protocols, and they produce similar fragilities (9).

Thinking in terms of protocols, in addition to genes, organisms, and populations, as foci of natural selection, may be a useful abstraction for understanding the evolution of complexity (37). Good protocols allow new functions to be built from existing components and allow new components to be added or to evolve from existing ones, powerfully enhancing both engineering and evolutionary “tinkering.” Protocols enable modularity and robustness but are in turn sources of fragility. Successful protocols become highly conserved because they both facilitate evolution and are difficult to change.

Lego has a perfectly complete “legome” of all parts, including full structure and function. A similar compendium is far from available for even simple organisms. Yet understanding a collision-avoiding, software-intensive, feedback-regulated Lego robot would require extensive reverse engineering of additional layers of protocols and modules beyond the legome. That the legome would not be sufficient is no surprise, but for reverse engineering such details may not be entirely necessary (see below). Imagine that such a Lego robot was a prototype for a single toy that dispensed entirely with the Lego modules in favor of custom implementation. Similar to Mold, this toy could easily have much more robustness to trauma, be faster, and navigate more complex obstacles, but at the expense of limited part reuse. The modules and lower level protocols—most of the legome—would be completely different, yet we might claim that the essence of the toy, and what the prototype aimed to capture, remained. That essence involves the protocols that organized the sensors, actuators, and feedback control system that enables the obstacle avoidance and contributes almost the entire cost and complexity. These too are governed by protocols, but also by entirely new laws.

Elementary Feedback Concepts

Protocols are the most important aspect of modularity, and the most complex and critical protocols are for feedback control and the sensing, computing, communication, and actuation that implement it. Feedback control is both a powerful and dangerous strategy for creating robustness to external disturbances and internal component variations. Properly balanced, it delivers such a huge benefit that both engineers and evolution capitalize extensively on feedback to build and support complex systems (4, 9). Detailed elaboration of the nature of regulatory feedback underlying complexity is beyond the scope of this article, but an elementary “toy” model illustrates the necessity of feedback to the function of complex systems as well as feedback's “conservation of fragility” law. This is arguably the most critical and rigorously established robustness tradeoff in complex systems.

In most technologies as well as in biochemistry, it is relatively easy to build either uncertain, high-gain components or precise, low-gain ones; but the precise, high-gain systems essential to both biology and technology are impossible or prohibitively expensive to make unless a feedback strategy like that in Fig. 2 is used. The simplest case is steady-state gain where, after some transient, r and d are held constant, and y too approaches a constant y = Rr + Sd (38), where R and S are the responses of y to r and d, respectively. Solving y = d + ACy + Ar gives

$$R = \frac{A}{1 - F}, \qquad S = \frac{1}{1 - F}, \qquad F = AC \tag{1}$$

where F is the feedback gain. Ideally, perfect control would have |S| = 0, because that gives y = −r/C (R = −1/C) completely independent of arbitrary variations in A and d. If A → ∞ and −1/C ≫ 1, then F → −∞, |S| → 0, and y → −r/C. Then R amplifies r and is perfectly robust to external disturbance d and to variations in A. Choosing C small and precise, with A sufficiently large and even sloppy, is one effective, efficient, and robust way to make y a high-gain function of r. |S| measures the deviation from perfect control, and feedback can attenuate or greatly amplify the effects of uncertainties. Defining fragility as log|S|, note that F < 0 iff |S| < 1 iff log|S| < 0 (39). F > 0 makes log|S| > 0, amplifying d and uncertainty in A, and F → 1 makes log|S| → ∞ (40).

Unfortunately, this story is incomplete and even misleading without dynamics. The simplest possibility is for A and C to be first-order differential equations

$$\dot{a} = g u, \qquad u = r + x, \qquad y = a + d, \qquad \dot{x} = -k_1 y - k_2 x \tag{2}$$

C is a low-pass filter with internal state x and parameters k_1 > 0 and k_2 > 0. A is a pure integrator with state a and gain g > 0 (41). This type of control is called “integral feedback.” The parameters g, k_1, and k_2 might typically be functions of underlying physical quantities such as temperature, binding affinities, concentrations, etc., and thus might vary widely. The responses y(t) to steps in r and d are shown in Fig. 3 over two orders of magnitude in g and k_1. This simple protocol of integral feedback produces extremely robust external behavior even from wildly varying components (the blue solid versus red dashed lines in Fig. 3B). It is easily shown that this system is stable iff gk_1 > 0 and k_2 > 0, and converges to the steady state y = (k_2/k_1)r independently of arbitrarily large variations in gain g and disturbance d (42). If k_2 ≫ k_1, y = (k_2/k_1)r is a high-gain amplifier as well (43). The individual values of g, k_1, and k_2 influence the rate of convergence to steady state, but only the ratio k_2/k_1 determines its value. Thus, robust high steady-state gain can be achieved with uncertain and small parameters with the right feedback protocol. Figure 3C shows that variations in both g and k_2 of orders of magnitude have modest impact, and only on early transient behavior.
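The robustness claim for integral feedback can be checked numerically. The sketch below is a minimal simulation under an assumed form of the loop, reconstructed from the definitions in the text rather than quoted from the article: the actuator is a pure integrator, da/dt = g(r + x); the controller/sensor is a low-pass filter, dx/dt = −k_1 y − k_2 x; and the output is y = a + d. The function name, step size, and time horizon are illustrative choices.

```python
def simulate_integral_feedback(g, k1, k2, r=1.0, d=0.0, t_end=2000.0, dt=0.1):
    """Return y(t_end) for the assumed integral-feedback loop (classical RK4).

    Assumed state equations (a reconstruction, not quoted from the article):
        da/dt = g * (r + x)        # actuator A: pure integrator, input u = r + x
        dx/dt = -k1 * y - k2 * x   # controller/sensor C: low-pass filter
        y     = a + d              # output = actuator output + disturbance
    """
    def deriv(a, x):
        return g * (r + x), -k1 * (a + d) - k2 * x

    a, x = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        # Four RK4 stages for the two-state linear system.
        da1, dx1 = deriv(a, x)
        da2, dx2 = deriv(a + 0.5 * dt * da1, x + 0.5 * dt * dx1)
        da3, dx3 = deriv(a + 0.5 * dt * da2, x + 0.5 * dt * dx2)
        da4, dx4 = deriv(a + dt * da3, x + dt * dx3)
        a += dt * (da1 + 2.0 * da2 + 2.0 * da3 + da4) / 6.0
        x += dt * (dx1 + 2.0 * dx2 + 2.0 * dx3 + dx4) / 6.0
    return a + d


if __name__ == "__main__":
    # Vary the actuator gain over two orders of magnitude and add a disturbance.
    for g in (0.1, 1.0, 10.0):
        for d in (0.0, 5.0):
            y_final = simulate_integral_feedback(g, k1=0.01, k2=0.1, r=1.0, d=d)
            print(f"g={g:5.1f}  d={d:3.1f}  y(t_end)={y_final:.6f}")
```

With k_2 = 10k_1 and r = 1, every run should settle near y = (k_2/k_1)r = 10 regardless of g and d; only the transient differs between runs.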

Figure 2

Minimal feedback system with actuator A and controller/sensor C. The goal is for response y to amplify reference r, independent of external disturbance d and variations in A. The signals u and a are the input and output of the actuator A, and x is the output of C.

Figure 3

Closed (k_1 = 0.01, blue) versus open (k_1 = 0, red) loop response y(t) to step changes at t = 0 in (A) d(t) (r = 0) and (B) r(t) (d = 0) for g = 0.1, 1, and 10; k_1 = 0.01; and k_2 = 10k_1. Note the extreme divergence (k_1 = 0) versus convergence (k_1 = 0.01) as t → ∞. (C) is a zoom of (A) with k_1 = 0.01, 0.1, and 1; k_2 = 10k_1 added for each value of g. (D) log|S(ω)| versus ω for responses in (A). The peaks in log|S(ω)| correspond to the oscillations in (A) and (B). Note the equal areas under the curves for log|S(ω)|.

The protocol here is the structure of the equations, including the integral feedback and the signs of the parameters. Modules are the implementations of the actuator and controller. As with Lego, the protocol must be “fine-tuned” (because rewiring components or flipping signs typically creates exponentially growing instabilities), but this allows the modules to vary widely with minimal effect (44). Integral feedback is used ubiquitously in engineering (45) and is likely to be ubiquitous in biology as well, to achieve everything from homeostatic regulation to “perfect adaptation,” and preliminary investigations confirm this impression (46–48). One reason is that integral feedback is both sufficient and necessary for perfect and robust steady-state tracking. Intuitively, necessity follows from the fact that in steady state, a = y − d must perfectly cancel any constant (step in) d, whereas the input u to A cannot depend on this d, because y does not. Thus, A (or C) must contain an internal model of the dynamics of d, which for step changes is a pure integrator (49), which produces unbounded outputs to constant inputs. Thus, open-loop hypersensitivity is necessary for closed-loop robustness, and Fig. 3B is not an accident.
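The point that the protocol (signs and wiring) must be right while the modules (parameter magnitudes) may vary widely can be illustrated with a pole check. The sketch below assumes the loop consists of an integrator with gain g in feedback with a first-order low-pass filter with parameters k_1 and k_2, which gives the closed-loop characteristic polynomial s^2 + k_2 s + g k_1; this form is an assumption inferred from the stability condition stated in the text, and the function names are hypothetical.

```python
import numpy as np


def closed_loop_poles(g, k1, k2):
    """Roots of the assumed characteristic polynomial s^2 + k2*s + g*k1."""
    return np.roots([1.0, k2, g * k1])


def is_stable(g, k1, k2):
    """True when every closed-loop pole has a strictly negative real part."""
    return bool(np.all(closed_loop_poles(g, k1, k2).real < 0.0))


if __name__ == "__main__":
    # Magnitudes vary over orders of magnitude: still stable.
    for g in (0.1, 1.0, 10.0, 1000.0):
        print("g =", g, "stable:", is_stable(g, k1=0.01, k2=0.1))
    # Flipping one sign breaks the protocol: a pole moves into the right half-plane.
    print("k1 sign flipped, stable:", is_stable(1.0, k1=-0.01, k2=0.1))
    # Opening the loop (k1 = 0) leaves the bare integrator pole at s = 0.
    print("open loop, stable:", is_stable(1.0, k1=0.0, k2=0.1))
```

Under these assumptions, any positive g, k_1, and k_2 give stable poles, whereas a single sign flip yields a growing exponential, and the open loop retains the integrator's pole at the origin, consistent with the unbounded response to constant inputs described above.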

Fragility enters in the transient response. When g is increased, the response is faster but oscillatory (Fig. 3, A and B). Figure 3D plots fragility log|S(ω)| versus ω, where Y(ω) and D(ω) are Fourier transforms of y and d, and

$$S(\omega) = \frac{Y(\omega)}{D(\omega)} \tag{3}$$

For increasing g, low-frequency robustness (log|S(ω)| < 0) is improved, but at the expense of increased fragility (log|S(ω)| > 0) at higher frequencies (50). Indeed, it can be proven that for all g

$$\int_0^{\infty} \log|S(\omega)| \, d\omega = 0 \tag{4}$$

so net fragility is, in this sense, a conserved quantity. Robustness (log|S(ω)| < 0) is paid for by an equal fragility (log|S(ω)| > 0), which amplifies d and uncertainty in A (51). This quite general result also holds for arbitrary parameters, control systems, and disturbances (52). Thus, there are always nonconstant (e.g., sinusoidal) d(t) that would be amplified in y(t). Such d could be perfectly rejected too, but only by adding internal models as complex as the external environment that generates d. Although such modeling is possible only for simple idealized laboratory environments, even approximate attempts can drive an extreme complexity spiral in real systems, and any controller is still subject to the constraint in Eq. 4. The key to good control design, then, is to ensure that this fragility is tolerable and occurs where uncertainties are relatively small.
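The conservation of net fragility can be checked numerically for the toy loop. The sketch below assumes the sensitivity function takes the form S(s) = s(s + k_2)/(s^2 + k_2 s + g k_1), which follows from an integrator g/s in feedback with a low-pass filter −k_1/(s + k_2); that form, the function names, and the integration grid are assumptions made for illustration. The integral of log|S(ω)| is evaluated on a logarithmic frequency grid with the trapezoid rule.

```python
import numpy as np


def sensitivity(omega, g, k1, k2):
    """Assumed sensitivity S(j*omega) = s(s + k2) / (s^2 + k2*s + g*k1)."""
    s = 1j * omega
    return s * (s + k2) / (s**2 + k2 * s + g * k1)


def net_fragility(g, k1=0.01, k2=0.1, n=200_001):
    """Approximate the integral of log|S(omega)| over omega in (0, inf).

    Substituting omega = exp(t) gives d(omega) = omega * dt, which resolves
    both the integrable log singularity at omega -> 0 and the slow
    1/omega^2 tail on a single uniform grid in t.
    """
    t = np.linspace(np.log(1e-8), np.log(1e6), n)
    omega = np.exp(t)
    integrand = np.log(np.abs(sensitivity(omega, g, k1, k2))) * omega
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t)))


if __name__ == "__main__":
    for g in (0.1, 1.0, 10.0):
        print(f"g={g:5.1f}  integral of log|S| ~ {net_fragility(g):.2e}")
```

For each gain, the negative area at low frequencies (robustness) should be offset by the positive area at higher frequencies (fragility), so the printed values should be close to zero up to truncation and discretization error.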

Even these simple toy examples show the robust yet fragile features of complex regulatory networks. Their outward signatures are extremely constant regulated variables (yet occasional cryptic fluctuations) as well as extraordinary robustness to component variations (yet rare but catastrophic cascading failures). These apparently paradoxical combinations can easily be a source of confusion to experimentalists, clinicians, and theoreticians alike (53), but are intrinsic features of highly optimized feedback regulation. Because net robustness and fragility are constrained quantities, they must be manipulated and controlled with and within complex networks, even more so than energy and materials. Figure 3B shows how extreme open-loop versus closed-loop behavior can be, and thus how dangerous loss of control is to a system relying on it. The tradeoff in Eq. 4 shows that even when working perfectly, net fragility is constrained, and thus some transient amplification is unavoidable.

The necessity of integral feedback and the fragility constraint in Eq. 4 thus describe laws, not protocols—perhaps the two simplest such laws from control theory. Controllers that are more complex, with additional dynamics and multiple sensors and actuators, offer more refinement in performing robustness-fragility tradeoffs. Adding to regulatory complexity is also relatively easy in an evolutionary sense. Faster components allow for faster closed-loop responses. All are used in both biology and engineering, but all are still ultimately subject to Eq. 4. Control engineers must contend with this tradeoff, and its generalizations to more complex structures dominate control system design. Presumably, such tradeoffs dominate and constrain evolution and biology as well.

Implications for Biology and Engineering

The success of systems biology will certainly require modeling and simulation tools from engineering (54, 55), where experience shows that brute-force computational approaches are hopeless for complex systems involving protocols and feedback. Highly fragile features require highly sophisticated modeling, whereas robust features often have adequate models that are greatly simplified, requiring a “middle-out” approach (10). For example, if Fig. 2 is for a module in a larger system, the steady-state gain y = (k_2/k_1)r depends only on k_2/k_1 and no other parameters, potentially simplifying experiments and modeling. If transient dynamics or component failure were of interest, more details would be needed, determined more by the rest of the system than by the internal components.

Many challenges of postgenomic biology are converging to the challenges facing engineers building complex “networks of networks,” and engineering theory and practice are undergoing a revolution as radical as biology's. The simple ideas here only hint at the possibilities. For example, more complex control protocols than Fig. 2, used in both engineering and biology, can ameliorate (though not eliminate) the constraint in Eq. 4, but sophisticated theory is needed to elucidate the issues. Realistic models of biological networks will not be simple and will require multiple feedback signals, nonlinear component dynamics, numerous uncertain parameters, stochastic noise models (56), parasitic dynamics, and other uncertainty models. Scaling to deal with large networks will be a major challenge. Fortunately, researchers in robust control theory, dynamical systems, and related areas have been vigorously pursuing mathematics and software tools to address exactly these issues and apply them to complex engineering systems (57, 58). Biological applications are new, but progress so far is encouraging.

Experiments, modeling and simulation, and theory all have fragilities, but they are complementary, and through the right protocols they have the potential to create a robust “closed-loop” systems biology (59). Biologists' frustrating experience with theory has been primarily in an open-loop mode, where simple and attractive ideas can be wrong but receive enormous attention. Biology is the only science where feedback control and protocols play a dominant role, so it should not be surprising that there would be popular theories, coming from within science, that did not emphasize these issues. Biologists and engineers now have enough examples of complex systems that they can close the loop and eliminate specious theories (60). We should compare notes.

* To whom correspondence should be addressed. E-mail: doyle{at}cds.caltech.edu

