Predicting the Behavior of Techno-Social Systems


Science  24 Jul 2009:
Vol. 325, Issue 5939, pp. 425-428
DOI: 10.1126/science.1171990


We live in an increasingly interconnected world of techno-social systems, in which infrastructures composed of different technological layers are interoperating within the social component that drives their use and development. Examples are provided by the Internet, the World Wide Web, WiFi communication technologies, and transportation and mobility infrastructures. The multiscale nature and complexity of these networks are crucial features in understanding and managing the networks. The accessibility of new data and the advances in the theory and modeling of complex networks are providing an integrated framework that brings us closer to achieving true predictive power of the behavior of techno-social systems.

Modern techno-social systems consist of large-scale physical infrastructures (such as transportation systems and power distribution grids) embedded in a dense web of communication and computing infrastructures whose dynamics and evolution are defined and driven by human behavior. To predict the behavior of such systems, it is necessary to start with the mathematical description of patterns found in real-world data. These descriptions form the basis of models that can be used to anticipate trends, evaluate risks, and eventually manage future events. If fed with the right data, computational modeling approaches can provide the requested level of predictability in very complex settings. The most successful example is weather forecasting, in which sophisticated supercomputer infrastructures are used to integrate current data and huge libraries of historical meteorological patterns into large-scale computational simulations. Although we often complain about the accuracy of daily weather forecasts, we must remember that numerical weather models and predictions allow us to project the path and intensity of hurricanes, storms, and other severe meteorological occurrences and, in many cases, to save thousands of lives by anticipating and preparing for these events.

Given the success that has been achieved in weather forecasting for decades, why haven’t we achieved the same success in the quantitative prediction of the next pandemic spatio-temporal pattern or the effects over the next decade of connecting billions of people from China and India on Internet growth and stability? The basic difference is that forecasting phenomena in techno-social systems starts with our limited knowledge of society and human behavior rather than with the physical laws governing fluid and gas masses. In other words, though it is possible to produce satellite images of atmospheric turbulence, we do not yet have large-scale worldwide, quantitative knowledge of human mobility, the progression of risk perception in a population, or the tendency to adopt certain social behaviors. In recent years, however, tremendous progress has been made in data gathering, the development of new informatics tools, and increases in computational power. A huge flow of quantitative data that combine the demographic and behavioral aspects of society with the infrastructural substrate is becoming available (1–6). Analogously to what happened in physics, we are finally in the position to move from the analysis of the “social atom” or “social molecules” (i.e., small social groups) to the quantitative analysis of social aggregate states, as envisioned by social scientists at the beginning of the past century (7). Here, I refer to “social aggregate states” as large-scale social systems consisting of millions of individuals that can be characterized in space (geographic and social) and time. The shift from the study of a small number of elements to the study of the behavior of large-scale aggregates is equivalent to the shift from atomic and molecular physics to the physics of matter. The understanding of how the same elements assembled in large number can give rise, according to the various forces and elements at play, to different macroscopic and dynamical behaviors opens the path to quantitative computational approaches and forecasting power. Yet at the same time, the study of social aggregate states presents us with all the challenges already faced in the physics of matter, from turbulence to multiscale behavior.

Reality Mining and Proxy Networks

The level of information flow regarding techno-social systems is not just due to advances in the number-crunching power of modern computer processors. Insights into the nature of the interlinks between people and technology and the dissolution of boundaries between the cyberworld and our real-world social activities are changing our accessibility to data, leading to “reality-mining,” which has been defined as the collection of machine-sensed environmental data that are related to human social behavior (2). A prime example of the people/technology interlinkage can be found in the analysis of human mobility. In the past, approaches to human interactions and mobility have mostly relied on census and survey data, which were often incomplete and/or limited to a specific context. Despite advances in the study of human transport (8, 9), this lack of data has hindered the construction of a general framework of human mobility based on dynamical principles at the individual level with the ability to bridge spatial scales, from small communities to large urban areas and countries, in a bottom-up perspective. However, in pioneering work, Brockmann et al. (4) showed that popular Web sites for currency tracking collect a massive number of records on money dispersal that can be used as a proxy for human mobility. This work opened the path to the general exploitation of proxy data for human interaction and mobility (10). Analogously, modern mobile phones and personal digital assistants combine sophisticated technologies such as Bluetooth, Global Positioning System, and WiFi, constantly producing detailed traces on our daily activities (2, 11). For instance, in a recent study, Gonzalez et al. (6) used mobile phone data to track the movements of 100,000 people over a 6-month time span. Furthermore, it is now possible to use sensors and tags that produce data at the microscale of one-to-one interactions (1, 2).

Though confronting us with serious ethical and privacy questions, these kinds of data and the reduced cost of producing, accessing, and communicating information on techno-social systems are changing our understanding of a wide range of phenomena (12–17). The spatial dynamics of human infectious diseases are determined by the mobility of individuals who carry a disease into previously uninfected populations. Analogously, human migration and mobility mediate a large number of bioinvasions, defined as the introduction of previously unknown organisms in ecosystems. The evolution of languages and dialects is also driven by the mixing of populations and the merging and/or isolation of communities. Finally, the daily mobility of humans in Internet space defines our exploitation and foraging of information.

Network Thinking

The Internet and virtual worlds are networks that we navigate and explore every day (17–19). Human-interaction models are based on social networks in which nodes represent individual interacting agents and the links are potential interactions (20). Mobility, ecological, and epidemiological models rely on metapopulation networks that consist of entire populations interlinked by virtue of the exchanges between groups of individuals (21). A large body of work has shown that most real-world networks exhibit dynamic self-organization (that is, they become more complicated over time without the intervention of outside forces) and are statistically very heterogeneous; these characteristics are typical hallmarks of complex systems (22–24). The various statistical distributions characterizing these networks (including the probabilities of node connection and the intensities of the connecting links) are generally heavy tailed and skewed, and they vary over several orders of magnitude (25). The foremost challenge offered by complex networks therefore resides in their interconnectedness (networks of networks) and multiscale nature. Figure 1 depicts three networks that exemplify human mobility at different scales, ranging from cross-continental airline travel to within-city mobility among mobile phone cell towers. Ideally, to make predictions about the processes driven by human mobility, we need to integrate these data, with their wide-ranging granularities (from a few hundred meters and a few hours to thousands of kilometers and several days), into a huge multiscale network.
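The statement that link intensities are heavy tailed and span several orders of magnitude can be made concrete with a minimal Python sketch. Everything in it is an illustrative assumption rather than data or code from the studies cited: the traffic values are synthetic Pareto samples, and the helper names `orders_of_magnitude` and `log_binned_counts` are hypothetical.

```python
import math
import random


def orders_of_magnitude(values):
    """Return log10(max / min) for a collection of strictly positive values."""
    return math.log10(max(values) / min(values))


def log_binned_counts(values):
    """Count values per decade: key k holds values v with 10**k <= v < 10**(k + 1)."""
    counts = {}
    for v in values:
        k = math.floor(math.log10(v))
        counts[k] = counts.get(k, 0) + 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the synthetic sample is reproducible
    # Hypothetical link traffic: Pareto samples (shape 1.2), scaled so the minimum is 10.
    traffic = [10.0 * rng.paretovariate(1.2) for _ in range(100_000)]
    print("decades spanned:", round(orders_of_magnitude(traffic), 2))
    for decade, count in log_binned_counts(traffic).items():
        print(f"10^{decade} <= s < 10^{decade + 1}: {count} links")
```

Binning by decade rather than on a linear axis is the usual way to inspect such data, because a linear histogram would put nearly all links in the first bin and hide the rare, very heavily used connections.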

Fig. 1

Multiscale properties of mobility networks. On the left, we report the probability distribution P(s) for the traffic, measured as the number of traveling individuals, on any given connection, of three different networks: (A) the continental U.S. airline network, (B) the continental U.S. county commuting network, and (C) the mobility among telephone tower cells in a major urban area. In all cases, the distributions are highly skewed and span from three to seven orders of magnitude. On the right, we show the illustration of the continental U.S. airline network (D) and the commuting network (E) among major census areas. The color scale from yellow to dark red identifies the traffic flow magnitude in logarithmic scale. The airline network consists mostly of long-range connections, in contrast to the gridlike ordering of the commuting network. The daily average flow of the commuting network is one order of magnitude larger than that of the airline network.

Thus, the complexity of techno-social systems calls for a “network” mindset. A simple example is provided by the large-scale description of epidemic spreading. The spread of the plague epidemic in the 14th century (the Black Death) (26) was mainly a spatial diffusion phenomenon. Historical studies have established that the disease propagation followed a simple pattern that can be adequately described mathematically within the framework of continuous differential equations with terms that describe diffusion. As anticipated in 1933 (27), the large-scale and geographical impact of infectious diseases [such as the SARS epidemic (28) or the current swine flu epidemic] on populations in the modern world is mainly due to commercial air travel. An epidemic that starts in Southeast Asia will rapidly reach North America and Europe (Fig. 2). This picture, therefore, cannot be simply described in terms of diffusive phenomena; rather, it must incorporate the spatial structure of modern transportation networks. For instance, it is the heavy-tailed nature of the airline traffic network that explains why travel restrictions alone are ineffective in containing a global epidemic unless the global mobility rate is reduced at least by one order of magnitude (29–31).
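The contrast between purely diffusive spread and spread over a transportation network can be illustrated with a toy metapopulation model. The Python sketch below is not the worldwide model used for Fig. 2. It assumes six equal-sized populations arranged in a chain, a deterministic SIR process in each, and symmetric per-capita travel rates; the function name `arrival_times` and all parameter values are hypothetical.

```python
def arrival_times(links, n_patches, beta=0.5, gamma=0.2, seed_patch=0,
                  dt=0.1, t_max=400.0, threshold=1e-3):
    """Toy metapopulation SIR model integrated with the Euler method.

    links maps (a, b) pairs to a symmetric per-capita travel rate between
    equal-sized populations a and b (an assumption of this sketch).
    Returns, for each patch, the first time its infected fraction reaches
    the threshold, or None if that never happens before t_max.
    """
    s = [1.0] * n_patches
    i = [0.0] * n_patches
    s[seed_patch], i[seed_patch] = 0.99, 0.01
    arrival = [None] * n_patches
    for step in range(int(t_max / dt) + 1):
        t = step * dt
        for p in range(n_patches):
            if arrival[p] is None and i[p] >= threshold:
                arrival[p] = t
        # Local SIR dynamics inside each patch.
        ds = [-beta * s[p] * i[p] for p in range(n_patches)]
        di = [beta * s[p] * i[p] - gamma * i[p] for p in range(n_patches)]
        # Travel: symmetric exchange of susceptible and infected fractions.
        for (a, b), rate in links.items():
            flow_s = rate * (s[a] - s[b])
            flow_i = rate * (i[a] - i[b])
            ds[a] -= flow_s
            ds[b] += flow_s
            di[a] -= flow_i
            di[b] += flow_i
        s = [s[p] + dt * ds[p] for p in range(n_patches)]
        i = [i[p] + dt * di[p] for p in range(n_patches)]
    return arrival


if __name__ == "__main__":
    # Nearest-neighbor chain 0-1-2-3-4-5: a crude stand-in for spatial diffusion.
    chain = {(k, k + 1): 1e-3 for k in range(5)}
    # The same chain plus one long-range "airline" link between patches 0 and 5.
    with_air = dict(chain)
    with_air[(0, 5)] = 1e-3
    print("chain only:   ", arrival_times(chain, 6))
    print("with air link:", arrival_times(with_air, 6))
```

In the chain-only case the arrival times grow with distance from the seed, as in a diffusive wave. Adding the single long-range link lets the most distant patch be invaded before the patches in the middle of the chain, which is the qualitative effect the paragraph above attributes to air travel.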

Fig. 2

Epidemic invasion tree obtained from the simulations of a pandemic originating in Hanoi, Vietnam. The nodes identify 3200 populations worldwide, and the directed links indicate the path along which the epidemic has moved from one population to the other. The color map from dark red to dark blue is according to the time ordering of the epidemic invasion. Simulations obtained with the worldwide epidemic and mobility model from (38).

Another crucial aspect of modern network thinking is the dynamical self-organization that gives rise to large-scale infrastructure patterns independent of human planning and engineering of the system. The prime example of a dynamical self-organizing system may be the Internet, but most communication infrastructures, road and transportation systems, supply networks, and power distribution grids are also dynamically growing networks. Road construction, for instance, is obviously planned, and it is not surprising that considerations of optimization of cost, efficiency, and utility inform the planning effort. As a consequence, one could generally expect road networks to exhibit a high degree of regularity. Yet everyday experience suggests that this is not the case, especially in towns that have grown over a long period of time. For this reason, researchers have formulated simple road-formation models (32) that try to capture the tension between the notion of optimality that inspires planners and the limited time and spatial horizons that inform their decisions.

However, the biggest challenge in providing a holistic description of multiscale networks is the necessity of simultaneously dealing with multiple time and length scales. The final system’s dynamical behavior at any scale is the product of the events taking place on all scales. The single agent spreading a disease or single node of the Internet that fails are apparently not affected by the multiscale nature of the network, just as single molecules do not care about the multiscale nature of turbulent fluids. However, the collective dynamical behavior and our ability to conduct mathematical and/or computational analyses of techno-social systems are constrained by the multiscale characteristic of the system. In the context of networks and techno-social systems, the multiscale challenge is making its appearance now because of the availability of large-scale data sets. Thus, we have to develop appropriate formalisms and techniques, as researchers studying multiscale physical systems (fluids, solids, distribution of masses in the universe, etc.) have done in the past (33). To achieve analytical understanding of techno-social systems and approach them computationally, we must find different strategies to deal with dynamical behavior and/or equations that work at very different characteristic scales but still influence each other. Such methods will finally allow the definition of layered computational approaches in which different modeling assumptions and granularities can be used consistently in the context of a general multiscale framework.

Taking Advantage of Multiscale Networks

Knowledge of network characteristics opens the path to the discovery and understanding of new statistical and dynamical laws governing large infrastructural systems coupled to social systems. Furthermore, the massive interconnectivity of spatially distributed populations and the complexity and strong heterogeneity of multiscale networks are the keys to the construction of ab-initio computational models, in which the behavior of the system can be understood in a bottom-up perspective, as opposed to the traditional mean-field or top-down strategies. This happens in a wide array of contexts ranging from urban planning (34) to epidemic modeling (35–38). Notable examples are the TRANSIMS and EPISIMS projects (35), in which agent-based models, including millions of individuals, are used to simulate the dynamics and traffic of entire cities and the spread of biological agents, respectively.

In some cases, the understanding of complex networks provides counterintuitive and surprising approaches to the engineering and management of complex techno-social systems. For example, in power grids and other flow-carrying networks, the failure of a single node or line can trigger a domino effect (“cascading failure”), in which the overload induced by the flow redistribution may generate a global failure of the network. By taking advantage of the heterogeneity of the flow carried on the links of multiscale networks, A. E. Motter (39) has proposed an adaptive defense mechanism that is actually based on the removal of a certain number of nodes to induce intentional failures. Although this mechanism might appear counterintuitive, the intentional failure of appropriately chosen nodes does not amplify the cascade process and, on the contrary, is able to mitigate the final damage. In other words, we now can provide a rationale for understanding the emerging tipping points and nonlinear properties that often underpin the most interesting characteristics of a techno-social system’s behavior.
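The domino effect described above can be sketched with a deliberately simple load-redistribution rule. This is not the model of (39), which is not reproduced here, and it does not implement the intentional-removal defense. It only assumes that each node tolerates a fixed fraction above its initial load and that a failed node hands its load in equal shares to its surviving neighbors. The function name `cascade` and the ring example are hypothetical.

```python
def cascade(neighbors, initial_load, tolerance, first_failure):
    """Toy cascading-failure model (illustrative assumptions only).

    Each node has capacity (1 + tolerance) * initial load. When a node fails,
    its current load is split equally among its surviving neighbors; any node
    pushed above capacity fails in the next round. Returns the set of failed nodes.
    """
    capacity = {n: (1.0 + tolerance) * initial_load[n] for n in neighbors}
    load = dict(initial_load)
    failed = set()
    frontier = [first_failure]
    while frontier:
        failed.update(frontier)
        for node in frontier:
            alive = [m for m in neighbors[node] if m not in failed]
            for m in alive:
                load[m] += load[node] / len(alive)
            load[node] = 0.0  # with no surviving neighbors the load is simply lost
        frontier = [n for n in neighbors
                    if n not in failed and load[n] > capacity[n]]
    return failed


if __name__ == "__main__":
    # Ring of six nodes, each carrying one unit of load.
    ring = {n: [(n - 1) % 6, (n + 1) % 6] for n in range(6)}
    unit_load = {n: 1.0 for n in ring}
    print("20% tolerance:", sorted(cascade(ring, unit_load, 0.2, 0)))
    print("60% tolerance:", sorted(cascade(ring, unit_load, 0.6, 0)))
```

With a 20% tolerance the failure of one node overloads its two neighbors and the whole ring collapses. With a 60% tolerance the neighbors absorb the extra load and the damage stays local, a small example of the tipping-point behavior mentioned in the text.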

The Toughest Challenge

Although many basic conceptual questions remain unresolved, the major roadblock in defining the fundamental predictability limits for techno-social systems is their sensitivity and dependence on social adaptive behavior. In the absence of a stress on the system, a stationary state is reached in which the feedback between the social behavior and the physical infrastructure determines the details of how the network behavior and the dynamical process of interest play out. We can imagine using steady-state data to forecast system behavior under such “normal” conditions. However, in the case of catastrophic events (for instance, the disruption of social order during emergencies such as pandemics or major natural disasters), the behavior of techno-social networks is driven out of equilibrium into unknown territory.

An interesting and ethically challenging aspect of predicting and managing the unfolding of catastrophic events in techno-social networks is the system’s adaptation to predictions when they are made publicly available. Social behaviors react and adapt to knowledge of predictions. Contrary to what happens in physical systems, the predictions themselves are part of the system dynamic. In addition, predictions may point to unethical control and anticipation strategies favoring specific demographic sectors of the society. Finally, the risk of erroneous predictions may lead to costly or unethical social control mechanisms with no actual benefits. Whereas some of the above issues may find a partial solution through improvements in the accuracy and reliability of models, it is clear that social adaptation to predictions presents us with new methodological and ethical problems.

Addressing these problems involves tackling three major scientific challenges. The first is the gathering of large-scale data on information spread and social reactions that occur during periods of crisis. This is no longer out of reach, thanks to large-scale mobile communication data sources (such as mobile telephones, Twitter logs, and social Web tools) that operate at the moment of specific disaster or crisis events. The second challenge is the formulation of formal models that make it possible to quantify the effect of risk perception and awareness phenomena of individuals on the techno-social network structure and dynamics. The third challenge concerns the deployment of monitoring infrastructures capable of informing computational models in real time. Complex systems and networks theory, mathematical biology, statistics, nonequilibrium statistical physics, and computer science all play a key role in the effort to meet these challenges. Although such an integrated approach might still be in its infancy, it now seems possible to imagine the creation of computational forecasting infrastructures that will help us design better energy-distribution systems, plan for traffic-free cities, anticipate the demands of Internet connectivity, or manage the deployment of resources during health emergencies.

References and Notes

  1. I thank V. Colizza, D. Balcan, B. Goncalves, M. Gonzalez, and H. Hu for help with the figures and M. Gonzalez for the data used in Fig. 1C. I am partially supported by NIH, NSF, the Defense Threat Reduction Agency, the Lilly Endowment Foundation, and the Future Emerging Technologies projects Epiwork and Dynanets.