Report

Understanding the Spreading Patterns of Mobile Phone Viruses

See allHide authors and affiliations

Science  22 May 2009:
Vol. 324, Issue 5930, pp. 1071-1076
DOI: 10.1126/science.1167053

Abstract

We modeled the mobility of mobile phone users in order to study the fundamental spreading patterns that characterize a mobile virus outbreak. We find that although Bluetooth viruses can reach all susceptible handsets with time, they spread slowly because of human mobility, offering ample opportunities to deploy antiviral software. In contrast, viruses using multimedia messaging services could infect all users in hours, but currently a phase transition on the underlying call graph limits them to only a small fraction of the susceptible users. These results explain the lack of a major mobile virus breakout so far and predict that once a mobile operating system’s market share reaches the phase transition point, viruses will pose a serious threat to mobile communications.

Lacking a standardized operating system, traditional cellphones have been relatively immune to viruses. Smart phones, however, can share programs and data with each other, representing a fertile ground for virus writers (14). Indeed, since 2004 more than 420 smart phone viruses have been identified (2, 3), the newer ones having reached a state of sophistication that took computer viruses about two decades to achieve (2). Although smart phones currently represent less than 5% of the mobile market, given their reported fast annual growth rate (4) they are poised to become the dominant communication device in the near future, raising the possibility of virus breakouts that could overshadow the disruption caused by traditional computer viruses (5).

The spread of mobile viruses is aided by two dominant communication protocols. First, a Bluetooth (BT) virus can infect all BT-activated phones within a distance from 10 to 30 m, resulting in a spatially localized spreading pattern similar to the one observed in the case of influenza (3, 6, 7), severe acute respiratory syndrome (SARS) (8, 9), and other contact-based diseases (Fig. 1A) (10). Second, a multimedia messaging system (MMS) virus can send a copy of itself to all mobile phones whose numbers are found in the infected phone’s address book, a long-range spreading pattern previously exploited only by computer viruses (11, 12). Thus, in order to quantitatively study the spreading dynamics of mobile viruses we need to simultaneously track the location (13), the mobility (1417), and the communication patterns (1821) of mobile phone users. We achieved this by studying the anonymized billing record of a mobile phone provider and recording the calling patterns and the coordinates of the closest mobile phone tower each time a group of 6.2 million mobile subscribers used their phone. Thus, we do not know the users’ precise locations within the tower’s reception area, and no information is available about the users between calls.

Fig. 1

The spreading mechanisms of mobile viruses. (A) A BT virus can infect all phones found within BT range from the infected phone, its spread being determined by the owner’s mobility patterns. An MMS virus can infect all susceptible phones whose number is found in the infected phone’s phonebook, resulting in a long-range spreading pattern that is independent of the infected phone’s physical location. (B) A small neighborhood of the call graph constructed starting from a randomly chosen user and including all mobile phone contacts up to four degrees from it. The color of the node represents the handset’s OS, which in this example are randomly assigned so that 75% of the nodes represent OS1, and the red are the remaining handsets with OS2 (25%). (C) The clusters in the call graph on which an MMS virus affecting a given OS can spread, illustrating that an MMS virus can reach at most the number of users that are part of the giant component of the appropriate handset. As the example for the OS shows, the size of the giant component highly depends on the handset’s market share (see also Fig. 2C).

The methods we used to simulate the spreading of a potential BT and MMS virus are described in (22). Briefly, once a phone becomes infected with an MMS virus, within 2 min it sends a copy of itself to each mobile phone number found in the handset’s phone book, approximated with the list of numbers with which the handset’s user communicated during a month-long observational period. A BT virus can infect only mobile phones within a distance r = 10 m. To simulate this process, we assigned to each user an hourly location that was consistent with its travel patterns (13) and followed the infection dynamics within each mobile tower area using the susceptible infected (SI) model (23). That is, we consider that an infected user (I) infects a susceptible user (S), so that the number of infected users evolves in time (t) as dI/dt = βSI/N, where the effective infection rate is β = μ<k> with μ = 1, N is the number of users in the tower area, and the average number of contacts is <k> = ρA = NA/Atower, where A = πr2 represents the BT communication area and ρ = N/Atower is the population density inside a tower’s service area. Once an infected user moves into the vicinity of a new tower, it will serve as a source of a BT infection in its new location.

A cell phone virus can infect only the phones with the operating system (OS) for which it was designed (2, 3), making the market share m of an OS an important free parameter in our study. The current market share of various smart phone OSs vary widely, from as little as 2.6% for Palm OS to 64.3% for Symbian. Given that smart phones together represent less than 5% of all phones, the overall market share of these OSs among all mobile phones is in the range of m = 0.0013 for Palm OS and m = 0.032 for Symbian, numbers that are expected to dramatically increase as smart phones replace traditional phones. To maintain the generality of our results, we treat m as a free parameter, finding that the spreading of both BT and MMS viruses is highly sensitive to the market share of the susceptible handsets (Fig. 2, A and B). Our simulations indicate that given sufficient time, a BT virus can reach all susceptible handsets because user mobility guarantees that sooner or later each susceptible handset will find itself in the vicinity of an infected handset. The spreading rate strongly depends, however, on the handset’s market share. For example, if the handset’s market share is m = 0.01 it takes several months for a BT virus to reach all susceptible handsets. In contrast, for m = 0.30 the BT virus could infect 85% of susceptible handsets in a few hours and 99.8% in less than a week (Fig. 2A).

Fig. 2

The spreading patterns of BT and MMS viruses. (A) The changes in the ratio of infected and susceptible handsets (I/N) with time in the case of a BT virus affecting handsets with different m. (B) Same as in panel (A) but for MMS viruses. The saturation in I/N indicates that an MMS virus can reach only a finite fraction of all susceptible phones. (C) The size of the giant component Gm in a function of m. The blue triangles correspond to the saturation values measured in Fig. 2B, whereas the red line is the theoretical prediction according to percolation theory (the deviations are mainly attributed to finite size effects and degree correlations because the calculation assumed an infinite call graph). (D) The latency time needed to infect q = 0.65 or q = 0.15 fraction of susceptible handsets via a BT virus, approximated with T(q = 0.65, m) ~ m–0.63±0.05 and T(q = 0.15, m) ~ m–0.60±0.04 (continuous lines). (E) The latency time for an MMS virus for q = 0.05, 0.15, and 0.30. The continuous lines correspond to T(q,m) ~ (mmq*)α(q), where the best fits indicate a systematic q-dependence: α(0.05) = 0.20 ± 0.02, α(0.15) = 0.17 ± 0.01, and α(0.30) = 0.14 ± 0.01. (F) Log-log plot showing Lave and Lmax for the largest cluster. The fits correspond to Lmax ~ (mm*)–0.20±0.02 and Lave ~ (mm*)–0.19±0.02. The curves in (A), (B), and (D) are obtained from 10 independent simulations, and (E) and (F) represent an average over 100 runs. For more statistical analysis of the fits in (D) to (F), see the detailed discussion in (22).

The most striking difference between BT and MMS viruses comes in the time scales that their spread requires. Indeed, given that it takes approximately 2 min for a typical MMS virus to copy itself into a new handset (24), an MMS epidemic reaches saturation in a few hours, in contrast with the few days that the BT virus requires to infect all susceptible handsets (Fig. 2B). Thus, although there is plenty of time to deploy an antiviral software for a BT virus before it could reach a large fraction of users, it is largely impossible to achieve the same for MMS viruses, given their explosive spread. The good news is that an MMS virus can reach only a small m-dependent fraction of users with a susceptible handset, as indicated by the saturation of the infection curves in Fig. 2B. The origin of this saturation is the fragmentation of the underlying call network. Indeed, in Fig. 1B we show a subset of the real call network and assume for the illustration that the handsets can have only two OSs, OS1 and OS2, with market shares m1 = 0.75 and m2 = 0.25, respectively. Although the underlying call network itself is fully connected, the call graph of the users that share the same handset is fragmented into many islands (Fig. 1C). For m1 = 0.75, we observed a giant component (the largest connected cluster) (Fig. 1C) of size Gm = 0.80, meaning that it contains 80% of the users with the OS1 handset; the rest of the OS1 users are scattered in small isolated clusters. In contrast, for the OS2 handsets the giant component is tiny (Gm = 0.06). If an MMS virus is released from a single handset, it can only reach the handsets in the cluster where the original handset is located, which indicates that an MMS virus can infect at most a Gm fraction of all susceptible handsets, which is 80% for OS1 and 6% for OS2 in the example of Fig. 1.

We found that the handset-based fragmentation of the call graph (Fig. 1, B and C) is governed by a percolation phase transition at the market share mc = 0.095 (Fig. 2C) (25). That is, for m < mc the user base is fragmented into many small isolated islands, making a major MMS virus viral outbreak impossible. In contrast, for handsets with m > mc there is a giant component, allowing the MMS virus to reach all handsets that are part of it. The value of mc and Gm for m > mc can be calculated by using the generating function formalism (26), requiring as input only the network’s degree distribution P(k). With P(k) characterizing our user base, we found a reasonable agreement between the analytical predictions and the direct measurements of the saturation value of the MMS virus spreading in the mobile phone data set (Fig. 2C); the small systematic deviation is rooted in the fact that the generating function formalism ignores the correlations in the call graph’s structure. The value of Fig. 2C is its ability to explain why we have not observed a substantial MMS outbreak so far: Currently, the market share of the largest OS is less than m = 0.03, well under the predicted percolation transition point mc = 0.095 (27, 28). For a more detailed discussion on the factors affecting m and mc, see (22).

The differences between MMS and BT viruses have a strong impact on their spreading dynamics as well. To see this, we denote the latency time with T(q,m), representing the average time necessary for a virus affecting a handset with market share m to reach a q fraction of all susceptible handsets. For a BT virus, T(q,m) is finite for any q and m combination, given that with time the virus can reach all susceptible users (Fig. 2A). We found, however, that the latency time is highly sensitive to m, a dependency well approximated by T(q,m) ~ m−0.6 [coefficient of multiple determination (R2) > 0.99] (Fig. 2D) (22), implying that the smaller a handset’s market share the longer a virus will take to reach a q fraction of susceptible users. The observed divergence at m = 0 indicates that for handsets with a small market share, the spreading process is exceptionally slow because an infected user takes a very long time to come in contact with another user with a similar handset.

Once again, the behavior of MMS viruses is qualitatively different: We found that T(q,m) diverges not at m = 0 but at a finite mq* value (Fig. 2E), meaning that for handsets with m < mq* the virus is unable to reach a q fraction of users. Indeed, an MMS virus can reach at most a Gm fraction of eligible handsets (Fig. 2C), implying that Gm acts as a critical point for the dynamical spreading process and T(q > Gm, m) = ∞. To characterize the observed singularity, the maximum amount of time it takes an MMS virus to invade the giant component should be determined by the length of the longest minimal path (Lmax) characterizing the susceptible giant cluster (29, 30). As Fig. 2F shows, we found that both Lmax and the average minimal path length (Lave) diverge as (mm*)α with α = 0.2 (R2 > 0.97) (22), a singularity that potentially drives the observed divergence of T(q,m) near mq* given by the equation q = Gmq*. A more detailed measurement indicates, however, a systematic q-dependence of α(q) in T(q,m) ~ (mmq*)α(q) in the vicinity of the critical point (Fig. 2E) (22), hinting that there are factors beyond Lmax that contribute to the divergence of T(q,m).

As shown in Fig. 3, we followed the spread of an MMS and BT infection starting from the same user, illustrating that BT and MMS viruses differ in their spatial spreading patterns as well: A BT virus follows a wave-like pattern, predominantly infecting users in the vicinity of the virus’s release point, whereas an MMS virus follows a more delocalized pattern, given that the users’ address books often contain phone numbers of individuals who are far away. To quantify the observed differences, we measured the average distance between the cell phone tower where the first infected user is located and the location of towers servicing the newly infected users. A null model in which the virus always diffuses to the noninfected towers bordering the already infected towers, thus following a classical two-dimensional diffusion process, was used as a reference. As Fig. 3B indicates, the typical source-infection distances observed in the local model are substantially smaller than the distances recorded for either BT or MMS viruses, indicating the impact of a few long-distance travellers that incubate outbreaks in distant cells (13) in the BT spreading process. The average distance is the highest for MMS viruses, underlying the delocalized pattern characterizing its spread. Figure 3B also shows that the dependence of the average source-infection distance (<D>) on N is mainly a function of the spreading technology and appears to be independent of m.

Fig. 3

Spatial patterns in the spread of BT and MMS viruses. (A) The virus starts from the same user located at the tower marked by the red arrows (left). The three panels show the percentage of infected users in the vicinity of each mobile phone tower (denoted by the voronoi cell that approximates each tower’s service area). In the right panel, we show the corresponding time-dependent infection curves, marking the moments when the spatial distribution was recorded. (B) Average distance between the tower where the infection was originally started and the most currently infected phone as a function of N, the number of towers with at least one infected user, used as a proxy of time (three red and blue curves correspond to m = 0.1, m = 0.5, and m = 1). The green line is obtained from a null model that assumes that the virus can only spread from one tower’s service area to its neighbor towers’ service areas. The curves in (B) are obtained from 100 independent simulations.

BT and MMS viruses have their relative limitations: Although the spread of a BT virus is rather slow because of human mobility, an MMS virus can reach only a small fraction of users because of the fragmentation of the call graph. Both limitations are avoided by hybrid viruses that can simultaneously use both BT and MMS connections to spread; the first of many such viruses was the “CommWarrior,” released in 2005 (2, 3). We found, however, that the spreading dynamics of a hybrid virus also displays a complex market share dependence (Fig. 4, A and B), resulting from a nontrivial superposition of the BT and MMS spreading modes. For example, for m = 0.15, when there is a giant component aiding the MMS spreading mode (Fig. 2C), the early stage of the spreading process is dominated by the rapid invasion of the MMS cluster. Subsequently, the BT mechanism allows the virus to invade the rest of the independent MMS clusters as well. For m = 0.01, however, there is no MMS giant component; thus, the spreading is dominated entirely by the BT capability, resulting in a substantially slower spreading pattern (see the different horizontal axes in Fig. 4, A and B).

Fig. 4

The spreading patterns of hybrid viruses. (A and B) The time-dependent fraction of infected users for a hybrid virus spreading on a handset with (A) m = 0.01 and (B) m = 0.15 market share, compared with the BT and MMS spreading modes. (C and D) The m-dependence of latency time for hybrid, MMS, and BT viruses for (C) q = 0.15 and (D) q = 0.65. (E and F) Ratio between the time it takes a BT or MMS virus to reach (E) 15% and (F) 0.65% of the population divided by the time it takes a hybrid virus to reach the same fraction of the population as a function of m. The curves in (A) to (F) are obtained from 10 independent simulations.

The relative role of the BT and the MMS spreading patterns for hybrid viruses is illustrated in Fig. 4, C and D, which shows the latency time T(q,m) for q = 0.15 and q = 0.65. We find that for high m, the MMS mechanism dominates the hybrid virus’s spreading pattern. As m decreases below mq* given by q = Gmq* (Fig. 2E), the giant component becomes smaller than q, so T(q,m) for MMS diverges (green curve) and the BT mechanism starts dominating the spreading rate of a hybrid virus. Therefore, for small values of m the latency time of the hybrid virus converges to the latency time of a BT virus. We found, however, that the phase transition that governs the fragmentation of the call graph plays a key role in the spread of hybrid viruses as well, delimiting the rapid MMS-dominated and slow human mobility–driven spreading modes.

As shown in Fig. 4, E and F, we explored the additional infective power of a hybrid (H) virus, defined as the ratio TMMS(q,m)/TH(q,m) relative to its pure MMS counterpart or TBT(q,m)/TH(q,m) relative to its pure BT counterpart. We found that hybrid viruses are about three times faster than an MMS virus at a constant market share for m > mc. The contribution of BT technology for a hybrid virus dominates for mmc because MMS viruses are unable to spread in this region [TMMS(q,m) = ∞]. The additional infective power of a hybrid virus as compared with a BT virus achieves its highest value close to mq*, decreasing quickly for m → 0 and mildly for m → 1, once again underlying the importance of the critical behavior near mq*.

Taken together, our results offer a comprehensive picture of the potential dangers posed by mobile viruses. We found that although a BT virus can reach the full susceptible user base, its spread is constrained by human mobility, offering ample time for developing and deploying countermeasures. In contrast, MMS viruses can reach most susceptible users within hours. Their spread is limited, however, by the market share–driven phase transition that fragments the underlying call graph, which allows us to predict that no major virus breakout is expected for OSs with market shares under the critical point associated with the user base. Therefore, the current lack of a major mobile virus outbreak cannot be attributed to the absence of effective mobile viruses, but is mainly rooted in the fragmentation of the call graph. Given, however, the rapid growth in the number of smart phones and the increasing market share of a few OSs, it is not inconcievable that the phase transition point will be reached in the near future, raising the possibility of major viral outbreaks. Although the greatest danger is posed by hybrid viruses that take advantage of both BT and MMS protocols, we found that their spread is also limited by the phase transition: Hybrid viruses designed for OSs with small market shares are forced into the slow BT spreading mode, offering time to develop proper countermeasures. We believe that the understanding of the basic spreading patterns presented here could help estimate the realistic risks carried by mobile viruses and aid in the development of proper measures so as to avoid the costly impact of future outbreaks.

Supporting Online Material

www.sciencemag.org/cgi/content/full/1167053/DC1

Materials and Methods

Figs. S1 to S8

Tables S1 to S3

References

References and Notes

  1. Materials and methods are available as supporting material on Science Online.
  2. The human mobility data that were used as a basis for the simulations relied on an anonymized billing data set that was previously recorded by a mobile provider as required by law and for billing purposes and not for the purpose of this project. We thank D. Brockmann, M. Hypponen, J. Park, Z. Qu, C. Song, A. Vazquez, A. Vespignani, and G. Xiao for discussions and comments on the manuscript. We also thank M. Hypponen for providing us the valuable mobile phone virus records data. This work was supported by the James S. McDonnell Foundation 21st Century Initiative in Studying Complex Systems; NSF within the Dynamic Data Driven Applications Systems (CNS-0540348), Information Technology Research (DMR-0426737), and IIS-0513650 programs; the Defense Threat Reduction Agency Award HDTRA1-08-1-0027; and the U.S. Office of Naval Research Award N00014-07-C.
View Abstract

Navigate This Article