Policy Forum: Social Media Research

Crisis informatics—New data for extraordinary times


Science  15 Jul 2016:
Vol. 353, Issue 6296, pp. 224-225
DOI: 10.1126/science.aag2579


A funnel cloud near Venice, Italy, June 2012.


Crisis informatics is a multidisciplinary field combining computing and social science knowledge of disasters; its central tenet is that people use personal information and communication technology to respond to disaster in creative ways to cope with uncertainty. We study and develop computational support for collection and sociobehavioral analysis of online participation (i.e., tweets and Facebook posts) to address challenges in disaster warning, response, and recovery. Because such data are rarely tidy, we offer lessons—learned the hard way, as we have made every mistake described below—with respect to the opportunities and limitations of social media research on crisis events.

Social Media Can Be Fetishized

Too much importance is attributed to social media as a tool instead of to the behaviors that underlie it. At the same time, too little attention is granted to examining the “corners” of social media spaces where interesting forms of work and coalitions of helpers are found, e.g., to help people struggling in the recovery after the 2010 Haiti earthquake by “topping off” their phones with minutes (1) and offering language translation of texts (2), or, after Hurricane Sandy, by reuniting pets with their families (3). These creative efforts are important but include only a relatively small number of people in contrast to the loud and voluminous general social media response that arises after major disasters.

Yet in the world of emergency management, we often see a demand for science that “proves” behaviors. For example, does the use of social media save lives or improve emergency management? One wonders if the same questions were asked about the first landline telephones. Traditionally, surveys have been used to ask people after the fact how they received warning and evacuation information. Such research on social media use is problematic. The surveys are issued with some delay after the event. People cannot readily recall where different pieces of information came from. Furthermore, the distinction between where and from whom we learned the information is misplaced; e.g., the sheriff is a person, an agency, and a digital presence.

Underlying this survey research is both doubt and fetishization of social media: Are people really using it? Should we invest money in communicating this way? Acknowledging that social media are being used in disasters can in turn mean that emergency management organizations must change radically. Such overhaul comes at a cost; hence the caution and the requests for “proof.”

Although doubt about social media's value by emergency managers is understandable, crisis informatics researchers question whether such proof is attainable and if it can be used meaningfully in emergency management decisions in an evolving information environment. We argue that such proof is impossible and that demand by members of the public during an event is what will change the course of emergency management even if “best practices” dictate otherwise (4, 5).

Investigation should shift from the burden of proof of use to design in use. Enduring social media solutions for emergency management have yet to be discovered. What is needed is for emergency management to grant sufficient permission for solutions to emerge from grassroots operations, and then to foster those ideas deliberately in subsequent events (2, 5).

For example, the Humanitarian Open-StreetMap Team (HOT) movement has drawn upon volunteer mappers to make geospatial data open and available in response to the 2010 Haiti earthquake, 2013 Typhoon Yolanda, and the 2015 Nepal earthquakes. International responders have grown to appreciate the value of the data. Although HOT strives to make data and process improvements in anticipation of big events, responders know that sometimes gains in collaborative mapping practice happen during the event (6).

Data scientists are relatively new to the disaster–social media research space and are knowledgeable about the management and analytics of large-volume data. But their knowledge and, hence, their questions commonly focus on the highest-level correlations that come about when comparing available large data sets [e.g. (7)]. This work is welcome and relevant but is only the beginning of a set of questions that need to be asked about behavioral phenomena.

In principle, social science should figure strongly in social media disaster research. Social science research recognizes often ignored, but critical, distinctions between human behaviors that result from endogenous hazards that can be captured (e.g., crime-based events) versus exogenous hazards that cannot (e.g., weather events that give rise to natural disasters) (5). In social media research, these differences are often flattened, even though they lead to different interpretations and outcomes. Marginalization of these critical fields in data science is worrisome: inclusion of social science would allow for more robust computational social science (8).

Sampling Decisions Are Difficult

A tempting myth is that large volumes of social media data alone will reveal patterns of behavior, when in truth, data must be scoped in ways that make them pertinent to a meaningful question, which usually triggers a cascade of new data collection steps and questions (9, 10). When the focus is on volume, rigor in data collection becomes an afterthought. We argue that social media research is the same as any scientific endeavor with respect to the relationship between data collection and well-posed research questions; the abundance of data disguises this obligation.

For example, our collaborators at the National Center for Atmospheric Research helped identify keywords to locate tweets about Hurricane Sandy before landfall on 29 October 2012. Keywords had to be broad (“frankenstorm” and “sandy”) because we did not know where the storm would land nor what the important research questions would be. As the place of landfall became clear, we added localized terms, e.g., place names. Collecting on broad terms like #sandy samples the “global” population of curious onlookers, but for sampling the “local” population, which shares and seeks information differently, one has to rely on highly localized terms. Geotagged tweets can help determine a user's proximity to an event; however, only 1 to 2% of tweets are geotagged.
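The two-phase collection strategy above can be sketched as a pair of keyword filters, one broad and one localized, with a final pass that keeps only geotagged tweets. This is an illustrative sketch, not the authors' actual pipeline: the sample tweets, term lists, and field names are hypothetical, and real collections hold millions of records with far richer JSON.

```python
# Hypothetical sample of collected tweets; only the fields used below are
# shown (Twitter's real JSON payloads carry many more).
tweets = [
    {"id": 1, "text": "Frankenstorm is coming, stay safe!", "geo": None},
    {"id": 2, "text": "Sandy update: Far Rockaway shelters open",
     "geo": {"coordinates": [40.6, -73.75]}},
    {"id": 3, "text": "Beautiful sunset tonight", "geo": None},
]

BROAD_TERMS = {"sandy", "frankenstorm"}          # pre-landfall: location unknown
LOCAL_TERMS = {"far rockaway", "breezy point"}   # added once landfall area is clear

def matches(tweet, terms):
    """True if any search term appears in the tweet text (case-insensitive)."""
    text = tweet["text"].lower()
    return any(term in text for term in terms)

# Broad terms sample the "global" population of onlookers;
# localized terms sample the "local" population.
broad_sample = [t for t in tweets if matches(t, BROAD_TERMS)]
local_sample = [t for t in tweets if matches(t, LOCAL_TERMS)]

# Only a small fraction of tweets (roughly 1 to 2%) carry geotags.
geotagged = [t for t in broad_sample if t["geo"] is not None]
```

In practice the broad collection runs first and the localized terms are layered on as the event unfolds, so the two samples overlap and must be deduplicated by tweet ID.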

Even this is not enough to enable many questions, because, in return, one receives only a tweet: It represents just one time a place was mentioned or a tweet was geotagged. What are users saying in between such tagged tweets? Are there meaningful data there? When we collect on specialized terms about a disaster, we get data skewed toward the disaster. How can we counterbalance such bias? Our solution includes collecting “contextual streams” on people identified first in the coarse-grained, keyword–based search. In our initial keyword–collected data set for Sandy, we identified 92,000 users who used geotagging on the U.S. eastern seaboard. We then retrieved their most recent 3200 tweets (a limitation imposed by Twitter) to generate a data set of 205 million tweets. Then, we could compare their historical data (with respect to geotagged movement) with their movement immediately before, during, and after the hurricane. This “boring in” on an interesting problem—in this case, evacuation or shelter-in-place movement—is where the power of social media lies with respect to social science problems.
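The contextual-stream expansion described above can be sketched as follows. A stub stands in for the Twitter API call, and the user IDs and timelines are fabricated for illustration; the only detail taken from the text is the service-imposed ceiling of roughly 3200 retrievable tweets per user.

```python
# Maximum history Twitter allows to be retrieved per user (as noted in the text).
MAX_HISTORY = 3200

def fetch_user_timeline(user_id):
    """Stub standing in for a paged Twitter API call; a real implementation
    would walk the user's timeline until hitting the ~3200-tweet ceiling."""
    fake_store = {  # hypothetical per-user histories
        "alice": [{"user": "alice", "text": f"tweet {i}"} for i in range(5)],
        "bob":   [{"user": "bob", "text": f"tweet {i}"} for i in range(3)],
    }
    return fake_store.get(user_id, [])[:MAX_HISTORY]

# Users surfaced by the coarse-grained keyword collection (hypothetical sample;
# the Sandy study identified 92,000 such geotagging users).
keyword_users = ["alice", "bob"]

# The contextual stream: each user's history, gathered so that behavior
# before, during, and after the event can be compared.
contextual_stream = []
for uid in keyword_users:
    contextual_stream.extend(fetch_user_timeline(uid))
```

At the scale reported in the text (92,000 users, 205 million tweets), the expansion dominates collection cost, which is why it is applied only to users already surfaced by the keyword search.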

Make “Big” Data Bigger, Then Smaller

Social media are inherently about participation. Social media data do not necessarily represent all of a population evenly, but they do represent a range of behaviors, ideas, and opinions that have a role to play alongside traditional disaster response. Disaster response can be better served if we plan to accept and allow people to have a voice and to help in disaster response.

In addition to this lofty call, social media data provide the opportunity for new insights on a larger scale. For example, instead of being in place to witness actions by members of the public, social media reveals some of what is happening on the ground at sites worldwide. It can be observed with a different kind of precision, including temporal sequencing of events and discourse that can be analyzed without transcription. The trick, though, is to be able to approach the “big data” qualitatively and even ethnographically.

To isolate activity by location or with respect to new and unusual behaviors, data sets must get bigger (by collecting contextual streams) before they can be sampled or filtered accordingly. This is because there are few natural constraints on social media data: there is no automatic mechanism for drawing one's “unit of analysis” and scope. The bounds of observation must instead be set through explicit decisions—which may have acknowledged limitations—that scope the data. Once the data are made smaller, content analysis helps determine whether the decisions were reasonable; if not, then back to the drawing board. In this way, researchers can isolate populations, regions, and small groups talking about or taking action, e.g., the Far Rockaway community in New York, where many residents felt underserved during Hurricane Sandy (11).

The Tyranny of the Tweet

Researchers and emergency managers interested in social media are at the mercy of social media providers (12) for access to data. Even when data are available, we are limited by the delivery format. For example, Twitter makes data available in the JavaScript Object Notation (JSON) format, with each JSON object containing a single tweet stripped of its conversational context. Yet we know that people speak with continuity across their posts without repeating the terms that are most likely to be keywords (13–18). As a result, most research studies tweets in isolation, as single statements without even their monologic context, never mind the conversational one. Stripped of context, we find, most tweets are nearly impossible to interpret; in context, they are often enlightening. Reacquiring that context is difficult and requires substantial postprocessing, e.g., expanding a data set 10-fold to search for those contextual tweets. In this way, the data format influences how researchers conceptualize what can be done.
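The postprocessing burden above can be made concrete: each delivered JSON object is a lone tweet, so conversational context must be reassembled by the researcher from reply links. The sketch below walks Twitter's `in_reply_to_status_id` field (present in v1.1 tweet payloads) back to the root of a thread; the sample tweets are fabricated, and a real reconstruction must also fetch parents that never matched the original keyword collection.

```python
import json

# Hypothetical raw JSON objects as delivered: one tweet each, no thread.
raw = [
    '{"id": 1, "text": "Is the bridge closed?", "in_reply_to_status_id": null}',
    '{"id": 2, "text": "Yes, since noon", "in_reply_to_status_id": 1}',
    '{"id": 3, "text": "Thanks!", "in_reply_to_status_id": 2}',
]
tweets = {t["id"]: t for t in map(json.loads, raw)}

def thread_of(tweet_id):
    """Walk reply links back to the root, returning the conversation in order.
    Stops early if a parent tweet is missing from the collected data set."""
    chain = []
    tid = tweet_id
    while tid is not None and tid in tweets:
        chain.append(tweets[tid])
        tid = tweets[tid]["in_reply_to_status_id"]
    return list(reversed(chain))

conversation = [t["text"] for t in thread_of(3)]
```

The early-stop on a missing parent is the common case in practice: keyword collection rarely captures a whole thread, which is exactly why the 10-fold expansion mentioned above is needed.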

We have offered these lessons not just to caution but also to compel cross-disciplinary research in a critical and exciting area of societal import. The pursuit of deeper investigations enabled by comprehensive treatment of social media data can reveal surprising and emergent features of disaster response in a technologically mediated world.

References and Notes

Acknowledgments: This material is based on work sponsored by NSF grants AGS-1331490, IIS-1524806, and IIS-0910586. We thank all who have been affiliated with Project EPIC: Empowering the Public with Information in Crisis.