News & Analysis: U.S. Science Policy

Agencies Rally to Tackle Big Data


Science  06 Apr 2012:
Vol. 336, Issue 6077, pp. 22
DOI: 10.1126/science.336.6077.22

John Holdren, the president's science adviser, wasn't exaggerating when he said last week that “big data is indeed a big deal.” About 1.2 zettabytes (10²¹ bytes) of electronic data are generated each year by everything from underground physics experiments and telescopes to retail transactions and Twitter posts.
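For scale, a quick back-of-envelope conversion turns that annual figure into a sustained rate (a sketch in decimal units; the breakdown is illustrative, not from the article):

```python
# Convert 1.2 zettabytes per year into a per-second rate.
ZETTA = 10**21                      # 1 zettabyte = 10^21 bytes
SECONDS_PER_YEAR = 365 * 24 * 3600  # ~3.15e7 seconds

annual_bytes = 1.2 * ZETTA
rate_tb_per_s = annual_bytes / SECONDS_PER_YEAR / 10**12  # terabytes per second

print(f"{annual_bytes:.2e} bytes/year ≈ {rate_tb_per_s:.0f} TB/s")
# -> 1.20e+21 bytes/year ≈ 38 TB/s
```

In other words, the world generates data at a sustained rate of roughly 38 terabytes every second.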

Traffic jam. This crowdsourced GPS data shows San Francisco taxis at 1-minute intervals for a full day. (Credit: Tim Hunter/Mobile Millennium)
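A note on how traces like these are typically prepared: the sketch below downsamples raw GPS pings to fixed 1-minute positions using pandas. The file and column names (taxi_pings.csv, taxi_id, ts, lat, lon) are hypothetical stand-ins, not the actual Mobile Millennium schema.

```python
import pandas as pd

# Load raw pings; the file and its columns are hypothetical examples.
pings = pd.read_csv("taxi_pings.csv", parse_dates=["ts"])

# Keep each taxi's last known position within every 1-minute window.
per_minute = (
    pings.set_index("ts")                  # resample() needs a DatetimeIndex
         .groupby("taxi_id")[["lat", "lon"]]
         .resample("1min")
         .last()
)
print(per_minute.head())
```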

Holdren was kicking off a federal effort to improve the nation's ability to manage, understand, and act upon that data deluge. Its goal is to increase fundamental understanding of the technologies needed to manipulate and mine massive amounts of information; apply that knowledge to other scientific fields; address national goals in health, energy, defense, and education; and train more researchers to work with those technologies. The impetus for the initiative, to be managed by the Office of Science and Technology Policy (OSTP) that Holdren directs, comes from a December 2010 report by a presidential task force that, Holdren said, concluded the nation was “underinvesting” in the field.

Computer scientists welcome the spotlight that the White House is shining on big-data research. “The announcements demonstrate a recognition by a broad range of federal agencies—Defense, Energy, NIH, and many more—that further advances in ‘big data’ management and analysis are critical to achieving their missions,” says Edward Lazowska of the University of Washington, Seattle, who co-chaired the 2010 report on the nation's digital future. “The White House [OSTP] deserves enormous credit for herding the cats to create a true national initiative in this area.”

Last week's event gave half a dozen agencies a chance to showcase what Holdren described as “$200 million in new commitments.” However, it's not clear what portion of that figure represents new money and how much is a continuation of current activities under the new big-data umbrella.

A $35 million investment by the National Science Foundation (NSF) across several areas appears to be the largest commitment of new money this year by any federal agency. The Department of Energy (DOE), for example, is counting a $5-million-a-year award made last year to Lawrence Berkeley National Laboratory for an institute to advance efforts in data management, analysis, and visualization that have been under way for more than a decade within DOE's Advanced Scientific Computing Research program. The Defense Department says it plans to spend $60 million this year on new awards for research on big data, but officials couldn't say if that amount is more than what it has spent in previous years.

Even small investments are welcome. The U.S. Geological Survey points to a tiny (annual budget of $650,000) synthesis and analysis center in Fort Collins, Colorado, that brings groups of scientists together for a week to crunch large data sets. The National Institutes of Health (NIH) is counting a project begun in 2010 to put data from its 1000 Genomes Project on Amazon's cloud-computing platform. Researchers who want to use Amazon's web services to store and analyze the growing database of genetic sequences—the records of some 2700 individuals should be available by the end of the year—will be charged by the hour.
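The 1000 Genomes data now sits in a public Amazon S3 bucket, so reading it requires no credentials; it is the compute run against the data that Amazon bills by the hour. A minimal access sketch, assuming the bucket name (“1000genomes”) under which AWS lists the dataset today, and using the modern boto3 client (which postdates the article):

```python
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# The dataset is public, so make anonymous (unsigned) requests.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

# List a few objects to inspect the bucket layout before downloading.
resp = s3.list_objects_v2(Bucket="1000genomes", MaxKeys=5)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```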

Part of the difficulty of quantifying the new big-data initiative is, ironically, a lack of reliable data on what the government is now spending. The 2010 report recommended $1 billion a year and asserted that current spending on a broader cross-agency program to advance network and information technology research and development (NITRD) falls well short of that level. But that assessment is just guesswork, the report admits, because most NITRD dollars (more than $4 billion in 2010) are used to enhance the computing capacity of scientists in other disciplines. A review of NIH's $570 million-a-year network and information technology portfolio, for example, found that only 2% is actually spent on core research.

The flagship project in NSF's new suite of programs addresses that problem head-on. It's a $25 million competition (nsf12499), run jointly with NIH, to fund fundamental research on “core techniques and technologies” involving big data. But the biomedical giant is putting only $3.5 million into the pot this year, as officials say it was hard to find uncommitted dollars in the current budget.

The solicitation encourages researchers to think big about how big data could transform society, from instant access to all health care information, to an early warning system for failing bridges, to sensor-laden clothing that protects the wearer from harm. But applicants will be judged by a more modest yardstick: the best ideas for collecting, managing, and analyzing data, as well as for promoting collaboration among scientists. Last week, NSF also announced a 5-year, $10 million award to the Algorithms, Machines, and People lab at the University of California, Berkeley, one of four new awards under the agency's Expeditions in Computing program and the only one dealing explicitly with big data.

The Defense Advanced Research Projects Agency last week announced a $24 million-a-year program to make “revolutionary advances” beyond current systems in processing and analyzing big data. The solicitation (BAA-12-38) emphasizes the challenges of using “imperfect data” available for only a brief period in the heat of battle, as well as the need to know when humans must override robotic systems.

Jeannette Wing, a professor at Carnegie Mellon University in Pittsburgh, Pennsylvania, and the former head of NSF's computing science directorate, hopes the big-data initiative will give computer scientists the chance to take full advantage of recent developments in the IT field. “Big data has been a ‘big’ thing in computer science for years,” she says. “But it's even bigger now, because of data analytics and cloud computing.”
