News this Week

Science  30 Jan 2015:
Vol. 347, Issue 6221, pp. 460

  1. This week's section

    Tug of war over Arctic oil

    Brooks Range in Alaska's Arctic National Wildlife Refuge.


    The Obama administration has announced several big moves on Arctic oil. On 27 January, the White House issued a long-term plan for offshore drilling that would put several areas in the Arctic's Chukchi and Beaufort seas off limits (while opening swathes of the Atlantic from Virginia to Georgia). On 25 January, officials announced they want to expand protections for 5 million hectares, including 600,000 in an oil- and gas-rich coastal plain, within Alaska's Arctic National Wildlife Refuge. Both plans are drawing criticism from politicians in Alaska, which depends heavily on the oil and gas industry for revenues. “We are going to fight back,” vowed Senator Lisa Murkowski (R–AK), who heads the Senate's energy committee. The moves come on the heels of an executive order released last week by President Barack Obama aimed at improving coordination of U.S. policy in the Arctic as the United States prepares to assume the chairmanship of the multilateral Arctic Council later this year. The order creates an Arctic Executive Steering Committee to streamline agency policies and collaborate with state, local, tribal, and other groups.

    The return of the tortoises

    A Pinzón Island tortoise hatchling at the Charles Darwin Research Station on Santa Cruz Island.


    In the mid-18th century, sailors who landed on the Galápagos island of Pinzón helped create an environmental catastrophe on the island: Rats stowed in their ships ran rampant, eating the eggs and hatchlings of the giant tortoises that lived there. Humans tried to right the wrong, first with a conservation project launched in the 1960s that collected the few unhatched eggs, incubated them on another island, and then transported the 5-year-old tortoises back. The eggs were still threatened by the rat populations, however—until, in 2012, biologists distributed a rat-specific poison on the island via helicopter, making the island rat-free. Now, more than a century since the last baby tortoise was seen on the island, researchers have spotted 10 new hatchlings—which, they say, likely means a hundred times more are in the wild.

    Asia's newest synchrotron sees first light

    The National Synchrotron Radiation Research Center in Hsinchu.


    Taiwan's scientists will shortly have a new research tool for use in biology, nanotechnology, and materials science in the new Taiwan Photon Source at the National Synchrotron Radiation Research Center (NSRRC) in Hsinchu. After nearly 5 years of construction and testing, the facility achieved first light on 31 December. The 518-meter-circumference storage ring is designed to accelerate electrons to 3 gigaelectronvolts. NSRRC claims it will be one of the world's brightest synchrotron x-ray sources. Envisioned uses include protein microcrystallography, studies of protein interactions with other biomolecules, and the development of new materials. Phase one of the project includes the construction of seven beamlines. The facility will become available to users by the end of this year.

    “The lives of people in poor countries will improve faster in the next 15 years than at any other time in history.”

    The Bill & Melinda Gates Foundation's “Big Bet” for 2015, noting improved African GDP and agricultural productivity and new ways to combat epidemics.

    By the numbers

    2—Treatment beds available for each suspected, probable, or confirmed case of Ebola in Sierra Leone, Liberia, and Guinea, according to the World Health Organization's Ebola situation report last week.

    30%—Jump in tiger population from 2011 to 2014 in India (from 1706 to 2226), due to better conservation. India is home to 70% of the world's tigers.

    $90 million—Cost of the roughly 3% of methane delivered to Boston that leaked into the air from 2012 to 2013, finds a study in the Proceedings of the National Academy of Sciences.

    Around the world

    Washington, D.C.

    Senate: Climate change no hoax

    In a symbolic move, the U.S. Senate voted 98 to 1 on 21 January to approve a measure stating that climate change is real and “not a hoax.” The measure, sponsored by Senator Sheldon Whitehouse (D–RI), was aimed at forcing Republican senators to take a stand on an issue that is sensitive with conservative voters. It also poked fun at Senator James Inhofe (R–OK), who has called climate change “a hoax.” But Inhofe turned the tables at the last minute, endorsing the measure and redefining it: Only the idea that humans could affect climate is a hoax, he said. The switch gave Republicans political cover to vote for the measure, but 15 Republicans also supported a separate, ultimately unsuccessful, measure that said humans do contribute to climate change.

    New Delhi

    No patent for hepatitis C drug

    The Indian Patent Office this month rejected a patent on hepatitis C drug Sovaldi, produced by U.S.-based Gilead Sciences Inc. Sovaldi has transformed care for hepatitis C by dramatically cutting treatment time and side effects, but has also come under fire for its $84,000 price tag for a 12-week course. Gilead offered India a discount of 99%—still out of reach for most Indians. The patent rejection opens the door to sale of a generic form of Sovaldi in India; according to a 2014 study in Clinical Infectious Diseases, manufacturing a 12-week course of the generic would cost at most $136. Gilead has challenged the decision. In 2013, India rejected a patent on the leukemia drug Gleevec.

    Geneva, Switzerland

    Ebola triggers WHO reforms

    The World Health Organization (WHO) has embarked on reforms to make it better able to deal with events like the Ebola epidemic in West Africa. At a special session on 25 January in Geneva, WHO's Executive Board adopted a resolution that calls for strengthening the agency's operational muscle, extending its global health workforce, and putting in place a contingency fund for future emergencies. The United Kingdom has already pledged $10 million to that fund, WHO Director-General Margaret Chan said on Sunday. The reforms, which must be approved by the World Health Assembly in May, come after WHO's acknowledgement that it bungled its response to Ebola last year. “The world, including WHO, was too slow to see what was unfolding before us,” Chan told the board, which consists of representatives from 34 member states.


    Research dog breeders sentenced

    Volunteers from Legambiente holding beagle puppies rescued from Green Hill.


    An Italian court has found three employees of Green Hill, a company that breeds beagles for animal studies, guilty of unjustified killing and mistreatment of dogs. The accusations against Green Hill in Brescia, Italy, a subsidiary of U.S.-based Marshall BioResources and one of Europe's largest suppliers of dogs for research, were filed in June 2012 by the environmental organization Legambiente and the animal rights group Lega Anti Vivisezione. The complaint noted that the dogs were never outdoors, were exposed to artificial light night and day, and lived in spaces that weren't properly cleaned, among other charges. The European Animal Research Association strongly condemned the verdicts, calling them a “legal travesty” and part of a “campaign to end animal research in Italy.” The court has to release a written motivation for the verdicts within 60 days.

    Western Galilee, Israel

    Meetup with Neandertals?

    A 55,000-year-old modern human skull found in an Israeli cave.


    Most Europeans and Asians have up to 2% Neandertal DNA in their genomes, but when and where did any love matches between Neandertals and modern humans take place? The discovery of a 55,000-year-old partial skull in Israel's Manot Cave, not far from previously excavated Neandertal fossils of similar age, shores up the suggestion from ancient DNA that these two human lineages engaged in at least some limited mating in the Middle East between about 50,000 and 60,000 years ago. Anthropologists agree that the new fossil, reported this week in Nature, is a member of Homo sapiens, the first to be found outside our African homeland during this crucial time period. Researchers also think it could represent the population of modern humans that soon afterward swept across Europe and Asia.

    Washington, D.C.

    Bill to revamp medical treatment

    A U.S. House of Representatives panel this week released a widely anticipated, bipartisan proposal for speeding the development of new medical treatments. The 393-page draft bill, dubbed the 21st Century Cures Act, has been under development by Fred Upton (R–MI) and Diana DeGette (D–CO) of the House Energy and Commerce Committee since April. The provisions aim to involve patients in drug development; speed clinical trials; streamline regulations; modernize manufacturing facilities; and encourage personalized treatments while also expanding support for young scientists at the National Institutes of Health. The bill is a discussion document, said Upton, who invited comments via the Twitter hashtag #Cures2015: “Some things may be dropped, some items may be added, but everything is on the table as we hope to trigger a thoughtful discussion toward a more polished product.”


    Corruption case snares scientist

    A prominent cancer researcher has become entangled in a high-profile corruption case in New York state. Robert Taub, former director of the Columbia University Mesothelioma Center, has been named as the “Doctor-1” described in a criminal complaint that accuses Democratic state Representative Sheldon Silver, the speaker of the New York State Assembly, of arranging bribes and kickbacks that netted Silver millions of dollars. The complaint alleges that Silver steered $500,000 from a state health care research fund to Taub; in exchange, Taub referred patients suffering from asbestos-related disease to Silver's law firm, investigators allege. Doctor-1 is cooperating with federal investigators, according to the complaint, and will not be charged with any crime. However, Columbia University noted in a statement on 23 January that “Dr. Taub no longer serves as” the center's director.

    New head of NOAA research

    Attorney Craig McLean, a veteran of the National Oceanic and Atmospheric Administration (NOAA), will be the new head of its research division, the agency announced on 21 January. A deputy chief of NOAA's Office of Oceanic and Atmospheric Research since 2006, McLean served as the first director of NOAA's Office of Ocean Exploration and Research. He's also been a top deputy at the National Ocean Service and the National Marine Sanctuaries Program and was a NOAA Corps officer for more than 2 decades. McLean “understands how to move ocean information out of the laboratory and into operations”—a transition that NOAA has struggled with, says Scott Rayder, a former chief of staff at NOAA and now a senior adviser at the University Corporation for Atmospheric Research in Boulder, Colorado. “He's a huge asset to the community. It's a clear signal that NOAA is placing a premium [on this].”

  2. Privacy

    Credit card study blows holes in anonymity

    1. John Bohannon

    Attack suggests need for new data safeguards.

    For social scientists, the age of big data carries big promises: a chance to mine demographic, financial, medical, and other vast data sets in fine detail to learn how we lead our lives. For privacy advocates, however, the prospect is alarming. They worry that the people represented in such data may not stay anonymous for long. A study of credit card data in this week's issue of Science (p. 536) bears out those fears, showing that it takes only a tiny amount of personal information to de-anonymize people.

    “The open sharing of raw data sets is not the future.”

    Yves-Alexandre de Montjoye, MIT


    The result, coming on top of earlier demonstrations that personal identities are easy to pry from anonymized data sets, indicates that such troves need new safeguards. “In light of the results, data custodians should carefully limit access to data,” says Arvind Narayanan, a computer scientist at Princeton University who was not involved with the study. Or as the study's lead author, Yves-Alexandre de Montjoye, an applied mathematician at the Massachusetts Institute of Technology (MIT) in Cambridge, puts it: When it comes to sensitive personal information, “the open sharing of raw data sets is not the future.”

    De Montjoye's team analyzed 3 months of credit card transactions, chronicling the spending of 1.1 million people in 10,000 shops in a single country. (The team is tightlipped about the data's source—a “major bank,” de Montjoye says—and it has not disclosed which country.) The bank stripped away names, credit card numbers, shop addresses, and even the exact times of the transactions. All that remained were the metadata: amounts spent, shop type—restaurant, gym, or grocery store, for example—and a code representing each person.

    But because each individual's spending pattern is unique, the data have a very high “unicity.” That makes them ripe for what de Montjoye calls a “correlation attack.” To reveal a person's identity, you just need to correlate the metadata with information about the person from an outside source.

    One correlation attack became famous last year when the New York City Taxi and Limousine Commission released a data set of the times, routes, and cab fares for 173 million rides. Passenger names were not included. But armed with time-stamped photos of celebrities getting in and out of taxis—there are websites devoted to celebrity spotting—bloggers, after deciphering taxi driver medallion numbers, easily figured out which celebrities paid which fares.

    Stealing a page from the taxi data hack, de Montjoye's team simulated a correlation attack on the credit card metadata. They armed their computers with a collection of random observations about each individual in the data: information equivalent to a single time-stamped photo. (These clues were simulated, but people generate the real-world equivalent of this information day in and day out, for example through geolocated tweets or mobile phone apps that log location.) The computer used those clues to identify some of the anonymous spenders. The researchers then fed a different piece of outside information into the algorithm and tried again, and so on until every person was de-anonymized.
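
    The logic of a correlation attack can be sketched in a few lines of Python. This is only a toy illustration, not the study's code: the transaction table, the person codes, and the function names `build_index` and `correlation_attack` are all hypothetical, and each outside clue is assumed to be a simple (day, shop type) pair.

```python
from collections import defaultdict

# Hypothetical toy data: (person_code, day, shop_type, amount).
TRANSACTIONS = [
    ("u1", 1, "restaurant", 23.50),
    ("u1", 2, "gym", 40.00),
    ("u1", 5, "grocery", 61.20),
    ("u2", 1, "restaurant", 18.00),
    ("u2", 2, "grocery", 35.10),
    ("u2", 5, "grocery", 12.75),
    ("u3", 1, "restaurant", 23.50),
    ("u3", 3, "gym", 40.00),
    ("u3", 5, "bakery", 7.30),
]


def build_index(transactions):
    """Map each (day, shop_type) pair to the set of person codes seen there."""
    index = defaultdict(set)
    for person, day, shop, _amount in transactions:
        index[(day, shop)].add(person)
    return index


def correlation_attack(index, clues):
    """Return the person codes consistent with every outside clue.

    Each clue is a (day, shop_type) pair learned from an outside source.
    """
    candidates = None
    for clue in clues:
        matches = index.get(clue, set())
        # Keep only the people who match this clue and all earlier ones.
        candidates = set(matches) if candidates is None else candidates & matches
    return candidates if candidates is not None else set()


index = build_index(TRANSACTIONS)
one_clue = correlation_attack(index, [(1, "restaurant")])
two_clues = correlation_attack(index, [(1, "restaurant"), (2, "gym")])
```

    In this toy table a single clue leaves three candidates, while a second clue narrows the set to one person code, whose whole transaction history is then exposed.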

    Just knowing an individual's location on four occasions was enough to fingerprint 90% of the spenders. And knowing the amount spent on those occasions—the equivalent of a few receipts from someone's trash—made it possible to de-anonymize nearly everyone and trace their entire transaction history with just three pieces of information per person. The findings echo the results of a 2013 Scientific Reports study in which de Montjoye and colleagues started with a trove of mobile phone metadata on subscribers' movements and showed that knowing a person's location on four occasions was enough to identify them.

    One way to protect against correlation attacks is to blur the data by binning certain variables. For example, rather than revealing the exact day or price of a transaction, the public version of the data set might reveal only the week in which it occurred or a price range within which it fell. Binning did not thwart de Montjoye's correlation attack; instead, it only increased the amount of information needed to de-anonymize each person to the equivalent of a dozen receipts.
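
    Binning itself is simple to sketch. The example below is a hedged illustration only: the function name `bin_record`, the 7-day bins, and the 25-unit price ranges are assumptions chosen for the example, not the parameters used in the study.

```python
def bin_record(day, amount, days_per_bin=7, price_step=25):
    """Coarsen a transaction: keep only the week index and a price range."""
    week = (day - 1) // days_per_bin              # days 1-7 -> week 0, 8-14 -> week 1
    low = int(amount // price_step) * price_step  # lower edge of the price range
    return week, (low, low + price_step)


# Two purchases that differ in exact day and price...
a = bin_record(day=2, amount=23.50)
b = bin_record(day=6, amount=18.00)
# ...and one that falls in the following week.
c = bin_record(day=9, amount=23.50)
```

    Here `a` and `b` become indistinguishable after binning, so a single clue no longer separates them, while `c` still lands in a different bin. That is the sense in which binning raises the number of clues an attacker needs without removing the risk.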

    These studies needn't be the death knell for social science research using big data. “We need to bring the computation to the data, not the other way around,” de Montjoye says. Big data with sensitive information could live “in the cloud,” protected by gatekeeper software, he says. The gatekeeper would not allow access to individual records, thwarting correlation attacks, but would still let researchers ask statistical questions about the data.

    The mathematics needed to run such a system, a set of standards and algorithms known as differential privacy, is one of the hottest topics in data science. “It works best when you have a large amount of data,” says Cynthia Dwork, a computer scientist at Microsoft Research in Mountain View, California, who is one of the pioneers of the technique. She admits that it is a stark departure from the traditional academic practice of open data sharing, and many scientists are resistant.
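
    One textbook building block of differential privacy is the Laplace mechanism, which adds calibrated random noise to the answer of a counting query. The sketch below is an assumption-laden illustration of the gatekeeper idea, not a description of any system mentioned here: the `Gatekeeper` class, the sample amounts, and the choice of epsilon are hypothetical, and each record is assumed to belong to a different person.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class Gatekeeper:
    """Holds raw records privately and answers only noisy counting queries."""

    def __init__(self, records, epsilon, seed=None):
        self._records = list(records)
        self._epsilon = epsilon            # smaller epsilon = more noise, more privacy
        self._rng = random.Random(seed)

    def count(self, predicate):
        true_count = sum(1 for r in self._records if predicate(r))
        # Adding or removing one person's record changes a count by at most 1,
        # so the noise scale is 1 / epsilon.
        return true_count + laplace_noise(1.0 / self._epsilon, self._rng)


# Hypothetical amounts, one record per person.
AMOUNTS = [12.75, 18.00, 23.50, 35.10, 40.00, 61.20, 7.30]
gate = Gatekeeper(AMOUNTS, epsilon=0.5, seed=0)
noisy = gate.count(lambda amount: amount > 20)
```

    The noise has a fixed size regardless of how many records are held, so its relative effect shrinks as the data set grows. Each answered query also spends some of the privacy guarantee, so a real gatekeeper would need to limit how many questions it answers.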

    But without such safeguards, rich databases could remain off limits. Take, for example, the data MIT has accumulated from its massive open online courses. It's an information trove that education researchers dream of having: a record of the entire arc of the learning process for millions of students, says Salil Vadhan, a computer scientist at Harvard University. But the data are under lock and key, partly out of fears of a prospective privacy breach. “If we can provide data for research without endangering privacy,” Vadhan says, “it will do a lot of good.”

  3. The End of Privacy


    1. John Bohannon

    Facial recognition software could soon ID you in any photo.


    Appear in a photo taken at a protest march, a gay bar, or an abortion clinic, and your friends might recognize you. But a machine probably won't—at least for now. Unless a computer has been tasked to look for you, has trained on dozens of photos of your face, and has high-quality images to examine, your anonymity is safe. Nor is it yet possible for a computer to scour the Internet and find you in random, uncaptioned photos. But within the walled garden of Facebook, which contains by far the largest collection of personal photographs in the world, the technology for doing all that is beginning to blossom.

    Catapulting the California-based company beyond other corporate players in the field, Facebook's DeepFace system is now as accurate as a human being at a few constrained facial recognition tasks. The intention is not to invade the privacy of Facebook's more than 1.3 billion active users, insists Yann LeCun, a computer scientist at New York University in New York City who directs Facebook's artificial intelligence research, but rather to protect it. Once DeepFace identifies your face in one of the 400 million new photos that users upload every day, “you will get an alert from Facebook telling you that you appear in the picture,” he explains. “You can then choose to blur out your face from the picture to protect your privacy.” Many people, however, are troubled by the prospect of being identified at all—especially in strangers' photographs. Facebook is already using the system, although its face-tagging system only reveals to you the identities of your “friends.”

    DeepFace isn't the only horse in the race. The U.S. government has poured funding into university-based facial recognition research. And in the private sector, Google and other companies are pursuing their own projects to automatically identify people who appear in photos and videos.

    Exactly how automated facial recognition will be used—and how the law may limit it—is unclear. But once the technology matures, it is bound to create as many privacy problems as it solves. “The genie is, or soon will be, out of the bottle,” says Brian Mennecke, an information systems researcher at Iowa State University in Ames who studies privacy. “There will be no going back.”

    SIMPLY DETECTING FACES is easy for a computer, at least compared with detecting common objects like flowers, blankets, and lamps. Nearly all faces have the same features—eyes, ears, nose, and mouth—in the same relative positions. This consistency provides such an efficient computational shortcut that “we've been able to detect faces in images for about 2 decades,” LeCun says. Even the puny computers in cheap consumer cameras have long been able to detect and focus on faces.

    But “identifying a face is a much harder problem than detecting it,” LeCun says. Your face uniquely identifies you. But unlike your fingerprints, it is constantly changing. Just smile and your face is transformed. The corners of your eyes wrinkle, your nostrils flare, and your teeth show. Throw your head back with laughter and the apparent shape of your face contorts. Even when you wear the same expression, your hair varies from photo to photo, all the more so after a visit to the hairdresser. And yet most people can spot you effortlessly in a series of photos, even if they've seen you in just one.

    In terms of perceiving the world around us, facial recognition may be “the single most impressive thing that the human brain can do,” says Erik Learned-Miller, a computer scientist at the University of Massachusetts, Amherst. By contrast, computers struggle with what researchers call the problem of A-PIE: aging, pose, illumination, and expression. These sources of noise drown out the subtle differences that distinguish one person's face from another.


    Thanks to an approach called deep learning, computers are gaining ground fast. Like all machine learning techniques, deep learning begins with a set of training data—in this case, massive data sets of labeled faces, ideally including multiple photos of each person. Learned-Miller helped create one such library, called Labeled Faces in the Wild (LFW), which is like the ultimate tabloid magazine: 13,000 photographs scraped from the Web containing the faces of 5749 celebrities, some appearing in just a few photos and others in dozens. Because it is online and free to use, LFW has become the most popular benchmark for machine vision researchers honing facial recognition algorithms.

    To a computer, faces are nothing more than collections of lighter and darker pixels. The training of a deep learning system begins by letting the system compare faces and discover features on its own: eyes and noses, for instance, as well as statistical features that make no intuitive sense to humans. “You let the machine and data speak,” says Yaniv Taigman, DeepFace's lead engineer, who's based at Facebook's Menlo Park headquarters. The system first clusters the pixels of a face into elements such as edges that define contours. Subsequent layers of processing combine these elements into nonintuitive, statistical features that all faces share but that differ enough from one face to the next to tell them apart.

    This is the “deep” in deep learning: The input for each processing layer is the output of the layer beneath. The end result of the training is a representational model of the human face: a statistical machine that compares images of faces and guesses whether they belong to the same person. The more faces the system trains on, the more accurate the guesses.
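
    The layered structure can be illustrated with a deliberately tiny sketch. Nothing here reflects DeepFace's actual architecture: the weights are hand-picked rather than learned, the four-number "images" and the 0.5 threshold are invented for the example, and a real system has many more layers and far larger inputs.

```python
def relu(values):
    """Simple nonlinearity: negative responses are clipped to zero."""
    return [max(0.0, v) for v in values]


def layer(weights, inputs):
    """One processing layer: weighted sums of the inputs, then the nonlinearity."""
    return relu([sum(w * x for w, x in zip(row, inputs)) for row in weights])


def represent(pixels, layers):
    """Feed each layer's output into the next layer."""
    out = pixels
    for weights in layers:
        out = layer(weights, out)
    return out


def same_person(pixels_a, pixels_b, layers, threshold):
    """Guess 'same person' when the two representations are close together."""
    rep_a = represent(pixels_a, layers)
    rep_b = represent(pixels_b, layers)
    distance = sum((x - y) ** 2 for x, y in zip(rep_a, rep_b)) ** 0.5
    return distance < threshold


# Hypothetical hand-picked weights; a trained system would learn these from data.
LAYERS = [
    [[1, -1, 0, 0], [0, 0, 1, -1]],  # layer 1: edge-like contrasts between pixels
    [[1, 1]],                        # layer 2: combines the layer-1 features
]

photo_1 = [0.90, 0.10, 0.80, 0.20]   # toy "face A"
photo_2 = [0.85, 0.10, 0.80, 0.25]   # toy "face A", slightly different lighting
photo_3 = [0.20, 0.90, 0.30, 0.70]   # toy "face B"

match = same_person(photo_1, photo_2, LAYERS, threshold=0.5)
mismatch = same_person(photo_1, photo_3, LAYERS, threshold=0.5)
```

    With these toy numbers, the first two photos map to nearby representations and are judged the same person, while the third does not.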

    The DeepFace team created a buzz in the machine vision community when they described their creation in a paper published last March on Facebook's website. One benchmark for facial recognition is identifying whether faces in two photographs from the LFW data set belong to the same celebrity. Humans get it right about 98% of the time. The DeepFace team reported an accuracy of 97.35%, which cuts the error rate of the previous best systems by more than 27%.

    Some of DeepFace's advantages are from its clever programming. For example, it overcomes part of the A-PIE problem by accounting for a face's 3D shape. If photos show people from the side, the program uses what it can see of the faces to reconstruct the likely face-forward visage. This “alignment” step makes DeepFace far more efficient, Taigman says. “We're able to focus most of the [system's] capacity on the subtle differences.”

    “The method runs in a fraction of a second on a single [computer] core,” Taigman says. That's efficient enough for DeepFace to work on a smart phone. And it's lean, representing each face as a string of code called a 256-bit hash. That unique representation is as compact as this very sentence. In principle, a database of the facial identities of 1 billion people could fit on a thumb drive.
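
    The thumb drive claim is easy to check with back-of-the-envelope arithmetic. The sketch below assumes each face is stored as exactly 256 bits with no extra overhead for names or indexing, which a real database would need.

```python
BITS_PER_FACE = 256
PEOPLE = 1_000_000_000

bytes_per_face = BITS_PER_FACE // 8          # 8 bits per byte
total_bytes = bytes_per_face * PEOPLE
total_gigabytes = total_bytes / 10**9        # decimal gigabytes

print(f"{bytes_per_face} bytes per face, {total_gigabytes:.0f} GB for {PEOPLE:,} people")
```

    That works out to 32 bytes per face and about 32 gigabytes in total.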

    But DeepFace's greatest advantage—and the aspect of the project that has sparked the most rancor—is its training data. The DeepFace paper breezily mentions the existence of a data set called SFC, for Social Face Classification, a library of 4.4 million labeled faces harvested from the Facebook pages of 4030 users. Although users give Facebook permission to use their personal data when they sign up for the website, the DeepFace research paper makes no mention of the consent of the photos' owners.

    “JUST AS CREEPY as it sounds,” blared the headline of an article in The Huffington Post describing DeepFace a week after it came out. Commenting on The Huffington Post's piece, one reader wrote: “It is obvious that police and other law enforcement authorities will use this technology and search through our photos without us even knowing.” Facebook has confirmed that it provides law enforcement with access to user data when it is compelled by a judge's order.

    “People are very scared,” Learned-Miller says. But he believes the fears are misplaced. “If a company like Facebook really oversteps the bounds of what is ruled as acceptable by society … they could go out of business. If they break laws, then they can be shut down and people can be arrested.” He says that the suspicion stems from the lack of transparency. Whereas academic researchers must obtain explicit consent from people to use private data for research, those who click “agree” on Facebook's end-user license agreement (EULA) grant the company permission to use their data with few strings attached. Such online contracts “are the antithesis of transparency,” Learned-Miller says. “No one really knows what they're getting into.” Last year, the company introduced a friendly looking dinosaur cartoon that pops up on the screen and occasionally reminds users of their privacy settings, and it boiled down the EULA language from 9000 words to 2700.

    There is already a bustling trade in private data—some legal, others not—and facial identity will become another hot commodity, Iowa State's Mennecke predicts. For example, facial IDs could allow advertisers to follow and profile you wherever there's a camera—enabling them to cater to your preferences or even offer different prices depending on what they know about your shopping habits or demographics. But what “freaks people out,” Mennecke says, “is the idea that some stranger on the street can pick you out of a crowd. … [You] can't realistically evade facial recognition.” FacialNetwork, a U.S. company, is using its own deep learning system to develop an app called NameTag that identifies faces with a smart phone or a wearable device like Google Glass. NameTag reveals not only a person's name, but also whatever else can be discovered from social media, dating websites, and criminal databases. Facebook moved fast to contain the scandal; it sent FacialNetwork a cease and desist letter to stop it from harvesting user information. “We don't provide this kind of information to other companies, and we don't have any plans to do so in the future,” a Facebook representative told Science by e-mail.

    The potential commercial applications of better facial recognition are “troublesome,” Learned-Miller says, but he worries more about how governments could abuse the technology. “I'm 100% pro–Edward Snowden,” Learned-Miller says, referring to the former National Security Agency contractor who in 2013 divulged the U.S. government's massive surveillance of e-mail and phone records of U.S. citizens (see p. 495). “We have to be vigilant,” he says.

    Learned-Miller's sentiment is striking, considering that he is funded in part by the U.S. Intelligence Advanced Research Projects Activity to develop a facial recognition project called Janus. Perhaps that's all the more reason to take his warning seriously.

    Correction (2 February 2015): The original version of this story incorrectly described the Janus project as a classified project. It is unclassified.

  4. The Privacy Arms Race

    When your voice betrays you

    1. David Shultz

    "Voiceprints" offer convenience and security, but they may pose privacy issues.

    “My voice is my password.” You may soon find yourself saying that—or perhaps you already do—when you call your bank or credit card company. Like a fingerprint or an iris scan, every voice is unique, and security companies have embraced voice recognition as a convenient new layer of authentication. But experts worry that voiceprints could be used to identify speakers without their consent, infringing on their privacy and freedom of speech.

    Voiceprints are created by recording a segment of speech and analyzing the frequencies at which the sound is concentrated. Physical traits like the length of a speaker's vocal tract or a missing tooth leave their mark on a voice, creating a unique spectral signature.
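
    Finding the frequencies at which a sound is concentrated can be sketched with a plain discrete Fourier transform. This is a minimal illustration under stated assumptions, not how any commercial voiceprint system works: the signal is a synthetic mix of two tones rather than recorded speech, and the sample rate, tone frequencies, and function name `dft_magnitudes` are chosen for the example.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return the magnitude of each frequency bin up to half the sample rate."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2):
        total = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        magnitudes.append(abs(total) / n)
    return magnitudes


SAMPLE_RATE = 800   # samples per second (hypothetical)
N = 200             # number of samples, so each bin is 4 Hz wide

# Synthetic "voice": a strong 120 Hz tone plus a weaker 240 Hz overtone.
signal = [
    math.sin(2 * math.pi * 120 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 240 * t / SAMPLE_RATE)
    for t in range(N)
]

mags = dft_magnitudes(signal)
bin_hz = SAMPLE_RATE / N
strongest = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:2]
peak_freqs = sorted(k * bin_hz for k in strongest)
```

    The two strongest bins recover the tones that were mixed in. A spectral signature for a real voice would be built from many such measurements over short stretches of speech.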

    Unlike a fingerprint, a voiceprint incorporates behavioral elements as well; traits like cadence, dialect, and accent easily distinguish, say, Christopher Walken from Morgan Freeman. Speech recognition systems, which aim to understand what is being said, minimize these differences, normalizing pitch and overlooking pauses and accents. But for identifying a unique individual, the disparities are crucial.


    Because voiceprint systems typically have the user repeat a standard phrase, identity thieves could theoretically record such phrases and play them back to fool the technology. The systems are designed to detect recordings or synthesized speech, however. An even safer alternative is to ask the customer to repeat a randomly chosen bit of text. “The system will prompt the user, ‘Now say this phrase,’” says Vlad Sejnoha, the chief technology officer at Nuance Communications Inc. in Burlington, Massachusetts, an industry leader in voice recognition technology. “It's hard to come prepared with all possible recordings.” Some systems require no pass phrase at all but rather analyze a person's voice by listening in the background—for instance, as they talk to a sales representative—and compare it with a stored voiceprint.

    Demand for voiceprint authentication is skyrocketing. Nuance Director Brett Beranek says the company has logged more than 35 million unique voiceprints in the past 24 months, compared with only 10 million over the previous 10 years. But massive voiceprint databases could make anonymity a scarcer commodity.

    “Like other biometrics, voiceprint technology does raise privacy issues, because it gives companies and the government the ability to identify people even without their knowledge,” says Jennifer Lynch, an attorney at the Electronic Frontier Foundation in San Francisco, California, specializing in biometrics. “That does create a challenge to anonymous speech protection” as enshrined in the United States' First Amendment, she says.

    How and when voiceprints can be captured legally is murky at best. Many countries have legislation regulating wiretapping, but voice recognition adds a major new dimension that most lawmakers have yet to consider, Lynch says. If the past is any guide, companies have massive financial incentives to track consumers' movements and habits. Recognizing someone as soon as they pick up the phone or approach a cashier will open up marketing opportunities—as well as ease transactions for the consumer. As with many new authentication technologies, the balance between convenience and privacy has yet to be struck.

  5. The End of Privacy

    Breach of trust

    1. John Bohannon

    After the Snowden revelations, U.S. mathematicians are questioning their long-standing ties with the secretive National Security Agency.


    Each year, recruiters from the National Security Agency (NSA), said to be the largest employer of mathematicians in the United States, visit a few dozen universities across the country in search of new talent. It used to be an easy sell. “One of the appealing aspects that they pitch is that you'll be working on incredibly hard and interesting puzzles all day,” says one mathematician who requested anonymity. In the wake of the terrorist attacks of 11 September 2001, he adds, “I felt that if there was any way I could use my mathematical ability to prevent such a thing from ever happening again, I was morally obligated to do it.” Several times over the past decade, he has set aside his university research to work for the agency.

    Lately, however, that sense of moral clarity has clouded for some mathematicians, and the recruiters' task has become more complicated. In 2013, former NSA contractor Edward Snowden began releasing documents revealing, among other things, that the agency has been harvesting e-mail and phone records from ordinary American citizens on a massive scale. NSA may have also purposefully compromised a mathematical standard used widely for securing personal computers the world over.

    The revelations unsettled the anonymous mathematician. “For people who share my motivations,” he says, “the ethics of the NSA's mission matter a great deal.” The news has also roiled the mathematics community and led some to question its long, symbiotic relationship with the spy agency, which nurtures budding mathematicians in school, supports the field with research and training grants, and offers academic mathematicians the chance to take part in the murky world of spy craft. Mathematician David Vogan of the Massachusetts Institute of Technology in Cambridge, who finishes his term as president of the American Mathematical Society (AMS) this week, has urged the society to rethink its long-running, close-knit ties with the agency—though he won little support from other AMS officials.

    In a sign of the difficulty of convincing the most talented mathematicians and computer scientists to work for the agency, NSA Director Admiral Michael Rogers has hit the road himself to make the pitch. “Many of you are potential future employees that I want to compete for,” he told an audience at Stanford University in Palo Alto, California, last November. “The biggest challenge for us … is getting people in the door in this environment.” A student in the audience asked what NSA offers to researchers who may be “disillusioned by the U.S. government.” In a reply that may not have helped, Rogers listed both the chance to “serve the nation” and “the opportunity to do some neat stuff that you can't legally do anywhere else.”

    “THE NSA NEEDS MATHEMATICIANS like a papermaker needs trees,” Vogan says. The number of mathematicians employed by the agency cannot be verified. But its total staff is known to be in the tens of thousands, and its official mission—to design cryptologic systems for protecting U.S. information while exploiting weaknesses in the information systems of foreign countries—is deeply mathematical. Since NSA was established in 1952, it has engaged in a mathematical arms race, with ever more sophisticated codemaking and code breaking. As NSA has long affirmed, it has a vested interest in maintaining a healthy domestic mathematics community.

    Like the rest of its activities, the full extent of NSA's involvement with academia is secret. “We do not release specific budgets for programs,” the agency's public affairs office said in response to e-mail queries from Science. Even the total annual budget that Congress provides the agency is classified information; estimates have ranged from $8 billion to $25 billion.

    Only one line item in the NSA budget is publicly reported each year, and only because it involves a grants program for which AMS provides peer review. Through its Mathematical Sciences Program, the agency will spend $4 million this year on research grants, summer internships for undergraduates, sabbaticals for university professors to work at NSA, and mathematical conferences. It's a pittance compared with the more than $400 million that mathematicians receive each year from other federal agencies. But for a handful of areas that benefit, such as number theory and probability, “it's not a trivial amount of money,” Vogan says.


    The fruits of NSA support are readily found in academic journals. “It is expected that you will acknowledge the funding in your papers,” says Egon Schulte of Northeastern University in Boston, whose research in combinatorics is supported by an NSA grant. That makes it possible to directly track the academic output of NSA funding.

    An analysis by Science of academic papers indexed on Google Scholar (see graph) shows that NSA-supported research output grew steadily through the Cold War and the fall of the Soviet Union, dropped briefly between 1999 and 2002, then mushroomed in the wake of the 9/11 terrorist attacks. In 2013, more than 500 papers acknowledged NSA support.

    But direct grants for individual researchers are only a tiny portion of NSA's support for mathematics. Documents that the agency shared with Science describe a broad range of academic programs, from STEM (science, technology, engineering, and mathematics) education in schools to research labs at universities. NSA experts give classroom talks and judge science fairs. A small competitive grants program supports science summer camps and high school math clubs and computer labs. And an NSA program called GenCyber brings some of the most talented high school students and their teachers to universities to focus on “cyber-related education and careers” with help from NSA experts.

    The outreach helps the agency develop a close relationship with the brightest mathematicians at the start of their careers. “What we found is that the sooner you get in contact with students, the better chance you have to employ them,” NSA's then-director of human resources, Harvey Davis, told Congress in a 2002 hearing. Davis also pointed to the agency's cozy ties with higher education. “We are locked in with key professors who make decisions at the universities as well as the math community throughout the country.”

    At the 55 universities designated by NSA as Centers of Academic Excellence, a full-time NSA “representative” is embedded on campus. According to the documents provided to Science, they serve as the “gateway” for the agency to “influence research and research partnerships that will impact the cyber world and workforce in the future.” NSA's target campuses include well-known private institutions such as Princeton University, New York University, and Carnegie Mellon University, as well as many public ones such as North Carolina State University, Pennsylvania State University, and the University of California, Davis.

    Some universities also receive significant funding from NSA to support research and training. For example, NSA is creating what it calls lablets, research groups within academic departments focused on cybersecurity. According to press releases from the universities, each has received between $2.5 million and $4.5 million so far, but again, the total budgets are unclear.

    This close relationship with academia stirred little controversy until recently, says Thomas Hales, a mathematician at the University of Pittsburgh in Pennsylvania. “Everyone knows colleagues who have worked for the NSA.” After stints at the agency, “they seem to get amnesia about what they were working on,” he quips, but with few exceptions, “no one really cared.” That changed in 2013, when mathematicians got a glimpse of how the agency was using some of their work.

    IN THE WAKE of the Snowden revelations, most of the media attention has focused on NSA's large-scale harvesting of data from U.S. citizens. But it is a more obscure exploit that concerns Hales and many other mathematicians: what they see as an attack on the very heart of modern Internet security.

    When you check your bank account online, for example, the information is encrypted using a series of large numbers generated by both the bank server and your own computer. Generating random numbers that are truly unpredictable requires physical tricks, such as measurements from a quantum experiment. Instead, the computers use mathematical algorithms to generate pseudorandom numbers. Although such numbers are not fundamentally unpredictable, guessing them can require more than the world's entire computing power. As long as those pseudorandom numbers are kept secret, the encoded information can safely travel across the Internet, protected from eavesdroppers—including NSA.
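    A short sketch shows why secrecy matters for pseudorandom numbers. It uses Python's built-in Mersenne Twister generator purely to illustrate determinism; that generator is not designed for cryptography and is not what a bank would use, and the seed value and function name are assumptions for the example.

```python
import random
import secrets

def pseudorandom_stream(seed, n=5):
    """Return n 32-bit numbers from Python's Mersenne Twister, started from seed."""
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(n)]

# Anyone who learns the seed can regenerate the identical "random" stream.
server_stream = pseudorandom_stream(seed=2015)
eavesdropper_stream = pseudorandom_stream(seed=2015)
same = server_stream == eavesdropper_stream

# For keys, Python's secrets module draws on the operating system's generator.
session_key = secrets.token_bytes(32)  # 32 bytes = 256 bits

if __name__ == "__main__":
    print("Streams identical:", same)
    print("Session key length in bytes:", len(session_key))
```

    The two streams match because the algorithm is fully determined by its starting state. A generator is only as safe as the secrecy of that state, which is why a hidden shortcut to recovering it amounts to a back door.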

    But the agency appears to have created its own back door into encrypted communications. The computer industry, both in the United States and abroad, routinely adopts security standards approved by the National Institute of Standards and Technology (NIST). But in 2006, NIST put its seal of approval on one pseudorandom number generator—the Dual Elliptic Curve Deterministic Random Bit Generator, or DUAL_EC_DRBG—that was flawed. The potential for a flaw was first identified in 2007 by Microsoft computer security experts. But it received little attention until internal NSA memos made public by Snowden revealed that NSA was the sole author of the flawed algorithm and that the agency worked hard behind the scenes to make sure it was adopted by NIST.

    “[A]n algorithm that has been designed by NSA with a clear mathematical structure giving them exclusive back door access is no accident,” Hales wrote in an open letter published by AMS in February 2014. He tells Science that since then, “my conclusions have been reinforced by other sources.” For example, a July 2014 NIST report suggested that NIST was all but following orders from the intelligence agency. “NSA's vastly superior expertise on elliptic curves led NIST to defer to NSA regarding DUAL_EC,” the report said. Research by academic mathematicians has also revealed that the flaw is easier to exploit if the targeted computer uses other security products that were designed at the request of NSA. NIST dropped its support for the faulty standard in April last year. NSA has not made a public statement about it.

    Some defended the agency. In an open letter in AMS's online journal, Notices of the American Mathematical Society, Richard George, who describes himself as a mathematician who worked for NSA for 41 years, declared that his NSA colleagues “would not dream of violating U.S. citizens' rights,” although “there may be a few bad apples in any set of people.” As for NSA's engineering of a back door into personal computers, George wrote: “I have never heard of any proven weakness in a cryptographic algorithm that's linked to NSA; just innuendo.”

    In the pages of Notices, the revelations triggered a sharp debate about whether the society should cut its ties with the agency. Alexander Beilinson, a mathematician at the University of Chicago in Illinois who helped spur the discussion, argued that the society should completely wash its hands of NSA. The scale of the domestic spying and software tampering makes the United States seem like “a bloated version of the Soviet Union of the time of my youth,” he says. Vogan, AMS's president, was outraged as well. “The NSA may have deliberately broken commercial encryption software,” he says. “I see this activity as parallel to falsification of medical research for profit: as an individual wrong action, which damages permanently the position of science in the world.”

    But after all was said and done, no action was taken. Vogan describes a meeting about the matter last year with an AMS governing committee as “terrible,” revealing little interest among the rest of the society's leadership in making a public statement about NSA's ethics, let alone cutting ties. Ordinary AMS members, by and large, feel the same way, adds Vogan, who this week is handing over the presidency to Robert Bryant, a mathematician at Duke University in Durham, North Carolina. For now, U.S. mathematicians aren't willing to disown their shadowy but steadfast benefactor.

  6. The Privacy Arms Race

    Game of drones

    1. David Shultz

    Unmanned aircraft may soon be everywhere; how they will affect privacy is still unclear.

    Lately, drones seem to be everywhere. They're monitoring endangered wildlife, launching missiles, mapping rainforests, and filming athletes. They can fly high above a neighborhood or just hover outside a bedroom window. The Defense Advanced Research Projects Agency has already built robotic fliers not much larger than an insect; once batteries become small enough, they may become quite literally a fly on the wall. The opportunities—and potential violations of privacy—seem endless. But current and new laws may offer some protection.


    In the United States, the Supreme Court has concluded that nobody owns the airways and anyone can take pictures in public. As a result, citizens have been convicted of growing marijuana in their own backyards based on naked-eye observations made from planes flying overhead in “public navigable airspace.” On the other hand, a newly proposed law in California would make it illegal for paparazzi to use drones to snap pictures of celebrities on their own property.

    Existing laws also ban a peeping Tom from setting up in a tree at the edge of your property and peering into your bathroom window with binoculars; the same laws are likely to extend to flying a drone outside the same window. The Fourth Amendment, which protects citizens inside their homes from unreasonable searches and seizures without a warrant, may shield Americans from miniature government drones searching for illicit substances. But the extent of the protection will likely hinge on the finer points of the law.

    The Federal Aviation Administration is now producing new regulations for unmanned aircraft systems that will limit when and where commercial drones can fly; these may also help protect privacy in some cases. Many other countries, too, are debating how to balance privacy and freedom as drones proliferate.

    Creepy as it is to be watched from aircraft controlled by others, drones are hardly privacy worry No. 1, says John Villasenor, a policy analyst at the Brookings Institution in Washington, D.C., because there are ways to collect far more information easily. “Drone privacy is a legitimate concern,” Villasenor says. “But there are other technologies, such as mobile phones and the use of data gathered by mobile apps running on those phones, that, for me at least, raise far more pressing privacy issues.”

  7. The End of Privacy

    Risk of exposure

    1. Martin Enserink

    When new or dangerous infectious diseases strike, public health often trumps personal privacy.

    Few things can make you famous—or notorious—as fast as an encounter with the Ebola virus. New York physician Craig Spencer saw his daily life dissected by the media, which noted an evening at a Brooklyn bowling alley, a meal at the Meatball Shop, and rides on the 1, A, and L subway trains. Kaci Hickox, a nurse from Maine, was publicly attacked for defying a quarantine that scientists agreed made little sense. The Daily Mail, a British tabloid, delved into the past of freelance cameraman and Ebola patient Ashoka Mukpo and dug up salacious details about his parents' love life.

    Protecting medical information is tricky enough, but when you fall ill during an outbreak of a new or particularly scary disease, everything appears to become fair game. It's not just reporters who pore over your life. Doctors and public health officials, too, want to know where you have been, what you have done, and with whom. The more widely they share any of that information, the greater the risk to your privacy.

    A rise in the number of new and reemerging diseases in the past 2 decades—including SARS, MERS, and several influenza subtypes—has brought such problems painfully into focus, and the advent of social media and cell phone cameras has increased the pressure. When ambulance workers clad in white protective suits picked up a man at his home in the Dutch city of Maastricht on 26 October 2014, for instance, “it was on Twitter in 20 minutes,” says George Haringhuizen, a lawyer at the National Institute for Public Health and the Environment in Bilthoven, the Netherlands. Regional health officials were quick to deny claims that Ebola was involved.

    Reining in bloggers and Twitter users may not be easy. But even professional efforts to track outbreaks pose new threats to privacy. Information about specific patients—although anonymized—is now shared worldwide on public e-mail lists for emerging diseases such as ProMED, which often recirculates newspaper stories from around the world. Although it always redacts patient names, says ProMED Editor Larry Madoff, a simple Google search is enough to find the original story with those names.


    There is a growing need for global ethical standards for governmental disease surveillance, akin to what the Declaration of Helsinki provides for medical research, says Amy Fairchild, a historian at Columbia University who studies public health policy. Fairchild co-chairs a group of ethicists and public health experts assembled by the World Health Organization (WHO) to make recommendations on the subject; privacy will be a key issue, she says.

    UNTIL THE 1960S, major U.S. newspapers routinely printed the names and addresses of people with infectious diseases such as polio, Fairchild says. It wasn't until the 1970s, when governments and other organizations began storing large amounts of electronic data on citizens, including medical records, that privacy emerged as a political issue. Heart-wrenching cases of stigmatization and discrimination against AIDS patients in the 1980s—which led many to hide their HIV status—galvanized support for the protection of medical privacy.

    Many countries now have complex laws and regulations governing how and when medical information can be shared, such as the Privacy Rule of the U.S. Health Insurance Portability and Accountability Act, passed in 1996. Yet there still is a “huge tension” between the worlds of clinical care—where doctors try to protect individual patients—and public health, which tries to protect communities, says bioethicist Arthur Caplan of New York University's Langone Medical Center in New York City—especially during disease outbreaks. “Privacy doesn't fit well in the mindset of people in public health,” he says. “For them, the question is: How much can I get away with without privacy going completely out of the window?”

    Disease detectives at the U.S. Centers for Disease Control and Prevention, for instance, can't track down a mysterious outbreak without having as much information about the patients as possible. At the same time, governments can't know if public health policies work without gathering detailed data about disease incidence.

    But because of privacy concerns, doctors sometimes don't comply with requirements to notify health authorities when they diagnose a patient with a reportable disease, of which there are about a hundred in the United States. A qualitative study conducted in Canada at the height of the 2009 influenza pandemic showed that some doctors were surprisingly reluctant to report patients with flulike symptoms, as they were supposed to. “I think the bottom line for most family physicians is we will not share names, addresses, or phone numbers, period, without individual patient consent,” one said in a focus group.

    There are debates about how reported data can be used as well. New York state, for example, requires doctors not only to report HIV diagnoses, but also to forward lab results such as viral loads and CD4 cell counts. When such reports stop coming in for a given patient, researchers say, it's a sign they may have dropped out of treatment, which could help the virus rebound and put sex partners at risk. A 2013 study showed that of 409 dropouts, 57% were brought back into care after they had been traced and contacted—but some believe that's crossing a line.

    EVEN WHEN DOCTORS or government agencies treat health data discreetly, patient identities often become known—in their neighborhoods, towns, or in the press. When federal agents go around the block to trace the contacts of an Ebola patient, it's usually not hard to find out who the patient is. Europe's very first AIDS patient, who died in 1976, was long known as “the Norwegian sailor,” and later by an anagram of his real name, used in Edward Hooper's book The River—until journalists revealed his name around a decade ago. (The man is believed to have picked up HIV in West Africa in the early 1960s; his wife and one of his two daughters succumbed to AIDS as well.) The case still upsets Stig Frøland, a researcher at the Rikshospitalet in Oslo who published about the family and says he tried hard to protect their identity. Still, Frøland isn't surprised, “in view of my experience with the very aggressive attitude from national and international media through the years.”

    Craig Spencer's identity was revealed by the press, too. (A Twitter search suggests the New York Post identified Spencer first, 8 hours after his hospitalization, simply citing “sources,” followed shortly after by the New York Daily News.) The details that the health department subsequently made public about Spencer's movements before he fell ill were a clear invasion of his privacy, Fairchild says—and an unwarranted one, because he didn't have symptoms at the time and wasn't infectious. (Spencer, who asked the media to respect his right to privacy after he recovered, did not respond to requests for comment.)

    Mukpo, by contrast, agreed that his name could be released after he got Ebola, in part hoping it might help get him repatriated from Liberia, where he became infected. “Honestly, though,” he adds, “me remaining anonymous would never have been a realistic option,” given that he worked for NBC and knew many journalists.

    The upcoming WHO report, expected in 2016, will come up with recommendations for disease surveillance in general, not just for infectious diseases. But the panel may well borrow some pages from a similar WHO report, published in 2013, on the ethics of HIV surveillance, which remains an extremely sensitive issue today. That document recommended that the names of HIV patients be reported only for public health purposes—not for discrimination or criminalization—and only when confidentiality of the data is assured. It also said that people's right not to participate in surveillance should be respected as much as possible.

    The tension between privacy and public health will always remain, Caplan says, but preventing stigmatization and other negative consequences could help relieve some of the worries. “If you're not going to lose your job, lose your house, lose your mate, there's less reason to worry about your privacy,” he says. And eventually, he says, people may care less than they do today about whether officials track their movements and contacts. Young people already share massive amounts of information online—including where they are, what they're doing, and who they're with. (“When I ask them if they aren't worried about their privacy, I get a condescending look,” Caplan says.) Medical privacy, too, will become a “quaint notion,” Caplan predicts.

    For Mukpo—who noted the irony when Science e-mailed him to ask questions about his privacy—the exposure was actually a mixed experience. Although it was “very disconcerting to become such a public figure so quickly,” he did use the media spotlight to raise awareness about the Ebola situation in Africa. What's more, “the publicity also was an opportunity to see just how many amazing people I have in my life,” he adds. “The outpouring of concern was humbling.”

  8. The Privacy Arms Race

    Could your pacemaker be hackable?

    1. Daniel Clery

    Medical devices connected to the Internet are vulnerable to sabotage or data theft.

    In a 2012 episode of the TV series Homeland, Vice President William Walden is assassinated by a terrorist who hacks into his Internet-enabled heart pacemaker and accelerates his heartbeat until he has a heart attack. A flight of fancy? Not everyone thinks so.

    Internet security experts have been warning for years that such devices are open to both data theft and remote control by a hacker. In 2007, Vice President Dick Cheney's cardiologist disabled the wireless functionality of his pacemaker because of just that risk. “It seemed to me to be a bad idea for the vice president to have a device that maybe somebody on a rope line or in the next hotel room or downstairs might be able to get into—hack into,” said the cardiologist, Jonathan Reiner of George Washington University Hospital in Washington, D.C., in a TV interview last year.


    Medical devices such as insulin pumps, continuous glucose monitors, and pacemakers or defibrillators have become increasingly small and wearable in recent years. They often connect with a hand-held controller over short distances using Bluetooth. Often, either the controller or the device itself is connected to the Internet by means of Wi-Fi so that data can be sent directly to clinicians. But security experts have demonstrated that with easily available hardware, a user manual, and the device's PIN number, they can take control of a device or monitor the data it sends.

    Medical devices don't get regular security updates, like smart phones and computers, because changes to their software could require recertification by regulators like the U.S. Food and Drug Administration (FDA). And FDA has focused on reliability, user safety, and ease of use—not on protecting against malicious attacks. In a Safety Communication in 2013, the agency said that it “is not aware of any patient injuries or deaths associated with these incidents nor do we have any indication that any specific devices or systems in clinical use have been purposely targeted at this time.” FDA does say that it “expects medical device manufacturers to take appropriate steps” to protect devices. Manufacturers are starting to wake up to the issue and are employing security experts to tighten up their systems. But unless such steps become compulsory, it may take a fatal attack on a prominent person for the security gap to be closed.

  9. The Privacy Arms Race

    Hiding in plain sight

    1. Jia You

    Software lets you use location-based apps without revealing where you are.

    Whether they're looking for nearby restaurants, wondering what to wear, or finding the fastest route, most people allow their smart phones to send their GPS locations to Yelp, AccuWeather, or Google Maps without a second thought. But these data can be shared with advertisers and other third parties that profile users' movement patterns, often without their knowledge.


    Even anonymizing people's location data doesn't necessarily protect their privacy. When New York City released anonymized data on more than 173 million taxi trips in response to a Freedom of Information Act request in March, researchers quickly combined the data with known reference points—addresses, for example—to pinpoint celebrities' cab trips and identify who frequented local strip clubs.
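    The kind of linkage described above can be sketched with a toy example. Every record below is invented, and the field names and the 5-minute matching window are assumptions; the point is only that a time and a place known from elsewhere can single out one "anonymous" trip.

```python
from datetime import datetime

# Invented anonymized trip records: no rider name, but time and place remain.
trips = [
    {"trip_id": "a91f", "pickup": datetime(2013, 7, 8, 22, 15),
     "pickup_place": "W 50th St", "dropoff_place": "Dropoff A"},
    {"trip_id": "7c02", "pickup": datetime(2013, 7, 8, 20, 40),
     "pickup_place": "W 50th St", "dropoff_place": "Dropoff B"},
    {"trip_id": "e3d8", "pickup": datetime(2013, 7, 8, 22, 16),
     "pickup_place": "E 23rd St", "dropoff_place": "Dropoff C"},
]

# Invented outside knowledge, such as a dated photo of someone entering a cab.
sighting = {"who": "Person X", "time": datetime(2013, 7, 8, 22, 17),
            "place": "W 50th St"}

def link(sighting, trips, window_minutes=5):
    """Return trips that start at the sighting's place within the time window."""
    matches = []
    for trip in trips:
        gap = abs((trip["pickup"] - sighting["time"]).total_seconds()) / 60
        if trip["pickup_place"] == sighting["place"] and gap <= window_minutes:
            matches.append(trip)
    return matches

if __name__ == "__main__":
    for trip in link(sighting, trips):
        print(sighting["who"], "likely went to", trip["dropoff_place"])
```

    When exactly one trip matches, the destination is exposed even though no name ever appeared in the released data.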

    Computer scientists are devising countermeasures. CacheCloak, a system developed by researchers at Duke University in Durham, North Carolina, throws off tracking efforts by hiding users' actual location data. When you want to find, say, nearby restaurants, CacheCloak doesn't send Yelp or Google your exact GPS coordinates, but an entire path that it predicts you will take. That path is made to intersect with predicted paths from other users, so that the service sees requests from a series of interweaving paths where a driver can go either way at each crossing, and cannot track any single user. But consumers can still receive relatively accurate results.

    A slightly different camouflage strategy is to send dummy locations along with a user's real location. Researchers at Microsoft, for instance, have built an algorithm that can generate realistic car trips in Seattle based on real GPS data on 16,000 drives taken by about 250 volunteer drivers in the area. The dummy trips have plausible start and end points—no stopping in the middle of a highway—adhere to speed limits, and deliberately follow slightly nonoptimal routes, so that a filter can't easily pick out the false trips from the real ones. A mobile phone would draw on the library of routes to send both the user's actual location and points from many dummy trips to a cloud-based location service like Google Maps. The app responds—say, to a request for traffic warnings—for all locations, but users can use the answers they need and disregard the rest.
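    In outline, the dummy-location approach looks like the sketch below. It is not the Microsoft algorithm: the dummies here are hand-picked coordinates rather than realistic generated trips, and `fake_location_service` is a hypothetical stand-in for a cloud service.

```python
import random

def build_query(real_location, dummy_locations, rng=random):
    """Mix the real location with dummies, shuffled so order reveals nothing."""
    batch = [real_location] + list(dummy_locations)
    rng.shuffle(batch)
    return batch

def fake_location_service(locations):
    """Hypothetical service: returns one answer for every location it receives."""
    return {loc: f"traffic report for {loc}" for loc in locations}

def private_lookup(real_location, dummy_locations):
    """Query for all locations, then keep only the answer for the real one."""
    batch = build_query(real_location, dummy_locations)
    answers = fake_location_service(batch)  # the service sees every location
    return answers[real_location]           # the phone discards the rest

if __name__ == "__main__":
    real = (47.61, -122.33)
    dummies = [(47.66, -122.31), (47.55, -122.28)]
    print(private_lookup(real, dummies))
```

    The protection rests entirely on how plausible the dummies are, which is why the researchers put so much effort into making the false trips look like real driving.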

    The downside of the strategy is that such dummy searches can result in embarrassment, says computer scientist Michael Herrmann of the University of Leuven in Belgium. For example, many people might not want their trip to the library masked as a visit to an HIV testing site.

    In a third strategy, algorithms can simply send imprecise location data to services, cloaking a user's whereabouts in 1-kilometer squares rather than revealing precise GPS coordinates. But that has the obvious drawback of decreasing the quality of an online service, Herrmann says. For a weather app, your exact location may not matter, but if you're on foot and need to find a nearby ATM, precision is crucial.
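    One simple way to cloak a position is to snap it to the center of the grid square that contains it. The sketch below assumes a flat approximation of about 111.32 kilometers per degree of latitude; the function name and the default cell size are illustrative choices, not taken from any particular app.

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude

def cloak(lat, lon, cell_km=1.0):
    """Replace a coordinate with the center of the grid cell that contains it."""
    dlat = cell_km / KM_PER_DEG_LAT
    row = math.floor(lat / dlat)
    center_lat = (row + 0.5) * dlat
    # Degrees of longitude shrink toward the poles, so scale by cos(latitude).
    # Using the cell-center latitude gives every point in a row the same width.
    dlon = cell_km / (KM_PER_DEG_LAT * math.cos(math.radians(center_lat)))
    col = math.floor(lon / dlon)
    center_lon = (col + 0.5) * dlon
    return (center_lat, center_lon)

if __name__ == "__main__":
    print(cloak(47.6097, -122.3331))
```

    Every point inside a square reports the same coordinates, so the service learns the neighborhood but not the street corner. Raising `cell_km` buys more privacy at the cost of less useful answers.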

    In the end, human movements are often so predictable that they are hard to conceal. Location-hiding techniques are most valuable when you want to hide one-off trips, Herrmann says. But when it comes to protecting the location of your home and workplace, you might as well give up on privacy.

  10. The End of Privacy

    Trust me, I'm a medical researcher

    1. Jennifer Couzin-Frankel

    Scientists can no longer guarantee patients' privacy. They're looking for new ways to build trust.


    In an Oxford, U.K., suburb, a short distance from the track where Roger Bannister ran the world's first 4-minute mile, a quiet revolution is under way. One hundred and twenty-six people and counting, all suffering from a rare rheumatologic disease or the parent of an affected child, are involved in a research project on the disorders. But rather than donating a few samples, filling out a questionnaire, and hoping something useful will come of it one day, these subjects are deeply invested in the research. They are contributors with a voice.

    One is Elaine Rush, a 53-year-old from outside Southampton. She was born with the brittle bone disease osteogenesis imperfecta and has had, in her words, “only around 25 fractures in all—quite low compared to many.” Once not expected to live past the age of 5, Rush uses a wheelchair and has battled heart and lung problems associated with the disease. An eager partner in the quest to advance science, she now dials in to Skype calls every other month with one or more researchers and offers advice. When they were struggling with recruitment, Rush advocated posting on Facebook, where patients find each other. The study leaders are looking to follow her suggestion.

    RUDY, as this project to study rare diseases of the bones, joints, and muscles is called, represents a new kind of bargain between researchers and subjects in response to dwindling expectations of privacy. Until quite recently, a volunteer might have offered DNA or tissue to a single research group at a nearby university. Today many samples are banked, sequenced, and shared with potentially thousands of researchers. That allows for bigger studies with more statistical muscle, but it also makes it more difficult to keep patients' data private. It's now widely accepted that if someone can read your DNA, they might figure out who you are, either now or in the future, as technology marches ahead. The promise long made to participants—that their identity is stored in an unbreakable vault—no longer holds.

    “Patients are scared about access to ‘my data,’” says one of RUDY's leaders, Kassim Javaid, a balding, bespectacled University of Oxford rheumatologist based at the university's Nuffield Orthopaedic Centre. Offering them many layers of control, as RUDY does, “is a possible solution,” he believes.

    Rush and other participants in RUDY—which is co-funded by the United Kingdom's National Institute for Health Research—can decide whether their blood, their scans, and their medical histories can be shared with researchers at, say, a lab elsewhere in Europe or in the United States. They will be able to log on to a clinical trial Web page to learn whether one of their tissue samples has been flagged—an indication that a researcher somewhere is studying it.

    Throughout biomedical research, the advent of large repositories of DNA and tissue samples has forced researchers and ethicists to rethink their relationship with the volunteers who make their work possible. “Twenty years ago, people consented to do experiments based on trust and a handshake,” says Jamie Heywood, the co-founder of PatientsLikeMe in Cambridge, Massachusetts, a company that provides a platform for people with different diseases to share their health data. Now, patients shake hands with faceless others around the world. And in return for sharing their DNA far and wide—and potentially shelving their privacy—participants want a louder voice in research, and transparency about how it's conducted.

    THAT THE END OF GENOMIC PRIVACY has arrived became clear 2 years ago, when a young human geneticist now at Columbia University and the New York Genome Center, Yaniv Erlich, published a startling paper in Science (18 January 2013, p. 321) that confirmed the worries of many in the field. Erlich and his colleagues showed that it was possible to identify a man based on a partial DNA sequence of his Y chromosome, his age, and his U.S. state of residence—the type of basic information that researchers commonly post in DNA databases widely accessible to their community. By combining these snippets of information with what he found for others in the same family on popular genealogy databases—where more than 100,000 people have already posted DNA markers—Erlich could not only identify the donors of the DNA, but also their family members as far as second cousins once removed. “I don't even know my second cousins,” he says.
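
    The kind of record linkage described above can be illustrated with a toy sketch. It assumes that a genealogy database pairs Y-chromosome marker profiles with surnames (the matching step is not spelled out here, so that is an assumption), and every name, marker value, and directory entry below is invented for illustration. It is not the method from the 2013 paper, only a minimal picture of how a few harmless-looking fields can be joined.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class GenealogyRecord:
    """Hypothetical genealogy entry: a surname plus Y-chromosome marker values."""
    surname: str
    markers: Dict[str, int]


@dataclass
class PublicRecord:
    """Hypothetical entry in a public directory of people."""
    name: str
    surname: str
    age: int
    state: str


def infer_surnames(sample: Dict[str, int],
                   genealogy: List[GenealogyRecord],
                   min_shared: int = 3) -> Set[str]:
    """Surnames whose profiles agree with the sample on every shared marker."""
    matches = set()
    for rec in genealogy:
        shared = sample.keys() & rec.markers.keys()
        if len(shared) >= min_shared and all(
            sample[m] == rec.markers[m] for m in shared
        ):
            matches.add(rec.surname)
    return matches


def narrow_candidates(surnames: Set[str], age: int, state: str,
                      directory: List[PublicRecord]) -> List[PublicRecord]:
    """Join the inferred surname with the age and state posted alongside the DNA."""
    return [p for p in directory
            if p.surname in surnames and p.age == age and p.state == state]


# Toy data only: the marker values and people are made up.
GENEALOGY = [
    GenealogyRecord("Alpha", {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}),
    GenealogyRecord("Bravo", {"DYS19": 15, "DYS390": 24, "DYS391": 10, "DYS393": 13}),
]
DIRECTORY = [
    PublicRecord("J. Alpha", "Alpha", 42, "UT"),
    PublicRecord("K. Alpha", "Alpha", 67, "UT"),
    PublicRecord("L. Bravo", "Bravo", 42, "UT"),
]
# A partial Y-chromosome profile as it might appear in a research database.
SAMPLE = {"DYS19": 14, "DYS390": 24, "DYS391": 11}

surnames = infer_surnames(SAMPLE, GENEALOGY)
hits = narrow_candidates(surnames, age=42, state="UT", directory=DIRECTORY)
```

    In this toy case the partial profile matches one surname, and age plus state then cut the directory down to a single person, which is the essence of the privacy worry.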

    Erlich hastens to point out that DNA can be anonymized, for instance by scrambling or deleting nucleotides containing sensitive information. That technique can render the information largely useless for research, however, and does not always protect the donor. When James Watson, a co-discoverer of DNA's double helix, had his genome sequenced and published in Nature in 2008, he requested that his APOE gene, which can reveal a predisposition to Alzheimer's disease, be left out. But as three geneticists politely pointed out later that year in the European Journal of Human Genetics, genetic knowledge had advanced sufficiently to impute Watson's APOE status based on patterns in nearby DNA.

    Cryptographers are still exploring how to better protect DNA, and many agree it's important to continue that work. But Erlich's energies have shifted elsewhere. About a year ago, he and about 30 others convened at Cold Spring Harbor Laboratory to consider alternatives to privacy in research. They came back to that decades-old handshake and contemplated how to adapt it for 21st century science.

    Erlich was inspired in part by recent Internet phenomena where trust is a guiding force, such as Uber, which runs an online car-sharing service that matches drivers with passengers. “Uber takes two individuals that don't know each other,” he says. “I'm getting into someone else's car; he could chop me to pieces.” Airbnb, where people offer a room or their entire home for rent to complete strangers, is another example. These websites hold users accountable with reviews, profiles, and extensive documentation. This openness appears to build trust, Erlich says, and he thinks the same strategy can be applied to genetics and biomedical research.

    Research volunteers have always valued trust and transparency. In 2007, Alan Westin, a legal scholar who studied privacy and who died in 2013, conducted a survey of almost 2400 people for the Institute of Medicine. He found that respondents were less preoccupied with whether researchers knew who they were than with knowing what was happening to their medical information. Among those surveyed, 81% were not happy to have researchers parsing even so-called de-identified health data without their consent.

    “They are not hung up on privacy so much as autonomy,” says Mark Rothstein, a law professor at the University of Louisville in Kentucky. “Let's assume that you've de-identified, anonymized, and nobody can figure out who it is—but if people think you've used that information without their permission, they're still going to be very angry.”

    U.S. regulation is adapting to that sentiment. In August, the National Institutes of Health announced that, starting this month, it expects researchers to obtain informed consent from participants if their DNA, cell lines, tissue, or any other de-identified biological material will be used for research at any point in the future. “Part of governance is transparency,” says Bartha Maria Knoppers, who studies law and genetics at McGill University in Montreal, Canada, and is a member of a consortium called the Global Alliance for Genomics and Health, which is looking for new methods to share data more openly and responsibly. “You put in place a process of oversight and a mechanism to ensure that what you tell me is going to happen to my data is what is going to happen to my data.”

    Knoppers and many others point out that patients often want to share their DNA in the name of advancing research—and that fears of being identified through DNA may be overblown. Some databases ban researchers from re-identifying volunteers. There have been no breaches yet—or at least none that anyone knows about. Gail Jarvik, a medical geneticist at the University of Washington, Seattle, believes that most scientists don't care who handed over a blood or tissue sample. “Why identify them?” she asks.

    A HANDFUL OF EXPERIMENTS are now testing how to better inform volunteers about what's happening to their data. PatientsLikeMe has recruited 300,000 people with more than 2300 different diseases. Participants share their health data, analyze how they're faring in clinical trials, support each other, and help researchers and drug companies answer existing scientific questions and pose new ones. Heywood founded the company in 2004 while his younger brother Stephen was suffering from amyotrophic lateral sclerosis. (Stephen died 2 years later.) Jamie Heywood's philosophy is that if people are “understanding and enthusiastic participants,” they will agree that sharing widely will maximize the value of their DNA and other health information to the community—even if this offers them less privacy. The company keeps in touch with participants with a blog, social media, and regular e-mails.

    The Personal Genome Project (PGP) at Harvard Medical School in Boston, founded by geneticist George Church, goes even further: It asks participants to share their DNA sequences and health histories online for everyone to see. Almost 4000 people have signed up so far; last month, PGP launched a “real name” option, whereby participants can post their identity. “Many participants do mention altruistic reasons: sharing data publicly in order to promote our collective knowledge,” says Jeantine Lunshof, a Harvard ethicist on the project. “That seems to outweigh potential drawbacks.” To make sure participants understand its ramifications, PGP asks a series of hard questions. One example, says Heywood, who participates in PGP but flunked the test the first time: “If I commit a crime, could the DNA in this bank be used to identify me?” (The answer is yes.)

    Javaid of RUDY hopes that his strategy, called dynamic consent because consent is a continual process, will change how patients think about research. Patients can choose which portions of the study to complete—questionnaires, or the sharing of scans, for example—and also whether to restrict their data to RUDY investigators or allow them to be more broadly distributed. If someone says, “don't use my DNA, don't use my blood … we archive the samples so no one else can use them,” Javaid says, but they're preserved in case the patient changes his or her mind.
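
    One way to picture dynamic consent is as a per-item permission record that the participant can revise at any time. The sketch below is a hypothetical data structure, not RUDY's actual system: the class names, the three scope levels, and the item labels are all assumptions chosen to mirror the choices described above (study team only, broader sharing, or withdrawn but archived).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple


class Scope(Enum):
    WITHDRAWN = 0    # sample is archived; nobody may use it
    STUDY_ONLY = 1   # hypothetical label for "study investigators only"
    BROAD = 2        # may also be shared with outside researchers


@dataclass
class ConsentRecord:
    """Per-participant consent that can be changed repeatedly over time."""
    choices: Dict[str, Scope] = field(default_factory=dict)
    history: List[Tuple[str, Scope]] = field(default_factory=list)

    def update(self, item: str, scope: Scope) -> None:
        """Record a new choice; earlier choices stay in the audit history."""
        self.history.append((item, scope))
        self.choices[item] = scope

    def may_use(self, item: str, requester_is_study_team: bool) -> bool:
        """Items with no recorded choice are treated as not consented."""
        scope = self.choices.get(item, Scope.WITHDRAWN)
        if scope is Scope.BROAD:
            return True
        if scope is Scope.STUDY_ONLY:
            return requester_is_study_team
        return False

    def is_archived(self, item: str) -> bool:
        """Withdrawn items are kept, not destroyed, in case consent returns."""
        return self.choices.get(item) is Scope.WITHDRAWN


consent = ConsentRecord()
consent.update("blood", Scope.BROAD)
consent.update("scans", Scope.STUDY_ONLY)
consent.update("blood", Scope.WITHDRAWN)   # participant changes their mind
```

    Keeping the history separate from the current choices is a design choice made here so that a later change of mind never erases the record of what was permitted earlier.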

    Rush, the participant with brittle bone disease, has given broad consent, along with nearly all of RUDY's early adopters. “I personally don't feel that there's anything I need to hide,” Rush says, but she recognizes that not everyone feels the same way. She has also reached out to fellow patients. “The fact that they can opt out of some things” has helped her explain the study to those who might hesitate to sign on.

    As for Erlich, he's ready to borrow more ideas from Uber and Airbnb. In November, he and nine other attendees at the Cold Spring Harbor meeting published a paper in PLOS Biology outlining a “trust-centric framework … that rewards good behavior, deters malicious behavior, and punishes noncompliance.” Like people griping online about a driver's body odor or praising the free coffee and snacks in their vacation home, patients could write reviews about the researchers they have worked with. The system could include trusted mediators to engage both researchers and participants, and automated auditing of how study data are used. Perhaps, Erlich speculates, the visibility of researchers and their reputation would climb when they received accolades from peers or high marks from patients for returning results and raw data.

    Of course, trust is difficult to build and easy to squander. Last year, Uber itself received a failing grade from the Better Business Bureau after a deluge of customer complaints, and the company has been accused of exaggerating how carefully it vets its drivers. A similar breakdown could transpire in scientific projects.

    RUDY has won Rush's trust. “The RUDY researchers are reputable,” she says. “They wouldn't be sharing with [just] anyone.” As the study plods on in the months and years ahead, its success will depend on upholding that confidence.

    Correction (2 February 2015): An affiliation has been updated. Additionally, PatientsLikeMe was mistakenly referred to as a nonprofit; this has been corrected.

  11. The Privacy Arms Race

    Camouflaging searches in a sea of fake queries

    1. Jia You

    A browser extension masks your true interests with customized decoy questions.

    From health questions to shopping habits, your Web search history contains some of the most personal information that you reveal online. Search engine giants such as Google and Bing carefully log these data and save them in databases, where they might be shared with advertisers and the government.


    Privacy-conscious users can switch to anonymous search engines such as DuckDuckGo, which doesn't log a user's IP address, identifying information, or search history. DuckDuckGo alone processes about 7 million direct queries a day; traffic spiked after the 2013 revelations about the National Security Agency's snooping. But these services don't match the speed and convenience that Google offers. For consumers who want to continue using their favorite search services but with added protection, researchers at New York University in New York City have developed a browser extension that produces dummy search requests that drown out a user's real queries, thwarting any attempt to profile that user.

    The software, known as TrackMeNot—which can be downloaded as a Firefox or Chrome extension—creates the fake search queries by harvesting phrases from RSS feeds from popular websites such as The New York Times. Dummies such as “George Clooney” and “Amtrak” are sent to a search engine in the background while consumers use their browsers as usual. You can customize the RSS feed to control the content of the decoys and pick which search engines to target. To make the dummies more believable, the algorithm automatically updates the search terms and even simulates clicks on links displayed on the results pages. It can also schedule fake queries primarily when users are actually searching.
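
    The decoy approach can be sketched in a few lines. This is a minimal illustration, not the extension's actual code: the sample feed, the function names, and the rule of taking the first two words of each headline are all assumptions, and nothing is sent over the network. It shows only the two ideas described above, harvesting phrases from an RSS feed and mixing decoys in around a user's real searches.

```python
import random
import xml.etree.ElementTree as ET

# A tiny stand-in for a news site's RSS feed (invented content).
SAMPLE_RSS = """<rss version="2.0"><channel>
  <title>Example Feed</title>
  <item><title>George Clooney joins new film project</title></item>
  <item><title>Amtrak adds service on northeast corridor</title></item>
</channel></rss>"""


def harvest_phrases(rss_text, max_words=2):
    """Turn each headline into a short decoy query (crudely: its first words)."""
    root = ET.fromstring(rss_text)
    phrases = []
    for title in root.findall("./channel/item/title"):
        words = (title.text or "").split()
        if words:
            phrases.append(" ".join(words[:max_words]))
    return phrases


def interleave(real_queries, decoys, decoys_per_real=3, rng=None):
    """Surround each real query with randomly chosen decoys, in shuffled order."""
    rng = rng or random.Random()
    stream = []
    for query in real_queries:
        batch = [("decoy", rng.choice(decoys)) for _ in range(decoys_per_real)]
        batch.append(("real", query))
        rng.shuffle(batch)  # hide the real query's position within the burst
        stream.extend(batch)
    return stream


phrases = harvest_phrases(SAMPLE_RSS)
stream = interleave(["knee pain remedies"], phrases, rng=random.Random(0))
```

    Emitting decoys in a burst around each real query is one simple way to model the scheduling described above; a search engine that sees the whole stream has to work out which of the four entries reflects the user's true interest.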

    There's no guarantee that search engines wouldn't be able to separate the fake searches from the real ones, but it could cost them considerable resources to do so, says privacy expert Helen Nissenbaum, who co-developed the software. She hopes the project will serve as a proof of concept and garner sufficient users—it has more than 60,000 so far—to pressure search engine companies into meaningful dialogues on their privacy policies.

    The software may not be much help to users who look up sensitive terms monitored by governments—those related to political opposition, for instance—as it doesn't hide a user's real queries. For those users, computer scientists at Purdue University have developed an algorithm that not only sends out fake queries, but also hides a user's real interests by substituting real queries with phrases related to the same topics. The downside: The results become less relevant, forcing users to go through multiple pages of results to find the link they need.