# News this Week

Science  11 Oct 2013:
Vol. 342, Issue 6155, pp. 172
# Around the World

1. Aar, Namibia: Namibia Top Choice for Gamma Ray Telescope
2. Amsterdam: Dutch Challenge Russia Over Arctic Ship Seizure
3. Mountain View, California: 'Designer Baby' Patent Concerns Bioethicists
4. Amsterdam: Scientist Falsified Work, CV
5. Seattle, Washington: Microsoft Co-Founder Hints at Cell Bio Investment

## Aar, Namibia

### Namibia Top Choice for Gamma Ray Telescope

A patch of bushy land in southern Namibia is the best candidate to host part of the Cherenkov Telescope Array (CTA), which will be the world's largest gamma ray telescope. Scientists representing the 27-country CTA Consortium met in Warsaw in September, ranking the Namibian site as the best of five options for the CTA's southern array; four sites competing for the northern array earned equal ratings.

# Random Sample

## Noted

Off with that head: New DNA analyses published this week in the European Journal of Human Genetics refute claims that two famous specimens belonged to French kings: blood attributed to Louis XVI, stored for centuries in an 18th century gourd, and a mummified head thought to be that of his ancestor Henry IV, whose Y chromosome resembled DNA isolated from the blood (Science, 24 May, p. 906). http://scim.ag/BourbID

## Two Million and Counting

Rare as it is, the carnivorous purple pitcher plant Sarracenia purpurea has just gotten a lot easier to study. The pitcher plant is the 2 millionth specimen added to the New York Botanical Garden's (NYBG's) massive effort to take high-resolution photographs of the 7 million plant specimens in its William and Lynda Steere Herbarium and digitize them.

The digitization effort began in the 1990s to give Brazilian botanists and others access to information on the NYBG's 450,000 plants from Brazil. At first, only data about the specimens went online, but in the 2000s, NYBG began adding images of the pressed plants, making it possible to study a specimen with the same magnification as under a microscope, says NYBG botanist Barbara Thiers. It took 12 years to digitize the first million plants; 300,000 are now added each year.

## They Said It

For more tweets on science and the shutdown, visit: http://news.sciencemag.org/tags/shutdown

"While the government is shutdown, I can't even apply for an internship with NASA. Come on politics, I'm just trying to science."

—Jered Hoff @jered_hoff

"No new experiments. Halted research on deadly brain disease #PML at NIH. Only allowed to maintain cell lines. #essentialscience #shutdown"

—Michael Ferenczy @ViralScience

## RoboRoach: Imperio!

At the TEDx conference in Detroit last week, RoboRoach #12 scuttled across the exhibition floor bearing a tiny backpack of microelectronics. Its direction was controlled by the brush of a finger against an iPhone touch screen.

RoboRoach is a do-it-yourself neuroscience experiment that allows students as young as 10 years old to create their own "cyborg" insects, say creators Greg Gage and Tim Marzullo, both neuroscientists and engineers and co-founders of an educational company called Backyard Brains. In November, the company will begin shipping live cockroaches, accompanied by microelectronic hardware and surgical kits, across the nation for $99. The roaches' movements are controlled by electrodes that feed into their antennae and receive commands via Bluetooth from smartphones. To attach the device, students are instructed to douse the insect in ice water to "anesthetize" it, sand a patch of shell on its head so that the superglue and electrodes will stick, and insert a ground wire into the insect's thorax. Next, they must carefully trim the insect's antennae and insert silver electrodes into them. Gage says that the roaches feel little pain from the stimulation, to which they quickly adapt.

But critics say the project is sending the wrong message. "They encourage amateurs to operate invasively on living organisms" and "encourage thinking of complex living organisms as mere machines or tools," says Michael Allen Fox, a professor of philosophy at Queen's University in Kingston, Canada. Bioethicist Gregory Kaebnick of the Hastings Center in Garrison, New York, who says he finds the product "unpleasant," likened it to the forbidden "Imperius Curse" of the Harry Potter novels. The RoboRoach, he says, "gives you a way of playing with living things." http://scim.ag/Roboroach

## ScienceLIVE

Join us on Thursday, 17 October, at 3 p.m. EDT for a live chat on how humans are affecting our own water supplies. http://scim.ag/science-live

# Impact Theory Gets Whacked

Daniel Clery

Planetary scientists thought they had explained what made the moon, but ever-better computer models and rock analyses suggest reality was messier than anyone expected.
LONDON—Among many other contributions to science, the Apollo space program gave geophysicists a grand unified theory of the origin of the moon. The story goes like this: A few tens of millions of years after the birth of the solar system, a now-vanished planet roughly the size of Mars struck Earth a glancing blow that shattered them both and sprayed nearby space with debris. Earth re-formed itself; the debris settled into a disk around Earth, which accreted into the moon. The giant impact scenario, based in large part on careful study of the 382 kilograms of moon rocks astronauts brought back between 1969 and 1972, was a triumph of planetary science.

But the truth may not be that simple. Over the past decade, increasingly sophisticated computer simulations have shown that the tidy scenario clashes with what geochemists have discovered about moon rocks and meteorites from elsewhere in the solar system. As a result, researchers are casting around for new explanations. At a meeting* at the Royal Society in London last month—the first devoted to moon formation in 15 years—experts reviewed the evidence. They ended the meeting in an even deeper impasse than before, as several proposed solutions to the moon puzzle were found wanting. So near and—compared with other solar system bodies—so well known, the moon is not yielding its secrets easily. "It's got people thinking about the direction we need to go to find a story that makes sense," says co-organizer David Stevenson of the California Institute of Technology in Pasadena. He and others already see one place that might hold a clue: Earth's superheated twin, Venus.

Before Apollo, planetary scientists had put forward various theories of the moon's formation—for example, that it took shape alongside Earth from the same accretion disk of dust and rubble; that it was a wanderer captured by Earth's gravity; or that the proto-Earth was spinning so rapidly that it flung into space a blob of material, which condensed into the moon.
But each scenario turned out to have some inescapable flaw. Some could not explain the moon's age, as determined from the Apollo rocks: just slightly younger than Earth's. Others could not account for the angular momentum bound up in its orbital dance with Earth.

An alternative was waiting: the giant impact hypothesis, first proposed by William Hartmann and Donald Davis in 1975. The scenario appeared outlandish at first, but the closer researchers looked, the more plausible it seemed. The early solar system swarmed with planetesimals that could have struck Earth. A moderate-speed collision with an impactor about a tenth as massive as Earth would have spewed enough material into space to make the moon while leaving the angular momentum of the new system close to what astrophysicists measure today. The clincher was the fact that the giant impact scenario also explained three key findings from Apollo: the moon's age, the evidence that it was very hot in its youth, and its chemical similarity to Earth. "It works. You can make a moon," Stevenson told last month's meeting. Scientists embraced the model at a seminal meeting in Kona, Hawaii, in 1984, and, said Jay Melosh of Purdue University in West Lafayette, Indiana, "it worked brilliantly for a decade or two."

Then things started to get complicated. About the same time as the Apollo moon rocks arrived, researchers began studying the ratios of different chemical isotopes in meteorites. The relative abundances of oxygen-16, -17, and -18, in particular, varied so much between meteorites traced to different parts of the solar system that scientists started using the ratios as a marker for the rocks' origin. The moon rocks, however, showed ratios markedly similar to those of rocks from Earth. "The moon and Earth are indistinguishable on the oxygen isotope plot," Melosh said. The isotopes of other elements told the same story.
That didn't trouble geophysicists unduly at first, because they assumed that during the giant impact, material from Earth and the impactor would be thoroughly mixed. Doubts began to arise in 1986, after a team at Los Alamos National Laboratory in New Mexico published the first computer simulation of the hypothesized collision. The model was crude—it simulated the Earth-moon system with just 3000 particles—but the results were decisive. They clearly showed that after an impact big enough to produce a moon without leaving the Earth-moon system with excessive spin, the moon would consist almost exclusively of material from the impactor. More recent simulations tell the same story. A 120,000-particle model run in 2004 by Robin Canup of the Southwest Research Institute in Boulder, Colorado, suggested that the standard giant impact would leave the moon with more than 80% impactor material.

This uneven mixing could explain the isotope results only if the proto-Earth and the impactor were made of very similar material to start with—a sign, researchers concluded, that they must have formed close together under similar conditions. That idea received a blow in 2007 from a paper by Stevenson and his then-colleague Kaveh Pahlevan, which modeled how both the impactor and Earth took shape from the disk of debris surrounding the young sun. They argued that a planetesimal like the impactor, with a mass a fraction of Earth's, would form out of material from a relatively narrow band around its orbit. A planet the size of Earth, however, would scavenge material from a much wider swathe of the disk, extending past the orbits of Mars and Mercury, where the planet-forming material had very different isotope ratios. So even if the proto-Earth and the impactor formed in similar orbits, their compositions would be different enough to produce distinct isotope ratios in the postcollision Earth and moon.
Stevenson and Pahlevan's unexpected result threw origin-of-the-moon research into disarray, forcing planetary scientists to confront two unpleasant possibilities: either the collision between impactor and proto-Earth was more complicated than they had assumed, or their understanding of the makeup of the solar system needs a major overhaul. "The giant impact has major problems. It doesn't produce the moon as seen," Stevenson told the meeting.

Theorists soon began tinkering with collision scenarios to come up with one that leaves the ingredients of the proto-Earth and its impactor thoroughly mixed. Stevenson and Pahlevan suggested one such scenario in their 2007 paper. They point out that the heat generated by the giant impact would produce an Earth and a debris disk made of molten and vaporized magma. They calculate that this superheated, churning inferno would take between 100 and 1000 years to cool. During that time, they argue, enough turbulent mixing and diffusion could take place between disk and Earth for them to reach an equilibrium, resulting in a homogeneous Earth and moon.

At the meeting, however, some cast doubt on such a scenario. Melosh argued that to reach a uniform composition, Earth and the disk would have to exchange so much material that the disk would collapse. Stevenson admits that this mixing process "can help, but it's not an explanation in itself." Canup says it is important to study the scenario more thoroughly, "but it's a very challenging process to model."

More radically, some want to rethink the whole giant impact scenario. Last year, Matija Ćuk and Sarah Stewart of Harvard University proposed that the impactor was far smaller than thought—only about 1/200 as massive as Earth—that it was moving much faster, and that the proto-Earth was already spinning rapidly (Science, 23 November 2012, p. 1047). The model can produce a moon of the correct size made up almost exclusively of material blasted from Earth's mantle.
Unfortunately, the Earth-moon system winds up with twice as much angular momentum as it has today, but Ćuk and Stewart also proposed a mechanism for shedding the excess. Soon after the moon forms and as the Earth-moon system is evolving toward its current state, the moon's perigee—the point at which its orbit brings it closest to Earth—moves around Earth in a cyclic motion called precession. The cycles get longer and longer until the rate of precession slows to once per year. Then the precession becomes locked in a fixed position relative to the sun—a rhythm known as evection resonance. As a side effect, the resonance transfers angular momentum from the moon to the sun, in effect spinning up the sun while slowing down the moon. Evection has long been known, but most researchers thought the moon would not stay in the resonance for long. Ćuk and Stewart say that in their particular scenario, it could last long enough for the moon to shed half its angular momentum.

In another paper published simultaneously with Ćuk and Stewart's (Science, 23 November 2012, p. 1052), Canup modified the impact scenario in the opposite way. She showed that a high-speed head-on collision between two bodies with similar masses also could have yielded a homogeneous Earth and moon—again at the cost of leaving the system with too much angular momentum. "Now we can produce a disk with the correct composition, but it still requires [evection] resonance to slow it down," Canup says.

Evection may prove the Achilles' heel of both scenarios. At the meeting, Jack Wisdom of the Massachusetts Institute of Technology in Cambridge described unpublished results suggesting that Ćuk and Stewart had overestimated its effects. When the moon is in the evection resonance, he said, its orbit becomes more elongated, and the change produces extreme tidal forces that heat up the moon.
This heating, Wisdom says, would change the moon's physical characteristics enough to end the resonance before it had time to drain enough angular momentum from the system.

Details aside, many researchers at the meeting bemoaned the fact that things are getting so complicated. In the old impact model, a single simple event was all it took to create the moon. Now the models require an impact followed by some other process—turbulent mixing or evection—to make it work. "We don't have a single scenario which stands out because of its simplicity," Canup said. Melosh agreed. "The solutions are contrived; they're not natural," he said. "We want a solution where isotopic similarity is a natural consequence of the model."

There is a way out of the dilemma, but it will not be easy to test. The isotopic similarity between Earth and moon would arise naturally in any of the collisions if researchers are wrong in their assumption that isotope ratios vary markedly across the solar system. This assumption stems from meteorites collected on Earth that have been identified as coming from a few other bodies: a couple of asteroids, a comet, and Mars. For example, researchers have some 120 fragments that were blasted off the martian surface by asteroid or comet impacts. Those rocks show very different oxygen isotope ratios from Earth or moon rocks.

But Mars itself is something of an enigma. According to planetary formation models, such a small planet—just 10% as massive as Earth—should not have formed where Mars now sits. So what if Mars actually formed somewhere entirely different and later moved? That would destroy the idea that isotope ratios in the inner solar system change progressively with distance from the sun. With that constraint removed, it would be much easier for scientists to explain how proto-Earth and its impactor could have wound up with similar compositions.
That explains why at the London meeting, when the session chairs jokily asked each speaker what single measurement they would most like to perform, many said they would like to examine a piece of rock from the planet Venus. Venus is Earth's rogue twin, and together the two planets contain 80% of the mass between the asteroid belt and the sun. If it turns out that Venus has very similar isotope ratios to Earth, then it is much more likely that an impactor might have had them as well. "Venus is the key," Stevenson said.

But how to get hold of a piece of rock from Venus? Venus's surface is often described as "hellish," with atmospheric pressures 92 times those at Earth's surface and temperatures approaching 500°C. Only a handful of probes have survived to reach the surface, and there are no firm plans to return there in the near future. That leaves rocks that fall from the sky: meteorites. Venus's strong gravity makes it much less likely than Mars to have chunks of its rock lofted into space and onto a trajectory toward Earth, but it's not impossible. "We could have a piece in our collections," Canup says. "But how do we know?"

*The Origin of the Moon, Royal Society, London, 23–24 September.

# Biology's Dry Future

Robert F. Service

The explosion of publicly available databases housing sequences, structures, and images allows life scientists to make fundamental discoveries without ever getting their hands "wet" at the lab bench.

Most life scientists single-mindedly focus their careers on a particular organism or disease—even just a specific molecular pathway. After all, it can often take months of training to master growing a particular cell type or learn a new laboratory technique. Atul Butte, however, wanders from topic to topic—and reaps scientific successes along the way.
Though only 44 years old, he has earned tenure at Stanford University's School of Medicine in Palo Alto, California, based on advances in diabetes, obesity, transplant rejection, and the discovery of new drugs for lung cancer and other diseases.

Butte's lab is different, too. It isn't crowded with cell cultures and reagents. His tools look like those of an engineer or software developer: Most often, he's simply working on a Sony laptop, although at times he does turn to a large computer cluster at Stanford and supercomputers elsewhere when in need of massive processing power. Instead of growing cells and sequencing DNA, Butte, his students, and postdocs sift through massive databases full of freely available information, such as human genome sequences, cancer genome readouts, brain imaging scans, and biomarkers for specific diseases such as diabetes and Alzheimer's. Many call this type of research "dry lab biology," to contrast it with the more hands-on "wet" traditional style of research. Although statistics on the number of dry lab biologists are hard to come by, these data hunters believe they are a growing minority. Butte is one of its top practitioners.

Using publicly available data, for example, 2 years ago Butte and his colleagues surveyed the activity of large sets of genes in people affected by 100 different diseases and in cultured human cells exposed to 164 drugs already on the market. By comparing patterns of genes flipped on or off by the diseases and by the drugs, the team drew unexpected connections. They found clues that a drug now prescribed for ulcers might also be a useful lung cancer treatment, for example, and that an antiepileptic compound would fight two forms of inflammatory bowel disease (see chart, p. 188). Subsequent lab studies of animals offered support for both inferences.
And last month, Butte's group reported in Cancer Discovery that a similar approach suggested that the antidepressant drug imipramine would be effective against small-cell lung cancers resistant to standard chemotherapy—a finding that has already prompted the launch of a clinical trial. "This is an exciting time to be doing biological research on a dry bench," Butte says.

And not just for Butte. The growth of publicly accessible data troves on genome sequences, gene activity, and protein structures and interactions has opened new territory for biologists. Seizing on advances in computational power, data storage, and software algorithms able to separate the wheat from the chaff, dry lab researchers are making fundamental discoveries without ever filling a pipette, staining a cell, or dissecting an animal. Thanks to a National Science Foundation–funded initiative called the iPlant Collaborative, for example, there's an emerging generation of data-analyzing "plant biologists" who have never gotten their hands dirty digging in soil or watering seeds. And the National Institutes of Health (NIH) recently announced plans to sink $96 million into boosting analysis of big data. "There is a transformation happening in biology," says Daniel Geschwind, a neurogeneticist at the University of California, Los Angeles.

"You basically don't need a wet lab to explore biology," agrees David Heckerman, a computational scientist at Microsoft Research in Los Angeles. None of these dry lab biologists believe that advances in data sciences will replace the traditional approach. Rather, they argue that the two dovetail with one another like never before, each propelling the other forward. "I'm like a kid in a candy store," Butte says. "There is so much we can do."

## Data for all

Big data is certainly nothing new to science. (Science had a special package on the topic in the 11 February 2011 issue.) The Large Hadron Collider at CERN generates 15 petabytes (10^15 bytes) of data every year it's in operation. Astronomy's Sloan Digital Sky Survey contributes terabytes (10^12 bytes) yearly as well. Big data isn't even all that new to biology. As of the end of August, for example, NIH's 31-year-old gene sequence database, GenBank, held some 167 million gene sequences containing more than 154 billion nucleotide bases.

Nor is the marriage of computational science and biology novel on its own. Researchers have amassed large-scale basic biology data sets for years—unimaginatively dubbed genomics, proteomics, metabolomics, and so on—and combed them in search of novel insights into complex biological pathways and disease.

But many of these early efforts were run by large consortia of researchers, who often had rights to first mine the data before releasing them to the public. So much of that information is now public, however, that it's opened the door for researchers who never participated in those consortia. "Now it's possible to ask big-data questions with data that is extant in the public domain," says Ed Buckler, a research geneticist who specializes in maize genetics at the U.S. Department of Agriculture's Agricultural Research Service in Ithaca, New York, and Cornell University.

Asking those questions requires specialized algorithms and software, capable of handling massive data sets, and those are improving even as the data proliferate. Heckerman and his Microsoft Research colleagues, for example, made a splash recently with a software advance that eases large-scale searches within genetic databases, such as those used to compare entire genomes in what are known as genome-wide association studies (GWAS). These efforts examine DNA of large numbers of ill people and healthy controls, looking for genetic fingerprints linked to disease. Those fingerprints can be subtle, because most diseases are unlike the simple traits of classical genetics—the colors of Mendel's peas, for example—in which each trait maps to a single gene. "When people first started doing GWAS they thought this would be really easy," Heckerman says. "The problem is that Mendel's peas are the exception not the rule."
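The core of such a scan can be sketched in a few lines: code each person's genotype at a marker as 0, 1, or 2 copies of the minor allele and test whether it correlates with the trait. The following toy simulation is purely illustrative (the sample sizes, allele frequency, and the single planted causal marker are all invented); real GWAS pipelines add quality control and the corrections for population structure discussed below.

```python
import numpy as np

rng = np.random.default_rng(42)
n_people, n_snps = 500, 100

# Genotypes coded 0/1/2 (copies of the minor allele), drawn at random
genotypes = rng.binomial(2, 0.3, size=(n_people, n_snps)).astype(float)

# Toy phenotype: SNP 0 is causal, everything else is noise
phenotype = 1.5 * genotypes[:, 0] + rng.standard_normal(n_people)

# Per-SNP association: correlate each standardized genotype column with the trait
g = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
p = (phenotype - phenotype.mean()) / phenotype.std()
r = g.T @ p / n_people                        # correlation per SNP
t = r * np.sqrt((n_people - 2) / (1 - r**2))  # t-statistic per SNP

print(int(np.argmax(np.abs(t))))  # the planted causal SNP should top the ranking
```

With a strong planted effect the causal marker's statistic dwarfs the null markers'; real effect sizes are far subtler, which is why the field needs huge cohorts and careful statistics.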

Instead, the genetics behind most traits and diseases, such as diabetes and prostate cancer, is far more complex, with small contributions from many genetic changes having an additive effect. "To uncover these weak signals you need tons of data. You need tens of thousands or hundreds of thousands of people," Heckerman says. "But there is a catch. When you analyze lots of data, there is hidden structure," in which separate individuals share a multitude of genetic similarities. But in many cases, these similarities are due to two individuals being more closely related than others, instead of sharing common disease genes. "That wreaks havoc with data. You get tons of what looks like signals. But when you look closer it evaporates."

One way around this has been to use a data analysis approach called a linear mixed model. The approach's mathematical rigor helps reduce false positives, but the computing power needed for it grows as the cube of the number of subjects being analyzed. That's no problem when analyzing a few dozen people or so, but if you want to comb through tens of thousands of genome samples, "forget about it," Heckerman says.

After grappling with the problem for some time, Heckerman and his colleagues came up with what he calls simple "algebraic tricks" to convert the problem to one that scales linearly with the number of subjects, making it tractable to crunch large data sets. The result, an algorithm dubbed FaST-LMM, reduces confounding results, increases the size of the samples that can be processed, and thereby increases the chance of seeing small signals hidden within large data sets. Last year, Heckerman's team used this FaST-LMM algorithm on Microsoft's cloud-based supercomputer known as Azure to compare the genomes of thousands of individuals in a database run by the Wellcome Trust, a biomedical research charity in the United Kingdom. They analyzed 63,524,915,020 pairs of genetic markers in total, finding a host of new associations that may serve as markers for bipolar disorder, coronary artery disease, hypertension, inflammatory bowel disease, rheumatoid arthritis, and type 1 and type 2 diabetes, as they announced in Scientific Reports on 22 January. These associations themselves have been made freely available on the Windows Azure Marketplace so that independent researchers can explore them further.
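The flavor of such "algebraic tricks" can be illustrated with a short numpy sketch. This is a toy illustration of the general low-rank idea, not FaST-LMM's actual code: because the genetic similarity matrix is built from s SNPs, one SVD of the n-by-s genotype matrix diagonalizes the model's covariance, after which the likelihood is a product of independent one-dimensional terms instead of requiring an n-by-n matrix solve.

```python
import numpy as np

rng = np.random.default_rng(0)
n, s = 50, 20                    # subjects, SNPs (toy sizes; the trick pays off when n is huge)
G = rng.standard_normal((n, s))  # stand-in for a standardized genotype matrix
y = rng.standard_normal(n)       # stand-in for phenotype residuals
sg2, se2 = 0.7, 0.3              # genetic and environmental variance components

# Naive likelihood evaluation: build the full n x n covariance, O(n^3)
K = G @ G.T / s
V = sg2 * K + se2 * np.eye(n)
_, logdet = np.linalg.slogdet(V)
naive_ll = -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(V, y))

# Factored evaluation: one SVD of G, after which the covariance is diagonal
U, S, _ = np.linalg.svd(G, full_matrices=True)
d = np.zeros(n)
d[:s] = S**2 / s                 # eigenvalues of K (rank s, the rest are zero)
var = sg2 * d + se2              # eigenvalues of V
yr = U.T @ y                     # rotate the phenotype once
fast_ll = -0.5 * (n * np.log(2 * np.pi) + np.log(var).sum() + (yr**2 / var).sum())

print(np.isclose(naive_ll, fast_ll))  # the two evaluations agree
```

In the published method the spectral decomposition is computed once and reused while the variance parameters are optimized, which is what makes genome-scale cohorts tractable.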

Butte cautions that such would-be links often fade away upon closer inspection, but he is delighted that software engineers are tackling hurdles in biology. "This is what we have been hoping for," Butte says.

Dry lab biology's impact on biomedicine extends well beyond GWAS. Researchers led by Asa Abeliovich at Columbia University, for example, reported in Nature on 1 August that they used a big-data approach to discover new molecular actors that influence whether patients with a common variant of a gene known as APOE4 come down with Alzheimer's. In this case, they used publicly available gene expression data sets from brain tissue of humans with and without a late-onset version of Alzheimer's. They found that two genes, called SV2A and RNF219, have abnormally low activity in people who develop the disease.

Combined with other clues to the genes' functions, the finding suggests that they act as previously undiscovered players in the molecular network that regulates intracellular accumulation of amyloid precursor protein. Amyloid collects in dense plaques in patients' brains and may play a causal role in the disease. Abeliovich's team confirmed the result in lab studies of mice, and then moved on to people—still in a dry lab. The team analyzed publicly available neuroimaging data of Alzheimer's patients and showed that variations in RNF219 are correlated with the amount of amyloid that accumulates in their brains.

The work not only raises hopes of new drug targets for fighting dementia, but it may also help doctors stratify patients into groups that may one day benefit from different Alzheimer's treatment programs, as they do today for patients with several types of cancer. The experiment, Geschwind notes, was impressive because of the combination of database mining, lab validation, and imaging analysis of now standardized brain scans. "Five years ago they would never have been able to do this," he says.

## Beyond biomedicine

The rapid rise in the number of plants that have had their whole genomes sequenced and made public has enabled plant biologists to produce their own dry lab discoveries. Buckler and his colleagues, for example, have been exploring disease resistance across the many varieties of maize, or corn. In one recent paper, they compared the genomes of 103 different maize varieties, analyzing 1000 different regions of DNA both within genes and in nongene regions of the chromosomes. They linked certain traits, such as variation in disease resistance and in when the plant flowers, to specific patterns of the noncoding DNA. Now, Buckler says, his group and others are helping plant breeding programs improve disease resistance and other traits by singling out which offspring have noncoding DNA signatures that promote desired traits. "Big data is already having a day-to-day effect on how people are breeding crops," Buckler says.

It's also helping answer more esoteric questions about plants. David Sankoff, a mathematician at the University of Ottawa, has tapped the whole genome sequences of some 30 flowering plant species to try to reconstruct the general genome architecture—not the specific DNA sequence—of the common ancestor of all flowering plants that lived some 120 million years ago. He and his colleagues recently reported a big step in that direction. By analyzing and comparing the presence of duplicate and triplicate copies of genes found within modern eudicots, one key branch of flowering plants, Sankoff's team concluded that the common ancestor had seven chromosomes and between 20,000 and 30,000 genes, making its genome significantly smaller than those of many modern plants. Although such discoveries aren't likely to impact plant breeding or other commercial interests, "it's a really fun aspect of genetics work," says Eric Lyons, a plant geneticist at the University of Arizona in Tucson, who developed a comparative genomics database and software infrastructure used by Sankoff and his colleagues.

## Playing well together

Dry lab biology still faces plenty of growing pains. Among the most challenging is gaining access to other people's data. In many cases, researchers who have spent their careers generating powerful data sets are reluctant to share. They may be hoping to mine the data themselves before others make discoveries based on their work. Or the data may be raw and in need of further analysis or annotation. "These are really hard problems," Butte says. "We need better systems to reward people that share their data."

A lack of common standards also handicaps the field. Not only do research groups file their data using different software tools and file formats, but in many cases the design of the experiments—and therefore precisely what is being measured—can differ as well. Butte and others argue that dealing with multiple file formats is somewhat cumbersome but that the problem is surmountable. It can be harder, though, to account for differences in experimental design when comparing large data sets.

Years of work to standardize experiments, analysis, and interpretation of experiments involving tools such as DNA and RNA microarrays and proteomic mass spectrometry are beginning to pay off, Butte says. Heckerman agrees. Biological data, he says, are becoming "very standardized."

As the volume of publicly available data grows, so do concerns about genetic privacy. Geneticists have shown that even anonymous data can be "reidentified"—and any leaks can reveal not only the medical conditions of patients themselves, but also genetic predispositions to disease that other family members may share. In this case, however, at least one potential solution is already in place. In order to get access to the National Center for Biotechnology Information's database of genotypes and phenotypes (dbGaP), which archives studies such as GWAS associations and molecular diagnostic assays that attempt to link genes to traits, researchers must register and ask for approval. Furthermore, all such requests are made public, so that it's transparent who is attempting to gain access to the data and for what purpose.

To address these challenges—as well as take advantage of the scientific opportunities at the crossroads between big data and biomedical research—NIH announced this summer that it was launching a new project called Big Data to Knowledge (BD2K). With initial funding of $96 million over 4 years, BD2K has dual aims. It will establish a series of centers to push the development of novel algorithms and other methodology to make discoveries, and it will also create a series of working groups across NIH's institutes to deal with the trouble spots of data standards, access, and privacy. Other efforts to grapple with these tough issues exist as well, including a global alliance of more than 70 institutions in 40 countries that was launched in June 2013 to make more digital data freely available.

Dry lab biology could receive a further boost from an upcoming U.S. requirement that databases be open to the community. On 22 February, a memo from John Holdren, the director of the U.S. Office of Science and Technology Policy (OSTP), asked the heads of executive departments and agencies within the federal government to come up with new strategies to encourage access to federally funded science and data. The memo drew attention at the time for its call for increasing open access to scientific publications. But what went largely unnoticed is that the memo also called for digital data from federally funded unclassified research projects to be stored in publicly available databases. OSTP officials say they have the agency recommendations now and are in the process of reviewing them.

While a potential boon for biology's data miners, access to unprecedented data sources will likely exacerbate problems with data standardization and issues of patient privacy, Butte says. It could also create new headaches for those required to submit their data. They will either have to take time themselves, or hire assistants, to manage the data sets and prepare them for deposition in a public source. And that could take dollars and expertise away from actual research. Particularly in small labs, the impact may be significant, says Peter Lyster, a program director in the Division of Biomedical Technology, Bioinformatics, and Computational Biology at the National Institute of General Medical Sciences in Bethesda, Maryland. "At some point, it's a zero-sum game."

That's only for the wet labs that generate the data, he adds. For the new breed of dry lab biologists, the combination of new tools, new policies, and burgeoning databases holds nothing but opportunities. Says Heckerman: "I think we're full steam ahead at this point."