# News this Week

Science, 16 Feb 2001: Vol. 291, Issue 5507, p. 1177
# 1. The Human Genome

Elizabeth Pennisi

It is an awe-inspiring sight. Open up the folded figure that comes with this issue of Science. There you will see the human genome, chromosome after chromosome, with its major features color-coded and described. Black tick marks show the coding regions along orange, blue, pink, and purple genes, the colors reflecting the function of the corresponding proteins. All told, some 2.9 billion bases of the genome are represented on this beach towel-sized poster.

It took geneticists 7 years to find the gene involved in cystic fibrosis—but here you can locate it in a few seconds in the last third of chromosome 7. Look down toward the bottom of the poster, on chromosome 17, to find BRCA1, one of the genes implicated in hereditary breast cancer. One quick look shows, too, that not all chromosomes are created equal. Number 19 is jam-packed with genes—23 per megabase, more than 1400 total—but chromosome 13 has relatively few, just five per megabase.

Thousands of scientists across the globe have labored for some 15 years to achieve this feat—the (almost) complete nucleotide sequence of human DNA, often called the book of life. Actually, two books exist, because the rival teams who compiled them were unable to mend their differences and pool their data. The genome sequence on the poster was compiled by J. Craig Venter and colleagues at Celera Genomics, a biotech company started just 3 years ago in Rockville, Maryland. The other, which appears in the 15 February issue of Nature, was produced by the International Human Genome Sequencing Consortium.

Both have yet to be finished, with all the i's dotted and the t's crossed. Small to large gaps exist in each draft, akin to a missing word or paragraph or page, but the gist of the story is still clear. Thus, even in this unpolished state, these two books offer the most comprehensive look at the human genome ever possible. To scientists like Richard Gibbs, who heads the sequencing effort at Baylor College of Medicine in Houston, that look is thrilling: “It's the same feeling you must get when you are on a satellite, and you are looking down at Earth.” Even more exciting, says Celera's Mark Adams, is that these drafts are really just the beginning. The Celera paper “is mostly a presentation of how we got where we are,” he points out, and it provides only a fleeting glimpse at the wealth of information contained in the sequence.


Just obtaining the sequence is a phenomenal achievement, one that many researchers did not believe possible 15 years ago. (Science has highlighted a few of the unsung heroes in this massive endeavor.) Until now, the largest genome ever sequenced was that of the fruit fly, with 180 megabases, which Celera and academic researchers knocked off in March 2000. The human is almost 25 times as big and is infinitely more difficult to decipher. In essence, Figure 1, even with almost 50 meters of chromosomes, is just an abstract of the book. Spelling out the entire sequence, all 3 billion or so chemical letters that make up the DNA along each chromosome, would fill tomes equivalent to 200 New York City phone books. Yet all it takes is Internet access to view those letters, one by one. With a few clicks of the mouse, one can now scroll through the book of life. Fifteen months ago, the true positions of barely 10% of those letters were known; now some 90% are represented in both the Celera and public databases, with varying degrees of certainty in the latter. “Having this enormous amount of sequence all laid out is just the coolest thing,” says Robert Waterston, co-director of the Washington University Genome Sequencing Center in St. Louis.

This new text has enabled both groups to chart the genomic landscape with unprecedented precision and make their best guesses yet about the number and types of genes that humans share with other organisms or call their own. “There's a long list of things that blew my socks off,” says Francis Collins, director of the National Human Genome Research Institute, which supported the lion's share of the U.S. Human Genome Project. Collins points to the number and source of human genes as just two surprises. As the sequence is filled in over the coming months and years, almost every conclusion drawn by the several hundred researchers who've scanned this text will need revisiting, they concede. But the discoveries made so far have already made even these drafts best sellers.

## A new view

Perhaps most humbling of all is the finding by both Celera and the public consortium that humans have 32,000 genes, give or take a few thousand. That's only about twice as many as the nematode has, and the number “is a bit of an assault on our sensibility,” Collins notes. Celera's scientists have detected 26,383 genes that are almost sure bets and another 12,000 distant possibilities; the consortium came in at 24,500, with another 5000 expected to show up as gene-prediction programs improve. Both are a far cry from the commonly cited number of 100,000 genes.

“It shows that it is better to draw conclusions based on data rather than conjecture,” says Celera's Adams, who as late as May bet there were some 67,000 genes (Science, 19 May 2000, p. 1146). As the sequencers puzzled over what happened to the rest, reexamining evidence for the lower number, they realized that the oft-mentioned 100,000 arose from a back-of-the-envelope calculation by Harvard Nobel laureate Walter Gilbert in the mid-1980s; subsequent papers also predicted the total to be between 50,000 and 100,000 genes. Gilbert still stands by his count, and even those who have now predicted only about one-third that number are circumspect. There won't be fewer than 25,000, “but the top end of this number is still quite flexible,” says bioinformatics expert Ewan Birney of the European Bioinformatics Institute branch in Hinxton near Cambridge, U.K. Adams agrees: “I'm sure in some cases we've underpredicted” the genes.

One reason for wiggle room is that gene-prediction programs work either by looking for a sequence that's similar to known genes or gene fragments or by homing in on a sequence of the right size that has the telltale beginnings and ends of a gene. What these programs miss is “the mythical stuff called dark matter” by the gene predictors, says Birney—genes that are not very active. Gene-prediction software relies on, among other things, catalogs of expressed genes known as expressed sequence tags. But genes that are rarely active would not be detected in most screens of expressed genes. “There could be lots of dark matter, because there is no way to know [how much there is],” says Eric Lander, head of the Whitehead/MIT Genome Center in Cambridge, Massachusetts.
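The second strategy described above, homing in on stretches with the telltale beginnings and ends of a gene, can be illustrated with a toy open-reading-frame scan: look for an ATG start codon followed in-frame by a stop codon. This is only a sketch of the idea; real gene-prediction programs also model exon/intron structure and weigh evidence such as expressed sequence tags, and the test sequence and `min_codons` cutoff here are invented.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=10):
    """Scan the forward strand for open reading frames: an ATG start
    codon followed in-frame by a stop codon, at least `min_codons`
    codons long (counting the stop).  Returns half-open (start, end)
    coordinates."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i           # remember the in-frame start
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None        # resume scanning after the stop
    return orfs

# An ATG, twelve alanine codons, and a TAA stop: one 14-codon ORF.
orfs = find_orfs("ATG" + "GCT" * 12 + "TAA", min_codons=5)
print(orfs)  # [(0, 42)]
```

A scan this crude would drown in false positives on real genomic DNA, which is exactly why the "dark matter" of rarely expressed genes is so hard to pin down.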

The less mythical genes are showing, however, how fewer genes can yield an organism as complicated as a person. By comparing the human genome with expressed sequence tags and with other genomic and protein data, researchers have figured out that human genes do more work than those in other organisms do—and therein may lie the difference between us and them. Whether in human, worm, or fly, each coding region of a gene is about the same size. Yet human genes assemble these regions in a startling array of combinations. So rather than specify just one protein, as was long believed, each human gene can, on average, spell out three proteins simply by using different combinations of the coding regions, called exons, located within its boundaries. “We're [now] understanding what vertebrate innovation is about,” Lander notes.

Proteins are turning out to be more complicated as well. Proteins consist of one or more identifiable domains, sections that have a particular shape or function. After looking at all the proteins potentially encoded in the genome, the public consortium concluded that although humans don't have appreciably more types of domains, they use those domains more creatively, “cobbling more of them together” than do worms or fruit flies, says Collins. Celera's team found this to be particularly true in certain classes, such as structural proteins involved in the actin cytoskeleton and proteins used in signal transduction and immune function.

Another surprise is “the whole architecture of chromosomes, the enormous differences,” notes molecular biologist Leroy Hood at the Institute for Systems Biology in Seattle. Adams was particularly intrigued by the distribution of single-nucleotide polymorphisms (SNPs), places on the genome where a certain base varies among individuals. “In some regions, the SNP density is higher than you'd expect, and [elsewhere] it's lower than you'd expect,” explains Adams. “There's something going on in the genome” that we don't understand, he adds, that determines why SNPs accumulate in some places but not in others.

Other features also vary across the genome. Regulatory regions called CpG islands, which when methylated can help shut down nearby genes, are denser in gene-rich regions than in the stretches of geneless DNA. Similarly, researchers are puzzling over why the rate of recombination, in which a pair of chromosomes swaps equivalent bits of DNA, differs so dramatically. Parts of chromosome 13 are relatively stable, for instance, whereas chromosome 12 in men and chromosome 16 in women are enormously fickle.

Equally striking is how little of the genome actually codes for proteins and how those exons are distributed. Celera calculates that just 1.1% of the genome codes for proteins; the public figure is 1.5%. That's a sea change from when Fred Sanger, now retired and living outside Cambridge, U.K., did his pioneering work on DNA sequencing in the late 1970s. Then, “one imagined exons consecutively along the DNA,” he recalls. That's how bacterial genes are arranged. But human genes contain intervening sequence, sometimes extending thousands of bases, between exons. Not only does this make for big genes, but it complicates the task of gene identification.

Moreover, genes themselves can be separated by vast “deserts” of noncoding DNA, the so-called junk DNA. The term is proving to be a misnomer, however (see p. 1184). Celera scientists estimate that between 40% and 48% of the genome consists of repeat sequences: DNA in which a particular pattern of bases occurs over and over, sometimes for long stretches of a chromosome. One of the more common repeats, called Alu's, covers 288 megabases in the Celera human genome—nearly 10% of the total. And the public consortium's analysis shows that older Alu's tend to concentrate in gene-rich areas, suggesting that those Alu's located near genes may serve some useful purpose and thus were retained by the genome. “It's like looking into our genome and finding a fossil record, [one that shows] what came and went,” says Collins.

Among the most common DNA fossils are transposons—pieces of DNA that appear to have no purpose except to make copies of themselves and often jump from place to place along the chromosomes. They typically contain just a few genes—those needed to promote the transposon's proliferation. Both drafts confirm that transposons may also be a source of new genes. Celera found 97 coding regions that appear to have been copied and moved by RNA-based transposons called retrotransposons. Once in a new place, these condensed genes often decay through time for lack of any clear function, but some may take on new roles. And transposon genes themselves become part of the genome. Until recently, 19 of these transposon-derived genes were known. The public consortium just found 28 more. “It almost looks like we are not in control of our own genome,” notes Phil Green, a bioinformatics expert at the University of Washington, Seattle.

## Mysteries remain

For many years, these new texts are likely to suggest more questions than answers. Some questions, including gene number, arise because the incomplete sequence is hard to interpret. But continued sequencing by the public consortium should remedy that quickly, for both the public draft and the Celera version, as the company regularly incorporates new public data. “This is what scientists are supposed to do, look at the data” and revise their estimates as new information comes in, Adams says.

Other questions will persist despite an abundance of information. Both Celera and the public consortium, for instance, tried to determine whether sometime in its early history the human genome underwent a complete duplication similar to what is thought to have happened in plants. Such a duplication could explain why vertebrates have four times as many HOX genes, a group of key developmental genes, as do fruit flies. It might also explain why roughly 5% of the genome consists of stretches 1 kilobase or longer that have been copied and pasted, on either the same or a different chromosome, as the public consortium found. By contrast, large, duplicated segments make up less than 1% of the worm genome and less than 0.1% of the fly genome. Even so, the distribution of these human copies makes it hard to imagine that they resulted from a single whole-genome twinning event. “We can't entirely rule it out,” says Adams, “but there's not a lot of evidence for a systemic duplication.” Instead, duplication may have occurred in bits and pieces over millions of years.

Another head-scratching discovery, made by the public consortium, is that the human genome shares 223 genes with bacteria—genes that do not exist in the worm, fly, or yeast. Some researchers suspect that the ancient vertebrate genome took on bacterial genes, much the way pathogenic bacteria have taken in genes that confer antibiotic resistance. However, “it's not clear if the transfer was from human to bacteria or bacteria to human,” Waterston points out.

All this from a first glimpse at the nearly complete genome. Although their analyses occupy several hundred pages in Science and Nature, both Celera and the public consortium came away knowing that they had only scratched the surface. “It's like a book in a foreign language that you don't understand,” says Sanger. “That's the first job, working the language out.”

# 2. Comparison Shopping

Eliot Marshall

Now that the human genome has come off the production line, researchers are eager to kick the tires and take it out for a spin. They actually have two versions to test drive, one produced with private money and the other with public funds. Naturally, people are asking how the two products compare. Getting an answer to that question, however, may not be straightforward.

Few scientists outside the groups that produced these draft genomes have examined the results side by side. Leaders of the two sequencing groups have written up their own evaluations; not surprisingly, each one concludes that its own team has done a superior job. A few independent analysts have taken a quick look at the data, but their judgments are tentative, in part because these genomes are fast-moving targets and are difficult to pin down. As additional data come in, both research groups are continuing to update their views of the human genome, touting the most recent improvements; the public consortium will continue to release updated drafts, but Celera's updates will be available only to its paying customers. The published reports appearing this week in Science and Nature represent a freeze of the data as they existed around the first week of October 2000. Given the extraordinary mass of data, it may take several months for molecular biologists to nail down the relative merits of each and get a good fix on their accuracy. Officials at the U.S. agencies that fund genome research are talking about holding a workshop to do just that, possibly on 3 April, but no meeting has yet been scheduled.

Anyone trying to evaluate the two products in the meantime needs to see the data in a format called a whole-genome assembly—a format that hasn't been released on the Web at this writing but will be available by the time the two papers are published. The assembly is a view of the genome that's meant to be as complete as possible: Redundancies in DNA sequence are supposedly removed, large chunks of contiguous DNA are assigned to specific chromosomes, and these chunks are meant to be in the right order and in the right back-to-front orientation.

J. Craig Venter and his crew at Celera Genomics in Rockville, Maryland, authors of this week's report in Science, say that their version of the genome, assembled last October, contains 2.65 billion base pairs of connected DNA, plus “chaff” DNA that isn't fully assembled, for a total of 2.9 billion base pairs. Venter calls this version “more than a draft,” because he says more of the data are in order and in correct orientation than in the version assembled by the public consortium last fall. Celera is making its October version of the genome available to the public for free, on condition that the data not be used commercially or redistributed, through the company Web site (http://www.celera.com/). The Celera team reports that more than 90% of its assembled genome is in contiguous data assemblies of 100 kilobases or more, and 25% is in assemblies of 10 megabases or more.

The publicly funded team, led by chief author Eric Lander of the Whitehead/MIT Genome Center in Cambridge, Massachusetts, reports in Nature this week that its version of the genome contains 2.7 billion base pairs of DNA. As in Celera's version, most of the sequence is in draft form; the exceptions are chromosomes 21 and 22, which are considered “finished,” or as good as they get. Indeed, fully one-third of the genome is in finished form, and Lander's group estimates that the consortium is finishing at the rate of 1 billion bases per year. Like the Celera version, this draft contains more than 100,000 gaps.

The analysis in Nature is based on a genome assembly completed on 7 October by bioinformatics experts David Haussler and Jim Kent of the University of California, Santa Cruz (UCSC). This version initially had a problem, though: A computational glitch caused the finished DNA sequences to be “flipped” into reverse orientation. Lander says the glitch affected “less than one-half of 1%” of the data, but he notes that some details had to be corrected in the paper, and he says an improved assembly of the genome was placed on the UCSC Web site (genome.ucsc.edu) on 9 January. The Nature paper reports (using an index of contiguity called N50 to describe where 50% of the nucleotides are located) that the public N50 “scaffolds” of assembled data are at least 277,000 bases long. Celera's Gene Myers says the comparable value for Celera's scaffolds is more than 3 million bases.

Although both groups have produced genomes of approximately the same size, they describe the characteristics of their sequences in different terms, which makes a quick and easy comparison difficult. It is not clear how much of the DNA in either assembly is fully contiguous, accurately positioned, or correctly oriented.

To check the congruence of the two genomes, Stanford geneticists Michael Olivier, David Cox, and colleagues used a complex genome map devised in their lab—a collection of “radiation hybrid” clones that break the genome into fragments of known dimensions. With this admittedly imprecise measure, Cox reports on page 1298 that he found that the two versions and the radiation hybrid map differed relatively little. Only 766 unique genetic markers out of a set of 20,874 were not assigned to the same chromosome.

George Church, a genome researcher at Harvard University, also attempted to compare the two genomes. But instead of using the UCSC assembly of 7 October to represent the public version, he used a different assembly made in December by the National Center for Biotechnology Information, part of the National Institutes of Health. Church notes that he was “fortunate” in doing so, because of the glitch in the 7 October data. His report, which appears this week in Nature, concludes that the draft assemblies are “similar in size, contain comparable numbers of unique sequences … and exhibit similar statistics” on the number of active genes.

Researchers are eager to use these draft genomes. But the reviewers urge caution in using either one. As Lander points out, some “misassemblies” of DNA may have been “propagated into the current version of the draft genome,” creating potential landmines for the unwary.

# 3. Watching Genes Build a Body

Gretchen Vogel

The human genome is touted as the master plan for building an organism. But it is up to developmental biologists to decipher how that “master plan” directs construction.

Traditionally, developmental geneticists have learned how genes control development by altering a gene and observing what goes wrong in model organisms such as the fruit fly Drosophila melanogaster, the nematode worm, and the mouse. Complete genomes—the fly, worm, and human are now finished—have simplified the process of locating genes that cause intriguing abnormalities.

But the genomes will also have a more profound effect. Genomics “has completely revolutionized how I think about developmental biology,” says Stuart Kim of Stanford University. That's because researchers can now take whole-genome snapshots of cells and tissues, instead of investigating one gene at a time. Kim and his colleagues have completed 800 microarray experiments recording the relative activity of nearly every worm gene at different developmental stages, in different body parts, and under different conditions. The result, Kim says, is a wealth of information about each of those genes. The problem now is how to make sense of the data avalanche—the team has yet to sort through the nearly 2000 genes that are turned on during development of the genitals, for instance.

Other researchers plan to conduct similar studies on human cells. For example, the biotechnology company Geron, based in Menlo Park, California, has signed an agreement with Celera Genomics in Rockville, Maryland, to analyze which genes are switched on in human embryonic stem cells, the prized cells taken from early embryos that can develop into any cell type. Following gene activity while the cells are still undifferentiated and as they develop into certain tissue types could reveal “the essence of being a stem cell,” says Kim.

# 4. Controversial From the Start

Leslie Roberts

The human genome: the crown jewel of 20th century biology, heralded at the White House, plastered on the covers of countless magazines—and at last spelled out today in intricate detail in both Science and Nature. Deciphering this string of 3 billion A's, T's, G's, and C's is being hailed as an achievement that will usher in a new era of biology and even alter our understanding of who we are.

That's a far cry from how the idea was greeted when it was first proposed 15 years ago. “Absurd,” “dangerous,” and “impossible,” scoffed numerous critics, who noted that the technology did not exist to sequence a bacterium, much less a human. And even if the project's starry-eyed proponents could by some miracle pull it off, who would want the complete sequence data anyway?

It turns out a lot of people did. This once-ludicrous proposal became one of the most hotly contested—and contentious—races in recent scientific history. Although the race has been dominated in the past few years by the acrimonious feud between the public and private teams, tensions go way back. And no wonder, with a prize this great and a project that has transcended and transformed traditional ways of doing biology. “The change is so fundamental, it is hard for even scientists to grasp,” notes geneticist Maynard Olson of the University of Washington, Seattle, who ranks decoding the human genome as one of the biggest accomplishments ever in biology.

## An impossible dream

One of the first to grasp that potential was Robert Sinsheimer, a biologist who was then chancellor of the University of California (UC), Santa Cruz. UC astronomers were already angling to build the world's biggest telescope, and Sinsheimer was looking for a project of similar magnitude in biology. Unraveling the sequence of the human genome might be just the ticket—if he could rally the scientific support and, of course, money. At the time, the largest genome yet sequenced was the minuscule Epstein-Barr virus—and that feat had taken several researchers years to complete. To apply such tools to the human genome, nearly 20,000 times bigger at 3 billion bases, was audacious beyond belief.

In 1985, Sinsheimer assembled some of the best minds in the nascent field of genome analysis to hash over the proposal at his idyllic campus, nestled in the hills above the sleepy beach town of Santa Cruz. John Sulston of Cambridge University and Robert Waterston of Washington University in St. Louis, who were already trying to map the genome of the nematode Caenorhabditis elegans, were there, as was Bart Barrell, head of large-scale sequencing at the U.K. Medical Research Council (MRC). So were genetic mappers David Botstein, then at the Massachusetts Institute of Technology (MIT), Helen Donis-Keller, then at Collaborative Research Inc., and sequencing aficionados Walter Gilbert and George Church of Harvard University and Leroy Hood of the California Institute of Technology in Pasadena. Their collective conclusion: bold, exciting—but simply not feasible. Sinsheimer's proposal for a genome institute at Santa Cruz died, but not before it had captured Gilbert's imagination.

Gilbert soon became the proposal's biggest champion, and his support meant the idea could no longer be blithely dismissed. A decade earlier, Gilbert and Allan Maxam, also at Harvard University, had invented a brand-new technique that enabled scientists for the first time to determine the genetic sequence of an organism. (Gilbert went on to share the Nobel Prize with Fred Sanger of Cambridge University, who independently invented a similar technique.) And he soon won over another giant of molecular biology: James Watson, who shared a Nobel Prize with Francis Crick and Maurice Wilkins for their 1953 discovery of the double helical structure of DNA.

The ambitious idea had also captivated Charles DeLisi, a cancer biologist who was then head of the Office of Health and Environmental Research at the Department of Energy (DOE). To DeLisi, the genome project was a logical outgrowth of DOE's mandate to study the effects of radiation on human health. Another equally compelling rationale—but one DeLisi did not openly tout—was that a massive new endeavor could provide new focus for DOE's national labs, whose bombmaking skills were in diminishing demand.

At the urging of DeLisi and DOE colleague David Smith, the Los Alamos National Laboratory hosted a workshop in Santa Fe, New Mexico, in March 1986 where the excitement was palpable. The idea quickly gained momentum, dominating discussion at a June meeting at Watson's Cold Spring Harbor Laboratory in New York. By then, biologists were beginning to think the project just might be doable. But whether it was worth doing was another matter (Science, 27 June 1986, p. 1598).

To many, like Botstein and Nobel laureate David Baltimore, then at MIT, the project ran counter to the way biology had been conducted for decades. The best work, the mantra went, came from investigator-initiated studies in small labs, not from some massive, goal-driven effort. Moreover, this was technology development, not experimental biology, and it would be mind-numbingly dull. Sydney Brenner of the MRC facetiously suggested that project leaders parcel out the job to prisoners as punishment—the more heinous the crime, the bigger the chromosome they would have to decipher. What was truly horrifying was the price tag, which was quickly estimated at $3 billion—a number that stuck through countless reports ever since. If the National Institutes of Health (NIH) were to foot the bill, the megaproject would rob funds from the rest of biology, the critics asserted. “It endangers all of us, especially the young researchers,” warned Botstein. The scientific value seemed dubious as well. Although many biologists agreed that maps of the chromosomes would be useful for finding genes, what good would come from deciphering every A, T, G, and C, especially since most of them were “junk” that did not code for genes? The sequence might be handy to have, but “was it worth the cost, not in terms of dollars but in terms of its impact on the rest of biological science?” asked Paul Berg of Stanford University.

As the biology community wrestled with the merits of the project, NIH staked out a position firmly on the fence. By contrast, DeLisi and Smith were decidedly gung ho. DeLisi aggressively gained support for the project, first from his superiors at DOE and then from Congress, starting a small Human Genome Initiative within DOE in 1986. The following year, a prestigious advisory panel to DOE called for an all-out effort and urged the agency to take the lead. DOE was the logical choice, DeLisi argued, because this was “big science,” DOE's stock-in-trade, whereas NIH had never attempted a project of this scope (Science, 8 August 1986, p. 620; 31 July 1987, p. 486). The fact that DOE—not NIH—was lobbying for the project only heightened some biologists' unease, because they put great store in NIH's peer-review system. “The fear is not big science so much as bad science,” said Botstein, who in 1986 denounced DOE's proposal as “a scheme for unemployed bombmakers.”

## Emerging consensus

Political posturing continued until 1988, when a National Research Council (NRC) panel gave the project its official seal of approval (Science, 12 February 1988, p. 725). Chaired by Bruce Alberts, then at UC San Francisco, the panel contained some of the project's staunchest advocates, such as Gilbert and Watson, and also some skeptics, including Botstein, mouse geneticist Shirley Tilghman of Princeton University, and yeast expert Olson, then at Washington University in St. Louis. Within a year, the panel endorsed the project unanimously, calling for a rapid scale-up in “new and distinctive” funds to $200 million a year over the next 15 years.

In the process, the panel redefined the project, laying out a phased approach that mollified critics and has guided the initiative ever since. Rather than plunge into sequencing—which no one knew how to do on a massive scale anyway—the project should begin by constructing maps of the human chromosomes. These would greatly speed the search for disease genes, offering immediate medical payoffs. The panel recommended that full-scale sequencing be postponed until new technologies made it faster and cheaper.

But it was the panel's recommendation to analyze the genomes of simple organisms, such as Escherichia coli, yeast, and the roundworm C. elegans, and eventually the mouse, that proved most persuasive. Tilghman and Botstein, in particular, argued vociferously that biologists had no hope of understanding the human genome if they couldn't compare it to the genomes of experimental organisms. Luckily for biologists, evolution has been remarkably conservative, retaining the same genes over and over again in different organisms, explains Tilghman—and it is far easier to figure out a gene's function by experimenting with it in a fruit fly than in a human. Looking back, Tilghman sees this as one of the panel's smartest decisions: “Model organisms were an extraordinary investment. We learned how to sequence on these simpler organisms. And more important, we got a preview of the human genome by sequencing these organisms.”

Gilbert, however, was impatient with the panel's cautious approach and with the interagency dithering. Arguing that the technology was already good enough to sequence the human genome, he left the NRC panel to launch his own company, Genome Corp. His plan, remarkably similar to J. Craig Venter's vision a decade later, was to set up a sequencing factory to churn out the data, which he intended to copyright and sell. “[It will be] available to everyone … for a price,” he explained (Science, 24 July 1987, p. 358). The plan infuriated Watson, who rankled at the idea of selling something as fundamental as data on human DNA. But the debate subsided when Gilbert failed to raise sufficient funds.

## NIH makes its move

As the genome project gained congressional funding and scientific respectability, NIH wrested control from DOE. Urged on by a group of advisers who met outside Washington, D.C., in Reston, Virginia, in March 1988, then-NIH director James Wyngaarden announced that NIH would create a special office for genome research (Science, 13 May 1988, p. 878). In short order, he nabbed Watson to head it, and with that coup, NIH was firmly ensconced as the lead agency. It has remained so, even as the project gathered international collaborators and Britain's Wellcome Trust took on a prominent role.

Watson proved a shrewd strategist, skilled in the care and feeding of those who controlled congressional purse strings, and a tough taskmaster. “My name was good,” he says by way of explanation. Indeed, members of Congress were spellbound when the eccentric Nobel laureate swept in to testify. Watson was eloquent in touting the project's goal: “to find out what being human is.” He also had the refreshing habit of saying what he thought, no matter how politically incorrect—an unusual quality in Washington, D.C.

Even as the project began, Watson's advisory panel was still debating the proper balance for the project—how much should be devoted to building tools, like maps and faster sequencing machines, and how much to actually using these tools to find disease genes? (Science, 13 January 1989, p. 167) Watson was adamant: Even though disease genes captured the public imagination and kept the dollars flowing, this project was designed to build the equivalent of a particle accelerator: They should not be sidetracked. As Botstein explained at a January 1989 meeting, “We are looking at the production of a set of tools that will enable human geneticists to do what they want. We are the Cray, if you like. We don't write software for your particular applications.”

At the same time, Watson relentlessly pushed the first stage of the project and its most tangible goal—building maps of the human chromosomes. Knowing that Congress did not have the patience to wait 15 years for results, Watson staked his reputation on getting the maps done in five. With the maps in hand, genes would fall out in short order, including the putative Alzheimer's gene, which, Watson joked, should be a priority given the age of most members of Congress.

Progress was rapid. By 1990, Sulston and colleagues had nearly completed the physical map of the worm—changing worm biology forever—and Olson and colleagues were proceeding apace on yeast (Science, 15 June 1990, p. 1310). Faster and easier ways to clone and map DNA were coming on line, and sequencing trials were beginning. For a short time, the controversy that had dogged the project from the outset seemed to have dissipated.

## Venter, round one

That newfound harmony was shattered in June 1991, when Venter, who ran a large sequencing lab at the National Institute for Neurological Disorders and Stroke, went public with an iconoclastic plan: Why not focus on finding the genes—the “real goods” that both scientists and companies were clamoring for—and leave the tedious sequencing until later? Venter and colleague Mark Adams had developed a new technique based on expressed sequence tags (ESTs) that enabled them to find genes at unprecedented speed. Never one of Watson's inner circle, Venter boasted that this new approach “was a bargain in comparison to the genome project” and claimed he could find 80% to 90% of the genes within a few years, for a fraction of the cost (Science, 21 June 1991, p. 1618).

Watson dismissed Venter's “cream-skimming approach,” but their feud remained subterranean until a few weeks later, when Venter described his work at a congressional hearing. NIH was so impressed with his progress, Venter said, that it was filing patent applications on the partial genes he was identifying—at a rate of 1000 a month.

Watson erupted, denouncing the patenting scheme as “sheer lunacy” and noting that “virtually any monkey” could do what Venter's group was doing (Science, 11 October 1991, p. 184). What irked him was that Venter and NIH had no clue about the function of the genes from which these fragments came. If the patents held, that meant anybody could lay claim to most of the human genes, undercutting patent protection for biologists who labored long and hard to identify whole genes and figure out what they did. “I am horrified,” Watson told Congress.

Watson also went to war on this issue with his boss, NIH Director Bernadine Healy. The fight cost him his job. In April 1992 he returned to Cold Spring Harbor Laboratory, muttering that no one could work with that woman (Science, 17 April 1992, p. 301).

Venter, too, left NIH in 1992, when he was offered $70 million from a venture capital company to try out his gene identification strategy at a new nonprofit, The Institute for Genomic Research (TIGR).

## From tools to medicine

After Watson's sudden departure, NIH picked gene hunter Francis Collins of the University of Michigan, Ann Arbor, to take the helm. Fresh from the heady success of finding several elusive genes—including those involved in cystic fibrosis, neurofibromatosis, and Huntington's disease—Collins was then in a highly competitive race to find the gene involved in a form of inherited breast cancer.

A physician by training, Collins brought a different perspective to the genome project, placing its medical applications front and center. Collins charmed Congress and the media by riding to work on his motorcycle and playing guitar in a pickup rock band. Whereas Watson and his advisers had spoken of creating a tool, Collins talked about saving children's lives. “The reason the public pays and is excited—well, disease genes are at the top of the list,” he explained.

It was a heyday for gene hunters. The early investments in the genome project paid off as increasingly sophisticated maps of the human and mouse genomes were compiled (Science, 1 October 1993, p. 20). With these maps in hand, the time it took to track down most disease genes dropped from a decade to perhaps 2 years. Every week, it seemed, another deadly disease gene was discovered. Lost in the hoopla, however, was the fact that finding a gene was a far cry from having a treatment, much less a cure.

The consortium was growing as well, fueled by an infusion of funds from the Wellcome Trust, which in 1993 set up a major new sequencing lab, the Sanger Centre near Cambridge, with Sulston as its head. But sequencing overall was lagging behind. At the existing rate and cost, Collins lamented when he took on the job, there was no chance they could finish the sequencing by 2005.
None of the “blue sky” sequencing technologies that had been imagined at the outset had materialized, and with U.S. funding tight and much of the money concentrated on mapping, Collins was worried that “we have mortgaged part of our future.”

Steady, incremental advances were enabling scientists to spew out longer “sequence reads,” and the cost was slowly dropping. Even so, reassembling the DNA fragments in the correct order was tricky. To do so, the sequencers looked for similar patterns in the fragments—much like assembling a jigsaw puzzle, but one with lots of missing pieces. Some pieces just wouldn't fit, some “fit” in the wrong place, and others “got lost” in the cloning process. Still others refused to be sequenced.

Sequencing clearly needed a shot in the arm—and soon got one, but from an unlikely source. In 1995, Venter surprised the community by announcing that he, along with Hamilton Smith, then at Johns Hopkins, and TIGR colleagues Rob Fleischmann and Claire Fraser, had sequenced the first entire genome of a free-living organism, the 1.8-megabase Haemophilus influenzae (Science, 28 July 1995, p. 496). What's more, they had done it in just a year using a bold new approach, whole-genome shotgun sequencing, that NIH had insisted wouldn't work and wouldn't fund.

Sequencers in the publicly funded project had adopted a conservative, methodical approach—starting with relatively small chunks of DNA whose positions on the chromosome were known, breaking them into pieces, then randomly selecting and sequencing those pieces and finally reassembling them. Eventually, larger pieces called contigs would be hooked together. By contrast, Venter simply shredded the entire genome into small fragments and used a computer to reassemble the sequenced pieces by looking for overlapping ends.

NIH's deliberate approach won its spurs a year later, when an international consortium knocked off the yeast genome.
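The overlap-driven reassembly at the heart of shotgun sequencing can be illustrated with a toy greedy assembler. This is a sketch for intuition only: the example reads, the `overlap` helper, and the minimum-overlap threshold are all invented for illustration, and real assemblers such as Celera's must also handle sequencing errors, repeats, and both DNA strands at a vastly larger scale.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that matches a prefix of `b`."""
    start = 0
    while True:
        # Find the next place where b could begin inside a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # If the rest of a matches the start of b, we have an overlap.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1

def greedy_assemble(frags, min_len=3):
    """Repeatedly merge the pair of fragments with the longest overlap."""
    frags = list(frags)
    while len(frags) > 1:
        best = (0, None, None)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            break  # no overlaps left: remaining pieces stay separate contigs
        merged = frags[i] + frags[j][olen:]
        frags = [f for k, f in enumerate(frags) if k not in (i, j)]
        frags.append(merged)
    return frags

# Four made-up overlapping "reads" shredded from one short sequence.
reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # -> ['ATTAGACCTGCCGGAATAC']
```

Run on four overlapping reads, the sketch stitches them back into a single contig; with real genomes, repeated sequences longer than the reads are exactly what the skeptics predicted would make this naive strategy break down.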
Although still tiny relative to the human genome, yeast was a major step up in size and complexity. By April 1996, Waterston and Sulston, who were well into sequencing C. elegans, were champing at the bit, urging Collins to let them plunge into all-out sequencing. In the right hands, they argued, the technology was good enough; the only stumbling block was money. “Just do it,” Sulston urged at the time. The two also broached the heretical topic of dropping the accuracy goal, from 99.99% to 99.9%, to speed the process (Science, 12 April 1996, p. 188).

But Collins would not be rushed. The goal was to assemble the definitive “book of life,” and he insisted it be done to the highest possible quality. He decided to test the water with six pilot projects—a cautious style that earned him praise in some corners and criticism in others. The charge to the labs was to complete a major chunk of sequence while also demonstrating big improvements in cost and speed. After that, he said, the project would home in on its final strategy.

Collins soon abandoned his measured approach—not because of the persuasiveness of Waterston and Sulston's arguments, but because Venter threw down the gauntlet.

## Venter redux

Showing a knack for impeccable timing, Venter dropped his bombshell on 9 May 1998, just days before the annual gathering of genome scientists at Cold Spring Harbor Laboratory. Venter announced that he had teamed up with Perkin-Elmer Corp., which was about to unveil an advanced, automated sequencing machine, to create a new company that would single-handedly sequence the entire human genome in just 3 years—and for a mere $300 million (Science, 15 May 1998, p. 994). What's more, said Venter, when he was done he would give the data away free to the community by posting it on his company's Web site.
The company, soon to be named Celera Genomics and located in Rockville, Maryland, would make money not from the raw data, he explained, but from the analysis it would perform and sell to subscribers. Venter proposed to sequence the genome with the brute-force shotgun technique that had worked so well in Haemophilus—but this time, he would be shredding the entire 3-billion-base genome into zillions of fragments.

Leaders of the public project were angry and incredulous. After they had spent years laying the groundwork, could Venter really beat them to the finish and steal the glory? They were also deeply worried that if Congress fell for Venter's bravado, it might pull the plug on the public project. Venter's plan would never work, they countered—the sequence would be riddled with holes and impossible to reassemble.

Yet as they disparaged Venter's claim, they could not dismiss it. Venter had surprised them before. And this time, he had a hefty bankroll and 300 of Perkin-Elmer's sequencing machines, just then rolling off the assembly line at $300,000 a pop. And to reassemble his sequenced fragments, Venter would use one of the world's fastest supercomputers.

The leaders of the public program wasted no time in increasing the pace and reorienting the game plan in an attempt to beat him to the finish line. Collins announced new goals for the public project in September 1998, just 6 months after Venter's surprise announcement (Science, 18 September 1998, p. 1774). First, the consortium would complete the entire genome by 2003—2 years ahead of schedule, but also 2 years behind Venter. And, in a dramatic departure from previous philosophy, the project would produce a “rough draft,” covering 90% of the genome, by the spring of 2001. Scientists were clamoring for the data even in rough form, Collins said by way of explanation. Yet he also admitted that producing a rough draft and making it public was a strategic move to undercut any patent position Celera or other businesses might claim.

In a crucial test of the shotgun strategy, Celera first tackled the 180-megabase genome of the fruit fly Drosophila melanogaster. Venter teamed up with a publicly funded team headed by Gerald Rubin of the University of California, Berkeley, and by March 2000, they had pulled it off. This proved that shotgun methods could work on a big, complex genome, said Venter (Science, 25 February 2000, p. 1374).

The race was on, punctuated by dueling press releases. First Venter announced in October 1999 that his crew had sequenced 1 billion bases of the human genome—a feat pooh-poohed by NIH, which noted that Celera hadn't released the data for other researchers to check.
Then NIH jumped into the game, announcing in November that it had completed 1 billion bases and holding a “birthday” party at the National Academy of Sciences, complete with balloons and T-shirts emblazoned with the double helix. Venter countered in January 2000 that his crew had compiled DNA sequence covering 90% of the human genome; the public consortium asserted in March that it had completed 2 billion bases; and so on.

Issues of data access heated up too, with the public consortium denouncing Venter for his plan to release his data on the Celera Web site rather than in GenBank, the public database. The feud became increasingly ugly, with each side disparaging the other's work and credibility in the press. Leaders in the scientific community urged them to stop squabbling and work together.

The two had, in fact, begun talking about a possible collaboration in December 1999. Eric Lander, who runs the Whitehead/MIT Genome Center, was the main go-between. The two approaches are complementary, he said, and collaborating would speed the process. But in March, the discussions foundered amid considerable acrimony when the Wellcome Trust leaked to the press a letter from Collins to Venter, citing irreconcilable differences (Science, 10 March 2000, p. 1723).

The sniping, seemingly at its peak, escalated further, until many considered it an embarrassment. “If they were my children, I would give them both a time out,” said one leading scientist at the time. Behind the scenes, Ari Patrinos of DOE played intermediary, finally brokering a truce under which both groups would announce their drafts at the same time, thereby sharing the glory. Venter still would not deposit his data in GenBank, as the consortium wanted, but he did concede that the public data had been useful in his own work. Defusing the issue of priority and credit, the two agreed to publish simultaneously, perhaps even in the same journal.
Collins and Venter granted an exclusive interview to Time, which heralded, “The race is over,” and pictured the beaming duo side by side in their lab coats. They were all smiles, too, at a White House ceremony in June where President Clinton lauded both scientists for their phenomenal achievement, and Collins and Venter lavished praise on one another (Science, 30 June 2000, p. 2294).

The façade held for 5 months—longer than many would have predicted—before all hell broke loose over plans to publish their papers (see p. 1189). At issue, again, was Venter's refusal to deposit his data in GenBank and the terms he might impose on commercial or academic users (Science, 15 December 2000, p. 2042). The two did manage to achieve simultaneous publication—but in separate journals.

In their magnanimous moments, both concede that their race has speeded the project, to everyone's benefit. “Ten, 15 years from now, nobody is going to care about all this fuss and bother,” says Collins. “They're going to care that we got the fly sequence done, and shortly after that we got the human sequence done, and shortly after that we got the mouse sequence done. And all this back and forthing over who did what and what strategy was used and which money was public and which was private is probably going to sink below the radar screen. And hallelujah.”

5. # Objection #1: Big Biology Is Bad Biology

1. Robert F. Service

The human genome project was biology's first foray into “big science,” and many scientists abhorred the idea at the outset. Researchers feared that a massive sequencing project would siphon precious dollars from investigator-initiated research, destroying the cottage industry culture of biology in the process. And just as bad, the project didn't even amount to hypothesis-driven science at all. Rather, critics charged, it was no more than a big fishing expedition, a mindless factory project that no scientists in their right minds would join. Were they right?
Not exactly, says David Baltimore, president of the California Institute of Technology (Caltech) in Pasadena, who raised some of the early concerns. “One of the things I didn't fully anticipate was the state of progress in automation,” he says. In the mid-1980s, gene sequencing was done by hand. Baltimore and others feared that it would take an army of “worker bees” to carry out sequencing on a genomewide scale. But sequencing machines pioneered by Leroy Hood and colleagues at Caltech changed that equation forever. Today, sequencing is nearly completely automated.

The genome project was still a fishing expedition, of course. But the enormous haul of genomic data it netted has changed most minds about such “discovery” research. This once-maligned type of research has enabled teams around the world to explore newfound genes and their links to health and disease. “Discovery science has absolutely revolutionized biology,” says Hood, now director of the Institute for Systems Biology in Seattle, Washington. “It's given us new tools for doing hypothesis-driven research,” maintains Hood, and these tools help rather than hinder individual investigators.

The biggest objection to the audacious proposal was that funding for the genome project would come at the expense of other quality science. “There was a worry that it was a zero-sum game,” says Maynard Olson, a genome center leader at the University of Washington, Seattle. “Frankly, it was a gamble that we'd be able to expand the pie [of research dollars].” But the gamble paid off. In a 1988 National Research Council report, a committee led by Bruce Alberts, a former professor at the University of California, San Francisco, recommended that the human genome project be funded separately from traditional science budgets. And Congress happily went along, giving the Department of Energy $10.7 million and the National Institutes of Health $17.2 million for the new project in fiscal year 1988.
By voicing the early concerns, “I think we did what we hoped we would do,” says Baltimore. “It helped develop a debate, which set us on a productive course.”

6. # Finding the Talismans That Protect Against Infection

1. Martin Enserink

Since 1995, the mini-genomes of dozens of pathogenic microbes have been sequenced, including those that cause tuberculosis, cholera, and ulcers. Many others are almost in the bag, including the much larger genome of Plasmodium, the malaria parasite. That data flood is helping researchers understand how nefarious microorganisms work—and how they might be stopped.

The giant human genome promises to help solve another poorly understood problem: why some people get sick and die when they encounter a pathogen, whereas others stay healthy as an ox. Such information could eventually help put more people in the latter category.

Researchers have long known that differences in disease susceptibility are partly genetic, the most famous example being the gene for sickle cell hemoglobin, which offers protection against malaria to those who inherit one copy of it. (Having two copies causes sickle cell anemia.) Several other susceptibility genes have been discovered for various diseases; malaria now tops the list with 14 genes. “We're just beginning to scratch the surface,” says Adrian Hill, a geneticist at the University of Oxford in the United Kingdom.

To identify genes that might confer susceptibility or resistance, researchers try to find genetic differences between large groups of patients and healthy controls. Without the complete genome, they could only look for previously discovered genes. Now, they can theoretically take each and every gene into consideration. Eventually, such work will lead to a better understanding of the molecular interaction between a bug and its host. That, in turn, may reveal new drug or vaccine targets.

7. # Objection #2: Why Sequence the Junk?

1.
Gretchen Vogel

Genes and their corresponding proteins get most of the attention, but they make up only a tiny fraction—1.5% or less—of the human genome. The other 98% of DNA sequence that does not code directly for proteins was once dismissed as “junk DNA,” and numerous researchers argued that it would be a waste of time and money to include the repetitive, hard-to-sequence regions in the genome project. But scientists have discovered many riches hidden in the junk, and as the project nears completion, several researchers predict that some of the most intriguing discoveries may come from areas once written off as genetic wastelands.

Included among the noncoding DNA, for example, are the crucial promoter sequences, which control when a gene is turned on or off. The repetitive sequences at the ends of chromosomes, called telomeres, prevent the ends of the chromosome from fraying during cell division and help determine a cell's life-span. And several teams have begun to make a strong case that repetitive, noncoding sequences play a crucial role in X inactivation, the process by which one of the two X chromosomes in a female is turned off early in development.

Other genes are turning up in areas previously dismissed as barren. Scientists had assumed, for example, that the regions next to telomeres were buffer zones with few important sequences. But in this week's issue of Nature, H. C. Riethman of the Wistar Institute in Philadelphia and his colleagues report that these regions contain hundreds of genes. “The term ‘junk DNA' is a reflection of our ignorance,” says Evan Eichler of Case Western Reserve University in Cleveland.

The human genome has much more noncoding DNA than any other animal sequenced so far. No one yet knows why. At least half of the noncoding DNA seems to be recognizable repeated sequences—perhaps genomic parasites that invaded the genomes of human ancestors. Eichler suspects that such repeats might provide some genomic wiggle room.
Long stretches of noncoding DNA provide “a built-in plasticity that may be bad at the individual level, but if an organism is going to evolve, it may be a huge selective advantage,” he says. “There is a rich record of our history” in the repeats, agrees Francis Collins of the National Human Genome Research Institute in Bethesda, Maryland. “It's like looking into our genome and finding a fossil record, seeing what came and went.”

8. # Nailing Down Cancer Culprits

1. Jean Marx

A general sending troops out to battle wants as much intelligence about the enemy and its weaknesses as possible. Researchers fighting cancer hope the complete human genome sequence will help provide such information.

The sequence will greatly speed the identification of the genetic underpinnings of cancer. Over the past 15 years or so, researchers have learned that cancers are usually caused by the accumulation of several gene mutations, some of which activate cancer-promoting oncogenes, whereas others inactivate tumor suppressor genes. And though scientists have fingered roughly 100 oncogenes and 30 or so tumor suppressors, that's “only a fraction of the genes that cause cancer,” says cancer gene expert Bert Vogelstein of the Johns Hopkins University School of Medicine in Baltimore, Maryland.

In the past, once researchers determined where in the genome a cancer gene resides, they could still spend months, or even years, scouring the region—often a megabase or two long—looking for likely candidate genes to test. Now, Vogelstein says, that can be done “literally with the click of a button. The availability of the sequence enormously simplifies the search for those [missing cancer] genes.”

Researchers are also using microarrays and other techniques to measure changes in the expression of thousands of genes at a time—information that provides a very detailed picture of the alterations leading to cancer development and spread. Knowing all the human genes will make this picture more complete.
Researchers have already found that tumors that look similar to a pathologist may display different gene expression patterns—and that these differences can reveal potentially lifesaving information about how the cancers will respond to therapy.

9. # Objection #3: Impossible to Do

1. Robert F. Service

Perhaps the most surprising thing about the human genome project is that it was begun at all. In the mid-1980s, the technology for decoding DNA's sequence of chemical bases was in its relative infancy. State-of-the-art labs could sequence only about 500 bases a day, working day in and day out. And the computer technology that came to play such a vital role in the project hadn't even been invented yet. “In retrospect, the optimism that the project could be done on a 15-year timetable was striking,” says Maynard Olson, who directs a sequencing center at the University of Washington, Seattle.

Unexpectedly, however, says Stanford University geneticist David Botstein, sequencing technology didn't need a revolution to make the leap in speed. “In the early days, it was believed that a radical new technology would be required” to sequence the full human genome, says Botstein. “But it didn't turn out that way.” Incremental but vital improvements in manipulating DNA and chemical probes enabled researchers to switch from identifying bases with radioactive probes to fluorescent ones. That eased the way for detectors to read and catalog the sequence of bases automatically. That automation was then honed with the advent of high-speed machines that pushed snippets of DNA through dozens of capillaries, reducing the sequencing time and the cost of reagents. “It was definitely evolution,” says molecular biologist David Baltimore, president of the California Institute of Technology in Pasadena. “But you can go a long way with evolution.”

10. # A Parakeet Genome Project?

1. Gretchen Vogel

Researcher William Haseltine, head of Human Genome Sciences Inc.
in Rockville, Maryland, likes to claim that knowledge from the human genome, combined with a few technology breakthroughs, will someday enable humans to live forever. Most researchers who study aging have more modest expectations—for example, trolling the genome for new insights into genes involved in so-called oxidative damage to cells and genes, which is thought to limit an organism's life-span.

A few in the field have another request: sequence the parakeet. One avian genome, the chicken's, is in progress, but George Martin of the University of Washington, Seattle, and Steven Austad of the University of Idaho say aging research could gain key insights from comparing the genome of a “real flier” with that of humans. “Good flying birds have remarkably long life-spans for their size,” Martin says: Some can live for 20 years or more. At the same time, they use an enormous amount of energy—a process that researchers believe is at the root of oxidative damage. Mice, for example, use much less energy but typically live only 2 years. A parakeet genome project, Martin says, could tell scientists “what the birds are doing that's so great”—and how humans might mimic their secrets.

11. # Brain Calls Dibs on Many Genes

1. Laura Helmuth

The human brain is an expensive tool: A huge proportion of human genes are thought to be involved in constructing, wiring up, and maintaining the nervous system. Neuroscientists hope the completed genome will help them to nail down the brain's share. Current estimates range from “a fair chunk” of the genome to “40%” to “most.” No one knows what all these genes do, but placing them on gene chips to see which ones are expressed by developing neurons is like “having a new type of microscope, a new way of looking at cells,” says neurobiologist Ben Barres of Stanford University.
His team is using such chips, as well as protein analysis, to spot molecular signals passed between neurons and support cells called glia early in development, when neurons start transmitting messages.

The completed genome will also accelerate the search for genes at fault in neurodegenerative diseases. Neurogeneticist Huda Zoghbi of Baylor College of Medicine in Houston looks for candidate genes in the Drosophila genome, then tries to find homologs in the human sequence. Making the jump from fruit fly to human used to take a year of lab time, she says; now she'll be able to search computerized databases to find candidate genes in minutes.

Other neuroscientists hope the genome will help solve otherwise intractable questions about human behavior. For example, psychiatrist Eric Nestler of the University of Texas Southwestern Medical Center in Dallas and computational biologist David Landsman of the National Library of Medicine in Bethesda, Maryland, point out in this week's issue of Nature that newly identified genes might help make sense of addiction. Cocaine acts on certain dopamine transporters, which differ between people; correlating people's transporter subtypes with their propensity for cocaine addiction might reveal why some people are more vulnerable to the drug than others, they suggest.

12. # Sharing the Glory, Not the Credit

1. Eliot Marshall

Greeted by chamber music and an honor guard, leaders of the public and private groups sequencing the human genome filed into the White House last June, shook hands with the president, and pledged to support each other's endeavors. Within weeks, this show of amity dissolved. Over the summer and fall, the teams withdrew to their labs, muttering about the doubtful quality and accessibility of each other's research. The grumbling continued until December, when the two decided to part company at the finish, collaborating only on a single publication date.
The result: Two reports on the human genome are coming out this week—a privately funded version in Science and a publicly funded version in Nature.

The falling-out over the final reports is just a footnote to the huge effort to complete the sequencing of the human genome. But it highlights a philosophical disagreement over how such data should be shared (see sidebar on p. 1192). It also reveals how the rules of scientific publishing, usually rigid, become flexible when the stakes are high. Journal editors are accustomed to telling authors what a paper must disclose and what kind of supporting data must be released. But in this case, the authors themselves—because they were offering a big prize—sought to write the rules. Scientists in the public sequencing group also sought to shape the rules that would apply to the paper from the rival private group. As they courted the authors of these hot papers, the journal editors invited comments on data release, received sharply clashing recommendations, and chased an elusive consensus.

The imbroglio, which reached a peak in the last few months, first broke into public view in March 2000. At that time, the private genome group—headed by J. Craig Venter, president of Celera Genomics in Rockville, Maryland—was still discussing the idea of pooling data and publishing results with the public group, headed by Francis Collins, director of the U.S. National Human Genome Research Institute. Several public-group scientists led by Eric Lander, director of the Whitehead/MIT Genome Center in Cambridge, Massachusetts, had spearheaded efforts to work out a compromise. But the talks broke down.

That failure became evident when an official at the Wellcome Trust, the British charity that supports one of the largest nonprofit sequencing teams, the Sanger Centre in Hinxton, U.K., leaked a letter to the press from Collins to Venter (Science, 10 March 2000, p. 1723). In the letter, Wellcome officials and U.S.
scientists charged that Celera was trying to maintain control over the jointly produced genome data for 5 years and claim intellectual property rights on uses of those data in secondary technologies, such as gene chips. When Celera did not respond quickly, the publicly funded scientists declared the negotiations over. They had insisted that data be deposited immediately in a public database, with no commercial conditions attached. Celera wanted to guard against data piracy by retaining the information on its own Web site, with certain restrictions: Users would not be able to resell the information or use it for other commercial purposes.

Addressing a congressional hearing on 6 April, Venter denied that he wanted exclusive control of the genome data. “We will release the entire consensus human genome sequence freely to researchers on Celera's Internet site when it is completed,” he said. But the public group leaders say they had trouble nailing down the details of Celera's conditions on how its data could be used.

By the time of the much-publicized June ceremony at the White House, the two groups had stopped talking about a pooled database and agreed to “coordinate” but not to collaborate, as Collins explained in June. Collins and Venter still held out hope that they might release their reports in the same journal. At the time of publication, Collins and Venter explained in June, the public group would deposit its sequence data in the free public database GenBank, whereas Celera would release data through its own Web site.

Science and Nature were competing for the papers, and the authors let both journals know that they were looking for the best terms. In June, Donald Kennedy succeeded Floyd Bloom as editor-in-chief of Science, taking charge of months-old negotiations.
Members of the public consortium had by then made abundantly clear that they did not want Science, or Nature for that matter, to allow Celera an exception to the traditional practice that genomic data be released in GenBank. One respected scientist in this field who asked to remain anonymous says that “jealousy” over scientific credit played a big part in the split.

“I got a very thoughtful memo from Eric Lander” about publishing genome data, Kennedy recalls. It laid out “three or four license terms that he thought would not be reasonable and a general one that he thought would be OK.”

Serious negotiations began in September, with editors at Science running between the two camps. Editors worked out what they viewed as a balanced plan, requiring Celera to release data freely to academics but allowing the company to protect its database by requiring readers to obtain access at a company site and register as academic or commercial users. Nonprofit scientists would have free access, Celera said, but those with commercial connections would have to pay. Commercial users would also be bound by other intellectual property conditions.

Lander objected to these terms as “discriminatory” and “absolutely unacceptable,” says Kennedy. Lander declined to comment publicly, saying he wanted to see the final terms (which were being finalized at the time; see sidebar). Colleagues say he argued forcefully in November that authors of scientific papers must share data freely with all readers—not just with academics. Biotech scientists, several people argued, would find it impossible to accept Celera's terms and would be excluded from examining the results. Harold Varmus, former director of the National Institutes of Health (NIH) and now president of the Memorial Sloan-Kettering Cancer Center in New York City, is sympathetic to Lander's view. “This is a complicated world now,” Varmus says.
“It's not just people in industry who have commercial connections; many people in academia do.” Whitehead/MIT Genome Center scientists, for example, are involved in a 5-year consortium—funded by Affymetrix Inc. of Santa Clara, California; the Bristol-Myers Squibb Co. of Princeton, New Jersey; and Millennium Pharmaceuticals of Cambridge, Massachusetts—that aims to put genomic information on digital chips.

In October, Kennedy solicited advice from several other experts, who identified previous scientific papers in which readers were required to obtain supporting data from an independent Internet site. Some limited free access to nonprofit scientists. “This horse had already left the barn,” Kennedy concluded.

This interpretation prompted a new uproar in late October. Members of the public genome project mobilized opposition. Warnings poured in to Kennedy by e-mail from well-known biomedical researchers, including molecular biologist Marc Kirschner of Harvard University; Bruce Alberts, president of the National Academy of Sciences (NAS); and Varmus. Varmus's letter, dated 5 November, was co-signed by other heavyweights, including David Baltimore, president of the California Institute of Technology in Pasadena; J. Michael Bishop, chancellor of the University of California, San Francisco; Arthur Levinson, CEO of Genentech in South San Francisco; Edward Scolnick, president of Merck Research Labs in Rahway, New Jersey; Kenneth Shine, president of the Institute of Medicine in Washington, D.C.; and Maxine Singer, president of the Carnegie Institution in Washington, D.C. They wrote to “express our concern” that Science might allow authors of an unspecified paper to “restrict availability” of the raw data. Doing so, they argued, might “open the door to similar withholding of information by future authors, with unfortunate consequences. …” They urged Science to get more advice before taking this “unprecedented step.” Kennedy says he weighed the advice and criticism.
Science's editors consulted with an intellectual property expert at NIH and with Tom Cech, president of the Howard Hughes Medical Institute in Chevy Chase, Maryland, a nonprofit organization that had already agreed to subscribe to Celera's private database. In a conference call, Kennedy received encouragement from Harvard chemist George Whitesides, molecular biologist James Hudson of Research Genetics Inc. in Huntsville, Alabama, geneticist Nina Fedoroff of Penn State University, and half a dozen others. After proposing additional improvements in the terms of data release—including the use of materials transfer agreements that would let viewers have free access to the data but give Celera legal protection against data piracy—Kennedy decided that the terms were fundamentally acceptable.

At this point, bioinformatics leaders raised objections. On 6 December, a former member of Science's board of reviewing editors, geneticist Michael Ashburner of Cambridge University, distributed an open letter to these editors, urging them to quit and boycott Science. Another board member, cancer researcher Bert Vogelstein of Johns Hopkins University School of Medicine in Baltimore, Maryland, circulated a reply, saying he believed the final agreements “will meet the standard of public access to data that has been and continues to be Science's policy.”

The next day, leaders of the public genome project voted to end discussions with Science and submit their paper to Nature (Science, 15 December 2000, p. 2042). The decision to send the paper to Nature was not unanimous: Ari Patrinos, director of the U.S. Department of Energy's office that funds genome research, says, “It's no secret that I was advocating back-to-back publication in one journal, Science.” But British members of the consortium were outraged by the deal with Celera.
Lander adds: “We had to choose between two journals, and Science's policy [on data release] wasn't clear.” Although Nature's editors haven't ruled out the use of private databases, the public consortium decided, Lander says, that it was “an easy choice” to submit a paper to them.

Varmus says that he believes the letters, including his own, improved the terms of data access. He recognizes that Celera cannot give away information it has spent hundreds of millions of dollars to acquire. But he argues that publishers need to find new ways to make data from private ventures available, because we are “now in an era of heightened commercialism” in which a great deal of genome and protein structure data will be in private hands. Says Patrinos: “This issue is not going to go away.” Varmus hopes this episode will prompt a formal review—perhaps at the NAS—of “what publication really means.”

13. # Celera and Science Spell Out Data Access Provisions

1. Eliot Marshall

When J. Craig Venter announced in 1998 that his company, Celera Genomics of Rockville, Maryland, intended to sequence the human genome, he also promised that he would make the results freely available. This week, the promise is coming due. Science is publishing Celera's report, and Celera is publishing the underlying genomic sequence data on its own Web site (http://www.celera.com/).

According to terms negotiated between the company and Science, any reader will be able to view Celera's assembled genome at no cost through the Web site—or by obtaining computer disks from the company. Celera is also asking users to register and agree to specific conditions. At a press briefing last week, Venter described the conditions as they apply to several broad categories of readers:

First, nonprofit researchers who want to search the database or download batches of DNA sequence (up to 1 megabase per week) may do so by mouse-clicking their agreement to a form on the Celera site.
It requires that they not commercialize or distribute the data. However, they may use the information in research, in scientific articles, and in patents.

Second, academic users who want to download more than 1 megabase per week must submit a signed letter from an institution official agreeing to the terms above.

Third, scientists in industry or with commercial connections may use the data at no cost for the purpose of validating the results in the Science paper, after signing a materials transfer agreement promising not to use the data for commercial purposes.

Fourth, those who want to use the data for commercial purposes must first negotiate an agreement with the company.

14. # Bermuda Rules: Community Spirit, With Teeth

1. Eliot Marshall

The “Bermuda Rules” may sound like standards for lawn tennis, but in fact they are guidelines for releasing human sequence data. Established in February 1996 at a Bermuda meeting of heads of the biggest labs in the publicly funded genome project, the rules instruct competitors in this cutthroat field to give away the fruits of their research for free. “The whole raison d'être for the communal effort was to get useful tools into the hands of the scientific community as rapidly as possible,” says Francis Collins, director of the U.S. National Human Genome Research Institute in Bethesda, Maryland.

But the rules also offer another benefit: They discourage the patenting of genes by sequencing labs, an activity executives of big pharmaceutical companies seem to despise as much as some academics do. The insistence on quick, unconditional release of data also lies at the heart of the dispute between publicly funded genome scientists and the private company that has just produced a draft version of the human genome, Celera Genomics of Rockville, Maryland.
At the 1996 Bermuda gathering sponsored by the Wellcome Trust, a British charity that funds large-scale sequencing at the Sanger Centre in Hinxton, U.K., scientists agreed to two principles. First, they pledged to share the results of sequencing “as soon as possible,” releasing all stretches of DNA longer than 1000 units. Second, they pledged to submit these data within 24 hours to the public database known as GenBank. The goal, according to a memo issued at the time, was to “prevent … centers from establishing a privileged position in the exploitation and control of human sequence information.”

The Bermuda policy, which replaced a 1992 U.S. understanding that such data should be made public within 6 months, has had a significant impact on the field. For example, Collins claims, it has already enabled the identification of more than 30 disease genes.

Both Collins and Ari Patrinos, director of the U.S. Department of Energy's office that funds genome research, backed the Bermuda push for openness. “We felt it would strengthen international cooperation,” Patrinos says. “Scientists are by their very nature hoarders. They're chewing on the data all the time, and they never think they're ready” to let go, he adds. By adopting this formal mechanism, members of the consortium assured each other that no one would be squirreling away caches of data or quietly patenting genes. The policy also delivered a clear symbolic message, Patrinos says: “We all believe that the genome belongs to everybody.”

When sequencers met in Bermuda again in 1997, they reaffirmed their pledge and added an explicit directive against patenting newly discovered DNA. Failure to cooperate, U.S. officials made clear, could be a black mark in future grant reviews. Although the message seemed to challenge private DNA databases by undermining their claims to exclusivity, large pharmaceutical firms welcomed it, because they would benefit if there were fewer patent holders to buy off.
Alan Williamson, a former executive at Merck, the pharmaceutical giant in Whitehouse Station, New Jersey, embraced the policy enthusiastically. “Putting data out immediately was a good thing,” he says, because it encouraged the sharing of research tools without letting legal contracts get in the way. But he wishes sponsors of this research had taken active steps to make it difficult for others to patent and sell this genetic information—for example, by filing their own noncommercial patent claims that might block other claimants. Biomedical companies, he argues, should compete on the commercially difficult work—developing drugs—not on profiting from research tools such as DNA databases.

Indeed, Merck was so certain that this was the right approach that beginning in 1994, the company poured tens of millions of dollars into creating a nonprofit database of gene fragments known as expressed sequence tags (ESTs). The Merck Gene Index, as it is called, was designed to counter privately owned genetic databases and a surge in gene patenting led by such companies as Human Genome Sciences in Rockville, Maryland, and Incyte Pharmaceuticals in Palo Alto, California. These companies sell genetic information, patent uses for newly discovered genes, and seek to obtain royalties for the use of their patents—by big pharmaceutical firms and all other users. Merck also contributed to a free database of mouse ESTs, which are useful in identifying human disease genes.

In a similar defensive move, 10 companies joined with the Wellcome Trust in 1999 to create a nonprofit database of human genetic variations garnered from the genome, known as single-nucleotide polymorphisms (SNPs). SNP maps may be extremely valuable someday in identifying disease genes and standardizing gene-based medical therapy, and several companies had already begun to gather them in private collections.
Quarreling over the principles of the Bermuda Rules broke out again when Celera announced that it would sequence the entire human genome. Its business plan, according to president J. Craig Venter, is to collect and process genomic data more efficiently than research outfits can do for themselves. The company would appear to have no incentive to give information away, but Venter grabbed headlines in 1998 when he declared that he would finish a rough draft of the genome earlier than the publicly funded effort and give everyone free access to Celera's sequence. Ever since then, Venter and the advocates of the Bermuda Rules have been arguing about what “free access” means.

15. # Genomania Meets the Bottom Line

1. David Malakoff,
2. Robert F. Service

When a drug company announces that it will start testing a new compound in humans, the news typically draws cursory notice from investors and stock analysts. After all, only a small fraction of candidate drugs ever make it to the pharmacy and on to a company's bottom line. Last month, however, the financial savants took extra notice when Cambridge, Massachusetts-based Millennium Pharmaceuticals and European drug giant Bayer AG announced that they would soon put an anticancer drug into phase I clinical trials.

What caught their eye was not the drug's potential profits, but the process the firms used to find it—and its speed. Aided by new technologies that enable researchers to rapidly screen thousands of genes and their protein products for potentially useful properties, the companies sped from gene identification to product testing in just 8 months, shaving at least 2 years off the typically long and costly drug-discovery process. “This is a major milestone for the pharmaceutical industry,” crowed Bayer executive Wolfgang Hartwig.
Such expansive claims are not unusual in the biotechnology industry, which for more than a decade has hyped the profitmaking potential of sequencing human genes, only to see many of those claims founder in a sea of red ink. But the Millennium-Bayer announcement may be one sign that for-profit genomics—a loosely defined collection of commercial ventures that range from selling technologies, tools, and information to developing new drugs—is beginning to live up to its advance notices. “It's a wake-up call anytime you can punch years out of product development,” says Mark Edwards of Recombinant Capital, a biotech consulting firm in Walnut Creek, California.

Still, many financial analysts remain wary of the growing genomics industry. Although a record number of self-proclaimed gene firms went public last year, and a few established firms saw their stock prices temporarily skyrocket in anticipation of the completion of the human genome, longtime observers note that most genomics companies have yet to turn a profit (see table). There are exceptions: Some genomics toolmaking companies and information brokers have impressive—and rising—earnings. But the industry is still too young to show that it can produce what Wall Street is really looking for: blockbuster drugs. Even some high-profile players, such as information broker Celera Genomics of Rockville, Maryland, are still struggling to figure out how they will ultimately make money (see sidebar on p. 1203).

Such uncertainty is typical of an emerging industry, analysts say. And just because many genomics companies are showing losses in annual reports doesn't mean they are in danger of closing up shop. Indeed, some companies—such as Celera—have banked so much money from stock offerings that they could survive for years at current spending rates. In addition, Bayer and bigger pharmaceutical companies with deep pockets are pumping billions of dollars a year into a wide range of genomics companies.
These cash streams not only fuel research and product development but also give some companies “some ability to decide whether or not to show profits. Everything hinges on how much they choose to spend on R&D,” explains Alexander Hittle, a stock analyst with A.G. Edwards & Sons in St. Louis, Missouri.

## Toolmakers to trailblazers

Although the hundreds of companies involved in genomics are often hard to pigeonhole, and they can reshape themselves in a single board meeting, they are often placed in one of three major categories. At one end of the spectrum are the toolmakers, which sell the machines, chemicals, chips, and computer codes that make it possible to sequence raw DNA, characterize gene expression, and search for meaningful patterns in the data. Among these are Affymetrix of Santa Clara, California, which makes gene chips that give researchers the ability to screen the activity of scores of genes at a time, sequencing machine-maker Applied Biosystems of Foster City, California, and bioinformatics software developer Informax of Rockville, Maryland.

The toolmakers are among the first to show profits, in large part because—like the peddlers who sold shovels, food, and blankets to gold miners—they typically demand payment whether or not their customers ever strike it rich. Applied Biosystems, for instance, made a profit of $186 million last year, primarily on sales of sequencing machines and reagents. Affymetrix could be profitable within a year or so.

The second category is the service sector. Companies such as Incyte Genomics of Palo Alto, California, and Celera, for example, are making their names as gene discoverers and information brokers, selling up-to-date information on genes and their products to companies searching for drugs and diagnostic tests. Although Incyte may move into the black this year, profits in this sector are uncertain, because the demand for privately held information may shrink as public databases grow. Indeed, to hedge against that development, both companies are reformulating themselves, having applied for patents on genes that could involve them more directly in drug development and staking claims in the new field of proteomics (see below and sidebar, p. 1194).

The third category consists of the drug discoverers like Millennium and Human Genome Sciences (HGS) of Rockville, Maryland, both of which are helping other companies find drugs and diagnostics while trying to develop their own. HGS has focused on finding proteins that can be used as drugs, and Millennium has established itself as an ambitious technology pioneer, attempting to use concepts borrowed from the steel, computer, and other established industries to scale up and speed drug discovery. Under its 1998 deal with Bayer, for instance, Millennium promised to identify 225 new drug targets within 5 years, in exchange for up to $465 million in cash and the right to commercialize up to 90% of the discoveries. (Bayer, which has already received nearly 100 targets, decides which 10% it keeps.)

Such alliances, believes Edwards of Recombinant Capital, are the future of commercial genomics, especially as companies try to tackle diseases that involve a dozen or more genes. But profits in this business aren't likely to materialize for years. Millennium, for instance, expects to spend nearly $400 million on research this year, report losses of $125 million, and remain in the red for at least another 4 or 5 years.

## The proteomics generation

Toolmakers, information suppliers, and discovery companies are already looking beyond genomics to proteomics, the latest effort to demystify the functions of the proteins coded for by all those genes. Surveying genes is a good way of finding possible drug targets, the reasoning goes. But drug targets themselves are almost always proteins. And because proteins undergo significant changes after being built from their gene templates, researchers have recently set out to look for high-throughput methods to study them. Many of these methods—two-dimensional gel electrophoresis, mass spectrometry, and protein binding studies—have been around for decades.
But robotics and high-powered computers crunching massive amounts of data are making it possible to run these tests on a scale never seen before. “It's basically an old field being renewed because the technology has improved so much,” says Amos Bairoch, a proteomics expert at the Swiss Institute of Bioinformatics in Geneva.

Still, working with that technology remains more difficult than sequencing genes. Whereas gene sequencing basically requires a single technology, proteomics today consists of a collection of nearly two dozen different techniques for analyzing a protein's function, its amino acid makeup, its three-dimensional structure, and the other proteins to which it binds. One benefit for companies entering the field is that there's plenty of room. “There is enough to be done that people don't need to collide head on immediately,” says Bairoch.

Some proteomics groups may compete on the same turf anyway. Among the highest profile proteomics entrants are genomics powerhouses Celera and Incyte, both of which have made major moves into the field in the past year. In March, Celera raised nearly $1 billion on the stock market and announced that it was committing a sizable fraction to building a new proteomics research facility. In December, Incyte used money from its own recent stock offering to buy Proteome Inc., an early start-up in the field, to bolster its own burgeoning effort. Meanwhile in Europe, the Swiss start-up Geneva Proteomics is preparing a stock offering to raise money to set up a similar proteomics factory.

This proteomics gold rush suits the toolmakers just fine. Suppliers of well-proven proteomics technologies such as mass spectrometry, which can be used to identify different proteins, are already seeing their business jump. Meanwhile, companies like Ciphergen Biosystems of Fremont, California, which supplies protein-identification chips, are hoping to cash in as well. Still, these so-called “tool-kit” companies could face trouble down the road, says Craig West, another biotech analyst with A.G. Edwards & Sons. “Tool-kit firms are going to experience consolidation” as the proteomics field settles on a couple of key technologies as de facto standards, says West. And ultimately, West argues, the real money will flow to those who use the technology to find new blockbuster drugs. “It just doesn't seem to us that having the next cool way to find something out is viable for a long-term business model,” he says.

## What have you done for me lately?

Other analysts echo that sentiment in discussing the genome companies as a whole. Edwards, for instance, notes that as interesting as last month's Millennium-Bayer announcement was, the companies still have to show that they can move that speedily on a routine, sustained basis. Even then, some observers are skeptical that early agility will translate into substantially shorter drug development cycles, as major delays often occur during clinical trials and in the regulatory process. “We need a gene chip to speed up patients and the bureaucrats, not the science,” jokes one analyst.

Industry executives see other challenges. Some wonder who will train their next generation of employees, as many of the best and brightest academics and graduate students have been lured into the private sector by stock options and hefty salaries. Others fret about how to keep the talent they've hired—and sometimes made wealthy—happy. The challenge, one exec told analyst Hittle, “is to find ways of keeping the job interesting enough so that millionaires want to come to work every day.”

16. # Will a Smaller Genome Complicate the Patent Chase?

1. David Malakoff

When William Haseltine, president of Human Genome Sciences (HGS), spoke at industry seminars last year, he liked to impress his audiences with a striking statistic: His Rockville, Maryland-based company had applied for patents on a wide array of medical uses for about 7500 newly discovered human genes. Those filings, he noted, give the company an inside track on exploiting 5% of the 140,000 or so genes that he estimated are in the human genome. But it turns out that Haseltine, a man not known for understatement, may unwittingly have downplayed HGS's patent position. Now that researchers have had a chance to survey the entire genome, they believe it contains just 35,000 to 45,000 genes. That means HGS could have claims on up to 20% of the total.
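The shift in Haseltine's share of the genome follows directly from the arithmetic. As a minimal sketch (the gene counts are the figures quoted above; the `coverage` helper is purely illustrative, not anything HGS uses):

```python
# Back-of-the-envelope check of HGS's patent coverage under
# different assumed totals for the number of human genes.
# All figures come from the article; the helper is illustrative.

def coverage(filed: int, total_genes: int) -> float:
    """Fraction of the genome's genes covered by patent filings."""
    return filed / total_genes

filed = 7_500  # genes covered by HGS patent applications

# Haseltine's original estimate of ~140,000 genes gives about 5%.
print(f"{coverage(filed, 140_000):.1%}")  # 5.4%

# With the revised count of 35,000 to 45,000 genes, the same
# filings cover roughly 17-21% of the total.
print(f"{coverage(filed, 45_000):.1%}")   # 16.7%
print(f"{coverage(filed, 35_000):.1%}")   # 21.4%
```

The "up to 20%" figure in the text corresponds to the low end of the revised gene count.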

Although that would seem to put HGS in a powerful position, the shrinking gene count could be a mixed blessing for the company and others, from universities to governments, that have rushed to lay claim to gene uses. “A smaller genome may mean more people pursuing claims on the same real estate,” says Mark Edwards of Recombinant Capital, a biotech consulting firm in Walnut Creek, California. As a result, firms may spend millions of dollars over the next decade battling to convince patent examiners and judges that they were the first to invent uses for a particularly valuable swath of DNA.

In the long run, it will be quality—not quantity—that counts. Genes themselves cannot be patented, only the uses to which the information can be put. “The real question is: ‘How many of the genes represent legitimate targets for drug development?'” asks Stephen Bent, a patent attorney with Foley & Lardner in Washington, D.C. “No one yet knows, but finding [commercially valuable genes] is probably not appreciably easier if the total pool is 45,000 instead of 100,000. You are still searching for that needle in a haystack.” Indeed, Randy Scott, chair of Incyte Genomics in Palo Alto, California, which claims ownership of the most patents related to human genes, says that “the actual number of loci is an interesting academic issue, but it is not at all relevant to our business,” which focuses on selling genetic information and helping other companies develop new drugs.

Companies that have tried to lock up rights to huge numbers of genes in the hope of snaring a few with valuable uses could find that to be an expensive strategy. Patent experts estimate it costs $100,000 to $500,000 simply to maintain a single patent over its 10- to 20-year life-span in the United States and other industrialized nations. And actively preventing other companies from infringing is far more costly; in the United States, for instance, legal defenses typically cost $1.6 million per contested patent, according to statistics compiled by the U.S. Patent and Trademark Office (PTO). Gene patent fights, PTO officials say, are likely to be even more expensive because of their biological and legal complexity. To recover such costs, Bent notes, most companies will need to cash in on at least one “blockbuster” patent that leads to a strong-selling product.

Legal uncertainties over the patentability of uses of gene fragments also cloud the picture. A blockbuster gene may, for example, turn out to be covered by a patchwork of patents, with one firm winning the right to use the complete gene while others lock up related uses for fragments of the same sequence. Evolving patent rules in the United States and Europe are making patenting the uses of small fragments harder. But if the fragment patents came first, their owners could force whole-sequence patenters to cough up royalties, says Stephen Kunin, a PTO expert on gene patenting. The good news, Kunin notes, is that a smaller genome could speed the PTO's process of identifying and rejecting the thousands—if not tens of thousands—of redundant applications that have been filed on gene uses that are already spoken for. If fewer genes are indeed up for grabs, he adds, “a lot of people are going to discover that they lost the race to the patent office.”

17. # A History of the Human Genome Project

Science's News staff tells the history of the quest to sequence the human genome, from Watson and Crick's discovery of the double helical structure of DNA to today's publication of the draft sequence. A graphical, interactive version of this timeline, containing links to some classic Science articles and news coverage from the early genomics era, is also available on Science's Functional Genomics Web site.

1953

(April) James Watson and Francis Crick discover the double helical structure of DNA (Nature).

1972

(October) Paul Berg and co-workers create the first recombinant DNA molecule (PNAS).

1977

Allan Maxam and Walter Gilbert (pictured) at Harvard University and Frederick Sanger at the U.K. Medical Research Council (MRC) independently develop methods for sequencing DNA (PNAS, February; PNAS, December).

1980

(May) David Botstein of the Massachusetts Institute of Technology, Ronald Davis of Stanford University, and Mark Skolnick and Ray White of the University of Utah propose a method to map the entire human genome based on RFLPs (American Journal of Human Genetics).

1982

Akiyoshi Wada (pictured), now at RIKEN in Japan, proposes automated sequencing and gets support to build robots with help from Hitachi.

1984

(May) Charles Cantor and David Schwartz of Columbia University develop pulsed field electrophoresis (Cell).

(July) MRC scientists decipher the complete DNA sequence of the Epstein-Barr virus, 170 kb (Nature).

1985

(May) Robert Sinsheimer (pictured) hosts a meeting at the University of California (UC), Santa Cruz, to discuss the feasibility of sequencing the human genome.

(December) Kary Mullis and colleagues at Cetus Corp. develop PCR, a technique to replicate vast amounts of DNA (Science).

1986

(February) Sydney Brenner of MRC urges the European Union to undertake a concerted program to map and sequence the human genome; Brenner also starts a small genome initiative at MRC.

(March) The U.S. Department of Energy (DOE) hosts a meeting in Santa Fe, New Mexico, to discuss plans to sequence the human genome.

(March) Renato Dulbecco of the Salk Institute promotes sequencing the human genome in a paper (Science).

(June) Merits of a human genome project are hotly debated at a meeting at Cold Spring Harbor Laboratory in New York state, “The Molecular Biology of Homo sapiens.” (pictured)

(June) Leroy Hood (pictured) and Lloyd Smith of the California Institute of Technology (Caltech) and colleagues announce the first automated DNA sequencing machine (Nature).

(September) Charles DeLisi begins genome studies at DOE, reallocating $5.3 million from the fiscal year 1987 budget.

1987

(February) Walter Gilbert resigns from the U.S. National Research Council (NRC) genome panel and announces plans to start Genome Corp., with the goal of sequencing and copyrighting the human genome and selling data for profit.

(April) An advisory panel suggests that DOE should spend $1 billion on mapping and sequencing the human genome over the next 7 years, and that DOE should lead the U.S. effort. DOE's Human Genome Initiative begins.

(May) David Burke, Maynard Olson, and George Carle of Washington University in St. Louis develop YACs (left) for cloning, increasing insert size 10-fold (Science).

(October) Helen Donis-Keller and colleagues at Collaborative Research Inc. publish the “first” genetic map with 403 markers, sparking a fight over credit and priority (Cell).

(October) DuPont scientists develop a system for rapid DNA sequencing with fluorescent chain-terminating dideoxynucleotides (Science). Applied Biosystems Inc. puts the first automated sequencing machine, based on Hood's technology, on the market.

1988

(February) In a pivotal report, the NRC endorses the Human Genome Project (HGP), calling for a phased approach and a rapid scale-up to $200 million a year of new money.

(March) Prompted by advisers at a meeting in Reston, Virginia, James Wyngaarden, then director of the National Institutes of Health (NIH), decides that the agency should be a major player in the HGP, effectively seizing the lead from DOE.

(June) The first annual genome meeting is held at Cold Spring Harbor Laboratory.

(September) NIH establishes the Office of Human Genome Research and snags Watson (pictured) as its head. Watson declares that 3% of the genome budget should be devoted to studies of social and ethical issues.

(October) NIH and DOE sign a memorandum of understanding and agree to collaborate on the HGP.

1989

(January) Norton Zinder of Rockefeller University chairs the first program advisory committee meeting for the HGP.

(September) Olson, Hood, Botstein, and Cantor outline a new mapping strategy, using STSs (Science).

(September) DOE and NIH start a joint committee on the ethical, legal, and social implications of the HGP.

(October) NIH office is elevated to the National Center for Human Genome Research (NCHGR), with grant-awarding authority.

1990

Three groups develop capillary electrophoresis (left), one team led by Lloyd Smith (Nucleic Acids Research, August), the second by Barry Karger (Analytical Chemistry, January), and the third by Norman Dovichi (Journal of Chromatography, September).

(April) NIH and DOE publish a 5-year plan. Goals include a complete genetic map, a physical map with markers every 100 kb, and sequencing of an aggregate of 20 Mb of DNA in model organisms by 2005.

(August) NIH begins large-scale sequencing trials on four model organisms: Mycoplasma capricolum, Escherichia coli (left, pink), Caenorhabditis elegans (left, rainbow), and Saccharomyces cerevisiae (left, ovals). Each research group agrees to sequence 3 Mb at 75 cents a base within 3 years.

(October) NIH and DOE restart the clock, declaring 1 October the official beginning of the HGP.

(October) David Lipman, Eugene Myers, and colleagues at the National Center for Biotechnology Information (NCBI) publish the BLAST algorithm for aligning sequences (Journal of Molecular Biology).

1991

(June) NIH biologist J. Craig Venter announces a strategy to find expressed genes, using ESTs (Science). A fight erupts at a congressional hearing 1 month later, when Venter reveals that NIH is filing patent applications on thousands of these partial genes.

(October) The Japanese rice genome sequencing effort begins.

(December) Edward Uberbacher of Oak Ridge National Laboratory in Tennessee develops GRAIL, the first of many gene-finding programs (PNAS).

1992

(April) After a dispute with then-NIH director Bernadine Healy over patenting partial genes, Watson resigns as head of NCHGR.

(June) Venter leaves NIH to set up The Institute for Genomic Research (TIGR), a nonprofit in Rockville, Maryland. William Haseltine heads its sister company, Human Genome Sciences, to commercialize TIGR products.

(July) Britain's Wellcome Trust enters the HGP with $95 million.

(September) Mel Simon of Caltech and colleagues develop BACs for cloning (PNAS).

(October) U.S. and French teams complete the first physical maps of chromosomes: David Page of the Whitehead Institute and colleagues (pictured) map the Y chromosome (Science); Daniel Cohen of the Centre d'Etude du Polymorphisme Humain (CEPH) and Généthon and colleagues map chromosome 21 (Nature).

(December) After lengthy debate, NIH and DOE release guidelines on sharing data and resources, encouraging rapid sharing and enabling researchers to keep data private for 6 months.

U.S. and French teams complete genetic maps of mouse and human: mouse, average marker spacing 4.3 cM, Eric Lander and colleagues at Whitehead (Genetics, June); human, average marker spacing 5 cM, Jean Weissenbach and colleagues at CEPH (Nature, October).

1993

(April) Francis Collins of the University of Michigan is named director of NCHGR.

(October) NIH and DOE publish a revised plan for 1993–98. The goals include sequencing 80 Mb of DNA by the end of 1998 and completing the human genome by 2005.

(October) The Wellcome Trust and MRC open the Sanger Centre at Hinxton Hall, south of Cambridge, U.K. Led by John Sulston (pictured), the center becomes one of the major sequencing labs in the international consortium.

(October) The GenBank database officially moves from Los Alamos to NCBI, ending NIH's and DOE's tussle over control.

1994

(September) Jeffrey Murray of the University of Iowa, Cohen of Généthon, and colleagues publish a complete genetic linkage map of the human genome, with an average marker spacing of 0.7 cM (Science).

1995

(May to August) Richard Mathies and colleagues at UC Berkeley and Amersham develop improved sequencing dyes (PNAS, May); Michael Reeve and Carl Fuller at Amersham develop thermostable polymerase (Nature, August).

(July) Venter and Claire Fraser of TIGR and Hamilton Smith of Johns Hopkins publish the first sequence of a free-living organism, Haemophilus influenzae, 1.8 Mb (Science).

(September) The Japanese government funds several sequencing groups for a total of $15.9 million over 5 years: Tokai University, University of Tokyo, and Keio University.

(October) Patrick Brown of Stanford and colleagues publish first paper using a printed glass microarray of complementary DNA (cDNA) probes (Science).

(December) Researchers at Whitehead and Généthon (led by Lander and Thomas Hudson at Whitehead) publish a physical map of the human genome containing 15,000 markers (Science).

1996

(February) At a meeting in Bermuda funded by the Wellcome Trust, international HGP partners agree to release sequence data into public databases within 24 hours.

(April) NIH funds six groups to attempt large-scale sequencing of the human genome.

(April) Affymetrix makes DNA chips commercially available.

(September) DOE initiates six pilot projects, funded at $5 million total, to sequence the ends of BAC clones.

(October) An international consortium publicly releases the complete genome sequence of the yeast S. cerevisiae (Science).

(November) Yoshihide Hayashizaki's group at RIKEN completes the first set of full-length mouse cDNAs.

1997

(January) NCHGR is promoted to the National Human Genome Research Institute; DOE creates the Joint Genome Institute.

(September) Fred Blattner, Guy Plunkett, and University of Wisconsin, Madison, colleagues complete the DNA sequence of E. coli, 5 Mb (Science).

(September) Molecular Dynamics introduces the MegaBACE, a capillary sequencing machine.

1998

(January) NIH announces a new project to find SNPs.

(February) Representatives of Japan, the U.S., the E.U., China, and South Korea meet in Tsukuba, Japan, to establish guidelines for an international collaboration to sequence the rice genome.

(March) Phil Green (pictured) and Brent Ewing of Washington University and colleagues publish a program called phred for automatically interpreting sequencer data (Genome Research). Both phred and its sister program phrap (used for assembling sequences) had been in wide use since 1995.

(May) PE Biosystems Inc. introduces the PE Prism 3700 capillary sequencing machine.

(May) Venter announces a new company named Celera and declares that it will sequence the human genome within 3 years for $300 million.

(May) In response, the Wellcome Trust doubles its support for the HGP to $330 million, taking on responsibility for one-third of the sequencing.

(October) NIH and DOE throw the HGP into overdrive with a new goal of creating a “working draft” of the human genome by 2001, and they move the completion date for the finished draft from 2005 to 2003.

(December) Sulston of the Sanger Centre and Robert Waterston of Washington University and colleagues complete the genomic sequence of C. elegans (Science).

1999

(March) NIH again moves up the completion date for the rough draft, to spring 2000. Large-scale sequencing efforts are concentrated in centers at Whitehead, Washington University, Baylor, Sanger, and DOE's Joint Genome Institute.

(April) Ten companies and the Wellcome Trust launch the SNP consortium, with plans to publicly release data quarterly.

(September) NIH launches a project to sequence the mouse genome, devoting $130 million over 3 years.

(December) British, Japanese, and U.S. researchers complete the first sequence of a human chromosome, number 22 (Nature).

2000

(March) Celera and academic collaborators sequence the 180-Mb genome of the fruit fly Drosophila melanogaster (left), the largest genome yet sequenced and a validation of Venter's controversial whole-genome shotgun method (Science).

(March) Because of disagreement over a data-release policy, plans for HGP and Celera to collaborate disintegrate amid considerable sniping.

(May) HGP consortium led by German and Japanese researchers publishes the complete sequence of chromosome 21 (Nature).

(June) At a White House ceremony, HGP and Celera jointly announce working drafts of the human genome sequence, declare their feud at an end, and promise simultaneous publication.

(October) DOE and MRC launch a collaborative project to sequence the genome of the puffer fish, Fugu rubripes (left), by March 2001.

(December) An international consortium completes the sequencing of the first plant, Arabidopsis thaliana (left), 125 Mb.

(December) HGP and Celera's plans for joint publication in Science collapse; HGP sends its paper to Nature.

2001

(February) The HGP consortium publishes its working draft in Nature (15 February), and Celera publishes its draft in Science (16 February).

18. # In Their Own Words

## The Human Genome Project … In Their Own Words

From the outset, the proposal to map and sequence the human genome has sparked controversy and evoked strong emotions. The following quotes capture how the debate has shifted over the years.

## Early debates

“It endangers all of us, especially the young researchers.”

David Botstein, Science, 27 June 1986

“The idea is gathering momentum. I shiver at the thought.”

David Baltimore, Science, 27 June 1986

“The idea of trudging through the genome sequence by sequence does not command wide and enthusiastic support in the U.K.”

Sydney Brenner, Science, 8 August 1986

“The total human sequence is the grail of human genetics.”

Walter Gilbert, Science, 27 June 1986

“It is clearly no longer a question of whether the project ought to be done, but of how fast it will be done.”

Russell Doolittle, Science, 13 February 1987

“I'm surprised consenting adults have been caught in public talking about it [sequencing the genome]. … It makes no sense.”

Robert Weinberg, New Scientist, 5 March 1987

“The sequence of the human genome would be perhaps the most powerful tool ever developed to explore the mysteries of human development and disease.”

Leroy Hood, Issues in Science and Technology, Spring 1987

## Walter Gilbert declares he will copyright and sell DNA data

“The idea of the company is to be a service to the biotech and pharmaceutical industries and to the research community. … [The sequence data] would be made available to everyone—for a price.”

Walter Gilbert, Science, 24 July 1987

“This information is so important that it cannot be proprietary.”

C. Thomas Caskey, Science, 24 July 1987

“If a company behaves in what scientists believe is a socially responsible manner, they can't make a profit.”

Robert Cook-Deegan, Science, 24 July 1987

## The publication of the “first” genetic map

“What they have accomplished is important. … But it is not what we believe should be properly called a map. … We would never have dreamed of making such a publication with our data set, which is substantially larger than theirs, because we still have significant gaps.”

Ray White, Science, 6 November 1987

“A map is a map. Our map has holes, we make no bones about it. … It is not Ray White's ideal, but so what?”

Helen Donis-Keller, Science, 6 November 1987

“It's a real shame that the only two groups in the world who are doing this haven't communicated and shared probes.”

Leroy Hood, Science, 6 November 1987

## Support builds

“You can't be against getting this information; it is too fundamental.”

Charles Cantor, Science, 12 February 1988

“The argument against DOE is that while they talk about peer review, it is not clear that they do it. … [About NIH,] you can't have a lead agency that doesn't want to do it.”

Bruce Alberts, Science, 12 February 1988

## Patent skirmishes

“I am horrified.”

James Watson, Science, 11 October 1991, on NIH's plans to patent J. Craig Venter's partial genes

“There is no coherent government policy [on gene patents] and we need one quick—since the sequence is just pouring out. It would be a big mistake to leave this one to the lawyers.”

David Galas, Science, 11 October 1991

## Venter announces Celera

“It strikes me that this is a cream-skimming approach. It's clearly an attempt to short-circuit the hard problems and defer them to the [research] community at a very substantial cost.”

Robert Waterston, Science, 15 May 1998, p. 994

“I think it's great.”

David Cox, Science, 15 May 1998, p. 994

“Every time we talk, we move [the deadline] up.”

Robert Waterston, Science, 19 March 1999, p. 1832, on the new goal to produce a rough draft

“The scientific community thinks this is just a business project, and the business community thinks it's just a science project.”

J. Craig Venter, Science, 18 June 1999, p. 1906

“Why should I play by their rules when I am not getting a cent of federal money? Let me get this straight. I am being criticized for doing the work and giving it away free, but not giving it away fast enough?”

J. Craig Venter, interview with L. Roberts, 2 September 1999

## The draft nears completion

“The change is so fundamental it is hard for even scientists to grasp.”

Maynard Olson, interview with L. Roberts, 16 November 1999

“Ten, 15 years from now, nobody is going to care about all this fuss and bother. They're going to care that we got the … human sequence done. … And all this back and forthing over who did what and what strategy was used and which money was public and which was private is probably going to sink below the radar screen. And hallelujah.”

Francis Collins, interview with L. Roberts, 19 August 1999

“We've called the human genome the blueprint, the Holy Grail, all sorts of things. It's a parts list. If I gave you the parts list for the Boeing 777 and it has 100,000 parts, I don't think you could screw it together, and you certainly wouldn't understand why it flew.”

Eric Lander, Millennium Evening at the White House, 14 October 1999

“Free will will not go out of style once the sequence is done.”

Francis Collins, interview with L. Roberts, 11 November 1999

“The prevailing view is that the genome is going to revolutionize biology, but in some ways, it's overhyped. In the end, the real insights are coming from individuals studying one gene at a time in real depth.”

Gerald Rubin, interview with E. Pennisi, May 2000

“If there is anything worth doing twice, it's the human genome.”

David Haussler, interview with E. Pennisi, July 2000

“Biology will never be the same.”

John Sulston, interview with E. Pennisi, February 2000
19. # A Genome Glossary

View this table:
20. # Can Data Banks Tally Profits?

1. Robert F. Service

Celera's high-powered sequencing has achieved impressive results, but it hasn't yet translated into a healthy bottom line. Like most biotech start-ups, the nearly 3-year-old company has yet to turn a profit. And, although the company's stock rocketed after going public in May 1999, its price has tanked over the past year, along with that of other biotechs, from a high of $275 a share to about $50 today.

Although the red ink—$234 million so far—is not unusual, Celera Genomics of Rockville, Maryland, faces a long-term problem, according to analysts. Much of the raw data its sequencers have churned out is, or soon will be, freely available in public databases. All of which leads to the question: Just how is Celera going to turn a profit?

That's still a big unknown, says David Molowa, a biotech analyst with J.P. Morgan Chase in New York City: “Celera's business model continues to be in flux.” Celera officials originally suggested that the company would make its money by selling subscriptions to its genome databases, which now include genomes of the human, fly, and parts of the mouse, along with a catalog of more than 3.5 million single-nucleotide polymorphisms, spots where the “letters” of the DNA sequence differ among individuals. Celera's president, J. Craig Venter, also said early on that the company would patent about 300 genes linked to diseases and make money by licensing rights to pharmaceutical companies to speed the discovery of new drugs (Science, 15 May 1998, p. 994).

That plan is making headway. Celera signed up its first set of database subscribers in early 1999. Since then, the company has made about 30 deals with pharma companies, universities, and research institutes, says Paul Gilman, Celera's head of policy planning. The terms of specific deals remain private. But Gilman says pharmaceutical companies pay from $5 million to $15 million a year, whereas universities and nonprofit research outfits typically ante up $7,500 to $15,000 for each lab that is given access. In its 2000 annual report, Celera said it earned $43 million, primarily from subscription deals. And in a conference call with reporters last month, Celera Chief Financial Officer Dennis Winger suggested that the company could pull in twice that amount this year.

As for patents, Gilman will say only that the company has filed for “some” and that it expects the eventual number to remain in the 100 to 300 range.

But even if Celera manages to keep adding new customers, many analysts question how long its trove of data will retain its value if much the same information is available elsewhere for free. Celera intends to retain subscribers, says Gilman, by staying one step ahead of the academic competition. That means designing a simple computer interface to access the human genome data and integrate them with data from other genomes and information on the proteins the genes encode. That way, even if the raw data are available elsewhere, Celera will still have an edge, says Gilman.

In any case, Celera is already looking beyond sequencing to a new horizon: proteomics. Last year, the company raised about $1 billion in a stock offering for a major new research effort to understand the role of the proteins coded for by genes. Although this is intended in part to feed new information into the database business, Gilman says the efforts will likely lead to discoveries of drug targets or new drugs that Celera will attempt to commercialize either in collaboration with pharmaceutical and biotech companies or possibly on its own. That's a clear indication “that they want to get into the drug business in a limited way,” says Franklin Berger, also a biotech analyst with J.P. Morgan Chase. And that, he says, would make Celera look more like a genomics-based pharmaceutical company like Millennium Pharmaceuticals of Cambridge, Massachusetts, than simply a data provider. Gilman agrees to a point but insists that unlike straight drug-discovery ventures, Celera will still be “grounded” in the online business. Berger, Molowa, and other analysts applaud the shift toward drugs, as it promises to exploit whatever moneymaking opportunities arise from the genome. But it also moves Celera into another arena with plenty of competition.

21. # What's Next for the Genome Centers?

1. Elizabeth Pennisi
1. With reporting by Dennis Normile and Robert Koenig.

Like a company that has just performed the first opera in Wagner's four-part epic Ring Cycle, the international public sequencing consortium is now hoping it can stay together long enough to complete the opus—while some of its performers go off to stage less gargantuan works. It is an impressive company, fully capable of putting on multiple productions. Collectively, the entire cast can churn out close to 2000 bases every second—that's 7.2 million per hour, 172 million per day, every day, 365 days a year—counting all the sequencing machines in all the labs responsible for the human genome effort.

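The quoted throughput can be sanity-checked with back-of-the-envelope arithmetic. In this sketch, only the 2,000 bases-per-second rate and the roughly 2.9-billion-base genome size come from the article; everything else is simple unit conversion:

```python
# Consortium-wide sequencing rate quoted in the article.
bases_per_second = 2_000

per_hour = bases_per_second * 3_600    # seconds in an hour
per_day = bases_per_second * 86_400    # seconds in a day
per_year = per_day * 365

human_genome = 2.9e9                   # approximate bases in the human genome

print(f"{per_hour:,} bases/hour, {per_day:,} bases/day")
print(f"~{per_year / human_genome:.0f} one-fold human-genome equivalents of raw sequence per year")
```

At the 8- to 10-fold redundancy a draft requires, those roughly 22 raw genome-equivalents a year work out to about the "two big mammalian genomes a year" the article cites.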
“What was a month's work a year ago is a week's work now,” says Richard Gibbs, director of the sequencing center at Baylor College of Medicine in Houston. That capacity could translate into a first pass at sequencing about two big mammalian genomes a year, if all players turned their machines on the same species.

Now that the curtain has closed on the first production, the next order of business is to finish the entire work. Over the next 2 years, most of the G16—the 16 principal players involved in generating the bulk of the public's working draft of the human genome—will finish sequencing their allotted pieces, while their computer-savvy colleagues process and polish that sequence into gleaming, “final” form. Yet already, several of the G16 have begun to move on, sometimes scaling back their involvement with human DNA and often leaving high-volume genomic sequencing to labs with more sequencing machines and bigger budgets. For many, the goal now is to develop the tools to make sense of all this sequence data and to move into the study of gene function.

Meanwhile, mammoth operations such as the Whitehead/MIT Genome Center in Cambridge, Massachusetts, and the Sanger Centre in Hinxton, U.K., continue to grow, adding more machines to tackle more and more large genomes and expanding the global sequencing capacity to levels unthinkable 5 years ago. The expectation is that each new genome will help bring the human genome into clearer focus. As a result, “the appetite for sequencing continues to go up,” notes Francis Collins, director of the National Human Genome Research Institute (NHGRI), which funded much of the U.S. human genome effort.

## Job one: The polished draft

As pressure mounted over the past year or so to complete a working draft of the genome, debate intensified about how best to do the next step: finishing the job. Finishing is the “dotting the i's” stage of sequencing, the process that kicks in once each section of DNA is sequenced 8 to 10 times over.

That “8×” or “10×” coverage should be more than enough to identify each base with confidence. During finishing, the many stretches of sequence are put in order until they make up one long, virtually continuous series of A's, T's, G's, and C's representing a chromosome from one end, or telomere, to the hard-to-sequence centromere in the middle. A second stretch of sequence covers the rest of the chromosome, starting from the other side of the centromere and ending at the other telomere. Barring a few impossible-to-determine gaps, chromosomes 21 and 22 are now at that stage in the public draft.

As the adage goes, the last 5% is often as hard as the first 95%. Finishing is so labor-intensive that it's unclear how well the economies of scale realized by the bigger, highly automated sequencing centers would apply to this job. So, while the megacenters are shouldering ever more of the world's sequencing and will do the lion's share of finishing, the public consortium has decided to spread that polishing around. Two smaller groups have made finishing their primary activity. Maynard Olson's team at the University of Washington, Seattle, for example, is trying to automate this phase. Richard Myers's group at the Stanford Human Genome Center has essentially become a finishing arm for the Department of Energy (DOE), taking on chromosomes 5, 19, and part of 16.

Even with these specialists, though, “we really are relying on everybody kicking in and doing this,” says Myers. Most groups are completing the sections they started. Like many others, he worries that as the end nears, more and more groups will spin off into other genomic ventures, leaving the human genome incomplete. “It needs to be finished,” warns Allan Bradley, director of the Sanger Centre. Even though the overall draft might seem complete, he says, most gene hunters will find just a piece of what they are looking for in GenBank and other databases.

The genome centers are “better prepared” than individuals to fill in the blanks, he adds. To keep everyone focused on the goal, the consortium is trying to maintain its tight-knit collaboration, with regular conference calls and frequent e-mails among the partners. The finishing experts met at the recent Marco Island genome meeting in February, and in May, the group will huddle at the annual genome meeting at Cold Spring Harbor Laboratory in New York. Collins expects the full shotgun phase of the sequencing to be done possibly as soon as July. As for dotting all the i's, “we'd like to have it all finished to the same standard as chromosomes 21 and 22 by April 25, 2003,” the 50th anniversary of the publication of James Watson and Francis Crick's seminal paper on the double helical structure of DNA.

## Calling more genomes

Making a finished genome meaningful requires “annotation,” in which genes and other features of the genomic landscape are located and described (see p. 1177). Because deciphering the genomes of other species should speed and improve annotation, the big players in the human genome effort have taken on new genomes as well. At the top of their list: the lab mouse, which Celera Genomics, the Rockville, Maryland, company that sequenced the human genome separately from the public effort, has already sequenced to some degree.

As for the public mouse effort, “it's going great guns,” Collins says. After NHGRI awarded 10 mouse sequencing and mapping grants 15 months ago, the groups involved were jockeying over how best to tackle the task and how to divide up the work, as it was rapidly becoming possible for a single center to take on the entire genome on its own. But now, an infusion of more than $30 million in October 2000 from several companies and Britain's Wellcome Trust has turbo-charged the work.

With the new money, the Washington University, Whitehead, and Sanger centers bought more high-powered capillary sequencers; they expect to generate a working draft of the mouse genome, covering 95% of the 3 billion bases 2.5 to 3 times over, by March.
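The link between redundant coverage and the fraction of bases captured — roughly 95% of bases at 2.5- to 3-fold coverage for the mouse draft, versus the 8- to 10-fold used for finishing-grade human sequence — can be illustrated with the idealized Lander-Waterman model, which assumes reads land uniformly at random across the genome (a simplification that ignores cloning and repeat biases):

```python
import math

def fraction_covered(c: float) -> float:
    """Lander-Waterman estimate of the fraction of a genome covered by
    at least one read, given c-fold average (redundant) coverage."""
    return 1.0 - math.exp(-c)

# Coverage levels mentioned in the article: draft (2.5x-3x) and finishing (8x-10x).
for c in (2.5, 3.0, 8.0, 10.0):
    print(f"{c:>4}x coverage -> {fraction_covered(c):.2%} of bases seen at least once")
```

Under this model, 3-fold coverage captures about 95% of bases, matching the mouse draft target, while at 8- to 10-fold the unseen fraction drops below 0.05% — which is why finishing, not raw coverage, becomes the bottleneck.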

In addition to that draft, several smaller centers, such as the Institute of Molecular Biotechnology in Jena, Germany, and the University of Oklahoma, Norman, are producing finished data for biologically important regions of the mouse genome. “The mouse data will be gloriously useful,” Collins says. “Ninety-five percent of the human [genes] can be found if we have the mouse.”

A close second for sequencing is the rat genome, already under way at a modest level with NHGRI funds but about to receive a big boost from the U.S. National Heart, Lung, and Blood Institute. The rat is a key model for many diseases, because it's slightly larger than the mouse and its physiology is more amenable to study. Sequencing is under way at Baylor College of Medicine and at Genome Therapeutics Corp. in Waltham, Massachusetts.

The list continues. China has its eyes on the pig (Science, 3 November 2000, p. 913). DOE's Joint Genome Institute (JGI) plans to start on a sea squirt in March, after polishing off a rough draft of the Japanese puffer fish genome, which has about the same complement of genes but much less junk DNA. The whole genome is only 400 million bases, compared to the human and mouse at 3 billion apiece. Several other groups are dogging the genomes of the freshwater puffer fish, and the Max Planck Institute for Molecular Genetics in Berlin has even sequenced a small fraction of the chimpanzee genome in collaboration with RIKEN's Genomic Sciences Center in Japan. “There are many other nominees raising their hands saying, ‘Sequence me, please,'” says Collins, who notes that the National Institutes of Health will host a workshop this spring to sort out which to do first. The lobbying is already intense.

These sequencing efforts are the first steps toward what many biologists see as the best way to learn about the human genome: through comparative genomics. Each new genome helps biologists develop a clearer picture of what's important in DNA, as key regions, such as protein-coding exons or binding sites for transcription factors, are conserved to varying degrees among species. Genomicist Eric Green of NHGRI has picked a few key regions, including those containing the cystic fibrosis gene and the gene for Williams syndrome, to sequence in dog, cat, horse, cow, baboon, and five other vertebrates. Eric Lander's group at Whitehead is sequencing several fungi of known evolutionary distance from bakers' yeast to find out how evolutionary distance affects the comparisons. It turns out two closely related species can be too similar to reveal certain key conserved features.

In all likelihood, the big centers will tackle large genomes on their own in a few years. And that prospect is exciting not only to geneticists and biomedical researchers but also to a broad range of biologists. For instance, evolutionary and developmental biologists would like to decipher the genomes of some 100 species distributed across the evolutionary tree; they have already asked the National Science Foundation, which supported the first sequencing of a plant, for planning support. It won't be long, says Oklahoma's Bruce Roe, before “we have the technology to answer very broad-based questions on how organisms evolved.”

## 21st century biology

As valuable as these genome sequences will be, the sequence by itself doesn't tell researchers what genes do—and that's rapidly becoming the focus of a number of centers, both large and small. National genome budgets are beginning to reflect this emphasis. Already, about 90% of Germany's human genome project budget goes toward functional genomics, says Jörg Wadzack, a molecular biologist for the German Human Genome Project. NHGRI is evaluating proposals for centers of excellence that will also push the U.S. human genome effort in a functional direction. Stanford, for example, is one of several centers helping to build the Mammalian Gene Collection, a set of 25,000 full-length complementary DNAs (cDNAs) for mouse and human. (cDNA includes all the coding regions of a gene.) Already, the Japanese have collected 20,000 full-length mouse cDNAs that are part of a mouse gene encyclopedia (Science, 9 February, p. 963). These cDNAs will “help annotate the genome and find genes,” says Stanford's Myers. Adds Collins: “This [collection] will be one of the durable goods of the genome project.”

At the Sanger Centre as well, “the big emphasis is going to be understanding how genes work,” says bioinformaticist Richard Durbin. “We see ourselves expanding our biological programs.” The Whitehead Institute is already well along that road: Since 1997 it has worked with Bristol-Myers Squibb Co., Affymetrix Inc., and Millennium Pharmaceuticals Inc. to develop microarrays for monitoring gene expression. Lander expects to increase his group's emphasis on the genetics of disease traits. “We got into the genome project 15 years ago because our interest was in studying complex traits,” he says. “The genome was part of the necessary infrastructure [for] studying those traits.”

While these researchers are expanding into functional and comparative genomics, they are also venturing into new territory. “We are sitting on a [sequencing] capacity that can really change our thinking,” says Trevor Hawkins, director of JGI. Hawkins thinks it's now practical to sequence, say, the same coding region from 100 people to begin to understand the effect of variation on a particular disease or trait. Sanger's Bradley agrees: “A lot of sequencing capacity will be used for resequencing, looking at sequence variation, and looking for disease genes,” he predicts.

As Collins, his predecessor James Watson, and others predicted at the outset, the sequence of the human genome is turning out to be a tool to enable an astounding new array of biological studies. And Sanger's Durbin agrees: “It's going to provide a kickoff for a whole lot of interesting science for a very broad set of scientists.” Far from signaling the end of the genomic opera, this week's publication is merely the close of the first act.

22. # Hunting for Collaborators of Killer Toxins

1. Jocelyn Kaiser

Many gene hunters track sequences that inevitably lead to disease. Environmental health researchers seek a different quarry: variations in genes that by themselves might be harmless, but, when a person is exposed to environmental toxins, can amplify that person's risk of illness. Studies of these genes, which often code for enzymes that metabolize toxins or repair DNA damage from carcinogens, could lead to a better understanding of how the genes make people vulnerable and which individuals are at risk.

The growing list of diseases linked to these environmental susceptibility genes includes asthma, diabetes, lead poisoning, and a lung disease caused by the metal beryllium. Many disorders involve more than one gene: Variants of two genes, for example, boost the risk of bladder cancer 10-fold in people who smoke, according to recent studies at the National Institute of Environmental Health Sciences in Research Triangle Park, North Carolina.

To help find these genes and explore how they operate, toxicologists are using DNA arrays—glass chips dotted with gene sequences (Science, 28 July 2000, p. 536). For instance, Leona Samson's team at Harvard University has found that DNA-damaging chemicals known as alkylating agents turn on or off at least 400 genes in yeast cells, including “totally unexpected” genes involved in novel repair pathways, says Samson. The complete sequence of the human genome will enable scientists to do such toxicogenomic studies with cells from various human organs, notes Michael Gallo of the University of Medicine and Dentistry of New Jersey-Robert Wood Johnson Medical School in Piscataway. “That's the real beauty of the genome. We can now start to dissect toxic responses at the molecular level,” says Gallo.

23. # Unsung Heroes

Mike Hunkapiller and his team at Applied Biosystems Inc. put the first automated sequencing machine on the market in the mid-1980s. In the late 1990s, Hunkapiller's group at PE Biosystems developed the lightning-speed PE Prism 3700 machine, which was used for all of Celera's sequencing and much of the public project's.

Lauren Linton, a former biotech manager, swept into a sluggish Whitehead/MIT Genome Center in 1999 promising to boost productivity 10-fold. Instead, she boosted it 20-fold, making Whitehead the top sequencer in the public consortium. Linton has since left to start her own company.

Phil Green, a mathematician and software designer, wrote the phred and phrap programs at Washington University in St. Louis, Missouri. These became essential tools for evaluating the quality of raw DNA sequence and linking up assemblies. He's now at the University of Washington, Seattle, creating new programs.

Although they were slow to win acceptance, the bacterial artificial chromosomes (BACs) created by geneticist Simon of the California Institute of Technology in Pasadena soon became the “currency of the genome,” as he says. These clones' large capacity and stability make them highly efficient. Using BACs, Caltech's de Jong created massive “libraries” of DNA from various human tissues for sequencing.

Ever since he teamed up with J. Craig Venter at the National Institutes of Health (NIH) in 1990, Adams has been one of the country's top sequencing gurus. After developing expressed sequence tags with Venter at NIH, Adams followed him to The Institute for Genomic Research (TIGR) in Rockville, Maryland, and then to Celera, also in Rockville, where he is refining methods for whole-genome shotgun sequencing.

Jim Kent, a bioinformatics graduate student at the University of California, Santa Cruz, wrote a program in just 4 weeks that pieced together the rough draft of the human genome for the public consortium—producing an assembly called the “golden path.”

A U.S. Department of Energy (DOE) physicist, Branscomb got swept up in the genome program and became a bioinformaticist overnight, helping with genome mapping and later nudging DOE sequencing into high gear as director of the Joint Genome Institute in Walnut Creek, California.

An ocean apart, Dovichi at the University of Alberta in Canada and Kambara at the Hitachi Co. in Tokyo independently hit upon a sequencing technology that greatly advanced the human genome project. The method, used in today's high-speed machines, uses laser beams to scan DNA being pumped through numerous capillary tubes, simultaneously identifying the bases by color-coded chemical tags.

Bioinformaticist Li came to Celera from the publicly funded Genome Data Base organization at the Johns Hopkins School of Medicine in Baltimore to lead the chromosome team with Mural, a co-author of gene-finding software called GRAIL. Their team validates DNA assemblies and locates them on chromosomes.

After developing sequencing technology with Fred Sanger and producing physical maps of Caenorhabditis elegans, Coulson headed up the sequencing effort with John Sulston at the Sanger Centre in Hinxton, U.K. As Sanger scaled up to tackle the human genome, Coulson “quietly rolled up his sleeves,” says Sulston, to run the team that produced and selected clones for sequencing.