Special Reviews

Genome Sequence of the Nematode C. elegans: A Platform for Investigating Biology

Science  11 Dec 1998:
Vol. 282, Issue 5396, pp. 2012-2018
DOI: 10.1126/science.282.5396.2012


The 97-megabase genomic sequence of the nematode Caenorhabditis elegans reveals over 19,000 genes. More than 40 percent of the predicted protein products find significant matches in other organisms. There is a variety of repeated sequences, both local and dispersed. The distinctive distribution of some repeats and highly conserved genes provides evidence for a regional organization of the chromosomes.

The genome sequence of C. elegans is essentially complete. The sequence follows those of viruses, several bacteria, and a yeast (1, 2) and is the first from a multicellular organism. Some small gaps remain to be closed, but this will be a prolonged process without much biological return. It therefore now makes sense to review the project as a whole.

Here, we describe the origins of the project, the reasons for undertaking it, and the methods that have been used, and we provide a brief overview of the analytical findings. The project began with the development of a clone-based physical map (3, 4) to facilitate the molecular analysis of genes, which were being discovered at an ever increasing pace through the study of mutants. This, in turn, initiated a collaboration between the C. elegans Sequencing Consortium and the entire community of C. elegans researchers (5). The resulting free exchange of data and the immediate release of map information (and later sequence) have been hallmarks of the project. The resultant cross correlation between physical and genetic maps is ongoing and is essential for achieving an increasing utility of the sequence.

Along with the genome sequencing project, expressed sequence tag (EST) sequencing has been carried out. Early surveys of expressed sequences were conducted (6), but complementary DNA (cDNA) analysis has been carried out primarily by Y. Kohara (7). This group has contributed 67,815 ESTs from 40,379 clones, representing an estimated 7432 genes. This extensive information has been invaluable in identifying and annotating genes in the genomic sequence. Others also contributed the 15-kilobase (kb) mitochondrial genome sequence (8).


The preexisting physical map, on which sequencing was based, had been initiated by the isolation and assembly of random cosmid clones (with a 40-kb insert, which was the largest insert cloning system available at the time) with a fingerprinting method (3). At a sixfold redundant coverage of the genome in cosmids, nonrandom gaps persisted. In most cases, hybridization screening of cosmid libraries failed to yield bridging clones, but the newly developed yeast artificial chromosome (YAC) clones (9) rapidly closed most of the cosmid gaps. Incidentally, the YAC clones also covered almost all of the genome, providing a convenient tool for the rapid scanning of the entire genome by hybridization (4). About 20% of the genome is represented only in YACs.

By 1989, it became apparent that, with the physical map in hand, complete sequencing of the genome might be both feasible and desirable. Joint funding [from the National Institutes of Health and the UK Medical Research Council (MRC)] for a pilot study was arranged, and in 1990, the first 3-megabase (Mb) sequence was undertaken. Success in this venture (10, 11) resulted in full funding and the expansion of the two groups of the consortium in 1993.

Sequencing began in the centers of the chromosomes, where cosmid coverage and the density of genetic markers are high. Cosmids were selected by fingerprint analysis to achieve a tiling path of overlapping clones (in practice, 25% overlap on average). Some sequencing of YACs was explored (12), but because of yeast DNA that contaminated preparations of YAC DNA, this approach was deferred in anticipation of the complete sequence of yeast, which enabled contaminating reads to be easily identified. The sequencing process (13) can be divided into two major parts: the shotgun phase, which is sequence acquisition from random subclones, and the finishing phase, which is directed sequence acquisition to close any remaining gaps and to resolve ambiguities and low-quality areas. Numerous and ongoing improvements to the shotgun phase have increased sequencing efficiency, improved data quality, and lowered costs. Similarly, finishing tools have improved dramatically. Nonetheless, finishing still requires substantial manual intervention, with a variety of specialized techniques (14, 15).
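The idea of a tiling path can be illustrated with a small sketch. It assumes that each clone has already been placed at approximate left and right map coordinates, which is a simplification: the project selected cosmids from fingerprint overlaps, not from coordinates. The function name `greedy_tiling_path` and the toy clone names are hypothetical, and the greedy rule (always extend with the overlapping clone that reaches furthest) is a standard interval-cover technique swapped in for illustration, not the consortium's procedure.

```python
from typing import List, Tuple

Clone = Tuple[str, int, int]  # (name, left, right) in arbitrary map units


def greedy_tiling_path(clones: List[Clone], start: int, end: int) -> List[str]:
    """Return names of a minimal set of overlapping clones covering [start, end].

    Greedy interval cover: among clones that begin at or before the currently
    covered position, pick the one that extends furthest to the right.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path: List[str] = []
    covered = start
    i = 0
    while covered < end:
        best = None
        # Consider every clone that overlaps (or abuts) the covered region.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best[0])
        covered = best[2]
    return path


# Toy data (hypothetical clones, coordinates in kb).
toy_clones = [("A", 0, 40), ("B", 10, 45), ("C", 30, 70), ("D", 35, 75), ("E", 60, 100)]
toy_path = greedy_tiling_path(toy_clones, 0, 100)
print(toy_path)
```

A gap in coverage raises an error in this sketch; in the project, such gaps were the regions later bridged by fosmids, long-range PCR, or YACs.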

Restriction digests with several enzymes were performed on most cosmids and provided valuable checks on sequence assembly. Where assembly was ambiguous because of repeats, the digests were helpful in resolving the problem. At the start of the project, polymerase chain reaction (PCR) checks were conducted along the length of the sequence to confirm that the assembled sequence of the bacterial clone was an accurate representation of the genome. These checks were abandoned after it became clear that failures in PCR were more common than discrepancies between the clone and the genome.

When available cosmids were exhausted, we screened fosmids (which are similar to cosmids but are maintained at a single copy per cell and thus are potentially more stable) (16) and found that a third of the gaps were bridged in the central regions of the chromosomes but very few were bridged in the outer regions. We also used long-range PCR (17) to recover some of the central gaps. The remainder of the central gaps and all of the gaps in the outer regions were recovered by sequencing YACs. As for the cosmids, a tiling path of YACs was chosen, and DNA from selected clones was isolated by pulsed-field gel electrophoresis (18). Sequencing was performed as for cosmids, with suitable adaptations for the smaller amount of DNA that was available for making libraries. Restriction digests were carried out for assembly checks, but they were not as precisely interpretable as those for bacterial clones. At this stage, the physical map was consolidated and sometimes rearranged as the YAC sequences confirmed or rejected the links made previously by hybridization. The comparison of the assembled YAC sequences with the often extensively overlapping cosmid sequences showed few discrepancies between the two sequences. Generally, further investigation revealed that most discrepancies resulted from a rearrangement in the cosmid. It is interesting (and crucial to the success of the YAC sequencing) that nearly all regions of the YACs can be cloned in bacteria as short fragments, although cosmid and fosmid libraries failed to represent these regions.

The key step in closing sequence assemblies was to obtain subclones that bridged the gaps remaining after the shotgun phase. Often, gaps are spanned by the subclones used in the shotgun phase, because the insert length is deliberately set at two to four times the typical sequence read length. The introduction of plasmid clones halfway through the program greatly improved the coverage of inverted repeats and other unusual structures. In cases where the shotgun phase failed to yield a spanning subclone, plasmid clones that bridged gaps were obtained by isolating and subcloning restriction fragments from cosmids. In YACs, because of their greater size and complexity, screening by hybridization was necessary to recover the desired subclone. In the most difficult cases, we have exploited very short insert plasmid libraries to find gap-bridging clones. PCR was used occasionally, but because of its tendency to yield artifacts in repeat regions, it has recently been used as little as possible. Once isolated, the gap-bridging clone was either sequenced directly or, in cases of a difficult secondary structure, a short insert library (SIL) was made by breaking the insert of the gap-bridging clone into smaller fragments (0.5 kb or even smaller in difficult cases), with break points interrupting the secondary structure (15). In some cases, transposon insertion has been used (19), although SILs are generally preferred as a first pass because of their ease of throughput.

The 97-Mb sequence is a composite of 2527 cosmids, 257 YACs, 113 fosmids, and 44 PCR products (20, 21). For the 12 chromosome ends, nine of the telomere plasmid clones provided by Wicky et al. have been linked to the outermost YACs (22), either directly by sequence or by long-range PCR and sequencing, where no direct sequence link was found. This probably represents >99% of the genome sequence, on the basis of the representation in the genomic sequence of available EST data and of the sequence from random clones from a whole-genome library.

Much of the remaining DNA likely resides in the three residual gaps between the telomeres and the outermost sequenced YACs and in two internal gaps, where no spanning YAC clone has been identified. One of these is known to be <450 kb, on the basis of Southern (DNA) analysis, but a reliable size estimate is not available for the other gaps. A smaller amount will be recovered from four smaller segments (which are spanned by YACs), where shotgun sequencing has not been completed. Furthermore, very small segments (likely to be <1 kb each) have not been recovered in subclones for 139 segments. Finally, some sequence is likely to be missing from the large tandem repeats, which, in extreme cases, consist of tens of kilobases that are composed of hundreds of copies of a short sequence. Although most have been sized by restriction digestion of the cloned DNA, some segments in the larger YACs are of unknown size. Having established the repeat elements, we cannot usefully work further on them at this stage, because they are likely to be variable and because they do not clone stably; any repeat elements that prove to be important will become the subject of population studies in the future.

As shown by the resolution of discrepancies resulting from matches with sequence data from other sources, the error rate of almost all the product is <10^-4. In a few regions (predominantly in regions of extensive tandem repeats), the sequence is tagged to indicate that a lower standard of accuracy has been accepted. Accuracy is maintained by a set of criteria (23), which is followed by the finisher and by a final checking step that requires specialized software (24) and a visual inspection. None of this, however, overcomes errors in the cloning process. A comparison of different clones in overlapping regions and the resolution of discrepancies have indicated a finite error rate associated with cloning. For example, cosmid B0393 (GenBank accession number Z37983) contains a deletion of a large hairpin that was only detected because it overlapped cosmid F17C8 (GenBank accession number Z35719); similarly, we detected a 400–base pair region that had been deleted in all M13 and PCR reads from cosmid F59D12 (GenBank accession number Z81558). The F59D12 deletion was detected by restriction digestion and was recovered in plasmids. However, these instances are rare enough that undetected errors are likely to be few; thus, the advantages of the clone-based sequence, in avoiding long-range confusion in assembly, more than make up for its occasional defects.

Sequence Content

Whereas the sequencing has essentially been completed, analysis and annotation will continue for many years, as more information and better sequence annotation tools become available.

To begin the task, we subjected each completed segment to a series of automatic analyses to reveal possible protein (25) and transfer RNA (tRNA) genes (26), similarities to ESTs and other proteins (27–30), repeat families, and local repeats (31). The results were entered in the genome database “a C. elegans database” (ACEDB) (32), which merges overlapping sequences to provide seamless views across clone boundaries and allows the periodic and automatic updating of entries. To integrate and reconcile the various views of the sequence, we reviewed all data interactively through the ACEDB annotator's graphical workbench (32). In particular, the GENEFINDER (25) predictions are confirmed or adjusted to account for protein, cDNA, and EST matches, repeats, and so forth, and annotation concerning putative gene function is added.

The interruption of the coding sequence by introns, the generation of alternatively spliced forms, and the relatively low gene density make accurate gene prediction more challenging in multicellular organisms than in microbial genomes. The problem is made more complex in C. elegans by trans-splicing and by the organization of as many as 25% of the genes into operons (33). We have used GENEFINDER to identify putative coding regions and to provide an initial overview of gene structure. To quantitate the accuracy of gene identification, we compared introns that were confirmed by ESTs and cDNAs to those that were predicted by GENEFINDER. We found that 92% of the predicted introns had an exact match to the experimentally confirmed ones and that 97% had an overlap. Identification of the start and stop of genes is more difficult, and errors in this process sometimes result in the merging of some neighboring genes and in the splitting of others. To refine the computer-generated gene structure predictions, expert annotators use any available EST and protein similarities, as well as genomic sequence data from the related nematode C. briggsae. This information can be especially important in establishing gene boundaries. About 40% of the predicted genes have a confirming EST match, but because ESTs are partial, they presently confirm only ∼15% of the total coding sequence. In a number of cases, ESTs have provided direct evidence of alternative splicing; these instances have been annotated in the sequence (34).
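The two intron statistics quoted above (exact match and overlap) can be computed along the following lines. This is a minimal sketch, not the analysis that was actually run: it assumes introns are represented as (start, end) coordinate pairs on one strand of one sequence, and the function name `intron_agreement` and the toy coordinates are hypothetical.

```python
from typing import List, Tuple

Intron = Tuple[int, int]  # (start, end), inclusive coordinates on one strand


def intron_agreement(predicted: List[Intron], confirmed: List[Intron]) -> Tuple[float, float]:
    """Return (exact_fraction, overlap_fraction) for the predicted introns.

    exact:   both boundaries coincide with a confirmed intron
    overlap: the predicted intron shares at least one base with a confirmed one
    """
    if not predicted:
        raise ValueError("no predicted introns to evaluate")
    confirmed_set = set(confirmed)
    exact = sum(1 for p in predicted if p in confirmed_set)
    overlap = sum(
        1
        for p in predicted
        if any(p[0] <= c[1] and c[0] <= p[1] for c in confirmed)
    )
    return exact / len(predicted), overlap / len(predicted)


# Toy data: two exact matches, one shifted boundary, one miss.
toy_predicted = [(100, 200), (300, 400), (500, 600), (700, 800)]
toy_confirmed = [(100, 200), (300, 400), (510, 600), (900, 1000)]
exact_frac, overlap_frac = intron_agreement(toy_predicted, toy_confirmed)
print(f"exact: {exact_frac:.0%}, overlapping: {overlap_frac:.0%}")
```

In a real comparison, the predicted set would presumably be restricted to regions covered by EST or cDNA alignments, so that unconfirmed introns in unsampled genes are not counted as errors.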

The genes. The 97-Mb total sequence contains 19,099 predicted protein-coding genes—16,260 of which have been interactively reviewed, for an average density of 1 predicted gene per 5 kb (35). Each gene has an average of five introns, and 27% of the genome resides in predicted exons. The number of genes is about three times that found in yeast (2) and is about one-fifth to one-third the number predicted for humans. As expected from earlier estimates that were based on much smaller amounts of genome sequence, the number of predicted genes is much higher than the number of essential genes that was estimated from classical genetic studies (10, 36).
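The density figure can be checked with simple arithmetic on the numbers quoted above. The sketch below is only a back-of-the-envelope calculation; the variable and function names are illustrative, and the "exonic bases per gene" value is a derived average that the text does not state directly.

```python
# Figures quoted in the text.
GENOME_SIZE_BP = 97_000_000   # 97-Mb sequence
PREDICTED_GENES = 19_099      # predicted protein-coding genes
EXON_FRACTION = 0.27          # fraction of the genome in predicted exons


def kb_per_gene(genome_bp: int, n_genes: int) -> float:
    """Average genomic span per predicted gene, in kilobases."""
    return genome_bp / n_genes / 1000.0


def mean_exonic_bp_per_gene(genome_bp: int, n_genes: int, exon_fraction: float) -> float:
    """Average number of predicted exonic bases per gene (derived, approximate)."""
    return genome_bp * exon_fraction / n_genes


density_kb = kb_per_gene(GENOME_SIZE_BP, PREDICTED_GENES)
exonic_bp = mean_exonic_bp_per_gene(GENOME_SIZE_BP, PREDICTED_GENES, EXON_FRACTION)
print(f"{density_kb:.2f} kb per predicted gene")
print(f"{exonic_bp:.0f} bp of predicted exon per gene")
```

The first value comes out at roughly 5 kb per gene, in agreement with the stated density.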

Similarities to known proteins provide a glimpse of the possible function of the predicted genes. Approximately 42% of predicted protein products have distant matches (outside Nematoda); most of these matches contain functional information (37). Another 34% of predicted proteins match only other nematode proteins, but only a few of these have been functionally characterized. The fraction of genes with informative similarities is far lower than the 70% seen for microbial genomes. This may reflect the smaller proportion of nematode genes that are devoted to core cellular functions (38), the comparative lack of knowledge of functions involved in building an animal, and the evolutionary divergence of nematodes from other animals studied extensively at the molecular level.

We compared the available protein sets from C. elegans, Escherichia coli, Saccharomyces cerevisiae, and Homo sapiens to highlight qualitative differences in the predicted protein sets (39) (Fig. 1). Generally, we found that smaller genomes had matches to a larger fraction of their protein sets and larger genomes had higher numbers of matching proteins. As expected from evolutionary relationships, there were substantially more protein similarities found between C. elegans and H. sapiens than between any other cross-species pairwise comparison. There were also a substantial number of proteins common to C. elegans and E. coli that were not found in yeast. Similarly, C. elegans lacked proteins that were found in both yeast and E. coli (38).

Figure 1

Percentages of matching proteins resulting from pairwise comparisons (39). The organisms and the number of proteins used in the analysis are shown in boxes. For S. cerevisiae (a fungus), C. elegans (a nematode), and E. coli (a bacterium), the numbers reflect proteins that were predicted from an essentially complete genome sequence. The direction of the arrows indicates how the comparison was performed. Numbers that are adjacent to the arrows indicate the percentage of proteins that were found to match. Numbers that are underlined and in bold-faced type indicate the percentage of C. elegans proteins that were found to match each of the other organisms.

Genes encoding proteins with distant matches (outside Nematoda) were more likely to have a matching EST (60%) than those without such matches (20%). This observation suggests that conserved genes are more likely to be highly expressed, perhaps reflecting a bias for “housekeeping” genes among the conserved set. Alternatively, genes lacking confirmatory matches may be more likely to be false predictions, although our analyses do not support this (40).

We have also used the Pfam protein family database (41) to classify common protein domains in the nematode genome. Of the 20 defined domains that occur most frequently (Table 1), the majority are implicated in intercellular communication or in transcriptional regulation. We find comparatively fewer examples of second messenger proteins (for example, 54 G-beta and 3 Src homology 2 domains). This finding supports models in which the same intracellular signaling pathways are used with variant receptors and transcription factors in different cell states.

Table 1

The 20 most common protein domains in C. elegans (41). RRM, RNA recognition motif; RBD, RNA binding domain; RNP, ribonuclear protein motif; UDP, uridine 5′-diphosphate.

In addition to the protein-coding genes, the genome contains at least several hundred genes for noncoding RNAs. There are 659 widely dispersed tRNA genes and at least 29 tRNA-derived pseudogenes (42). Forty-four percent of the tRNA genes are found on the X chromosome, which contains only 20% of the total sequence. Several other noncoding RNA genes occur in dispersed multigene families. The U1, U2, U4, U5, and U6 spliceosomal RNA genes occur in 14, 21, 5, 12, and 20 dispersed copies, respectively; there are five dispersed copies of signal recognition particle RNA genes, and there are at least four dispersed copies of splice leader 2 (SL2) RNA genes. A striking feature of these dispersed gene families is their high degree of sequence homogeneity. For example, of the 20 U6 RNA genes, 17 are 100% identical to each other. Either gene conversion or recent gene duplications may account for this homogeneity. Several of these RNA genes occur in the introns of protein-coding genes, which may indicate RNA gene transposition. In general, RNA genes in introns do not appear to occur preferentially in the coding orientation of the encompassing transcript, which indicates that these RNA genes are probably expressed independently.

Other noncoding RNA genes occur in long tandem arrays. The ribosomal RNA genes occur solely in such an array at the end of chromosome I. The 5S RNA genes occur in a tandem array on chromosome V, with array members separated by SL1 splice leader RNA genes. A few other known RNA genes, such as the small cytoplasmic Ro-associated Y RNA and the lin-4 regulatory RNA, are found only once in the genome. Some RNA genes that are expected to be present in the genome have yet to be identified, probably because they are poorly conserved at both the sequence and secondary structure level. These include ribonuclease P RNA, telomerase RNA, and 100 or more small nucleolar RNA genes.

Repetitive sequences. Some of the sequence that does not code for protein or RNA is undoubtedly involved in gene regulation or in the maintenance and movement of chromosomes. A significant fraction of the sequence is repetitive, as in other multicellular organisms. We have classified repeat sequences as either local (that is, tandem, inverted, or simple sequence repeats) or dispersed.

Tandem repeats account for 2.7% of the genome and are found, on average, once per 3.6 kb. Inverted repeats account for 3.6% of the genome and are found, on average, once per 4.9 kb. Many repeat families are distributed nonuniformly with respect to genes and, in particular, are more likely to be found within introns than between genes. For example, although only 26% of the genome sequence is predicted to be intronic, it contains 51% of the tandem repeats and 45% of the inverted repeats. The 47% of the genome sequence that is predicted to be intergenic contains only 49% of the tandem repeats and 55% of the inverted repeats. As expected, only a small percentage of the tandem repeats overlaps with the 27% of the genome encoding proteins.
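One way to read these percentages is as an enrichment ratio: the share of repeats found in a compartment divided by the share of the genome that the compartment occupies. A ratio of 1 would mean that repeats are distributed in proportion to sequence length. The ratio is an illustrative metric introduced here, not one reported in the text; only the input percentages come from the paragraph above.

```python
# Shares of the genome sequence, as quoted in the text.
GENOME_SHARE = {"intron": 0.26, "intergenic": 0.47}

# Shares of each repeat class falling in each compartment, as quoted in the text.
REPEAT_SHARE = {
    "tandem": {"intron": 0.51, "intergenic": 0.49},
    "inverted": {"intron": 0.45, "intergenic": 0.55},
}


def enrichment(repeat_share: float, genome_share: float) -> float:
    """Ratio of observed repeat share to the share expected from length alone."""
    return repeat_share / genome_share


ratios = {
    (kind, compartment): enrichment(share, GENOME_SHARE[compartment])
    for kind, by_compartment in REPEAT_SHARE.items()
    for compartment, share in by_compartment.items()
}

for (kind, compartment), ratio in sorted(ratios.items()):
    print(f"{kind:8s} repeats in {compartment:10s} sequence: {ratio:.2f}x")
```

On these figures, tandem repeats are nearly twice as frequent in introns as their length would predict, whereas intergenic sequence is close to proportional.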

Although local repeat structures are often unique in the genome, others come in families. For example, repeat sequence CeRep26 is the tandemly occurring hexamer repeat TTAGGC, which is seen at multiple sites that are internal to the chromosomes in addition to the telomeres (22). CeRep26 and CeRep27 are excluded from introns, whereas other repeat families show a slight positive bias toward introns. The reason for the biased distribution of these repeats is unclear. Furthermore, some repeat families show a chromosome-specific bias in representation. For example, CeRep11, with 711 copies distributed over the autosomes, has only one copy located on the X chromosome.

Altogether, we have recognized 38 dispersed repeat families. Most of these dispersed repeats are associated with transposition in some form (43) and include the previously described known transposons of C. elegans. However, these repeat elements may not explicitly encode an active transposon (44). For example, we have found four new families of the Tc1/mariner type, but these are highly divergent from each other and the other family members; they are probably no longer active in the genome.

In addition to multicopy repeat families, we observe a substantial amount of simple duplication of sequence, that is, segments ranging from hundreds of bases to tens of kilobases that have been copied in the genome. In one case, a segment of 108 kb containing six genes is duplicated tandemly with only 10 sites observed to be different between the two copies. At the left end of chromosome IV, immediately adjacent to the telomere, an inverted repeat is present where each copy of the repeat is 23.5 kb, with only eight different sites found between the two copies. Many cases of shorter duplications are found, which are often separated by tens of kilobases or more that may also contain a coding sequence. These duplications could provide a mechanism for copy divergence and the subsequent formation of new genes. In one example, two 2.5-kb segments, separated by 200 kb, were found to contain genes exhibiting a 98% sequence identity (C38C10.4 and F22B7.5). EST data indicate that both genes are expressed. More commonly, gene duplications are local. In a search for local clusters of duplicated genes, 402 clusters were found distributed throughout the genome (Fig. 2).

Figure 2

Locations by chromosome (shown by roman numerals) of local gene clusters. The x axis represents the physical distance in kilobases along the chromosomes. The y axis represents the size of the clusters. For example, the chitinase cluster on chromosome II contains 17 chitinase-like genes. Local gene clusters were determined by searching for all cases of N genes that are similar within a window of 2N genes along the chromosomes (for example, three similar genes within a window of six were considered a cluster; clusters were extended until no similar genes could be added). Clusters of N = 3 or more were plotted. The criterion for similarity was defined as a BLASTP score of at least 200. ATP, adenosine 5′-triphosphate; TM, transmembrane; Mem. Recep., membrane receptor; SCP/TPX, a family of proteins (SCP, sperm-coating glycoprotein; TPX, Tpx-1, a testis-specific protein).
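The cluster rule in the caption (N similar genes within a window of 2N genes, extended until nothing more can be added) can be sketched as follows. This is one plausible reading, not the code used for Figure 2. It assumes that a gene joins a cluster if it is similar to any current member (single linkage), that pairwise scores are supplied in a precomputed table, and that genes are listed in chromosomal order; the names `find_local_clusters` and `make_similarity` and the toy gene names are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence

SCORE_CUTOFF = 200      # similarity criterion quoted in the caption
MIN_CLUSTER_SIZE = 3    # clusters of N = 3 or more were plotted


def make_similarity(scores: Dict[FrozenSet[str], float],
                    cutoff: float = SCORE_CUTOFF) -> Callable[[str, str], bool]:
    """Build a predicate from a table of pairwise scores (assumed precomputed)."""
    def similar(a: str, b: str) -> bool:
        return scores.get(frozenset((a, b)), 0.0) >= cutoff
    return similar


def find_local_clusters(genes: Sequence[str],
                        similar: Callable[[str, str], bool],
                        min_size: int = MIN_CLUSTER_SIZE) -> List[List[str]]:
    """Find runs of N similar genes lying within a window of 2N genes."""
    clusters: List[List[str]] = []
    used = set()
    for start in range(len(genes)):
        if start in used:
            continue
        members = [start]
        for j in range(start + 1, len(genes)):
            span = j - start + 1
            # Adding gene j would give N = len(members) + 1 genes; the span
            # from the first member to j must fit in a window of 2N genes.
            if span > 2 * (len(members) + 1):
                break
            if j not in used and any(similar(genes[j], genes[m]) for m in members):
                members.append(j)
        if len(members) >= min_size:
            used.update(members)
            clusters.append([genes[m] for m in members])
    return clusters


# Toy chromosome: a1..a4 are mutually similar, but a4 lies too far away.
toy_genes = ["a1", "x", "a2", "y", "a3", "z", "w", "v", "a4"]
family = ["a1", "a2", "a3", "a4"]
toy_scores = {frozenset((p, q)): 250.0 for p in family for q in family if p < q}
toy_clusters = find_local_clusters(toy_genes, make_similarity(toy_scores))
print(toy_clusters)
```

In the toy data, the first three family members form a cluster, while the fourth is excluded because including it would stretch the span beyond the 2N window.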

Chromosome organization. At first sight, the genome looks remarkably uniform; GC content (36%) is essentially unchanged across all the chromosomes, unlike the GC content in vertebrate genomes, such as human, or yeast (45). There are no localized centromeres as found in most other metazoa. Instead, the extensive, highly repetitive sequences that are characteristic of centromeres in other organisms may be represented by some of the many tandem repeats found scattered among the genes, particularly on the chromosome arms. Gene density is also fairly constant across the chromosomes, although some differences are apparent, particularly between the centers of the autosomes, the autosome arms, and the X chromosome (Table 2 and Fig. 3).

Figure 3

Distributions of predicted genes; EST matches; yeast protein similarities; and inverted, tandem, and TTAGGC repeats along each chromosome. Gene density varies little along and among the autosomes. On the X chromosome, genes appear at a lower density and are more evenly distributed. In contrast, the frequency of EST matches varies according to their position along the autosomes, indicating a clustering of highly expressed genes. The chromosomal locations of these clusters correlate well with the chromosomal locations of gene products that exhibit significant similarities to yeast proteins (P value of 10^-9). For the autosomes, repeat density varies dramatically with chromosomal position and is highest on the arms. The density of inverted and tandem repeats on the X chromosome is more uniform, but similar to the autosomes, TTAGGC repeats tend to be located on the arms. Supplemental information regarding the analysis can be found at www.sciencemag.org/feature/data/c-elegans.shl for a general overview.

Table 2

Gene density. Autosomes are divided into the genetically defined compartments of the left arm (L), the central cluster region (C), and the right arm (R). The percentage of genes with EST and database matches was determined only from manually inspected genes. Database matches to nonnematode proteins were determined with WUBLASTP (P ≤ 0.001). Parentheses denote the number of low-scoring predictions thought to be pseudogenes.

Striking differences become evident after an examination of other features. Both inverted and tandem repetitive sequences are more frequent on the autosome arms (Fig. 3) than in the central regions of the chromosomes or on the X chromosome. For example, CeRep26 is virtually excluded from the centers of the autosomes (Fig. 3). (The abundance of repeats on the arms is likely to be a contributing factor to the difficulties in cosmid cloning and sequence completion in these regions.) The fraction of genes with similarities to organisms other than nematodes tends to be lower on the arms, as does the fraction of genes with EST matches. The difference between autosome arms and central regions is even more obvious in the number of EST matches (46). The local gene clusters described above also appear to be more abundant on the arms.

These features, together with the fact that meiotic recombination is much higher on the autosome arms, suggested that the DNA on the arms might be evolving more rapidly than in the central regions of the autosomes. If so, one might expect that the conserved set of eukaryotic genes shared by yeast and C. elegans would be largely excluded from the arms. To test this, we identified 1517 proteins in C. elegans that are highly similar to yeast genes and plotted their location along the length of the chromosomes (Fig. 3). For four of the five autosomes, the differences in the distribution of core genes are quite striking, with surprisingly sharp boundaries evident. These boundaries appear close to the boundaries in the genetic map that separate regions of high and low rates of recombination (47).


There are several reasons for completely sequencing a genome. The first and most simple reason is that it provides a basis for the discovery of all the genes. Despite the power of cDNA analysis and its enormous value in interpreting genome sequence, it is now generally recognized that a direct look at the genome is needed to complete the inventory of genes. Second, the sequence shows the long-range relationships between genes and provides the structural and control elements that must lie among them. Third, it provides a set of tools for future experimentation, where any sequence may be valuable and completeness is the key. Fourth, sequencing provides an index to draw in and organize all genetic information about the organism. Fifth, and most important over time, is that the whole is an archive for the future, containing all the genetic information required to make the organism (the greater part of which is not yet understood). As a resource, the sequence will be used indefinitely not only by C. elegans biologists, but also by other researchers for the comparison with and the interpretation of other genomes, including the human genome.

As was already known, the genome of a multicellular organism is very different from that of a microbial organism (and even different from that of a eukaryote such as yeast). It is predominantly noncoding, with genes extended (sometimes over many kilobases) by introns. Rather than acting primarily as the source for a set of protein sequences, the genomic sequence itself remains the primary focus of annotation. There are two reasons for this. First, much information about biological function is located in noncoding sequences; second, current methods of gene identification, both experimental and computational, are not yet accurate and complete enough to provide a definitive set of protein sequences.

If we began again now, would we employ the same approach? Almost certainly (48). The clone-based physical map was a critical factor in organizing the project between the two sites. The clones of the map have also been valuable reagents for the research community and continue to be so; the discrete assemblies of cosmids and YACs have been essential to disentangling extensive repeats in many areas. For the numerous small areas that are underrepresented in shotgun assemblies, rare subclones can be readily recovered from the cosmid and YAC subclone libraries.

There are two minor changes that we would make in the sequencing approach. We would add longer insert bacterial clones (for example, bacterial artificial chromosomes) to the map, fingerprinting them in the same manner as cosmids (48). Second, we would begin YAC sequencing earlier in the project. That we did not do so on this occasion was for historical reasons [in particular, the availability of the yeast genome sequence (see above)].

How important has the worm project been to the Human Genome Project? Through feedback from many sources, we gather that it has been influential in showing what can be done. Certainly, it is remarkable to look back to 1992, when a paper concerning just three cosmids was published as an important milestone (10). Undoubtedly, the worm project has contributed to technology and software development; it is not a unique test-bed, but along with the other genome projects, it has explored ways of increasing scale and efficiency.

Where is the finish line? This publication marks more of a beginning than an end and is another milestone in an ongoing process of the analysis of C. elegans biology. It is not very meaningful at any particular point to call genomes of this size finished, because of the inevitable imperfections that will only gradually be resolved. This is true no matter what method of sequencing is adopted. The important thing is not a declaration of completion, but rather the provision of the best possible tools to the users at every stage and a commitment to maintenance and improvement, through interaction with the user community, as long as that is needed.

* See genome.wustl.edu/gsc/C_elegans/ and www.sanger.ac.uk/Projects/C_elegans/ for a list of authors. Address correspondence to The Washington University Genome Sequencing Center, Box 8501, 4444 Forest Park Parkway, St. Louis, MO 63108, USA. E-mail: worm{at}watson.wustl.edu; or The Sanger Centre, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK. E-mail: worm{at}sanger.ac.uk

