Special Research Articles

The Genome of the Sea Urchin Strongylocentrotus purpuratus


Science  10 Nov 2006:
Vol. 314, Issue 5801, pp. 941-952
DOI: 10.1126/science.1133609

This article has a correction.

We report the sequence and analysis of the 814-megabase genome of the sea urchin Strongylocentrotus purpuratus, a model for developmental and systems biology. The sequencing strategy combined whole-genome shotgun and bacterial artificial chromosome (BAC) sequences. This use of BAC clones, aided by a pooling strategy, overcame difficulties associated with high heterozygosity of the genome. The genome encodes about 23,300 genes, including many previously thought to be vertebrate innovations or known only outside the deuterostomes. This echinoderm genome provides an evolutionary outgroup for the chordates and yields insights into the evolution of deuterostomes.

The genome of the sea urchin was sequenced primarily because of the remarkable usefulness of the echinoderm embryo as a research model system for modern molecular, evolutionary, and cell biology. The sea urchin is the first animal with a sequenced genome that (i) is a free-living, motile marine invertebrate; (ii) has a bilaterally organized embryo but a radial adult body plan; (iii) has the endoskeleton and water vascular system found only in echinoderms; and (iv) has a nonadaptive immune system that is unique in the enormous complexity of its receptor repertoire. Sea urchins are remarkably long-lived, with life spans of Strongylocentrotid species extending to over a century [see supporting online material (SOM)], and highly fecund, producing millions of gametes each year. Strongylocentrotus purpuratus is also a pivotal component of subtidal marine ecology and an important fishery catch in several areas of the world, including the United States. Although the sea urchin has been a research model in developmental biology for a century and a half, for most of that time few were aware of one of its most important characteristics, one that directly enhances its significance for genomic analysis: Echinoderms (and their sister phylum, the hemichordates) are the closest known relatives of the chordates (Fig. 1 and SOM). A description of the echinoderm body plan, as well as aspects of the life-style, longevity, polymorphic gene pool, and characteristics that make the sea urchin so valuable as a research organism, is presented in the SOM.

Fig. 1.

The phylogenetic position of the sea urchin relative to other model systems and humans. The chordates are shown on the darker blue background overlapping the deuterostomes as a whole on a lighter blue background. Organisms for which genome projects have been initiated or finished are shown across the top.

The last common ancestors of the deuterostomal groups at the branch points shown in Fig. 1 are of Precambrian antiquity [>540 million years ago (Ma)], according to protein molecular phylogeny. Stem group echinoderms appear in the Lower Cambrian fossil assemblages dating to 520 Ma. Cambrian echinoderms came in many distinct forms, but from their first appearance, the fossil record illustrates certain distinctive features that are still present: their water vascular system, including rows of tube feet protruding through holes in the ambulacral grooves, and their calcite endoskeleton (mainly, a certain form of CaCO3), which displays the specific three-dimensional structure known as “stereom.” The species sequenced, Strongylocentrotus purpuratus, commonly known as the “California purple sea urchin,” is a representative of the thin-spined “modern” group of regularly developing sea urchins (euechinoids). These evolved to become the dominant echinoid form after the great Permian-Triassic extinction 250 million years ago.

We present here a description of the S. purpuratus genome and gene products. The genome provides a wealth of discoveries about the biology of the sea urchin, Echinodermata, and the deuterostomes. Among the key findings are the following:

  1. The sea urchin is estimated to have 23,300 genes with representatives of nearly all vertebrate gene families, although often the families are not as large as in vertebrates.

  2. Some genes thought to be vertebrate-specific were found in the sea urchin (and are therefore deuterostome-specific); others were identified in sea urchin but not the chordate lineage, which suggests loss in the vertebrates.

  3. Expansion of some gene families occurred apparently independently in the sea urchin and vertebrates.

  4. The sea urchin has a diverse and sophisticated immune system mediated by an astonishingly large repertoire of innate pathogen recognition proteins.

  5. An extensive defensome was identified.

  6. The sea urchin has orthologs of genes associated with vision, hearing, balance, and chemosensation in vertebrates, which suggests hitherto unknown sensory capabilities.

  7. Distinct genes for biomineralization exist in the sea urchin and vertebrates.

  8. Orthologs of many human disease–associated genes were found in the sea urchin.

Sequencing and Annotation of the S. purpuratus Genome

Sequencing and assembly. Sperm from a single male was used to prepare DNA for all libraries (tables S1 and S2) and whole-genome shotgun (WGS) sequencing. The overall approach was based on the “combined strategy” used for the rat genome (1), where WGS sequencing to six times coverage was combined with two times sequence coverage of BAC clones from a minimal tiling path (MTP) (fig. S1). The use of BACs provided a framework for localizing the assembly process, which aided in the assembly of repeated sequences and solved problems associated with the high heterozygosity of the sea urchin genome, without our resorting to extremely high coverage sequencing.

Several different assemblies were produced during the course of the project (see SOM for details). The Sea Urchin Genome Project (SUGP) was the first to produce both intermediate WGS assemblies and a final combined assembly. This was especially useful, not only for the early availability of an assembly for analysis, but also because WGS contigs were used to fill gaps between BACs in the combined assembly. The pure WGS assembly, version (v) 0.5 (GenBank accession number range AAGJ01000001 to AAGJ01320773; also referred to as NCBI build 1.1), was released in April 2005. The final combined BAC-WGS assembly was released in July 2006 as v 2.1 and submitted to GenBank (accession number range AAGJ02000001 to AAGJ02220581).

A second innovation in the SUGP was the use of the clone-array pooled shotgun sequencing (CAPSS) strategy (2) for BAC sequencing (fig. S2). The MTP consisted of 8248 BACs, and rather than prepare separate random libraries from each of these, the CAPSS strategy involved BAC shotgun sequencing from pools of clones and then deconvoluting the reads to the individual BACs. This allowed the BAC sequencing to be performed in 1/5th the time and at 1/10th the cost.
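The pooling idea behind CAPSS can be illustrated with a toy sketch. It assumes the BACs are laid out on a near-square grid, that one shotgun library is made per row pool and per column pool, and that a contig is assigned to the BAC at the intersection of the single row pool and single column pool that contributed its reads. The grid dimensions, the function names, and the assignment rule are illustrative assumptions, not details of the SUGP pipeline.

```python
from math import ceil, sqrt


def grid_layout(n_bacs):
    """Place n_bacs clones on a near-square grid; return (rows, cols)."""
    cols = ceil(sqrt(n_bacs))
    rows = ceil(n_bacs / cols)
    return rows, cols


def pools_for_bac(index, cols):
    """Row pool and column pool that BAC number `index` (0-based) belongs to."""
    return index // cols, index % cols


def deconvolute(row_pools_hit, col_pools_hit, cols, n_bacs):
    """Assign a contig to a BAC from the pools that contributed its reads.

    Returns the BAC index when exactly one row pool and one column pool
    contributed reads, otherwise None (ambiguous, e.g. overlapping clones).
    """
    if len(row_pools_hit) != 1 or len(col_pools_hit) != 1:
        return None
    (row,) = row_pools_hit
    (col,) = col_pools_hit
    index = row * cols + col
    return index if index < n_bacs else None


N_BACS = 8248  # size of the minimal tiling path reported in the text
rows, cols = grid_layout(N_BACS)  # ASSUMPTION: a single near-square array
n_pools = rows + cols             # one library per row plus one per column
```

Under these assumptions, the number of libraries to prepare grows with the sum of the grid dimensions rather than with the number of clones, which is the source of the savings that pooling offers.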

The principal new challenge in the SUGP was the high heterozygosity in the outbred animal that was sequenced. It was known that single-copy DNA in the sea urchin varied by as much as 4 to 5% [single nucleotide polymorphism (SNP) plus insertion/deletion (indel)], which is much greater than human (∼0.5%) (3). Moreover, alignment of WGS reads to the early v 0.1 WGS assembly revealed at least one SNP per 100 bases, as well as a comparable frequency of indel variants. This average frequency of a mismatch per 50 bases or higher prevented merging by the assembly module in Atlas, the Phrap assembler, and also made it difficult to determine if reads were from duplicated but diverged sections of the genome or heterozygous homologs. This challenge was met by adding components to Atlas to handle local regions of heterozygosity and to take advantage of the BAC data, because each BAC sequence represented a single haplotype (see SOM). High heterozygosity has been seen in the past with the Ciona genomes (4, 5) and is likely to be the norm in the future as fewer inbred organisms are sequenced. Moreover, the CAPSS approach makes BAC sequencing more manageable for large genomes. Thus, the sea urchin project may serve as a paradigm for future difficult endeavors.

Combining the BAC-derived sequence with the WGS sequence generated a high-quality draft with 4 to 5% redundancy that covered more than 90% of the genome while sequencing to a level of 8× base coverage (table S2). The assembly size of 814 Mb is in good agreement with the previous estimate of genome size, 800 Mb ± 5% (6). The assembly is a mosaic of the two haplotypes, but it was possible to determine the phase of the BACs on the basis of how many mismatches neighboring BACs had in their overlap regions. This information will be used to create a future version of the genome in which the individual haplotypes are resolved.

Gene predictions. The v 0.5 WGS assembly displayed sufficient sequence continuity (a contig N50 of 9.1 kb) and higher-order organization (a scaffold N50 of 65.6 kb) to allow gene predictions to be produced and the annotation process to begin even while the BAC component was being sequenced. We generated an official gene set (OGS), consisting of ∼28,900 gene models, by merging four different sets of gene predictions with the GLEAN program (7) (see SOM for details). One of these gene sets, produced from the Ensembl gene prediction software, was created for both v 0.5 and v 2.0 assemblies.
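For readers unfamiliar with the N50 statistic quoted above, the following minimal sketch uses the common definition: the length L such that sequences of length L or longer together contain at least half of the total assembled bases. The function name and the example lengths are illustrative; the assembly pipeline itself may compute the statistic with different conventions.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half the total
        # is the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy example: lengths in kb, not data from the assembly.
example = n50([2, 3, 4, 5, 6])
```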

To estimate the number of genes in the S. purpuratus genome, we began with the 28,900 gene models in the OGS and reduced this by the 5% redundancy found by mapping to the v 2.0 assembly, then increased it by a few percent for the new genes observed in the Ensembl set from the v 2.0 assembly compared with v 0.5. From manual analysis of well-characterized gene sets (e.g., ciliary, cell cycle control, and RNA metabolism genes), we estimated that, in addition to redundancy, another 25% of the genes in the OGS were fragments, pseudogenes, or otherwise not valid. Finally, whole-genome tiling microarray analysis (see below) showed 10% of the transcriptionally active regions (long open reading frames, not small RNAs) were not represented by genes in the OGS. Taken together, this analysis gave an estimate of about 23,300 genes for S. purpuratus. Information on all annotated genes can be found at (8).
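The chain of adjustments in this estimate can be followed with a short calculation. This is a sketch under two stated assumptions: "a few percent" is taken as 3%, and each adjustment is applied multiplicatively in the order given in the text. The text does not specify exactly how the corrections were combined, so the result is only meant to show that the quoted figures are mutually consistent.

```python
OGS_MODELS = 28_900  # gene models in the official gene set (from the text)
REDUNDANCY = 0.05    # redundant models found by mapping to v 2.0 (from the text)
NEW_IN_V2 = 0.03     # ASSUMPTION: "a few percent" new genes taken as 3%
INVALID = 0.25       # fragments, pseudogenes, otherwise invalid (from the text)
MISSED = 0.10        # ASSUMPTION: the 10% unrepresented regions applied as +10%

estimate = (
    OGS_MODELS
    * (1 - REDUNDANCY)  # remove redundant models
    * (1 + NEW_IN_V2)   # add genes seen only in the v 2.0 Ensembl set
    * (1 - INVALID)     # remove invalid models
    * (1 + MISSED)      # add transcribed regions missing from the OGS
)
rounded = round(estimate, -2)  # nearest hundred
```

With these choices the result rounds to about 23,300, matching the figure reported in the text; other reasonable readings of "a few percent" shift it by a few hundred genes.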

The overall trends in gene structure were similar to those seen in the human genome. The statistics of the Ensembl predictions from the WGS assembly revealed an average of 8.3 exons and 7.3 introns per transcript (see SOM). The average gene length was 7.7 kb with an average primary transcript length of 8.9 kb. A broad distribution of all exon lengths peaked at around 100 to 115 nucleotides, whereas that for introns peaked at around 750 nucleotides. The smaller average intron size relative to that of humans was consistent with the trend that intron size is correlated with genome size.

Annotation process. Manual annotation and analysis of the OGS were performed by a group of over 200 international volunteers, primarily from the sea urchin research community. To facilitate and to centralize the annotation efforts, an annotation database and a shared Web browser, Genboree (9), were established at the BCM-HGSC. These tools enabled integrated and collaborative analysis of both precomputed and experimental information (see SOM). A variety of precomputed information for each predicted gene model was made available to the annotators in the browser, including expressed sequence tag (EST) data, the four unmerged gene prediction sets, and transcription data from a whole-genome tiling microarray hybridized with embryonic RNA (see below) (10). Additional resources available to the community are listed in table S4.

Over 9000 gene models were manually curated by the consortium with 159 novel models (gene models not represented in the OGS) added to the official set. If we assume no bias in the curated gene models, the number of novel models added may imply that the official set contains >98% of the protein-coding genes.
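The completeness figure follows from a simple proportion, sketched below. It assumes that the curated models are an unbiased sample of all genes (as the text states) and treats "over 9000" as exactly 9000, which makes the estimate slightly conservative.

```python
CURATED = 9000  # ASSUMPTION: "over 9000" curated models used as a lower bound
NOVEL = 159     # curated models that were not represented in the OGS

# Fraction of curated genes that the OGS had missed, and its complement.
fraction_missing = NOVEL / CURATED
fraction_present = 1 - fraction_missing
```

Here `fraction_missing` is a little under 2%, so `fraction_present` exceeds 98%.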

Genome features. A window on the genetic landscape is scaffold-centric in S. purpuratus, because linkage and cytogenetic maps are not available. The GC content of the genome (36.9%) is uniformly low: the average GC content assessed by domains is consistent with the genome-wide value (36.8%), and the distribution is tight (see SOM). Genes from the OGS show no tendency to occupy regions of higher- or lower-than-average GC content. In fact, nearly all genes lie in regions of 35 to 39% GC.
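A windowed GC calculation of the kind that underlies such a comparison can be sketched as follows. The use of fixed, non-overlapping windows, the function names, and the toy sequence are assumptions for illustration; the way domains were actually defined is described in the SOM, not here.

```python
def gc_fraction(seq):
    """GC fraction of a sequence, ignoring anything other than A, C, G, T."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return None  # window made up entirely of gaps or ambiguity codes
    return (seq.count("G") + seq.count("C")) / called


def windowed_gc(seq, window):
    """GC fraction of consecutive non-overlapping windows of `window` bases."""
    return [gc_fraction(seq[i:i + window]) for i in range(0, len(seq), window)]


# Toy sequence, not sea urchin data.
profile = windowed_gc("GGCCAATT", 4)
```

A tight distribution of the per-window values around the genome-wide mean is what "uniformly low" describes.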

The Echinoderm Genome in the Context of Metazoan Evolution

The sea urchin genetic tool kit lends evolutionary perspective to the gene catalogs that characterize the superclades of the bilaterian animals. The distribution of highly conserved protein domains and sequence motifs provides a view of the expansion and contraction of gene families, as well as an insight into changes in protein function. Examples are enumerated in Table 1, which presents a global overview of gene variety obtained by comparing sequences identified in Interpro, and Table 2, which shows the distribution of specific Pfam database domains associated with selected aspects of cell physiology, including sequences identified in the cnidarian Nematostella vectensis (11). The Interpro data suggest that about one-third of the 50 most prevalent domains in the sea urchin gene models are not in the 50 most abundant families in the other representative genomes (mouse, tunicate, fruit fly, and nematode), and thus, they constitute expansions that are specific at least to sea urchins, if not to the complex of echinoderms and hemichordates. Two of the most abundant domains make up 3% of the total and mark genes that are involved in the innate immune response. Others define proteins associated with apoptosis and cell death regulation, as well as proteins that serve as downstream effectors in the Toll–interleukin 1 (IL-1) receptor (TIR) cascade. The quinoprotein amine dehydrogenase domain seen in the sea urchin set is 10 times as abundant as in other representative genomes and may be used in the systems of quinone-containing pigments known to occur in these marine animals. The large number of nucleosomal histone domains found agrees with the long-established sea urchin–specific expansion of histone genes. In summary, the distribution of proteins among these conserved families shows the trend of expansion and shrinkage of the preexisting protein families, rather than frequent gene innovation or loss. Gene family sizes in the sea urchin are more closely correlated with what is seen in deuterostomes than what is seen in the protostomes.

Table 1.

Unique aspects of gene family distribution in sea urchin: Selected examples of the frequency of Interpro domains in the proteome of selected species. ID is the identification number used in the INTERPRO database; the second column shows the name given to the domain or motif family in the database. Species abbreviations: Sp, Strongylocentrotus purpuratus; Mm, Mus musculus; Ci, Ciona intestinalis; Dm, Drosophila melanogaster; Ce, Caenorhabditis elegans.

Species columns give the total number (percentage of total matches).

| ID | Name | Sp | Mm | Ci | Dm | Ce |
|---|---|---|---|---|---|---|
| IPR001190 | Speract/scavenger receptor | 361 (1.79) | 14 (0.08) | 1 (0.01) | 2 (0.02) | 0 (0.00) |
| IPR000157 | TIR | 248 (1.23) | 22 (0.12) | 9 (0.09) | 9 (0.09) | 2 (0.02) |
| IPR011029 | DEATH-like | 172 (0.85) | 8 (0.05) | 19 (0.18) | 5 (0.05) | 1 (0.01) |
| IPR007111 | NACHT nucleoside triphosphatase | 135 (0.67) | 16 (0.09) | 28 (0.27) | 0 (0.00) | 0 (0.00) |
| IPR011044 | Quinoprotein amine dehydrogenase, β chain-like | 122 (0.60) | 7 (0.04) | 15 (0.15) | 5 (0.05) | 6 (0.05) |
| IPR000558 | Histone H2B | 110 (0.54) | 14 (0.08) | 2 (0.02) | 100 (1.00) | 17 (0.13) |
| IPR001951 | Histone H4 | 93 (0.46) | 7 (0.04) | 0 (0.00) | 101 (1.01) | 16 (0.12) |
| IPR002119 | Histone H2A | 87 (0.43) | 24 (0.14) | 2 (0.02) | 104 (1.04) | 19 (0.14) |
| IPR008042 | Retrotransposon, Pao | 76 (0.38) | 0 (0.00) | 0 (0.00) | 0 (0.00) | 6 (0.05) |
| IPR000164 | Histone H3 | 72 (0.36) | 17 (0.10) | 5 (0.05) | 103 (1.03) | 22 (0.17) |

Table 2.

Distribution among sequenced animal genomes of various Pfam domains associated with selected aspects of eukaryotic cell physiology. In S. purpuratus, the number of annotated genes is listed; the number in parentheses is the total number of models (including ones that were not annotated) predicted to contain the Pfam domain. For Nematostella vectensis (Nv), numbers were obtained by searching Stellabase (11).

Of equal interest are the sorts of proteins not found in sea urchins. The sea urchin gene set shares with other bilaterian gene models about 4000 domains, whereas 1375 domains from other bilaterian genomes are not found in the sea urchin set. In agreement with the lack of morphological evidence of gap junctions in sea urchins, there are no gap junction proteins (connexins, pannexins, and innexins). Also missing are several protein domains unique to insects, such as insect cuticle protein, chitin-binding protein, and several pheromone- or odorant-binding proteins, as well as a vertebrate invention, the Krüppel-associated box or KRAB domain, a repressor domain in zinc finger transcription factors (12). Finally, searches for specific subfamilies of G protein–coupled receptors (GPCRs) that are known as chemosensory and/or odorant receptors in distinct bilaterian phyla failed to detect clear representatives in the sea urchin genome. However, this failure more likely reflects the independent evolution of these receptors, rather than a lack of chemoreceptive molecules, because the sea urchin genome encodes close to 900 GPCRs of the same superfamily (rhodopsin-type GPCRs), several of which are expressed in sensory structures (13).

A conservative way to compare gene sets is to count the strict orthologs that give reciprocal BLAST matches. Genes that are genuine orthologs are likely to yield each other as a best hit. Comparison of sea urchin, fruit fly, nematode, ascidian, mouse, and human gene sets (Fig. 2) indicates that the greatest number of reciprocal best matches is observed between mouse and human, which reflects their close relation. The numbers of presumed orthologous genes between the ascidian and the two mammals are about equal, but are less than the number counted between these species and the sea urchin. The difference is consistent with the lower gene number and reduced genome size in the urochordates (4).
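The reciprocal-best-match criterion can be expressed in a few lines. The sketch below assumes that the best hit of every protein has already been extracted from BLAST output in each direction and stored in a dictionary; the function name and the gene identifiers are hypothetical, and the actual analysis may have applied additional filters, such as an e-value threshold.

```python
def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    return sorted(
        (a, b)
        for a, b in best_a_to_b.items()
        if best_b_to_a.get(b) == a
    )


# Hypothetical gene identifiers, for illustration only.
urchin_to_mouse = {"sp1": "mm1", "sp2": "mm2", "sp3": "mm2"}
mouse_to_urchin = {"mm1": "sp1", "mm2": "sp3"}

pairs = reciprocal_best_hits(urchin_to_mouse, mouse_to_urchin)
```

In this toy input, `sp2` is dropped because its best mouse hit, `mm2`, prefers `sp3`; only mutually best pairs are counted, which is what makes the measure conservative.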

Fig. 2.

Orthologs among the Bilateria. The number of 1:1 orthologs captured by BLAST alignments at a match value of e = 1 × 10⁻⁶ in comparisons of sequenced genomes among the Bilateria. The number of orthologs is indicated in the boxes along the arrows, and the total number of International Protein Index database sequences is shown under the species symbol. Hs, Homo sapiens; Mm, Mus musculus; Ci, Ciona intestinalis; Sp, S. purpuratus; Dm, Drosophila melanogaster; Ce, Caenorhabditis elegans.

The number of reciprocal pairs for sea urchin and mouse is about 1.5 times the matches between proteins in sea urchin and fruit fly. The number of nematode proteins matching either sea urchin or fruit fly is even lower. This is likely the result of the more rapid sequence changes in the nematode compared with the other species used in this analysis. More than 75% of the genes that are shared by sea urchin and fruit fly are also shared between sea urchin and mouse. Thus, these genes constitute a set of genes common to the bilaterians, whereas the additional sea urchin–mouse pairs are unique to the deuterostomes.

The sea urchin genome consequently provides evidence for the now extremely robust concept of the deuterostome superclade. A concept that originated in 1908 from the form of embryos of dissimilar species (14) is thus borne out by genomic comparisons.

Developmental Genomics

In the 1980s, the sea urchin embryo became the focus of cis-regulatory analyses of embryonic gene expression, and there was a great expansion of molecular explorations of the developmental cell biology, signaling interactions, and regulatory control systems of the embryo. Analysis of the entire genome facilitated the first large-scale correlation of the gene regulatory network for development, which represents the genomic control circuitry for specification of the endoderm and mesoderm of this embryo (15–17), with the encoded potential of the sea urchin.

The embryo transcriptome and regulome. Because of indirect development in the sea urchin, embryogenesis is cleanly separated from adult body plan formation, in developmental process and in time, and therefore, it is possible to estimate the genetic repertoire specifically required for formation of a simple embryo (10). Pooled mRNA preparations from four stages of development, up to the mid-late gastrula stage (48 hours), were hybridized with a whole-genome tiling array. Expression of about 12,000 to 13,000 genes, as conservatively assessed, was seen during this early period, indicating that ∼52% of the entire protein-coding capacity of the sea urchin genome is expressed during development to the mid-late gastrula stage. An additional set of microarray experiments extended the interrogation of embryonic expression to the 3-day pluteus larva stage (see SOM) (18).

The DNA binding domains of transcription factor families are conserved across the Bilateria, and these protein domain motifs were used to extract the sea urchin homologs (see SOM). For each identified gene, if data were not already available, probes were built from the genome sequence and used to measure transcript concentration by quantitative polymerase chain reaction with a time series of embryo mRNAs, as well as to determine spatial expression by whole-mount in situ hybridization.

All bilaterian transcription factor families were represented in the sea urchin with a few rare exceptions (see below), so the sea urchin data strongly substantiate the concept of a panbilaterian regulatory tool kit (19) or “regulome.” We found that 80% of the whole sea urchin regulome (except the zinc finger genes) was expressed by 48 hours of embryogenesis (20), an even greater genetic investment than the 52% total gene use in the same embryo.

Signal transduction pathways. More than 1200 genes involved in signal transduction were identified. Comparative analysis highlights include the protein kinases that mediate the majority of signaling and coordination of complex pathways in eukaryotes. The S. purpuratus genome has 353 protein kinases, intermediate between the core vertebrate set of 510 and the fruit fly and nematode conserved sets of ∼230. Fine-scale classification and comparison with annotated kinomes (21, 22) reveals a remarkable parsimony. Indeed, with only 68% of the total number of human kinases, the sea urchin has members of 97% of the human kinase subfamilies, lacking just four of those subfamilies (Axl, FastK, H11, and NKF3), whereas Drosophila lacks 20 and nematodes 32 (Fig. 3) (23). Most sea urchin kinase subfamilies have just a single member, although many are expanded in vertebrates; thus, the sea urchin kinome is largely nonredundant. The sea urchin therefore possesses a kinase diversity surprisingly comparable to that of vertebrates without the complexity. A small number of kinases were more similar to insect than to vertebrate homologs (including the Titin homolog Projectin, the Syk-like tyrosine kinase Shark, and several guanylate cyclases), which indicated for the first time the loss of kinase classes in vertebrates (23). Expression profiling showed that 87% of the signaling kinases and 80% of the 91 phosphatases were expressed in the embryo (23, 24), which emphasized the importance of signaling pathways in embryonic development.

Fig. 3.

Protein kinase evolution: Invention and loss of protein kinase subfamilies in metazoan lineages. Deuterostomes share 9 protein kinase subfamilies absent from C. elegans and Drosophila, and the sea urchin has not lost any of the 158 metazoan primordial kinase classes, unlike insects or nematodes. [From (23)]

The small guanosine triphosphatases (GTPases) function as molecular switches in signal transduction, nuclear import and export, lipid metabolism, and vesicle docking. Vertebrate GTPase families were expanded after their divergence from echinoderms, in part by whole-genome duplications (25–27). The sea urchin genome did not undergo a whole-genome duplication, yet phylogenies for four Ras GTPase families (Ras, Rho, Rab, and Arf) revealed that local gene duplications occurred (Fig. 4), which ultimately resulted in a comparable number of monomeric GTPases in the human and sea urchin genomes (28). Thus, expansion of each family in vertebrates and echinoderms was achieved by distinct mechanisms (gene-specific versus whole-genome duplication). More than 90% of the small GTPases are expressed during sea urchin embryogenesis, which suggests that the complexity of signaling through GTPases is comparable between sea urchins and vertebrates.

Fig. 4.

Partial phylogenies of the Rho (A) and the Rab families (B) of small GTPases. The pink boxes highlight gene-specific duplications that increased sea urchin GTPase numbers, resulting in a complexity comparable to vertebrates. Numbers at each junction represent confidence values obtained via three independent phylogenetic methods [neighbor-joining (green), maximum parsimony (blue), and Bayesian (black)]; red stars indicate nodes retained by maximum likelihood. [From (28)]

The Wnt family of secreted signaling molecules plays a central role in specification and patterning during embryonic development. Phylogenetic analyses from cnidarian to human indicate that of the 13 known Wnt subfamilies, S. purpuratus has 11, missing Wnt2 and Wnt11 homologs (Fig. 5). S. purpuratus has WntA, previously reported as being absent from deuterostomes (29). Of 126 genes described as components of the Wnt signal transduction machinery, homologs of ∼90% were present in the sea urchin genome, which indicates a high level of conservation of all three Wnt pathways (30). However, of 94 Wnt transcriptional target genes reported in the literature, mostly from vertebrates (31), only 53% were found with high confidence in the sea urchin genome (Fig. 6). The absent Wnt targets include vertebrate adhesion molecules, which were frequently missing from the sea urchin genome (32), as well as signaling receptors, which are more divergent and thus more difficult to identify. In contrast, most transcription factor targets of the Wnt pathway are present in the genome, which reflects a higher degree of conservation of transcription factor families (20). Taken together, the genomic analysis of signal transduction components indicates that sea urchins have signaling machinery strikingly comparable to that of vertebrates, often without the complexity that arises from genetic redundancy.

Fig. 5.

Survey of the Wnt family of secreted signaling molecules in selected metazoans. Each square indicates a single Wnt gene identified either through genome analyses or independent studies, and squares with a question mark indicate uncertainty of the orthology. Letter X's represent absence of members of that subfamily in the corresponding annotated genome; empty spaces have been left for species for which genomic databases are not yet available. [From (30)]

Fig. 6.

Presence of Wnt signaling machinery components (A) and target genes (B) in the S. purpuratus genome. (A) The 126 genes involved in the transduction of the Wnt signals have been separated into four categories from the extracellular compartment to the nucleus. Sea urchin homologs are identified by the lighter shade (indicated by both the number and the percentage of homologs that were identified within the chart); the total number of known genes is indicated in the chart legend. (B) The 93 reported Wnt targets have been divided into three categories: signaling molecules, transcription factors, and cell adhesion molecules. Colors and numbers are as in (A).

Sea Urchin Biology

Analysis of the genome allows understanding of parts of the organism that have not been well studied. Several examples of this follow with further details in the SOM. Additional areas such as intermediary metabolism, metalloproteases, ciliary structure, fertilization, and germline specification are presented in the SOM.

Defense Systems

The need to deal with physical, chemical, and biological challenges in the environment underlies the evolution of an array of defense gene families and pathways. One set of protective mechanisms involves the immune system, which responds to biotic stressors such as pathogens. A second group of genes comprises a chemical “defensome,” a network of stress-sensing transcription factors and defense proteins that transform and eliminate many potentially toxic chemicals.

The sea urchin immune system. The sea urchin has a greatly expanded innate immunity repertoire compared with any other animal studied to date (table S5). Three classes of innate receptor proteins are particularly increased (Fig. 7). These make up a vast family of Toll-like receptors (TLRs), a similarly large family of genes that encode NACHT and leucine-rich repeat (LRR)–containing proteins (NLRs), and a set of genes encoding multiple scavenger receptor cysteine-rich (SRCR) domain proteins of a class highly expressed in the sea urchin immune cells or coelomocytes (33, 34). Receptors from each of these families participate in immunity by recognizing nonself molecules that are conserved in pathogens or by responding to self molecules that indicate the presence of infection (35). In contrast, homologs of signal transduction proteins and nuclear factor kappa B (NFκB)/Rel domain transcription factors that are known to function further downstream of these genes were present in numbers similar to those in other invertebrate species. One of the more unexpected findings from our analysis of sea urchin immune genes was the identification of a Rag1/2-like gene cluster (36). The presence of this cluster, along with other recent findings (37), suggested the possibility that these genes had been part of animal genomes for longer than previously considered. Further analysis of the genomic insights into the innate immune system and the underpinnings of vertebrate adaptive immunity can be found in a review in this issue (38).

Fig. 7.

Gene families encoding important innate immune receptors and complement factors in animals with sequenced genomes. For some key receptor classes, gene numbers in the sea urchin exceed those of other animals by more than an order of magnitude. Representative animals include H.s., Homo sapiens; C.i., Ciona intestinalis; S.p., Strongylocentrotus purpuratus; D.m., Drosophila melanogaster; and C.e., Caenorhabditis elegans. Indicated gene families include TLR, toll-like receptors; NLR, NACHT and leucine-rich repeat (LRR) domain–containing proteins similar to the vertebrate Nod/NALP genes; SRCR, scavenger receptor cysteine-rich domain genes; PGRP, peptidoglycan recognition protein domain genes; and GNBP, Gram-negative binding proteins. C3/4/5, thioester proteins homologous to vertebrate C3, C4, and C5; Bf/C2, complement factors homologous to vertebrate C2 and factor B; C1q/MBP, homologs of vertebrate lectin pathway receptors; and Terminal pathway, homologs of vertebrate C6, C7, C8, and C9. SRCR gene statistics are given as domain number/gene number for multiple SRCR-containing proteins (numbers for C. intestinalis include all SRCR proteins). The asterisk in the D. melanogaster C3/4/5 column denotes the presence of related thioester genes (TEPs) and a true C3/4/5 homolog from another arthropod. The +/– in the C. intestinalis Terminal pathway column indicates the presence of genes with similarity to C6 only (Nonaka and Yoshizaki 2004). Phylogenetic relations among species are indicated by a cladogram at the left.

The complement system. The complement system of vertebrates is a complex array of soluble serum proteins and cellular receptors arranged into three activation pathways (classical, lectin, and alternative) that converge and activate the terminal or lytic pathway. This system opsonizes pathogenic cells for phagocytosis and sometimes activates the terminal pathway, which leads to pathogen destruction. An invertebrate complement system was first identified in the sea urchin [for reviews, see (39, 40)], and the analysis of the genome sequence presented a more complete picture of this important immune effector system. In chordates, collectins initiate the lectin cascade through members of the mannose-binding protein (MBP)–associated protease (MASP)/C1r/C1s family. Several genes encoding collectins, C1q and MBP, have been predicted (39) and were present in the genome; however, members of the MASP/C1r/C1s family were not identified. There was no evidence for the classical pathway, which links the complement cascade with immunoglobulin recognition in jawed vertebrates. The alternative pathway is initiated by members of the thioester protein family, which, in the sea urchin, was somewhat expanded with four genes. Two of the thioester proteins, SpC3 and SpC3-2, are known to be expressed, respectively, in coelomocytes and in embryos and larvae. Furthermore, there were three homologs of factor B, the second member of the alternative pathway (41).

The terminal complement pathway in vertebrates acts to destroy pathogens or pathogen-infected cells with large pores called membrane attack complexes (MACs). Twenty-eight gene models were identified that encode MAC-perforin domains, but none of these had the additional domains expected for terminal complement factors (C6 through C9). Instead, these are members of a novel gene family with perforin-like structure. In vertebrates, perforins mediate cell killing by cytotoxic lymphocytes through the formation of small pores in the cell membranes. If the complement system in the sea urchin functions through multiple lectin and alternative pathways in the absence of the lytic functions of the terminal pathway, the major activity of this system is expected to be opsonization.

Homologs of immune regulatory proteins. Cytokines are key regulators of intercellular communication involving immune cells, acting to coordinate vertebrate immune systems. Genes encoding cytokines and their receptors often evolve at a rapid pace, and most families are known only from vertebrate systems. Although members of many cytokine, chemokine, and receptor families were not identified in the sea urchin genome, a number of important immune signaling homologs were present. These included members of the tumor necrosis factor (TNF) ligand and receptor superfamilies, an IL-1 receptor and accessory proteins, two IL-17 receptor–like genes and 30 IL-17 family ligands, and nine macrophage inhibitory factor (MIF)–like genes. Receptor tyrosine kinases (RTKs) included those that bind important growth factors that regulate cell proliferation in vertebrate hematopoietic systems. Of particular note, from the sea urchin genome, were two vascular endothelial growth factor (VEGF) receptor–like genes and a Tie1/2 receptor, all of which were expressed in adult coelomocytes. Many of these genes are homologs of important inflammatory regulators and growth factors in higher vertebrates, and these sea urchin homologs may have similar functions in regulating coelomocyte differentiation and recruitment.

Representatives of nearly all subclasses of important vertebrate hematopoietic and immune transcription factors were present in the sea urchin genome. Notably, the genome contained homologs of immune transcription factors that had not been identified previously outside of chordates, including PU.1/SpiB/SpiC, a member of the Ets subfamily, and a zinc finger gene with similarity to the Ikaros subfamily. Transcript prevalence measurements showed that PU.1, the Ikaros-like gene and homologs of Gata1/2/3, E2A/HEB/ITF2, and Stem Cell Leukemia (SCL) were all expressed at substantial levels in coelomocytes (41). This was consistent with the presence of conserved mechanisms of regulating gene expression among sea urchin coelomocytes and vertebrate blood cells.

ABC transporters. Many chemicals are removed from cells by efflux proteins known as ATP-binding cassette (ABC) or multidrug efflux transporters. S. purpuratus has 65 ABC transporter genes in the eight major subfamilies of these genes [ABC A to H; (42)]. The ABCC family of multidrug transporters is about 25% larger than in other deuterostome genomes with at least 30 genes in this family (nearly half of the sea urchin ABC transporters), and 25 of these 30 genes showed substantial mRNA expression in eggs, embryos, or larvae. Much of the expansion is in the Sp-ABCC5 and Sp-ABCC9 families, whereas orthologs of the vertebrate gene ABCC2 (also called MRP2) are absent. Because the ABCC family is known to generally transport more hydrophilic compounds than other transporter families, such as the ABCB genes, sea urchins may have increased need for transport of these compounds. ABCC efflux activity has been described in sea urchin embryos and, consistent with the genomic expansion of the ABCC family, the major activity in early embryos ensues from an ABCC-like efflux mechanism.

Cytochrome P-450 monooxygenase (CYP). Enzymes in the CYP1, CYP2, CYP3, and CYP4 families carry out oxidative biotransformation of chemicals to more hydrophilic products. The sea urchin has 120 CYP genes, and those related to CYP gene families 1 to 4 constitute 80% of the total, which suggests that there has been selective pressure to expand functionality in these gene families (42). Eleven CYP1-like genes are present in the sea urchin genome, more than twice the number in chordates. CYP2-like and CYP3-like genes are also present at greater numbers than in other deuterostomes. In addition to the CYPs in families 1 to 4, the sea urchin genome contains homologs of proteins involved in developmental patterning (CYP26), cholesterol synthesis (CYP51), and metabolism (CYP27, CYP46). However, homologs of some CYPs with endogenous functions in vertebrates were not found (CYP19, androgen aromatase; CYP8, prostacyclin synthase; CYP11, pregnenolone synthase; CYP7, cholesterol-7α-hydroxylase). These CYP genes, in concert with additional expanded defensive gene families, represent a large diversification of defense gene families by the sea urchin relative to mammals (42).

Oxidative defense and metal-complexing proteins. The metal-complexing proteins include three metallothionein genes and three homologs of phytochelatin synthase genes. Genes for antioxidant proteins include three superoxide dismutase (SOD) genes and a gene encoding ovoperoxidase (an unusual peroxidase with SOD-like activity), along with one catalase, four glutathione peroxidase, and at least three thioredoxin peroxidase genes. Reactive oxygen detoxification genes may be important in conferring the long life-span of sea urchins, because oxidative damage is thought to be a major factor in aging.

Diversity and conservation in xenobiotic signaling. The diversity of genes encoding xenobiotic-sensing transcription factors that regulate biotransformation enzymes and transporters was similar to other invertebrate genomes, but in most cases lower than vertebrates. For example, the sea urchin genome encoded a single predicted CNC-bZIP protein homologous to the four human CNC-bZIP proteins involved in the response to oxidative stress. There were two sea urchin homologs of the aryl hydrocarbon receptor (AHR), which in vertebrates mediates the transcriptional response to polynuclear and halogenated aromatic hydrocarbons and, in both protostomes and deuterostomes, also regulates specific developmental processes (43–45). One of the sea urchin AHR homologs was more closely related to the vertebrate AHR; the other shared greatest sequence identity with the Drosophila AHR homolog spineless. Sea urchins also had two genes encoding hypoxia-inducible factors (HIFα subunits), which regulate adaptive responses to hypoxia, and a gene encoding ARNT, a PAS protein that is a dimerization partner for both AHRs and HIFs.

Strongylocentrotus purpuratus has 32 nuclear receptor (NR) genes (20), two-thirds the number in humans, including several with potential roles in chemical defense (42). The sea urchin genome also contains two peroxisome proliferator–activated receptor (PPAR, NR1C) homologs and an NR1H gene coorthologous to both liver X receptor (LXR) and farnesoid X receptor (FXR) (42). Genes homologous to the vertebrate xenobiotic sensor NR1I genes [pregnane X receptor, PXR; constitutive androstane receptor, CAR (46)] are absent, although three NR1H-related genes were found, which possibly form a new subfamily of genes involved in xenobiotic sensing.

Many of the defense genes are expressed during development (10, 42), which suggests that they have dual roles in chemical defense and in developmental signaling. In several cases (CYPs, AHR, NF-E2), the evolution of pathways for chemical defense may have involved recruitment from developmental signaling pathways (42).

Nervous System

The echinoderm nervous system is the least well studied of all the major metazoan phyla. For a number of technical reasons, the structure and function of echinoderm nerves have been neglected. Analysis of the sea urchin genome has enabled an unprecedented glimpse into the neural and sensory functions and has revealed several novel molecular approaches to the study of echinoderm nervous systems (Table 3).

Table 3. Genomic insights into sea urchin neurobiology.

Neural process | Revelations from the genome | Genes
Neural development | Neurogenic ectoderm is specified in early embryonic development. | Sp-Achaete-scute, Sp-homeobrain, Sp-Rx (retinal anterior homeobox), Sp-Zic2
Synapse structure and function | Echinoderm synapses are structurally unusual, despite the presence of many genes encoding proteins involved in synapse function. | Sp-Neuroligin, Sp-neurexin, Sp-agrin, Sp-MUSK, Sp-thrombospondin, Sp-Rim2, Sp-Rab3, exocyst complex, Snares, SM, synaptotagmins
Electrical signaling and coupling | Neurons have ion channel proteins, but lack electrical coupling via gap junctions. | Voltage-gated K+, Ca2+, and Na+ channels, but no connexins or pannexins/innexins
Neurotransmitter/neuromodulatory diversity | Neurons use the same neurotransmitters as vertebrates, but lack melatonin and adrenalin. | Enzymes involved in synthesis, transport, reception, and hydrolysis of serotonin, dopamine, noradrenaline, γ-aminobutyric acid (GABA), histamine, acetylcholine, glycine, and nitric oxide
GPCR signaling | Identification of GPCRs that are unique to chordates and identification of expanded GPCR families. | Orthologs of vertebrate cannabinoid, lysophospholipid, and melanocortin receptors are absent; 162 secretin receptor-like genes
Peptide signaling | G protein–coupled peptide receptors indicate diversity in peptide signaling systems, but only a few sea urchin neuropeptides or peptide hormones identified. | 37 G protein–coupled peptide receptors. Precursors for SALMFamides, NGFFFamide, and a vasotocin-like peptide
Neurotrophins | Neurotrophins and neurotrophin receptors are not unique to chordates. | Sp-Neurotrophin, Sp-Trk, Sp-p75NTR, ependymins
Insulin and IGFs | More similar to vertebrate forms than invertebrate insulin-like molecules. | Sp-IGF1, Sp-IGF2
Chemosensory functions | A large family of predicted chemoreceptor genes, some expressed in tube feet or pedicellariae, indicates a complex chemosensory system. | Over 600 genes encoding putative G protein–coupled chemoreceptors, many tandemly repeated and lacking introns
Photoreception functions | Genes associated with photoreception are expressed in tube feet. | Photorhodopsins, Sp-Pax6, retinal transcription factors
Mechanosensory functions | Orthologs of vertebrate mechanosensory genes are present. | Sp-Usherin, Sp-VLGR-1, Sp-cadherins, Sp-myosin 7, Sp-myosin 15, Sp-harmonin, Sp-whirlin, Sp-NBC, Sp-TrpA1

The nervous systems of echinoderm larvae and adults are dispersed, but they are not simple nerve nets. This organization differs from both vertebrates, which do not have a dispersed nervous system, and hemichordates, which do have nerve nets (47). Adult sea urchins have thousands of appendages, each with sensory neurons, ganglia, and motor neurons arranged in local reflex arcs. These peripheral appendages are connected to each other and to radial nerves, which provide overall control and coordination (47, 48).

Nearly all of the genes encoding known neurogenic transcription factors are present in the sea urchin genome, and several are expressed in neurogenic domains before gastrulation, which indicates that they may operate near the top of a conserved neural gene regulatory network (47). Axon guidance molecules known from other metazoans are also expressed in the developing embryo. Unexpectedly, genes encoding the neurotrophin-Trk receptor system, which were thought to be vertebrate-specific because they were not found in Ciona, are present in sea urchin, which suggests a deuterostome origin and a potential loss in urochordates.

The genes required to construct neurons and to transmit signals are present, but the repertoire of neural genes and the initial characterization of expression of a number of them led to unexpected conclusions. There appear to be no genes encoding gap junction proteins, which suggests that communication among neurons depends on chemical synapses without ionic coupling. The repertoire of sea urchin neurotransmitters is large, but melatonin and adrenalin are lacking, as they are in ascidians (4, 47). Cannabinoid, lysophospholipid, and melanocortin receptors are not present in urchins, but orthologs were found in ascidians (4, 47). In contrast, some sets of genes thought to be chordate-specific have sea urchin orthologs, for example, insulin and insulin-like growth factors (IGFs) that are more similar to their chordate counterparts than those of other invertebrates (47). Overall, the genome contains representatives of all five large superfamilies of GPCRs, including those that mediate signals from neuropeptides and peptide hormones. Both the secretin and rhodopsin superfamilies display marked lineage-specific expansions (13, 47).

Sensory systems. There were 200 to 700 putative chemosensory genes that formed large clusters and lacked introns, which are features of chemosensory genes in vertebrates, but not in Caenorhabditis elegans and Drosophila melanogaster. Many of these genes encoded amino acid motifs that were characteristic of vertebrate chemosensory and odorant receptors (13, 47). Sea urchins had an elaborate collection of photoreceptor genes that quite surprisingly appeared to be expressed in tube feet (13, 47). These included many genes encoding transcription factors regulating retinal development and a photorhodopsin gene.

Human Usher syndromes are genetic diseases that affect hearing and balance and cause retinitis pigmentosa (retinal photoreceptor degeneration). Most of the genes involved have been identified, and they encode a set of membrane and cytoskeletal proteins that form an interacting network that controls the arrangement of mechanosensory stereocilia in hair cells of the mammalian ear. Many or all of the proteins play some roles in photoreceptor organization and/or maintenance. Orthologs of virtually the entire set of membrane and cytoskeletal proteins of the Usher syndrome network were found in the sea urchin genome. These include the very large membrane proteins, usherin and VLGR-1 and large cadherins (Cadh23 and possibly Pcad15), all of which participate in forming links between stereocilia in mammalian hair cells, as well as myosin 7 and 15, two PDZ proteins (harmonin and whirlin) and another adaptor protein (SANS), which participate in linking these membrane proteins to the cytoskeleton. In addition, two membrane transporters, NBC (a candidate Usher syndrome target known to interact with harmonin) and TrpA1 (the mechanosensory channel connected to the tip links containing cadherin 23), have orthologs in the sea urchin genome. Sea urchins do not have ears or eyes, so they must deploy these proteins in other sensory processes. Sea urchins respond to light, touch, and displacement and probably use some of the same sensory genes used by vertebrates.

The Echinoderm Adhesome

The S. purpuratus genome contained representatives of all the standard metazoan adhesion receptors (table S7), but the emphasis on different classes of receptors differed substantially from that used by vertebrates. The integrin family was intermediate in size between those of protostomes and vertebrates—several chordate-specific expansions of the integrin repertoire were absent, and there were some expansions unique (so far) to echinoderms. The cadherin repertoire was also small relative to vertebrates (a dozen or so instead of over a hundred), and many chordate-specific expansions were missing. Specialized large cadherins shared by protostomes and vertebrates were present, as well as some specialized large cadherins previously thought to be chordate-specific, but overall, the cadherin repertoire was more invertebrate than vertebrate in character. Sea urchins lacked the integrins and cadherins that link to intermediate filaments in vertebrates.

In contrast, sea urchins had large repertoires of adhesion molecules containing immunoglobulin superfamily, fibronectin type 3 repeat (FN3), epidermal growth factor (EGF), and LRR repeats. In addition to the expansion of TLRs and NLRs mentioned above, there are large expansions of other LRR receptor families, including GPCRs (32). The key neural adhesion systems involved in regulating axonal outgrowth were present (netrin/Unc5/DCC; Slit/Robo; and semaphorins/plexins), as were adhesion molecules involved in synaptogenesis (Agrin/MUSK; and neurexin/neuroligins). This was not surprising because these molecules were known in both protostomes and vertebrates. However, structurally, the synapses of echinoderms are unusual because there are no direct synaptic contacts (49). Some of these adhesion molecules were expressed in sea urchin embryos before there were any neurons, suggesting that they may have other roles as well.

The basic metazoan basement membrane extracellular matrix (ECM) tool kit was present: two alpha-IV collagen genes, perlecan, laminin subunits, nidogen, and collagen XV/XVIII. There did not appear to be much, if any, expansion of these gene families, as is found in vertebrates, which suggests that there is less diversity among basement membranes. Quite a few ECM proteins present in chordates, but not protostomes, were also missing in sea urchins, including fibronectins, tenascins, von Willebrand factor, vitronectin, most vertebrate-type matrix proteoglycans, and complex VWA/FN3 collagens among others (32). Absence of these genes may be related to the absence of neural crest migration, a high-shear endothelial-lined vasculature and, of course, cartilage and bone.

In addition to the components of Usher syndromes mentioned above, it was surprising to find a clear ortholog of reelin, a large ECM protein involved in establishing the layered organization of neurons in the vertebrate cerebral cortex. Reelin is mutated in the reeler mouse, and mutations in the reeler gene in humans have been associated with Norman-Roberts-type lissencephaly syndrome. Reelin has a unique domain composition and organization (Reeler, EGF, BNR) that had not previously been found outside chordates, but the sea urchin genome included a very good homolog of reelin. Receptors for reelin are believed to include low-density lipoprotein receptor–related proteins (LRPs), and there are a number of these receptors in S. purpuratus, although it is as yet unclear whether they are reelin receptors, lipoprotein receptors, or something else. Similar receptors are also involved in human disease (atherosclerosis).

Biomineralization Genes

Among the deuterostomes, only echinoderms and vertebrates produce extensive skeletons. The possible evolutionary relations between biomineralization processes in these two groups have been controversial. Analysis of the S. purpuratus genome revealed major differences in the proteins that mediate biomineralization in echinoderms and vertebrates (50). First, there were few sea urchin counterparts of extracellular proteins that mediate biomineral deposition in vertebrates. For example, in vertebrates, an important class of proteins involved in biomineralization is the family of secreted, calcium-binding phosphoproteins, or SCPPs. Sea urchins did not have counterparts of SCPP genes, which supports the hypothesis that this family arose via a series of gene duplications after the echinoderm-chordate divergence (51). Second, almost all of the proteins that have been directly implicated in the control of biomineralization in sea urchins were specific to that clade. The echinoderm skeleton consists of magnesium calcite (as distinct from the calcium phosphate skeletons of vertebrates) in which are occluded many secreted matrix proteins. The sea urchin spicule matrix proteins were encoded by a family of 16 genes that are organized in small clusters and likely proliferated by gene duplication. Counterparts of sea urchin spicule matrix genes were not found in vertebrates, amphioxus, or ascidians. Likewise, other genes that have been implicated in biomineralization in sea urchins, including genes that encode the transmembrane protein P16 and MSP130, a glycosylphosphatidylinositol-linked glycoprotein, were members of small clusters of closely related genes without apparent homologs in other deuterostomes. The members of all three of these sea urchin–specific gene families were expressed specifically by the biomineral-forming cells of the embryo, the primary mesenchyme cells [see (50)]. As a whole, these findings highlighted substantial differences in the primary sequences of the proteins that mediate biomineralization in echinoderms and vertebrates.

Cytoskeletal genes. In addition to identifying genes for all previously known S. purpuratus actins and tubulins, one δ- and two ϵ-tubulin genes were found (52). Newly identified motor protein genes include members of four more classes of myosin and eight more families of kinesins. The first dynein cloned and sequenced was from sea urchin, and although most S. purpuratus dynein heavy chain genes mapped one-to-one to mammalian homologs, Sp-DNAH9 mapped one-to-three, as it was equidistant between the closely similar mammalian genes DNAH9, DNAH11, and DNAH17 (52).


Our estimate of 23,300 genes is similar to estimates for vertebrates, despite the fact that two whole-genome duplications are believed to have occurred in the chordate lineage after divergence from the lineage leading to the echinoderms (25–27). From the analysis presented here, it seems likely that many mechanisms shaped the final genetic content of these genomes. On the one hand, there are cases of gene families that are expanded in vertebrates compared with sea urchin, including examples of the expected 4:1 ratio from two duplications (15). However, other patterns are also found. The nuclear receptor family is only slightly reduced in sea urchin compared with that of humans, which suggests gene loss followed the vertebrate duplications. The unprecedented expansions of innate immune system diversity contrast sharply with the much smaller sets of counterparts that are present in the sequenced genomes of protostomes, Ciona, and vertebrates, an example of independent expansion in the sea urchin, whereas the GTPases described here have expanded in sea urchin to about the same numbers as in vertebrates. Thus, whereas the duplications of the chordate lineage were a contributor to the increased complexity of vertebrates, regional expansions clearly play a large role in the evolution of these animals.

The refinement of the inventory of vertebrate-specific or protostome-specific genes likewise benefits from the sea urchin genome. Many more human genes have shared ancestry across the deuterostomes, and in fact, bilaterian genes are more broadly shared than had been inferred from comparison of the previously limited genome sequences. The new biological niche sampled by the sea urchin genome not only provides a clearer view of the deuterostome and bilaterian ancestor but has also provided a number of surprises. The finding of sea urchin homologs for sensory proteins related to vision and hearing in humans may lead to interesting new concepts of perception, and the extraordinary organization of the sea urchin immune system is different from that of any animal yet studied. From a practical standpoint, the sea urchin may be a treasure trove. Because of the many pathways shared by sea urchin and human, the sea urchin genome includes a large number of human disease gene orthologs. Many of the genes described in the preceding sections fall into this category (see tables S8 and S9) and cover a surprising diversity of systems such as nervous, endocrine, and blood systems, as well as muscle and skeleton, as exemplified by the Huntington and muscular dystrophy genes. Continued exploration of the sea urchin immune system is expected to uncover additional variations for protection against pathogens. The immense diversity of pathogen-binding motifs encoded in the sea urchin genome provides an invaluable resource for antimicrobial applications and the identification of new deuterostome immune functions with direct relevance to human health. These exciting possibilities show that much biodiversity is yet to be uncovered by sampling additional evolutionary branches of the tree of life.

Sea Urchin Genome Sequencing Consortium

Overall project leadership: Erica Sodergren,1,2 George M. Weinstock,1,2 Eric H. Davidson,3 R. Andrew Cameron3

Principal investigators: Richard A. Gibbs,1,2 George M. Weinstock1,2

Annotation section leaders: Robert C. Angerer,4 Lynne M. Angerer,4 Maria Ina Arnone,5 David R. Burgess,6 Robert D. Burke,7 R. Andrew Cameron,3 James A. Coffman,8 Eric H. Davidson,3 Michael Dean,9 Maurice R. Elphick,10 Charles A. Ettensohn,11 Kathy R. Foltz,12 Amro Hamdoun,13 Richard O. Hynes,14 William H. Klein,15 William Marzluff,16 David R. McClay,17 Robert L. Morris,18 Arcady Mushegian,19,20 Jonathan P. Rast,21 Erica Sodergren,1,2 L. Courtney Smith,23 Michael C. Thorndyke,24 Victor D. Vacquier,25 George M. Weinstock,1,2 Gary M. Wessel,26 Greg Wray,27 Lan Zhang1,2

Annotation: Gene list: Erica Sodergren1,2 (leader), George M. Weinstock1,2 (leader), Robert C. Angerer,4 Lynne M. Angerer,4 R. Andrew Cameron,3 Eric H. Davidson,3 Christine G. Elsik,27 Olga Ermolaeva,29 Wratko Hlavina,29 Gretchen Hofmann,30 Paul Kitts,28 Melissa J. Landrum,28 Aaron J. Mackey,32* Donna Maglott,28 Georgia Panopoulou,33 Albert J. Poustka,33 Kim Pruitt,28 Victor Sapojnikov,29 Xingzhi Song,1,2 Alexandre Souvorov,28 Victor Solovyev,34 Zheng Wei,4 Charles A. Whittaker,35 Kim Worley,1,2 Lan Zhang1,2

Assembly of genome: Erica Sodergren1,2 (leader), George M. Weinstock1,2 (leader), K. James Durbin,1,2 Richard A. Gibbs,1,2 Yufeng Shen1,2 (v 2.1), Xingzhi Song1,2 (v 0.5), Kim Worley,1,2 Lan Zhang1,2

Basal transcription apparatus proteins and polymerases chromatin proteins: Greg Wray27 (leader), Olivier Fedrigo,26 David Garfield,27 Ralph Haygood,17 Alexander Primus,26 Rahul Satija,26 Tonya Severson27

BCM-HGSC annotation database and Genboree: Lan Zhang1,2 (leader), Erica Sodergren1,2 (leader), George M. Weinstock1,2 (leader), Manuel L. Gonzalez-Garay,1,2 Andrew R. Jackson,1,2 Aleksandar Milosavljevic,1,2 Xingzhi Song,1,2 Mark Tong,1,2 Kim Worley1,2

Biomineralization: Charles A. Ettensohn11 (leader), R. Andrew Cameron,3 Christopher E. Killian,36 Melissa J. Landrum,31 Brian T. Livingston,37 Fred H. Wilt36

Cell physiology: James A. Coffman8 (leader), William Marzluff16 (leader), Arcady Mushegian19,20 (leader), Nikki Adams,37 Robert Bellé,38,39 Seth Carbonneau,8 Rocky Cheung,16 Patrick Cormier,38,39 Bertrand Cosson,38,39 Jenifer Croce,17 Antonio Fernandez-Guerra,40,41 Anne-Marie Genevière,40,41 Manisha Goel,19 Hemant Kelkar,42 Julia Morales,38,39 Odile Mulner-Lorillon,39,40 Anthony J. Robertson8

Cellular defense: Amro Hamdoun13 (leader), Jared V. Goldstone42 (leader), Nikki Adams,36 Bryan Cole,13 Michael Dean,9 David Epel,13 Bert Gold,9 Mark E. Hahn,43 Meredith Howard-Ashby,3 Mark Scally,9 John J. Stegeman43

Ciliogenesis and ciliary compounds: Robert L. Morris18 (leader), Erin L. Allgood,18 Jonah Cool,18 Kyle M. Judkins,18 Shawn S. McCafferty,18 Ashlan M. Musante,18 Robert A. Obar,44 Amanda P. Rawson,18 Blair J. Rossetti18

Cytoskeletal and organelle genes: David R. Burgess6 (leader), Erin L. Allgood,18 Jonah Cool,18 Ian R. Gibbons,45 Matthew P. Hoffman,6 Kyle M. Judkins,18 Andrew Leone,6 Shawn S. McCafferty,18 Robert L. Morris,18 Ashlan M. Musante,18 Robert A. Obar,44 Amanda P. Rawson,18 Blair J. Rossetti,18 Gary M. Wessel26

Embryonic transcriptome: Eric H. Davidson3 (leader), R. Andrew Cameron,3 Sorin Istrail,46 Stefan C. Materna,3 Manoj P. Samanta,47,48 Viktor Stolc,47 Waraporn Tongprasit,47 Qiang Tu3

Embryonic temporal expression pattern list: Robert C. Angerer4 (leader), Lynne M. Angerer4 (leader), Zheng Wei4

Echinoderm adhesome: Richard O. Hynes14 (leader), Karl-Frederik Bergeron,49 Bruce P. Brandhorst,50 Robert D. Burke,7 Charles A. Whittaker,35 James Whittle51

Echinoderm evolution: R. Andrew Cameron3 (leader), Kevin Berney,3 David J. Bottjer,51 Cristina Calestani,53 Eric H. Davidson,3 Kevin Peterson,54 Elly Chow,55 Qiu Autumn Yuan55

Genome analysis [GC content]: Eran Elhaik,56 Christine G. Elsik,28 Dan Graur,56 Justin T. Reese28

Genome FPC map: Ian Bosdet,57 Shin Heesun,57 Marco A. Marra,57 Jacqueline Schein57

Human genetic disease orthologs: Michael Dean9 (leader), Amro Hamdoun13 (leader), The Sea Urchin Genome Sequencing Consortium

Immunity: Jonathan P. Rast21 (leader), L. Courtney Smith23 (leader), Michele K. Anderson,22 Kevin Berney,3 Virginia Brockton,23 Katherine M. Buckley,23 R. Andrew Cameron,3 Avis H. Cohen,58 Sebastian D. Fugmann,59 Taku Hibino,21 Mariano Loza-Coll,21 Audrey J. Majeske,23 Cynthia Messier,21 Sham V. Nair,60 Zeev Pancer,61 David P. Terwilliger22

Neurobiology and sensory systems: Robert D. Burke7 (leader), Maurice R. Elphick10 (leader), William H. Klein15 (leader), Michael C. Thorndyke24 (leader), Cavit Agca,62 Lynne M. Angerer,4 Enrique Arboleda,5 Maria Ina Arnone,5 Bruce P. Brandhorst,50 Nansheng Chen,50 Allison M. Churcher,63 F. Hallböök,64 Glen W. Humphrey,65 Richard O. Hynes,14 Mohammed M. Idris,5 Takae Kiyama,15 Shuguang Liang,15 Dan Mellott,60 Xiuqian Mu,15 Greg Murray,48 Robert P. Olinski,64 Florian Raible,66,67 Matthew Rowe,10 John S. Taylor,63 Kristin Tessmar-Raible,66 D. Wang,63 Karen H. Wilson,24 Shunsuke Yaguchi7

Reproduction: Kathy R. Foltz12 (leader), Victor D. Vacquier25 (leader), Gary M. Wessel26 (leader), Terry Gaasterland,25 Blanca E. Galindo,67 Herath J. Gunaratne,25 Meredith Howard-Ashby,3 Glen W. Humphrey,65 Celina Juliano,26 Masashi Kinukawa,25 Gary W. Moy,25 Anna T. Neill,25 Mamoru Nomura,25 Michael Raisch,12 Anna Reade,12 Michelle M. Roux,12 Jia L. Song,25 Yi-Hsien Su,3 Ian K. Townley,12 Ekaterina Voronina,26 Julian L. Wong26

Sea Urchin Genome Annotation Workshop in Naples: Maria Ina Arnone5 (leader), Michael C. Thorndyke24 (leader), Gabriele Amore,5 Lynne M. Angerer,4 Enrique Arboleda,5 Margherita Branno,5 Euan R. Brown,5 Vincenzo Cavalieri,69 Véronique Duboc,70 Louise Duloquin,70 Maurice R. Elphick,10 Constantin Flytzanis,70,71 Christian Gache,70 Anne-Marie Genevière,40,41 Mohammed M. Idris,5 François Lapraz,70 Thierry Lepage,70 Annamaria Locascio,5 Pedro Martinez,73,74 Giorgio Matassi,75 Valeria Matranga,76 David R. McClay,17 Julia Morales,38,39 Albert J. Poustka,33 Florian Raible,66,67 Ryan Range,70 Francesca Rizzo,5 Eric Röttinger,70 Matthew Rowe,10 Kristin Tessmar-Raible,66 Erica Sodergren,1,2 George M. Weinstock,1,2 Karen Wilson24

Signal transduction: David R. McClay17 (leader), Lynne M. Angerer,4 Maria Ina Arnone,5 Wendy Beane,17 Cynthia Bradham,17 Christine Byrum,17,78 Jenifer Croce,17 Veronique Duboc,70 Louise Duloquin,70 Christian Gache,70 Anne-Marie Genevière,40,41 Tom Glenn,17 Taku Hibino,22 Sofia Hussain,37 François Lapraz,70 Thierry Lepage,70 Brian T. Livingston,37 Mariano Loza,21 Gerard Manning,76 Esther Miranda,17 Ryan Range,70 Francesca Rizzo,5 Eric Röttinger,70 Rebecca Thomason,17,78 Katherine Walton,17 Zheng Wei,4 Gary M. Wessel,26 Athula Wikramanayke,77 Karen H. Wilson,23 Charles Whittaker,35 Shu-Yu Wu,17 Ronghui Xu78

Transcription regulatory factors: Eric H. Davidson3 (leader), Maria Ina Arnone,5 Margherita Branno,5 C. Titus Brown,3 R. Andrew Cameron,3 Lili Chen,3 Rachel F. Gray,3 Meredith Howard-Ashby,3 Sorin Istrail,46 Pei Yun Lee,3 Annamaria Locascio,5 Pedro Martinez,73,74 Stefan C. Materna,3 Jongmin Nam,3 Paola Oliveri,3 Francesca Rizzo,5 Joel Smith3

DNA sequencing: Donna Muzny1,2 (leader), Erica Sodergren1,2 (leader), Richard A. Gibbs1,2 (leader), George M. Weinstock1,2 (leader), Stephanie Bell,1,2 Joseph Chacko,1,2 Andrew Cree,1,2 Stacey Curry,1,2 Clay Davis,1,2 Huyen Dinh,1,2 Shannon Dugan-Rocha,1,2 Jerry Fowler,1,2 Rachel Gill,1,2 Cerrissa Hamilton,1,2 Judith Hernandez,1,2 Sandra Hines,1,2 Jennifer Hume,1,2 LaRonda Jackson,1,2 Angela Jolivet,1,2 Christie Kovar,1,2 Sandra Lee,1,2 Lora Lewis,1,2 George Miner,1,2 Margaret Morgan,1,2 Lynne V. Nazareth,1,2 Geoffrey Okwuonu,1,2 David Parker,1,2 Ling-Ling Pu,1,2 Yufeng Shen,1,2 Rachel Thorn,1,2 Rita Wright1,2

1Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA. 2Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA. 3Division of Biology, California Institute of Technology, Pasadena, CA 91125, USA. 4National Institute of Dental and Craniofacial Research, NIH, Bethesda, MD 20892, USA. 5Stazione Zoologica Anton Dohrn, Villa Comunale, 80121 Napoli, Italy. 6Department of Biology, Boston College, Chestnut Hill, MA 02467, USA. 7Department of Biology, Department of Biochemistry and Microbiology, University of Victoria, Victoria, BC, Canada, V8W 3N5. 8Mount Desert Island Biological Laboratory, Salisbury Cove, ME 04672, USA. 9Human Genetics Section, Laboratory of Genomic Diversity, National Cancer Institute–Frederick, Frederick, MD 21702, USA. 10School of Biological and Chemical Sciences, Queen Mary, University of London, London E1 4NS, UK. 11Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, 15213, USA. 12Department of Molecular, Cellular and Developmental Biology and the Marine Science Institute, University of California, Santa Barbara, Santa Barbara, CA 93106–9610, USA. 13Hopkins Marine Station, Stanford University, Pacific Grove, CA 93950, USA. 14Howard Hughes Medical Institute, Center for Cancer Research, Massachusetts Institute of Technology (MIT), Cambridge, MA 02139, USA. 15Departments of Biochemistry and Molecular Biology, University of Texas, M. D. Anderson Cancer Center, Houston, TX, 77030, USA. 16Molecular Biology and Biotechnology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA. 17Department of Biology, Duke University, Durham, NC 27708, USA. 18Department of Biology, Wheaton College, Norton, MA 02766, USA. 19Stowers Institute for Medical Research, Kansas City, MO 64110, USA. 20Department of Microbiology, Kansas University Medical Center, Kansas City, KS 66160, USA. 
21Sunnybrook Research Institute and Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada M4N 3M5. 22Department of Immunology, University of Toronto, Toronto, Ontario, Canada, M4N 3M5. 23Department of Biological Sciences, George Washington University, Washington, DC 20052, USA. 24Royal Swedish Academy of Sciences, Kristineberg Marine Research Station, Fiskebackskil, 450 34, Sweden. 25Marine Biology, Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA 92093–0202, USA. 26Department of Molecular and Cellular Biology and Biochemistry, Brown University Providence, RI 02912, USA. 27Department of Biology and Institute for Genome Sciences and Policy, Duke University, Durham, NC 27708, USA. 28Department of Animal Science, Texas A&M University, College Station, TX 77843, USA. 29National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD 20894, USA. 30Department of Ecology, Evolution, and Marine Biology, University of California Santa Barbara, Santa Barbara, CA 93106, USA. 31National Center for Biotechnology Information, NIH, Bethesda, MD 20892, USA. 32Penn Genomics Institute, University of Pennsylvania, Philadelphia, PA 19104, USA. 33Evolution and Development Group, Max-Planck Institut für Molekulare Genetik, 14195 Berlin, Germany. 34Royal Holloway, University of London, Egham, Surrey TW20 0EX, UK. 35Center for Cancer Research, MIT, Cambridge, MA 02139, USA. 36Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720–3200, USA. 37Department of Biology, University of South Florida, Tampa, FL 33618, USA. 38Université Pierre et Marie Curie (Paris 6), UMR 7150, Equipe Cycle Cellulaire et Développement, Station Biologique de Roscoff, 29682 Roscoff Cedex, France. 39CNRS, UMR 7150, Station Biologique de Roscoff, 29682 Roscoff Cedex, France. 40CNRS, UMR7628, Banyuls-sur-Mer, F-66650, France. 
41Université Pierre et Marie Curie (Paris 6), UMR7628, Banyuls-sur-Mer, F-66650, France. 42Center for Bioinformatics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA. 43Biology Department, Woods Hole Oceanographic Institution, Woods Hole, MA 02543, USA. 44Tethys Research, LLC, 2115 Union Street, Bangor, Maine 04401, USA. 45Department of Molecular, Cellular, and Developmental Biology, University of California, Berkeley, Berkeley, CA 94720, USA. 46Center for Computational Molecular Biology, and Computer Science Department, Brown University, Providence, RI 02912, USA. 47Genome Research Facility, National Aeronautics and Space Administration, Ames Research Center, Moffett Field, CA 94035, USA. 48Systemix Institute, Cupertino, CA 95014, USA. 49Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia, Canada, V5A 1S6. 50Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada, V5A 1S6. 51Department of Biology, Center for Cancer Research, MIT, Cambridge, MA 02139, USA. 52Department of Earth Sciences, University of Southern California, Los Angeles, CA 90089–0740, USA. 53Department of Biology, University of Central Florida, Orlando, FL 32816–2368, USA. 54Department of Biological Sciences, Dartmouth College, Hanover, NH 03755, USA. 55Center for Computational Regulatory Genomics, Beckman Institute, California Institute of Technology, Pasadena, CA 91125, USA. 56Department of Biology and Biochemistry, University of Houston, Houston, TX 77204, USA. 57Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada, V5Z 4E6. 58Department of Biology and the Institute of Systems Research, University of Maryland, College Park, MD 20742, USA. 59Laboratory of Cellular and Molecular Biology, National Institute on Aging, NIH, Baltimore, MD 21224, USA. 60Department of Biological Sciences, Macquarie University, Sydney NSW 2109, Australia. 
61Center of Marine Biotechnology, UMBI, Columbus Center, Baltimore, MD 21202, USA. 62Department of Cell Biology and Anatomy, Louisiana State University Health Sciences Center, New Orleans, LA 70112, USA. 63Department of Biology, University of Victoria, Victoria, BC, Canada, V8W 2Y2. 64Department of Neuroscience, Uppsala University, Uppsala, Sweden. 65Laboratory of Cellular and Molecular Biophysics, National Institute of Child Health and Development, NIH, Bethesda, MD 20895, USA. 66Developmental Unit, EMBL, 69117 Heidelberg, Germany. 67Computational Unit, EMBL, 69117 Heidelberg, Germany. 68Biotechnology Institute, Universidad Nacional Autónoma de Mexico (UNAM), Cuernavaca, Morelos, Mexico 62250. 69Department of Cellular and Developmental Biology “Alberto Monroy,” University of Palermo, 90146 Palermo, Italy. 70Laboratoire de Biologie du Développement (UMR 7009), CNRS and Université Pierre et Marie Curie (Paris 6), Observatoire Océanologique, 06230 Villefranche-sur-Mer, France. 71Department of Biology, University of Patras, Patras, Greece. 72Department of Molecular and Cellular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA. 73Departament de Genetica, Universitat de Barcelona, 08028–Barcelona, Spain. 74Institució Catalana de Recerca i Estudis Avancats (ICREA), Barcelona, Spain. 75Institut Jacques Monod, CNR-UMR 7592, 75005 Paris, France. 76Consiglio Nazionale delle Ricerche, Istituto di Biomedicina e Immunologia Molecolare “Alberto Monroy,” 90146 Palermo, Italy. 77Razavi-Newman Center for Bioinformatics, Salk Institute for Biological Studies, La Jolla, CA 92186, USA. 78Department of Zoology, University of Hawaii at Manoa, Honolulu, HI 96822, USA.

Supporting Online Material


Materials and Methods

SOM Text

Figs. S1 to S6

Tables S1 to S8


References and Notes
