The Genome of the Western Clawed Frog Xenopus tropicalis


Science  30 Apr 2010:
Vol. 328, Issue 5978, pp. 633-636
DOI: 10.1126/science.1183670


The western clawed frog Xenopus tropicalis is an important model for vertebrate development that combines experimental advantages of the African clawed frog Xenopus laevis with more tractable genetics. Here we present a draft genome sequence assembly of X. tropicalis. This genome encodes more than 20,000 protein-coding genes, including orthologs of at least 1700 human disease genes. Over 1 million expressed sequence tags validated the annotation. More than one-third of the genome consists of transposable elements, with unusually prevalent DNA transposons. Like that of other tetrapods, the genome of X. tropicalis contains gene deserts enriched for conserved noncoding elements. The genome exhibits substantial shared synteny with human and chicken over major parts of large chromosomes, broken by lineage-specific chromosome fusions and fissions, mainly in the mammalian lineage.

African clawed frogs (the genus Xenopus, meaning “strange foot”) comprise more than 20 species of frogs native to Sub-Saharan Africa. The species Xenopus laevis was first introduced to the United States in the 1940s where a low-cost pregnancy test took advantage of the responsiveness of frogs to human chorionic gonadotropin (1). Since the frogs were easy to raise and had other desirable properties such as large eggs, external development, easily manipulated embryos, and transparent tadpoles, X. laevis gradually developed into one of the most productive model systems for vertebrate experimental embryology (2).

However, X. laevis has a large paleotetraploid genome with an estimated size of 3.1 billion bases (Gbp) on 18 chromosomes and a generation time of 1 to 2 years. In contrast, the much smaller diploid western clawed frog, X. tropicalis, has a small genome, about 1.7 Gbp on 10 chromosomes (3), matures in only 4 months, and requires less space than its larger cousin. It is thus readily adopted as an alternative experimental subject for developmental and cell biology (Fig. 1).

Fig. 1

Comparison of adults and tadpoles of X. tropicalis and X. laevis. Adult body length is 5 and 10 cm, respectively. (A) Tailbud, (B) swimming tadpole, and (C) feeding tadpole. Bar, 1 mm.

As a group, amphibians are phylogenetically well positioned for comparisons to other vertebrates, having diverged from the amniote lineage (mammals, birds, reptiles) some 360 million years ago. The comparison with mammalian and bird genomes also provides an opportunity to examine the dynamics of tetrapod chromosomal evolution.

The X. tropicalis draft genome sequence described here was produced from ~7.6-fold redundant random shotgun sampling of genomic DNA from a seventh-generation inbred Nigerian female. The assembly (4) (tables S1 to S3 and accession number AAMC00000000) spans about 1.51 Gbp of scaffolds, with half of the assembled sequence contained in 272 scaffolds ranging in size from 1.56 to 7.82 Mb. Of known genes, 97.6% are present in the assembly, attesting to its near completeness in genic regions (4). Nearly 2 million Xenopus expressed sequence tags (ESTs) from diverse developmental stages and adult tissues complement the genome and enable studies of alternative splicing and identification of developmental stage- and tissue-specific genes (4).
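
The contiguity figures quoted above (half of the assembled sequence in 272 scaffolds, the smallest of which is 1.56 Mb) correspond to what are commonly called the L50 and N50 of an assembly. The following is a minimal sketch of how these two statistics can be computed from a list of scaffold lengths. The function name `n50_l50` and the toy lengths are illustrative assumptions, not part of the assembly pipeline described in (4).

```python
from typing import List, Tuple


def n50_l50(scaffold_lengths: List[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a list of scaffold lengths in bases.

    L50 is the smallest number of scaffolds, taken from longest to
    shortest, whose combined length reaches half of the total.
    N50 is the length of the shortest scaffold in that set.
    """
    if not scaffold_lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(scaffold_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise AssertionError("unreachable: cumulative sum always reaches half")


# Hypothetical toy assembly: total length 24, so half is 12.
toy_lengths = [8, 6, 4, 3, 2, 1]
toy_n50, toy_l50 = n50_l50(toy_lengths)
```

Applied to the real scaffold lengths, this calculation would be expected to give an L50 of 272 and an N50 of about 1.56 Mb, if the figures in the text are read in this way.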

More than one-third of the frog genome consists of transposable elements (TEs) (table S7), higher than the 9% TE density in the chicken genome (5) but comparable to the 40 to 50% density in mammalian genomes (6, 7). Many families of frog TEs are more than 25% divergent from their consensus sequence, so like mammalian and bird TEs they have persisted for as long as 20 to 200 million years (5, 6). This contrasts with the faster turnover observed in insects, nematodes, fungi, and plants (6, 8, 9). Recently active TEs (1 to 5 million years ago) are more common in frogs than in mammals or birds, and their prevalence is comparable to that in fish, insects, nematodes, and plants. Among these is an unusually high diversity of very young families of L1 non-LTR (long terminal repeat) retrotransposons, Penelope, and DIRS retrotransposons. In contrast to those of other vertebrates, most recognizable frog TEs (72%) are DNA transposons, rather than the retrotransposons that dominate other genomes (5–8, 10). Among these families (11, 12), we identified Kolobok as a previously uncharacterized superfamily of DNA transposons. The genome also contains LTR retrotransposons of all major superfamilies, with higher diversity than in all other studied eukaryotes (table S8). Although most are ubiquitous, Copia, BEL, and Gypsy elements are not found in birds and mammals, suggesting that this subset became immobile after divergence from the amphibian lineage.

Using homology-based gene prediction methods and deep Xenopus EST and cDNA resources, we estimated that the X. tropicalis genome contains 20,000 to 21,000 protein-coding genes. These include orthologs of 79% of identified human disease genes (4). The genome contains 1850 tandem expanded gene families with between 2 and 160 copies, accounting for nearly 24% of protein-coding loci. The largest expansion comprises tetrapod-specific olfactory receptors (class II) occupying the first 1.7 Mb on scaffold_24. Other large expansions include protocadherins, bitter-taste receptors, and vomeronasal (pheromone) receptors (table S9).

The X. tropicalis genome displays long stretches of gene colinearity with human and chicken (Fig. 2). Of the 272 largest scaffolds (totaling half the assembly), 267 show such colinearity (4). Sixty percent of all gene models on these scaffolds can be directly associated with a human and/or chicken ortholog by conserved synteny. Patches of strict conserved colinearity are interrupted by large-scale inversions within the same linkage groups, and more rarely by chromosome breakage and fusion events, similar to the findings reported for the human and chicken genomes (Fig. 2) (5) and in agreement with persistent conservation of linkage groups across chordates (13).

Fig. 2

Blocks of conserved tetrapod linkage for human (A) and chicken (B) chromosome 1 reveal fusions (solid black triangles) and break points (unfilled triangles) in amniotes. A total of three human fusions (A), seven human breaks (B), and one chicken fusion (B) are observed. The green triangle in (B) indicates the position of an apparent frog-specific break or ancestral amniote fusion. Gray areas indicate origin in different ancestral chromosomes. Shaded areas show larger regions with insufficient three-way synteny information. Detailed comparison of gene order in human and chicken reveals multiple large-scale inversions (dot plots on the black blocks). The green frog blocks consist of multiple scaffolds, 55 in (A) and 97 in (B). Bars on the frog blocks show the location of scaffolds that do not contain markers from the linkage map, but have been predicted to associate with the linkage group by conserved synteny.

We uniquely placed 1696 markers from the existing genetic map of X. tropicalis onto a total of 691 scaffolds constituting more than 764 Mb of genomic sequence (4, 14). To identify lineage-specific fusion and breakage events within the mammals and sauropsids, we analyzed blocks of conserved synteny between frog, human, and chicken. These blocks were detected with genomic probes comprising three-way orthologs between these tetrapods. Of these probes, 5642 define conserved linkage blocks containing at least 15 genes and at least 2 Mb of sequence (4, 14). The tetrapod ancestry of human and chicken chromosome 1 is outlined in Fig. 2. Notably, a core of more than 150 Mb of sequence spanning the centromere of human chromosome 1 [chicken chromosome 8, frog linkage group (LG) VII] has remained largely intact during ~360 million years of evolution since the tetrapod ancestor (Fig. 2A). Detailed shared synteny is interrupted by large-scale inversions, but gene order is frequently conserved over stretches of tens of megabases. Human chromosome 1 is seen to have grown by three lineage-specific mammalian fusions. In contrast, chicken chromosome 1 reveals several mammalian-specific breakpoints (Fig. 2B). The genomic material on the entire q arm of chicken chromosome 1 shows linkage conservation to frog LG VI, whereas the human counterparts are scattered over regions of chromosomes 2, 3, 11, 13, 21, and X. The p arm indicates two mammalian breaks, suggesting that regions of human chromosomes 7, 12, and 22 were once part of the same chromosome.
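
The thresholds quoted above (at least 15 genes and at least 2 Mb per conserved linkage block) can be illustrated with a simplified sketch. It is not the procedure used for this analysis, which is described in the supporting material (4). It assumes that each three-way ortholog probe has a position along a frog linkage group and a chromosome assignment in human and chicken, and it simply groups consecutive probes that share the same pair of assignments. The names `Probe`, `Block`, and `linkage_blocks` are hypothetical.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List


@dataclass(frozen=True)
class Probe:
    frog_pos: int      # position along one frog linkage group, in bases
    human_chr: str     # chromosome of the human ortholog
    chicken_chr: str   # chromosome of the chicken ortholog


@dataclass(frozen=True)
class Block:
    human_chr: str
    chicken_chr: str
    start: int
    end: int
    n_genes: int


def linkage_blocks(probes: List[Probe],
                   min_genes: int = 15,
                   min_span: int = 2_000_000) -> List[Block]:
    """Group consecutive probes with identical human/chicken assignments
    and keep only runs that pass both thresholds (assumed simplification)."""
    ordered = sorted(probes, key=lambda p: p.frog_pos)
    blocks = []
    for (human, chicken), run in groupby(
            ordered, key=lambda p: (p.human_chr, p.chicken_chr)):
        members = list(run)
        start, end = members[0].frog_pos, members[-1].frog_pos
        if len(members) >= min_genes and end - start >= min_span:
            blocks.append(Block(human, chicken, start, end, len(members)))
    return blocks


# Hypothetical toy data: one run that passes, one with too few genes,
# and one with enough genes but too short a span.
toy_probes = (
    [Probe(i * 150_000, "1", "8") for i in range(20)]
    + [Probe(4_000_000 + i * 500_000, "2", "1") for i in range(5)]
    + [Probe(10_000_000 + i * 10_000, "3", "2") for i in range(16)]
)
toy_blocks = linkage_blocks(toy_probes)
```

Because only chromosome assignments are compared, this sketch captures conserved linkage rather than conserved gene order. Inversions within a block, as seen in Fig. 2, would not split it.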

By extending this analysis to all human and chicken chromosomes, we identified 22 human fusion and 21 fission events, versus only four fusions and one break in chicken. Clearly, the mammalian lineage has undergone considerably more rearrangement than that of the sauropsids, although the total chromosome count appears to have remained fairly constant. The segments analyzed here are distributed on 23 human and 22 chicken chromosomes, consistent with a derivation from 24 or 25 ancestral amniote chromosomes. The chicken microchromosomes are unresolved by this analysis, however, preventing determination of the exact ancestral chromosome number. Both the vertebrate and eumetazoan ancestors have been suggested to have had about a dozen large chromosomes (13, 15). The current analysis indicates that the amniote ancestor had twice as many, suggesting substantial chromosome breakage on the amniotic stem.
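
One plausible reading of these counts, which the text does not spell out, is simple bookkeeping: each fusion since the ancestor lowers the chromosome count by one and each fission raises it by one, so the ancestral count is the present count plus fusions minus fissions. The short sketch below applies that assumption to the numbers given above. The function name `ancestral_count` is illustrative.

```python
def ancestral_count(current: int, fusions: int, fissions: int) -> int:
    """Undo lineage-specific events: a fusion merged two chromosomes into
    one (add one back), a fission split one into two (subtract one)."""
    return current + fusions - fissions


# Numbers taken from the text: 23 human chromosomes with 22 fusions and
# 21 fissions; 22 chicken chromosomes with 4 fusions and 1 break.
from_human = ancestral_count(current=23, fusions=22, fissions=21)
from_chicken = ancestral_count(current=22, fusions=4, fissions=1)
print(from_human, from_chicken)
```

Under this assumption the two lineages give 24 and 25, matching the range stated for the amniote ancestor.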

The extensive conserved synteny among tetrapods allows us to provisionally place frog scaffolds without genetic markers onto the linkage map. These are shown in Fig. 2 as black bars within the blocks of conserved linkage with frog. A total of 170 large scaffolds containing about 200 Mb of sequence were assigned a linkage group in this manner. Such in silico inferred linkages will ultimately need to be verified experimentally, but have already proven useful in the positional identification and cloning of the gene responsible for the muzak mutation, which affects heart function (16).

The X. tropicalis genome exhibits extensive sequence conservation with other vertebrates, with the amphibian sequence filling a phylogenetic gap. Recognizable noncoding sequence conservation diminishes steadily with increasing evolutionary distance (fig. S6). Frog genes adjacent to conserved noncoding sequences (CNS) are enriched or depleted in several gene ontology categories, including sensory perception of smell, response to stimulus, and regulation of transcription, among others (table S16).

Gene deserts (defined as the longest 3% of intergenic regions) cover 17% of the genome and range in length from 201 kbp to 1.2 Mbp. The 683 gene deserts contain almost 25% of CNSs. In mammalian genomes, such gene deserts have been found to harbor cis-regulatory elements (17).
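
The definition of a gene desert used here, the longest 3% of intergenic regions, can be sketched as follows. This is a minimal illustration under stated assumptions, not the annotation code behind the reported numbers. Gene coordinates are assumed to be half-open intervals per scaffold, regions before the first and after the last gene of a scaffold are ignored, and the cutoff is rounded up. The names `intergenic_regions` and `gene_deserts` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Gene = Tuple[str, int, int]   # (scaffold, start, end), half-open coordinates
Gap = Tuple[str, int, int]    # (scaffold, gap_start, gap_end)


def intergenic_regions(genes: List[Gene]) -> List[Gap]:
    """Return the gaps between neighboring genes on each scaffold."""
    by_scaffold: Dict[str, List[Tuple[int, int]]] = {}
    for scaffold, start, end in genes:
        by_scaffold.setdefault(scaffold, []).append((start, end))
    gaps: List[Gap] = []
    for scaffold, spans in by_scaffold.items():
        spans.sort()
        furthest_end = spans[0][1]
        for start, end in spans[1:]:
            # Overlapping or nested genes leave no gap between them.
            if start > furthest_end:
                gaps.append((scaffold, furthest_end, start))
            furthest_end = max(furthest_end, end)
    return gaps


def gene_deserts(genes: List[Gene], top_fraction: float = 0.03) -> List[Gap]:
    """Return the longest `top_fraction` of intergenic regions."""
    gaps = intergenic_regions(genes)
    gaps.sort(key=lambda gap: gap[2] - gap[1], reverse=True)
    keep = math.ceil(len(gaps) * top_fraction)
    return gaps[:keep]


# Hypothetical toy annotation with three intergenic gaps.
toy_genes = [
    ("s1", 0, 10), ("s1", 20, 30), ("s1", 1000, 1010),
    ("s2", 0, 5), ("s2", 50, 60),
]
toy_deserts = gene_deserts(toy_genes)
```

With real data, the fraction of conserved noncoding sequences falling inside the returned intervals could then be counted to reproduce a figure like the 25% quoted above.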

The power of genome comparison and high-throughput transgenesis in Xenopus is illustrated in fig. S7, where several mammalian-Xenopus CNSs at the Six3 locus were assayed for enhancers regulating its eye- and forebrain-specific expression. The analysis suggests that frog-mammal comparisons may be more suitable than fish-mammal comparisons for identifying conserved cis-regulatory elements (see, e.g., CNS5 in fig. S7).

Developmental pathways controlling early vertebrate axis specification were first implicated by work in Xenopus (2), but some interesting amphibian modifications can be found. For example, a Wnt ligand required for dorsal development, named Wnt11b in X. tropicalis, has been lost from mammals, but is found in the chick and zebrafish (as silberblick) (18). Despite its retention in these vertebrates, there is no evidence to support a maternal role in axis formation similar to that in Xenopus. Similarly, a tbx16 homolog, vegT, is retained in frog, fish, and chick, but is uniquely used in Xenopus for the establishment of the endoderm and mesoderm (19).

X. tropicalis also shows multiplications of genes deployed at the blastula and gastrula stages. For example, mammals have a single nodal gene, whereas X. tropicalis has more than six. Synteny relationships reveal that nodal4 on scaffold 204 is orthologous to the single human nodal, whereas a cluster of more than six nodals on scaffold 34 is orthologous to the chicken nodal. Further analysis suggests that these two nodal loci arose in one of the whole-genome duplications at the base of vertebrate evolution and that the birds and mammals subsequently lost different nodal genes, whereas the lizard Anolis carolinensis has retained both copies (4).

The theme of duplication is reiterated by several transcription factors that act during gastrulation (4). The transcriptional activator siamois, expressed in the organizer, is triplicated locally in the genome; so far this gene is unique to the frog. The ventx genes are expressed at the same time, but opposite the organizer, and are present in six linked copies.

Conservation of the vertebrate immune system is highlighted by mammalian and Xenopus genome comparisons (20, 21). Although orthology is usually obvious, synteny has been an important tool to identify diverged genes. For example, a diverged CD8 beta retains proximity to CD8 alpha, and CD4 neighbors Lag3 and B protein. Similarly, an interleukin-2/interleukin-21–like sequence was identified in a syntenic region between the tenr and centrin4 genes. The immunoglobulin repertoire provides further links between vertebrate immune systems. The IgW immunoglobulin was thought to be unique to shark/lungfish, but an orthologous IgD isotype in frog provides a connection between the fish and amniote gene families (22, 23).

Unique antimicrobial peptides play an important role in skin secretions that are absent in birds, reptiles, and mammals. Antimicrobial peptides (caerulein, levitide, magainin, PGLa/PYLa, PGQ, xenopsin), neuromuscular toxins (e.g., xenoxins), and neuropeptides (e.g., thyrotropin-releasing hormone) (24) are secreted by granular glands, and the first group represents an important defense against pathogens (25). Antimicrobial peptides are clustered in at least seven transcription units >350 kbp on scaffold 811, with no intervening genes.

X. tropicalis occupies a key phylogenetic position among previously sequenced vertebrate genomes, namely amniotes and teleost fish. Given the utility of the frog as a genetic and developmental biology system and the large and increasing amounts of cDNA sequence from the pseudo-tetraploid X. laevis, the X. tropicalis reference sequence is well poised to advance our understanding of genome and proteome evolution in general, and vertebrate evolution in particular.

Supporting Online Material

Materials and Methods

SOM Text

Figs. S1 to S9

Tables S1 to S17


Dataset S1

References and Notes

  1. Supporting material is available on Science Online.
  2. Dataset S1 is available on Science Online.
  3. This work was performed under the auspices of the U.S. Department of Energy’s Office of Science, Biological and Environmental Research Program, and by the University of California, Lawrence Berkeley National Laboratory, under contract DE-AC02-05CH11231, Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344, and Los Alamos National Laboratory under contract DE-AC02-06NA25396. This research was supported in part by the Intramural Research Program of the NIH, National Library of Medicine, and by a grant to R.K.W. from the National Human Genome Research Institute (NHGRI U01 HG02155) with supplemental funds provided by the National Institute of Child Health and Human Development. We thank R. Gibbs and S. Scherer of the Human Genome Sequencing Center, Baylor College of Medicine, for their contributions to identification and mapping of simple sequence length polymorphisms.