The Xist RNA Gene Evolved in Eutherians by Pseudogenization of a Protein-Coding Gene

Science  16 Jun 2006:
Vol. 312, Issue 5780, pp. 1653-1655
DOI: 10.1126/science.1126316


The Xist noncoding RNA is the key initiator of the process of X chromosome inactivation in eutherian mammals, but its precise function and origin remain unknown. Although Xist is well conserved among eutherians, until now, no homolog has been identified in other mammals. We show here that Xist evolved, at least partly, from a protein-coding gene and that the loss of protein-coding function of the proto-Xist coincides with the four flanking protein genes becoming pseudogenes. This event occurred after the divergence between eutherians and marsupials, which suggests that mechanisms of dosage compensation have evolved independently in both lineages.

Mammalian X and Y chromosomes evolved from a pair of autosomes shortly after the divergence of mammals from other amniotes (1). In eutherians and in marsupials, the disequilibrium in gene dosage between XY males and XX females is compensated for by silencing one of the X chromosomes in females (2–4). In eutherians, this silencing involves the Xist gene, which is located in the X inactivation center (Xic) and encodes a long untranslated RNA (5). Xic is located on the long arm of the human X chromosome, which corresponded to the proto–X chromosome in the mammalian cenancestor (last common ancestor) (6, 7). This observation is consistent with the hypothesis that X-chromosome inactivation might have emerged contemporaneously with the chromosomal sex-determining mechanism, early in mammalian evolution (8).

To study the evolution of X inactivation, we searched for homologs of Xist in 14 vertebrate genomes (9). We found Xist in all eutherians (Fig. 1), which demonstrates that Xist was already present in the eutherian cenancestor. With BLAST, we failed to detect significant sequence similarity to Xist in noneutherian vertebrates.

Fig. 1.

Phylogenetic distribution of Xist and Lnx3 within vertebrate species for which whole-genome sequence data are available. The phylogenetic tree was adapted from references (16, 17). We searched for homologs of Xist within genomic sequences with BLASTN. The presence of a significant hit homologous to Xist is indicated by a plus sign (+). We searched for homologs of Lnx3 with BLASTP against Ensembl protein predictions and with TBLASTN against genome assemblies, except for the platypus, for which we used whole-genome shotgun sequences. Phylogenetic analyses were conducted to distinguish Lnx3 from its paralogs (fig. S2). The presence of a Lnx3 ortholog is indicated by a plus sign (+).

In humans, the genomic region surrounding the Xist gene contains three protein-coding genes (Cdx4, Chic1, and Xpct) that have orthologs in all vertebrate classes (table S1). The linkage between these genes is conserved in chicken and in Xenopus (Fig. 2A). We will hereafter refer to the genomic interval between Chic1 and Xpct in noneutherian species as the XicHR (Xic homologous region). In eutherians, besides Xist, the Xic region contains two RNA genes (Jpx and Ftx) and two protein-coding genes (Tsx and Cnbp2) (10) (Fig. 2A). Cnbp2 is a retrotransposed gene that is specific to eutherians (9). We failed to detect any homolog of Tsx, Jpx, or Ftx genes in noneutherian vertebrates. In both chicken and Xenopus, the XicHR contains five protein genes (Fip1l2, Lnx3, Rasl11c, UspL, and Wave4) that have no detectable orthologs in eutherian genomes (table S1). The gene content, order, and orientation of the XicHR are perfectly conserved between chicken and Xenopus (Fig. 2A), which indicates that the chicken XicHR (on an autosome) corresponds to the ancestral state in the tetrapod cenancestor.

Fig. 2.

Comparison of the human Xic region and of its orthologous region in opossum, chicken, and Xenopus. (A) Genomic map. Protein-coding genes are indicated in black, RNA genes in gray, and pseudogenes in white. Groups of genes for which there is evidence of homology are surrounded by a rectangle. The assembly of the opossum genome is incomplete, and the order and orientation of contigs (thick black lines) are therefore not known. (B) Alignment of the chicken Fip1l2 genomic region with the human Tsx region. (C) Alignment of the chicken Lnx3 genomic region with the human Xist region. The numbering of Xist exons corresponds to the nomenclature proposed in reference (10). Positions are indicated in base pairs.

To search for possible vestiges of XicHR genes in eutherians, we compared the chicken genomic sequence to its counterpart from four species representative of different eutherian orders (human, mouse, dog, and cow) (9). The XicHR covers 162 kb in chicken (998 kb in human), of which 5% (2%) consists of exons and 3% (59%) of repeat sequences. Comparison of human and chicken sequences revealed 22 alignments in nonrepeated sequences. Although these alignments are short (on average 62 base pairs with 72% identity), eight of them overlap with known exons in chicken, of which five also correspond to exons in human. The probability that such alignments would occur by chance is extremely low (P < 10^-7 in each species) (9), indicating that they correspond to homologous regions, conserved between human and chicken. Overall, we detected 63 distinct fragments in chicken, covering 3.4 kb, that align with at least one of the four eutherian species, with 12 of these alignments overlapping with chicken exons (from Fip1l2, Lnx3, and Rasl11c) (table S2).
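The reasoning behind the chance-overlap argument can be illustrated with a back-of-the-envelope calculation. The sketch below is an assumption for illustration only and is not the test described in the supporting material (9): it treats each of the 22 alignments as an independent draw that lands in an exon with probability equal to the exon fraction of the chicken XicHR (5%), and it ignores alignment length, repeat masking, and sequence composition. Its result should therefore not be compared with the reported P value. The function name `binomial_tail` is hypothetical.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Figures taken from the text: 22 human-chicken alignments, 8 of which
# overlap chicken exons; exons cover about 5% of the chicken XicHR.
# Simplifying assumption: each alignment is a point placed uniformly at random.
p_chance = binomial_tail(n=22, k=8, p=0.05)
print(f"Chance probability under the simplified model: {p_chance:.2e}")
```

Even under this crude model, observing eight or more exon overlaps among 22 alignments is very unlikely, which is the intuition behind interpreting the alignments as traces of homology.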

There are six exons of Fip1l2 that show homology with the human or mouse sequence. Three of them correspond to exons of Tsx (Fig. 2B). The protein alignment revealed that the mouse Tsx is a truncated gene, encoding a protein orthologous to the N-terminal end of Fip1l2. Tsx is functional (transcribed and translated) in mouse and rat, but is evolving very rapidly (11). Tsx is a pseudogene in human (10), as it is in dog and cow. We identified the four exons of the Rasl11c gene in cow and dog (one in human, none in mouse), but in all these species, Rasl11c has become a pseudogene. Two exons of Lnx3 are homologous to Xist (Fig. 2C). The first corresponds to Xist exon h4/m4, which is well conserved in eutherians (Fig. 3A and fig. S1). The second corresponds to the exon h5/m6, which, although conserved in human and mouse, is more divergent in dog and cow (Fig. 3B). The probability that, by chance, two independent alignments overlap exons in both species is extremely low (5 × 10^-5), which indicates that these exons of the Xist RNA gene are homologous to the Lnx3 protein gene.

Fig. 3.

Alignment of the two homologous regions of chicken Lnx3 and eutherian Xist. (A) Sequence alignment [computed with MUSCLE (18)] of the 5′ part of Xist exon h4/m4 and of Lnx3 exon 3 (Ensembl transcript ENSGALT00000012483). (B) Alignment of the entire Xist exon h5/m6 and Lnx3 exon 9. Exon boundaries are indicated in bold. Sites that are conserved in all species or in four out of five species are indicated respectively by an asterisk (*) and a colon (:).
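The conservation marks described in this legend can be derived mechanically from a multiple alignment. The following is a minimal sketch, assuming gapped sequences of equal length such as those produced by an aligner like MUSCLE; the function name `conservation_line` and the toy sequences are hypothetical and are not taken from the figure. For five sequences, "all but one" corresponds to the four-out-of-five rule of the legend; applying the same rule to other sequence counts is an assumption of this sketch.

```python
from collections import Counter


def conservation_line(aligned: list[str]) -> str:
    """Return one mark per alignment column.

    '*' : the same residue in every sequence
    ':' : the same residue in all but one sequence
    ' ' : anything less conserved (gap characters never count as residues)
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    n = len(aligned)
    marks = []
    for column in zip(*aligned):
        counts = Counter(base for base in column if base != "-")
        top = max(counts.values(), default=0)
        if top == n:
            marks.append("*")
        elif top == n - 1:
            marks.append(":")
        else:
            marks.append(" ")
    return "".join(marks)


# Toy five-sequence alignment (hypothetical data, not the Xist/Lnx3 sequences).
toy_alignment = [
    "ACGTAC",
    "ACGTAC",
    "ACGAAC",
    "ACG-AC",
    "ACTTGC",
]
marks = conservation_line(toy_alignment)
```

Printing `marks` beneath the aligned sequences gives a conservation line in the same style as the figure.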

In marsupials, the XicHR is located on the X chromosome (7, 12). We have sequenced an opossum genomic clone including Rasl11c and the 5′ end of Lnx3, and we have sequenced the Lnx3 mRNA (9). We have identified Wave4 in sequence databases (Fig. 2A and table S1). Phylogenetic analyses indicate that these three genes are functional in the opossum (see fig. S2 for details on Lnx3). Thus, the loss of protein-coding function of Lnx3 occurred in the eutherian lineage and was concomitant with the pseudogenization of at least two of the four other XicHR genes.

Lnx3 is conserved in all vertebrate classes and is highly similar to its paralogs Lnx1 and Lnx2 (13). The exons conserved in Xist correspond to two PDZ motifs, and both contain frameshift mutations (Fig. 3). By screening databases of expressed sequence tags, we found that in both chicken and Xenopus, Lnx3 is transcribed in varied tissues and developmental stages. In the opossum, Lnx3 is expressed both in males and females, a behavior very different from that of Xist in eutherians. In mice, although Xist exon h4/m4 (which is homologous to Lnx3) is dispensable for X inactivation (14), the exon has been shown to affect the transcription and/or processing of Xist RNA (14). This suggests that Xist might have retained some regulatory elements of the Lnx3 transcription unit.

Our results show that two exons of Xist derive from Lnx3. However, Lnx3 and Xist contain, respectively, 11 and 6 other exons, for which we failed to detect significant similarities. Notably, we did not detect homology to the Xist A-repeat, which is a discrete sequence element implicated in X-silencing function (15). This lack may be because RNA genes and protein genes are subject to very different selective constraints and may rapidly diverge. It is also possible that the first exons of Xist are not homologous to Lnx3 and derive from the insertion of a sequence (e.g., a transposable element) that was recruited to form a proto-Xist gene. We analyzed the opossum genomic interval between Rasl11c and Lnx3 to search for hallmarks of a potential proto-Xist gene, but failed to detect any significant similarity with Xist, even using the most accurate alignment software (9). Given that Xist exons are highly conserved among eutherians, the lack of similarity with the opossum strongly suggests that marsupials do not contain any proto-Xist gene at this locus and, hence, that Xist is specific to eutherians.

The mechanisms of dosage compensation in marsupials and eutherians both involve chromosome-wide X inactivation (XCI), but with some significant differences. In marsupials, it is always the paternal X chromosome that is inactivated, and the inactivation, which is incomplete and tissue-specific, does not seem to involve DNA methylation (4). Our results, moreover, indicate that in marsupials, XCI does not involve Xist. In monotremes, the XicHR has been translocated to an autosome, which indicates that dosage compensation does not require this locus (6). There is, therefore, no evidence that the processes of dosage compensation in eutherians, marsupials, and monotremes are homologous. It is possible that Xist-independent XCI existed in the mammalian cenancestor and that Xist took over this mechanism in eutherians. However, it should be stressed that in the earliest stages of the divergence of the X and Y chromosomes, most of the X-linked genes still had an active Y homolog and so did not need dosage compensation. It is only after the Y chromosome had lost a large number of genes that it might have become advantageous to achieve dosage compensation by inactivation of the whole X chromosome. We, therefore, propose that the emergence of XCI might be a late event in the evolution of sex chromosomes.

Supporting Online Material

Materials and Methods

SOM Text

Figs. S1 and S2

Tables S1 and S2

References and Notes
