Report

Birth of Two Chimeric Genes in the Hominidae Lineage

Science  16 Feb 2001:
Vol. 291, Issue 5507, pp. 1293-1297
DOI: 10.1126/science.1057284

Abstract

How genes with newly characterized functions originate remains a fundamental question. PMCHL1 and PMCHL2, two chimeric genes derived from the melanin-concentrating hormone (MCH) gene, offer an opportunity to examine this issue in the human lineage. Detailed structural, expression, and phylogenetic analyses showed that the PMCHL1 gene was created about 25 million years ago (Ma) by a complex mechanism of exon shuffling through retrotransposition of an antisense MCH messenger RNA coupled to de novo creation of splice sites. PMCHL2 arose 5 to 10 Ma by a duplication event involving a large chromosomal region encompassing the PMCHL1 locus. The RNA expression patterns of these chimeric genes suggest that they have been subjected to strong regulatory constraints during primate evolution.

Processes of exon shuffling, retrotransposition, and gene duplication have been suggested to lead to creation of newly found genes with specific expression characteristics and to fixation of advantageous novelties by acquisition of functional constraints (1, 2). However, because of the rapid sequence divergence characteristic of previously unknown genes, the study of the origin of a gene in detail requires the discovery of a young gene, and in particular one that has retained important features of its early stages (3, 4). Because of their recent history, two human chimeric genes, PMCHL1 and PMCHL2, open an unprecedented way to analyze the molecular mechanisms of gene remodeling and selection of functions that have operated during the late stages of primate evolution.

The PMCHL genes were named pro-MCH–like 1 and 2 genes (PMCHL1 and PMCHL2) on the basis of partial identity to the MCH gene (5). The human MCH gene maps on chromosome 12q23 and encodes a neuropeptide precursor, whereas PMCHL1 and PMCHL2 are located on human chromosome 5p14 and 5q13, respectively, and correspond to 5′-end truncated versions of the MCH gene (6). In previous studies, we showed that the PMCHL genes arose recently during primate evolution by a first event of truncation/transposition from the ancestral chromosome 12 to the ancestral chromosome 5p about 25 to 30 Ma, i.e., before divergence of the Cercopithecoidae. This was followed by a second duplication event, which operated in the Hominidae lineage about 5 to 10 Ma and which distributed the two genes on each side of the chromosome 5 centromere (7). Both unspliced sense and antisense transcripts from the PMCHL1 gene but not the PMCHL2 gene have been observed in different areas of the developing human brain (8, 9). A puzzling issue concerns the relation between their recent emergence and their putative function or, more precisely, whether the PMCHL genes are functional genes not previously characterized or inactive pseudogenes. This made it crucial to further study the structure, expression, and early molecular evolution of the PMCHL genes.

The focus on the molecular mechanisms responsible for the emergence of MCH-derived sequences on human chromosome 5 first came from parallel studies on the regulation of MCH gene expression undertaken in our laboratory. We recently identified, in human and rodents, two classes of antisense RNAs complementary to the MCH gene (10): (i) spliced-variant mRNAs complementary in their 3′ end to the MCH gene, encoding newly found DNA/RNA binding proteins, and (ii) short noncoding unspliced RNAs that overlap only the coding part of the MCH gene (MCH exons II and III) and initiate at cap site CS3-5 (Fig. 1A). This transcriptional unit was named AROM for antisense-RNA-overlapping-MCH gene (10). Concurrently, our analysis of the structure of the PMCHL genes revealed the presence of a stretch of A residues at the end of the MCH-derived portion that exactly coincides with one of the polyadenylation [poly(A)] sites found within the AROM gene, polyA(b) (Fig. 1A). This led to the conclusion that an MCH-derived sequence was likely inserted in the ancestral chromosome 5p by retrotransposition of an AROM messenger RNA, incidentally strongly expressed in testis (10), as depicted in Fig. 1B.

Figure 1

(A) Extent of the homology between the MCH/AROM locus on 12q24 and the PMCHL loci on 5p14/5q13. The MCH/AROM and PMCHL exon structures given here are based on Borsu et al. (10) and Viale et al. (9), respectively. MCH- and PMCHL-derived exons are marked with roman numerals, and AROM exons are in arabic numerals. Dotted lines define the limits of the 12q24 sequence that was retrotransposed onto chromosome 5 during primate evolution. The position of the region of homology and exon-intron nomenclature are as previously described (9). Inverted black triangles correspond to AROM polyadenylation sites [poly A (a, b, or c)]. Arrows (CS1 and CS2) and the thick black line (CS3-5) represent the AROM cap sites (CS) (10). Percent homology between the MCH/AROM and PMCHL loci is also indicated. AAAA illustrates the poly(A) tail found at the end of the retrotransposed sequence: (A)11 on 5p14 and (A)14 on 5q13. GenBank accession numbers are as follows: PMCHL1, AF238382; PMCHL2, AF238383; MCH, M57703; and AROM, AF303035. (B) Proposed model for the emergence of MCH-derived sequence onto chromosome 5p. (a) An AROM mRNA initiating in the CS3-5 region and ending at the poly A (b) polyadenylation site was retrotransposed onto the equivalent of chromosome 5p at the time of Catarrhini divergence 25 to 30 Ma. (b) After this first event or concurrent with it, an Alu sequence was inserted in intron A and a fragment corresponding to the 3′ end of the retrotransposed mRNA (part of exon II-intron A-Alu) was broken and transposed to the downstream insertion site. This led to the PMCHL gene versions observed in Cercopithecoidea and Hominoidea.

By combining “in silico” (through computer modeling) screening [BLAST search of GenBank against many databases in the Web site of the National Center for Biotechnology Information of the National Institutes of Health (11)] and direct sequencing of bacterial artificial chromosome (BAC) clones specific to the chromosomal regions 5p14 and 5q13 (12), the genomic structure of the PMCHL genes was further compared. According to the Web survey, several expressed sequence tags (ESTs) were found in two categories: (i) 3′ cDNA clone IMAGE ah92f11.s1 and qf54b04.x1, which are parts of PMCHL1 spliced sense transcripts and (ii) 3′ cDNA clone IMAGE qf66aO4.x1, al54h4.s1, and al47h07.s1, corresponding to parts of PMCHL2 unspliced antisense transcripts and indicating that the regulation of the expression of the PMCHL genes was far more complex than previously thought (9). Structural analysis of those genes was refined by using rapid amplification of cDNA ends and polymerase chain reaction (RACE-PCR) and reverse transcriptase–PCR (RT-PCR) (13) in conjunction with the genomic analysis.

As shown in Fig. 2A, we revealed PMCHL1/PMCHL2 gene expression in human testis and established the precise 5′ and 3′ ends of the sense and antisense PMCHL1 RNA unspliced products previously described in different areas of the human brain (9). We also found in human fetal brain and in human adult testis several classes of alternatively spliced mRNAs (Fig. 2B). This suggested that on both loci, MCH-derived, retrotransposed sequences recruited a group of downstream exons and introns into their transcription units, thereby creating previously unknown genes with a chimeric structure. The existence of such an impressive variety of PMCHL1 and PMCHL2 transcripts resulted from the use of four polyadenylation sites (A1-A4) and a tissue-specific modulation of alternative splicing (Fig. 2B). Several cap sites were also found on the basis of RACE-PCR experiments. PMCHL2 cap sites were mainly located from 500 base pairs (bp) to more than 2 kb upstream of the insertion site, whereas PMCHL1 cap sites were found 500 bp upstream as well as 50 to 100 bp downstream of the insertion site. However, because of the complex population of mRNAs in all the tissues analyzed, it was not possible to assign a precise cap site to each class of mRNAs. Even though we cannot exclude artifactual pausing of the reverse transcriptase during synthesis of the cDNA products, this suggests that alternative splicing coupled to different starting points of transcription is probably a mechanism that allows the cell to generate a “wide repertoire” of PMCHL gene transcripts.

Figure 2

Schematic representation of the PMCHL transcripts and potentially functional ORFs. (A) Sense and antisense unspliced mRNA products. (B) Alternatively spliced transcripts. Dotted lines delineate the chromosome 12 recruited region. The genomic organization of the transcription units is indicated in each case. The exons are boxed in gray and numbered in arabic numerals. Exon x′ illustrates alternative 3′ splice donor sites. 4′t and 4′b are tissue-specific 3′ splice donor sites; T is for testis and FB is for fetal brain. White stripes at the 5′ end of the RNAs indicate that a unique precise cap site was not assigned to these populations of mRNAs. The gene- and tissue-specificities of expression are indicated for each class of RNA: 5p and 5q are for PMCHL1 and PMCHL2 transcription units, respectively. Polyadenylation sites are represented by small dark bars (A1-A4, As1, As2). Canonical polyadenylation signals AATAAA were found a few bases upstream of the sites of poly(A) addition (A1, As1, and As2). Putative polyadenylation signals ATTAAA were also found 29 and 17 bases 5′ to the A2 and A3 sites of poly(A) addition, respectively, and a GATAAA signal was found 40 bases 5′ to the A4 site. Although nonconventional, ATTAAA and GATAAA have been previously noted to serve as polyadenylation signal sequences (5, 22, 28). Black lines indicate the extent of the potentially functional ORFs. Upper black lines are ORFs specific to the PMCHL1 transcripts (5p locus) and lower black lines are ORFs specific to the PMCHL2 transcripts (5q locus). The translation of DNA sequences to protein sequences was conducted in the Web site of NCBI of the NIH (www.ncbi.nlm.nih.gov/).
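The search for polyadenylation signals upstream of a poly(A) addition site, as summarized in the legend above, can be illustrated with a minimal Python sketch. It is not the procedure used in this study. The function name, the 50-base search window, the toy sequence, and the convention that distance is counted from the last base of the hexamer to the addition site are all assumptions made for illustration.

```python
# Illustrative sketch only; names, window size, and distance convention are assumptions.
SIGNALS = ("AATAAA", "ATTAAA", "GATAAA")  # hexamers named in the Fig. 2 legend


def signals_upstream(seq, site, window=50, signals=SIGNALS):
    """List (hexamer, distance) pairs found 5' to a poly(A) addition site.

    `site` is the 0-based index of the first base after the addition point.
    `distance` is the number of bases between the hexamer's last base and
    the site (an assumed convention; the text does not specify one).
    """
    seq = seq.upper()
    lo = max(0, site - window)
    hits = []
    # The hexamer must end at or before the site, so its start is at most site - 6.
    for start in range(lo, site - 5):
        hexamer = seq[start:start + 6]
        if hexamer in signals:
            hits.append((hexamer, site - (start + 6)))
    return hits


# Invented toy sequence: an ATTAAA hexamer followed by 17 bases before the site.
toy = "C" * 20 + "ATTAAA" + "C" * 17 + "G" * 5
toy_site = 20 + 6 + 17
hits = signals_upstream(toy, toy_site)
```

On real data the addition sites would come from the 3′ RACE products, and the window would be chosen to cover the most distant signal of interest.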

The longest open reading frames (ORFs) initiated from an ATG codon in a reasonable translation initiation context (14) were deduced from the mRNA sequences obtained by RACE-PCR and RT-PCR. Two major classes of ORFs (≥33 amino acids) were found regardless of the alternative splicing pattern (bracketed in Fig. 2): (i) ORFs encoded by exon 1 and intron A (unspliced RNAs) and exon 1/exon 2/exon 2′ (spliced RNAs) exhibit a strong similarity with pro-MCH, and (ii) ORFs encoded by exons 4 to 5a and 5b display no sequence similarity with known proteins. No long ORF could be found for antisense RNAs. We previously demonstrated that sense unspliced PMCHL1 transcripts may produce a nuclear localization signal (NLS)–containing protein deduced from ORF 1 sequence (Fig. 2A) in an in vitro translation assay and in transfected Cos cells (9). Direct proof of the translational ability of the spliced mRNA products described here is still lacking. However, that both PMCHL1 and PMCHL2 are specifically and differentially regulated in testis and that only PMCHL1 is expressed in human fetal as well as newborn and adult brains (9) (Fig. 2) is consistent with the conclusion that those newly originated genetic elements are transcriptionally active and tightly regulated genes.
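As a rough illustration of the ORF criterion quoted above (ATG-initiated, at least 33 amino acids), the following Python sketch scans the three forward frames of a sequence. It is a simplification and not the tool used in this study, which relied on the NCBI Web site. It does not score the translation initiation context, it reports only ORFs that end in a stop codon, and the function name and toy sequence are assumptions.

```python
# Illustrative sketch only; not the authors' pipeline.
STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_CODONS = 33  # threshold quoted in the text (ORFs of >= 33 amino acids)


def find_orfs(seq, min_codons=MIN_CODONS):
    """Return sorted (start, end, n_codons) tuples for ATG-initiated ORFs.

    Coordinates are 0-based and end-exclusive; `end` includes the stop codon,
    while `n_codons` counts amino acids only (stop excluded). Only the first
    ATG of each ORF is reported, and only the given strand is scanned.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None
    return sorted(orfs)


# Invented toy transcript: ATG, 40 alanine codons, then a stop codon.
toy_mrna = "CC" + "ATG" + "GCT" * 40 + "TAA" + "GG"
toy_orfs = find_orfs(toy_mrna)
```

To cover antisense RNAs as well, the same function could be applied to the reverse complement of the sequence.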

To determine whether the divergent expression patterns of PMCHL1 and PMCHL2 could be explained by a different genomic environment in the flanking regions, we expanded our comparative analysis of the genomic structure of the PMCHL genes. The nucleotide sequence of the PMCHL1 and PMCHL2 genomic regions over 17 kb revealed similar genomic environments with strong sequence identity (98%) between the 5p14 and 5q13 loci. To further delineate the extent of the region duplicated on both arms of chromosome 5, we performed fluorescent in situ hybridization (FISH) analysis on human metaphase chromosomes with several BAC clones bearing the PMCHL1 locus and extending more than 100 kb both 5′ and 3′ to this gene (namely 2303C18, 344I3, 283L20, and 811M22) (Fig. 3A). All those clones displayed the same hybridization patterns with strong cross-hybridization on both arms of the human chromosome 5 at bands 5p14 and 5q13 (Fig. 3B). This showed that the duplication event that took place 5 to 10 Ma involved a large region of ancestral 5p14 encompassing several hundred kilobases. However, further studies are required to delineate the particular environment of cis-regulatory elements driving the striking tissue-specific expression of both PMCHL genes.
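A percent-identity figure such as the 98% quoted above is typically computed over aligned positions. The following Python sketch shows one common convention, assuming the two sequences have already been aligned and that columns containing a gap are excluded from the count. The text does not state which convention was used here, so the gap handling, the function name, and the toy sequences are assumptions.

```python
# Illustrative sketch only; the gap convention is an assumption.
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical bases over ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Invented toy example: 100 aligned bases with two substitutions.
locus_a = "ACGT" * 25
locus_b = list(locus_a)
locus_b[10] = "A"  # G -> A
locus_b[50] = "T"  # G -> T
locus_b = "".join(locus_b)
identity = percent_identity(locus_a, locus_b)
```

Other conventions count gapped columns as mismatches, which lowers the figure when insertions or deletions are present.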

Figure 3

(A) Genomic structural organization of the PMCHL genes. 15.4 kb of genomic sequence from both PMCHL loci was obtained by direct sequencing (both forward and reverse strands) of the 5p14-specific 283L20 and the 5q13-specific 484D2 BAC clones bearing the PMCHL1 and PMCHL2 loci, respectively (12). Dashed line represents the 1.6-kb unsequenced part of intron C. Arrows indicate BAC clone ends (not drawn to scale), and the lines represent the extent of the clones. Their localization and orientation were determined by in silico screening (www.tigr.org/). BAC clones in red were used for in situ hybridization analysis on metaphase chromosomes. All the clones described in this study come from the CIT-HSP BAC library. Blue boxes correspond to interspersed repeated sequences (same orientation as PMCHL genes, light blue; opposite orientation, dark blue). a, LINE/L2; b, SINE/MIR-LINE/L2; c, SINE/Alu; d, LTR/THE-1B; e, SINE/Alu; f, MER91A; g, SINE/MIR; h, LINE/L1MA8; i, SINE/MIR-LINE/L1M1; j, LTR/ERVL-LINE/L1MA9; k, SINE/Alu; and l, LTR/MLT1E2. GenBank accession numbers are as follows: PMCHL1, AY08405 and PMCHL2, AY08406 (29). (B) FISH on human chromosomes with the chromosome 5p–specific BAC clone 283L20 (left) and 811M22 (right). (C) FISH of the same mouse metaphase with the chromosome 5p–specific BAC clone 811M22 (left) and a whole-chromosome painting (WCP) probe for mouse chromosome MMU15 (right). FISH was performed as previously described (15) on metaphase chromosomes from human peripheral blood lymphocytes and from mouse SV22-CD cell line. Fluorescent images were captured using a high-resolution cooled charge-coupled device (CCD) camera C4880 (Hamamatsu). Image acquisition, processing, and analysis were performed using the Vysis software package (Quips SmartCapture FISH).

As we suggested above, the source of the 5′ exons was identified as a retrotransposed sequence originating from the MCH/AROM locus. However, the origin of the 3′ exons remained unclear. We examined several hypotheses concerning the origin of these non–MCH-derived PMCHL exons: (i) these exons might be part of, or a duplicate of, an unrelated previously existing gene, supporting the concept of exon shuffling or, alternatively, (ii) these exons might originate from a unique genomic sequence that fortuitously evolved as a standard intron-exon structure and regulatory sequences for PMCHL.

To study the early molecular evolution of the PMCHL transcription units, we first performed a FISH analysis (15) on mouse metaphase chromosomes with BAC clones surrounding the area of insertion of the MCH-derived sequences (namely 2303C18, 344I3, and 811M22) (Fig. 3A). Only the 811M22 BAC clone, bearing the 3′ PMCHL exons but not the 5′-transposed portion of the gene, displayed a clear unique hybridization signal. This signal was found on the pericentromeric region of the mouse chromosome 15 (Fig. 3C). After comparing this result with the mapping data found in the “Mendelian Inheritance in Man gene map” and “mouse to human homology region map” databases (16), we propose that the transposed MCH sequence was inserted in a region close to the site of evolutionary rearrangement that disrupted the conserved synteny relationship with the mouse Mus musculus genome from MMU13 to MMU15. Furthermore, probes bearing the 3′ exons did not reveal any cross-hybridization signal on mouse and primates [this study, (7)] and these exonic sequences did not display any similarity to any sequence of the GenBank database except the IMAGE cDNA clones previously cited. This ruled out the hypothesis that the 3′ exons might be a duplicate of an unrelated previously existing gene. However, this does not exclude that the retrotransposed sequence may have been inserted in a pre-existing gene on 5p.

To test this alternative, the phylogeny of the PMCHL intronic and exonic sequences was analyzed. We attempted to amplify the corresponding region from DNA samples from nine species of primates and from mouse by using the set of primers used to amplify intronic and exonic sequences of human genomic DNA (17). Several PCR products of the same size as those obtained from human DNA were obtained from seven primate species [Pan troglodytes (PTR), Pan paniscus (PAN), Pongo pygmaeus (PPY), Hylobates lar (HLA), Cercopithecus hamlyni (CHA), Papio papio (PAP), Cebus capucinus (CCA)]. All of the amplified products obtained from anthropoids were sequenced and compared with the human DNA sequence.

The comparative phylogenetic analysis of the PMCHL intron-exon boundaries (Fig. 4) revealed that consensus sequences at the 5′ donor splice site and in the 3′ acceptor splice site of the PMCHL1 intron A (intron Bv, Fig. 1A) were conserved in all the primates, suggesting the existence of a functional constraint. Similarly, strong conservation of sequences was noted at the intron B and C boundaries. In contrast, a splice donor site in intron D was created in Cercopithecoidae (CHA) as a result of a C to T substitution at nucleotide +2. Alternative splice acceptor sites for exon 5a and exon 5b were also created by nucleotide substitution, GA to AG in Hylobatidae (HLA) and G to A at nucleotide +1 in Cercopithecoidae (PAP and CHA), respectively. Furthermore, poly(A) signals PS2 and PS3 corresponding to the poly(A) addition sites A2 and A3 (Fig. 2B) were also found to be the sites of mutations. Interestingly, a C nucleotide was found at nucleotide +3 of PS2 in CHA and PAP but not CCA, suggesting that this mutation arose specifically in the Cercopithecidae (Fig. 4). Furthermore, HLA possesses the same ATTAAA sequences as the ones found in human, whereas CCA, PAP, and CHA have GA and TC at nucleotides +3 and +4 in PS3 (Fig. 4). Therefore, these results are consistent with the hypothesis that the 3′ part of the PMCHL transcription unit evolved from noncoding DNA in a common ancestor of Hominoids as a result of the creation of standard intron-exon boundaries and poly(A) signals that have been conserved in humans.
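The way a single substitution can create a canonical splice site, as described above for the C to T change at nucleotide +2 of the donor site, can be sketched in a few lines of Python. The check below looks only at the nearly invariant GT/AG dinucleotides at the intron ends and ignores the extended consensus. The function names and the intron sequence are invented for illustration and are not taken from the PMCHL loci.

```python
# Illustrative sketch only; the intron sequence below is invented.
def intron_ends(intron):
    """Return the first two and last two bases of an intron, in lowercase."""
    intron = intron.lower()
    return intron[:2], intron[-2:]


def has_canonical_ends(intron):
    """True if the intron starts with gt (donor) and ends with ag (acceptor)."""
    donor, acceptor = intron_ends(intron)
    return donor == "gt" and acceptor == "ag"


def substitute(seq, position, base):
    """Replace the base at a 1-based position (+1 is the first intronic base)."""
    i = position - 1
    return seq[:i] + base + seq[i + 1:]


ancestral_intron = "gcaagtttttctcag"                     # gc...ag: no canonical donor
derived_intron = substitute(ancestral_intron, 2, "t")    # C to T at nucleotide +2
```

Here the ancestral sequence fails the check and the derived one passes it, which is the kind of change the comparison across species is designed to detect.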

Figure 4

Phylogenetic analysis of the intron-exon boundaries and poly(A) signals of the PMCHL gene. Exonic nucleotidic sequences are in uppercase letters, and intronic nucleotidic sequences are in lowercase. The most extended consensus sequences at the 5′ splice donor site and 3′ splice acceptor site are indicated. The nearly invariant dinucleotides GT/AG at the extreme 5′ (donor) and 3′ (acceptor) ends of the introns are in bold characters. Dashes indicate identity to the human sequence. Sequence differences at the consensus sites are in gray. Sequences are arranged according to the evolutionary lineage. Intron C does not possess a canonical functional 5′ donor end; it has TT instead of GT dinucleotide. GenBank accession numbers are as follows: PAN sequences, AY008414, AY008423, and AY008426; PTR sequences, AY008416, AY008418, AY008424, AY008429, and AY008433; PPY sequences, AY008415, AY008419, AY008422, and AY008425; HLA sequences, AY008417, AY008420, AY008421, AY008427, and AY008432; CHA sequences, AY008428 and AY008430; CCA sequence, AY008431.

In CHA and PAP, which do not carry functional splice sites, we succeeded in amplifying only a small part of AROM/MCH retrotransposed sequence from the genomic DNA. In addition, a strong divergence of PMCHL1 sequence was noted in these species reflecting weak selective constraint (18). The similar exon structure of the PMCHL genes found in HSA, PAN, PTR, PPY, and HLA together with the divergence of sequence of the retrotransposed AROM/MCH sequences in the Cercopithecoidae indicates that there was a relatively short time between the first insertion event and the subsequent mutation events leading to the recruitment of intronic and exonic components into a functional transcription unit and the speciation. As expected for emerging functions, the underlying genes were likely to undergo fast divergence until they gained stronger physiological constraints. This strongly suggests that the PMCHL gene was conserved in Hominidae due to the acquisition of some constraints, probably an emerging role in primates.

Our results reveal the molecular, genetic, and evolutionary mechanisms that participated in the origin of two chimeric functional genes, PMCHL1 and PMCHL2, in the Hominidae lineage (Fig. 5). Taken together, our data on the tissue-specific expression and the conserved features of the PMCHL genes suggest that their mRNA or protein have been “exapted” into a functional role [i.e., co-opted into a variant or newly characterized function (19)] in the primate lineage. The identification of the many processes in genome evolution has shown that de novo generation of building blocks (single genes or gene segments coding for protein domains) seems to be rare. Instead, genome novelty was mainly built by modification, duplication, and functional changes of the available blocks by processes of gene duplication, exon shuffling, or retrotransposition of genes (3, 20–24). In the context of human genome evolution, the previously unknown mechanism of transcript fusion of the adjacent Kua and Uev genes was recently proposed to create a chimeric Kua-Uev mRNA and the cognate fused protein (25, 26). However, in the case we described, the recruited portion fused to the AROM/MCH-derived sequences was shown to have originated from a unique noncoding sequence. Moreover, the complex structure and evolutionary history of PMCHL encompass several phenomena pointing to an important role for introns in the origin of newly characterized genes, as the exon theory of genes has suggested (27): (i) emergence of the 5′ exons by an event of duplication of a 5′-end truncated part of the MCH gene via a process of retrotransposition of an antisense MCH mRNA; (ii) creation of 3′ exons from a unique noncoding genomic sequence that fortuitously evolved as a standard intron-exon structure and polyadenylation signal sequences; (iii) alternative transcriptional initiation and splicing processes, further complicated by the presence of antisense RNAs; and (iv) a nested gene encoding unspliced mRNA products. In the context of genome research, the existence of such gene structures poses a particular dilemma for the prediction of exons from genome sequence data. In fact, the complex gene structure of the PMCHL loci, as described here, was not predicted from the genome sequence and exon prediction programs (GRAIL, Fex, Hexon, MZEF, Genemark, Genefinder, Fgene, Polyah).

Figure 5

Proposed model for the emergence of the chimeric PMCHL1 and PMCHL2 genes during primate evolution. An MCH-derived sequence originated on chromosome 5p by a complex event of retrotransposition (detailed in Fig. 1B) at the time of Catarrhini divergence 25 to 30 Ma. Intron-exon boundaries and poly(A) signals were created by subsequent mutation processes before the divergence of Hylobatidae, 15 to 20 Ma, leading to the chimeric gene structure observed in the Hominoidae. A last event of duplication involving a large region of ancestral 5p14 encompassing several hundred kb has led to the distribution of PMCHL1 and PMCHL2 on each side of the chromosome 5 centromere. This operated in the Hominidae lineage, about 5 to 10 Ma. Exons based on mRNA characterized in human are boxed in gray or white and marked with arabic numerals. The brackets indicate consensus alternative splice acceptor sites for exons 4 and 5b. Polyadenylation sites are represented by small dark bars. Arabic numerals in gray indicate the location of unique noncoding sequences that gave rise to exons. Dashed lines indicate that the MCH-derived sequence was absent in Platyrrhini.

* To whom correspondence should be addressed. E-mail: nahonjl{at}ipmc.cnrs.fr

