Report

Construction and Analysis of a Human-Chimpanzee Comparative Clone Map


Science  04 Jan 2002:
Vol. 295, Issue 5552, pp. 131-134
DOI: 10.1126/science.1065199

Abstract

The recently released human genome sequences provide us with reference data to conduct comparative genomic research on primates, which will be important to understand what genetic information makes us human. Here we present a first-generation human-chimpanzee comparative genome map and its initial analysis. The map was constructed through paired alignment of 77,461 chimpanzee bacterial artificial chromosome end sequences with publicly available human genome sequences. We detected candidate positions, including two clusters on human chromosome 21 that suggest large, nonrandom regions of difference between the two genomes.

With the draft human genome sequences (1, 2), comparative genomics has become a powerful approach to extract genetic information from large stretches of nucleotide sequences through identification of conserved regions that are most likely functionally important. Genomic information is also the most valuable resource for understanding the genetic differences between species, a basis for deciphering how genome information is processed into phenotypes. Because chimpanzees are our closest relatives, the differences between us are less than with any other species, yet these differences are more likely to be important. It has been estimated that the sequence identity between human and chimpanzee is within the range of 98 to 99% (3–12). Thus, comparisons between humans and chimpanzees are the most efficient and effective approach to understand what makes us human.

In this report we present the construction and analysis of a first-generation human-chimpanzee comparative genomic map based on the alignments of 77,461 chimpanzee bacterial artificial chromosome (BAC) end sequences (BESs) to human genomic sequences obtained from the public databases. To prepare the BESs, we used two independently prepared BAC libraries, PTB1 and RPCI-43 (Table 1) (13). Briefly, we sequenced 64,116 BAC clones (roughly 3.3 times coverage of the currently available human contiguous genomic sequence) that produced 114,421 valid BESs (13). The BESs were then aligned with the RefSeq human genome contigs [National Center for Biotechnology Information (NCBI)] through NCBI-BLAST. The number of BESs having an alignment longer than 50 base pairs (bp) with ≧90% identity was 77,461 (13). Out of this number, 49,160 BESs from 24,580 clones formed paired ends where each pair was derived from the same clone. Only one end could be successfully aligned from the remaining 28,301 clones. The remaining 36,960 BESs that were not mapped to the human genome were categorized into three different classes: (i) those corresponding to repeat sequences (1168 BESs) or showing hits to human sequences not included in the NT contigs (20,376 BESs), (ii) those matched only with sequences from several species other than human (515 BESs), and (iii) the 14,901 BESs that did not match with human sequences, which either correspond to unsequenced human regions or are from chimpanzee regions that have diverged substantially from humans or did not match for other unknown reasons.
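The filtering and bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the tuple layout, the function names, and the toy clone identifiers are hypothetical, and only the two thresholds (alignment longer than 50 bp, identity of at least 90%) and the reported totals come from the text.

```python
from collections import defaultdict

MIN_ALIGN_LEN = 50    # alignment must be longer than 50 bp (from the text)
MIN_IDENTITY = 90.0   # identity must be at least 90% (from the text)


def passes_threshold(align_len, identity):
    """Return True if a BES alignment meets both criteria."""
    return align_len > MIN_ALIGN_LEN and identity >= MIN_IDENTITY


def classify_clones(hits):
    """Split clones into paired-end and single-end groups.

    hits: iterable of (clone_id, end, align_len, identity), where end is
    'F' or 'R'. This record layout is a hypothetical one for illustration.
    """
    ends_by_clone = defaultdict(set)
    for clone_id, end, align_len, identity in hits:
        if passes_threshold(align_len, identity):
            ends_by_clone[clone_id].add(end)
    paired = sorted(c for c, ends in ends_by_clone.items() if len(ends) == 2)
    singleton = sorted(c for c, ends in ends_by_clone.items() if len(ends) == 1)
    return paired, singleton


# Toy input (invented values, not data from the study).
toy_hits = [
    ("cloneA", "F", 420, 98.9),
    ("cloneA", "R", 388, 99.1),
    ("cloneB", "F", 510, 97.5),
    ("cloneB", "R", 45, 99.0),   # rejected: alignment too short
    ("cloneC", "F", 300, 85.0),  # rejected: identity too low
]
paired, singleton = classify_clones(toy_hits)

# Consistency check on the totals reported in the text.
aligned_total = 2 * 24_580 + 28_301              # paired ends + single ends
unmapped_total = 1_168 + 20_376 + 515 + 14_901   # classes (i) to (iii)
assert aligned_total == 77_461
assert aligned_total + unmapped_total == 114_421
```

The final assertions confirm that the paired, single-end, and unmapped categories reported above add up to the 114,421 valid BESs.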

Table 1

Summary of BES readings and mapping.


The BESs mapped with high confidence (13) were used to calculate the difference between the chimpanzee and human genomes at the nucleotide level. The number of sites in valid alignments (nucleotide sites that have PHRED quality values q ≧ 30) was 19,813,086. Out of this number, 19,568,394 sites were identical to their human counterparts for a mean percent identity of 98.77. This value is consistent with previous observations (3, 9, 10); however, our calculation comes from a much larger random comparison of slightly less than 1% of the total genome. The distribution of the percent identity of BESs (q ≧ 30) that have ≧ 100 sites per bin is depicted in Fig. 1. The distribution was skewed toward 100% identity rather than following a normal distribution, and hence the mode of the difference is at around 0.8%. Although most of the BESs have higher identity, we also found many low-identity BESs in the genome.
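As a quick check of the arithmetic, the pooled identity can be recomputed from the two site counts given above. This minimal sketch assumes the figure is a simple ratio of identical sites to valid sites; the variable names are ours. The ratio comes out at roughly 98.765%, which agrees with the reported 98.77 to within rounding.

```python
# Site counts taken directly from the text.
valid_sites = 19_813_086       # aligned sites with PHRED quality q >= 30
identical_sites = 19_568_394   # sites identical between chimpanzee and human

mismatched_sites = valid_sites - identical_sites
percent_identity = 100 * identical_sites / valid_sites
percent_difference = 100 * mismatched_sites / valid_sites

print(f"mismatched sites:   {mismatched_sites}")
print(f"percent identity:   {percent_identity:.3f}")
print(f"percent difference: {percent_difference:.3f}")
```

Note that the pooled difference of about 1.2% is a mean over all sites, whereas the mode of about 0.8% quoted above refers to the per-BES distribution, so the two values are not expected to coincide.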

Figure 1

Distribution of identity scores of the chimpanzee-BES alignments to human genome sequences. Frequencies of BESs with percent identity scores from 94 to 100% are shown.

To construct a human-chimpanzee comparative map, we aligned 49,160 paired BESs (practically equivalent to BAC clones) from 24,580 chimpanzee BACs to the RefSeq human genome contigs (build 24, July 2001) constructed by NCBI (13). In addition, 28,301 singleton BAC ends were mapped onto the human genome, even though sequences of the opposite ends were either repeats, low-similarity hits, or not found in the current database (Table 1). The exact positions of these clones will become clearer through the progress of the human genome sequencing and/or the sequencing of the corresponding chimpanzee clones in the near future. The entire map showing the relative positions of the chimpanzee BAC clones (with links to the corresponding human BAC sequences) along the human genome contigs (NT contigs, NCBI) and LocusLink (NCBI) information and a comprehensive cross reference table that includes corresponding positions in the human genome and links to the human BAC sequences can be freely accessed through the Web (14).

We found that 48.6% of the whole human genome was covered by the chimpanzee BACs (Table 2). One of the reasons for this apparently low coverage is that we used rather stringent conditions for the calculation; that is, BAC clones were incorporated into the calculation only when they had two sequenced ends in the same NT contig with the correct orientation. Probably because the orientation of draft sequences within the NT contig is sometimes incorrect, only 70% of the total paired ends met this condition. The coverage for chromosomes 14, 20, 21, and 22 was substantially higher. This difference correlates closely with the quality of the human genome sequences used as reference: finished chromosomes and those with longer contigs display higher BAC coverage. We also tested 1 Mb of the human draft sequences corresponding to positions 8012178 to 8426236 and 18502342 to 19012063 of chromosome 21, which we were able to retrieve from the public portion of the Celera database; however, we observed no substantial differences between the mapping results obtained through the public database and those obtained through Celera. Theoretically, the probability of coverage assuming a totally random nucleotide distribution was calculated at around 0.7 (15); thus, we concluded that the actual coverage of about 70% for these essentially finished chromosomes is reasonable. The relatively lower coverage of chromosome X (about half that of the other unfinished chromosomes) can be explained by the haploid status of the chromosome in the chimpanzee BAC libraries. In contrast, we can only speculate as to why the Y chromosome coverage is so much lower (4.8%) than that of the other chromosomes (Table 2). One possibility is that the pseudoautosomal region of the Y chromosome confounded our matching algorithm. Another is that the human Y sequences in the DNA databank are so different that the BESs in this study could not find many valid matches. 
We will have to wait until sequencing of the human Y chromosome is finished to answer some of these questions. Alternatively, we will also need an independent approach, such as the construction and analysis of chimpanzee Y chromosome–specific libraries prepared from sorted chromosomes, generating a complementary tool to help fully resolve the comparative mapping for the Y chromosome.
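The stringent coverage condition described above (both ends in the same NT contig, in the correct orientation) can be illustrated with a short Python sketch. It is a sketch under assumptions rather than the procedure actually used: the `EndHit` record, the half-open coordinates, the rule that "correct orientation" means the two ends lie on opposite strands and point toward each other, and the toy contig names are all hypothetical.

```python
from typing import NamedTuple, Optional


class EndHit(NamedTuple):
    contig: str
    start: int   # 0-based, half-open interval [start, end)
    end: int
    strand: str  # '+' or '-'


def clone_span(a: EndHit, b: EndHit) -> Optional[tuple]:
    """Return (contig, start, end) for a concordant pair, else None.

    Assumed rule: both ends on the same contig, on opposite strands,
    with the '+' end upstream of the '-' end (ends facing inward).
    """
    if a.contig != b.contig or a.strand == b.strand:
        return None
    plus, minus = (a, b) if a.strand == "+" else (b, a)
    if plus.start > minus.start:
        return None  # ends point away from each other
    return (a.contig, plus.start, minus.end)


def covered_bases(spans) -> int:
    """Total bases covered by the union of clone spans, per contig."""
    by_contig = {}
    for contig, start, end in spans:
        by_contig.setdefault(contig, []).append((start, end))
    total = 0
    for intervals in by_contig.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:          # overlapping or adjacent: extend
                cur_end = max(cur_end, end)
            else:                         # gap: close the current block
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


# Toy pairs (invented coordinates on hypothetical contigs).
toy_pairs = [
    (EndHit("contig1", 0, 500, "+"), EndHit("contig1", 150_000, 150_500, "-")),
    (EndHit("contig1", 100_000, 100_500, "+"), EndHit("contig1", 260_000, 260_500, "-")),
    (EndHit("contig1", 400_000, 400_500, "+"), EndHit("contig2", 1_000, 1_500, "-")),
    (EndHit("contig1", 500_000, 500_500, "+"), EndHit("contig1", 650_000, 650_500, "+")),
]
spans = [s for s in (clone_span(a, b) for a, b in toy_pairs) if s is not None]
covered = covered_bases(spans)
toy_contig_length = 1_000_000
print(covered, covered / toy_contig_length)
```

In this toy case the last two pairs are discarded (different contigs, same strand), which mirrors how misoriented or fragmented draft contigs would lower the apparent coverage of unfinished chromosomes.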

Table 2

Estimated coverage of chromosomes by the chimpanzee BACs. The RefSeq human genome contigs (NT sequences) of NCBI, build 24 (July 2001, ftp://ncbi.nlm.nih.gov/genomes/H_sapiens) were used for the calculation.


Because the genomic sequence of human chromosome 21 is finished to a high degree (16), we analyzed the chimpanzee-human relationship of this chromosome by combining the BES mapping information with a sequence-tagged site (STS)–based approach (17). We identified 18 STSs that amplified products from human DNA but not from that of chimpanzee (shown as circles in Fig. 2). Because we used genomic DNA isolated from three chimpanzee individuals, two males and one female, the effects of relatively larger polymorphisms among the chimpanzee genomes should be minimized. These 18 primer sets, together with the flanking STSs, were further tested with other primates including gorilla. Out of these, amplification products appeared exclusively in humans from seven primer pairs (filled green circles in Fig. 2 whose positions in human chr21 are about 7.2, 8.5, 10.0, 11.6, 11.8, 18.1, and 29.3 Mb from the centromeric end, respectively) (16, 17), suggesting that these loci might correspond to insertions that are specific to the human lineage. Deletions specific to nonhuman primates cannot be ruled out but seem less likely, because such a deletion would have had to occur in every primate except humans. The remaining 11 primer pairs failed to amplify any products from chimpanzee DNAs but showed positive signals in some of the other primates, suggesting the existence of deletions or mutation sites at those positions in the chimpanzee genome. Although one cannot exclude the possibility that these findings are a mere reflection of statistical variation, the regions around these sites remain a primary target for further investigation. Simple extrapolation of these results suggests that there might be more than several hundred such sites in the entire chimpanzee or human genomes, again including statistical variations.

Figure 2

Distribution of chimpanzee BESs on human chromosome 21 and detection of human chromosome 21–specific STSs in the chimpanzee genome. BAC clones based on the BES alignment are positioned on the human chromosome 21 sequence and shown as red bars (the shortest bar denotes 40 kb). Positions of human STSs that were not detected in the chimpanzee genome are shown in the middle bar with open circles. Filled circles designate STSs that were detected only in human DNA (see text and notes for details). Positions of clone gaps in the human chromosome 21 sequence are also shown as vertical arrowheads.

To identify the boundaries of possible genomic rearrangements, we then searched for candidate clones containing chromosomal breakpoints by inspecting the mapping results. If a clone contains one of the breakpoints of a large genomic rearrangement, the paired BESs of the clone should map to sites separated by an interval longer than the expected or experimentally determined insert size (13). We identified one such example, clone PTB-053J22. One of the BESs and 62% of the finished sequence of this clone matched (>99%) to the 12q15 human BAC clone (AC005294; clone GSHB-410F4 contained in NT_029419.1), and the other BES and 37% of the sequence matched (>98%) to the 12p12 BAC (AC011604; clone RP11-80N2 contained in NT_009700.5). Sequences of these three clones are in finished status and the location of the human clones has been confirmed by both electronic polymerase chain reaction (e-PCR) and fluorescent in situ hybridization (FISH) analyses (http://ncbi.nlm.nih.gov/genome/clone). In addition, PTB-053J22 is fully included in another independently sequenced chimpanzee BAC clone (AC007214, RP43-135M19). Thus, it is highly likely that PTB-053J22, which we detected through this study, contains one of the breakpoints corresponding to the human (or vice versa in chimpanzee) chromosomal inversion between 12p12 and 12q15. In addition, this region in human chromosome 12 or in chimpanzee chromosome 10 is known to be inverted in gorilla and chimpanzee as opposed to human and orangutan (8, 18). We found several genes around the PTB-053J22 BESs in the following order, SLC21A14 (solute carrier, organic anion transporter, family 21, member 14), <36 kb>, PTB-053J22-F, <5 kb>, SLC21A8 (solute carrier, organic anion transporter, family 21, member 8), <cen>, DYRK2 (dual-specificity tyrosine-Y-phosphorylation regulated kinase 2), <330 kb>, PTB-053J22-R, <168 kb>, and IFNG (interferon-γ), based on the annotations on the corresponding NT contigs. 
The effect of the inversion on these genes should be the target of future studies.
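The breakpoint screen described above amounts to flagging clones whose two ends map farther apart than a BAC insert could span. A minimal Python sketch of that test follows. The size cutoff, the function name, and the toy coordinates are assumptions made for illustration; the text specifies only that the separation must exceed the expected or experimentally determined insert size.

```python
MAX_INSERT_BP = 300_000  # assumed upper bound on BAC insert size (hypothetical)


def is_breakpoint_candidate(chrom_f, pos_f, chrom_r, pos_r,
                            max_insert=MAX_INSERT_BP):
    """Flag a clone whose paired ends cannot come from one collinear insert.

    A pair is flagged if its ends map to different chromosomes, or to the
    same chromosome but farther apart than max_insert.
    """
    if chrom_f != chrom_r:
        return True
    return abs(pos_r - pos_f) > max_insert


# Toy records: (clone, chrom of F end, pos of F end, chrom of R end, pos of R end).
# Coordinates are invented; "toy2" imitates a clone whose ends map to
# opposite arms of the same chromosome, as described for PTB-053J22.
toy_clones = [
    ("toy1", "chr12", 21_000_000, "chr12", 21_160_000),
    ("toy2", "chr12", 21_000_000, "chr12", 66_000_000),
    ("toy3", "chr5", 1_000_000, "chr9", 2_000_000),
]
candidates = [
    name for name, cf, pf, cr, pr in toy_clones
    if is_breakpoint_candidate(cf, pf, cr, pr)
]
print(candidates)
```

A flag of this kind only nominates a clone for follow-up. As in the PTB-053J22 case, confirmation would still rest on finished clone sequence and independent evidence such as FISH, since assembly errors in draft contigs can produce the same signature.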

To independently test our mapping procedure, we selected 15 chimpanzee BAC clones mapped to human chromosomes 1 to 8 by the BES alignment procedure (13) and subjected them to FISH analysis with both human and chimpanzee M-phase cell spreads (Fig. 3). As shown, 13 clones showed single locus signals at the corresponding positions on both human and chimpanzee chromosomes, and two clones, RP43-50L24 and RP43-60K09, showed similar signals at two loci on the human and chimpanzee chromosomes, suggesting that the mapping procedure we used in this study is reliable. We believe that the whole genome chimpanzee/human comparative map built here by the BES alignment procedure is reasonably accurate and useful for future studies. Recent development of the human-mouse comparative map (19, 20) also supports our approach.

Figure 3

FISH analysis of selected chimpanzee BAC clones against human and chimpanzee chromosomes. (A, D, and G) Clone names and positions (in parentheses) on human chromosomes determined by the BES analysis (13) [FISH data found in the clone server (www.ncbi.nlm.nih.gov/genome/clone/) were also used]. (B, E, and H) FISH mapping onto human chromosomes. (C, F, and I) FISH mapping onto chimpanzee chromosomes. Chromosome numbers in parentheses (given in Roman numerals) are phylogenetic chromosome numbers (X = 10) (18). Arrowheads designate the positions of hybridization signals. Magnification of each panel is variable.

Users of this map should still be careful in applying the information because the possibility remains that assignment of particular clones in the NT contig is incorrect or that inter- or intrachromosomally duplicated regions may be included within an insert. However, the quality of the map, and thus its usefulness, should increasingly improve as the finishing of the human genome sequence proceeds.

  • * These authors form the International Consortium for the Sequencing of Chimpanzee Chromosome 22.

  • To whom correspondence should be addressed. E-mail: afujiyam{at}gsc.riken.go.jp, watanabe{at}gsc.riken.go.jp, hattori{at}gsc.riken.go.jp, sakaki{at}gsc.riken.go.jp

REFERENCES AND NOTES
