Research Article

A Fine-Scale Chimpanzee Genetic Map from Population Sequencing

Science  13 Apr 2012:
Vol. 336, Issue 6078, pp. 193-198
DOI: 10.1126/science.1216872

Going Ape Over Genetic Maps

Recombination is an important process in generating diversity and producing selectively advantageous genetic combinations. Thus, changes in recombination hotspots may influence speciation. To investigate the variation in recombination processes in humans and their closest existing relatives, Auton et al. (p. 193, published online 15 March) prepared a fine-scale genetic map of the Western chimpanzee and compared it with that of humans. While rates of recombination are comparable between humans and chimpanzees, the locations and genetic motifs associated with recombination differ between the species.

Abstract

To study the evolution of recombination rates in apes, we developed methodology to construct a fine-scale genetic map from high-throughput sequence data from 10 Western chimpanzees, Pan troglodytes verus. Compared to the human genetic map, broad-scale recombination rates tend to be conserved, but with exceptions, particularly in regions of chromosomal rearrangements and around the site of ancestral fusion in human chromosome 2. At fine scales, chimpanzee recombination is dominated by hotspots, which show no overlap with those of humans even though rates are similarly elevated around CpG islands and decreased within genes. The hotspot-specifying protein PRDM9 shows extensive variation among Western chimpanzees, and there is little evidence that any sequence motifs are enriched in hotspots. The contrasting locations of hotspots provide a natural experiment, which demonstrates the impact of recombination on base composition.

Multiple factors are likely to influence recombination rate, from the scales of individual hotspots to entire chromosomes. Evidence as to the nature and importance of such factors can potentially be obtained by studying the evolution of recombination rates at different scales (1). For example, previous studies of localized regions suggest that recombination hotspots are typically not shared between humans and chimpanzees (2–6), likely due to the function of the zinc-finger protein PRDM9 (2, 7, 8), which binds motifs associated with hotspot activity (7, 9) and is highly diverged between the human and chimpanzee reference genomes (2, 10). In humans, sequence variation within the PRDM9 zinc-finger array leads to differential activity at both allelic and nonallelic cross-over hotspots (7, 11, 12), and alleles found only in individuals of African ancestry lead to population-specific hotspots in patterns of both linkage disequilibrium (LD) and admixture (13).

However, to assess whether different classes of hotspot evolve in different ways, or to study recombination rate evolution over broader scales, requires genome-wide fine-scale genetic maps, which have only been generated for humans (13–16) and several distantly related model species including mice (17) and yeast (18, 19). Experimental techniques for identifying recombination events require either extensive pedigree data (15) or molecular characterization of meiotic cells (17–19), which are impractical for many species of interest. Methods for estimating recombination rates from single-nucleotide polymorphism (SNP) data (20, 21) have been validated at both broad and fine scales (14, 20), but there remains a gap for species without SNP arrays (i.e., most species). Hence, we set out to develop approaches based on sequence data, which, if successful, potentially open the possibility of producing genetic maps for many species.

Constructing a fine-scale chimpanzee genetic map from population sequencing. The genomes of 10 unrelated Western chimpanzees, Pan troglodytes verus, were sequenced (average 9.1× coverage; table S1). Variants and haplotypes were inferred in a manner similar to that used for the 1000 Genomes Project (22, 23). Across the autosomes, we identified 5.3 million SNPs with a false-discovery rate of less than 3% (tables S2 and S3 and fig. S1). With 85% power to detect variant alleles present more than once in the sample (fig. S2) and >97% genotype accuracy (23), these data enable the construction of a high-resolution genetic map.

A major challenge in estimating genetic maps from sequence data is that erroneous, misassembled, or incorrectly genotyped genetic variants may mimic the effects of recombination. Initial maps estimated from variation data by existing methods (20) were dominated by large and artefactual increases in genetic distance (fig. S3) caused by clusters of false-positive SNP calls, often in large repeats that are systematically underrepresented in the chimpanzee reference genome (fig. S4). Most of these SNPs do not fail standard filters; hence, we developed regional filtering strategies (23). To validate the protocol and to estimate the sampling variance, we performed the same analyses on 10 human samples each from populations of European (CEU) and African (YRI) ancestry from the 1000 Genomes Project (22, 23). Genetic maps estimated for the human data sets showed strong correlations to previously generated LD-based maps, enabling us to quantify map quality (tables S4 and S5 and fig. S5) (16, 23). Hotspots estimated in the human data are concordant with previously described peaks in recombination rate (fig. S6). Moreover, we found a strong correlation between rates estimated in this study and from limited genomic regions in a larger sample of Western chimpanzees (5) (r = 0.67 at 20 kb; fig. S7). We conclude that sequencing data from only 10 individuals gives sufficient power to identify hotspots and estimate recombination rates at broad and even fine scales. For comparative analysis, we aligned genetic maps from human and chimpanzee over 2.5 Gb of synteny, 90% of the assembled genomes (fig. S8).

Broad-scale recombination rates. At the level of entire chromosomes, recombination rates were found to be very similar in humans and chimpanzees (fig. S9), with the exception of chromosome 2, discussed below. Even at the megabase scale, strong similarities emerge between human and chimpanzee rates, particularly driven by subtelomeric rate increase in both species (Fig. 1A). Yet we also found regions with substantial divergence (Fig. 1B). Notably, inverted regions showed a lower correlation in rate than noninverted regions (Fig. 1C and fig. S10), despite causing no systematic change in mean rate, indicating that chromosomal rearrangements often result in broad-scale changes in recombination rate. Change in distance to the telomere is a major significant factor (table S6; P = 4 × 10⁻⁹), with regions that move closer to the telomere increasing in rate. All except one of the inverted regions are pericentric; hence, the effect is not due to changes in proximity to the centromere.

Fig. 1

Evolution of recombination rates between humans and chimpanzees. (A) Genome-wide comparison of recombination rates for chimpanzee (red and orange) and human (light and dark blue); rates were averaged over 1-Mb windows in regions of synteny. Unless otherwise stated, human rates are from the population-averaged HapMap genetic map (16). (B) Recombination rates estimated in human (blue) and chimpanzee (red) along chromosome 21q, averaged over 2-Mb intervals; fine-scale rates are shown behind. (C) Pearson correlation coefficients at different scales, estimated between the recombination rates of chimpanzee and HapMap YRI (black), and between HapMap YRI and ten 1000 Genomes YRI samples (green). Noninverted regions: solid lines; inverted regions: dotted lines. (D) Recombination rates in 2-Mb syntenic windows along chimpanzee chromosomes 2a and 2b (blue, red) and the corresponding syntenic region of human chromosome 2 (gray) derived from an ancient telomeric fusion. (E) Differences between chimpanzee and human recombination rates in 5-Mb syntenic windows across the genome. Regions involved in inversions are underlined.
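The scale-dependent correlations in panel (C) can be illustrated with a minimal sketch. It assumes, purely for illustration, that each map is given as per-site rates at positions already expressed in shared syntenic coordinates; the function names `window_means` and `correlation_at_scale` are hypothetical and are not taken from the authors' pipeline.

```python
import math

def window_means(positions, rates, window_bp):
    """Average per-site rates into fixed-size windows, keyed by window index."""
    sums, counts = {}, {}
    for pos, rate in zip(positions, rates):
        w = pos // window_bp
        sums[w] = sums.get(w, 0.0) + rate
        counts[w] = counts.get(w, 0) + 1
    return {w: sums[w] / counts[w] for w in sums}

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_at_scale(map_a, map_b, window_bp):
    """Pearson r between two (positions, rates) maps after windowing.

    Only windows present in both maps are compared (an assumption standing
    in for the restriction to syntenic regions).
    """
    a = window_means(map_a[0], map_a[1], window_bp)
    b = window_means(map_b[0], map_b[1], window_bp)
    shared = sorted(set(a) & set(b))
    return pearson([a[w] for w in shared], [b[w] for w in shared])
```

Calling `correlation_at_scale` repeatedly with increasing `window_bp` (for example 20 kb up to several Mb) would trace out a curve of the kind plotted in panel (C).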

The most notable change in broad-scale recombination rate is between the short arms of chimpanzee chromosome 2a and 2b and the orthologous regions in human chromosome 2, which originated from a telomeric fusion event in the human ancestral lineage (24) and which provides a natural experiment to explore the effect of chromosomal organization on recombination (Fig. 1D). We found that whereas the subtelomeric regions of chromosome 2a and 2b in chimpanzee show high recombination rates, the rate over the syntenic region in humans is suppressed by nearly threefold, and overall, the genetic map length of the fused chromosome is reduced by 20%. The extent to which recombination events are concentrated within the fused region is no different than in the unfused regions (fig. S11), indicating that the change in broad-scale rates was not accomplished by specifically eliminating cross-over events at hotspots.

Although less pronounced, regions within structurally conserved chromosomes can also show large changes in rate between species (Fig. 1E; 1-Mb correlation between human and chimpanzee maps in conserved regions is 0.60). Using a linear model, we found that the strongest determinant of rate divergence in noninverted regions was base composition, such that although there is a substantial correlation between GC fraction and recombination rate in humans (partial r = 0.51 at 1-Mb scale, with substantial variation between chromosomes, fig. S12), the correlation is much weaker in chimpanzees (partial r = 0.11; fig. S12). One consequence is that in low-GC regions (GC fraction <35%), the recombination rate in chimpanzees is more than 50% higher than in humans.

Fine-scale recombination rates. In humans, the PRDM9-bound 13–base pair (bp) motif is clearly detected only in a minority of hotspots (25), although activity at some hotspots with no clear match is PRDM9-dependent (7, 11). Nevertheless, there could exist different classes of hotspot in humans, some of which are PRDM9-independent and hence potentially shared between species. However, we found no evidence of sharing of recombination hotspots between species (Fig. 2, A and B, and fig. S13), even for human hotspots with no match to the PRDM9 motif (fig. S13).

Fig. 2

(A) Recombination rates around hotspots identified in chimpanzee (red) at syntenic regions in CEU (green), YRI (blue), and HapMap (black). (B) As for (A) but around sites identified as recombination hotspots in 10 YRI; see also fig. S6. (C) The concentration of recombination rate in fine-scale genetic maps estimated from the chimpanzee and equivalent data from human populations of European (CEU) and African (YRI) ancestry (23). The higher degree of concentration seen in European relative to African populations likely reflects the lower diversity of PRDM9 alleles in the European population (11).

Despite the absence of hotspot sharing, the landscape of recombination in the chimpanzee population is dominated by recombination hotspots to a similar extent as in African populations (Fig. 2C; though European populations show greater concentration of recombination). Moreover, the average fine-scale recombination rate profiles around genes and CpG islands are similar between species. Recombination increases on average by about 20% around transcription start and end sites and decreases on average by about 30% within the transcribed region (Fig. 3A). Such concordance suggests that features affecting chromatin state—for example, nucleosome occupancy, which is destabilized around CpG islands and promoters (26)—may similarly shape the propensity for recombination at these sites in humans and chimpanzees (17, 19, 27). Possibly reflecting a similar effect, we found recombination to be elevated around CpG islands in both species (Fig. 3B), although the effect is stronger in chimpanzees (increase of nearly 50% in rate relative to background compared to 15% in humans). The rate elevation around promoters in humans was found to be driven by genes that have a high rate of CpG methylation in sperm, but in chimpanzees it occurs around genes with low rates of sperm CpG methylation (fig. S14).
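The concentration of recombination compared in Fig. 2C can be pictured as the fraction of sequence needed to account for a given fraction of total recombination. The sketch below is one simple way to compute such a summary, assuming the map is supplied as a list of intervals with physical lengths and genetic distances; the function name and input layout are illustrative assumptions, not the authors' implementation.

```python
def sequence_fraction_for(recomb_fraction, lengths_bp, distances_cm):
    """Smallest fraction of sequence (at interval granularity) that contains
    at least `recomb_fraction` of the total genetic distance.

    Intervals are ranked from highest to lowest rate (cM per bp), so a small
    return value means recombination is concentrated in few intervals.
    """
    total_len = sum(lengths_bp)
    total_dist = sum(distances_cm)
    order = sorted(
        range(len(lengths_bp)),
        key=lambda i: distances_cm[i] / lengths_bp[i],
        reverse=True,
    )
    cum_len = 0.0
    cum_dist = 0.0
    for i in order:
        cum_len += lengths_bp[i]
        cum_dist += distances_cm[i]
        if cum_dist >= recomb_fraction * total_dist:
            break
    return cum_len / total_len
```

On a map dominated by hotspots, half of the recombination falls in a small share of the sequence; on a uniform map, the share of sequence equals the share of recombination.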

Fig. 3

The fine-scale profile of recombination rate variation around genomic features in chimpanzees and humans. (A) Average recombination rate as a function of distance to nearest transcription start site (TSS) and transcription end site (TES) in chimpanzee (red), YRI (blue), CEU (green), and HapMap (black). (B) Average recombination rate as a function of distance to nearest CpG island; colors as for (A). Dashed lines indicate start and end of elements; estimates were smoothed using a running average with a 7.5-kb window.
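The running-average smoothing mentioned in the caption can be sketched as follows. The caption gives only the window size (7.5 kb); the centred window and the equal weighting of points used here are assumptions made for illustration.

```python
def running_average(distances_bp, values, window_bp=7500):
    """Smooth `values` by averaging all points within window_bp / 2 of each
    point (a centred window; the centring is an assumption)."""
    half = window_bp / 2
    smoothed = []
    for d in distances_bp:
        nearby = [v for x, v in zip(distances_bp, values) if abs(x - d) <= half]
        smoothed.append(sum(nearby) / len(nearby))
    return smoothed
```

Each output value is the mean of the input values whose distance coordinate lies within 3.75 kb of the point being smoothed, so isolated spikes are spread across neighbouring positions.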

Extensive structural and sequence diversity in chimpanzee PRDM9. We sequenced 48 PRDM9 alleles from Western chimpanzees, including alleles from the 10 individuals for whom genome-wide data were collected. We found extensive variation in the number of zinc fingers and the identity of the DNA-contacting residues, with three common alleles of 6, 16, and 18 zinc fingers (Fig. 4A), a level of diversity greater than in human populations (Fig. 4A and fig. S15). Sequences from three bonobos and one Eastern chimpanzee revealed a shared and hence potentially ancestral six–zinc-finger PRDM9 variant (Fig. 4A) not found in the Western samples, suggesting that Western allelic diversity may have arisen since the separation of the subspecies ~0.5–1 million years ago (28). Moreover, patterns of polymorphism among zinc fingers pointed to recurrent adaptive evolution of DNA-contacting residues, as seen in other mammalian species (10, 23).

Fig. 4

Sequence and structural variation in chimpanzee PRDM9 and implications for hotspot motifs. (A) Schematic representations of the zinc-finger arrays found in chimpanzee PRDM9 alleles with colors representing unique combinations of DNA-contacting amino acids within zinc fingers. Western chimpanzee alleles are labeled W1 through W11. Also shown is the putatively ancestral allele shared between bonobo and Eastern chimpanzee (A1), and the remaining detected Eastern chimpanzee allele (E1). Tick marks indicate binding specificity to motifs indicated in (C). Allele frequencies were estimated from 48 Western chimpanzee alleles. (B) Predicted binding motif for the chimpanzee reference PRDM9 allele (W6) showing positions of shared submotifs referred to in (A) and (C) and a shared set of C residues (below sequence). (C) Recombination rates around shared predicted submotifs for chimpanzee PRDM9 alleles in nonrepeat DNA (the percentage of alleles predicted to bind is indicated).

In humans, using the same number of hotspots as detected in chimpanzees, we can identify the known motifs associated with hotspot activity (fig. S16). In Western chimpanzees, computationally predicted (23, 29) DNA-binding motifs for the different PRDM9 variants showed considerable overlap of submotifs (fig. S17). However, we found no evidence for local increases in recombination rate around any of the shared submotifs (Fig. 4C) or best matches to the predicted binding targets across the genome (23).

Moreover, a systematic analysis of repeat-element families showed no overall correlation in recombination-localizing activity between humans and chimpanzees (Fig. 5A). The strongest activating repeats in humans (LTR49, THE1A, and THE1B), which all contain the human PRDM9 A–allele 13-bp binding motif CCTCCCTNNCCAC, suppress recombination in chimpanzees (Fig. 5B, top). A second class of elements, typically of low complexity (CT-rich, GA-rich, and G-rich), was found to be weakly activating in both species (Fig. 5B), whereas a few elements (e.g., L1PA2) suppress recombination in both species (Fig. 5B, middle right). Only a few elements [notably (GGAA)n and MER92B elements] showed activation only in chimpanzees (Fig. 5B, bottom, and fig. S18). Among these and other repeats, we found that motifs with high GC fraction and CpG dinucleotide content lead to local rate increases in chimpanzees (table S7). For example, on Alu elements the motif CGGGCGC showed significant hotspot enrichment (corrected P = 2 × 10⁻⁴, RR = 1.2), but the effect was better explained by CpG content (fig. S19).

Fig. 5

Recombination rates around DNA repeat elements in chimpanzees and humans. (A) Recombination-influencing activity of repeat-element families in chimpanzees and humans (HapMap). The value reported is the ratio of the peak rate to background rate, as estimated from the robust genetic map after fitting a Gaussian profile using maximum likelihood. Repeat elements were required to have greater than 200 instances after thinning elements to at least 10-kb separation, and a profile fit with R² greater than 0.55 in at least one species. Selected repeat elements are labeled. (B) Recombination rate profiles around selected repeat elements, as estimated in the robust map. Top: Two elements (THE1B and LTR49) that are recombination-promoting in humans only. Middle: Elements that are recombination-promoting (CT-rich repeats) or recombination-suppressing (L1PA2) in both humans and chimpanzees. Bottom: Two elements [(GGAA)n and MER92B] that are recombination-promoting in chimpanzees only. Number of elements after thinning is indicated.

We also carried out an exhaustive search for short DNA motifs enriched in nonrepeat DNA recombination hotspots relative to cold-spots, which identifies the known motifs CCTCCCT and CCCCACCCC and related sequences in the samples of 10 humans (14) (RR = 1.16 and 1.28, respectively; P < 1 × 10⁻¹⁰ after Bonferroni correction). In chimpanzees, the same approach only identifies two motifs, CGCG and CCCGGC, that are significantly enriched in chimpanzee hotspots after Bonferroni correction (corrected P = 0.0024, RR = 1.28 and P = 0.015, RR = 1.31, respectively; table S8). Both motifs are typical of CpG islands. Overall, we could not identify any motif that was consistently activating in chimpanzees across multiple backgrounds (fig. S20).
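The kind of enrichment statistic reported above can be sketched as follows. This is a hedged illustration only: it assumes RR is the proportion of hotspots containing a motif divided by the proportion of cold-spots containing it, and it uses a one-sided Fisher exact test as a stand-in for whatever test the authors applied (their exact procedure is described in the supplementary materials, not here). All function names are hypothetical.

```python
from math import comb

def relative_risk(hot_with, hot_total, cold_with, cold_total):
    """Proportion of hotspots with the motif over proportion of cold-spots with it."""
    return (hot_with / hot_total) / (cold_with / cold_total)

def fisher_one_sided(hot_with, hot_total, cold_with, cold_total):
    """P(at least `hot_with` motif-containing hotspots) under the
    hypergeometric null of no association (an assumed choice of test)."""
    n_total = hot_total + cold_total
    n_with = hot_with + cold_with
    upper = min(n_with, hot_total)
    numer = sum(
        comb(n_with, k) * comb(n_total - n_with, hot_total - k)
        for k in range(hot_with, upper + 1)
    )
    return numer / comb(n_total, hot_total)

def bonferroni(p_value, n_tests):
    """Bonferroni-corrected P value, capped at 1."""
    return min(1.0, p_value * n_tests)
```

In an exhaustive motif search, `n_tests` would be the number of motifs examined, which is why only very small raw P values survive correction.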

The influence of recombination on sequence evolution. Shifts in both local and broad-scale patterns of recombination between humans and chimpanzees act as natural experiments that reveal the effect of recombination on patterns of molecular evolution while other factors, for example, gene density, remain similar. In particular, we can assess the ability of recombination to drive local increases in GC content through a preference for GC bases during mismatch repair within gene conversion tracts (30, 31). Around human hotspots, we observed strong GC skew in both patterns of polymorphism (40% increase in GC skew at the hotspot center) and substitution (20% increase in GC skew), but only for mutations on the human lineage (Fig. 6A and fig. S21). In chimpanzees, we observed much weaker signals of GC bias (18% increase in GC skew at the hotspot center for polymorphisms compared to 10% increase for substitutions; Fig. 6B), despite comparable density and intensity for chimpanzee and human hotspots. These observations are consistent with a recent origin for hotspot locations in both species, and a more recent origin in chimpanzees.

Fig. 6

The influence of broad- and fine-scale changes in recombination rate on GC-promoting mutations. (A) GC skew [defined as the ratio of the number of GC-increasing changes compared to GC-decreasing changes; see (23)] in both polymorphism (left) and substitutions (right). Estimates from mutations on the human lineage are indicated in blue, whereas those on the chimpanzee lineage are in red. Smoothed lines were estimated using loess. The observed increase in skew in humans is completely absent in chimpanzees. (B) As for (A), but around hotspots detected in chimpanzees. Although the pattern of skew in chimpanzees is considerably weaker than for (A), no corresponding skew is observed in humans. (C) Broad-scale (1 Mb) effects of changes in recombination rate between chimpanzees and humans on patterns of GC skew in polymorphism (left) and substitution (right). Flux ratio is defined as the ratio of the GC skews in chimpanzees compared to humans. Chimpanzee recombination rate estimates are from the robust genetic map. Colors indicate different parts of the genome, with Pearson correlation coefficient indicated.
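The GC skew and flux ratio defined in the caption can be expressed compactly. The sketch below assumes each mutation is supplied as an (ancestral, derived) base pair, which is an illustrative input format rather than the authors' data structure; changes that leave GC content unchanged (for example A to T) are ignored.

```python
WEAK = set("AT")    # bases whose replacement by G or C increases GC content
STRONG = set("GC")  # bases whose replacement by A or T decreases GC content

def gc_skew(changes):
    """Ratio of GC-increasing to GC-decreasing changes.

    `changes` is an iterable of (ancestral, derived) single-base pairs.
    Raises ZeroDivisionError if there are no GC-decreasing changes.
    """
    changes = list(changes)
    increasing = sum(1 for anc, der in changes if anc in WEAK and der in STRONG)
    decreasing = sum(1 for anc, der in changes if anc in STRONG and der in WEAK)
    return increasing / decreasing

def flux_ratio(chimp_changes, human_changes):
    """Ratio of the GC skew in chimpanzees to the GC skew in humans, as in panel (C)."""
    return gc_skew(chimp_changes) / gc_skew(human_changes)
```

A skew above 1 in a window means GC-increasing changes outnumber GC-decreasing ones there; comparing skew near hotspot centres with skew in flanking sequence gives profiles of the kind shown in panels (A) and (B).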

At the megabase scale, we found that changes in the rate of recombination between species correlate with changes in GC bias in both substitutions and polymorphisms (Fig. 6C). The correlation was stronger in polymorphism (r = 0.39 in nonrearranged regions) than substitution (r = 0.25), consistent with the changes in broad-scale recombination being evolutionarily recent. We see stronger correlations in regions that have experienced chromosomal rearrangements, where the changes in recombination rate have typically been greater. The most pronounced changes are seen in the chromosome 2 fusion region, where the suppression of recombination in the regions syntenic to the short arms of chimpanzee chromosomes 2a and 2b has led to a large reduction in GC skew over megabase scales (32).

Discussion. Our study demonstrates how fine-scale genetic maps can be obtained by the analysis of patterns of genetic variation obtained from population sequencing. Studying humans and Western chimpanzees, we found no hotspot sharing between the two species, consistent with earlier reports based on limited data (2–6). The complete lack of hotspot sharing is consistent with the hypothesis that in humans, PRDM9 plays a critical role in localizing cross-over activity at all hotspots, not just those that contain clear matches to previously identified motifs bound by PRDM9. Despite the marked shift in hotspot locations between the two species, we found that some fine-scale patterns, particularly the average profile of recombination rate around genes and CpG islands, remain similar, pointing to the importance of chromatin state in influencing where double-strand breaks occur (19) or to additional levels of control acting on broader scales (19, 33).

A notable difference between the species is that in chimpanzees no repeat elements, simple DNA motifs, or predicted PRDM9 binding sites are strongly or consistently associated with hotspot locations. There are three possible explanations. First, PRDM9 may have lost its role in specifying hotspot locations in chimpanzees, as has occurred in dogs, although we find no evidence for inactivating mutations (34). Second, PRDM9 alleles may each have similar specificity to target DNA sequences, but the substantial allelic diversity and their possibly recent origin may obscure signals for individual alleles. However, this hypothesis cannot explain why, when the density and strength of hotspots at the population level are similar in African populations and Western chimpanzees (Fig. 2C), we can recover known PRDM9-binding motifs in humans but no comparable motif in chimpanzees. Third, PRDM9 may play the same role as in humans and mice, but individual PRDM9 alleles may bind to a much greater variety of target sequences than do the predominant human alleles. If so, hotspot localization in chimpanzees may be more strongly driven by other factors, such as chromatin state. Whichever hypothesis is correct, one consequence is that, across the genome, no motif in chimpanzees will be strongly targeted for depletion by the inherent self-destructive drive of hotspots (though specific instances may be).

Our results also reveal the different processes that operate at fine and broad scales. At broad scales, we find substantial correlation in recombination rate between the species, which is disrupted by chromosomal rearrangement. However, even among conserved regions, less than 40% of the variance in chimpanzee recombination rate at 1 Mb can be explained by the human rate. Determining the factors that shape stasis and change in broad-scale recombination rates presents a key challenge in the study of recombination. A population sequencing approach, such as the one taken here, should enable further informative studies of recombination across a wide range of species.
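The "less than 40%" figure is consistent with the 1-Mb correlation of 0.60 reported for conserved regions, assuming the variance explained is taken as the square of the Pearson correlation in a simple linear regression of the chimpanzee rate on the human rate:

```latex
r^2 = (0.60)^2 = 0.36 < 0.40
```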

Supplementary Materials

www.sciencemag.org/cgi/content/full/science.1216872/DC1

Materials and Methods

Figs. S1 to S21

Tables S1 to S9

References (35–60)

  • These authors jointly supervised the project.

References and Notes

  1. Detailed information on methods and analyses can be found in the supplementary materials available in Science Online.
  2. Acknowledgments: This work was funded by NIH grants R01 GM83098 (to M.P.) and T32 GM007197 (to E.M.L.) and by Wellcome Trust grants 076113/E/04/Z (to P.D.), 086084/Z/08/Z (to G.M.), and 090532/Z/09/Z contribution to Core Facility. P.D. was supported in part by a Wolfson-Royal Society Merit Award. M.P. is supported by the Howard Hughes Medical Institute. O.V. is funded by a Wellcome Trust studentship (086786/Z/08/Z). We thank G. Sella, G. McVicker, members of the PPS labs, and reviewers for their comments and H. Thorogood and W. Czyz for assistance with PRDM9 sequencing. Part of this work has been supported by EUPRIM-Net under the European Union contract RII3-026155 of the 6th Framework Programme. Data are available from http://panmap.uchicago.edu. Some primate samples used in this study are under a Material Transfer Agreement from the San Diego Zoo.