Report

Comparison of Fine-Scale Recombination Rates in Humans and Chimpanzees

Science  01 Apr 2005:
Vol. 308, Issue 5718, pp. 107-111
DOI: 10.1126/science.1105322

Abstract

We compared fine-scale recombination rates at orthologous loci in humans and chimpanzees by analyzing polymorphism data in both species. Strong statistical evidence for hotspots of recombination was obtained in both species. Despite ∼99% identity at the level of DNA sequence, however, recombination hotspots were found rarely (if at all) at the same positions in the two species, and no correlation was observed in estimates of fine-scale recombination rates. Thus, local patterns of recombination rate have evolved rapidly, in a manner disproportionate to the change in DNA sequence.

Recombination shapes genomic diversity, breaking up ancestral linkage disequilibrium (LD) and creating new combinations of alleles on which natural selection can act. As in yeast (1), recombination in the human genome principally occurs at so-called “hotspots” of recombination (2, 3); experimentally characterized examples include the β-globin (4) and human leukocyte antigen (HLA) regions (5, 6). Because direct observation of recombination hotspots is laborious, only with the recent development of statistical methods to estimate recombination rates from population genetic (polymorphism) data (2, 3) has it become practical to study fine-scale recombination rates on a genomic scale.

The molecular determinants of hotspot location and activity are largely unknown. In yeast, chromatin structure influences initiation of double-strand breaks (DSBs) at hotspots (7). Directed mutagenesis of single nucleotides can disrupt hotspot activity (8), and different alleles of the same locus can show differences in recombination (9–11), indicating strong sequence specificity. However, no sequence motif has been identified as causing recombination hotspots. The observation of meiotic drive at hotspots has led to the hypothesis that hotspots may be short-lived because of evolutionary selection against sites that initiate DSBs (9, 12).

We compared fine-scale recombination patterns inferred from polymorphism data at orthologous loci in western chimpanzees and in two human population samples. Information about the DNA samples, regions examined, and polymorphisms studied is in table S1; details about experimental and analytic methods are provided online (13). Briefly, single-nucleotide polymorphisms (SNPs) were ascertained by resequencing in both species and by querying public databases. To validate SNPs and to expand the sample, we genotyped SNPs in a larger panel of humans and chimpanzees. Patterns of LD among SNPs were expressed using the pairwise metric |D′|, representing the extent of historical recombination among alleles (14). Statistical evidence for hotspots of recombination, as well as quantitative estimates of local rates of recombination, was calculated as in (2). Informally, recombination rates are estimated from polymorphism data by fitting an approximation to the coalescent model. Recombination rate estimates were obtained from the program LDhat by allowing a different recombination rate between each adjacent pair of SNPs and using Bayesian Markov chain Monte Carlo methods; statistical significance of a putative hotspot was calculated by comparing the fit of empirical genotype data to models that incorporate a constant recombination rate and those that allow variation in recombination rates (LDhot). The validity of this approach was previously confirmed empirically and through simulation (2). Whereas sperm typing estimates male recombination rates in the present, coalescent approaches estimate rates that are averaged over sexes and over many generations.
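To make the pairwise |D′| metric concrete, the following is a minimal sketch rather than the pipeline used in the study. It assumes phased two-SNP haplotypes with alleles coded 0/1; the function name `d_prime` and the input layout are hypothetical, and returning 0.0 for a monomorphic site is a convention chosen here for illustration. Unphased genotype data would first require haplotype frequency estimation, which is not shown.

```python
def d_prime(haplotypes):
    """Return |D'| for two biallelic SNPs.

    haplotypes: list of (a, b) pairs, one per chromosome, with alleles
    coded 0/1 at the first and second SNP (assumed already phased).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")

    # Allele and haplotype frequencies for the '1' alleles.
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    # Raw disequilibrium coefficient.
    d = p_ab - p_a * p_b

    # Largest |D| attainable given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    if d_max == 0:
        # Monomorphic site: LD is undefined; 0.0 is an illustrative choice.
        return 0.0
    return abs(d) / d_max
```

Under this definition, |D′| equals 1 whenever fewer than four of the possible two-SNP haplotypes are observed, which is why it is read as a measure of historical recombination between the two sites.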

We first collected polymorphism data spanning known human recombination hotspots in HLA and β-globin (5, 6, 15, 16) and at the orthologous loci in western chimpanzees. As expected, estimated recombination rates were markedly increased in humans at the sites of known hotspots (Table 1 and fig. S1). At the orthologous locations in chimpanzee, however, there was no significant evidence for a hotspot (P > 0.01), and the estimated recombination rate was low (Table 1, fig. S1). We did detect (P < 0.005) a hotspot in the chimpanzee ∼1.7 kb away from, but not overlapping with, the β-globin hotspot in humans (fig. S1). Two of the six hotspots described here had previously been studied in chimpanzee (17, 18) with similar results (19).

Table 1.

Recombination rates estimated from population genetic data at known human hotspots. Rates estimated using LDhat (2) in Utah residents with ancestry from northern and western Europe from the CEPH resource (CEU), Beni sampled from Nigeria (BEN), and western African chimpanzees. P values were obtained by performing a one-sided test for the presence of a 2-kb hotspot centered at the position of the known human hotspot (2), based on 10,000 simulations. “Chimp sites” indicates the number of SNPs found solely by resequencing in the region of the human hotspots. The estimated rates are centered at the positions of the known human hotspots. NA, not applicable.

Hotspot | Hotspot width (kb) | P value, CEU | P value, BEN | P value, chimp | Estimated rate (cM/Mb), CEU | Estimated rate (cM/Mb), chimp | Chimp sites
DNA2 | 1.3 | 0.107 | 0.127 | 1 | 6.38 | 0.46 | 12
DNA3 | 1.2 | <0.001 | 0.003 | NA | 20.73 | 0.44 | 2
DMB1 | 1.8 | 0.009 | 0.057 | 0.182 | 11.83 | 1.54 | 7
DMB2 | 1.2 | <0.001 | 0.007 | 1 | 58.08 | 1.50 | 10
TAP2 | 1.2 | <0.001 | <0.001 | 0.028 | 23.78 | 0.25 | 14
β-Globin | 1.7 | <0.001 | <0.001 | 1 | 46.08 | 1.37 | 16
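The contrast between the species in Table 1 can be summarized as a fold difference per hotspot. The short sketch below only divides the CEU estimate by the chimpanzee estimate for each row; the rate values are copied from Table 1, while the variable and function names are illustrative.

```python
# CEU and chimpanzee rate estimates (cM/Mb), copied from Table 1.
table1_rates = {
    "DNA2": (6.38, 0.46),
    "DNA3": (20.73, 0.44),
    "DMB1": (11.83, 1.54),
    "DMB2": (58.08, 1.50),
    "TAP2": (23.78, 0.25),
    "beta-globin": (46.08, 1.37),
}


def fold_difference(rates):
    """Map each hotspot name to (CEU rate) / (chimpanzee rate)."""
    return {name: ceu / chimp for name, (ceu, chimp) in rates.items()}


fold = fold_difference(table1_rates)

# Print hotspots from the smallest to the largest fold difference.
for name, ratio in sorted(fold.items(), key=lambda item: item[1]):
    print(f"{name}: {ratio:.1f}-fold")
```

In every one of the six regions the CEU estimate exceeds the chimpanzee estimate by more than sevenfold.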

To determine whether the lack of correspondence in the location of hotspots was general and to increase the power to detect correlation in fine-scale recombination rates across species, we examined three contiguous 500-kb regions (on chromosomes 4q26, 7q21, and 7q31) studied by the HapMap and ENCODE projects (20). These regions were selected without prior knowledge of LD or diversity in either species.

Qualitatively similar patterns of LD were observed in human and chimpanzee: Both had regions of strong LD (haplotype “blocks”) (21) interspersed with sites of LD breakdown. Although overall patterns were similar, there was little alignment in the locations and extent of LD breakdown (Fig. 1, fig. S2). In humans, there is extensive LD in ENCODE region 7q21, but much less in chimpanzee. Conversely, in chimpanzee there is extensive LD in ENCODE region 7q31, but not in the human samples. The ENCODE region on chromosome 4 shows similar overall extent of LD in both species, but little alignment where LD is extensive and where it breaks down.

Fig. 1.

Comparison of LD patterns and recombination rates for three 500-kb ENCODE regions (A through C). Pairwise LD of common SNPs (frequency > 0.05) in Yoruba sampled in Ibadan, Nigeria (YRI) (row i), CEU (row ii), and western chimpanzee (row iii) is expressed as D′, with red indicating LD that is strong (D′ > 0.8) and statistically significant [logarithm of the odds ratio for linkage (LOD) score > 2.0] (14). For comparability, genotype data for CEU and YRI were thinned to match the spacing in the chimpanzee data. Locations of SNPs are shown as lines above each plot. Row iv compares recombination rates estimated from the complete data for YRI (green), CEU (blue), and chimpanzees (red). Blue arrows indicate positions of human hotspots with statistical significance P < 0.01 in one human population and P < 0.05 in both human populations. Red arrows indicate positions of chimpanzee hotspots with statistical significance P < 0.01.

Statistical support was obtained for 18 hotspots in humans and 3 in chimpanzees (P < 0.01). In both species, most recombination events were estimated to occur over a small fraction of the overall sequence, with greater concentration of recombination activity in the two human samples than in the chimpanzee (fig. S3).

Although hotspots were detected in both species, there was little concordance in the location of hotspots in humans and in chimpanzee. At the site of a recombination hotspot in one species, the recombination rate in the other species is typically lower by a factor of 10 to 60 and not above baseline.

We analyzed the genotype data using a second analytic method (3) and obtained very similar estimates (Table 2). This second method provided statistical evidence (P < 0.05) for a hotspot in chimpanzee at 3 of the 18 human hotspots. Examination of obligate recombination events at these hotspots indicates that one of the 18 human hotspots may be a site of historical recombination in chimpanzee, and the other two are likely false positives of the method (fig. S4). Moreover, given 21 hotspots, two species, and two analytical methods, some overlap in the sites of recombination might occur by chance.

Table 2.

Summary of findings for identified ENCODE hotspots. Each row corresponds to a different hotspot identified using LDhot P values and requiring P < 0.01 in one human population, P < 0.05 in both (human hotspots), or P < 0.01 (chimpanzee hotspots). Throughout, we assume the effective population size (Ne) = 10,000 (CEU), 16,000 (YRI), 12,000 (chimpanzees) (13). Estimated hotspot intensity: LDhat estimates of rates of hotspot intensity in humans (averaged over CEU and YRI rates) and chimpanzees are averaged over the 2 kb around the hotspot center. Rate estimates by Hotspotter (3) are based on fitting a two-rate model, using the parameters above. Statistical significance for hotspots: P values were estimated using LDhot (2) or Hotspotter. Hotspotter P values are based on data phased using PHASE v2.0, obtained by fitting a two-rate model with a 2-kb hotspot centered at the position identified by LDhot, using data for the 25 sites on either side of the hotspot center. P values are calculated based on a one-sided likelihood ratio test, assuming a standard mixture of chi-squared distributions for the test statistic. Estimated power: For the hotspots discovered in humans, estimated power in chimpanzee to detect a hotspot of the same intensity with P < 0.05, and the probability of obtaining a P value lower than that observed in chimps (P < Pobs). For hotspots discovered in chimpanzee, the estimated power to detect a human hotspot of the same intensity for the CEU population (estimates of power based on the YRI data were very similar), as above. E stands for ×10 to the value stated after it (e.g., 1.28E-09 = 1.28 × 10^-9).

Chromosomal location | Nucleotide position (kb) | Species | Intensity (cM/Mb), human average, LDhat | Intensity (cM/Mb), human average, Hotspotter | Intensity (cM/Mb), chimp, LDhat | Intensity (cM/Mb), chimp, Hotspotter | P value, YRI, LDhot | P value, YRI, Hotspotter | P value, CEU, LDhot | P value, CEU, Hotspotter | P value, chimp, LDhot | P value, chimp, Hotspotter | Estimated power in other species, LDhot (P < 0.05) | Estimated power in other species, LDhot (P < Pobs)
4q26 | 16.00 | Human | 16.942* | 13.092* | 1.438 | 4.430 | 0.001* | 1.28E-09* | 0.001* | 1.25E-05* | 0.120 | 0.007* | 0.802 | 0.864
4q26 | 141.96 | Human | 10.673* | 26.479* | 0.099 | 0.000 | 0.001* | 3.84E-10* | 0.005* | 7.09E-14* | 1.000 | 1.000 | 0.577 | 0.948
4q26 | 154.25 | Human | 3.005 | 6.719* | 0.099 | 0.000 | 0.050* | 0.002* | 0.003* | 0.010* | 1.000 | 1.000 | 0.229 | 0.793
4q26 | 233.21 | Human | 17.739* | 12.351* | 0.120 | 0.000 | 0.001* | 6.46E-13* | 0.001* | 2.34E-04* | 1.000 | 1.000 | 0.807 | 0.988
4q26 | 272.50 | Human | 9.151* | 8.417* | 0.086 | 2.039 | 0.020* | 6.44E-07* | 0.003* | 4.41E-06* | 1.000 | 0.036* | 0.521 | 0.938
4q26 | 301.25 | Human | 13.171* | 11.438* | 0.082 | 0.478 | 0.001* | 8.99E-08* | 0.002* | 3.62E-04* | 1.000 | 0.186 | 0.670 | 0.965
4q26 | 409.96 | Human | 10.517* | 4.053* | 0.463 | 0.000 | 0.001* | 5.51E-09* | 0.030* | 0.038* | 1.000 | 1.000 | 0.571 | 0.947
4q26 | 485.96 | Human | 1.486 | 1.182 | 0.650 | 3.322 | 0.001* | 0.371 | 0.007* | 0.204 | 1.000 | 0.116 | 0.132 | 0.633
7q21 | 382.37 | Human | 4.578 | 2.834 | 0.073 | 0.000 | 0.002* | 0.001* | 0.030* | 0.179 | 1.000 | 1.000 | 0.321 | 0.918
7q21 | 415.87 | Human | 7.107* | 10.531* | 0.207 | 0.091 | 0.007* | 3.93E-07* | 0.020* | 3.84E-07* | 0.290 | 1.000 | 0.435 | 0.590
7q31 | 67.00 | Human | 35.995* | 28.649* | 0.111 | 2.915 | 0.001* | 3.79E-11* | 0.001* | 9.36E-19* | 0.160 | 0.007* | 0.922 | 0.952
7q31 | 137.25 | Human | 0.831 | 1.743 | 0.054 | 1.069 | 0.001* | 0.007* | 0.008* | 0.229 | 1.000 | 0.173 | 0.090 | 0.564
7q31 | 211.75 | Human | 4.971 | 7.192* | 0.049 | 0.716 | 0.001* | 0.002* | 0.002* | 3.04E-07* | 0.290 | 0.217 | 0.339 | 0.479
7q31 | 265.25 | Human | 15.546* | 13.410* | 0.027 | 0.457 | 0.001* | 2.26E-07* | 0.001* | 2.11E-12* | 0.270 | 0.248 | 0.758 | 0.834
7q31 | 290.25 | Human | 31.250* | 8.810* | 0.003 | 0.145 | 0.001* | 1.74E-09* | 0.001* | 0.002* | 1.000 | 0.366 | 0.892 | 0.995
7q31 | 344.50 | Human | 6.501* | 5.408* | 0.091 | 1.149 | 0.001* | 8.34E-06* | 0.050* | 0.306 | 1.000 | 0.306 | 0.408 | 0.925
7q31 | 364.50 | Human | 28.034* | 12.880* | 0.081 | 0.000 | 0.001* | 4.85E-17* | 0.001* | 4.60E-05* | 1.000 | 1.000 | 0.872 | 0.993
7q31 | 479.75 | Human | 11.202* | 6.392* | 0.051 | 0.000 | 0.001* | 1.05E-09* | 0.001* | 3.12E-07* | 1.000 | 1.000 | 0.597 | 0.952
4q26 | 200.00 | Chimp | 0.195 | 0.459 | 9.036* | 9.977* | 1.000 | 0.444 | 1.000 | 0.441 | 0.003* | 1.51E-08* | 0.926 | 0.994
4q26 | 255.50 | Chimp | 0.314 | 0.693 | 7.928* | 14.631* | 0.380 | 1.000 | 1.000 | 0.464 | 0.002* | 6.97E-07* | 0.878 | 0.987
4q26 | 425.25 | Chimp | 0.204 | 0.553 | 9.033* | 8.859* | 0.060 | 0.329 | 1.000 | 1.000 | 0.001* | 2.88E-05* | 0.926 | 0.994
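The Table 2 caption describes the Hotspotter P values as coming from a one-sided likelihood ratio test with a mixture of chi-squared distributions. The sketch below shows one common form of such a test and is not taken from Hotspotter itself: it assumes an equal-weight mixture of a point mass at zero and a chi-squared distribution with one degree of freedom, which is the usual reference distribution when a single parameter lies on the boundary under the null. The exact mixture used in the study is not stated here, and the function names are hypothetical.

```python
import math


def chi2_1df_sf(x):
    """Upper-tail probability P(X > x) for chi-squared with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def one_sided_lrt_pvalue(loglik_null, loglik_alt):
    """P value for a constant-rate model versus a two-rate (hotspot) model.

    Assumes (for illustration) a 50:50 mixture of a point mass at zero
    and chi-squared(1) as the null distribution of the test statistic.
    """
    stat = 2.0 * (loglik_alt - loglik_null)
    if stat <= 0:
        # The hotspot model fits no better than the constant-rate model.
        return 1.0
    return 0.5 * chi2_1df_sf(stat)
```

Under this assumption the P value is half of what a standard two-sided chi-squared test with one degree of freedom would report for the same statistic.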

Finally, we estimated recombination rates in both species, averaged over 10-kb windows across these three 500-kb regions and 14 additional 160-kb regions previously described (13). No evidence for correlation in recombination rates was observed across the two species using the Spearman rank correlation test (P > 0.1).
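The cross-species comparison of windowed rates can be sketched as a Spearman rank correlation, which is the Pearson correlation of the ranks. The code below is an illustrative standard-library implementation, not the analysis code used in the study; the function names are hypothetical, tied values receive their average rank, and the computation of a P value is omitted.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(human_rates, chimp_rates):
    """Spearman rank correlation of per-window rate estimates."""
    if len(human_rates) != len(chimp_rates) or len(human_rates) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(average_ranks(human_rates), average_ranks(chimp_rates))
```

Each input would hold one rate estimate per 10-kb window, in the same window order for both species. Because only ranks are used, the statistic is insensitive to the large difference in absolute rate between hotspots and the surrounding sequence.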

We considered three possible artifacts that could erroneously cause a lack of correspondence in the estimated locations of hotspots: first, that the regions we studied were unusual, with low rates of sequence similarity between humans and chimpanzees; second, that population structure in the chimpanzee sample might confound analysis; and third, that the data and analytic methods provide insufficient statistical power to detect hotspots even where present.

Sequence identity ranged from 98.4 to 98.8%, with a mean of 98.6%, similar to previous estimates (22, 23). The three 500-kb regions occur at nearly identical chromosomal positions in both species, which makes it unlikely that rearrangements (e.g., centromeric to telomeric) explain differences in recombination rates.

We genotyped 40 loci to assess population structure in the chimpanzee sample (13). Analyzed with Structure 2.0 (24), the best-fitting demographic model was that of a single population. When genotypes from two central African chimpanzees were added, two subpopulations were predicted, and the confounding individuals were identified. Analysis of chimpanzee pedigrees and genotype data ruled out cryptic relatedness.

Low power to detect hotspots, or a high rate of false positives, could cause a lack of overlap in the observed locations of hotspots in two species, although sensitivity and/or specificity would have to be extremely poor to explain the nearly complete lack of correspondence across 21 hotspots. Both sensitivity and specificity are thought to be good when analyzing human data, on the basis of hotspot analysis previously validated by sperm typing. We assessed power to detect hotspots in chimpanzee (where we lack sperm-typing data) in several ways. First, we used the standard coalescent to simulate genotype data based on hotspots of the same intensity as the human HLA and β-globin hotspots, matching the chimpanzee data in terms of sample size, ascertainment, and number of sites. In these simulations, rates as low as those seen in the actual chimpanzee data were observed less than 2% of the time.

Second, we evaluated power using the empirical genotype data from human and chimpanzee by juxtaposing collections of genotypes separated by different distances on the estimated fine-scale genetic map, artificially creating hotspots of known intensities. Figure 2 shows the relation between the estimated hotspot intensity and the fraction of simulations in which statistically significant evidence for recombination hotspots was obtained (table S2). Power in the chimpanzee is >80% for 8 of the human hotspots and >50% for 14 hotspots. At the sites of the chimpanzee hotspots, power in humans is >87%. These analyses make it extremely unlikely that the limited correspondence observed across 21 hotspots is an artifact of low power [(13); figs. S3 and S5].
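Power in this sense is the fraction of simulated datasets, each containing a hotspot of known intensity, in which the hotspot test reaches significance. A minimal sketch of that tally follows; it assumes the per-replicate P values have already been computed by whatever hotspot test is in use, and the function name and the binomial standard error are additions for illustration.

```python
import math


def estimate_power(p_values, alpha=0.05):
    """Return (power, standard error) from per-replicate P values.

    power is the fraction of replicates with P < alpha; the standard
    error treats replicates as independent Bernoulli trials.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one simulated replicate")
    hits = sum(1 for p in p_values if p < alpha)
    power = hits / n
    std_err = math.sqrt(power * (1.0 - power) / n)
    return power, std_err
```

For example, `estimate_power([0.01, 0.2, 0.04, 0.5])` counts two of four replicates as significant at the default threshold.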

Fig. 2.

Power to detect hotspots in two human populations and chimpanzee. “Power” indicates the likelihood of observing statistical support for a hotspot (P < 0.05) of a given intensity in centimorgans per megabase for YRI (green), CEU (blue), and chimpanzee (red). Arrows indicate the estimated intensity of the hotspots detected in each species, using the color scheme above. Power in chimpanzee ranges from 40 to 95% for hotspots of intensity >5 cM/Mb observed in the human data and, in human, is >87% for each hotspot observed in chimpanzee. For full details of the simulations, see (13).

It is unlikely that the hotspots identified are false positives of the methods used, for a number of reasons: Hotspot detection results are highly congruent across both human population samples when analyzed with two computational methods (Table 2), and hotspots align well with patterns of LD breakdown (Fig. 1). Perhaps the strongest argument that claimed hotspots are not false positives is that a completely model-free approach that makes no assumptions about demography (25) confirms that for both human and chimpanzee there is a clustering of obligate recombination events at detected hotspots (fig. S4).

The lack of correlation in recombination patterns between humans and chimpanzees demonstrates that fine-scale recombination rates evolve rapidly, to an extent disproportionate to the change in nucleotide sequence. Rapid evolution of hotspots has previously been hypothesized on the basis of examples of meiotic drive at hotspots and the mechanism of DSB repair (9, 12). Our observations argue against models in which hotspots are directed solely by short, neutrally evolving DNA motifs, which would almost always be identical between the two species. Epigenetic factors, which are known to play a role in recombination hotspots (7), may vary more substantially across closely related species than does DNA sequence. Alternatively, if the trans-acting molecular machinery that initiates crossover events has nucleotide site preferences, then it is possible that substitutions in these components could dramatically alter site preference across the genome. Although DNA sequence is typically shared across human and chimpanzee, the polymorphisms in each species are not (26). It is intriguing to speculate that polymorphisms could themselves play a role in shaping fine-scale recombination; this could also explain why different alleles of a given locus can have substantially different recombination rates (9). Finally, we note that if recombination rates evolve rapidly, then in some cases, rates from “historical” polymorphism data might truly differ from contemporaneous rates in sperm.

By applying these analytical methods to genome-wide polymorphism surveys, an extensive collection of recombination hotspots will soon be available across the human genome. Studying these hotspots should ultimately illuminate the as yet mysterious factors that direct the location and frequency of recombination in our species.

Supporting Online Material

www.sciencemag.org/cgi/content/full/1105322/DC1

Materials and Methods

Tables S1 and S2

Figs. S1 to S5

References and Notes
