Report

Drive Against Hotspot Motifs in Primates Implicates the PRDM9 Gene in Meiotic Recombination


Science 12 Feb 2010:
Vol. 327, Issue 5967, pp. 876-879
DOI: 10.1126/science.1182363

Homing in on Hotspots

The clustering of recombination in the genome, around locations known as hotspots, is associated with specific DNA motifs. Now, using a variety of techniques, three studies implicate a chromatin-modifying protein, the histone methyltransferase PRDM9, as a major factor involved in human hotspots (see the Perspective by Cheung et al.). Parvanov et al. (p. 835, published online 31 December) mapped the locus in mice and analyzed allelic variation in mice and humans, whereas Myers et al. (p. 876, published online 31 December) used a comparative analysis of humans and chimpanzees to show that the recombination process leads to a self-destructive drive in which the very motifs that recruit hotspots are eliminated from our genome. Baudat et al. (p. 836, published online 31 December) took this analysis a step further to identify human allelic variants of PRDM9 that differed in the frequency at which they used hotspots. Furthermore, differential binding of this protein to different human alleles suggests that this protein interacts with specific DNA sequences. Thus, PRDM9 functions in the determination of recombination loci within the genome and may be a significant factor in the genomic differences between closely related species.

Abstract

Although present in both humans and chimpanzees, recombination hotspots, at which meiotic crossover events cluster, differ markedly in their genomic location between the species. We report that a 13–base pair sequence motif previously associated with the activity of 40% of human hotspots does not function in chimpanzees and is being removed by self-destructive drive in the human lineage. Multiple lines of evidence suggest that the rapidly evolving zinc-finger protein PRDM9 binds to this motif and that sequence changes in the protein may be responsible for hotspot differences between species. The involvement of PRDM9, which causes histone H3 lysine 4 trimethylation, implies that there is a common mechanism for recombination hotspots in eukaryotes but raises questions about what forces have driven such rapid change.

In humans and most other eukaryotes, meiotic crossover events typically cluster within narrow regions termed hotspots (1–5). Previously (6), we identified a degenerate 13–base pair (bp) motif, CCNCCNTNNCCNC, that is overrepresented in human hotspots. Both linkage disequilibrium (LD)–based analysis (6) and sperm typing at currently active hotspots (7) implicated this motif in the activity of 40% of hotspots.
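In this notation N stands for any base. The sketch below shows one way occurrences of the degenerate motif could be located on both strands of a sequence. It is a plain regular-expression scan written for illustration, not the pipeline used in the study; the function names and the example sequence are hypothetical.

```python
import re

# Degenerate hotspot motif; "N" matches any of A, C, G, T.
MOTIF = "CCNCCNTNNCCNC"


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif(seq: str, motif: str = MOTIF) -> list[tuple[int, str]]:
    """Return sorted (start, strand) pairs for every motif match.

    A zero-width lookahead lets overlapping matches be reported.
    Starts of reverse-strand matches are given in forward-strand coordinates.
    """
    seq = seq.upper()
    pattern = re.compile("(?=" + motif.replace("N", "[ACGT]") + ")")
    k = len(motif)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = revcomp(seq)
    hits += [(len(seq) - m.start() - k, "-") for m in pattern.finditer(rc)]
    return sorted(hits)


# Hypothetical example: one exact instance of the motif flanked by GG.
print(find_motif("GGCCACCGTAACCTCGG"))  # [(2, '+')]
```

Scanning the reverse complement, rather than building a second reverse-strand pattern, keeps the motif definition in one place.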

Despite nearly 99% identity at aligned bases, humans and chimpanzees show little if any sharing of hotspot locations (4, 5), although it has not been determined whether the recently identified hotspot motif is also active in the chimpanzee. To resolve this question, we collected chimpanzee genetic variation data at 22 loci where there is both an inferred hotspot at the orthologous location in humans and human-chimpanzee sequence conservation of the 13-nucleotide oligomer: 16 motifs within THE1 elements and 6 within L2 elements, chosen because a particular “core” version of the motif is highly active on these backgrounds in humans (fig. S1). We used the statistical software LDhat to estimate recombination rates separately in each region in different populations of both species (8). For humans, we used the Haplotype Map (HapMap) Phase II data. For chimpanzees, we genotyped 36 Western, 20 Central, and 17 vellerosus chimpanzees at a total of 694 chimpanzee single-nucleotide polymorphisms (SNPs), an average of 31.5 per region.

Because these regions are inferred human hotspots, the average estimated recombination rate surrounding the motif in humans showed a strong peak for both L2 and THE1 elements (Fig. 1A). In contrast, chimpanzees showed no evidence of increased recombination rates for either background. In Western chimpanzees, the estimated recombination rate around the motif in THE1 elements was similar to the regional average, whereas a weak peak in mean rate for the L2 elements was produced solely by a single potential hotspot in one of the six regions (Fig. 1B). Results for the other chimpanzee subspecies were less informative (fig. S2) (8) but did not reveal a different pattern. To ensure that unknown haplotypic phase, smaller sample size, less dense data, and SNP ascertainment in chimpanzees had not compromised the ability to detect hotspots, we repeatedly sampled from the Centre d'Etude du Polymorphisme Humain (CEPH) from Utah (CEU) HapMap population data to produce human data sets comparable with those from chimpanzees in terms of these features (8). We conditioned only on the presence of the 13-nucleotide oligomer in THE1 and L2 elements, not on the presence of a hotspot. This bootstrap technique revealed that the differences between human and chimpanzee rates cannot be explained by differences in power (P = 0.00052), although when the two backgrounds were analyzed separately the signal was significant only for THE1 elements (P = 0.00012) (fig. S3). These results provide evidence that the 13-nucleotide oligomer motif does not recruit hotspots in chimpanzees, implying changes in recombination machinery between humans and chimpanzees. Factors capable of changing recombination patterns genome-wide have been demonstrated in Caenorhabditis elegans (9) and by the mapping in mice of a trans-acting factor responsible for differences in hotspot location among inbred strain crosses (10, 11).
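The logic of this power check can be sketched as a resampling test. In the minimal version below, each human region is summarized by a single hotspot statistic (for example, the ratio of the estimated rate near the motif to the regional average; that choice is an assumption), human regions are resampled to form data sets of the chimpanzee size, and the P value is the share of resampled data sets showing as weak a signal as the chimpanzees. The SNP thinning, phase removal, and sample-size matching described in (8) are not reproduced, and all numbers are hypothetical.

```python
import random


def resampled_means(region_stats: list[float], n_regions: int,
                    n_boot: int, seed: int = 1) -> list[float]:
    """Mean statistic of n_regions human regions drawn with replacement,
    repeated n_boot times (a simple nonparametric bootstrap)."""
    rng = random.Random(seed)
    return [sum(rng.choice(region_stats) for _ in range(n_regions)) / n_regions
            for _ in range(n_boot)]


def empirical_p_lower(observed: float, null_draws: list[float]) -> float:
    """One-sided empirical P: how often a human-like data set shows a signal
    as weak as (or weaker than) the observed chimpanzee signal.
    The +1 terms count the observed value itself, so P is never exactly 0."""
    count = sum(1 for x in null_draws if x <= observed)
    return (1 + count) / (1 + len(null_draws))


# Hypothetical per-region peak ratios for human regions, and a chimpanzee
# value of 1.0 (no peak above the regional average).
human_stats = [4.0, 6.5, 3.2, 8.1, 5.0, 2.5]
boot = resampled_means(human_stats, n_regions=22, n_boot=999)
print(empirical_p_lower(1.0, boot))  # 0.001
```

With 999 replicates the smallest attainable P is 1/1000, so resolving values as small as those quoted above would need correspondingly more replicates.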

Fig. 1

Recombination rates and patterns of motif gain and loss in human and chimpanzee. For additional details, see (8). (A) Estimated HapMap Phase II recombination rate across the 40 kb surrounding 16 human THE1 elements (red line) and six L2 elements (blue line) orthologous to the 22 regions analyzed in chimpanzee, and each containing a conserved exact match to the 13-bp core motif. Rates are smoothed using a 2-kb sliding window slid in 50-bp increments, averaged across elements. Horizontal dashed line indicates the human average recombination rate of 1.1 cM/Mb. Vertical dotted line indicates the center of the repeat. (B) Average estimated recombination rate for the western chimpanzee data around the 16 THE1 elements (red line) and six L2 elements (blue line) containing the 13-bp core motif. Other details are the same as (A). (C) Numbers of core motif gains (left bars) versus losses (right bars), inferred using macaque and orangutan outgroup information (8), in humans (orange bars) and chimpanzees (light blue bars) on three backgrounds: THE1, L2, and non-repeat (NR). For each background, gains are shown as a fraction of motifs currently present in each species and losses as a fraction of motifs inferred in the human-chimpanzee ancestor. The intervals flanking the plot on each side show exact 1-sided 95% confidence intervals and associated P values for testing equality of gain/loss rate between the species (8).
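The smoothing described in (A) can be sketched as follows, assuming each element's estimated rate has already been expanded to one value per base pair over the 40-kb region (LDhat reports rates between SNPs, so that expansion is an assumption of this sketch). A prefix sum makes each window mean cheap to compute.

```python
def sliding_window_means(rate_per_bp: list[float], window: int = 2000,
                         step: int = 50) -> list[tuple[int, float]]:
    """Mean rate in each `window`-bp window, advanced `step` bp at a time.
    Returns (window_center, mean_rate) pairs, positions 0-based."""
    prefix = [0.0]
    for r in rate_per_bp:
        prefix.append(prefix[-1] + r)
    return [(start + window // 2, (prefix[start + window] - prefix[start]) / window)
            for start in range(0, len(rate_per_bp) - window + 1, step)]


def average_across_elements(
        profiles: list[list[tuple[int, float]]]) -> list[tuple[int, float]]:
    """Average smoothed profiles position by position (equal-length regions)."""
    n = len(profiles)
    return [(pos, sum(p[i][1] for p in profiles) / n)
            for i, (pos, _) in enumerate(profiles[0])]


# Hypothetical flat profiles for two 40-kb regions.
a = sliding_window_means([1.0] * 40_000)
b = sliding_window_means([3.0] * 40_000)
avg = average_across_elements([a, b])
print(len(avg), avg[0])  # 761 (1000, 2.0)
```

Averaging after smoothing, as here, gives the same result as smoothing the averaged profile, because both operations are linear.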

A separate process, predicted to cause rapid evolution of individual hotspots, is the self-destructive drive inherent in double-strand break (DSB) formation, known as biased gene conversion (BGC) (12). Mutations reducing DSB formation in cis at recombination hotspots are preferentially transmitted because, in heterozygotes, DSBs initiated on the other, more recombinogenic chromosome are repaired using the mutant chromosome as a template; such mutations are thus favored in a manner mimicking natural selection (13). This phenomenon could lead to rapid hotspot loss (14, 15). Direct evidence from sperm typing (16) has shown BGC at one polymorphic point mutation disrupting an occurrence of the 13-bp motif. More generally, BGC is predicted to eliminate copies of any recombination-promoting motif from the genome. The species-specific recombination activity of the 13-bp human hotspot motif suggests that losses of this motif should have occurred preferentially on the human lineage rather than on the lineage leading to chimpanzees.
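The drive can be illustrated with the textbook deterministic model of transmission bias, a simplification rather than the model in the SOM. Suppose heterozygotes transmit the allele that disrupts the motif (the "cold" allele) with probability 1/2 + δ. With random mating its frequency p becomes p² + 2p(1 − p)(1/2 + δ) = p + 2δp(1 − p), the same form as genic selection with coefficient 2δ. The value of δ used below is hypothetical.

```python
def bgc_step(p: float, delta: float) -> float:
    """One generation of biased gene conversion at a single hotspot site.

    p: frequency of the cold (motif-disrupting) allele.
    delta: excess transmission of the cold allele from heterozygotes,
           which pass it on with probability 1/2 + delta (0 <= delta <= 1/2).
    Assumes random mating, an infinite population, and no other forces.
    """
    # p' = p^2 + 2p(1-p)(1/2 + delta) = p + 2*delta*p*(1-p)
    return p + 2 * delta * p * (1 - p)


def trajectory(p0: float, delta: float, generations: int) -> list[float]:
    """Cold-allele frequency over successive generations."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(bgc_step(freqs[-1], delta))
    return freqs


# Hypothetical: a rare motif-disrupting mutation with a 1% transmission excess.
freqs = trajectory(0.01, 0.01, 2000)
first_majority = next(g for g, f in enumerate(freqs) if f > 0.5)
print(first_majority, round(freqs[-1], 3))
```

Because the change is always non-negative when δ > 0, the motif-disrupting allele can only spread in this idealized model, which is the sense in which hotspots are self-destructive.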

To examine the evidence for BGC-driven motif loss, we therefore characterized rates and patterns of molecular evolution for the degenerate 13-nucleotide oligomer and the “core” version of the motif on specific backgrounds: THE1 elements, L2 elements, AluY/Sc/Sg elements (degenerate motif only), other repeats, and unique nonrepeat DNA (Table 1). We found a consistent substitution pattern imbalance, with chimpanzees having more copies of the motif than humans [empirical P = 0.003 for the most active form, with three of four independent backgrounds showing P < 0.05; P = 0.002 for the degenerate 13-nucleotide oligomer motif, with P < 0.05 for three of five individual backgrounds (8)]. As predicted by theoretical considerations of BGC [supporting online material (SOM) text and table S1] (14, 15), the imbalance was greatest where the motif is most active. To assess whether motifs have been gained in chimpanzees or lost in humans, we used the published macaque (17) and draft orangutan (18) genome sequences to infer ancestral sequence. For THE1 elements, L2 elements, and nonrepeat DNA, we observed an excess of human losses of the most active motif relative to chimpanzees (P < 0.05 in each case) (Fig. 1C and table S2) and similar results for the degenerate 13-nucleotide oligomer motif (table S3). The strength of the effect again correlates with hotspot activity. In contrast, there are no significant differences between species in motif gains (P > 0.3). Alu elements were not analyzed because of a high rate of uncertainty in inferring the ancestral base.
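The outgroup logic can be sketched as a simple parsimony rule. This is an assumption for illustration; the procedure actually used is described in (8). For a motif present in exactly one of human and chimpanzee, the ancestral state is taken from the macaque and orangutan alignments when the available outgroups agree, and the site is left unresolved otherwise, mirroring the exclusion of sites whose ancestral base is uncertain.

```python
from collections import Counter
from typing import Optional


def classify_change(human: bool, chimp: bool,
                    orangutan: Optional[bool],
                    macaque: Optional[bool]) -> Optional[str]:
    """Label a motif present in exactly one of human/chimpanzee as a gain or a
    loss on one lineage. Outgroup values are motif presence (True/False) or
    None when the orthologous sequence is missing.
    Returns None for shared motifs or unresolved ancestry."""
    if human == chimp:
        return None  # not lineage-specific
    outgroups = {s for s in (orangutan, macaque) if s is not None}
    if len(outgroups) != 1:
        return None  # no outgroup data, or the outgroups disagree
    if outgroups.pop():
        # Ancestor carried the motif: the species lacking it lost it.
        return "chimp_loss" if human else "human_loss"
    # Ancestor lacked the motif: the species carrying it gained it.
    return "human_gain" if human else "chimp_gain"


# Hypothetical sites: (human, chimp, orangutan, macaque)
sites = [(False, True, True, True), (False, True, True, None),
         (True, False, False, False), (True, False, True, False)]
print(Counter(classify_change(*s) for s in sites))
```

Following the Fig. 1C convention, tallied losses would then be divided by the number of motifs inferred in the ancestor and gains by the number currently present in each species.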

Table 1

Motif imbalance between human and chimpanzee. For the core motif and the degenerate motif, we analyzed cases in which the motif occurs in exactly one of human and chimpanzee. Results are shown for the full set of nonshared motifs and stratified into five backgrounds that differ in average human recombination activity. Significance levels are calculated in two ways: P values for ratios are based on a one-sided exact binomial test of fewer human-only cases because the motif is known to be active in humans. Empirical P values are one-sided and obtained through comparisons of counts for the core or degenerate motif with counts observed for motifs of the same length and GC content on the same backgrounds (8). Dashes indicate zero counts in both species.
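The binomial P values described in the caption correspond to the following calculation, sketched under the null hypothesis that a nonshared motif is equally likely to be human-only or chimpanzee-only: with h human-only and c chimpanzee-only cases, P = Pr(X ≤ h) for X ~ Binomial(h + c, 1/2). The counts in the example are hypothetical, not values from the table, and the empirical P values (which use matched control motifs) are not reproduced.

```python
from math import comb


def one_sided_binomial_p(human_only: int, chimp_only: int) -> float:
    """Exact one-sided P value for observing at most `human_only` human-only
    motifs out of n = human_only + chimp_only nonshared motifs, when each is
    equally likely to be human-only or chimpanzee-only (p = 1/2)."""
    n = human_only + chimp_only
    return sum(comb(n, k) for k in range(human_only + 1)) / 2 ** n


# Hypothetical counts: 30 human-only versus 50 chimpanzee-only motifs.
print(f"{one_sided_binomial_p(30, 50):.4f}")
```

Exact integer arithmetic via `math.comb` avoids the rounding problems of summing small floating-point probabilities.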


To determine whether motif activity has been lost on the chimpanzee lineage or gained on the human lineage, we compared our observations with a population-genetics model (SOM text) (14, 15). Approximately 16% of motifs on the THE1 background and 8% on the L2 background have been lost on the human lineage since human-chimpanzee divergence (Fig. 1C). If the motif had been active since the time of speciation, the model predicts that 46 to 56% and 31 to 38% of motifs in THE1 and L2 elements, respectively, should have been lost. The observed patterns of motif evolution in humans are instead consistent with a recent (1 to 2 million years ago) activation of the 13-bp motif on the human lineage rather than inactivation on the chimpanzee lineage.
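As a rough check on this reasoning (not the population-genetics model in the SOM), suppose each active motif copy is lost at a constant per-copy rate λ. The fraction lost after time t is then 1 − exp(−λt), so the time the motif must have been active, as a fraction of the full divergence time T, is t/T = ln(1 − f_obs)/ln(1 − f_full). The sketch applies this to the percentages quoted above; the constant-hazard assumption is ours.

```python
from math import log


def active_fraction(f_obs: float, f_full: float) -> float:
    """Fraction t/T of the time since divergence that the motif must have
    been active, assuming a constant per-copy loss hazard while active:
    f(t) = 1 - exp(-lam * t), hence t/T = ln(1 - f_obs) / ln(1 - f_full).
    f_obs: fraction of ancestral motifs lost (observed).
    f_full: fraction expected lost if active for the whole period T.
    """
    return log(1 - f_obs) / log(1 - f_full)


# Observed losses and model predictions quoted in the text above.
for label, f_obs, (low, high) in [("THE1", 0.16, (0.46, 0.56)),
                                  ("L2", 0.08, (0.31, 0.38))]:
    # A larger predicted full-period loss implies a shorter active period.
    t_min, t_max = active_fraction(f_obs, high), active_fraction(f_obs, low)
    print(f"{label}: active for ~{t_min:.2f}-{t_max:.2f} of the time since divergence")
# THE1: active for ~0.21-0.28 of the time since divergence
# L2: active for ~0.17-0.22 of the time since divergence
```

With an assumed divergence time of roughly 6 million years (a commonly quoted figure, not one given in this paper), these fractions correspond to roughly 1 to 1.7 million years, the same order as the estimate above.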

We next investigated the function of the 13-nucleotide oligomer motif. Previously, we suggested that the human hotspot motif was probably bound by a zinc-finger protein with at least 12 zinc fingers, on the basis of an extended 30- to 40-bp region of weaker sequence specificity containing the motif and a 3-bp periodicity of influential bases (6). We therefore set out to identify candidates for such a protein using a computational algorithm that predicts DNA binding specificity for C2H2 zinc-finger proteins (19). Among the 691 identified human C2H2 zinc-finger proteins, the 13-nucleotide oligomer motif was present within the predicted binding sequence of five (fig. S4). Binding specificity was then further explored in silico by comparing predicted motif degeneracy for each candidate (inferred by calculating the relative binding score for every 1-bp mutation relative to the consensus) with empirical degeneracy patterns in the 13-bp motif (Fig. 2A). Predictions for one of the candidates, PRDM9, exactly matched the observed degeneracy at positions 3, 6, 8, 9, and 12 within the 13-bp motif (Fig. 2B) and the lack of degeneracy at the other eight positions. Predictions for the other four candidates showed features inconsistent with the observed degeneracy (fig. S4). The predicted binding sequence for PRDM9 also contains an exact match on the opposite strand for an 8-bp region of the extended motif, upstream of the 13-bp degenerate motif, suggesting that PRDM9 zinc fingers might contact both DNA strands. Finally, the number of zinc fingers (13) in this protein, the positioning of the match to the 13-bp motif within the longer predicted binding sequence, and the strong influence of this 13-bp region on specificity all match our previous predictions (6).
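The in silico degeneracy comparison rests on scoring every single-base variant of a consensus. A minimal sketch, assuming an additive position-probability matrix: the toy matrix below is built directly from the written motif (uniform at N, strongly preferring the written base elsewhere) and is not the PRDM9 prediction from the algorithm in (19), whose scoring scheme is not reproduced here. The `tolerance` threshold is likewise hypothetical.

```python
from math import log

BASES = "ACGT"


def toy_pwm(motif: str, fixed: float = 0.85) -> list[dict[str, float]]:
    """Hypothetical base-probability matrix: N columns are uniform; other
    columns put `fixed` on the written base and split the rest evenly."""
    other = (1 - fixed) / 3
    return [{b: 0.25 for b in BASES} if ch == "N"
            else {b: (fixed if b == ch else other) for b in BASES}
            for ch in motif]


def score(seq: str, pwm: list[dict[str, float]]) -> float:
    """Additive log-probability score (independence between positions assumed)."""
    return sum(log(col[b]) for col, b in zip(pwm, seq))


def mutation_scan(pwm: list[dict[str, float]]) -> dict[int, dict[str, float]]:
    """Score change of every single-base mutant relative to the consensus.
    Returns {1-based position: {alternative base: delta score}}."""
    cons = "".join(max(col, key=col.get) for col in pwm)
    ref = score(cons, pwm)
    return {i + 1: {alt: score(cons[:i] + alt + cons[i + 1:], pwm) - ref
                    for alt in BASES if alt != base}
            for i, base in enumerate(cons)}


def degenerate_positions(pwm: list[dict[str, float]],
                         tolerance: float = 0.5) -> list[int]:
    """Positions where no single-base change costs more than `tolerance`."""
    return [pos for pos, deltas in mutation_scan(pwm).items()
            if min(deltas.values()) > -tolerance]


print(degenerate_positions(toy_pwm("CCNCCNTNNCCNC")))  # [3, 6, 8, 9, 12]
```

Applied to a matrix predicted for a real candidate protein instead of this toy one, the same scan would yield a degeneracy pattern that can be compared position by position with the empirical one.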

Fig. 2

(A) Previously estimated degeneracy of the 13-bp hotspot motif (logo plot; relative letter height proportional to estimated probability of hotspot activity and total letter height determined by degree of base specificity) (6) as well as an extended ~39-bp motif [text below logo, with influential positions (P < 0.01) shown in red]. (B) In silico prediction of the binding consensus for PRDM9, aligned with the 13-nucleotide oligomer, with more influential positions shown in red. Underlined in both (A) and (B) is an additional 8-bp matching sequence. The logo shows predicted degeneracy within this consensus (8). Below the text is the sequence of four predicted DNA-contacting amino acids for each of the 13 successive human PRDM9 zinc fingers (one oval per finger, with different colors for different fingers; the finger separated from the others lies N-terminal to them and is drawn with a gap) and their predicted base contacts within the motif. (C) Sequence of four predicted DNA-contacting amino acids for the PRDM9 zinc fingers in seven mammalian species, presented as in (B). Distinct fingers are given different colors; fingers present in at least two species have a black border.
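The layout of predicted contacts in (B) follows the canonical C2H2 zinc-finger model, in which each finger reads about 3 bp, residues –1, 3, and 6 contact the 3′, middle, and 5′ bases of its triplet, and the array binds antiparallel, so the N-terminal finger sits at the 3′ end of the site. The sketch below assembles a site from per-finger triplets under that model. The triplets are hypothetical, and the sketch ignores residue 2 (which contacts the complementary strand) and the gap before the separated finger.

```python
def assemble_site(triplets_n_to_c: list[str]) -> str:
    """Join per-finger 3-bp targets into a 5'->3' binding site.

    triplets_n_to_c: target triplet of each finger, ordered from the
    N-terminal to the C-terminal finger, each written 5'->3' (bases read
    by residues 6, 3 and -1, in that order). Because the array binds
    antiparallel, the N-terminal finger's triplet ends up at the 3' end.
    """
    if any(len(t) != 3 for t in triplets_n_to_c):
        raise ValueError("each finger should specify exactly 3 bases")
    return "".join(reversed(triplets_n_to_c))


print(assemble_site(["GCG", "TGG", "GCG"]))  # GCGTGGGCG (hypothetical 3-finger array)
print(len(assemble_site(["ACG"] * 13)))       # 39
```

Under this model a contiguous array of 13 fingers would span 39 bp, the same order as the ~39-bp extended motif in (A).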

The lack of activity of the 13-bp motif in chimpanzees demonstrated above suggests that, in addition to having the predicted binding specificity, any motif-binding protein candidate should also differ between humans and chimpanzees. For four of the five candidates, the predicted DNA-contacting amino acids within the zinc fingers are identical between human and chimpanzee. Chimpanzee PRDM9, however, has a dramatically different predicted binding sequence (fig. S5). Although PRDM9 has multiple zinc fingers in both species (13 in humans and 12 in chimpanzees), the DNA-contacting residues –1, 2, 3, and 6 are shared between species only in the first finger (Fig. 2C). Such rapid evolution is exceptional: when these residues are compared across all 544 human-chimpanzee ortholog pairs of C2H2 zinc-finger proteins, PRDM9 is the most diverged (P = 0.0018). The PRDM9 sequences in five additional mammals (elephant, mouse, rat, macaque, and orangutan) exhibit rapid evolution, variation in zinc-finger number (between 8 and 12), and patterns of substitution suggestive of complex repeat shuffling (Fig. 2C) (20).
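One simple way to rank ortholog pairs by divergence in their DNA-contacting residues is sketched below. It is an assumption, not necessarily the statistic used in this study: each finger is reduced to its four contact residues (–1, 2, 3, 6), divergence is the fraction of human fingers whose signature has no identical counterpart in the chimpanzee ortholog, and the empirical P value is the share of pairs at least as diverged. Residue strings and gene labels are hypothetical. For reference, 1/544 ≈ 0.0018 is the smallest value such a rank-based test can give with 544 pairs.

```python
def contact_divergence(fingers_a: list[str], fingers_b: list[str]) -> float:
    """Fraction of fingers in `fingers_a` whose 4-residue contact signature
    (positions -1, 2, 3, 6) is absent from `fingers_b`. Compared as sets,
    so reshuffled or duplicated fingers still count as shared."""
    signatures_b = set(fingers_b)
    return sum(sig not in signatures_b for sig in fingers_a) / len(fingers_a)


def rank_p_value(target: str, divergence: dict[str, float]) -> float:
    """Share of ortholog pairs at least as diverged as `target` (itself included)."""
    d = divergence[target]
    return sum(v >= d for v in divergence.values()) / len(divergence)


# Hypothetical data: one highly diverged pair among 544.
divergence = {f"ZNF_{i}": 0.0 for i in range(543)}
divergence["PRDM9"] = contact_divergence(["RSHK", "QSGH", "TNRD"], ["RSHK", "VRSK"])
print(round(divergence["PRDM9"], 3), round(rank_p_value("PRDM9", divergence), 4))  # 0.667 0.0018
```

Comparing signatures as sets rather than finger by finger is a deliberate choice here: it keeps repeat shuffling of otherwise identical fingers from being counted as divergence.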

Multiple lines of evidence point to a role for the orthologous mouse gene, Prdm9, in recombination. Prdm9 lies within a 5.1-Mb region that contains a locus influencing genome-wide hotspot locations (10, 11). It is expressed exclusively during meiotic prophase, and mice in which Prdm9 has been knocked out are infertile and fail to properly repair DSBs (21). Mouse PRDM9 trimethylates lysine 4 of histone H3 (H3K4me3) (21), an epigenetic mark specifically enriched on mouse chromatids carrying recombination initiation sites within the mouse hotspot Psmb9 (22). In yeast, mutation of Set1, the sole gene responsible for H3K4 trimethylation, reduces crossover activity at 84% of hotspots (23). The lack of well-defined target-sequence specificity of Set1 (which is not a zinc-finger protein) may explain why no dominant hotspot motif has been identified in yeast. Intriguingly, Prdm9 is also the only species-incompatibility gene yet identified in mouse (24), with differences among nine PRDM9 zinc fingers between mouse strains potentially playing a causal role in male sterility.

Baudat et al. find that variation in PRDM9 among humans correlates with variability in genome-wide hotspot usage, and that PRDM9 binds the 13-bp motif in a sequence-specific manner in vitro (25). The findings of both studies imply that PRDM9 determines human hotspot locations, with PRDM9 evolution explaining the lack of hotspot conservation in other species. Exactly how PRDM9 functions, for example by altering transcription of DSB repair genes or by directly recruiting DSB repair proteins, remains unknown. These findings also raise the question of why such an important gene is evolving so rapidly. The DNA sequence of the zinc-finger array of PRDM9 constitutes a coding minisatellite, suggesting a high intrinsic mutation rate resulting from repeat instability. However, patterns of evolution within the zinc-finger array, notably the clustering and coordination of changes at sites that interact with DNA bases, strongly suggest positive selection on binding specificity (20). Selection could possibly arise from the gradual degradation of hotspots through BGC, leading to a loss in fitness either through the promotion of deleterious alleles within hotspots (15) or through having insufficient crossover events to support proper disjunction (14, 15). Alternatively, the rapid evolution of PRDM9 could be indicative of genetic conflict, such as meiotic drive or conflict involving mobile elements (26, 27). Although there is no direct evidence for this, mouse Prdm9 lies within one of the inversions characterizing the meiotic-drive t-complex (28).

Supporting Online Material

www.sciencemag.org/cgi/content/full/science.1182363/DC1

Materials and Methods

SOM Text

Figs. S1 to S5

Tables S1 and S2

References

Database S1

  • * These authors contributed equally to this work.

  • Present address: Institute of Cell and Molecular Science, Barts and The London School of Medicine and Dentistry, 4 Newark Street, London E1 2AT, UK.

  • § These authors contributed equally to this work.

References and Notes

  1. Materials and methods are available as supporting material on Science Online.
  2. We thank N. Mundy for advice and provision of chimpanzee samples and C. Mitchell and E. Nerrienet for assisting in chimpanzee sample collection. We thank D. Falush and G. Coop for helpful conversations. Part of the work was completed while S.M. was a fellow at the Broad Institute. We would like to acknowledge funding from the Leverhulme Trust (to G.M.), the Royal Society (to P.D.), and the Wellcome Trust (to S.M., C.F., G.M., and P.D.). The chimpanzee genotype data for 22 hotspot candidate regions generated is available online with the SOM as Database S1.