Research Article

PRDM9 Is a Major Determinant of Meiotic Recombination Hotspots in Humans and Mice


Science  12 Feb 2010:
Vol. 327, Issue 5967, pp. 836-840
DOI: 10.1126/science.1183439

This article has a correction.

Homing in on Hotspots

The clustering of recombination in the genome, around locations known as hotspots, is associated with specific DNA motifs. Now, using a variety of techniques, three studies implicate a chromatin-modifying protein, the histone methyltransferase PRDM9, as a major factor involved in human hotspots (see the Perspective by Cheung et al.). Parvanov et al. (p. 835, published online 31 December) mapped the locus in mice and analyzed allelic variation in mice and humans, whereas Myers et al. (p. 876, published online 31 December) used a comparative analysis between humans and chimpanzees to show that the recombination process leads to a self-destructive drive in which the very motifs that recruit hotspots are eliminated from our genome. Baudat et al. (p. 836, published online 31 December) took this analysis a step further to identify human allelic variants of PRDM9 that differ in the frequency at which they use hotspots. Furthermore, the differential binding of distinct human PRDM9 alleles to specific DNA motifs suggests that this protein interacts with specific DNA sequences. Thus, PRDM9 functions in the determination of recombination loci within the genome and may be a significant factor in the genomic differences between closely related species.

Abstract

Meiotic recombination events cluster into narrow segments of the genome, defined as hotspots. Here, we demonstrate that a major player in hotspot specification is the Prdm9 gene. First, two mouse strains that differ in hotspot usage are polymorphic for the zinc finger DNA binding array of PRDM9. Second, the human consensus PRDM9 allele is predicted to recognize the 13-mer motif enriched at human hotspots; this DNA binding specificity is verified by in vitro studies. Third, allelic variants of PRDM9 zinc fingers are significantly associated with variability in genome-wide hotspot usage among humans. Our results provide a molecular basis for the distribution of meiotic recombination in mammals, in which the binding of PRDM9 to specific DNA sequences targets the initiation of recombination at specific locations in the genome.

Meiosis is a specialized cell cycle, essential for sexual reproduction, in which diploid cells give rise to haploid gametes. The halving of genome content during meiosis results from two successive divisions. During the first, the reductional division, which is unique to meiotic cells, homologous chromosomes segregate. This segregation requires the establishment of connections between homologs that are mediated in most species by reciprocal recombination events known as crossing over (CO) (1). COs also increase genome diversity, thereby improving the efficacy of natural selection (2). The molecular process of CO formation involves a highly regulated pathway in which programmed DNA double-strand breaks (DSBs) are induced and then repaired on the homolog (3). In the yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe, initiation sites have been mapped by the direct molecular detection of DSBs. These studies have shown that DSBs are not randomly distributed along chromosomes but occur in specific regions of the genome, according to rules that are as yet poorly understood (4). A common chromatin feature, the trimethylation of lysine 4 of histone H3 (H3K4me3), defines yeast and mouse initiation sites (5, 6).

In mammals, the locations of initiation sites are in most cases deduced from mapping CO events. COs can be mapped at high resolution by pedigree analysis, detection of recombinant molecules in gametes, or analysis of linkage disequilibrium (LD) (7, 8). In humans, these approaches have shown that most COs are clustered in narrow regions (1 to 2 kb), called hotspots, that are predicted to be preferred initiation sites (9). On the basis of LD patterns, more than 30,000 hotspots have been identified in the human genome, spaced, on average, every 50 to 100 kb, often outside genes and with highly variable levels of activity (10, 11). In addition, some hotspots show interindividual variation in activity, as shown by sperm-typing studies (7) or pedigree analysis (12).

LD-based hotspots were found to be highly enriched for a degenerate 13-mer motif (13). Moreover, in sperm-typing studies, single-nucleotide polymorphisms within this 13–base pair (bp) motif were found to be associated with variation of hotspot activity in cis (14, 15). Genome-wide, the motif plays a role in ~40% of hotspots and is proposed to be involved in initiation specification or other aspects of recombination activity (13). In mice, on the basis of the analysis of a 25-Mb interval on chromosome (chr) 1 (16) and several individual regions (17), initiation of meiotic recombination also appears to be clustered in small intervals. Recently, by comparing recombination activity between different mouse strains, a genetic locus responsible for the distribution of recombination in the genome was identified (18, 19), which potentially contributes, either directly or indirectly, to the specification of initiation sites in the genome. Specifically, the genetic background at this locus (wm7 haplotype from Mus musculus molossinus or b haplotype from M. m. domesticus strains C57BL/10 or C57BL/6) was found to affect recombination activity measured both chromosome-wide and at two individual hotspots (Psmb9 on chr 17 and Hlx1 on chr 1). This locus (named Dsbc1) was mapped to a region between 10.1 and 16.8 Mb on mouse chr 17 (18).
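A degenerate motif such as the 13-mer can be searched for by translating each IUPAC letter into the set of bases it allows. The sketch below does this with Python's standard `re` module and scans both DNA strands. It assumes the 13-mer is written in the commonly quoted consensus form CCNCCNTNNCCNC (13). The names `find_motif` and `HOTSPOT_13MER` are illustrative, and exact matching of this kind is only a rough proxy for the enrichment analysis that defined the motif.

```python
import re

# IUPAC nucleotide codes as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}

# Assumed consensus for the LD-hotspot 13-mer (see lead-in).
HOTSPOT_13MER = "CCNCCNTNNCCNC"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A, C, G, T, N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif(seq: str, consensus: str = HOTSPOT_13MER) -> list[tuple[int, str]]:
    """Return sorted (start, strand) hits of an IUPAC consensus in seq.

    Starts are 0-based forward-strand coordinates of the leftmost base of
    the matched window; a zero-width lookahead also reports overlapping hits.
    """
    seq = seq.upper()
    k = len(consensus)
    pattern = re.compile("(?=" + "".join(IUPAC[b] for b in consensus) + ")")
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    for m in pattern.finditer(rc):
        # A reverse-strand window starting at m.start() covers forward
        # positions len(seq) - m.start() - k through len(seq) - m.start() - 1.
        hits.append((len(seq) - m.start() - k, "-"))
    return sorted(hits)
```

For example, `find_motif("AAACCACCATAACCACAAA")` returns `[(3, '+')]`. The reverse complement of the same sequence returns `[(3, '-')]`, because the same window is found on the other strand.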

Prdm9, a candidate gene. Through additional crosses, we refined the Dsbc1 locus to the 12.2- to 16.8-Mb region of mouse chr 17 [see supporting online material (SOM) text]. This region contains the Prdm9 gene, which codes for a protein with a PRD1-BF1 and RIZ/Su(var)3-9, enhancer-of-zeste, and trithorax (PR/SET) methyltransferase domain and a tandem array of 12 C2H2 zinc fingers. PRDM9 has been shown to trimethylate H3K4 and is expressed specifically in germ cells during meiotic prophase (20). Strains with distinct Dsbc1 alleles (wm7 or b) have different levels of H3K4me3 at the two recombination hotspots, Psmb9 and Hlx1. Specifically, a high level of H3K4me3 was correlated with high recombination activity at these hotspots (6). The Prdm9 gene is the only reported gene encoding a histone methyltransferase in the Dsbc1 region and thus represents a strong candidate gene for the effect of Dsbc1. On this basis, we reasoned that the zinc fingers of PRDM9 could mediate DNA binding specificity and thus target its activity to specific sites in the genome. According to this hypothesis, altering the zinc fingers is predicted to change the sites targeted by PRDM9.

Distinct predicted DNA sequence specificities for two mouse PRDM9 zinc finger variants. We therefore determined and compared the cDNA sequences of Prdm9 from M. m. molossinus (wm7) and M. m. domesticus (b) (Fig. 1A and fig. S1). These two Prdm9 alleles showed a high level of polymorphism (24 changes over 847 residues); all but one of the changes are located in the zinc finger array. This array, encoded within a single exon, has a minisatellite-like genomic structure in which each zinc finger, 28 amino acids long, is encoded within an 84-bp unit, which is repeated in tandem with almost perfect identity at both the DNA and protein levels (fig. S1). For a given allele, the differences between repeats are restricted to seven positions, five of which encode amino acids at positions –1, 3, and 6 of the zinc finger alpha helix, which are predicted to contact the DNA and are known to be involved in DNA sequence specificity (21, 22). Between the two Prdm9 alleles (wm7 and b), most polymorphisms (21 out of 23) were at residues –1, 3, and 6 of the zinc fingers (Fig. 1A and fig. S1). The wm7 allele also lacks one zinc finger compared with the b allele. Sequencing the Prdm9 zinc finger array from M. m. castaneus showed it to be identical to wm7. This is consistent with the genetic origin of M. m. molossinus, known to be in part derived from M. m. castaneus, and with the observation that the two hotspots, Psmb9 and Hlx1, are active at similar levels in the presence of Dsbc1 alleles from either M. m. castaneus or M. m. molossinus [(18) and SOM text]. Using the Zinc Finger Consortium Database (23, 24), we predict that these two PRDM9 proteins preferentially recognize distinct DNA motifs (Fig. 1B). Because of the low predicted specificity of some zinc fingers and the multiple combinations through which several zinc fingers of a protein may contribute to DNA recognition, PRDM9 is expected to recognize a large number of sites in the genome. 
For these reasons, and because the DNA recognition of some zinc fingers cannot be predicted reliably (25), the predicted motif has limited power to identify PRDM9 binding sites. Nonetheless, it is noteworthy that sequences matching 8 and 9, respectively, of the 13 highest-scoring bases of the predicted PRDM9wm7 recognition motif are found near the centers of the Psmb9 and Hlx1 hotspots (fig. S2).
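The prediction step rests on the canonical model of C2H2 zinc finger binding. In that model, helix positions 6, 3, and –1 contact the 5′, middle, and 3′ bases of a DNA triplet, respectively. Consecutive fingers bind antiparallel to the DNA, so the N-terminal finger sits at the 3′ end of the site.

The sketch below applies that model using a deliberately tiny, hypothetical lookup table (`TOY_CODE`) of a few well-known contacts. It is not the Zinc Finger Consortium Database (23, 24) used here. Residues missing from the table yield N, which mirrors the low predictability of some fingers. The check uses Zif268, a classic three-finger protein, rather than PRDM9.

```python
# Hypothetical toy recognition code: (helix position, residue) -> preferred base.
# Only a handful of well-established contacts are listed; this is NOT the
# Zinc Finger Consortium Database.
TOY_CODE = {
    (-1, "R"): "G",
    (3, "H"): "G",
    (3, "E"): "C",
    (3, "N"): "A",
    (6, "R"): "G",
}


def predict_site(fingers: list[tuple[str, str, str]]) -> str:
    """Predict a 5'->3' DNA site from fingers listed N- to C-terminal.

    Each finger is given as its residues at helix positions (-1, 3, 6).
    Position 6 picks the 5' base of the triplet, position 3 the middle
    base, and position -1 the 3' base; unknown residues give 'N'.
    """
    triplets = []
    for res_m1, res_3, res_6 in fingers:
        triplets.append(
            TOY_CODE.get((6, res_6), "N")
            + TOY_CODE.get((3, res_3), "N")
            + TOY_CODE.get((-1, res_m1), "N")
        )
    # Fingers bind antiparallel: the C-terminal finger's triplet is 5'-most.
    return "".join(reversed(triplets))


if __name__ == "__main__":
    # Zif268 fingers 1-3 as (-1, 3, 6) residues.
    print(predict_site([("R", "E", "R"), ("R", "H", "T"), ("R", "E", "R")]))
```

This prints `GCGNGGGCG`, which matches the known Zif268 site GCGTGGGCG except at the one base the toy table cannot specify.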

Fig. 1

Mouse wm7 and b Prdm9 alleles are polymorphic at residues involved in specifying DNA targets in the zinc finger array. (A) Tandem repeat structure of the mouse PRDM9 zinc finger array. (Top) The structure of the mouse b allele is shown, with the Krüppel-associated box (KRAB), the PR/SET domain, and the zinc fingers (Zn) shaded in blue, yellow, and green, respectively. (Bottom) Sequences of the C-terminal tandem arrays of zinc fingers of the b allele (left) and the wm7 allele (right). The coordinate of the first residue of each repeat on the protein sequence is indicated. The residues identical to the second repeat are represented by stars (except for the first, incomplete zinc finger). The C and H residues, characteristic of the C2H2 zinc fingers, are depicted in red. The residues at positions –1, 3, and 6 of every zinc finger, which are of special importance for specifying the DNA target, are shown in blue. (B) PRDM9 wm7 and b alleles are predicted to recognize distinct DNA sequences. The amino acids at position –1, 3, and 6 of the zinc finger alpha helices, used for the prediction, are indicated under the corresponding bases of each DNA motif.

Variability in human PRDM9 zinc fingers. In humans, the degenerate 13-mer motif was proposed to be a potential binding site for zinc fingers, given its apparent 3-bp periodicity (13). Therefore, we analyzed the zinc finger region of the human PRDM9 protein for its predicted binding specificity. The human PRDM9 protein referenced in databases (Ensembl release 56, based on the Genome Reference Consortium GRCh37) contains 13 zinc fingers, with a tandem repeat structure similar to that observed in mice, in which repeats are highly identical except at positions –1, 3, and 6 of the zinc finger alpha helices (fig. S3A). Notably, a group of five zinc fingers had a predicted affinity for a sequence that matches the 13-mer hotspot motif (Fig. 2A). This finding suggested to us that the role for Prdm9 in specifying hotspot localization might be conserved from mouse to human. If so, we might expect allelic variation in the zinc finger array to be associated with hotspot usage differences among humans. To test these predictions, we analyzed Prdm9 polymorphism by sequencing individual cDNAs from a testis library derived from a pool of 39 individuals and also by genotyping the zinc finger array by minisatellite variant repeat (MVR)–PCR (26) in individuals of European ancestry: the Centre d’Etude du Polymorphisme Humain (CEPH) resources and the Hutterites, a founder population currently living in North America (Fig. 2B and figs. S3B and S4). A large number of alleles were found with differences in both the number of repeats and their identity. In the CEPH families, six alleles were found among 105 unrelated individuals, with the major allele (allele A) occurring at a frequency of 90%. Except for one amino acid change in the 6th zinc finger, allele A is identical to the genome sequence reference allele (allele B), which is at a frequency of 5%. 
Among the other alleles (named C, D, E, and K), the first five zinc fingers of PRDM9 show little variability, whereas the zinc fingers corresponding to fingers 8 to 11 of allele A are highly variable, with amino acid changes at the positions involved in contact with the DNA (fig. S5). Variability in humans thus seems to be concentrated on one side of the zinc finger array, in the region involved in recognition of the 13-mer motif in allele A.

Fig. 2

(A) Human PRDM9 major alleles (alleles A and B) are predicted to bind the 13-mer hotspot motif, whereas the I allele is predicted to bind a distinct motif. The LD-based hotspot consensus identified by Myers et al. (13) is shown above. The amino acids at position –1, 3, and 6 of the zinc finger alpha helices are indicated as in Fig. 1B, with the residues predicted to recognize the LD-hotspot consensus motif shown in red. (B) Allelic diversity of the human PRDM9 zinc finger tandem array. Interspersion patterns of variant repeats (colored boxes) of alleles from unrelated individuals were established by either MVR mapping (105 CEPH unrelated parents or grandparents and 351 Hutterite parents) or sequencing clones from a testis cDNA library made from 39 donors. Major allele A and minor allele B were found in all three sets of unrelated individuals; other rare alleles were only found in one or two sets. The structures of some rare alleles (I, C, E, and F) differ strongly from alleles A and B in the region encoding the critical domain (red bar) for recognition of the 13-mer hotspot motif. N, number of alleles.

Association of human PRDM9 zinc finger variants with hotspot usage. In the Hutterite sample (26), three Prdm9 alleles, A, B, and I, were present at frequencies of 94, 4, and 2%, respectively. Given the amino acid changes in its zinc finger array, the I allele is not expected to recognize the 13-mer motif (Fig. 2A). The presence of these variants allowed us to test the functional relation between Prdm9 alleles, their predicted binding specificity, and hotspot usage, taking advantage of well-localized CO events in Hutterite families. Variation among Hutterite parents with respect to genome-wide “hotspot usage” (the fraction of COs that occurred in recombination hotspots inferred from LD data) was previously found to be significant and heritable [h² = 0.22 (12)]. To increase our sample size, we typed an additional 188 Hutterite parents, among whom we found 6 AI and 10 AB genotypes. Among these, we were able to call crossover events in transmissions from an additional two AB individuals, three AI individuals, and their five AA partners (i.e., the subset of parents for whom genotyping information was available for two or more children). To assess the impact of variation at the zinc finger array of Prdm9 on hotspot usage in the Hutterites, we regressed the maximum likelihood estimate of hotspot usage for each parent on his/her genotype (Fig. 3A). Both AB and AI heterozygotes differed significantly from AA homozygotes in their use of LD-based hotspots of recombination (PAB = 0.033, PAI = 9.3 × 10⁻¹²). The AI heterozygotes had significantly lower hotspot usage in both males and females (PAI = 1.6 × 10⁻⁸ and PAI = 0.0032, with nAI = 7 and nAI = 2, respectively, where n is the number of individuals), whereas the AB result was significant only in females (PAB = 0.020, nAB = 9) but was in a consistent direction in males (PAB = 0.40, nAB = 9). 
This result was robust to the relatedness among Hutterite individuals and remained significant when the phenotypes were quantile normalized (26). Moreover, variation at the zinc finger array of Prdm9 alone explained 18% of the population variance in hotspot usage among Hutterite individuals (26); the true proportion is likely to be even higher, given that the phenotype is measured with considerable error.
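The per-parent phenotype is a maximum likelihood estimate with an asymptotic 95% interval (Fig. 3A). The sketch below illustrates that estimation under a simplified two-component mixture, which may differ from the model used in (26). It assumes that each crossover i is localized to an interval with probability `hot[i]` if it arose in an LD-based hotspot and with probability `bg[i]` if it arose in the background. The likelihood of a hotspot usage value h is then the product over crossovers of h·hot[i] + (1 − h)·bg[i].

The interval keeps every h within 1.92 log-likelihood units of the maximum, which is half the 95% quantile of a χ² distribution with 1 degree of freedom. All names and inputs are hypothetical.

```python
import math


def log_likelihood(h: float, hot: list[float], bg: list[float]) -> float:
    """Log-likelihood of hotspot usage h for one parent's crossovers.

    hot[i] and bg[i] are the assumed (strictly positive) probabilities of
    observing crossover i's interval under the hotspot and background
    components, respectively.
    """
    return sum(math.log(h * q + (1.0 - h) * b) for q, b in zip(hot, bg))


def mle_with_ci(hot: list[float], bg: list[float], steps: int = 1000,
                cutoff: float = 1.92) -> tuple[float, tuple[float, float]]:
    """Grid-search MLE of h on [0, 1] and its asymptotic likelihood interval.

    The log-likelihood is concave in h (a sum of logs of affine functions),
    so the grid points within `cutoff` of the maximum form one interval.
    """
    grid = [i / steps for i in range(steps + 1)]
    ll = [log_likelihood(h, hot, bg) for h in grid]
    best = max(range(len(grid)), key=ll.__getitem__)
    inside = [h for h, v in zip(grid, ll) if v >= ll[best] - cutoff]
    return grid[best], (min(inside), max(inside))
```

With only two uninformative crossovers, `mle_with_ci([0.8, 0.2], [0.2, 0.8])` returns `(0.5, (0.0, 1.0))`. The interval spans the whole range, which illustrates why parents with few well-localized crossovers have wide intervals in Fig. 3A.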

Fig. 3

Association of human Prdm9 alleles with genome-wide (LD-based) hotspot usage. The different genotypes for variants in the zinc finger array are indicated by different colors. (A) In each individual, the percentage of recombination events that occurred in LD-based hotspots. The maximum likelihood estimate (MLE) for each individual is shown as a point, and the 95% confidence intervals (asymptotic cutoff) are indicated by the lengths of the horizontal lines. Individuals are ordered by their MLE values. The black vertical line shows the joint MLE for all individuals. (B and C) The relative log likelihood surfaces of the percentage of recombination events that occurred in LD-based hotspots for the three genotypes (AA, AB, and AI) in females and males, respectively. The curve for the BI genotype is left out because of low sample size (n = 1). The gray horizontal line is provided as a visual guide, to indicate where the asymptotic cutoff is for the 95% confidence interval.

Because individuals differ greatly in the precision with which their phenotype is estimated, owing to differences in the number of well-localized CO events (Fig. 3A), we considered whether this measurement error could affect our conclusions. To this end, we calculated the likelihood surface for the hotspot usage phenotype for each genotype, in females and males (Fig. 3, B and C). A likelihood ratio test comparing a model in which hotspot usage does not depend on genotype with one in which it does was highly significant in both females and males [P = 0.0014 in females, P < 10⁻⁵ in males, as assessed by permutation (26)]. Notably, the AI genotype is associated with a threefold (~70%) drop in the usage of LD-based hotspots (the maximum likelihood estimates fall from 60 to 18% in the joint analysis of males and females; see fig. S6). The large difference in LD-based hotspot usage between AA and AI individuals suggests that the I allele activates a set of hotspots that have not left a footprint on genetic diversity, either because they are too recent or because they are too weak. The interpretation of the difference in hotspot usage between AA and AI individuals depends on how many crossovers are specified by the A allele in AA individuals. As a first approximation, we might consider that the 13-mer motif has been predicted to be causal at 40% of LD hotspots (13) and, thus, all else being equal, that 40% of crossovers placed in LD hotspots might depend on the A allele. The fact that the estimated difference between genotypes is far larger (~70%) suggests that the binding specificity of PRDM9 explains more than 40% of LD-based hotspot activity in the current population. In any case, the strong decrease observed in AI heterozygotes suggests that the I allele outcompetes the A allele in determining crossovers in LD-based hotspots, for example, because it recognizes a greater number of sites or binds them with higher affinity. 
The small but significant increase in LD-based hotspot usage in AB compared with AA individuals suggests that the sequences recognized by A and B are slightly different. This might be explained by the amino acid difference (serine to threonine) between these two alleles (Fig. 2A), located on a residue of a zinc finger potentially involved in interaction with the DNA.
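The genotype test can be sketched as a likelihood ratio statistic whose null distribution is obtained by permuting genotype labels among parents. Unlike the analysis in (26), this sketch assumes that every crossover is classified with certainty as inside or outside an LD-based hotspot. It then pools crossovers within each genotype under a binomial model. The function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def binom_ll(k: int, n: int, p: float) -> float:
    """Binomial log-likelihood of k successes in n trials, without the constant."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n > k:
        ll += (n - k) * math.log(1.0 - p)
    return ll


def lrt_statistic(parents: list[tuple[str, int, int]]) -> float:
    """2 * (log L_alt - log L_null) for (genotype, k_in_hotspots, n_crossovers).

    Null: one hotspot-usage probability shared by all parents.
    Alternative: one probability per genotype, each set to its MLE k/n.
    Every parent is assumed to contribute n >= 1 crossovers.
    """
    k_all = sum(k for _, k, _ in parents)
    n_all = sum(n for _, _, n in parents)
    null = binom_ll(k_all, n_all, k_all / n_all)
    totals = defaultdict(lambda: [0, 0])
    for genotype, k, n in parents:
        totals[genotype][0] += k
        totals[genotype][1] += n
    alt = sum(binom_ll(k, n, k / n) for k, n in totals.values())
    return 2.0 * (alt - null)


def permutation_test(parents, n_perm: int = 2000, seed: int = 1):
    """Observed statistic and a permutation p-value from shuffled genotype labels."""
    rng = random.Random(seed)
    observed = lrt_statistic(parents)
    labels = [g for g, _, _ in parents]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        shuffled = [(g, k, n) for g, (_, k, n) in zip(labels, parents)]
        # Small tolerance so exact ties with the observed value are counted.
        if lrt_statistic(shuffled) >= observed - 1e-9:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Permuting labels preserves each parent's own crossover counts, so the null distribution reflects the actual mix of precisely and poorly measured parents. The added 1 in the numerator and denominator keeps the p-value from being reported as exactly zero.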

Furthermore, although hotspot usage was not significantly correlated with genetic map length across individuals (12), AB heterozygotes showed a significantly longer genetic map in the combined sample of both sexes (PAB = 0.014). This effect remained even when the phenotype was quantile normalized. In contrast, there was no detectable effect of the AI genotype on map length (PAI = 0.37) (26).

Direct PRDM9 binding to hotspot motifs. Together, these results provide direct evidence that Prdm9 is involved in hotspot specification and in controlling the distribution of recombination events in the human genome. To demonstrate that this effect is mediated through the binding of PRDM9 at hotspots, we directly tested the interaction between the PRDM9A and PRDM9I proteins and their predicted recognition motifs. By southwestern analysis, the PRDM9A protein (labeled ZA) was shown to have high affinity for a DNA fragment including the 13-mer hotspot motif (HM) but low affinity for the same fragment carrying mutations in the most conserved positions of this motif (HM*), as well as for a DNA fragment including the predicted motif of the PRDM9I protein (IM) (Fig. 4, A to C). Reciprocally, binding of PRDM9I (ZI) was specific for the predicted I motif (Fig. 4B). These assays were independently confirmed by band-shift assays, which showed the greater affinity of PRDM9A for the 13-mer hotspot motif than for its mutated form or for the predicted I motif, as well as the greater affinity of PRDM9I for the predicted I motif than for the 13-mer hotspot motif (Fig. 4D).

Fig. 4

Human PRDM9 zinc finger domains of alleles A (ZA) and I (ZI) interact specifically with double-stranded oligonucleotides containing the extended motif associated with LD-based hotspots (13) (HM) and the predicted binding motif for the hPRDM9 I allele (IM), respectively. (A to C) (Left panels) Southwestern blotting experiment performed with His-tagged ZI and ZA proteins from total E. coli extracts, probed with HM. (Right panels) Mirror-image blots obtained after diffusion transfer to a membrane placed on the other side of the same protein gel (26). (A) Immunoblotting experiment using a monoclonal α-polyhistidine antibody. (B) Southwestern blotting using the IM probe. (C) Southwestern blotting using the HM* probe, which contains multiple mutations in the 13-mer motif. (D) Electrophoretic mobility shift assays with in vitro translated glutathione S-transferase–hPRDM9 zinc finger domain fusions of alleles A (ZA) or I (ZI). The probes in the left and right panels are HM and IM, respectively. Cold competitor, at 20- and 200-fold molar excess over the probe, was added where indicated.

In summary, our observations reveal an entirely unexpected feature of initiation of meiotic recombination: a role for Prdm9 in specifying the sites of initiation in mammals, through the direct binding of PRDM9 to specific sequences in the genome and by promoting DSB formation in the vicinity of its binding sites. Using a different strategy, Myers et al. (27) predicted the preferential binding of human PRDM9 to the 13-mer hotspot motif and thus proposed that PRDM9 is involved in hotspot localization in humans. The precise mechanism of action of Prdm9 is not known. It is likely that its histone methyltransferase activity has an important role by promoting enrichment of H3K4me3 on nucleosomes located next to PRDM9 binding sites, as observed at two mouse hotspots (6). In turn, this modification of the chromatin, or downstream signals, might be recognized by a component of the recombination initiation machinery, allowing the recruitment of SPO11, which catalyzes meiotic DSB formation. Interestingly, in S. cerevisiae, enrichment for H3K4me3 has also been observed at initiation sites (5). In this case, the histone modification depends on the histone methyltransferase Set1, which lacks a DNA binding domain and is probably recruited by an alternative mechanism. In mice and humans, PRDM9 seems to control the activity of a large fraction of hotspots. Indeed, the presence of different Prdm9 alleles leads to major changes in crossover distribution on several chromosomes in mice (18, 19) and substantial changes in hotspot usage in humans (Fig. 3). Analysis of Prdm9–/– mice has shown that Prdm9 is essential for progression through meiotic prophase (20). On the basis of cytological analysis, DSBs were detected in Prdm9–/– spermatocytes, suggesting that Prdm9 might not be absolutely required for DSB formation. It is therefore possible that in the wild type, some DSBs occur at sites not bound by PRDM9.

Prdm9 has also been shown to be involved in hybrid sterility in M. musculus. This phenotype depends on polymorphisms in the zinc finger array of PRDM9 and on several independently segregating genes (28). In sterile hybrids, a defect is observed during meiotic prophase, after the stage of DSB formation, which may indicate an additional role for PRDM9 (for instance, in the regulation of gene expression) and presumably involves a limited number of genes. In fact, one does not expect PRDM9 to be a master transcriptional regulator given the rapid evolution of its DNA binding specificity among metazoans (29).

The features of the PRDM9 protein described above carry major implications for hotspot variability and genome evolution. The minisatellite structure of the Prdm9 zinc finger encoding region confers a strong potential to generate variability by recombination or replication slippage within the array. Specifically, a single–amino acid change within zinc fingers could lead to a PRDM9 variant with novel DNA binding specificity and, thus, could potentially create a new family of hotspots genome-wide. The introduction of new hotspots may counteract the loss of individual hotspots due to biased gene conversion upon DSB repair (which acts against the initiating allele), and so changes in the Prdm9 gene offer a mechanistic solution to the “recombination hotspot paradox” (30). Rapid evolution of both the PRDM9 protein and the hotspot motif has been shown by Myers et al. (27). Further, the zinc fingers of PRDM9 are evolving under positive selection and concerted evolution across many metazoan species, specifically at positions involved in defining their DNA binding specificity (29). Regardless of the precise selective pressures acting on this gene, the properties of PRDM9 uncovered here, together with features of DSB repair, provide an interpretation for the divergence of fine-scale genetic maps between closely related species and even among individuals within species (19, 31, 32).
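The self-destructive dynamics invoked here can be illustrated with a deliberately minimal, deterministic model of a single hotspot that has an active ("hot") allele and an inactive ("cold") allele. This is an illustrative textbook-style sketch, not a model fitted in this study. The parameters `r` (probability that a DSB is initiated on the hot allele in a heterozygote) and `c` (probability that repair converts it to the cold allele), and the function names, are all assumptions.

```python
def next_freq(p: float, r: float, c: float) -> float:
    """Hot-allele frequency after one generation of biased gene conversion.

    Assumes random mating, no drift, and no selection. A heterozygote
    transmits the hot allele with probability (1 - r*c) / 2, so
        p' = p**2 + p*(1 - p)*(1 - r*c) = p - r*c*p*(1 - p).
    """
    return p - r * c * p * (1.0 - p)


def generations_until(p0: float, r: float, c: float, threshold: float) -> int:
    """Generations until the hot-allele frequency first falls below threshold."""
    # p = 1 is a fixed point (no heterozygotes), and r*c = 0 means no loss,
    # so both would loop forever; reject them up front.
    if not 0.0 <= p0 < 1.0 or r * c <= 0.0 or threshold <= 0.0:
        raise ValueError("need 0 <= p0 < 1, r*c > 0, and threshold > 0")
    p, t = p0, 0
    while p >= threshold:
        p = next_freq(p, r, c)
        t += 1
    return t
```

Under these assumptions, the hot allele declines by r·c·p(1 − p) per generation, so stronger hotspots (larger r) are lost faster. In this framing, a PRDM9 variant with a new binding specificity supplies fresh targets that the same recursion then erodes.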

Supporting Online Material

www.sciencemag.org/cgi/content/full/science.1183439/DC1

Materials and Methods

SOM Text

Figs. S1 to S7

Table S1

References

  • * These authors contributed equally to this work.

References and Notes

  1. Materials and methods are available as supporting material on Science Online.
  2. We thank all members of our laboratories for discussions, J. Pritchard for comments on an earlier version of the manuscript, R. Hernandez and E. Leffler for their help with bioinformatics, E. Leffler for generating fig. S7, E. Brun for technical assistance on PRDM9 in vitro assays, and D. Haddou and F. Arnal for mouse facility service. This study was supported by a grant from CNRS; the Association pour la Recherche sur le Cancer (grant ARC 3939); the Fondation Jérôme Lejeune and the Agence Nationale de la Recherche (grants ANR-06-BLAN-0160-01 and ANR-09-BLAN-0269-01) to B.d.M. C.G. was supported by a grant from Electricité de France. This research was further supported by NIH grants HD21244 and HL085197 to C.O., a Sloan Foundation Fellowship to G.C., NIH grant GM83098, American Recovery and Reinvestment Act supplement 03S1, and a Howard Hughes Medical Institute Early Career Scientist Award to M.P. Sequences generated for this study are deposited in GenBank with the following accession numbers: GU216222, GU216223, GU216224, GU216225, GU216226, GU216227, GU216228, GU216229, and GU216230.