Research Article

Characterizing mutagenic effects of recombination through a sequence-level genetic map

Science  25 Jan 2019:
Vol. 363, Issue 6425, eaau1043
DOI: 10.1126/science.aau1043

Human recombination and mutation mapped

Genetic recombination is an essential process in generating genetic diversity. Recombination occurs both through the shuffling of maternal and paternal chromosomes and through mutations generated by resolution of the physical breaks necessary for this process. Halldorsson et al. sequenced the full genomes of parents and offspring to create a map of human recombination and estimate the relationship with de novo mutations. Interestingly, transcribed regions of the genome were less likely to have crossovers, suggesting that there may be selection to reduce changes in genetic sequences via recombination or mutation in these regions.

Science, this issue p. eaau1043

Structured Abstract

INTRODUCTION

Diversity in the sequence of the human genome, arising from recombinations and mutations, is fundamental to human evolution and human diversity. Meiotic recombination is initiated from double-strand breaks (DSBs). DSBs occur more frequently in regions of the genome termed hotspots, and a small subset eventually gives rise to crossovers, a reciprocal exchange of large pieces between homologous chromosomes. The majority of DSBs do not lead to crossovers but end as localized transfers of short segments between homologous chromosomes or sister chromatids, observable as gene conversions when the segment includes a heterozygous marker. Crossovers co-occurring with distal gene conversions are known as complex crossovers.

RATIONALE

Current meiotic recombination maps either have limited resolution or the events cannot be resolved to an individual level. The detection of recombination and de novo mutations (DNMs) requires genetic data on a proband and its parents, and a fine resolution of these events is possible only with whole-genome sequence data. Whole-genome sequencing and DNA microarray data allowed us to identify crossovers and DNMs in families at a high resolution. We resolved crossovers at an individual level, allowing us to examine variation in crossover patterns between individuals, analyzing which crossovers are complex and how crossover patterns are influenced by age, sex, sequence variants, and epigenomic factors. It is known that the mutation rate is increased near crossovers, but the rate of DNMs near crossovers has been characterized only indirectly or at a small scale.

RESULTS

We show that a number of epigenomic factors influence crossover location, shifting crossovers from exons to enhancers. Complex crossovers are more common in females than males, and the rate of complex crossovers increases with maternal age. Maternal age also correlates with an increase in the recombination rate in general and a shift in the location of crossovers toward later-replicating regions and regions of lower GC content. Both sexes show an ~50-fold increase in DNMs within 1 kb of crossovers, but the types of DNMs differ considerably between the sexes. Females, but not males, also exhibit an increase in the mutation rate up to 40 kb from crossovers, particularly at complex crossovers. We found 47 variants at 35 loci affecting the recombination rate and/or the location of crossover, 24 of which are coding or splice region variants. Whereas some of the variants affect both the recombination rate and several measures of crossover location in both sexes, other variants affect only one of these measures in one of the sexes. Many of these variants are in genes that encode the synaptonemal complex.

CONCLUSION

Our genome-wide recombination map provides a resolution of 682 base pairs. We show that crossovers have a direct mutagenic effect and demonstrate that DNMs and crossovers accumulate in the same regions with advancing maternal age. Furthermore, our results illustrate extensive genetic control of meiotic recombinations and highlight genes linked to the formation of the synaptonemal complex as determinants of crossovers.

Our search for crossovers in parents and their offspring.

Histone modifications influence crossover location. The DNM rate is higher within 1 kb from a crossover in both sexes, but the type of mutations differs between the sexes. The DNM rate is also higher up to 40 kb from crossovers in females with enrichment of G→C mutations. We used crossovers from many individuals to construct genetic maps and performed genome-wide association studies (GWAS) on the recombination rate and attributes of crossover locations to search for genes that control crossover characteristics.

Abstract

Genetic diversity arises from recombination and de novo mutation (DNM). Using a combination of microarray genotype and whole-genome sequence data on parent-child pairs, we identified 4,531,535 crossover recombinations and 200,435 DNMs. The resulting genetic map has a resolution of 682 base pairs. Crossovers exhibit a mutagenic effect, with overrepresentation of DNMs within 1 kilobase of crossovers in males and females. In females, a higher mutation rate is observed up to 40 kilobases from crossovers, particularly for complex crossovers, which increase with maternal age. We identified 35 loci associated with the recombination rate or the location of crossovers, demonstrating extensive genetic control of meiotic recombination, and our results highlight genes linked to the formation of the synaptonemal complex as determinants of crossovers.

In meiosis, recombination between paired homologous chromosomes contributes to genetic diversity by introducing new combinations of alleles (1, 2). Recombination is initiated through the formation of double-strand breaks (DSBs) catalyzed by SPO11 (3, 4). A large number of DSBs are generated across the genome (5), of which only a small subset eventually gives rise to crossovers, which result in the exchange of sequences flanking the crossover point and yield recombinant chromosomes (6). The majority of DSBs do not lead to crossovers but instead yield localized transfer of genetic material between homologous chromosomes or sister chromatids. Such transfers are known as noncrossovers and are observable as gene conversions.

The distribution of crossovers is nonrandom, with certain regions, termed hotspots, being more favorable to DSB formation (7). A critical element in defining hotspots in humans is the histone methyltransferase PRDM9 (8), a DNA binding protein that catalyzes the trimethylation of histone H3 at lysine (K) residue 4 (H3K4me3) and recruits SPO11 for DSB formation. Variants in the DNA binding domain of PRDM9 are known to influence its sequence specificity, which in turn influences the distribution of DSBs and consequently the locations of hotspots (9). Additionally, histone modifications can affect crossover formation and resolution (10); telomeric regions have a higher rate of crossovers, particularly in males (11); and recombination occurs more frequently in GC-rich regions (12, 13).

We constructed a sequence-level genetic map pinpointing the locations of crossovers in individuals to a subkilobase resolution. The locations of crossovers were determined from haplotype phase transitions in parent-offspring pairs, at loci where the parent is heterozygous (Fig. 1A). The sequence data resolution allows us to identify the crossovers that are accompanied by distal gene conversion events, hereinafter referred to as complex crossovers. The subkilobase resolution of crossovers provides an opportunity to examine the correlation of any sequence attribute with crossovers, analyze differences between individuals, and determine whether crossovers occur in an age-dependent manner.
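
The phase-transition logic described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function name `find_crossover_intervals`, the input layout, and the toy marker positions are all hypothetical. Each informative marker (a site where the parent is heterozygous) carries the label of the grandparental haplotype that was transmitted, and a crossover is placed in the interval between two consecutive informative markers whose labels differ.

```python
from typing import List, Tuple

def find_crossover_intervals(markers: List[Tuple[int, str]]) -> List[Tuple[int, int]]:
    """Return (left, right) intervals that bracket each phase transition.

    markers: (position, haplotype) pairs at sites where the parent is
    heterozygous; haplotype is 'P' or 'M', the grandparental origin of the
    transmitted allele. This input layout is an assumption for illustration.
    """
    ordered = sorted(markers)
    intervals = []
    # Compare each informative marker with the next one along the chromosome.
    for (pos_a, hap_a), (pos_b, hap_b) in zip(ordered, ordered[1:]):
        if hap_a != hap_b:
            intervals.append((pos_a, pos_b))
    return intervals

# Toy data (hypothetical positions): one switch from 'P' to 'M'.
toy = [(1_000, "P"), (1_800, "P"), (2_400, "M"), (5_000, "M"), (9_000, "M")]
intervals = find_crossover_intervals(toy)
print(intervals)                                    # [(1800, 2400)]
print([right - left for left, right in intervals])  # [600]
```

The width of each returned interval is the resolution of that crossover, which is why denser markers (WGS rather than microarray) narrow the intervals. A naive scan like this would also report a short gene conversion tract as two adjacent transitions, so a real analysis needs additional filtering.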

Fig. 1 Determination of crossovers.

(A) The locations of crossovers can be determined only up to the interval between heterozygous markers. (B) Proportion of crossovers that fall within a given feature size. (C) Proportion of crossovers as a function of standardized recombination rate. (D) Proportion of crossovers as a function of genomic size. (E) Pearson correlations of the map presented in this study with previously published maps. Comparisons with sex-specific maps were performed for the relevant sex. (F) Haplotypes transmitted from a mother in the cases of simple and complex crossovers. P, paternal; M, maternal; SL, the sequence-level map presented in this study; AK, the map of (18); CB, the map of (12); HapMap, LD-based map from the International HapMap Project (19); AA, African-American admixture-based map (21); 1000GP, LD-based map from the 1000 Genomes Project (20); b, bases.

The role of crossovers in mutagenesis is unclear (14). Recombination hotspots show high sequence diversity (15) and greater regional de novo mutation (DNM) rates (16). The study of this relationship has been hampered because direct observation of DNMs and crossovers requires a large number of sequenced trios. By using whole-genome sequence (WGS) trios, we examined the rate of DNM and its relationship to crossovers.

Genetic map

Microarray genotype data from 626,828 single-nucleotide polymorphisms (SNPs) allowed us to identify 1,476,140 crossovers in 56,321 paternal meioses and 3,055,395 crossovers in 70,086 maternal meioses, a total of 4,531,535 crossovers in 126,427 meioses. By using WGS data with 9,305,070 SNPs, we refined the boundaries for 761,981 crossovers: 247,942 crossovers in 9423 paternal meioses and 514,039 crossovers in 11,750 maternal meioses (Tables 1 and 2).

Table 1 Marker and meiosis data.

Shown are the number of microarray and WGS markers used in the study and the number of meioses considered for computing the recombination rate and genetic maps. Also indicated are the number of distinct parents and the number of meioses used for the refinement of crossovers with the WGS data, the number of meioses used for estimating complex crossovers, and the number of WGS trios used for DNM analysis. Dashes are used as placeholders where the category is not applicable. ChrX, chromosome X.

Table 2 Crossover data.

The number of crossovers found in the data used for computing recombination rates and genetic maps and the number of regular and complex crossovers found in the data from microarray-typed trios with a WGS child are shown.

All of our results are presented for autosomes, unless otherwise noted. We estimate the genetic lengths of the autosomes as 2602 and 4180 centimorgans (cM) for males and females, respectively, corresponding to a sex-averaged length of 3391 cM (17). The average resolution of our genetic map is 682 base pairs (bp): 655 and 708 bp for our paternal and maternal maps, respectively (Fig. 1B and table S1). This advances previous genetic maps from pedigrees with resolutions of 2832 bp (12) (table S2) and 8210 bp (18) (table S3), linkage disequilibrium–based (LD-based) inference (table S4) with resolutions of 1324 bp (19) and 1407 bp (20), and admixture-based inference with a resolution of 2491 bp (21) (Fig. 1B). To date, LD-based maps have been preferred for fine-scale resolution of genetic positions. However, our new pedigree-based map has greater resolution and provides direct estimates of genetic length and sex-specific rates from crossovers assigned to individual parents.
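
As a quick check on the reported lengths, the sex-averaged value is the mean of the male and female autosomal lengths. Because one morgan (100 cM) corresponds to one expected crossover per transmitted gamete, the same numbers convert into expected autosomal crossovers per meiosis. The sketch below uses only figures quoted in the text; the function names are hypothetical.

```python
MALE_CM = 2602.0    # autosomal genetic length in males, from the text
FEMALE_CM = 4180.0  # autosomal genetic length in females, from the text

def sex_averaged_cm(male_cm: float, female_cm: float) -> float:
    """Mean of the two sex-specific genetic lengths."""
    return (male_cm + female_cm) / 2

def crossovers_per_meiosis(length_cm: float) -> float:
    """100 cM (1 morgan) equals one expected crossover per transmitted gamete."""
    return length_cm / 100.0

print(sex_averaged_cm(MALE_CM, FEMALE_CM))            # 3391.0
print(round(crossovers_per_meiosis(MALE_CM), 2))      # 26.02
print(round(crossovers_per_meiosis(FEMALE_CM), 2))    # 41.8
```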

The fine resolution of our map reveals that the fraction of crossovers that occur in hotspots (regions where the recombination rate is 10 times the genomic average) is larger than previously estimated (12, 18). We find that 74.9% [95% confidence interval (CI), 74.9 to 75.0%] of paternal crossovers and 71.1% (95% CI, 71.1 to 71.2%) of maternal crossovers occur in hotspots (Fig. 1C), which cover only 1.6 and 1.8% of the genome for males and females, respectively (Fig. 1D).
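
To make the hotspot definition concrete, the following sketch flags bins whose recombination rate is at least 10 times the length-weighted genomic average and reports the share of physical and genetic length they account for. The bin layout, the function name `hotspot_summary`, the "at least" reading of the threshold, and the toy rates are assumptions for illustration only. The last two lines use the percentages quoted above to express the concentration of crossovers in hotspots as a fold enrichment.

```python
from typing import List, Tuple

def hotspot_summary(bins: List[Tuple[int, float]], threshold: float = 10.0):
    """bins: (length_bp, rate_cM_per_Mb) pairs. Hypothetical input layout.

    Returns (fraction of physical length in hotspots,
             fraction of genetic length in hotspots).
    """
    total_bp = sum(length for length, _ in bins)
    total_cm = sum(length * rate / 1e6 for length, rate in bins)
    mean_rate = total_cm / (total_bp / 1e6)  # length-weighted average, cM/Mb
    hot = [(length, rate) for length, rate in bins if rate >= threshold * mean_rate]
    hot_bp = sum(length for length, _ in hot)
    hot_cm = sum(length * rate / 1e6 for length, rate in hot)
    return hot_bp / total_bp, hot_cm / total_cm

# Toy map (made-up rates): a 1-kb hot bin inside 100 kb of cold sequence.
toy_bins = [(1_000, 50.0), (99_000, 0.5)]
frac_bp, frac_cm = hotspot_summary(toy_bins)
print(round(frac_bp, 3), round(frac_cm, 3))

# Fold enrichment implied by the percentages reported in the text.
print(round(74.9 / 1.6, 1))  # paternal: 46.8
print(round(71.1 / 1.8, 1))  # maternal: 39.5
```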

Correlation with existing maps shows that medium- and large-scale features of our maps are consistent with previous findings (Fig. 1E and table S5). At a scale of 10 kb, the Pearson correlation coefficient between our current and prior maps (18) is 0.81 for both the paternal and maternal maps. At 1 Mb, the correlation is 0.99 for both maps.
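
The dependence of the correlation on scale can be illustrated with a small sketch: two maps that disagree on the exact position of hotspots correlate poorly at fine scale but agree once the genetic length is summed over larger windows. The helper names `pearson` and `rebin` and the toy values are hypothetical, and this is not the comparison procedure used in the study.

```python
from math import sqrt
from typing import List

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def rebin(values: List[float], factor: int) -> List[float]:
    """Sum genetic length over consecutive groups of `factor` fine bins."""
    return [sum(values[i:i + factor]) for i in range(0, len(values), factor)]

# Toy maps (made-up cM per fine bin): same hotspots, shifted by one bin.
map_a = [0.0, 2.0, 0.0, 0.1, 3.0, 0.0, 0.2, 0.0]
map_b = [2.0, 0.0, 0.1, 0.0, 0.0, 3.0, 0.0, 0.2]

fine = pearson(map_a, map_b)
coarse = pearson(rebin(map_a, 2), rebin(map_b, 2))
print(round(fine, 2), round(coarse, 2))
```

In this toy case the fine-scale correlation is negative while the coarse-scale correlation is essentially 1, mirroring the qualitative pattern of higher agreement at 1 Mb than at 10 kb.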

Complex crossovers

We consider a crossover to be complex when a gene conversion is found distal to the crossover but within 100 kb of its location. Complex crossovers can be detected (Fig. 1F) only when a heterozygous SNP is located within the segment that is subject to gene conversion (22, 23). These sites are generally short and consequently difficult to detect in low-resolution SNP microarray data. However, we are able to estimate the rate of complex crossovers in a subset of probands with available WGS information and microarray data from both parents. On the basis of 15,841 meioses, we found that 0.53% (95% CI, 0.50 to 0.56%) and 1.24% (95% CI, 1.21 to 1.29%) of crossovers are complex for fathers and mothers, respectively. These estimates are consistent with previous estimates (22) of 0.31% (95% CI, 0.06 to 0.60%) and 1.33% (95% CI, 0.85 to 1.82%) (17). Our results for fathers are also consistent with estimates from recombination hotspots in spermatocytes (24).
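
The 100-kb rule can be written as a small predicate. The sketch below is one possible reading of the definition, with assumptions labeled: a crossover is represented by its bracketing interval, a gene conversion by the position of a converted marker, and "distal" is taken to mean outside the crossover interval itself. The function name `is_complex` is hypothetical.

```python
from typing import Iterable, Tuple

def is_complex(crossover: Tuple[int, int],
               conversion_positions: Iterable[int],
               max_distance: int = 100_000) -> bool:
    """True if any gene conversion lies outside the crossover interval
    but within max_distance bp of it (assumed reading of 'distal')."""
    left, right = crossover
    for pos in conversion_positions:
        if left <= pos <= right:
            continue  # inside the crossover interval: not a distal event
        distance = left - pos if pos < left else pos - right
        if distance <= max_distance:
            return True
    return False

print(is_complex((1_000, 1_600), [50_000]))   # True: 48.4 kb away
print(is_complex((1_000, 1_600), [500_000]))  # False: beyond 100 kb
print(is_complex((1_000, 1_600), []))         # False: no conversion seen
```

A crossover with no heterozygous marker inside a conversion tract cannot be flagged, which is why the observed fraction depends on marker density.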

De novo mutations

WGS data allowed us to assess the contribution of crossovers to mutagenesis. In 2976 WGS trios, we identified 200,435 DNMs, including 5748 on chromosome X (17); the parent of origin was determined for 79,685 of these DNMs (Table 3). Parental age at birth is a major determinant of DNM in the proband (25–27). We estimate that the number of DNMs increases by 1.39 (95% CI, 1.35 to 1.44; likelihood ratio test, P < 10^−50) and 0.38 (95% CI, 0.34 to 0.43; likelihood ratio test, P < 10^−50) for each paternal and maternal year at birth, respectively.

Table 3 DNM data.

The number and type of DNMs found in WGS trios are shown.

We examined each of the 200,435 DNMs to assess their distance from crossovers and found 173 autosomal DNMs within 1 kb from a crossover, 101 and 72 near paternal and maternal crossovers, respectively. We attempted Sanger sequencing of 169 of these DNMs; sequencing was successful for 139, of which 134 were confirmed as DNMs. This confirmation rate is comparable to our DNM validation rate on the basis of concordance between monozygotic twins (97.3%) (17). The parent of origin could be determined for 73 of the 173 DNMs, and among those, 69 (94.5%) occurred on the parental chromosome that also harbored the crossover.
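
The proportions in this paragraph follow directly from the counts given in the text, as the short calculation below shows. The helper name `percent` is hypothetical.

```python
def percent(part: int, whole: int) -> float:
    """Express part/whole as a percentage."""
    return 100.0 * part / whole

# Counts quoted in the text.
print(round(percent(134, 139), 1))  # Sanger-confirmed among successful assays: 96.4
print(round(percent(69, 73), 1))    # DNMs on the crossover-carrying chromosome: 94.5
print(round(percent(73, 173), 1))   # DNMs with a determined parent of origin: 42.2
```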

The mutation rate near crossovers is markedly greater than the genomic average (Fig. 2A and Table 4); within 1 kb of paternal and maternal crossovers, we estimate mutation rates 41.5 times (95% CI, 33.2 to 52.0 times; jackknife-m, P = 6.6 × 10^−232) and 58.4 times (95% CI, 44.0 to 77.4 times; jackknife-m, P = 3.4 × 10^−176) the average, respectively. This results in a mutation rate of 4.0 × 10^−7 per base pair per generation (95% CI, 3.2 × 10^−7 to 4.8 × 10^−7) in these crossover regions in fathers, in agreement with the 4.6 × 10^−7 per base pair per generation previously estimated from sperm genotyping of two hotspot locations (28). We determined that this increase cannot be explained by regional sequence diversity (table S6), and the DNM rate is minimally affected by marker density and crossover map resolution (table S7).
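
The absolute and relative rates quoted for fathers can be related by a one-line calculation: dividing the rate near crossovers by the fold increase gives the genome-average paternal rate that the two figures jointly imply. This is a back-of-the-envelope sketch using only the point estimates in the text; the implied baseline is not a value reported by the authors, and the function name is hypothetical.

```python
def implied_baseline(rate_near_crossover: float, fold_increase: float) -> float:
    """Genome-average rate implied by an absolute rate and its fold increase."""
    return rate_near_crossover / fold_increase

PATERNAL_RATE_NEAR_CO = 4.0e-7  # per bp per generation, within 1 kb (from the text)
PATERNAL_FOLD = 41.5            # fold increase over the average (from the text)

baseline = implied_baseline(PATERNAL_RATE_NEAR_CO, PATERNAL_FOLD)
print(f"{baseline:.2e}")  # 9.64e-09 per bp per generation
```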

Fig. 2 DNMs and crossovers.

(A) DNM rate within 10 kb from a crossover. (B) Fraction of phased DNMs in individual mutation classes. (C) Strand asymmetry for CpG→TpG DNMs within 1 kb from a crossover. (D) Number of DNMs as a function of distance from a crossover. (E) DNM rate within 40 kb from a crossover. GW, genome-wide; CO, crossover.

Table 4 Crossover and DNM rates.

Results are presented for the sexes separately. Numbers represent the crossover rate relative to the genome average within annotated regions; values within parentheses are 95% CIs. ChromHMM categories are measured in adult ovaries. Abbreviations: Enhancers/DNase, enhancer states (EnhA1/2/AF/W1/W2/Ac) and deoxyribonuclease (DNase)–only states (DNase); Biv/Poised, bivalent and poised promoters; PRC2, polycomb-group–repressive complex 2 (ReprPC); Prom, promoter regions (PromU/D1/D2); Tx, transcribed regions (Tx5′/Tx/Tx3′/TxWk); TxEnh, enhancers within transcribed regions (TxEnh5′, TxEnh3′, TxEnhW, and TxReg); ZNF, enriched over zinc-finger genes and repeats (ZNF/Rpts); Het, heterochromatin.

We next analyzed the spectrum of DNMs (27), grouping mutations and their reverse complements into mutation classes. C→T mutations (including the reverse complement, G→A) are further divided into those that occur inside or outside of a CpG context, that is, sites where cytosine is followed by guanine. Most cytosines at CpG sites are methylated, with the exception of those within regions where CpGs are highly concentrated, referred to as CpG islands (29). As a consequence of deamination, methylated cytosines give rise to thymine and, if left unrepaired, result in CpG→TpG mutation. Although similar increases in the mutation rate near crossovers are observed in both sexes, mutations near paternal crossovers are primarily C→T mutations in a CpG context, with an overrepresentation of 3.81-fold (95% CI, 2.50- to 5.75-fold; Fisher exact test, P = 4.0 × 10^−10), whereas mutations near maternal crossovers are mainly C→T mutations outside of a CpG context, with an overrepresentation of 2.65-fold (95% CI, 1.61- to 4.33-fold; Fisher exact test, P = 6.6 × 10^−5) (Fig. 2B). Strand asymmetry of variants near DSBs has been attributed to increased mutability of single-strand intermediates in DSB resolution (9). We used the position of each DNM relative to the crossover median location to orient the DNM strand (Fig. 2C). For males, we find that C→T DNMs in a CpG context occur on the 5′ side of the crossover and their complement (CpG→CpA) occurs on the 3′ side (75.5-fold overrepresentation; 95% CI, 9.6- to 1189.2-fold; Fisher exact test, P = 5.2 × 10^−8).
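
The grouping of mutations with their reverse complements, and the CpG split for C→T changes, can be sketched as follows. This is an illustrative implementation, not the authors' code: the function name `mutation_class`, the use of `>` in place of the arrow, and the class labels are assumptions. A substitution whose reference base is a purine is first rewritten on the opposite strand, which also swaps and complements the flanking bases.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def mutation_class(ref: str, alt: str, prev_base: str, next_base: str) -> str:
    """Collapse a substitution and its reverse complement into one class.

    prev_base and next_base are the 5' and 3' neighbors on the strand on
    which ref and alt are given. Labels are hypothetical.
    """
    if ref == alt:
        raise ValueError("ref and alt must differ")
    if ref in ("A", "G"):
        # Re-express on the opposite strand so the reference is C or T.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
        prev_base, next_base = COMPLEMENT[next_base], COMPLEMENT[prev_base]
    if ref == "C" and alt == "T":
        # CpG context: the mutated cytosine is followed by guanine.
        return "CpG>TpG" if next_base == "G" else "C>T (non-CpG)"
    return f"{ref}>{alt}"

print(mutation_class("C", "T", "A", "G"))  # CpG>TpG
print(mutation_class("G", "A", "C", "T"))  # CpG>TpG (reverse complement)
print(mutation_class("G", "A", "T", "C"))  # C>T (non-CpG)
print(mutation_class("A", "C", "G", "G"))  # T>G
```

Note that this collapsing deliberately discards strand information; the strand-asymmetry analysis described above needs the uncollapsed orientation (CpG→TpG versus CpG→CpA) relative to the crossover.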

The DNM rate drops rapidly with distance from the crossover (Fig. 2, A, D, and E). Mutation rates of 6.9 times (95% CI, 4.8 to 10.0 times; jackknife-m, P = 3.7 × 10^−24) and 11.9 times (95% CI, 7.4 to 19.2 times; jackknife-m, P = 1.8 × 10^−24) the genomic average are observed within 1 to 3 kb from paternal and maternal crossovers, respectively. As the distance from DNMs to crossovers is resolved only up to the 682-bp median uncertainty in the crossover resolution, some of the DNMs that fall within the 1- to 3-kb window may actually be closer than 1 kb from the crossover.

At longer distances of 3 to 40 kb, a mutation rate of 2.2 times (95% CI, 1.6 to 3.1 times; jackknife-m, P = 1.6 × 10^−6) the genomic average is still observed for maternal crossovers, whereas such an increase is not observed for paternal crossovers (Fig. 2E). At these distances, the increase in DNMs is seen with complex maternal crossovers, where we observe a mutation rate of 1.2 × 10^−7 per base pair per generation (95% CI, 5.4 × 10^−8 to 1.9 × 10^−7), which is 49.7 times (95% CI, 27.5 to 90.0 times; jackknife-m, P = 3.6 × 10^−38) the genomic average for mothers. We previously identified regions (C→G mutation–enriched regions) with high maternal DNM rates (27) characterized by clustered and age-related C→G DNMs. We find that DNMs within 3 to 40 kb of complex crossovers share these attributes: They are generally clustered (22.5-fold overrepresentation; 95% CI, 6.4- to 79.0-fold; jackknife-m, P = 1.1 × 10^−6) and located in C→G mutation–enriched regions (8.8-fold overrepresentation; 95% CI, 2.4- to 31.6-fold). The maternal complex crossover rate is 2.1% (95% CI, 1.9 to 2.3%) within these C→G mutation–enriched regions, 1.95 times (95% CI, 1.82 to 2.09 times; bootstrap test, P < 0.002) the genomic average. Overall, this indicates that, in mothers, complex crossovers and age-related DNMs may be rooted in the same mechanisms.

The effect of maternal age on the recombination rate

For mothers, we observe an age effect on the recombination rate (Fig. 3A and table S8) that corresponds to an increase of 6.6 cM/year (95% CI, 5.6 to 7.7 cM/year; t test, P = 4.4 × 10^−34), consistent with previous estimates (30–32). A similar increase of 6.4 cM/year (t test, P = 1.3 × 10^−31) is found when analyzing the children of the same mothers, demonstrating that with increasing age of the mother the oocytes that get fertilized and successfully carried to term have a larger number of crossovers. No age-related increase in the recombination rate was observed for fathers.

Fig. 3 Mother’s age and crossovers.

(A) The maternal recombination rate as a function of the mother’s age at birth. (B) Fraction of crossovers (including chromosome X) that are complex as a function of the mother’s age at birth. pval, P value.

The fraction of complex crossovers is dependent on maternal age (Fig. 3B) (t test, P = 1.7 × 10^−19), increasing from 1.03% of all crossovers in mothers at 20 years to 1.66% in mothers at 40 years. An increase in complex crossovers with maternal age has been indicated previously (22), although a significant effect could be found only in a microarray dataset. Those data indicated a lower rate than we observed in this study, likely because of the lower marker density. In absolute terms, the number of complex crossovers transmitted by a mother increases by 1.5 cM (95% CI, 1.2 to 1.8 cM) per year. Consequently, whereas only 1% of a young mother’s crossovers are complex, 21% of the age-related increase in maternal crossovers is due to complex crossovers.
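
For orientation, the two reported endpoints imply an average change in the complex fraction per year of maternal age. The sketch below assumes a straight line between the values at 20 and 40 years, which is a simplification made here for illustration and not a model stated in the text; the function name and default arguments are hypothetical.

```python
def complex_fraction(age: float,
                     age0: float = 20.0, frac0: float = 1.03,
                     age1: float = 40.0, frac1: float = 1.66) -> float:
    """Linear interpolation (an assumption) of the percentage of maternal
    crossovers that are complex, anchored at the two reported ages."""
    slope = (frac1 - frac0) / (age1 - age0)  # percentage points per year
    return frac0 + slope * (age - age0)

slope_per_year = (1.66 - 1.03) / 20.0
print(round(slope_per_year, 4))          # 0.0315 percentage points per year
print(round(complex_fraction(30.0), 3))  # 1.345 (% at age 30, interpolated)
```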

We also observed a greater increase (t test, P = 5.2 × 10^−7) in the recombination rate correlated with maternal age (27) in regions with a large number of C→G mutations; the yearly increase is 0.31% inside C→G mutation–enriched regions and 0.14% outside. Moreover, linear regression reveals that the fraction of crossovers that are complex increases by 0.14% (95% CI, 0.11 to 0.17%) and 0.018% (95% CI, 0.011 to 0.025%) per year inside and outside C→G mutation–enriched regions, respectively, with a significant difference between regions (t test, P = 5.4 × 10^−20).

Crossovers in older mothers occur less frequently in early-replicating regions (t test, P = 2.5 × 10^−10); 54.7% of crossovers in a 40-year-old mother occur in the earlier-replicating half of the genome, compared with 55.5% in a 20-year-old mother. A similar pattern is observed for the association between crossovers and GC content: The average GC content values near the crossover location are 44.3 and 44.2% for 20- and 40-year-old mothers, respectively (t test, P = 5.1 × 10^−7).

Genomic attributes coinciding with crossovers

To shed light on the genetic control of crossovers, we examined the correlation between the median location of each crossover (17) and various genomic attributes.

The initiating event of meiotic recombination is the formation of DSBs. We compared the locations of crossovers with a map of meiotic DSBs in human testes (9), referred to hereinafter as Pratto DSB regions. The locations of the DSBs are highly dependent on PRDM9 alleles; the Pratto DSB regions and crossovers thus reflect the alleles carried by the testis tissue donors used in the Pratto study (table S9). Although Pratto DSB regions represent only 2.95% of the genome, 68.4 and 52.5% of the observed paternal and maternal crossovers, respectively, fall within these regions, corresponding to relative crossover rates of 24.90 (95% CI, 24.87 to 24.93) and 18.93 (95% CI, 18.91 to 18.96) (Table 4). Maternal crossovers show a lower overrepresentation than paternal crossovers in Pratto DSB regions (table S10) (bootstrap test, P < 0.002), most likely because these regions were defined in testes and sex differences affect the location of DSB regions; measurements from ovaries were not available. Comparing our results with the locations of PRDM9 B-allele binding sites measured in human embryonic kidney (HEK) cell line 293T (33), referred to hereinafter as Altemose PRDM9 regions, we observed that within 500 bp of these sites, the relative crossover rates are 7.28 (95% CI, 7.27 to 7.30) for fathers and 7.12 (95% CI, 7.11 to 7.13) for mothers. For both datasets, the overrepresentation of crossovers is observed only within 1600 bp of the annotation peak (fig. S1).

Complex crossovers, like other crossovers, are more common than average for the genome at the Pratto DSB regions (9). In fathers, the rate is 21.8 (95% CI, 20.8 to 22.9) times the genomic average, and in mothers, 15.6 (95% CI, 15.1 to 16.2) times. This increase is, however, significantly less than that for all crossovers (bootstrap test, P < 0.002) (table S10). Similar results were obtained for maternal, but not paternal, complex crossovers and Altemose PRDM9 binding sites (table S10). The lower overrepresentation of complex crossovers in Pratto DSB regions and Altemose PRDM9 regions suggests that they are less dependent on programmed DSBs, although a bias may be introduced by the larger genomic regions defining the complex crossovers. The relative rate of complex crossovers is different from the rate of other crossovers in various parts of the genome (table S10).

Epigenetic factors play a role in determining crossovers (10), although their effect on the location of crossovers is not fully understood. We compared crossover locations with chromatin state annotations (from the software ChromHMM) and their constituent histone marks in adult ovaries on the basis of data generated by the Roadmap Epigenomics project (34) (Table 4 and table S11). These annotations were not available for the male gonads or embryonic ovaries, where crossovers occur. Crossovers occur less frequently in regions annotated as transcribed, reflected in the transcription-associated H3K36me3 and H4K20me1 (histone H4 methylation at lysine residue 20) histone marks. In contrast, crossovers are overrepresented in regions annotated as enhancers, reflected in the active H3K27ac (H3 acetylation at lysine residue 27) and H3K4me1 histone marks. In addition, the repressive polycomb group state is associated with an increased recombination rate, whereas the repressive chromatin states, annotated as heterochromatin and zinc-finger genes, are associated with a reduced recombination rate (Table 4). Crossovers were also increased in regions with H3K4me3 marks (Table 4), which is of particular interest because PRDM9 regulates some of these methylations (35). As expected, we see a higher relative crossover rate in locations with H3K4me3 marks because of PRDM9 (33) than in locations with other H3K4me3 marks (bootstrap test, P < 0.002) (Table 4 and table S10). Additionally, crossovers are associated with 5-hydroxymethylated DNA (36) and regions containing the retrotransposon THE1B (12, 37) (Table 4). The effect of these epigenetic factors on crossovers cannot be fully accounted for by different frequencies of Pratto regions within the annotated regions (table S11). This supports a role for epigenetic factors in influencing which DSBs lead to crossovers. However, the interplay between epigenetics and crossovers is more complex than can be resolved with the data presented here.

Consistent with results from previous studies, crossovers are associated with regions of high GC content (fig. S2) (12). The GC contents within 500 bp from a crossover are 3.2 and 3.4% greater than the genomic average in fathers and mothers, respectively. Crossovers in both sexes are also more pronounced in early-replicating regions (38), with 52.6% (95% CI, 52.5 to 52.7%) of paternal and 55.1% (95% CI, 55.0 to 55.1%) of maternal crossovers occurring in the earlier-replicating half of the genome. Moreover, crossovers are overrepresented near telomeres (fig. S3), particularly in males (11, 39).

Genome-wide associations

We performed genome-wide association studies (GWAS) of the recombination rate and four other phenotypes derived from attributes of crossover locations: the fraction of crossovers within recombination hotspots, the average distance (as a fraction of chromosome length) of crossovers from the closest telomere, the average GC content within 500 bp from crossover locations, and the average replication timing score of crossover locations. Phenotypes were normalized before association analysis (17), and for each phenotype, we performed parental sex-specific and joint GWAS. For identification of genome-wide significant associations, we applied thresholds that account for prior probability of association of the variants (40).
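
Two of the location phenotypes can be sketched directly from their descriptions: the distance to the closest telomere as a fraction of chromosome length, and the GC content of the sequence around a crossover. The function names, the treatment of non-ACGT characters, and the toy inputs are assumptions; the normalization step mentioned in the text is not reproduced because its details are not given here.

```python
from typing import List

def telomere_distance_fraction(position: int, chrom_length: int) -> float:
    """Distance from a crossover to the closest chromosome end,
    expressed as a fraction of chromosome length (range 0 to 0.5)."""
    return min(position, chrom_length - position) / chrom_length

def gc_content(sequence: str) -> float:
    """Fraction of G or C among called bases (N and other symbols ignored)."""
    seq = sequence.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return float("nan")
    return (seq.count("G") + seq.count("C")) / called

def mean(values: List[float]) -> float:
    return sum(values) / len(values)

# Toy example (hypothetical positions on a 100-unit chromosome).
positions = [10, 90, 50]
phenotype = mean([telomere_distance_fraction(p, 100) for p in positions])
print(round(phenotype, 3))     # 0.233
print(gc_content("GGCCAATT"))  # 0.5
```

In a real analysis the per-crossover values would be averaged over all crossovers transmitted by one parent, giving one phenotype value per parent for the association test.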

We found associations with these five phenotypes at 35 loci, 26 of which have not previously been described (table S12). Forty-seven variants show independent association at the 35 loci (Table 5 and table S13). Five of the 47 are low frequency [minor allele frequency (MAF) between 1 and 5%] and seven are rare (MAF < 1%) with large effects (minimum absolute effect = 0.24 SD). Seventeen of the 35 common variants are coding or splice region variants, whereas 7 of the 12 rare and low-frequency variants are coding (missense or predicted loss of function).
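
The frequency classes used above can be expressed as a small helper. This sketch assumes that the class boundaries are inclusive on the low-frequency side (MAF from 1% up to and including 5%), which the text does not specify; the function names are hypothetical. The example frequencies are those quoted for individual variants elsewhere in the article.

```python
def minor_allele_frequency(allele_frequency: float) -> float:
    """MAF is the frequency of the less common of two alleles."""
    return min(allele_frequency, 1.0 - allele_frequency)

def frequency_class(allele_frequency: float) -> str:
    """Classify a biallelic variant as rare, low frequency, or common."""
    maf = minor_allele_frequency(allele_frequency)
    if maf < 0.01:
        return "rare"
    if maf <= 0.05:  # boundary handling is an assumption
        return "low frequency"
    return "common"

# Frequencies quoted in the text for specific variants.
examples = {
    "MEIOB:p.Ile261Thr": 0.156,
    "H2BFM:p.Gln73Ter": 0.465,
    "SYCE2:p.His89Tyr": 0.013,
    "SMC1B:p.Phe1055Leu": 0.051,
}
for name, freq in examples.items():
    print(name, frequency_class(freq))

# The reported counts are internally consistent: 35 common + 5 + 7 = 47.
print(35 + 5 + 7)  # 47
```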

We found 20 loci influencing the recombination rate, 12 not previously reported (Table 5 and table S13). Three of these loci are captured by coding variants in MEIOB, H2BFM, and HFM1, and two by splice region variants in CT45A9 and SYCE1. MEIOB encodes a meiosis-specific protein required for meiotic recombination and chromosomal synapsis (41, 42). Notably, MEIOB:p.Ile261Thr (frequency, 15.6%) increases the recombination rate in males but decreases the recombination rate in females. The p.Gln73Ter nonsense variant in H2BFM (frequency, 46.5%) associates with a lower recombination rate. H2BFM is a member of the histone H2B family, a nucleosome component involved in regulating chromatin structure. The H2BFM histone protein is ubiquitinated by RNF20 in meiosis at DSB sites, leading to chromatin relaxation and thereby enabling the recruitment of meiotic recombination proteins (43). HFM1:p.Ser115Pro (frequency, 29.4%) associates with a lower number of crossovers, with a stronger effect observed in males than in females (P value of heterogeneity, 0.0052). HFM1 is a DNA helicase, expressed mainly in germ cells (44). Mutations in the HFM1 gene have been linked to ovarian insufficiency (45). Its mouse homolog is required for crossover formation and the completion of synapsis of homologous chromosomes (44). The yeast homolog is involved in meiotic crossover (46) and catalyzes the unwinding of Holliday junctions (47). The splice region variant c.136+G→A in SYCE1 (frequency, 9.4%) is associated with an increased recombination rate. SYCE1 encodes a protein that, along with proteins encoded by SYCE2, SYCE3, SYCP1, SYCP2, and SYCP3, constitutes the synaptonemal complex (48), a structure that links homologous chromosomes during prophase I. SYCE1 is a component of the central element of the synaptonemal complex that mediates extension of the complex along chromosomes while also contributing to pairing and synapsis between homologous chromosomes in meiosis (49, 50).

Table 5 Loci or variants associating with recombination rate or location.

Association results for coding and splice region variants and for noncoding variants are summarized. The association analyses were performed separately for crossovers in males and females (as indicated by symbols) and jointly (j) for both sexes. The effect and P value for the phenotype with the strongest association are shown. The association strength is shown with the darkest applicable shade given its P value. Results are partitioned into three categories: recombination rate, those variants where an association is found only to recombination rate phenotypes at genome-wide significance (GWS) thresholds; location, those variants where an association is found only to location phenotypes at GWS thresholds; and mixed, those variants where an association is found to phenotypes in both categories at GWS thresholds. Abbreviations: Chr, chromosome; Pos, position; r^2, coefficient of determination; Freq, frequency; pheno, phenotype; p, paternal; m, maternal; RR, recombination rate; RH, recombination hotspots; GC, GC content; TD, telomere distance; RT, replication timing; dHJ, double Holliday junction. Data are from (1, 8, 42–44, 53, 55–61, 63, 64, 68, 76, 77, 95–97) as indicated.


In addition to the PRDM9 locus, we found three loci associating with recombination hotspot usage (Table 5), represented by a marker upstream of C17orf104 associated with the recombination rate (51), a common variant near zinc-finger ZNF84 and ZNF140 genes on chromosome 12, and a common 3′ untranslated region (UTR) variant in C11orf80/TopoVIBL. At the PRDM9 locus, the strongest associations (52) (linear regression, P = 3.6 × 10−2382) are with variants that affect the DNA binding domain of PRDM9 (Table 5 and table S13), which alters the sequence-specific binding affinities of the protein (9). TopoVIBL encodes a protein that forms a heterodimer interacting with SPO11, involved in DSBs (53). The minor allele (MAF = 8.0%) of the 3′ UTR variant of TopoVIBL associates with increased hotspot usage, and carriers of the variant express less RNA in testis (54), supporting a dosage-dependent effect on hotspot usage. TopoVIBL expression levels may affect hotspot usage through local control at PRDM9-induced DSB sites where interactions between TopoVIBL and SPO11 could be under tighter regulation than at PRDM9-independent sites.

To better understand the molecular mechanisms governing crossover location, we looked at associations with distance from telomeres, GC content, and replication timing and found 24 loci (Table 5) associating with these phenotypes. Seventeen of these loci are captured by missense variants (Table 5 and table S13). Fourteen are in genes that have been linked to meiosis and/or recombination: MSH4 (1), HFM1 (44), MAPT (55), C14orf39 (56), RAD21L1 (57), RNF212 (58), HUS1B (59), CTCFL (60), SYCE2 (1), SYCP3 (1), SMC1B (61–63), FANCB (64), HSF2BP (65), and HORMAD1 (66) (Table 5). Notably, two of the remaining genes harboring coding variants, ANHX and PRAME, are preferentially expressed in germ cells (54), whereas EAPP is a cell cycle regulator (67) that has not been linked to meiosis.

The variants in SYCE2 and SMC1B associate with all three location phenotypes (Table 5). SYCE2 is a part of the central element of the synaptonemal complex and has been shown to be required for DSB repair and homologous recombination (68). In females, but not in males, the p.His89Tyr (frequency, 1.3%) mutation in SYCE2 associates with crossovers that occur in earlier-replicating DNA, closer to the telomere and in regions with higher GC content. p.Phe1055Leu (frequency, 5.1%) in SMC1B associates with crossovers in late-replicating regions with lower GC contents and greater distance to the telomere. SMC1B encodes SMC1β, a meiosis-specific cohesin (69) required for sister chromatid cohesion and recombination (61), with a role both in protection of telomeres by mediating their attachment to the nuclear envelope (70) and in synapsis-related functions (71). SMC1β influences the formation of the synaptonemal complex (63), regulates the organization of chromatin loops (72), and is likely involved in attaching PRDM9-occupied sites to the synaptonemal complex, facilitating DSB formation (73). In male carriers of SYCP3:p.Met66Thr, crossovers occur further from the telomere. SYCP3 encodes another synaptonemal protein, and mutations in this protein have been associated with recurrent spontaneous abortions (74).

The rare loss-of-function variants p.Thr327GlfnsTer18 (frequency, 0.089%) in HORMAD1 and p.Gly224Ter (frequency, 0.33%) in HSF2BP increase the distance of crossovers from telomeres. For HORMAD1:p.Thr327GlfnsTer18, the effect is observed in both sexes, whereas HSF2BP:p.Gly224Ter affects only crossovers in males (table S13). HORMAD1 is a key meiosis protein needed for DSB-associated processes (75), such as DMC1-dependent repair of DSBs to avoid intersister chromatid repair (66), as well as being required for synaptonemal complex formation (75, 76). As the loss-of-function variant in HORMAD1 does not associate with the recombination rate, the observed effects on crossover locations likely emerge through influences on synapsis. HSF2BP binds to HSF1 and HSF2, heat shock proteins required for oogenesis and spermatogenesis (65). HSF2 is required for synaptonemal complex formation (77). In our dataset of 155,250 genotyped Icelanders, we found one individual homozygous for HORMAD1:p.Thr327GlfnsTer18 and two siblings homozygous for HSF2BP:p.Gly224Ter. All three reached old age; the p.Thr327GlfnsTer18 homozygote was a female who lived to 75 years of age, and the p.Gly224Ter homozygotes were a male who lived to age 84 and a female who lived to age 88. None of them had any children, suggesting that these genes may have a role in fertility. This is consistent with mouse data, as both male and female Hormad1 knockout mice are sterile (75) and the Hsf2bp knockout males have small testes, consistent with HSF2BP:p.Gly224Ter affecting male crossover location (78).

Discussion

As the resolution of crossovers is inherently limited by the interval between heterozygous markers and as the WGS data used in this study capture almost all such markers, we expect that future genetic maps may yield only minor improvements in resolution. Our fine-scale genetic maps show the direct mutagenic effect of crossovers. Previous observations (79) with more limited resolution (201 kb) than that for our data indicated a more moderate (1.4-fold) increase in mutation rate than we observe (Fig. 2). Our results show a 50-fold increase within 1 kb of crossovers. This size range is reminiscent of the 750- to 1000-bp average resection zone observed at programmed DSBs in mice (80) and the 1464-bp average size of Pratto DSB regions, measuring regions of single-stranded DNA at DSBs in meiotic cells (9). DNMs near crossovers show an excess of C→T mutations, possibly because the cytosines in the single-strand DNA intermediates are prone to deamination (81) during the crossover formation and DSB repair. Within 1 kb of crossovers, the types of mutations differ considerably between the sexes. Paternal C→T mutations near crossovers are mainly in a CpG context, whereas maternal C→T mutations occur outside of a CpG context. C→T mutations in a CpG context are linked to deamination of methylated cytosines. Perhaps this difference is due to the sex-specific timing of meiosis in the germline development. In the male germ line, the genome is methylated before meiosis, whereas in the female germ line the methylation occurs after the meiotic arrest (82).
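The gap between a 1.4-fold estimate at coarse resolution and a 50-fold estimate within 1 kb can be illustrated with a simple dilution calculation. The sketch below is not the analysis used in this study. It assumes, purely for illustration, that the excess mutation rate is a step function confined to a 2-kb stretch (1 kb on either side of the crossover) and that the rest of the coarse window mutates at the background rate. The function name and the step-shaped profile are hypothetical.

```python
def diluted_fold(local_fold: float, hot_bp: float, window_bp: float) -> float:
    """Average fold increase seen over a coarse window.

    Assumption (illustrative only): a stretch of `hot_bp` base pairs mutates
    at `local_fold` times the background rate, and the remainder of the
    `window_bp` window mutates at exactly the background rate.
    """
    if not 0 < hot_bp <= window_bp:
        raise ValueError("hot_bp must be positive and no larger than window_bp")
    background_bp = window_bp - hot_bp
    # Length-weighted mean of the two rates, in units of the background rate.
    return (local_fold * hot_bp + background_bp) / window_bp


# A 50-fold excess over 2 kb, averaged across a 201-kb window.
print(round(diluted_fold(50, 2_000, 201_000), 2))  # 1.49
```

Under these toy assumptions, a sharply localized 50-fold excess averages out to roughly 1.5-fold at 201-kb resolution, which is of the same order as the earlier 1.4-fold estimate.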

The molecular mechanisms responsible for the localization of crossovers are not fully understood, although the histone methyltransferase PRDM9 is known to play an important role (52, 83). However, PRDM9 is not necessary for DSB formation in mammals (8, 84). In this study, we identified enhancer elements and polycomb group–repressed regions as genomic attributes with an increased recombination rate. By contrast, crossovers are underrepresented in regions annotated as transcribed, particularly loci marked by H3K36me3 and H4K20me1. These data together with the increased DNM rate near crossovers suggest that through evolution, a mechanism emerged that guided crossovers toward regulatory regions and away from coding sequences. This would reduce the harmful effect of DNMs while at the same time promoting increased variation within regulatory elements.

In addition to a 3.2% increase in crossovers over two decades of maternal age, we find that the locations of crossovers are shifted toward later-replicating regions and regions of lower GC content with maternal age. In humans, the initiation of replication is correlated with open chromatin regions (85), and in many organisms, early- and late-replicated regions are determined by epigenetic modifications (86). It is conceivable that age-related loss of epigenomic integrity (87) is more pronounced in late-replicating regions. Thus, epigenomic changes, or possibly unrepaired lesions due to less efficient repair (88), may be among factors underlying the age-associated shift in the location of maternal crossovers.

We show that complex crossovers, although representing only a small fraction of all crossovers, account for a large fraction of the increase in recombination rate with maternal age. As it is not always possible to determine whether a crossover is complex, we cannot rule out the possibility that most, if not all, age-related crossovers are complex. Complex crossovers occur in large part within C→G mutation–enriched regions. These regions are correlated with an age-related increase in DNMs and gene conversion (22, 27), and we suspect that the age-related increases in maternal crossovers and DNMs may share an underlying mechanism, likely an age-related DNA damage response in oocytes. Our results lend support to the notion that crossovers accumulate in aging oocytes, although an alternate hypothesis is that having more crossovers increases the opportunity to overcome partial cohesion loss (89), leading to increased viability of oocytes with higher numbers of crossovers (31).

Notably, whereas some of the sequence variants identified in our GWAS affect both the recombination rate and various measures of crossover location in both sexes, other variants affect only one of the phenotypes or one sex (Table 5). Both the recombination rate and crossover locations are affected by coding variants in genes encoding components of the synaptonemal complex and genes implicated in its formation, suggesting that the synaptonemal complex is actively involved in regulating crossover distribution and rate. In general, the variants we describe in association with crossover location likely exert local effects on interactions that occur in the context of the synaptonemal complex and processes regulating the maturation of DSBs into either crossovers or noncrossovers. However, whether distance from the telomere, GC content, and replication timing directly affect the location of crossovers or whether our results reflect another attribute remains to be seen.

Unsuccessful crossover formation can result in chromosomal missegregation and aneuploidy and affect fertility, possibly through inefficient crossover maturation (90). None of the variants that we identified in this study associate with fertility, although three homozygous carriers of very rare loss-of-function variants in HORMAD1 and HSF2BP did not have any children. Cohesins are implicated in aneuploidy (49, 91). We identified a missense variant in SMC1B (encoding a cohesin) that associates with three different location phenotypes. Whether this variant or others identified in our association study associate with greater risk for aneuploidy remains to be seen.

Our results emphasize that recombination is mutagenic but can be shaped by gene conversion in future generations to reduce mutation load. Crossovers are necessary for the proper segregation of the chromosomes during meiosis, and to guarantee proper control of this process, the crossovers occur at highly regulated programmed locations. The mutagenic effect of crossovers, however, guarantees that individual hotspots will eventually erode, leading to greater diversity in the location of crossovers.

Materials and methods summary

We used SNP-chip–genotyped and whole-genome–sequenced Icelandic samples collected as part of disease association efforts at deCODE genetics. Using the Icelandic genealogical database, we identified SNP-chip–genotyped parent-child pairs. A total of 126,407 meioses were available for study, 70,086 maternal and 56,321 paternal. By using methodology previously described (27), similar to that in (25, 26, 92), DNMs were identified in 2976 WGS trios.

The locations of crossovers are determined from haplotype phase transitions in parent-proband pairs, at genetic markers where the parent is heterozygous. Initially the locations are determined by using chip-level data, with the two heterozygous markers closest to the crossover giving upper and lower bounds for the location. Sequence-level data are used to refine the locations when available for the proband. A genetic map was computed by using a bootstrap version of an expectation-maximization algorithm. The genetic map was then used to determine a median position for each crossover. Complex crossovers were discovered in 15,841 meioses. Phenotypes were computed from crossovers in the children and normalized, with all effects presented in standard deviations. All phenotypes were subjected to GWAS, both separately and jointly for the sexes. We tested for association on the basis of a linear mixed model implemented in BOLT-LMM (93). We used BOLT-LMM to calculate leave-one-chromosome-out (LOCO) residuals, which were then tested for association by simple linear regression. A generalized form of linear regression was used to test for the association of phenotypes with indels and SNPs. We assume that the phenotypes follow a normal distribution with a mean that depends linearly on the expected allele at the variant and a variance-covariance matrix proportional to the kinship matrix (94). Detailed methods are available in (17).
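The bounding step described above can be sketched in a few lines. This is a minimal illustration, not the pipeline used in this study: it assumes that phasing has already been done, that each marker where the parent is heterozygous is labeled 0 or 1 according to which parental haplotype the proband inherited, and that the sequence-level marker set includes the chip markers. The function names and the toy coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]


def crossover_intervals(positions: List[int], phases: List[int]) -> List[Interval]:
    """Return (lower, upper) bounds for every haplotype phase transition.

    positions: sorted coordinates of markers where the parent is heterozygous.
    phases: inherited parental haplotype (0 or 1) at each of those markers.
    """
    if len(positions) != len(phases):
        raise ValueError("positions and phases must have the same length")
    intervals = []
    for i in range(1, len(positions)):
        if phases[i] != phases[i - 1]:
            # The crossover lies between the two flanking informative markers.
            intervals.append((positions[i - 1], positions[i]))
    return intervals


def refine_interval(interval: Interval, seq_positions: List[int],
                    seq_phases: List[int]) -> Interval:
    """Narrow a chip-level interval using denser sequence-level markers."""
    lower, upper = interval
    inside = [(p, h) for p, h in zip(seq_positions, seq_phases) if lower <= p <= upper]
    found = crossover_intervals([p for p, _ in inside], [h for _, h in inside])
    # Keep the chip-level bounds unless exactly one transition is observed.
    return found[0] if len(found) == 1 else interval


# Toy example: chip markers bound the crossover, sequence markers refine it.
chip = crossover_intervals([1000, 5000, 9000, 15000], [0, 0, 1, 1])
print(chip)  # [(5000, 9000)]
print(refine_interval(chip[0],
                      [1000, 3000, 5000, 6200, 7100, 9000, 15000],
                      [0, 0, 0, 0, 1, 1, 1]))  # (6200, 7100)
```

The sketch leaves out genotyping and phasing errors, which a real analysis would need to handle before treating every phase switch as a crossover.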

Supplementary Materials

www.sciencemag.org/content/363/6425/eaau1043/suppl/DC1

Materials and Methods

Figs. S1 to S14

Tables S1 to S22

References (98–115)

Data S1 to S7

References and Notes

  1. Materials and methods are available as supplementary materials.
Acknowledgments: Funding: No outside funding was received for this work. Author contributions: G.P. implemented the methodology for determining crossovers with input from B.V.H., M.T.H., H.P.E., F.Z., S.A.G., M.L.F., G.T., G.M., and D.F.G. H.J. identified DNMs with input from B.V.H., G.P., O.A.S., H.P.E., and D.F.G. B.V.H., G.P., O.A.S., and H.J. analyzed the data with assistance from G.H.H., S.A.G., G.M., and D.F.G. B.V.H., O.A.S., and U.T. interpreted the association results with input from B.G., A.O., S.N.S., P.S., and D.F.G. B.V.H., G.P., O.A.S., H.J., U.T., and K.S. wrote the paper with input from S.N.S., P.S., A.H., and D.F.G. A.S. performed the polymerase chain reaction validation. B.V.H. and K.S. conceived and supervised the study. All authors approved the final version of the manuscript. Competing interests: All authors are employees of deCODE genetics, a subsidiary of Amgen. Data and materials availability: The crossover and DNM data used in this work are available in the supplementary materials. Microarray SNP genotypes and WGS data from Icelanders cannot be made publicly available, as Icelandic law and the regulations of the Icelandic Data Protection Authority prohibit the release of individual-level and personally identifying data. Access to these data can be granted only at the facilities of deCODE genetics in Iceland, subject to Icelandic laws regarding data usage. Anyone wanting to gain access to Icelandic data should contact B.V.H. (bjarni.halldorsson@decode.is) or K.S. (kstefans@decode.is).