Research Article

Germline selection shapes human mitochondrial DNA diversity


Science  24 May 2019:
Vol. 364, Issue 6442, eaau6520
DOI: 10.1126/science.aau6520

Heteroplasmy incidence in mitochondrial DNA

In humans, mitochondrial DNA (mtDNA) is predominantly maternally inherited. mtDNA is under selection to prevent heteroplasmy—the transmission of multiple genetic variants into the next generation. Wei et al. explored human mtDNA sequences to determine mtDNA genome structure, selection, and transmission. Whole-genome sequencing revealed that about 45% of individuals carry heteroplasmic mtDNA sequences at levels greater than 1% of their total mtDNA. Furthermore, studies of more than 1500 mother-offspring pairs indicated that the female line selected which mtDNA variants were passed on to children. This effect was influenced by the mother's nuclear genetic background. Thus, mtDNA is under selection at specific loci in the human germ line.

Science, this issue p. eaau6520

Structured Abstract

Introduction

Only 2.4% of the 16.5-kb mitochondrial DNA (mtDNA) genome shows homoplasmic variation at >1% frequency in humans. Migration patterns have contributed to geographic differences in the frequency of common genetic variants, but population genetic evidence indicates that selection shapes the evolving mtDNA phylogeny. The mechanism and timing of this process are not clear.

Unlike the nuclear genome, mtDNA is maternally transmitted and there are many copies in each cell. Initially, a new genetic variant affects only a proportion of the mtDNA (heteroplasmy). During female germ cell development, a reduction in the amount of mtDNA per cell causes a “genetic bottleneck,” which leads to rapid segregation of mtDNA molecules and different levels of heteroplasmy between siblings. Although heteroplasmy is primarily governed by random genetic drift, there is evidence of selection occurring during this process in animals. Yet it has been difficult to demonstrate this convincingly in humans.

Rationale

To determine whether there is selection for or against heteroplasmic mtDNA variants during transmission, we studied 12,975 whole-genome sequences, including those of 1526 mother–offspring pairs, in which 45.1% of individuals had heteroplasmy affecting >1% of mtDNA molecules. Using both the mtDNA and nuclear genome sequences, we then determined whether the nuclear genetic background influenced mtDNA heteroplasmy, validating our findings in another 40,325 individuals.

Results

Previously unknown mtDNA variants were less likely to be inherited than known variants, whose heteroplasmy levels tended to increase on transmission. Variants in the ribosomal RNA genes were less likely to be transmitted, whereas variants in the noncoding displacement (D)–loop were more likely to be transmitted. mtDNA variants predicted to affect the protein sequence tended to have lower heteroplasmy levels than synonymous variants. In 12,975 individuals, we identified a correlation between the location of heteroplasmic sites and known D-loop polymorphisms, as well as an absence of variants at critical sites required for mtDNA transcription and replication.

We defined 206 unrelated individuals for whom the nuclear and mitochondrial genomes were from different human populations. In these individuals, new population-specific heteroplasmies were more likely to match the nuclear genetic ancestry than the mitochondrial genome on which the mutations occurred. These findings were independently replicated in 654 additional unrelated individuals.

Conclusion

The characteristics of mtDNA in the human population are shaped by selective forces acting on heteroplasmy within the female germ line and are influenced by the nuclear genetic background. The signature of selection can be seen over one generation, ensuring consistency between these two independent genetic systems.

Germline selection of human mitochondrial DNA is shaped by the nuclear genome.

(Top left) Initially, new mtDNA variants are heteroplasmic, with the proportion changing during maternal transmission. (Bottom left) Selection during the transmission of mtDNA heteroplasmy. Transmitted variants are more likely to have been seen before as homoplasmic polymorphisms. (Right) New haplogroup-specific variants are more likely to match the nuclear genetic ancestry than the mtDNA ancestry.

Abstract

Approximately 2.4% of the human mitochondrial DNA (mtDNA) genome exhibits common homoplasmic genetic variation. We analyzed 12,975 whole-genome sequences to show that 45.1% of individuals from 1526 mother–offspring pairs harbor a mixed population of mtDNA (heteroplasmy), but that the propensity for maternal transmission differs across the mitochondrial genome. Over one generation, we observed selection both for and against variants in specific genomic regions; known variants were more likely to be transmitted than previously unknown variants. Moreover, new heteroplasmies were more likely to match the nuclear genetic ancestry than the ancestry of the mitochondrial genome on which the mutations occurred, a finding we validated in 40,325 individuals. Thus, human mtDNA at the population level is shaped by selective forces within the female germ line under nuclear genetic control, which ensures consistency between the two independent genetic lineages.

Primarily inherited from the maternal line, the 16.5-kb human mitochondrial DNA (mtDNA) genome acquired mutations sequentially after modern humans emerged out of Africa (1–3). Pedigree and phylogenetic analyses have estimated a de novo mtDNA nucleotide substitution rate of ~10⁻⁸ substitutions per base pair per year (4). However, among 30,506 mitochondrial genome sequences from across the globe (5), only 2.4% of nucleotides show genetic variation at >1% frequency within a population (Fig. 1). Although the idea has been the subject of debate (6, 7), selection might contribute to the nonrandom distribution of common variants across the mitochondrial genome in the human population.

Fig. 1 Circos plot of mitochondrial heteroplasmic variants identified in 1526 mother–offspring pairs.

Circles from the outside to the inside indicate the following: (i) position of a variant on the mtDNA (removed regions are represented by red crosses); (ii) minor allele frequency for common variants (MAF>1%) derived from 30,506 NCBI mtDNA sequences (5) (the radial axis corresponds to the MAF); (iii) phastCons100way scores from UCSC (47) (the radial axis corresponds to the degree of conservation); (iv) heteroplasmic variants identified in the mothers (the radial axis corresponds to the HF); (v) regions corresponding to the different mtDNA genes (yellow, D-loop; purple, coding region; green, rRNAs; orange, tRNAs); and (vi) heteroplasmic variants identified in the offspring (the radial axis corresponds to the HF).

Heteroplasmic mtDNA variants are common and maternally inherited

We analyzed high-depth mtDNA sequences from 1526 mother–offspring pairs (mean depth in the mothers = 1880×, range = 249× to 7454×; mean depth in the offspring = 1901×, range = 259× to 7475×; mothers versus offspring, P = 0.49, two-sample t test) (fig. S1). We called homoplasmic and heteroplasmic mtDNA variants from whole-blood DNA sequence data (8, 9) and filtered out heteroplasmic calls that were likely attributable to errors (9–11). We identified a mixed population of mtDNA (heteroplasmic variants) with a heteroplasmic variant allele frequency (VAF) >1% with high confidence in 47.8% of mothers (1043 heteroplasmic variants at 812 sites) and 42.5% of offspring (893 heteroplasmic variants at 693 sites) (Fig. 1, table S1, and data S1). In 22 individuals whose whole genome was independently sequenced twice, the heteroplasmic mtDNA calls were 96.4% concordant (fig. S2) (9). As expected (12, 13), there was a small but significant positive correlation between the number of heteroplasmic variants and the mother's age [P = 6.42 × 10⁻¹¹, R² = 0.17, 95% confidence interval (CI) = 0.12 to 0.23, Pearson's correlation] (fig. S3), and mothers had more heteroplasmic variants than offspring (mean number in the mothers = 0.68, range = 0 to 6; mean number in the offspring = 0.58, range = 0 to 4; P = 0.002, effect size = 0.68, Wilcoxon rank sum test) (Fig. 2A).

Fig. 2 Transmission of heteroplasmic mtDNA variants in 1526 mother–offspring pairs.

(A) Frequency distribution of heteroplasmic variants in mothers and offspring. (B) Distribution of HF in mothers and offspring. (C) Scatter plot of logit(HF) in transmitted heteroplasmic variants between mothers and offspring (R² = 0.79, P = 1.52 × 10⁻⁹³, Pearson's correlation). (D) (Left) Difference in the percentage shift of HF between offspring and the corresponding mothers [HF(offspring) − HF(mother)] ordered by the degree of shift. (Right) Distribution of the difference of the percentage shift of HF between offspring and the corresponding mothers [HF(offspring) − HF(mother)]. (E) (Left) Log2 ratio of HF difference between offspring and the corresponding mothers ordered by the degree of the ratio. Three variants with HS > 6 are shown in the inset. (Right) Distribution of log2 ratio of HF difference between offspring and the corresponding mothers. (F) Log2 ratio of HF difference between offspring and the corresponding mothers aligned to the whole mitochondrial DNA sequence. mtDNA regions are shown at the bottom bar in different colors (yellow, D-loop; purple, coding region; green, rRNAs; orange, tRNAs).

We defined three categories of heteroplasmic variants: (i) transmitted or inherited, if the variant was present in both the mother and the offspring and was heteroplasmic in at least one of the two; (ii) lost, if the heteroplasmic variant was present in the mother but not detectable in the offspring; and (iii) de novo, if the heteroplasmic variant was present in the offspring but not detectable in the mother (table S1) (9). Because our sequencing and bioinformatics pipeline may have missed very-low-level heteroplasmies (<1% VAF), lost and de novo variants could be present at very low levels in the offspring's and mother's germ line, respectively. In the mothers, the heteroplasmic fraction (HF) of transmitted heteroplasmic variants (mean HF = 19.5%, SD = 13.9%) was significantly higher than that of lost variants (mean HF = 5.6%, SD = 6.3%) (P < 2.2 × 10⁻¹⁶, effect size = 4.24, Wilcoxon rank sum test), and in the offspring, the HF of inherited heteroplasmic variants (mean HF = 19.8%, SD = 14.1%) was significantly higher than that of de novo variants (mean HF = 6.2%, SD = 7.4%) (P < 2.2 × 10⁻¹⁶, effect size = 4.06, Wilcoxon rank sum test) (Fig. 2B and table S1). The HF of transmitted variants in the offspring correlated strongly with the corresponding maternal level (P = 1.52 × 10⁻⁹³, R² = 0.79, 95% CI = 0.75 to 0.82, Pearson's correlation) (Fig. 2C). In total, 477 de novo heteroplasmic variants not seen in the mother were observed at >1% HF in the offspring, consistent with previous estimates (13). To rule out technical errors as the source of these data, we determined whether any heteroplasmic variants in the offspring were also present in their fathers. Among 313 father–offspring pairs, the offspring harbored 196 heteroplasmic variants with HF >1%, and only 1 of these was also observed in the corresponding father. This was a common population variant [population minor allele frequency (MAF) = 25.8% (5)] in the displacement (D)–loop region (m.152T>C), which was homoplasmic in the father and had an HF of 12.4% in his child. The alternate allele was not detected in the mother, which suggests either that this position is a recurrent site of mutation or, conceivably, that the variant arose through paternal transmission of mtDNA.
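The three categories above amount to a small decision rule on the pair of HF values. The Python sketch below is one way to express it; the function names, the convention that an HF of 0.0 means "not detected", and the 99.5% homoplasmy cutoff are all assumptions for illustration, not the study's pipeline (9).

```python
from typing import Optional

# Assumed thresholds (hypothetical): 1% mirrors the stated detection limit;
# the homoplasmy cutoff is an illustrative choice, not taken from the study.
DETECTION_THRESHOLD = 0.01
HOMOPLASMY_CUTOFF = 0.995


def is_detected(hf: float) -> bool:
    """An allele counts as present if its fraction reaches the detection threshold."""
    return hf >= DETECTION_THRESHOLD


def is_heteroplasmic(hf: float) -> bool:
    """Present, but below the (assumed) homoplasmy cutoff."""
    return DETECTION_THRESHOLD <= hf < HOMOPLASMY_CUTOFF


def classify_variant(hf_mother: float, hf_offspring: float) -> Optional[str]:
    """Return 'transmitted', 'lost', 'de novo', or None (not a heteroplasmy).

    HF values are fractions in [0, 1]; 0.0 means the allele was not detected.
    """
    in_mother, in_offspring = is_detected(hf_mother), is_detected(hf_offspring)
    if in_mother and in_offspring:
        # (i) present in both and heteroplasmic in at least one of the two
        if is_heteroplasmic(hf_mother) or is_heteroplasmic(hf_offspring):
            return "transmitted"
        return None  # homoplasmic in both: an ordinary polymorphism
    if in_mother and is_heteroplasmic(hf_mother):
        return "lost"  # (ii) heteroplasmic in the mother, undetectable in the offspring
    if in_offspring and is_heteroplasmic(hf_offspring):
        return "de novo"  # (iii) heteroplasmic in the offspring, undetectable in the mother
    return None


print(classify_variant(0.195, 0.198))  # transmitted
print(classify_variant(0.056, 0.0))    # lost
print(classify_variant(0.0, 0.124))    # de novo
```

Treating sub-threshold values as undetected is what makes "lost" and "de novo" provisional labels, as the text notes: a variant at 0.5% would fall into the same bucket as a truly absent one.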

The difference between HF in mothers and their offspring can be measured in percentage points (Fig. 2D) (14). This metric is bounded by 0 and 100%, so the largest possible shift depends on the mother's starting HF, and a percentage-point change does not reflect the fold change in VAF. For example, a change from 50 to 55% would be given the same value as a change from 1 to 6%, even though the latter implies a sixfold increase in the proportion of mtDNA carrying the alternate allele. We therefore studied the log2 ratio of HF between offspring and mothers [subsequently termed the heteroplasmy shift (HS)] after imputing HF values below 1% to our detection threshold of 1% (Fig. 2, E and F) (9). This imputation shrinks HS toward zero only when the true HF in either the mother or the offspring is below 1%.
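A minimal Python sketch of the two shift metrics, assuming HF is stored as a fraction between 0 and 1; the function names are hypothetical. The only detail taken from the text is the flooring of HF values below 1% to the 1% detection threshold before taking the log2 ratio.

```python
import math

DETECTION_THRESHOLD = 0.01  # 1% detection threshold used for imputation in the text


def percentage_point_shift(hf_mother: float, hf_offspring: float) -> float:
    """Offspring HF minus maternal HF, in percentage points (HF given as fractions)."""
    return 100.0 * (hf_offspring - hf_mother)


def heteroplasmy_shift(hf_mother: float, hf_offspring: float) -> float:
    """log2(HF_offspring / HF_mother) after flooring both values at the detection threshold."""
    mother = max(hf_mother, DETECTION_THRESHOLD)
    offspring = max(hf_offspring, DETECTION_THRESHOLD)
    return math.log2(offspring / mother)


# Same 5-percentage-point change, very different fold changes:
for m, o in [(0.50, 0.55), (0.01, 0.06)]:
    print(f"{m:.0%} -> {o:.0%}: {percentage_point_shift(m, o):.1f} pp, "
          f"HS = {heteroplasmy_shift(m, o):.2f}")
```

With these definitions, a variant absent from the mother (floored to 1%) and present at 99.3% in the offspring gives HS = log2(99.3) ≈ 6.6.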

Overall, there was no significant difference between the number of heteroplasmic variants with a positive (n = 731) and a negative (n = 798) HS (P = 0.091, binomial test). The HS distribution was moderately symmetric around zero, with a marginal P value for asymmetry (P = 0.05, one sample t test) (Fig. 2, D and E), consistent with random segregation of mitochondria during meiosis (14, 15). With three exceptions, all HSs were <6 in magnitude, corresponding to a <64-fold increase or decrease in HF across one generation. The exceptions were de novo variants at m.57T>C (HF = 99.3%), m.8993T>G (HF = 82.1%), and m.14459G>A (HF = 93.6%), which were detected in three unrelated offspring and were not present in the corresponding mothers (figs. S4 to S6). m.14459G>A is a nonsynonymous (NS) variant in ND6 that, on the basis of evidence from previously published pedigrees (16, 17), causes Leber hereditary optic neuropathy and Leigh syndrome (dystonia). m.8993T>G is an NS variant in ATP6 (L156R) that has been observed on many independent occasions in Leigh syndrome or neurogenic ataxia with retinitis pigmentosa (17–19). Although these extreme HSs could reflect differences in the mechanism of transmission for pathogenic mtDNA mutations (20), ascertainment is a more likely explanation because participants with childhood-onset neurodegenerative diseases were recruited as part of this study (21). Ascertainment bias is unlikely to explain the de novo occurrence of m.57T>C, but these findings indicate that extreme HSs at moderate HFs are not typical among humans.
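The binomial test above asks whether positive and negative shifts are equally common. The following self-contained sketch implements an exact two-sided test with success probability 0.5 in pure Python using exact integer arithmetic; the function name is hypothetical, and in practice `scipy.stats.binomtest` provides the same test.

```python
from math import comb


def binomial_test_two_sided(k: int, n: int) -> float:
    """Exact two-sided binomial test of H0: success probability = 0.5.

    Under p = 0.5 the null distribution is symmetric, so the two-sided
    P value is twice the probability of the smaller tail, capped at 1.
    """
    smaller = min(k, n - k)
    tail = sum(comb(n, i) for i in range(smaller + 1))  # exact integer arithmetic
    return min(1.0, 2 * tail / 2**n)


positive, negative = 731, 798  # counts of positive and negative HS reported in the text
p_value = binomial_test_two_sided(positive, positive + negative)
print(f"P = {p_value:.3f}")
```

The printed value can be compared with the P = 0.091 quoted in the text. Python's integer division of two large integers is correctly rounded, so the 2^1529 denominator causes no overflow.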

As expected, the noncoding D-loop had the highest substitution frequency (7.64 × 10⁻⁵ per base per genome per transmission) of all the regions in the mitochondrial genome (Fig. 3A and table S2) (13). In total, we observed 16 of 57 previously defined (5) pathogenic mutations in the 1526 mother–offspring pairs (Fig. 3B). After excluding m.14459G>A and m.8993T>G, where the extreme HS likely reflects ascertainment bias, the mean HS for the remaining 14 pathogenic mutations was not significantly different from zero (P = 0.22, one sample t test) or from the mean HS for the remaining 1076 nonpathogenic variants (P = 0.11, two sample t test). Thus, overall, we did not see a strong signature of selection for or against pathogenic alleles, although our statistical analysis does not rule out selection acting on a subset of the observed pathogenic alleles. Notably, only three mothers carried the most common heteroplasmic pathogenic mutation, m.3243A>G (22), each with a low HF (5.2, 3.6, and 1.7%) that decreased to levels below our detection threshold in two of the three corresponding offspring (3.9, <1, and <1%). Six of the 16 pathogenic mutations were not detectable in the mothers, yielding a de novo mutation rate for known pathogenic mutations of 393 per 100,000 live births (95% CI = 144 to 854), which is ~3.7-fold higher than the previously reported rate (23).
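The de novo rate is 6 events over 1526 transmissions, scaled to 100,000 live births (6/1526 ≈ 0.00393). The text does not say how its 95% CI was computed; an exact Clopper–Pearson interval is one standard choice, sketched here in pure Python by bisection on the binomial CDF. The function names are hypothetical.

```python
from math import comb
from typing import Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def _bisect_decreasing(f, target: float, iters: int = 100) -> float:
    """Find p in [0, 1] with f(p) = target, for f decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f still above target: the root lies at larger p
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact (Clopper-Pearson) two-sided confidence interval for a binomial proportion."""
    # lower bound solves P(X >= k) = alpha/2, i.e. P(X <= k - 1) = 1 - alpha/2
    lower = 0.0 if k == 0 else _bisect_decreasing(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # upper bound solves P(X <= k) = alpha/2
    upper = 1.0 if k == n else _bisect_decreasing(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


de_novo, births = 6, 1526
lo, hi = clopper_pearson(de_novo, births)
print(f"{1e5 * de_novo / births:.0f} per 100,000 (95% CI {1e5 * lo:.0f} to {1e5 * hi:.0f})")
```

Because only six events are involved, the interval is wide and asymmetric, which is why the exact method is preferable here to a normal approximation.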

Fig. 3 Characteristics of the heteroplasmic mtDNA variants in 1526 mother–offspring pairs.

(A) Mutation rate of mtDNA genomic regions was estimated using 477 de novo heteroplasmic variants from 1526 mother–offspring pairs detected at HF > 1%. Vertical axes represent 1/log2(mutation rate) per base per mother–child transmission. mtDNA genomic regions are labeled and illustrated in different colors (yellow, D-loop; purple, coding region; green, rRNAs; orange, tRNAs). All tRNAs were combined to estimate the tRNA mutation rate. [Note that this is the raw number of new mutations per base pair per transmission detected at HF > 1% in the offspring and does not account for the detection threshold or for segregation because current models assume neutrality (13, 48), which we later show is not the case.] (B) Pathogenic mutations observed in 1526 mother–offspring pairs. Each dot represents the HF in the mothers (blue) and the corresponding offspring (pink); the directions of the arrow show positive (→) or negative (←) HS; the length of the arrow between each pair of points represents the change in HF (orange, transmitted heteroplasmic variants; gray, de novo and lost heteroplasmic variants). (C) Frequency of heteroplasmic variants at CpG and non-CpG islands. (D) Previously unidentified versus known transmitted, lost, and de novo heteroplasmic variants. (E) Distribution of HS between offspring and corresponding mothers in transmitted known and previously unknown heteroplasmic variants. (F) Frequency of haplogroup-defining variants in transmitted, lost, and de novo heteroplasmic variants. The transmitted heteroplasmic variants were more likely to affect known haplogroup-defining variants on the world mtDNA phylogeny than the lost and de novo heteroplasmic variants (P = 7.86 × 10⁻¹¹ and 0.0016, respectively, Fisher's exact test). *P < 0.05, **P < 0.01, ***P < 0.001, and ****P < 0.0001.

To gain insight into possible mutational mechanisms, we determined the trinucleotide mutational signature. As shown previously, C>T and T>C substitutions were the most common types of substitution in homoplasmic variants (5) and in somatic mtDNA mutations in cancer (24). For heteroplasmic variants, C>T and T>C substitutions were also predominant, which suggests that germline transmission shapes the mutational signatures seen in homoplasmic variants at the population level. However, we also observed a small but significant excess of C>A, C>G, T>A, and T>G substitutions (P < 2.2 × 10⁻¹⁶, odds ratio = 0.36, 95% CI = 0.29 to 0.44, Fisher's exact test) (fig. S7). Additionally, de novo mutations were more likely to involve a CpG-containing trinucleotide (P = 3.01 × 10⁻⁶, odds ratio = 0.50, 95% CI = 0.38 to 0.66, Fisher's exact test) (Fig. 3C). Although this is controversial (25), it could be because methylation of NpCpG sites in mtDNA predisposes to de novo mutations, as seen in the nuclear genome.
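The six substitution classes follow the common convention of reporting each substitution from the pyrimidine (C or T) strand, with the two flanking bases giving the trinucleotide context. The sketch below assumes a circular reference string and a 1-based position; the function names and the toy sequence are hypothetical (in practice the reference would be the revised Cambridge Reference Sequence, GenBank NC_012920).

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def trinucleotide_class(ref_seq: str, pos: int, alt: str) -> tuple:
    """Classify a substitution at 1-based `pos` in the circular reference `ref_seq`.

    Returns (substitution, trinucleotide context, cpg_flag), with purine
    references reported on the opposite strand so that every substitution
    is one of C>A, C>G, C>T, T>A, T>C, T>G.
    """
    i = pos - 1
    n = len(ref_seq)
    # mtDNA is circular: neighbours wrap around the ends of the sequence
    context = ref_seq[(i - 1) % n] + ref_seq[i] + ref_seq[(i + 1) % n]
    ref = ref_seq[i]
    if ref in "AG":
        context, ref, alt = revcomp(context), revcomp(ref), revcomp(alt)
    # a CpG dinucleotide is its own reverse complement, so the flag is strand-independent
    cpg = "CG" in context
    return f"{ref}>{alt}", context, cpg


toy = "AACGTTGCA"  # hypothetical toy sequence, not mtDNA
print(trinucleotide_class(toy, 4, "A"))  # ('C>T', 'ACG', True): G>A seen from the C strand
```

Counting these tuples over all variants gives the 96-category (6 substitutions × 16 contexts) spectrum in which signatures such as the C>T/T>C excess are usually displayed.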

Known mtDNA variants are more likely to be transmitted than previously unknown variants

We then compared known heteroplasmic variants (i.e., those seen before in the general population) with those not previously observed. Variants were considered previously unknown if they were absent from the 1000 Genomes datasets and the Single Nucleotide Polymorphism Database (dbSNP) and were seen in at most one individual among 30,506 NCBI mtDNA sequences (5). Previously unknown heteroplasmic variants were 4.7-fold less commonly transmitted from mother to offspring than known variants (P = 3.55 × 10⁻¹³, odds ratio = 2.60, 95% CI = 1.97 to 3.45, Fisher's exact test), and the HS for transmitted known variants was more likely to be positive (P = 0.0002, probability = 0.40, 95% CI = 0.35 to 0.45, binomial test) (Fig. 3, D and E). Also, the transmitted heteroplasmic variants were more likely to affect known haplogroup-defining sites (26) than the lost and de novo heteroplasmic variants (P = 7.86 × 10⁻¹¹, odds ratio = 0.40, 95% CI = 0.30 to 0.53, and P = 0.0016, odds ratio = 0.62, 95% CI = 0.46 to 0.84, respectively, Fisher's exact test) (Fig. 3F). This suggests that the transmission of mtDNA heteroplasmy within the female germ line may be modulated over a single generation in ways that influence the likelihood that variants become established within human mtDNA populations. Because heteroplasmic variants are acquired throughout life, they must be removed at transmission to offspring at a higher rate than they appear de novo; otherwise, each generation would be accompanied by an expected increase in potentially deleterious heteroplasmic variants (27). Correspondingly, the number of previously unknown variants present in the mother but not transmitted (lost variants) exceeded the number of de novo unknown variants detected in the offspring (P = 7.93 × 10⁻⁷, probability = 0.62, 95% CI = 0.57 to 0.67, binomial test) (Fig. 3D), in part reflecting the accumulation of heteroplasmic variants with increasing maternal age (fig. S3).
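The "previously unknown" criterion is a conjunction of three annotations. A short sketch, assuming each variant already carries these annotations; the dataclass and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VariantAnnotation:
    """Hypothetical per-variant annotation; field names are illustrative."""
    in_1000_genomes: bool
    in_dbsnp: bool
    ncbi_carriers: int  # number of the 30,506 NCBI mtDNA sequences carrying the allele


def is_previously_unknown(v: VariantAnnotation) -> bool:
    """Absent from 1000 Genomes and dbSNP, and seen in at most one NCBI sequence."""
    return not v.in_1000_genomes and not v.in_dbsnp and v.ncbi_carriers <= 1


print(is_previously_unknown(VariantAnnotation(False, False, 1)))  # True
print(is_previously_unknown(VariantAnnotation(False, False, 2)))  # False
```

Allowing one NCBI carrier, rather than zero, means a variant seen once elsewhere (possibly itself a private heteroplasmy or a sequencing artifact) is still treated as novel.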

Selection for and against heteroplasmy in different genomic regions

We analyzed different functional regions of the genome and found evidence of region-specific selection for or against heteroplasmic variants. The distributions of HF in the 1526 mother–offspring pairs differed significantly between the D-loop, ribosomal RNA (rRNA), tRNA, and coding regions (Fig. 4A and table S3). Within the coding region, NS and synonymous (SS) variants also had different distributions (P = 2.74 × 10⁻⁵, Kolmogorov–Smirnov test). The NS/SS ratio was greater for the heteroplasmic variants than for the homoplasmic variants (P = 3.98 × 10⁻²⁴, odds ratio = 1.91, 95% CI = 1.68 to 2.18, Fisher's exact test), and the de novo and lost heteroplasmic variants had a higher NS/SS ratio than the transmitted variants (transmitted versus de novo: P = 0.0056, odds ratio = 1.69, 95% CI = 1.15 to 2.48; transmitted versus lost: P = 0.01, odds ratio = 1.57, 95% CI = 1.10 to 2.24, Fisher's exact test) (Fig. 4B). The heteroplasmic variants were more often at conserved sites than the homoplasmic variants (P = 3.71 × 10⁻⁷⁷, odds ratio = 3.21, 95% CI = 2.86 to 3.60, Fisher's exact test), and the transmitted heteroplasmic variants were less conserved than the de novo (P = 0.0018, odds ratio = 1.62, 95% CI = 1.19 to 2.22, Fisher's exact test) and lost (P = 9.60 × 10⁻⁹, odds ratio = 2.25, 95% CI = 1.69 to 3.03, Fisher's exact test) heteroplasmic variants (Fig. 4C). Also, heteroplasmic variants with a positive HS were less conserved than those with a negative HS (P = 0.03, odds ratio = 1.28, 95% CI = 1.01 to 1.61, Fisher's exact test). Variants in the rRNA genes were more likely to show a decrease than an increase in heteroplasmy level on transmission (P = 1.00 × 10⁻⁴, probability = 0.65, 95% CI = 0.57 to 0.72, binomial test) (Fig. 4D), and the mean HS was significantly less than zero (P = 8.21 × 10⁻⁵, d = 0.30, one sample t test) (Fig. 4E).
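The Kolmogorov–Smirnov test compares two HF distributions through the largest vertical gap between their empirical cumulative distribution functions, the quantity marked by vertical lines in Fig. 4A. Below is a pure-Python sketch of the two-sample statistic D; the function name and the HF values are hypothetical, and `scipy.stats.ks_2samp` returns both D and a P value.

```python
from bisect import bisect_right


def ks_statistic(x, y) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_x(t) - F_y(t)|.

    Both empirical CDFs are right-continuous step functions that only change
    at observed values, so evaluating them at the pooled sample points suffices.
    """
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for t in xs + ys:
        fx = bisect_right(xs, t) / len(xs)  # fraction of x values <= t
        fy = bisect_right(ys, t) / len(ys)
        d = max(d, abs(fx - fy))
    return d


# Hypothetical HF values (fractions), not study data
ns_hf = [0.02, 0.03, 0.05, 0.08, 0.12]
ss_hf = [0.04, 0.09, 0.15, 0.22, 0.31]
print(ks_statistic(ns_hf, ss_hf))
```

Because D uses only ranks, it is unaffected by the skewed, bounded shape of HF distributions, which makes it a natural choice for comparing regions.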

Fig. 4 Evidence of selection during the transmission of mtDNA heteroplasmy in 1526 mother–offspring pairs.

(A) Cumulative distributions of HF in mothers and offspring within each mtDNA region. Vertical lines between two curves indicate the greatest distance between specific regions (D-loop, SS, NS, rRNA, and tRNA) (P values in table S3). (B) Ratio of NS and SS variants for observed homoplasmic polymorphisms; total heteroplasmic variants; and transmitted, lost, and de novo heteroplasmic variants. (C) Frequency of heteroplasmic variants affecting conserved and nonconserved sites. (D) Number of heteroplasmies showing an increased or decreased HF in each mtDNA region. Left-pointing arrows indicate that the number increasing was less than the number decreasing. Right-pointing arrows indicate that the number increasing was greater than the number decreasing. (E) Histograms of HS in each mtDNA region with fitted kernel density curves. (F) Bar plot of the frequency of transmitted heteroplasmic variants by bins of HF in the mothers. (G) Frequency of transmitted heteroplasmic variants in each mtDNA region, along with 95% CIs. (H) Receiver operating characteristic curve for the logistic regression model of transmission [area under the curve (AUC) = 0.857]. *P < 0.05, **P < 0.01, ***P < 0.001, and ****P < 0.0001.

To identify the determinants of heteroplasmy transmission while reducing the risk of confounding, we used multivariable logistic regression to model the probability of transmission across all 1526 mother–offspring pairs (9). We modeled the transmission probability of a variant as a function of its HF in the mother, the mitochondrial genome region containing it, and its status as known or previously unknown (Figs. 3D and 4, F to H) (9). The probability that a heteroplasmic variant in the mother was transmitted to her offspring was associated with its HF in the mother (P < 2.2 × 10⁻¹⁶, coefficient estimate = 1.17, SD = 0.08, logistic regression) (Fig. 4F). Variants in the D-loop were more likely than average to be transmitted (P = 0.04, coefficient estimate = 0.39, SD = 0.19, logistic regression), and those in the rRNA genes were less likely than average to be transmitted (P = 0.0026, coefficient estimate = −0.94, SD = 0.31, logistic regression) (Fig. 4G). Previously unknown variants were less likely to be transmitted than known variants (P = 0.028, coefficient estimate = 0.43, SD = 0.19, logistic regression) (Fig. 3D), even after accounting for all other covariates, including HF in the mothers.
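The transmission model above is a standard multivariable logistic regression. The pure-Python sketch below fits such a model by Newton–Raphson on simulated data; everything in it is an assumption for illustration: maternal HF enters on the logit scale (the text does not state the scale), regions use treatment coding against "other" (the text compares with the average), and the simulated coefficients are arbitrary, not the reported estimates. In practice a library routine such as statsmodels' `Logit` would be used.

```python
import math
import random


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_logistic(X, y, iters=15):
    """Maximum-likelihood logistic regression by Newton-Raphson; X rows start with 1.0."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]  # Fisher information X^T W X
        for xi, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        step = solve_linear(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def simulate(n, seed=1):
    """Simulated maternal heteroplasmies (NOT study data): logit-HF, region dummies, known flag."""
    rng = random.Random(seed)
    true_beta = [-0.5, 1.2, 0.4, -0.9, 0.4]  # arbitrary illustrative values
    X, y = [], []
    for _ in range(n):
        hf = rng.uniform(0.01, 0.6)
        region = rng.choices(["D-loop", "rRNA", "other"], weights=[0.4, 0.2, 0.4])[0]
        row = [1.0, math.log(hf / (1 - hf)), float(region == "D-loop"),
               float(region == "rRNA"), float(rng.random() < 0.5)]
        eta = sum(b * v for b, v in zip(true_beta, row))
        X.append(row)
        y.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0)
    return X, y


if __name__ == "__main__":
    X, y = simulate(3000)
    names = ["intercept", "logit HF (mother)", "D-loop", "rRNA", "known"]
    for name, b in zip(names, fit_logistic(X, y)):
        print(f"{name:>18}: {b:+.2f}")
```

Fitting all covariates jointly is what lets the known/unknown effect be read "after accounting for" maternal HF: each coefficient is a log odds ratio with the other covariates held fixed.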

Heteroplasmic variants in the noncoding D-loop

To shed light on the possible effects of selection on the noncoding D-loop, we derived a high-resolution map of heteroplasmic variants in 12,975 individuals, including the 1526 mother–offspring pairs (mean mtDNA genome depth = 1832×, SD = 945×; mean D-loop depth = 1569×, SD = 819×) (Fig. 5, A to C, and fig. S8) (9). We found an association between the homoplasmic allele frequency among 30,506 NCBI mtDNA sequences and the proportion of individuals heteroplasmic for the same allele (P < 2.2 × 10⁻¹⁶, logistic regression) (Fig. 5, A to C), similar to that previously observed (5). Of the 17 regions in the D-loop (Fig. 5C, bottom, purple and orange bars), 2 had significantly more heteroplasmic variants than expected by chance. These regions correspond to the proposed replication fork barrier associated with the D-loop termination sequence (MT-TAS2) (28) and to MT-CSB1 (MT-TAS2: P = 4.5 × 10⁻¹¹, odds ratio = 0.40, 95% CI = 0.30 to 0.54; MT-CSB1: P = 7.0 × 10⁻⁶, odds ratio = 0.39, 95% CI = 0.24 to 0.61, Fisher's exact test versus remainder of the D-loop).
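Most comparisons in this section (a subregion versus the remainder of the D-loop) are 2×2 Fisher's exact tests. The pure-Python sketch below shows the two-sided test and the sample odds ratio; the function name is hypothetical, the input is the classic "lady tasting tea" table rather than study counts, and `scipy.stats.fisher_exact` offers an equivalent library implementation.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns (sample odds ratio ad/bc, P value). The P value sums the
    hypergeometric probabilities of every table with the same margins
    that is no more probable than the observed table.
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def prob(x: int) -> float:
        # probability that the top-left cell equals x given fixed margins
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    support = range(max(0, row1 + col1 - n), min(row1, col1) + 1)
    # a small relative tolerance guards against floating-point ties
    p_value = sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7))
    if b * c:
        odds_ratio = (a * d) / (b * c)
    else:
        odds_ratio = float("inf") if a * d else float("nan")
    return odds_ratio, min(1.0, p_value)


print(fisher_exact_2x2(3, 1, 1, 3))  # toy input, not study data
```

Swapping the two rows of the table inverts the sample odds ratio (OR becomes 1/OR) while leaving the two-sided P value unchanged, which is worth keeping in mind when odds ratios below 1 appear alongside statements of enrichment.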

Fig. 5 Distribution of heteroplasmic variants in mtDNA D-loop region.

(A) MAF of homoplasmic single-nucleotide polymorphisms observed in 30,506 NCBI mtDNA sequences, with an expanded axis to show MAF < 10% at the bottom. (B) Trend of PhastCons scores shown across the mtDNA D-loop region. (C) HFs observed in 12,975 mtDNA sequences in the D-loop region. MT-TAS2 and MT-CSB1 are shadowed in light purple. MT-LSP is shadowed in light orange. Corresponding known subregions of the mtDNA D-loop are shown at the bottom. MT-3H, mt3 H-strand control element; MT-3L, L-strand control element; MT-4H, mt4 H-strand control element; MT-7SDNA, 7S DNA; MT-CSB1, conserved sequence block 1; MT-CSB2, conserved sequence block 2; MT-CSB3, conserved sequence block 3; MT-HPR, replication primer; MT-HSP1, major H-strand promoter; MT-HV1, hypervariable segment 1; MT-HV2, hypervariable segment 2; MT-HV3, hypervariable segment 3; MT-LSP, L-strand promoter; MT-OHR, H-strand origins; MT-OHR57, H-strand origin; MT-TAS, termination-associated sequence; MT-TAS2, extended termination-associated sequence; MT-TFH, MT-TFL, MT-TFX, MT-TFY, mtTF1 binding site (49). (D) Trinucleotide mutational signature of heteroplasmic variants in the D-loop region in 12,975 mtDNA sequences. Different colored bars represent the six types of substitutions. Labeled heteroplasmic variants are included for the bars circled in red. (E) Simplified mtDNA phylogenetic tree showing six heteroplasmic variants (see main text). Variants are shown in red; haplogroups are in blue. Pie chart sizes are proportional to the number of samples (shown at the bottom) belonging to the corresponding haplogroup in 10,210 unrelated mtDNA sequences. The proportion of samples carrying each heteroplasmic variant within the same haplogroup is shown in yellow. (F) HF of six heteroplasmic variants shared by more than one individual belonging to the same haplogroup.

To help us understand the evolution of the D-loop, we identified all heteroplasmic variants not found on mtDNA phylogenies across a subset of 10,210 unrelated individuals from the original dataset (9). Five of these heteroplasmic variants were shared by more than one individual and were present exclusively in people within a particular haplogroup (Fig. 5, D and E). One variant (m.16237A>T) was present in multiple individuals from two different branches of the phylogeny (L0a1&2 and M35b2) (Fig. 5, D and E). Among homoplasmic sequences from across the world (5), only m.299C>A had been observed previously as a homoplasmic variant (in 3 of 30,506 individuals), each time on the R30b1 haplogroup background. This suggests that the individuals in our study who were heteroplasmic for m.299C>A (Fig. 5F) descended from the same maternal ancestor as the three homoplasmic individuals seen previously (5) and belonged to a closely related maternal lineage in which the variant had not yet reached fixation. These recurrent heteroplasmies contributed to the distinct trinucleotide mutational signature of the D-loop (P = 2.3 × 10⁻¹³⁷, Stouffer's method for combining Fisher P values), which involves prominent noncanonical substitutions and is consistent with the conclusion that the homoplasmic trinucleotide mutational signature of mtDNA is shaped by germline transmission of heteroplasmic variants (Fig. 5D and fig. S9).
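Stouffer's method converts each P value to a standard normal z-score, sums the scores, and rescales. The sketch below assumes independent one-sided P values and equal weights by default (the text does not state the weighting); the function name is hypothetical. It computes z from the lower-tail quantile and the combined tail with `math.erfc`, so that results of the order of 10⁻¹³⁷ do not underflow to zero through cancellation.

```python
import math
from statistics import NormalDist


def stouffer(p_values, weights=None) -> float:
    """Combine independent one-sided P values with Stouffer's Z method."""
    if weights is None:
        weights = [1.0] * len(p_values)
    std_normal = NormalDist()
    # z_i = Phi^-1(1 - p_i), written as -Phi^-1(p_i) so tiny P values stay accurate
    z = [-std_normal.inv_cdf(p) for p in p_values]
    z_combined = sum(w * zi for w, zi in zip(weights, z)) / math.sqrt(sum(w * w for w in weights))
    # upper-tail probability 1 - Phi(z), computed with erfc to avoid cancellation
    return 0.5 * math.erfc(z_combined / math.sqrt(2.0))


print(stouffer([0.05, 0.05]))
print(stouffer([1e-40, 1e-12, 0.3]))
```

Two independent results at P = 0.05 combine to roughly P = 0.01, showing how consistent but individually modest signals reinforce each other.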

We observed an absence of low-level heteroplasmic variants at critical sites required for the initiation of mtDNA transcription and replication. These zones include several conserved sequence boxes and the light-strand promoter (MT-LSP: P = 7.7 × 10⁻¹⁸, odds ratio = 10.12, 95% CI = 5.43 to 20.31, Fisher's exact test), which are required for mtDNA transcription and replication (29). Certain regions with no known function (30) (e.g., 16,400 to 16,500 bp; Fig. 5C) also completely lacked low-level heteroplasmic variants, which suggests that an intact sequence in these regions is essential for mitochondrial function and perhaps for genome propagation. The coordinates of the conserved and nonconserved regions provide a guide for functional studies of the mtDNA D-loop, which has been incompletely characterized to date.

The nuclear genetic background influences the heteroplasmy landscape

Most of the ~1500 known mitochondrial proteins are encoded by the nuclear genome, including the majority of polypeptide subunits of the oxidative phosphorylation system and the machinery required to replicate and transcribe the mitochondrial genome in situ (1). Selection for or against specific mtDNA variants must therefore occur in the context of a specific nuclear genetic background. To explore this, we identified 12,933 individuals for whom a confident mtDNA haplogroup could be predicted (fig. S10). We compared the haplogroup of each individual with the corresponding nuclear genetic ancestry and identified three distinct groups of individuals: (i) a haplogroup-matched group (n = 11,867, 91.7%), in which the mtDNA haplogroup was concordant with the nuclear ancestry; (ii) a haplogroup-mismatched group (n = 295, 2.3%), in which the nuclear ancestry and mtDNA were from different human populations; and (iii) a group for which the nuclear ancestry could not be reliably determined (n = 771, 6.0%) (Fig. 6, A and B, and fig. S10). Subsequent analyses focused on the haplogroup-matched and -mismatched groups (9).

Fig. 6 Characteristics of heteroplasmic variants in the nuclear ancestry and mtDNA ancestry matched and mismatched groups.

(A) Schematic showing how individuals with matched (MG; red border) and mismatched (MMG; green border) nuclear and mtDNA genomes arise over generations. Red and gray mtDNAs represent two different hypothetical populations. (B) (I) Projection of the nuclear genotypes at common SNPs onto the two leading principal components (PC1 and PC2) computed with the 1000 Genomes dataset, with individuals colored by their assigned nuclear ancestry: Asian (blue), African (green), European (red), and other (orange). (II to IV) Individuals represented by blue, green, and red symbols are also shown in panels II, III, and IV, respectively, where they are colored by their mitochondrial ancestries. Stars indicate that the mitochondrial ancestry does not match the nuclear ancestry. (C) Proportion of haplogroup-defining variants in the matched group (MG) and mismatched group (MMG) in 9631 mtDNA sequences from unrelated individuals. The expected proportion is shown at left. Distinct heteroplasmic sites were more likely to affect known haplogroup-defining variants (26) than the rest of the mitochondrial genome, compared with that expected by chance (P < 2.2 × 10⁻¹⁶, Fisher's exact test). This bias was stronger in the mismatched group than in the matched group (P = 0.001, Fisher's exact test). (D) Heatmaps showing the density of observed heteroplasmic mtDNA haplogroup-specific variants in the observation (left) and validation (right) datasets. The matched (top) and mismatched (bottom) groups are broken down by the nuclear ancestry of the carrier and the major haplogroup of the variants. The width of each column is proportional to the number of variants defining each of the two major haplogroups (Asian and European). Within each heatmap, the height of each row is proportional to the number of individuals having each nuclear ancestry. The density of heteroplasmic variants in each cell determines its color.

In the matched group, 8159 heteroplasmic variants were present at 3854 of the 16,569 sites in the mitochondrial genome. In the mismatched group, 195 heteroplasmic variants were present at 163 distinct sites. The mean number of heteroplasmic variants and the mean HF did not differ significantly between the matched and mismatched groups (fig. S11). Next, we studied distinct heteroplasmic sites in the 10,179 of the 12,933 individuals who were unrelated on the basis of their nuclear genome (9414 in the matched group, 217 in the mismatched group, and 548 in the third group). Distinct heteroplasmic sites were more likely to affect known haplogroup-specific sites (26) than the rest of the mitochondrial genome (P < 2.2 × 10^−16, Fisher’s exact test). This was particularly so within the mismatched group (P = 0.001, odds ratio = 1.70, 95% CI = 1.22 to 2.36, Fisher’s exact test) (Fig. 6C).

We extracted 2641 haplogroup-specific variants present in only one superpopulation (European, Asian, or African) on the world mtDNA phylogeny (26). We then used logistic regression to model the occurrence of these variants as heteroplasmies in 9385 unrelated individuals of European or Asian nuclear ancestry. The analysis was restricted to the 2215 European-specific (n = 940) and Asian-specific (n = 1275) variants on the mtDNA phylogeny. Individuals of African ancestry were omitted because of their high mtDNA diversity and small number (figs. S10 and S12) (9). We included the superpopulation of the variant and the logit of its population allele frequency as covariates. We also included a dummy variable indicating whether the variant matched the mitochondrial ancestry of the individual carrying it. Finally, we included separate variables for the matched and mismatched groups indicating whether the variant superpopulation matched the nuclear ancestry of the carrier.

We fitted the model to 768 heteroplasmic variants in 9179 unrelated matched individuals and 30 heteroplasmic variants in 206 unrelated mismatched individuals (9). The heteroplasmic variants in the mismatched group were significantly more likely to match the ancestry of the nuclear genetic background than that of the mtDNA background on which the heteroplasmy occurred (P = 2.9 × 10^−4, coefficient estimate = 0.85, SE = 0.24, logistic regression) (Fig. 6D and table S4). These findings suggest that the previously unidentified mtDNA variants underwent selection to match the nuclear genome. Given the high mutation rate of the mitochondrial genome and the patterns we observed over one generation, the selective process is likely to occur within the female germ line.

To independently validate this finding, we repeated our analysis with an additional 40,325 whole-genome sequences from participants recruited through the Genomics England 100,000 Genomes Rare Disease Main Programme (9). There were 36,038 individuals in a haplogroup-matched group, 1098 in a haplogroup-mismatched group, and 3124 in a group for which the nuclear ancestry could not be reliably determined (figs. S12 and S13). As before, we focused on the European- and Asian-specific variants observed in 23,931 unrelated European and Asian individuals. We fitted the same logistic regression model to 1942 heteroplasmic variants in 23,277 unrelated matched individuals and to 67 heteroplasmic variants in 654 unrelated individuals whose nuclear and mtDNA genomes had different ancestral origins. Again, the heteroplasmic variants in the mismatched group were more likely to match the ancestry of the nuclear genetic background than that of the mtDNA on which the heteroplasmy occurred (P = 1.33 × 10^−3, coefficient estimate = 0.47, SE = 0.15, logistic regression) (Fig. 6D and table S4). An inverse-variance-weighted meta-analysis of the discovery and validation cohorts yielded a significant association across the two datasets (P = 3.3 × 10^−6, coefficient estimate = 0.59, SE = 0.13). To better understand the underlying mechanisms, we studied the gene location and HF of the 97 heteroplasmic variants identified in the mismatched groups across the discovery and validation studies. Potentially functional variants were found in the noncoding region and in RNA genes. They also included 14 nonsynonymous (NS) protein-coding variants in the MT-ATP, MT-COX, MT-CYB, and MT-ND regions (fig. S14). This raises the possibility that differences in oxidative phosphorylation and adenosine triphosphate (ATP) synthesis are responsible for the association we observed.
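The pooled estimate can be approximately reproduced from the two rounded coefficients. This sketch assumes a standard fixed-effect inverse-variance model, in which each cohort is weighted by 1/SE². The function name is illustrative. Because the inputs are the rounded values reported above, the output should be expected to differ slightly from the published figures.

```python
import math


def inverse_variance_meta(estimates, standard_errors):
    """Fixed-effect inverse-variance pooling of per-study regression coefficients."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    # Two-sided P value under a normal approximation: 2 * (1 - Phi(|z|)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, p_two_sided


# Rounded coefficients and SEs from the discovery and validation cohorts.
pooled, pooled_se, p = inverse_variance_meta([0.85, 0.47], [0.24, 0.15])
print(f"coefficient = {pooled:.2f}, SE = {pooled_se:.2f}, P = {p:.1e}")
print(f"odds ratio = {math.exp(pooled):.2f}")
```

With these inputs, the pooled coefficient is about 0.58 (SE about 0.13). This agrees with the reported 0.59 (SE 0.13) up to rounding. Exponentiating the coefficient gives an odds ratio of roughly 1.8.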


Several explanations have been proposed for the high substitution rate of the noncoding mtDNA D-loop, including a high intrinsic mutation rate and/or a permissive sequence relative to the coding regions (30). The segregation of mtDNA heteroplasmy likely plays a role in shaping D-loop population polymorphisms by a mechanism operating within the female germ line. Similar findings have been seen in Drosophila, where D-loop variants “selfishly” drive segregation favoring a specific mtDNA genotype (31). These observations have implications for the development of mitochondrial transfer techniques for preventing the inheritance of severe pathogenic mtDNA mutations in humans (32, 33). After mitochondrial transfer, ~15% of human embryonic stem cell lines show reversion to the original mtDNA genotype (33–35). The reasons for this are not fully understood, but the selective propagation of D-loop heteroplasmy is a plausible explanation. Our findings implicate the nuclear genome in this process. This places greater emphasis on matching both nuclear and mtDNA backgrounds when selecting potential mitochondrial donors in order to minimize the possibility of nuclear-mitochondrial incompatibility after mitochondrial transfer.

In cases of heteroplasmic mtDNA, one allele can be preferentially copied or can segregate to high levels in a population of daughter cells. This can lead to changes in mtDNA allele frequency during the lifetime of an individual cell, tissue, or organism through genetic drift (36, 37). A high level of mtDNA content buffers fluctuations in allele frequency. However, if the number of copies falls below a certain threshold, a “genetic bottleneck” occurs, increasing the possibility of large changes in allele frequency.
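A minimal sketch of the genetic bottleneck just described, using the simplest possible model. Each oocyte receives one random binomial sample of N segregating mtDNA units from a mother with heteroplasmy h0, and later re-amplification preserves that fraction. Real germline bottlenecks unfold over many cell divisions, which this single sampling step does not capture. The values of h0 and N and the function name are hypothetical. The expected sibling-to-sibling variance under this model is h0(1 − h0)/N.

```python
import random
import statistics


def heteroplasmy_after_bottleneck(h0, n_units, n_oocytes, seed=0):
    """Oocyte heteroplasmy levels after one binomial sampling of n_units mtDNA units."""
    rng = random.Random(seed)
    levels = []
    for _ in range(n_oocytes):
        # Each segregating unit carries the variant with probability h0.
        mutant = sum(1 for _ in range(n_units) if rng.random() < h0)
        # Later re-amplification is assumed to preserve this fraction.
        levels.append(mutant / n_units)
    return levels


h0 = 0.3  # hypothetical maternal heteroplasmy level
for n_units in (10, 100, 1000):
    levels = heteroplasmy_after_bottleneck(h0, n_units, n_oocytes=2000)
    observed = statistics.pvariance(levels)
    expected = h0 * (1 - h0) / n_units  # binomial sampling variance
    print(f"N = {n_units:4d}: variance {observed:.5f} (expected {expected:.5f})")
```

The spread of heteroplasmy among siblings scales as 1/N. A sharp fall in copy number therefore makes large shifts between generations far more likely than a high copy number would.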

There is a ~1000-fold reduction in cellular mtDNA content during human germ cell development (38); this phase is followed by a period of intense proliferation and migration when germ cells migrate to form the developing gonad (39). This process depends on oxidative phosphorylation and is accompanied by a massive increase in mtDNA levels (38). Under these conditions, selection against variants that compromise mitochondrial ATP synthesis will occur. On the other hand, variants that promote mtDNA replication will have an advantage, potentially explaining the preferential transmission of specific D-loop variants. Subtle selective pressures will have maximal impact at this time, so the nuclear genetic influence we observed will most likely come into play during this critical period of development. Rather than being a direct analysis of the germ line, our study was based on whole-blood DNA, so it is possible that tissue-specific differences in heteroplasmy come into play. However, our analysis indicates that, at the population level, human mtDNA is influenced by selective forces acting within the female germ line and modulated by the nuclear genetic background. These forces are apparent within one generation and ensure consistency between the two independent genetic systems, shaping the current world mtDNA phylogeny.
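Why might subtle selective pressures matter most during the post-bottleneck expansion? Consider a deterministic sketch in which a variant replicates (1 + s) times as efficiently as the wild type in each round of mtDNA doubling. These simplifying assumptions are ours, not the authors':

- there is no drift;
- the relative advantage s is constant;
- about log2(1000) ≈ 10 doublings recover the ~1000-fold loss of mtDNA content.

The starting heteroplasmy and the values of s are hypothetical.

```python
import math


def heteroplasmy_after_expansion(h0, s, fold_increase):
    """Heteroplasmy after wild-type mtDNA expands fold_increase-fold.

    The variant is assumed to replicate (1 + s) times as fast as the wild type
    in every doubling, so the variant:wild-type odds are multiplied by
    (1 + s) ** doublings.
    """
    doublings = math.log2(fold_increase)
    odds = h0 / (1.0 - h0) * (1.0 + s) ** doublings
    return odds / (1.0 + odds)


for s in (-0.05, -0.01, 0.0, 0.01, 0.05):  # hypothetical selection coefficients
    h = heteroplasmy_after_expansion(h0=0.2, s=s, fold_increase=1000)
    print(f"s = {s:+.2f}: heteroplasmy 0.20 -> {h:.3f}")
```

Under this toy model, a 1% replicative advantage raises 20% heteroplasmy to only about 22% in one expansion. Because the shift applies again in every generation, however, small values of s can accumulate.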

Materials and methods

Participants, approvals, and sequence acquisition

The primary data were obtained by whole-genome sequencing (WGS) of whole-blood DNA from 13,037 individuals in the NIHR BioResource–Rare Diseases and 100,000 Genomes Project Pilot studies (table S5) (21). After quality control (QC) [see below and (9)], 12,975 samples (including 1526 mother–offspring pairs) were included in this study. For demographics, see (9). Ethical approval was provided by the East of England Cambridge South national research ethics committee under reference number 13/EE/0325. WGS was performed using the Illumina TruSeq DNA PCR-Free sample preparation kit (Illumina, Inc.) and an Illumina HiSeq 2500 sequencer. Sequencing generated a mean depth of 45× (range 34× to 72×), with coverage greater than 15× over at least 95% of the reference human genome (fig. S8A).

Extracting mitochondrial sequences, quality control, and variant detection

WGS reads were aligned to the Genome Reference Consortium human genome build 37 (GRCh37) using Isaac Genome Alignment Software (version 01.14; Illumina, Inc.). Reads aligning to the mitochondrial genome were extracted from each BAM file and analyzed using MToolBox (v1.0) (8, 9). The individual Variant Call Format (VCF) files and the merged VCF were normalized with bcftools and vt (40–42), and duplicated variants were removed with vt. The final VCF was annotated using the Variant Effect Predictor (VEP) (43). Further QC was carried out as described (9). Potential DNA cross-contamination was investigated using verifyBamID (44) in the nuclear genome and in the mtDNA variant calls (9).

Determining matched and mismatched groups

The pairwise relatedness and nuclear ancestry were estimated using nuclear genetic markers as described (9). MtDNA haplogroup assignment was performed using HaploGrep2 (26, 45). We then compared the mtDNA phylogenetic haplogroup with the nuclear genetic ancestry in the same individual and identified three distinct groups of individuals as described in the text.
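The three-way grouping can be written as a small decision rule. This sketch rests on several assumptions:

- It takes a nuclear ancestry label from the PCA-based assignment.
- It takes a mitochondrial ancestry label already derived from the HaploGrep2 haplogroup. The haplogroup-to-superpopulation mapping is not shown and is left to the caller.
- Any nuclear label other than African, Asian, or European (for example, "other") counts as not reliably assigned and goes to the third group.

The function and label names are hypothetical.

```python
SUPERPOPULATIONS = {"African", "Asian", "European"}


def ancestry_group(nuclear_ancestry, mito_ancestry):
    """Classify one individual as 'matched', 'mismatched', or 'third'."""
    if nuclear_ancestry not in SUPERPOPULATIONS:
        # Nuclear ancestry could not be reliably determined.
        return "third"
    if nuclear_ancestry == mito_ancestry:
        return "matched"
    return "mismatched"


people = [("European", "European"), ("European", "Asian"), ("other", "African")]
for nuclear, mito in people:
    print(nuclear, mito, ancestry_group(nuclear, mito))
```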

Defining previously unknown variants

Variants were considered to be previously unknown if they were absent from 1000 Genomes datasets and dbSNP and were seen in no more than one individual among 30,506 NCBI mtDNA sequences (5).

MtDNA mutational spectra and signature

Mutational spectra were derived from the reference and alternative alleles as described (24, 46).
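A common way to derive a spectrum from reference and alternative alleles is to collapse each single-nucleotide substitution onto its pyrimidine reference base. This gives six classes: C>A, C>G, C>T, T>A, T>C, and T>G. The sketch below assumes that convention. It is not necessarily the procedure in the cited methods, and mtDNA analyses often keep heavy- and light-strand changes separate instead of collapsing complementary substitutions. The function names are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def substitution_class(ref, alt):
    """Map an SNV onto one of the six pyrimidine-reference classes."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != 1 or len(alt) != 1 or ref == alt or not set(ref + alt) <= set("ACGT"):
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    if ref in ("A", "G"):
        # Purine reference: report the change on the complementary strand.
        ref, alt = ref.translate(COMPLEMENT), alt.translate(COMPLEMENT)
    return f"{ref}>{alt}"


def mutational_spectrum(variants):
    """Fraction of (ref, alt) pairs falling into each substitution class."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in variants)
    total = sum(counts.values())
    return {cls: n / total for cls, n in sorted(counts.items())}


print(mutational_spectrum([("C", "T"), ("G", "A"), ("T", "C"), ("A", "C")]))
# {'C>T': 0.5, 'T>C': 0.25, 'T>G': 0.25}
```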

Probability of maternal mtDNA transmission

We modeled the probability of transmission of heteroplasmic variants observed in the mothers using the following logistic regression model

$$\operatorname{logit} P(y_{ijl}=1) = \alpha + \beta_1 \mathbb{1}_{j=1} + \beta_2 \mathbb{1}_{j=2} + \beta_3 \mathbb{1}_{j=3} + \beta_4 \mathbb{1}_{j=4} + \gamma w_{ijl} + \eta z_{ijl}$$

where $y_{ijl}=1$ if the $l$th variant within mitochondrial genomic region $j$ in mother $i$ was transmitted and is zero otherwise; $j = 0, 1, 2, 3,$ or $4$ denotes the coding, D-loop, rRNA, tRNA, and remainder sequences, respectively, and $\mathbb{1}_{j=k}$ equals 1 if $j = k$ and zero otherwise; $w_{ijl}$ is the logit of the HF of the $l$th variant within mitochondrial genomic region $j$ in mother $i$; and $z_{ijl}=1$ if the $l$th variant within mitochondrial genomic region $j$ in mother $i$ was observed in no individuals from the 1000 Genomes datasets or dbSNP and in at most one individual among 30,506 NCBI mtDNAs, and is zero otherwise.
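The function below evaluates the transmission probability implied by this model for a single maternal variant. Its inputs are a region indicator (the coding region, j = 0, is the baseline), the logit of the maternal HF, and the previously unknown flag z. The coefficient values are placeholders invented for illustration, not the fitted estimates. The function names are hypothetical.

```python
import math

REGIONS = ("coding", "D-loop", "rRNA", "tRNA", "remainder")  # j = 0, 1, 2, 3, 4


def logit(p):
    return math.log(p / (1.0 - p))


def is_previously_unknown(in_1000_genomes, in_dbsnp, n_ncbi_carriers):
    """z = 1: absent from 1000 Genomes and dbSNP, and in at most one NCBI mtDNA."""
    return not in_1000_genomes and not in_dbsnp and n_ncbi_carriers <= 1


def transmission_probability(region, hf, previously_unknown, coef):
    """P(y = 1) under the logistic model, for a maternal HF strictly between 0 and 1."""
    j = REGIONS.index(region)
    linear = coef["alpha"]
    if j > 0:
        linear += coef["beta"][j - 1]  # beta_1..beta_4; coding region is the reference
    linear += coef["gamma"] * logit(hf)
    linear += coef["eta"] * (1 if previously_unknown else 0)
    return 1.0 / (1.0 + math.exp(-linear))


# Placeholder coefficients for illustration only (not the published estimates).
coef = {"alpha": -0.5, "beta": [0.4, 0.0, -0.3, 0.1], "gamma": 0.9, "eta": -0.6}
z = is_previously_unknown(in_1000_genomes=False, in_dbsnp=False, n_ncbi_carriers=0)
print(transmission_probability("D-loop", 0.15, z, coef))
```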

Homoplasmic allele frequency in the population and heteroplasmic variants

We fitted a logistic regression model to explore the relationship between the homoplasmic allele frequency in the general population and the rate at which individuals who are not homoplasmic for the alternate allele are heteroplasmic (9).
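One way to fit such a model is to treat each variant as a binomial observation: heteroplasmic carriers out of the individuals not homoplasmic for the alternate allele. The logit of that rate is then regressed on a function of the homoplasmic allele frequency. The sketch below assumes the logit of the allele frequency as the single covariate. It fits the two coefficients by Newton-Raphson with step-halving. The covariate choice, the function names, and the synthetic counts are our assumptions; the model actually used is described in (9).

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def logit(p):
    return math.log(p / (1.0 - p))


def log_likelihood(a, b, x, successes, trials):
    """Binomial log-likelihood (up to a constant) of logit(rate_k) = a + b * x_k."""
    ll = 0.0
    for xk, yk, nk in zip(x, successes, trials):
        p = sigmoid(a + b * xk)
        ll += yk * math.log(p) + (nk - yk) * math.log(1.0 - p)
    return ll


def fit_grouped_logistic(x, successes, trials, max_iter=100, tol=1e-10):
    """Newton-Raphson fit of (a, b) with step-halving for robustness."""
    a, b = logit(sum(successes) / sum(trials)), 0.0  # start from the pooled rate
    ll = log_likelihood(a, b, x, successes, trials)
    for _ in range(max_iter):
        # Score vector (u0, u1) and Fisher information [[i00, i01], [i01, i11]].
        u0 = u1 = i00 = i01 = i11 = 0.0
        for xk, yk, nk in zip(x, successes, trials):
            p = sigmoid(a + b * xk)
            resid, weight = yk - nk * p, nk * p * (1.0 - p)
            u0 += resid
            u1 += resid * xk
            i00 += weight
            i01 += weight * xk
            i11 += weight * xk * xk
        det = i00 * i11 - i01 * i01
        da = (i11 * u0 - i01 * u1) / det
        db = (i00 * u1 - i01 * u0) / det
        # Halve the step until the log-likelihood does not decrease.
        step = 1.0
        while step > 1e-8:
            if log_likelihood(a + step * da, b + step * db, x, successes, trials) >= ll - 1e-9:
                break
            step /= 2.0
        a, b = a + step * da, b + step * db
        ll = log_likelihood(a, b, x, successes, trials)
        if max(abs(step * da), abs(step * db)) < tol:
            break
    return a, b


# Synthetic, noise-free counts generated from a = -4, b = 0.3 (hypothetical values).
afs = [0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.3]  # homoplasmic allele frequencies
x = [logit(f) for f in afs]
trials = [round(12000 * (1 - f)) for f in afs]  # individuals not homoplasmic for alt
successes = [n * sigmoid(-4.0 + 0.3 * xk) for n, xk in zip(trials, x)]
print(fit_grouped_logistic(x, successes, trials))
```

The synthetic counts are exactly the expected values under a = −4 and b = 0.3, so the fit should return those values up to numerical precision. Real data would use observed integer counts.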

Selecting haplogroup-specific variants on the mtDNA phylogenetic tree

We extracted 4476 single-nucleotide variants (SNVs) present on the mtDNA phylogenetic tree (26), then focused on SNVs either present in only one superpopulation (European, Asian, or African) or present in two superpopulations but commonly seen in one population (>1%) and not seen (or seen extremely rarely) in the other population in 17,520 mtDNAs (5). This selected 2641 haplogroup-specific variants, including 426 African variants, 1275 Asian variants, and 940 European variants.
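The selection rule can be expressed as a filter over per-superpopulation allele frequencies. In this sketch, the >1% cut-off for "commonly seen" comes from the text. The threshold for "seen extremely rarely" is not stated here, so it is a hypothetical parameter with a default of 0.1%. The input format and function name are also assumptions.

```python
def specific_superpopulation(freqs, common=0.01, rare=0.001):
    """Return the superpopulation a phylogenetic SNV is specific to, or None.

    freqs maps "African", "Asian", and "European" to the fraction of mtDNAs
    from that superpopulation that carry the variant.
    """
    present = {pop: f for pop, f in freqs.items() if f > 0}
    if len(present) == 1:
        # Seen in a single superpopulation only.
        return next(iter(present))
    if len(present) == 2:
        (top, f_top), (_, f_other) = sorted(present.items(), key=lambda kv: kv[1], reverse=True)
        # Common (>1%) in one superpopulation and at most very rare in the other.
        if f_top > common and f_other <= rare:
            return top
    return None


examples = {
    "v1": {"African": 0.0, "Asian": 0.020, "European": 0.0},
    "v2": {"African": 0.0, "Asian": 0.050, "European": 0.0005},
    "v3": {"African": 0.0, "Asian": 0.050, "European": 0.020},
}
for name, freqs in examples.items():
    print(name, specific_superpopulation(freqs))
```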

Nuclear genome ancestry and mtDNA heteroplasmic variants

We modeled the presence or absence of a heteroplasmic variant in a particular individual using logistic regression. We considered only the 2215 haplogroup-specific mtDNA variants, selected from more than 4000 variants on the world mtDNA phylogeny (26), that are present exclusively in European or Asian branches. This restriction allows unambiguous assignment of mitochondrial ancestry to each variant. To avoid bias induced by recent shared ancestry between individuals, we considered only the 9385 unrelated Asian or European individuals in the matched and mismatched groups. We fitted the following logistic regression model

$$\operatorname{logit} P(y_{ij}=1) = \alpha + \beta x_j + \gamma v_j + \eta\, \mathbb{1}_{z_i = x_j} + \omega\, \mathbb{1}_{x_j = w_i,\ z_i = w_i} + \psi\, \mathbb{1}_{x_j = w_i,\ z_i \neq w_i}$$

The terms are defined as follows:

- $y_{ij}=1$ if variant $j$ is heteroplasmic in individual $i$ and is zero otherwise.
- $x_j=1$ if the ancestry of variant $j$ is European and is zero otherwise.
- $v_j$ is the logit of the homoplasmic allele frequency of variant $j$ in 30,506 NCBI samples.
- $z_i = 0, 1,$ or $2$ depending on whether the mitochondrial ancestry of individual $i$ is Asian, European, or African, respectively.
- $w_i=1$ if the nuclear ancestry of individual $i$ is European and is zero otherwise.

The indicator variable $\mathbb{1}$ evaluates to 1 if all conditions in its subscript are met and to zero otherwise.
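The three indicator terms are easy to misread. The sketch below computes the covariates of this model for one variant–individual pair, using the codings defined above (x, v, z, w). The function name is hypothetical. On our reading of the Results, ψ (the mismatched-group term) is the coefficient whose estimates are reported there. That mapping is an interpretation, not something stated in this section.

```python
import math

MITO_CODE = {"Asian": 0, "European": 1, "African": 2}  # z_i


def model_covariates(variant_ancestry, homoplasmic_af, mito_ancestry, nuclear_ancestry):
    """Covariates of logit P(y_ij = 1) for variant j in individual i."""
    x = 1 if variant_ancestry == "European" else 0  # x_j (Asian otherwise)
    v = math.log(homoplasmic_af / (1.0 - homoplasmic_af))  # v_j
    z = MITO_CODE[mito_ancestry]  # z_i
    w = 1 if nuclear_ancestry == "European" else 0  # w_i (Asian otherwise)
    return {
        "x": x,
        "v": v,
        "eta_term": int(z == x),  # variant matches mtDNA ancestry
        "omega_term": int(x == w and z == w),  # matches nuclear ancestry, matched group
        "psi_term": int(x == w and z != w),  # matches nuclear ancestry, mismatched group
    }


# European nuclear background carrying an Asian mtDNA (mismatched group).
print(model_covariates("European", 0.002, "Asian", "European"))
print(model_covariates("Asian", 0.002, "Asian", "European"))
```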

Validation dataset

We repeated the nuclear-mtDNA ancestry analysis in 42,799 whole genomes sequenced from whole-blood DNA in the Genomics England 100,000 Genomes Rare Disease Main Programme. These genomes were aligned to GRCh37 and/or hg38 and processed with the same bioinformatics pipeline. See (9) for details.

Supplementary Materials

Materials and Methods

Collaborator Names and Affiliations

Figs. S1 to S15

Tables S1 to S5

References (50–58)

Data S1

References and Notes

  9. See supplementary materials and methods.
Acknowledgments: We gratefully acknowledge the patients, families, and health care professionals involved in the NIHR BioResource–Rare Diseases and 100,000 Genomes projects. We thank N. S. Jones for critical comments on an early draft of the manuscript. Funding: This study makes use of data generated by the NIHR BioResource and the Genomics England Rare Diseases pilot projects. Genotype and phenotype data of both projects are part of the 100,000 Genomes Project. The main source of funding for the BioResource and Genomics England is provided by the National Institute for Health Research of England (NIHR) ( This work was also made possible by funding from the UK Medical Research Council (MRC) to create the UK Clinical Genomics Data Centre. P.F.C. is a Wellcome Trust Principal Research Fellow (101876/Z/13/Z and 212219/Z/18/Z) and a NIHR Senior Investigator who receives support from the Medical Research Council Mitochondrial Biology Unit (MC_UU_00015/9), the Evelyn Trust, and the NIHR Biomedical Research Centre based at Cambridge University Hospitals NHS Foundation Trust and the University of Cambridge. W.H.O. is a NIHR Senior Investigator, and his laboratory receives support from the British Heart Foundation, Bristol-Myers Squibb, European Commission, MRC, NHS Blood and Transplant, Rosetrees Trust, and the NIHR Biomedical Research Centre based at Cambridge University Hospitals NHS Foundation Trust and the University of Cambridge. M.C. is an NIHR Senior Investigator and is funded by the NIHR Biomedical Research Centre at St Bartholomew’s Hospital. Je.T., Jo.T., S.P., and A.O.M.W. are funded by the NIHR Biomedical Research Centre, Oxford. This work was supported in part by Wellcome Trust grant 090532/Z/09/Z. R.H. is funded by Wellcome Trust grants 201064/Z/16/Z, 109915/Z/15/Z, and 203105/Z/16/Z; MRC UK grant MR/N025431/1; ERC grant 309548; and Newton Fund MR/N027302/1. J.S. is funded by MRC UK grant MR/M012212/1. A.M., G.A., and A.W. 
are funded by the Moorfields Eye Charity. G.A. and A.W. are funded by the RP Fighting Blindness. All Moorfields Eye Hospital and Institute of Ophthalmology authors are funded by the UCL Institute of Ophthalmology and Moorfields NIHR Biomedical Research Centre. The Bristol NIHR Biomedical Research Centre provided infrastructure for BioResource activities in Bristol. Additional NIHR Biomedical Research Centres that contributed include Imperial College Healthcare NHS Trust BRC, Guy’s and St Thomas’ NHS Foundation Trust and King’s College London BRC. The authors listed also represent NephroS, the UK study of Nephrotic Syndrome. A.L. is a British Heart Foundation Senior Basic Science Research Fellow (FS/13/48/30453). D.L.B., A.C.T., N.V.Z., and M.I.M. are members of the DOLORisk consortium funded by the European Commission Horizon 2020 (ID633491). A.C.T. is a member of the International Diabetic Neuropathy Consortium and the Novo Nordisk Foundation (NNF14SA0006). D.L.B. is a Wellcome clinical scientist (202747/Z/16/Z). A.R.W. is supported by the NIHR-BRC of UCL Institute of Ophthalmology and Moorfields Eye Hospital. I.R. and E.L. are supported by the NIHR Translational Research Collaboration–Rare Diseases. H.J.B. works for the Netherlands CardioVascular Research Initiative (CVON). T.K.B. is sponsored by the NHSBT and British Society of Haematology. K.G.C.S. holds a Wellcome Investigator Award and an MRC Programme Grant (MR/L019027/1). M.I.M. is a Wellcome Senior Investigator (supported by Wellcome grants 090532 and 0938381). P.H.D. receives funding from ICP Support. H.S.M. receives support from BHF Programme grant RG/16/4/32218. N.C. is partially funded by Imperial College NIHR BRC. M.R.W. holds a NIHR award to the NIHR Imperial Clinical Research Facility at Imperial College Healthcare NHS Trust. P.Y.W.M. is supported by grants from MRC UK (G1002570), Fight for Sight (1570/1571 and 24TP171), and NIHR (IS-BRC-1215-20002). R.H. 
is a Wellcome Trust Investigator (109915/Z/15/Z) who receives support from the Wellcome Centre for Mitochondrial Research (203105/Z/16/Z), Medical Research Council (UK) (MR/N025431/1), the European Research Council (309548), the Wellcome Trust Pathfinder Scheme (201064/Z/16/Z), the Newton Fund (UK/Turkey, MR/N027302/1), and the European Union H2020–Research and Innovation Actions (SC1-PM-03-2017, Solve-RD). K.F. and C.V.G. were supported by the Research Council of the University of Leuven (BOF KU Leuven, Belgium; OT/14/098). J.S.W. is funded by the Wellcome Trust (107469/Z/15/Z) and the NIHR Cardiovascular Biomedical Research Unit at Royal Brompton and Harefield NHS Foundation Trust and Imperial College London. G.A. is funded by NIHR-Biomedical Research Centre at Moorfields Eye Hospital and UCL Institute of Ophthalmology, Fight for Sight (UK) Early Career Investigator Award, Moorfields Eye Hospital Special Trustees, Moorfields Eye Charity, Foundation Fighting Blindness (USA), and Retinitis Pigmentosa Fighting Blindness. M.C.S. holds a MRC Clinical Research Training Fellowship (grant MR/R002363/1). M.A.Ku. holds a NIHR Research Professorship NIHR-RP-2016-07-019 and Wellcome Intermediate Fellowship 098524/Z/12/A. J.Whi. is a recipient of a Cancer Research UK Cambridge Cancer Centre Clinical Research Training Fellowship. A.J.M. has received funding from a Medical Research Council Senior Clinical Fellowship (MR/L006340/1). D.P.G. is funded by the MRC; Kidney Research UK; and St Peter’s Trust for Kidney, Bladder and Prostate Research. S.A.J. is funded by Kids Kidney Research. C.L. received funding from a MRC Clinical Research Training Fellowship (MR/J011711/1). K.D. is a HSST trainee supported by Health Education England. C.Had. was funded through a Ph.D. Fellowship by the NIHR Translational Research Collaboration Rare Diseases. M.J.D. receives funding from the Wellcome Trust (WT098519MA). K.J.M. is supported by the Northern Counties Kidney Research Fund. 
Some of the work performed by E.L.M. was carried out at University College London Hospitals/University College London, which received a proportion of funding from the Department of Health’s NIHR Biomedical Research Centres funding scheme. K.C.G. is a holder of NIHR–BRC funding. P.L.B. is a NIHR Senior Investigator. This research was partly funded by the NIHR Great Ormond Street Hospital Biomedical Research Centre. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR, or the Department of Health. Author contributions: Study design: P.F.C., F.L.R., M.C., W.H.O., E.T., and W.W. Data analysis: E.T., W.W., S.T., and M.J.K. Writing: P.F.C., E.T., and W.W. Experimental and analytical supervision: P.F.C. and E.T. Project supervision: P.F.C. and E.T. The remaining authors contributed to the recruitment of participants, sample logistics, and initial data preparation. Competing interests: M.I.M. serves on advisory panels for Pfizer, NovoNordisk, and Zoe Global; has received honoraria from Pfizer, NovoNordisk, and Eli Lilly; has stock options in Zoe Global; and has received research funding from Abbvie, Astra Zeneca, Boehringer Ingelheim, Eli Lilly, Janssen, Merck, NovoNordisk, Pfizer, Roche, Sanofi Aventis, Servier, and Takeda. T.J.A. has received consultancy payments from AstraZeneca within the past 5 years and has received speaker honoraria from Illumina. K.J.M. previously received funding for research from and is currently on the scientific advisory board of Gemini Therapeutics. M.C.S. received travel and accommodation funds from NovoNordisk. D.M.L. serves on advisory boards for Agios, Novartis, and Cerus. A.M.K. had no competing interests at the time of the study but after study completion received an educational grant from CSL Behring to attend the ISTH meeting in Berlin in 2017. C.V.G. holds the Bayer and Norbert Heimburger (CSL Behring) chairs. 
Data and materials availability: Heteroplasmy data for the mother–child pairs is provided in data S1. Whole-genome sequence data from the NIHR BioResource–Rare Diseases project can be found in the European Genome-phenome Archive (EGA) at the EMBL European Bioinformatics Institute (BPD: EGAD00001004519, CSVD: EGAD00001004513, HCM: EGAD00001004514, ICP: EGAD00001004515, IRD: EGAD00001004520, MPMT: EGAD00001004521, NDD: EGAD00001004522, NPD: EGAD00001004516, PAH: EGAD00001004525, PID: EGAD00001004523, PMG: EGAD00001004517, SMD: EGAD00001004524, SRNS: EGAD00001004518; see table S5 for the disease abbreviations). Whole-genome sequence data from the UK Biobank samples are available through a data release process overseen by UK Biobank ( Whole-genome sequence data from the participants enrolled in 100,000 Genomes Project can be accessed via Genomics England Limited following the procedure outlined at