Quantifying the contribution of recessive coding variation to developmental disorders


Science  07 Dec 2018:
Vol. 362, Issue 6419, pp. 1161-1164
DOI: 10.1126/science.aar6731

Genetic architecture of developmental disorders

The genetics of developmental disorders (DDs) is complex. Martin et al. wanted to determine the degree of recessive inheritance of DDs in protein-coding genes. They examined the exomes of more than 6000 families in populations with high and low proportions of consanguineous marriages. They found that 3.6% of DDs in individuals of European ancestry were attributable to recessive coding variants, less than a tenth of the fraction explained by de novo coding mutations. Furthermore, even among South Asians with high parental relatedness, where most disorders might be expected to arise from inherited recessive variants, fewer than half had a recessive coding diagnosis.

Science, this issue p. 1161


We estimated the genome-wide contribution of recessive coding variation in 6040 families from the Deciphering Developmental Disorders study. The proportion of cases attributable to recessive coding variants was 3.6% in patients of European ancestry, compared with 50% explained by de novo coding mutations. It was higher (31%) in patients with Pakistani ancestry, owing to elevated autozygosity. Half of this recessive burden is attributable to known genes. We identified two genes not previously associated with recessive developmental disorders, KDM5B and EIF3F, and functionally validated them with mouse and cellular models. Our results suggest that recessive coding variants account for a small fraction of currently undiagnosed nonconsanguineous individuals, and that the roles of noncoding variants, incomplete penetrance, and polygenic mechanisms need further exploration.

Large-scale sequencing studies of phenotypically heterogeneous rare-disease patients can discover new disease genes (1–3) and characterize the genetic architecture of such disorders. In the Deciphering Developmental Disorders (DDD) study, we previously estimated the fraction of patients with a causal de novo coding mutation in both known and as-yet-undiscovered disease genes to be 40 to 45% (4), and in this work we extended this approach to recessive variants. It has been posited that there are thousands of as-yet-undiscovered recessive intellectual disability (ID) genes (5, 6), which could imply that recessive variants explain a large fraction of undiagnosed rare disease cases. However, attempts to estimate the prevalence of recessive disorders have been restricted to known disorders (7) or known pathogenic alleles (8). We quantified the total autosomal recessive coding burden using a robust and unbiased statistical framework in 6040 exome-sequenced DDD trios from the British Isles. Our approach provides a better-calibrated estimate of the exome-wide burden of recessive disease than those of previously published methods (3, 9).

We analyzed 5684 European and 356 Pakistani probands (EABI, European ancestry from the British Isles; PABI, Pakistani ancestry from the British Isles) (figs. S1 and S2) with developmental disorders (DDs). The clinical features are heterogeneous and representative of genetically undiagnosed DD patients from British and Irish clinical genetics services: 88% have an abnormality of the nervous system, and 88% have multiple affected organ systems (Fig. 1, fig. S3, and table S1). Clinical features are largely similar between EABI and PABI (Fig. 1 and table S1).

Fig. 1 Clinical features of DDD probands analyzed here.

Proportion of probands in different groups with clinical features indicated, extracted from Human Phenotype Ontology terms. Asterisks indicate nominally significant differences between indicated groups (Fisher’s exact test).
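The asterisks in Fig. 1 come from Fisher's exact test on 2 × 2 tables, with feature present or absent tabulated by group. Below is a minimal pure-Python sketch of the two-sided test. It assumes the common convention of summing the probabilities of all tables with the same margins that are no more probable than the observed one. The function name and the example counts are hypothetical, not DDD data. In practice, `scipy.stats.fisher_exact` provides the same test.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. EABI, PABI); columns are feature present/absent.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def table_prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x,
        # holding the row and column margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over every table no more probable than the observed one; the small
    # relative tolerance keeps floating-point ties from being dropped.
    return sum(
        p for p in (table_prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-7)
    )


# Hypothetical counts: feature present in 3 of 4 versus 1 of 4 probands.
print(fisher_exact_two_sided(3, 1, 1, 3))
```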

To assess the genome-wide recessive burden, we compared the number of rare [minor allele frequency (MAF) <1%] biallelic genotypes observed in our cohort with the number expected by chance (10). We used the phased haplotypes from unaffected DDD parents to estimate the expected number of biallelic genotypes. Reassuringly, the number of observed biallelic synonymous genotypes matched the expectation (fig. S4). We observed no significant burden of biallelic genotypes of any consequence class in 1389 probands with a likely diagnostic de novo, inherited dominant, or X-linked variant. We therefore evaluated the recessive coding burden in the remaining 4318 EABI and 333 PABI “undiagnosed” probands. Probands were more likely to have a recessive cause if they lacked a likely dominant or X-linked diagnosis (11), had at least one affected sibling, or had >2% autozygosity (Fig. 2A).
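The expected count can be illustrated with a simplified model. This is an assumption for illustration, not the paper's exact method (10). Let f be the fraction of phased parental haplotypes that carry at least one qualifying rare variant (MAF < 1%, chosen consequence class) in a gene. If a proband's two haplotypes at the gene are independent draws, the proband is biallelic with probability f². Counting haplotypes rather than individual variants means this covers both homozygous and compound heterozygous genotypes. With probability F the two haplotypes are identical by descent (autozygous), and then a single carrying haplotype suffices, giving probability f. A single cohort-wide F is a simplification, and the function names are hypothetical.

```python
from typing import Sequence


def carrier_haplotype_freq(haplotype_carries: Sequence[bool]) -> float:
    """Fraction of phased parental haplotypes carrying >=1 qualifying rare
    variant (e.g. LOF with MAF < 1%) in one gene."""
    return sum(haplotype_carries) / len(haplotype_carries)


def expected_biallelic(n_probands: int, f: float, autozygosity: float = 0.0) -> float:
    """Expected number of probands biallelic for qualifying variants in the gene.

    With probability `autozygosity` the two haplotypes are identical by descent
    (biallelic with probability f); otherwise they are independent draws
    (biallelic with probability f**2). Hypothetical helper for illustration.
    """
    p_biallelic = (1 - autozygosity) * f ** 2 + autozygosity * f
    return n_probands * p_biallelic


# Hypothetical gene: 3 of 1000 parental haplotypes carry a qualifying variant.
f = carrier_haplotype_freq([True] * 3 + [False] * 997)
print(expected_biallelic(4318, f))                    # random mating
print(expected_biallelic(333, f, autozygosity=0.05))  # hypothetical mean autozygosity
```

Summing such per-gene expectations over all genes gives a genome-wide expected count. That total can then be set against the observed count, as in Fig. 2A.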

Fig. 2 Contribution of recessive coding variants to genetic architecture in this study.

(A) Number of observed and expected biallelic genotypes per individual across all genes. Nominally significant P values from a Poisson test of enrichment are shown. (B) (Left) Number of probands grouped by diagnostic category. The inherited dominant and X-linked diagnoses (narrow pink bar) include only those in known genes, whereas the proportion of probands with de novo and recessive coding diagnoses was inferred as described in (10), including those in as-yet-undiscovered genes. (Right) The proportion of probands in various patient subsets inferred to have diagnostic variants in the indicated classes.

As expected because of their higher autozygosity (fig. S5), PABI individuals had more rare biallelic genotypes than EABI individuals (Fig. 2A); 92% of these were homozygous (rather than compound heterozygous), versus only 28% in the EABI samples. We observed a significant enrichment of biallelic loss-of-function (LOF) genotypes in both undiagnosed ancestry groups (Poisson P = 3.5 × 10^−5 in EABI, P = 9.7 × 10^−7 in PABI) and, in the EABI group, a nominally significant enrichment of biallelic damaging missense genotypes (P = 0.025) and a significant enrichment of compound heterozygous LOF/damaging missense genotypes (P = 6 × 10^−7) (Fig. 2A).

Among the 4651 EABI+PABI undiagnosed probands, a set of 903 clinically curated DD-associated recessive genes showed a higher recessive burden (1.7-fold; Poisson P = 6 × 10^−18) (fig. S6) than average (1.1-fold for all genes). Indeed, 48% of the observed excess of biallelic genotypes lay in these known genes. By contrast, we did not observe any recessive burden in 243 DD-associated genes with a dominant LOF mechanism, nor in any gene sets tested in the 1389 diagnosed probands (Poisson P > 0.05).
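Both summary statistics in this paragraph are simple ratios of observed (O) to expected (E) biallelic genotype counts. Fold enrichment is O/E. The share of the excess lying in a gene set is that set's O − E divided by the genome-wide O − E. The counts below are hypothetical and only illustrate the arithmetic.

```python
def fold_enrichment(observed: float, expected: float) -> float:
    """Observed over expected biallelic genotype count."""
    return observed / expected


def share_of_excess(obs_set: float, exp_set: float, obs_all: float, exp_all: float) -> float:
    """Fraction of the genome-wide excess (O - E) that lies in one gene set."""
    return (obs_set - exp_set) / (obs_all - exp_all)


# Hypothetical counts, not the DDD values.
print(fold_enrichment(50, 40))            # gene set
print(fold_enrichment(520, 500))          # all genes
print(share_of_excess(50, 40, 520, 500))  # 10 of the 20 excess genotypes fall in the set
```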

We developed a method to estimate the proportion of probands with a causal variant in a particular genotype class (10) in both known and as-yet-undiscovered genes. Unlike our previously published approach (4), this method accounts for the fact that some fraction of the variants expected by chance are actually causal (fig. S7). We estimated that 3.6% (~205) of the 5684 EABI probands have a recessive coding diagnosis, compared with 49.9% (~2836) with a de novo coding diagnosis. Recessive coding genotypes explain 30.9% (~110) of the 356 PABI individuals, compared with 29.8% (~106) for de novos. The contribution from recessive variants was higher in EABI probands with affected siblings than in those without (12.0% of 117 versus 3.2% of 5567) and highest in PABI probands with high autozygosity (47.1% of 241) (Fig. 2B and table S2). By contrast, it did not differ between the 115 PABI probands with low autozygosity and all 5684 EABI probands.
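One simple way to formalize this correction is a two-component mixture. It is offered as an illustrative assumption, not the published estimator (10). Suppose a fraction α of N probands each carry exactly one causal genotype of the class, and the remaining (1 − α)N carry chance genotypes at the background rate E/N. Then O = αN + (1 − α)E, so α = (O − E)/(N − E). The naive estimate (O − E)/N treats every expected genotype as noncausal and is therefore slightly smaller.

```python
def naive_fraction(observed: float, expected: float, n: int) -> float:
    """Excess over expectation per proband; treats all expected genotypes as noncausal."""
    return (observed - expected) / n


def mixture_fraction(observed: float, expected: float, n: int) -> float:
    """Solve O = alpha*N + (1 - alpha)*E for alpha (illustrative mixture model)."""
    return (observed - expected) / (n - expected)


# Hypothetical counts: 190 observed vs 100 expected genotypes in 1000 probands.
print(naive_fraction(190, 100, 1000))    # 0.09
print(mixture_fraction(190, 100, 1000))  # 0.1
```

The two estimates differ only slightly when E is small relative to N. The gap widens for genotype classes with a high background rate.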

We caution that the PABI results may be less reliable because of the modest sample size (wide confidence intervals are shown in table S2), exacerbated by imprecise estimation of rare variant frequencies in our limited sample of parents. Reassuringly, our estimated recessive contribution in PABI is close to the 31.5% reported in Kuwait (12), which has a similar level of consanguinity (13). Our results are consistent with previous reports of a low fraction of recessive diagnoses in European cohorts (3, 11, 14), but unlike those studies, our estimates further show that the recessive contribution in as-yet-undiscovered genes is also small. Although it has been hypothesized that there are thousands of undiscovered recessive DD-associated genes (5, 6), our analyses suggest that the cumulative impact of these discoveries on diagnostic yield will be modest in nonconsanguineous populations.

We next tested each gene for an excess of biallelic genotypes in the undiagnosed probands (table S3) (10). Three genes passed stringent Bonferroni correction (P < 3.4 × 10^−7) (10): THOC6 [previously reported in (15)], EIF3F, and KDM5B. Thirteen additional genes had P < 10^−4 (table S4), of which 11 are known recessive DD-associated genes, and known genes were enriched for lower P values (fig. S8).
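A per-gene test for an excess of biallelic genotypes can be sketched as an exact binomial upper tail. With n probands and a per-proband chance probability p of being biallelic, the test computes P = Pr(X ≥ k). The sketch sums the tail term by term in log space so that very small P values do not underflow. It then compares the result with the Bonferroni threshold of 3.4 × 10^−7 quoted above. How p is estimated for each gene (10) is not reproduced here, so the inputs are placeholders.

```python
from math import exp, fsum, lgamma, log, log1p

BONFERRONI_THRESHOLD = 3.4e-7  # threshold quoted in the text


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """Pr(X >= k) for X ~ Binomial(n, p), summed term by term in log space."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0 if k <= n else 0.0
    log_p, log_q = log(p), log1p(-p)
    log_n_fact = lgamma(n + 1)
    return fsum(
        exp(log_n_fact - lgamma(i + 1) - lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        for i in range(k, n + 1)
    )


# Placeholder inputs: 4 biallelic probands of 4651, chance probability 1e-5 each.
p_value = binom_upper_tail(4, 4651, 1e-5)
print(p_value, p_value < BONFERRONI_THRESHOLD)
```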

We observed five probands with an identical homozygous missense variant in EIF3F (binomial P = 1.2 × 10^−10) (ENSP00000310040.4:p.Phe232Val), plus four additional homozygous probands who had been excluded from our discovery analysis for various reasons (table S5). The variant (rs141976414) has a frequency of 0.12% in non-Finnish Europeans (one of the most common protein-altering variants in the gene), and no homozygotes were observed in gnomAD (16).
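A back-of-the-envelope calculation shows why five homozygotes is striking. It assumes Hardy–Weinberg proportions and applies the gnomAD non-Finnish European frequency q = 0.0012 to the 4318 undiagnosed EABI probands. Both are assumptions made here: the paper's binomial test (10) uses its own expected rate, so this figure will not reproduce the reported P value.

```latex
\mathbb{E}[\text{homozygotes}] = N q^{2} = 4318 \times (0.0012)^{2} \approx 6.2 \times 10^{-3}
```

Under these assumptions the cohort would be expected to contain far less than one homozygote, let alone five.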

All nine individuals homozygous for Phe232→Val had ID, and subsets also had seizures (6 of 9), behavioral difficulties (3 of 9), and sensorineural hearing loss (3 of 9) (table S5). There was no obvious distinctive facial appearance (fig. S9). EIF3F encodes a subunit of the mammalian eIF3 (eukaryotic initiation factor) complex, which negatively regulates translation. The genes encoding eIF2B subunits have been implicated in severe autosomal recessive neurodegenerative disorders (17). We used CRISPR-Cas9 to edit induced pluripotent stem cell (iPSC) lines to be heterozygous or homozygous for the Phe232→Val variant. Western blots showed that EIF3F protein levels were ~27% lower in homozygous cells than in heterozygous and wild-type cells (fig. S10), which may be due to reduced protein stability (fig. S11). The Phe232→Val variant significantly reduced the translation rate (Fig. 3A and fig. S12). Proliferation rates were also reduced in homozygous but not heterozygous cells (Fig. 3B and fig. S13), although cell viability was unchanged (fig. S14).

Fig. 3 Functional consequences of the pathogenic EIF3F recessive missense variant.

(A) The Phe232→Val variant impairs translation. Plot shows median fluorescence intensity (MFI) in wild-type iPSC lines and in lines heterozygous or homozygous for the Phe232→Val variant (correcting for replicate effects), measured using a Click-iT protein synthesis assay (10). MFI correlates with incorporation of a methionine analog into nascent proteins. The P value tests for a nonzero effect of genotype in a linear regression of MFI on genotype and replicate. Red lines indicate means. (B) The Phe232→Val variant impairs iPSC proliferation in the homozygous but not the heterozygous state. Results are from a cell trace violet (CTV) proliferation assay, in which CTV is diluted with each cell division. The populations of cells that have undergone zero, one, or multiple divisions are labeled.
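The genotype test in panel A can be sketched as an ordinary least-squares fit of MFI on genotype, with replicate as a categorical covariate. Genotype is coded here as an additive dosage (0 = wild-type, 1 = heterozygous, 2 = homozygous); this coding is an assumption for illustration, and the published model (10) may code genotype differently. The sketch returns the genotype coefficient, its t statistic, and the residual degrees of freedom. A two-sided P value then follows from a t distribution with those degrees of freedom, for example via SciPy's `scipy.stats.t.sf`. The data are synthetic.

```python
import numpy as np


def genotype_effect(mfi, genotype, replicate):
    """OLS of MFI on genotype dosage (0/1/2) plus replicate fixed effects.

    Returns (genotype coefficient, its t statistic, residual degrees of freedom).
    """
    y = np.asarray(mfi, dtype=float)
    g = np.asarray(genotype, dtype=float)
    levels = sorted(set(replicate))
    # Design matrix: intercept, genotype dosage, one dummy per replicate beyond the first.
    dummies = [np.array([r == lev for r in replicate], dtype=float) for lev in levels[1:]]
    X = np.column_stack([np.ones_like(g), g, *dummies])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1], beta[1] / se, dof


# Synthetic data: 2 replicates x 3 genotypes x 2 wells, true genotype effect -10.
genotype = [0, 0, 1, 1, 2, 2] * 2
replicate = ["r1"] * 6 + ["r2"] * 6
noise = [0.1, -0.1] * 6
mfi = [100 - 10 * gt + 5 * (rep == "r2") + e for gt, rep, e in zip(genotype, replicate, noise)]
print(genotype_effect(mfi, genotype, replicate))
```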

Another recessive gene we identified was KDM5B (binomial P = 1.1 × 10^−7) (Fig. 4), encoding a histone H3K4 demethylase. Three probands had biallelic LOFs passing our filters, and a fourth was compound heterozygous for a splice-site variant and a large gene-disrupting deletion. Several of these patients were recently reported with less compelling statistical evidence (18). KDM5B is also enriched for de novo mutations in our cohort (binomial P = 5.1 × 10^−7) (4). We saw nominally significant over-transmission of LOFs from the mostly unaffected parents (P = 0.002, transmission-disequilibrium test) (table S6), but no parent-of-origin bias. Theoretically, all the KDM5B LOFs observed in probands might be acting recessively, and heterozygous probands may have a second (missed) coding or regulatory hit or modifying epimutation. However, we found no evidence supporting this (figs. S15 and S16) (10), nor of potentially modifying coding variants in likely interactor genes, nor that some LOFs avoid nonsense-mediated decay (Fig. 4B). Genome-wide levels of DNA methylation in whole blood did not differ between probands with different types of KDM5B mutations or between these and controls (fig. S17).
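The transmission-disequilibrium test considers heterozygous parents and counts how often the LOF allele was transmitted (b) or not transmitted (c) to the proband. Under the null, each outcome is equally likely. The statistic (b − c)²/(b + c) is then approximately χ² with 1 degree of freedom. The sketch uses this asymptotic form. The counts in table S6 are not reproduced, so the example counts are hypothetical, and an exact binomial version may be preferable when counts are small.

```python
from math import erfc, sqrt
from typing import Tuple


def tdt(transmitted: int, untransmitted: int) -> Tuple[float, float]:
    """Return (chi-square statistic, P value) for the classic TDT."""
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("need at least one informative (heterozygous) parent")
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = erfc(sqrt(chi2 / 2))
    return chi2, p


# Hypothetical counts: 20 transmissions vs 5 non-transmissions.
print(tdt(20, 5))
```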

Fig. 4 KDM5B is a recessive DD gene in which heterozygous LOFs are incompletely penetrant.

(A) Summary of damaging variants found in KDM5B. (B) Positions of likely damaging variants found in this and previous studies in KDM5B (ENST00000367264.2; introns not to scale), omitting two large deletions. Colors correspond to those shown in (A). There are no differences in the spatial distribution of LOFs by inheritance mode, nor in their likelihood of escaping nonsense-mediated decay by alternative splicing in GTEx (28). (C to E) Behavioral defects of homozygous Kdm5b-null versus wild-type mice (n = 14 to 16 mice). (C) Knockout mice displayed increased anxiety, spending significantly less time in the light compartment of the light-dark box. (D) Reduced sociability in the three-chamber sociability test: knockout mice spent less time investigating a new mouse. (E) Twenty-four-hour memory impairment. Whereas wild-type mice preferentially investigated an unfamiliar mouse over a familiar one, homozygous knockout mice showed no discrimination.

These lines of evidence, along with previous observations of KDM5B de novos in both autism patients and unaffected siblings (19), suggest that heterozygous LOFs in KDM5B are pathogenic with incomplete penetrance, whereas homozygous LOFs are likely fully penetrant. Several microdeletions (20) and LOFs in other dominant ID genes are incompletely penetrant (20). Other H3K4 methylases and demethylases also cause neurodevelopmental disorders (21). KDM5B is atypical; the others are mostly dominant (22), typically with pLI scores >0.99 and very low pRec scores, whereas KDM5B has pLI = 5 × 10^−5 and pRec > 0.999 (23).

KDM5B is the only gene that showed significant enrichment for both biallelic variants and de novo mutations in our study. We saw significant enrichment of de novo missense (373 observed versus 305 expected; ratio = 1.22, upper-tailed Poisson P = 1 × 10^−4) but not de novo LOF mutations across all known recessive DD genes (excluding those known to also show dominant inheritance). One hypothesis is that the de novo missense mutations are acting as a “second hit” on the opposite haplotype from an inherited variant in the same gene. However, we saw only two instances of this in the cohort, and if it were driving the signal, we would expect to see a burden of de novo LOFs in recessive genes too, which we do not. A more plausible explanation is that recessive DD genes are also enriched for dominant activating or dominant-negative mutations. There are known examples of this; for example, in NALCN (24, 25) and MAB21L2 (26), heterozygous missense variants are activating or dominant-negative, whereas the biallelic mechanism is loss of function. By contrast, the six de novo LOFs in KDM5B suggest that it follows a different pattern. Of the 21 recessive genes with nominally significant de novo missense enrichment in our data, only one showed evidence of mutation clustering by using our previously published method (CTC1; P = 0.03) (1), which could suggest an activating/dominant-negative mechanism. Larger sample sizes will be needed to establish which of these genes also act dominantly, and by which mechanism.
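The upper-tailed Poisson test asks how likely it would be to see at least the observed number of de novo missense mutations if counts followed a Poisson distribution with the expected mean. The sketch evaluates Pr(X ≥ 373) for λ = 305 by summing probability-mass terms in log space, for comparison with the reported value. It is a generic implementation, not necessarily the code used in the analysis (10).

```python
from math import exp, fsum, lgamma, log, sqrt


def poisson_upper_tail(k: int, lam: float) -> float:
    """Pr(X >= k) for X ~ Poisson(lam), summing pmf terms from k upward."""
    if k <= 0:
        return 1.0
    log_lam = log(lam)
    # Terms more than ~40 standard deviations above max(k, lam) are negligible.
    upper = int(max(k, lam) + 40 * sqrt(lam) + 100)
    return fsum(exp(-lam + i * log_lam - lgamma(i + 1)) for i in range(k, upper + 1))


observed, expected = 373, 305
print(f"ratio = {observed / expected:.2f}")  # ratio = 1.22
print(f"P = {poisson_upper_tail(observed, expected):.1e}")
```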

All four individuals with biallelic KDM5B variants have ID, variable congenital abnormalities (table S7), and a distinctive facial appearance (fig. S18). Other than ID, there were no consistent phenotypes or distinctive features shared between the biallelic and monoallelic individuals or within the monoallelic group (table S7).

We created a mouse LOF model for Kdm5b. Heterozygous knockout mice appear normal and fertile, whereas homozygous Kdm5b-null mice are subviable (44% of expected, from heterozygous intercrosses). This partially penetrant lethality, in addition to a fully penetrant vertebral patterning defect (fig. S19), is consistent with previously published work (27). We additionally identified numerous behavioral abnormalities in homozygous Kdm5b-null mice: increased anxiety, reduced sociability, and impaired long-term memory compared with wild-type mice (Fig. 4).
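In a heterozygous × heterozygous intercross, Mendelian proportions are 1:2:1. Homozygote viability is therefore often summarized as the number of observed homozygotes divided by one quarter of all offspring. A χ² goodness-of-fit test with 2 degrees of freedom can accompany it; its survival function has the closed form e^(−x/2). The genotype counts below are hypothetical, since the text reports only the 44% figure.

```python
from math import exp
from typing import Tuple


def mendelian_check(n_wt: int, n_het: int, n_hom: int) -> Tuple[float, float, float]:
    """Return (fraction of expected homozygotes, chi-square, P) for a het x het cross."""
    total = n_wt + n_het + n_hom
    expected = (total / 4, total / 2, total / 4)  # 1:2:1 Mendelian ratio
    observed = (n_wt, n_het, n_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p = exp(-chi2 / 2)  # chi-square survival function for 2 degrees of freedom
    return n_hom / expected[2], chi2, p


# Hypothetical genotype counts from pooled litters, not the Kdm5b data.
print(mendelian_check(25, 50, 11))
```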

We have quantified the contribution of recessive coding variants in both known and as-yet-undiscovered genes to a large UK cohort of DD patients and found that overall, they explain a small fraction. Our methodology allowed us to carry out an unbiased burden analysis that was not possible with previous methods (fig. S4). We identified two new recessive DD genes that are less likely to be found by typical studies because they result in heterogeneous and nonspecific phenotypes, and we present strong functional evidence supporting their pathogenicity.

Our results can be used to improve recurrence risk estimates for undiagnosed families with a particular ancestry and pattern of inheritance. Extrapolating our results more widely requires some care; our study is slightly depleted of recessive diagnoses because some recessive DDs (such as metabolic disorders) are relatively easily diagnosed through current clinical practice in the United Kingdom and are therefore less likely to have been recruited. Furthermore, country-specific diagnostic practices and levels of consanguinity may make the exact estimates less applicable outside the United Kingdom.

Overall, we estimated that identifying all recessive DD genes would allow us to diagnose 5.2% of the EABI+PABI subset of DDD, whereas identifying all dominant DD genes would yield diagnoses for 48.6%. The high proportion of unexplained patients, even among those with affected siblings or high consanguinity, suggests that future studies should investigate a wide range of modes of inheritance, including oligogenic and polygenic inheritance as well as noncoding recessive variants.
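The 5.2% figure can be checked approximately by weighting the per-ancestry recessive estimates by cohort size. The calculation uses the rounded percentages reported above, so it agrees only to rounding:

```latex
\frac{0.036 \times 5684 + 0.309 \times 356}{5684 + 356}
  = \frac{204.6 + 110.0}{6040} \approx 0.052
```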

Supplementary Materials

Materials and Methods

Figs. S1 to S20

Tables S1 to S7

References (29–56)

References and Notes

10. Materials and methods are available as supplementary materials.
Acknowledgments: We thank the DDD families, the Sanger Human Genome Informatics team, P. Danecek for help with bcftools/roh, K. de Lange for help with figures, K. Samocha for mutability estimates, J. Matte and G. Turner for help with experiments, and A. Sakar for patient review. Families gave informed consent to participate, and the study was approved by the UK Research Ethics Committee (10/H0305/83, granted by the Cambridge South Research Ethics Committee, and GEN/284/12, granted by the Republic of Ireland Research Ethics Committee). Funding: The DDD study presents independent research commissioned by the Health Innovation Challenge Fund (grant HICF-1009-003). Details are available in the supplementary materials. H.C.M. acknowledges support from a Research Fellowship at St John’s College, Cambridge, and E.J.R. is funded by a National Institute for Health Research Academic Clinical Fellowship. Author contributions: Data analysis: H.C.M., J.F.M., J.H., P.S., and N.A.; Clinical interpretation: W.D.J.; EIF3F experiments: R.M., C.P.J., and M.Br.; Mouse phenotyping: G.S.-A. and M.Sa.; Protein structure modelling: J.D.S.; Data processing: G.G., M.N., J.K., C.F.W., and E.J.R.; Experimental validation: E.P.; Patient recruitment: M.Ba., J.D., R.H., A.H., D.S.J., K.J., D.K., S.A.L., S.G.M., J.M., M.J.P., M.Sp., P.D.T., P.C.V., and M.W.; Experimental and analytical supervision: A.B., S.S.G., C.F.W., D.R.F., H.V.F., M.E.H., and J.C.B.; Writing: H.C.M., W.D.J., R.M., G.S.-A., J.D.S., M.B., A.B., J.H., M.E.H., and J.C.B. Competing interests: M.E.H. is a cofounder of, consultant to, and holds shares in, Congenica, a genetics diagnostic company. Data and materials availability: Exome sequencing and phenotype data are accessible via the European Genome-phenome Archive (EGA) (Datafreeze 2016-10-03) (