Research Article

Distribution and clinical impact of functional variants in 50,726 whole-exome sequences from the DiscovEHR study

Science  23 Dec 2016:
Vol. 354, Issue 6319, aaf6814
DOI: 10.1126/science.aaf6814

Unleashing the power of precision medicine

Precision medicine promises the ability to identify risks and treat patients on the basis of pathogenic genetic variation. Two studies combined exome sequencing results for over 50,000 people with their electronic health records. Dewey et al. found that ∼3.5% of individuals in their cohort had clinically actionable genetic variants. Many of these variants affected blood lipid levels that could influence cardiovascular health. Abul-Husn et al. extended these findings to investigate the genetics and treatment of familial hypercholesterolemia, a risk factor for cardiovascular disease, within their patient pool. Genetic screening helped identify at-risk patients who could benefit from increased treatment.

Science, this issue p. 10.1126/science.aaf6814, p. 10.1126/science.aaf7000

Structured Abstract

INTRODUCTION

Large-scale genetic studies of integrated health care populations, with phenotypic data captured natively in the documentation of clinical care, have the potential to unveil genetic associations that point the way to new biology and therapeutic targets. This setting also represents an ideal test bed for the implementation of genomics in routine clinical care in service of precision medicine.

RATIONALE

The DiscovEHR collaboration between the Regeneron Genetics Center and Geisinger Health System aims to catalyze genomic discovery and precision medicine by coupling high-throughput exome sequencing to longitudinal electronic health records (EHRs) of participants in Geisinger’s MyCode Community Health Initiative. Here, we describe initial insights from whole-exome sequencing of 50,726 adult participants of predominantly European ancestry using clinical phenotypes derived from EHRs.

RESULTS

The median duration of EHR data associated with sequenced participants was 14 years, with a median of 87 clinical encounters, 687 laboratory tests, and seven procedures per participant. Forty-eight percent of sequenced individuals had one or more first- or second-degree relatives in the sample, and genome-wide autozygosity was similar to that of other outbred European populations. We found ~4.2 million single-nucleotide variants and insertion/deletion events, of which ~176,000 are predicted to result in loss of gene function (LoF). The overwhelming majority of these genetic variants occurred at a minor allele frequency of ≤1%, and more than half were singletons. Each participant harbored a median of 21 rare predicted LoFs. At this sample size, ~92% of sequenced genes, including genes that encode existing drug targets or confer risk for highly penetrant genetic diseases, harbor rare heterozygous predicted LoF variants. About 7% of sequenced genes contained rare homozygous predicted LoF variants in at least one individual. Linking these data to EHR-derived laboratory phenotypes revealed consequences of partial or complete LoF in humans. Among these were previously unidentified associations between predicted LoFs in CSF2RB and basophil and eosinophil counts, and EGLN1-associated erythrocytosis segregating in genetically identified family networks. Using predicted LoFs as a model for drug target antagonism, we found associations supporting the majority of therapeutic targets for lipid lowering. To highlight the opportunity for genotype-phenotype association discovery, we performed exome-wide association analyses of EHR-derived lipid values, newly associating rare predicted LoFs and deleterious missense variants in G6PC with triglyceride levels. In a survey of 76 clinically actionable disease-associated genes, we estimated that 3.5% of individuals harbor pathogenic or likely pathogenic variants that meet criteria for clinical action. Review of the EHR uncovered findings associated with the monogenic condition in ~65% of pathogenic variant carriers' medical records.
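The frequency categories used in the results above (minor allele frequency ≤1%, singletons) can be made concrete with a small sketch. It assumes biallelic sites and that all 50,726 participants are diploid and genotyped at every site. The function names are hypothetical, and the study's actual quality control and frequency definitions may differ.

```python
def minor_allele_frequency(allele_count: int, n_individuals: int) -> float:
    """MAF at a biallelic site, assuming every individual is diploid and genotyped."""
    total_alleles = 2 * n_individuals
    if not 0 <= allele_count <= total_alleles:
        raise ValueError("allele_count must lie between 0 and 2 * n_individuals")
    # The minor allele is whichever allele is observed less often.
    return min(allele_count, total_alleles - allele_count) / total_alleles


def classify_variant(allele_count: int, n_individuals: int,
                     rare_threshold: float = 0.01) -> str:
    """Hypothetical frequency bins: monomorphic, singleton, rare (MAF <= 1%), common."""
    total_alleles = 2 * n_individuals
    minor_count = min(allele_count, total_alleles - allele_count)
    if minor_count == 0:
        return "monomorphic"
    if minor_count == 1:
        return "singleton"  # minor allele seen exactly once in the whole sample
    if minor_allele_frequency(allele_count, n_individuals) <= rare_threshold:
        return "rare"
    return "common"


N = 50_726  # sequenced participants, i.e. 101,452 alleles per site under the assumptions above
for ac in (1, 500, 1014, 1015):
    print(ac, classify_variant(ac, N))
```

With 101,452 alleles per site, the 1% cutoff falls between allele counts of 1,014 and 1,015. In a cohort this size, a variant can therefore be carried by roughly a thousand people and still count as rare.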

CONCLUSION

The findings reported here demonstrate the value of large-scale sequencing in an integrated health system population, add to the knowledge base regarding the phenotypic consequences of human genetic variation, and illustrate the challenges and promise of genomic medicine implementation. DiscovEHR provides a blueprint for large-scale precision medicine initiatives and genomics-guided therapeutic target discovery.

Therapeutic target validation and genomic medicine in DiscovEHR.

(A) Associations between predicted LoF variants in lipid drug target genes and lipid levels. Boxes correspond to effect size, given as the absolute value of the effect in SD units; whiskers denote 95% confidence intervals for the effect. The size of each box is proportional to the base-10 logarithm of the number of predicted LoF carriers. (B and C) Prevalence and expressivity of clinically actionable genetic variants in 76 disease genes, according to EHR data. G76, Geisinger-76.
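Panel A reports effects in SD units with 95% confidence intervals, and it sizes each box by log10 of the number of carriers. A minimal sketch of those quantities follows. It uses a deliberately simple, unadjusted comparison of carriers with noncarriers and a normal-approximation interval. This comparison is an assumption made for illustration, not the study's association model, which may adjust for covariates or transform traits. Both helper functions are hypothetical.

```python
import math
import statistics


def carrier_effect_sd(carriers, noncarriers, z_crit=1.96):
    """Unadjusted carrier-minus-noncarrier difference in SD units, with an
    approximate 95% CI (normal approximation). Hypothetical, not the study's model."""
    pooled = list(carriers) + list(noncarriers)
    mu = statistics.mean(pooled)
    sd = statistics.stdev(pooled)  # standardize the trait against the whole sample
    zc = [(x - mu) / sd for x in carriers]
    zn = [(x - mu) / sd for x in noncarriers]
    beta = statistics.mean(zc) - statistics.mean(zn)
    # Unequal-variance standard error of a difference in means.
    se = math.sqrt(statistics.variance(zc) / len(zc) + statistics.variance(zn) / len(zn))
    return beta, beta - z_crit * se, beta + z_crit * se


def forest_plot_entry(beta, lo, hi, n_carriers):
    """Display form described for panel A: |effect|, its CI, and box size ~ log10(carriers)."""
    if beta < 0:
        # Mirror the interval together with the point estimate.
        beta, lo, hi = -beta, -hi, -lo
    return beta, lo, hi, math.log10(n_carriers)


carriers = [1.0, 2.0, 3.0]          # toy trait values, not study data
noncarriers = [4.0, 5.0, 6.0, 7.0]
beta, lo, hi = carrier_effect_sd(carriers, noncarriers)
print(forest_plot_entry(beta, lo, hi, len(carriers)))
```

Plotting the absolute value makes LoF effects that lower a lipid level directly comparable in magnitude to those that raise it. The sign must then be reported separately.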

Abstract

The DiscovEHR collaboration between the Regeneron Genetics Center and Geisinger Health System couples high-throughput sequencing to an integrated health care system using longitudinal electronic health records (EHRs). We sequenced the exomes of 50,726 adult participants in the DiscovEHR study to identify ~4.2 million rare single-nucleotide variants and insertion/deletion events, of which ~176,000 are predicted to result in a loss of gene function. Linking these data to EHR-derived clinical phenotypes, we find clinical associations supporting therapeutic targets, including genes encoding drug targets for lipid lowering, and identify new rare alleles associated with lipid levels and other blood-level traits. About 3.5% of individuals harbor deleterious variants in 76 clinically actionable genes. The DiscovEHR data set provides a blueprint for large-scale precision medicine initiatives and genomics-guided therapeutic discovery.