Report

Impact of regulatory variation from RNA to protein

See allHide authors and affiliations

Science  06 Feb 2015:
Vol. 347, Issue 6222, pp. 664-667
DOI: 10.1126/science.1260793

How genetics affect phenotypic variation

How an individual looks depends on their genes, genetic variation, and interactions with the environment. However, the path from genotype to phenotype remains murky. Battle et al. examine how an individual's genetic variation affects expression of RNA, ribosome occupancy, and protein levels. They find that RNA expression and ribosome occupancy are generally correlated. However, in contrast, protein levels appear not to depend on RNA levels or ribosome occupancy. Protein levels are thus regulated by posttranscriptional mechanisms.

Science, this issue p. 664

Abstract

The phenotypic consequences of expression quantitative trait loci (eQTLs) are presumably due to their effects on protein expression levels. Yet the impact of genetic variation, including eQTLs, on protein levels remains poorly understood. To address this, we mapped genetic variants that are associated with eQTLs, ribosome occupancy (rQTLs), or protein abundance (pQTLs). We found that most QTLs are associated with transcript expression levels, with consequent effects on ribosome and protein levels. However, eQTLs tend to have significantly reduced effect sizes on protein levels, which suggests that their potential impact on downstream phenotypes is often attenuated or buffered. Additionally, we identified a class of cis QTLs that affect protein abundance with little or no effect on messenger RNA or ribosome levels, which suggests that they may arise from differences in posttranslational regulation.

To understand the links between genetic and phenotypic variation, it may be essential to first understand how genetic variation impacts the regulation of gene expression. Previous studies have evaluated the association between variation and transcript expression in humans (13). Yet protein abundances are more direct determinants of cellular functions (4), and the impact of genetic differences on the multistage process of gene expression through transcription and translation to steady-state protein levels has not been fully characterized. Studies in model organisms have shown that variations in mRNA and protein expression levels are often uncorrelated (58). Comparative studies (913) have suggested that protein expression evolves under greater evolutionary constraint than transcript levels (14) and have provided evidence consistent with buffering of protein expression with respect to variation introduced at the transcript level. Yet, in contrast to comparative work, there are few reports of quantitative trait loci (QTLs) associated with protein levels (pQTLs) in humans (1517).

Here, we present a unified analysis of the association of genetic variation with transcript expression, ribosome profiling (18), and steady-state protein levels in a set of HapMap Yoruba (Ibadan, Nigeria) lymphoblastoid cell lines (LCLs). We collected ribosome profiling data for 72 Yoruba LCLs and quantified protein abundance in 62 of these lines. Genome-wide genotypes and RNA-sequencing (RNAseq) data were available for all lines (19).

Ribosome profiling is an effective way to measure changes in translational regulation by using sequencing (18). We obtained a median coverage of 12 million mapped reads per sample, and as expected, the ribosome profiling reads are highly concentrated within coding regions and show an enrichment of a 3base pair (bp) periodicity, which reflects the progression of a translating ribosome (figs. S1 to S3 and table S1).

We collected relative protein expression measurements using a SILAC internal standard sample (20) and quantitative protein mass spectrometry (fig. S4). To confirm the quality of the proteomics data (tables S2 and S3), we evaluated the agreement between measurements of distinct groups of peptides from the same protein. Differences between these measurements can reflect true biological variation (e.g., splicing) or experimental noise. The high correlations (Spearman’s rho 0.7 to 0.9; R2 of 0.3 to 0.7; depending on the sample) confirmed that we are able to precisely quantify interindividual variation in protein levels (fig. S5). We also analyzed quantifications of peptides that overlapped nonsynonymous SNPs that were heterozygous in either the analyzed or the internal standard sample (fig. S6). The median ratios measured from these peptides matched the expected values closely, indicating that our protein measurements were likely not subject to ratio compression (figs. S7 and S8).

As a final quality check, we considered variation in expression levels within and between genes. We found that transcript and protein expression levels—which are the furthest removed processes studied here—are the least correlated (figs. S9 and S10). Our observations are in agreement with most high-throughput studies that considered large number of samples, although smaller studies have often observed higher correlations (18, 21, 22).

We mapped genetic associations with regulatory phenotypes. First, we evaluated QTLs for each phenotype independently by testing for association between the phenotype and all genetic variants with minor allele frequency >10% in a 20-kb window around the corresponding gene. We used a shared standardization, normalization, regression, and permutation pipeline for all three phenotypes. At a false discovery rate (FDR) of 10%, we detected 2355 eQTLs, 939 rQTLs, and 278 pQTLs (Table 1 and fig. S11).

Table 1 Number of cis-QTLs identified at FDR of 10%.
View this table:

There is substantial overlap among detected QTLs (fig. S12). Among the 4322 genes quantified for all three phenotypes, 54% of the genes with pQTLs also have a significant rQTL and/or eQTL. Given the incomplete statistical power to detect QTLs in each dataset independently, we performed replication testing across data sets, using the specific single-nucleotide polymorphism (SNP)–gene pairs underlying each class of QTLs. This analysis is less sensitive to power limitations than genome-wide testing. The results confirm that many QTLs are shared across all three phenotypes (example in Fig. 1A). In particular, most (90%) genetic variants associated with ribosome occupancy are also associated with transcript levels (fig. S13). In contrast, eQTLs showed the lowest overlap with pQTLs (35%), as expected (Fig. 1B).

Fig. 1 Comparisons of QTLs at three levels of gene regulation.

(A) Many QTLs exhibit shared effects across mRNA, ribosome occupancy, and protein. This example illustrates a shared QTL for the schlafen family member 5 (SLFN5) gene (24). (Top two panels) Mean sequence depth (per bp) for mRNA and ribosome occupancy, averaged among individuals with each genotype at the QTL SNP. (Bottom) Median log2 SILAC ratios at each detected peptide, relative to the shared internal standard. (B) Replication rates between independently tested cis-QTLs for each phenotype pair, at FDR = 10%. QTLs detected for the phenotype labeled on each row were tested in the phenotype listed for each column, considering only the 4322 genes quantified in all three phenotypes. (C) On average, eQTLs exhibit attenuated effects on protein abundance but not on ribosome occupancy. We used eQTLs detected by the GEUVADIS study to avoid ascertainment bias, and we polarized the alleles according to the direction of effect in GEUVADIS. Mean effect sizes and standard errors of the means, measured as expected fold-change per allele copy on a log2 scale.

Our observation that many SNPs identified as eQTLs are not associated with differences in protein levels is consistent with the notion that, across species, protein levels diverge less than transcript levels (1217). Yet some QTLs may not replicate at the protein level simply because of incomplete mapping power. To address this and to avoid overestimation of effect sizes due to ascertainment bias at significant QTLs, we focused on eQTLs detected previously in European samples by the GEUVADIS study (2). We then attempted to replicate the GEUVADIS eQTLs using our transcript, ribosome profiling, and protein data and considered the mean QTL effect size in each data type. Mean effect sizes calculated in this way are expected to be unbiased with respect to either technical or biological variance.

Using this approach, we observed a reduced mean effect size for the GEUVADIS eQTLs in the protein data compared with either the RNAseq data (t test P = 6.7 × 10−3) or ribosome data (P = 5.6 × 10−3) (Fig. 1C). In contrast, the average effect sizes observed for the RNAseq and ribosome data are not significantly different from each other, and their effect sizes are highly correlated across the tested eQTLs (Pearson c = 0.79, P < 10−96) (fig. S14). The reduction in effect size observed in protein data is robust with respect to potential technical confounders, including intensity-based absolute quantification intensity level and transcript model complexity (fig. S15). We thus conclude that the majority of genetic variants affecting transcript levels also alter ribosomal occupancy, typically with a similar magnitude of effect. Yet both QTL mapping and effect-size analyses indicate that many eQTLs have attenuated (or absent) effects on steady-state protein levels (fig. S16).

In addition to the observation of generally attenuated effect sizes in pQTLs compared with eQTLs, we identified a subset of variants that appear to affect levels of proteins but not mRNA and, hence, are candidates to affect posttranscriptional gene regulation. To evaluate evidence for these, we tested each SNP for association with one regulatory phenotype, while treating one or both of the other phenotypes as covariates (conditional model). Considering protein levels, with RNA levels as a covariate, we identified 146 protein-specific QTLs (psQTLs) at FDR = 10% (Fig. 2A). The identification of psQTLs is generally robust to the choice of technology used to characterize transcript expression (fig. S17).

Fig. 2 Protein-specific and RNA-specific QTLs.

(A) An example of a protein-specific QTL, for the apolipoprotein L, 2 (APOL2) gene, detected by both the interaction model and the conditional models, indicating both larger effect (LRT, P = 3.3 × 10−6, interaction model; P = 5.1 × 10−13, conditional model) in protein than mRNA and that the effect on protein is not mediated by either mRNA or ribosome occupancy (LRT, P = 2.1×10−12, conditional model). Plotting details as in Fig. 1A. Although the causal variant underlying this pQTL is unknown, several linked variants near the 3′ end of APOL2 are all strongly associated with protein levels, including rs8142325 shown here and missense variant rs7285167 [linear model coefficient βg = 0.83, P = 9.8 × 10−9; LRT, P = 2.1 × 10−5, interaction model; P = 5.5×10−13, conditional model]. (B) Effect sizes for ribosome occupancy tend to track with RNA—not protein. (Top) effect sizes in all three phenotypes are shown for psQTLs. Effect sizes were estimated using linear regression in each of the phenotypes independently. The signs of the effects were set to be positive in protein. Solid lines reflect predicted effects based on a linear model. (Bottom) Similarly, effect sizes in all three phenotypes for esQTLs. Here, signs of the effects were set to be positive in RNA.

Using an alternative approach, an interaction model, we identified 68 psQTLs with significantly larger effects in protein than mRNA [according to a likelihood ratio test (LRT)] (FDR = 10%). We also used the interaction model to identify 76 expression-specific QTLs (esQTLs, interaction model, LRT; FDR = 10%). We then considered the ribosomal data. We found that the effect sizes for ribosomal occupancy are similar to the esQTL effect sizes (Fig. 2B). Yet, for psQTLs, low ribosome effect sizes are observed. Thus, for QTLs with discordant effects between transcript and protein, the ribosome data usually tracked with levels of RNA. Put together, these results allow us to identify loci where genetic variants have specific impacts on protein levels that are not fully mediated by regulation of either transcription or translation and hence may affect rates of protein degradation.

Finally, we performed enrichment analysis in which we considered each tested gene-SNP pair separately and evaluated the full distribution of P values from the conditional model (rather than choosing a significance threshold) for different genomic and functional annotations. SNPs within transcribed regions [exonic and in the untranslated region (UTR)] are enriched for more significant psQTL effects, compared with intergenic or intronic SNPs, even within the narrow 20-kb windows tested (figs. S18 to S20). In addition, psQTLs are further enriched for nonsynonymous sites (compared with all exonic SNPs) (Table 2).

Table 2 Enrichment of genomic annotations among esQTLs and psQTLs.

Enrichments were evaluated by a continuous test using QTL results from the conditional model (see supplementary materials). Columns (from left to right) describe the annotation being considered, the number of SNPs matching this annotation, the set of SNPs used as background for the corresponding test, and the enrichment P values for psQTLs and esQTLs, respectively.

View this table:

Investigating additional annotations (Table 2 and fig. S21), we found that nonsynonymous SNPs near acetylation sites showed nominal enrichment for psQTLs. This possibly reflects the functional role of lysine acetylation in modulating protein degradation (23). Overall, the enrichment results suggest that genetic variants involved in posttranscriptional regulation are functionally distinct from genetic variants that primarily affect transcription—they are more likely to fall within translated regions of the gene and more likely to occur at nonsynonymous sites.

In summary, we have shown that, although a substantial fraction of regulatory genetic variants influence gene expression at all levels from mRNA to steady-state protein abundance, there are also a number of effects with specific impact on particular expression phenotypes. QTLs affecting mRNA levels are, on average, attenuated or buffered at the protein level, as has been observed between species (14). Our analysis indicates that this attenuation is not evident at the stage of translation. Although the overall phenotypic similarity between ribosome occupancy and protein abundance is high, cis-regulatory genetic effects on ribosome occupancy appear to be more strongly shared with mRNA than with protein. These observations, along with the phenotype-specific QTL analysis, indicate a scarcity of translation-specific QTLs and minimal attenuation of genetic impact between mRNA and ribosome phenotypes.

Supplementary Materials

www.sciencemag.org/content/347/6222/664/suppl/DC1

Materials and Methods

Figs. S1 to S21

Tables S1 to S3

References (2543)

Data Tables S1 to S4

REFERENCES AND NOTES

  1. Acknowledgments: We thank M. Stephens, T. Flutre, and members of the Pritchard and Gilad labs for helpful discussions. This work was supported by NIH grants GM077959, HG007036, and MH084703. A.B. and J.K.P. were supported the the Howard Hughes Medical Institute. Z.K. was supported by F32HG006972. The proteomics data are available at ProteomeXchange (accession PXD001406). The ribosome profiling data are available at GEO (accession GSE61742). J.K.P. is on the Senior Advisory Board for 23andMe and DNAnexus and holds stock in both. A.B. holds stock in Google, Inc.
View Abstract

Navigate This Article