Report

Common Regulatory Variation Impacts Gene Expression in a Cell Type–Dependent Manner

Science  04 Sep 2009:
Vol. 325, Issue 5945, pp. 1246-1250
DOI: 10.1126/science.1174148

Abstract

Studies correlating genetic variation to gene expression facilitate the interpretation of common human phenotypes and disease. As functional variants may be operating in a tissue-dependent manner, we performed gene expression profiling and association with genetic variants (single-nucleotide polymorphisms) on three cell types of 75 individuals. We detected cell type–specific genetic effects, with 69 to 80% of regulatory variants operating in a cell type–specific manner, and identified multiple expression quantitative trait loci (eQTLs) per gene, unique or shared among cell types and positively correlated with the number of transcripts per gene. Cell type–specific eQTLs were found at larger distances from genes and at lower effect size, similar to known enhancers. These data suggest that the complete regulatory variant repertoire can only be uncovered in the context of cell-type specificity.

Variation influencing gene expression can manifest itself as gene expression differences among populations, among individuals in a population, among tissues, and in response to environmental factors. The genetic basis of the first two types of gene expression variation has been investigated, with the quantification of mRNA in one tissue and the identification of genetic variants correlated with that variation, termed expression quantitative trait loci (eQTLs), in a single or multiple populations (1–7). The complexity in higher eukaryotes, however, results in a vast set of highly specialized cell types and tissues. Some genes exhibit ubiquitous patterns of expression; others display tissue-specific activity (8–10). Although some studies have identified eQTLs in human (11–13) and mammalian tissues (14, 15), we know of no systematic study that has compared eQTLs across different cell types while controlling for confounding associations (population samples, differences in technology, or statistical methodology). Documenting tissue-specific genetic control of gene expression variation may connect cellular activities in health and disease (11, 13, 16). Efforts to use genomic information to interpret the biological effects of such variants are hindered by the limited availability of the relevant human tissues.

We investigated 85 individuals of the GenCord project (a collection of cell lines from umbilical cords of individuals of Western European origin) to identify cis eQTLs involved in gene expression variation in three cell types: primary fibroblasts, Epstein-Barr virus–immortalized B cells (lymphoblastoid cell lines or LCLs), and T cells. Umbilical cord was chosen because it is readily available and allows the acquisition of multiple cell types. Sample collection was performed on full-term or near–full-term pregnancies to ensure the homogeneity of sample ages.

mRNA levels were quantified in primary fibroblasts, LCLs, and primary T cells for 48,804 probes with the Illumina WG-6 v3 expression array. We analyzed data from 22,651 probes uniquely mapping to 17,945 autosomal RefSeq genes [15,596 Ensembl genes from the National Center for Biotechnology Information (NCBI) Reference Sequence]. Samples were genotyped on the Illumina 550K single-nucleotide polymorphism (SNP) array. After quality control was performed (SNPs with missing data were removed) and a minor allele–frequency filter (MAF ≥ 5%) was applied, 394,651 SNPs were used for association testing. Principal components analysis (PCA) detected 10 potential outlier individuals from the genotype data (17) (fig. S1), which subsequently were removed from the analysis. eQTL discovery and all other properties of the results for 75 versus 85 individuals were almost identical (fig. S2).
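The SNP filtering step described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes genotypes are coded as 0, 1, or 2 copies of one allele, that a missing call is represented by `None`, and that any SNP with a missing call is dropped. The function names and the toy SNP labels are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """MAF from complete genotypes coded as 0, 1, or 2 copies of one allele."""
    allele_freq = sum(genotypes) / (2 * len(genotypes))
    return min(allele_freq, 1.0 - allele_freq)


def passes_qc(genotypes, maf_min=0.05):
    """Reject SNPs with any missing call (None), then apply the MAF filter."""
    if any(g is None for g in genotypes):
        return False
    return minor_allele_frequency(genotypes) >= maf_min


# Hypothetical toy genotypes, for illustration only.
snps = {
    "snp_common": [0, 1, 2, 1, 0, 1, 2, 0, 1, 1],   # allele frequency 0.45
    "snp_rare": [0] * 19 + [1],                      # allele frequency 0.025
    "snp_missing": [0, 1, None, 2],                  # has a missing call
}
kept = [name for name, genotypes in snps.items() if passes_qc(genotypes)]
```

Only `snp_common` survives both filters in this toy input.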

We explored associations in cis, by testing all SNPs within a 2-megabase window centered on the transcription start site (TSS) of a gene. Using the Spearman rank correlation (SRC), we tested for associations between SNP genotype and mRNA levels (intensities), after normalization and transformation. SRC performs similarly to linear regression (7), but is not sensitive to outliers in the gene expression data, which reduce power. A total of 6,083,130 tests were performed, and significance thresholds for each gene were assigned through 10,000 permutations of expression values, as described (6, 7, 18). For 75 individuals at the 0.001 permutation threshold (PT), we discovered 427, 442, and 430 genes with significant cis associations in fibroblasts, LCLs, and T cells, respectively, with an estimated false-discovery rate (FDR) of 4% (fig. S2, A and B; and tables S1 and S2). For the less stringent PT of 0.01, we discovered 2146, 2155, and 2046 genes with an estimated FDR of 7% (Fig. 1) (19). The range of allelic effects was estimated by comparing the median expression in the two homozygote classes for the eQTL SNPs, and the 95% confidence limits of fold change were between 1.07 and 2.65 (fig. S7).
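The association test and per-gene permutation threshold can be sketched as follows. This is a simplified stand-in for the procedure in the cited methods (6, 7, 18), under stated assumptions: the best cis SNP is scored by its largest absolute Spearman ρ (equivalent to the smallest P value only when every SNP is tested on the same complete set of individuals), every SNP varies among individuals, and far fewer permutations are used than the 10,000 reported. All names and the toy data are hypothetical.

```python
import random


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_threshold(cis_genotypes, expression, n_perm, tail, rng):
    """|rho| that the best cis SNP reaches in only `tail` of permutations."""
    shuffled = list(expression)
    best = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-expression link
        best.append(max(abs(spearman(g, shuffled)) for g in cis_genotypes))
    best.sort(reverse=True)
    return best[max(round(tail * n_perm) - 1, 0)]


# Hypothetical toy data: 30 individuals, three cis SNPs coded 0/1/2.
causal = [0] * 10 + [1] * 10 + [2] * 10
cis_genotypes = [causal, [0, 1, 2] * 10, [0, 0, 1, 1, 2, 2] * 5]
expression = [g + 0.01 * i for i, g in enumerate(causal)]

observed = max(abs(spearman(g, expression)) for g in cis_genotypes)
threshold = permutation_threshold(
    cis_genotypes, expression, n_perm=1000, tail=0.01, rng=random.Random(0)
)
is_eqtl = observed > threshold
```

Permuting expression values while keeping genotypes fixed preserves the linkage disequilibrium among the cis SNPs, so the threshold accounts for the number of correlated tests performed for that gene.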

Fig. 1

Genome-wide map of cis eQTLs in three cell types; cis eQTLs at 0.001 PT are shown as color-coded lines on their corresponding chromosomal location. Internal black lines represent genes with eQTLs in all cell types.

We assessed whether we could replicate the LCL eQTLs from previous studies. eQTLs from the CEU HapMap LCLs [selected by the CEPH (Centre d'Etude du Polymorphisme Humain) and from Utah (CEU)] (7, 17) were replicated in the GenCord LCLs. Both populations are of European descent and share similar allele frequency spectra. Because of the differences in probe sequence content between the Illumina v1 array (used for the CEU HapMap) and the v3 array (used here), we could only compare a small subset of those SNP-probe associations. Of the 5898 SNP-probe pairs surviving the 0.001 PT in the CEU HapMap sample, 137 SNP-probe pairs (44 probes, some associated with multiple SNPs) were also tested in GenCord LCLs. Of the 137 SNP-probe tests, 114 had P values of less than 0.001 (83%) (fig. S3). Therefore, previously detected eQTLs were well replicated, despite the long separation time between tests of these cell lines, which demonstrated the stability of transformed B cells.

We interrogated the cell-type specificity of regulatory effects by exploring genes with cis eQTLs that were (i) shared in all three cell types, (ii) shared in two cell types, and (iii) cell-type specific. At the 0.001 PT, we identified a nonredundant set of 1007 genes with cis eQTLs of which 86 (8.5%) were shared among all three cell types, 120 (11.9%) were shared in two of the cell types, and 801 (79.5%) were cell-type specific (tables S1 and S2). The proportion of cell type–specific eQTLs was similar to previous estimates of eQTL tissue specificity and alternative splicing reported in a study interrogating two tissue types sampled, however, from different individuals (20).
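The three sharing classes amount to counting, for each gene, the number of cell types in which it has a significant cis eQTL. The sketch below shows that bookkeeping on hypothetical gene labels, then checks that the counts reported in the text are mutually consistent; the function and variable names are assumptions for illustration.

```python
from collections import Counter


def sharing_classes(eqtl_genes_by_cell_type):
    """Count genes with a cis eQTL in exactly one, two, or three cell types."""
    cell_types_per_gene = Counter(
        gene
        for genes in eqtl_genes_by_cell_type.values()
        for gene in set(genes)
    )
    tally = Counter(cell_types_per_gene.values())
    return {k: tally.get(k, 0) for k in (1, 2, 3)}


# Hypothetical gene labels, for illustration only.
toy = {
    "fibroblast": {"A", "B", "C", "D"},
    "LCL": {"A", "B", "E"},
    "T cell": {"A", "E", "F"},
}
toy_classes = sharing_classes(toy)  # A: three types; B, E: two; C, D, F: one

# Counts reported in the text at the 0.001 PT.
shared_three, shared_two, specific = 86, 120, 801
nonredundant_genes = shared_three + shared_two + specific
# Each gene is counted once per cell type in which it is significant.
per_cell_type_total = specific + 2 * shared_two + 3 * shared_three
```

The nonredundant total comes to 1007, and the per-cell-type total of 1299 matches the sum of the 427, 442, and 430 genes reported for fibroblasts, LCLs, and T cells.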

The degree of eQTL sharing across cell types (fig. S4) is overestimated because the eQTLs for the same gene are not necessarily identical genetic variants (see below). Of the genes with cis eQTLs common to two or more cell types, 124 (12.3%) were shared between fibroblasts and LCLs, 121 (12.0%) were shared between fibroblasts and T cells, and 133 (13.2%) were shared between LCLs and T cells. Increased eQTL sharing between LCLs and T cells is most likely due to the similarity of these cell types. We observed a striking prominence of tissue specificity with 268 (26.6% of total), 271 (26.9%), and 262 (26.0%) of gene eQTL associations found only in fibroblasts, LCLs, and T cells, respectively.

To test if the eQTL cell-type specificity arises from differential expression between cell types, we compared medians and variance of gene expression. We found that genes with cell type–specific eQTLs had significantly higher expression variance in the eQTL cell type (Mann-Whitney U test, P < 0.0001 for all comparisons). Medians for the same genes were either marginally significantly different or not different, which means that genes included in this analysis were largely expressed in all cell types. This suggests that the majority of cell-type specificity is not a result of differential gene expression levels between cell types, but due to cell type–specific use of regulatory elements.

To dissect the overlap of cis eQTLs across cell types, we compared the direction of the allelic effect for eQTLs significant in two or more cell types. The direction (sign of Spearman ρ) was in complete agreement for all pairwise cell-type comparisons at the 0.001 PT (fig. S5). Thus, shared regulatory variants act in the same direction across cell types. To assess the strength of cell-type specificity, we performed repeated-measures ANOVA (RMA). Cell-type specificity was reflected in the SNP × cell type–interaction term (the part of the equation that tests whether a SNP’s effect depends on the cell type), where cell type–specific associations are expected to be significant. We found 61% enrichment of low P values in cell type–specific eQTLs [quantified by estimation of FDR (21)] (fig. S6). No enrichment was observed for cell type–shared eQTLs. RMA, however, is limited, as the power to detect an interaction term is never maximized because of the lack of allelic effect reversal between cell types.

We further used allele specific–expression assays to validate a subset of cell type–specific eQTLs. We measured the ratio of the two alleles of the transcript SNP in RNA samples of individuals who were heterozygous for both the eQTL and the transcript SNP. For 35 transcript SNPs (7 from fibroblasts, 14 from LCLs, and 14 from T cells), we observed extensive allelic imbalance (ratio of the abundance of the transcripts of the two alleles) for the eQTL cell type; this was highly significantly different from the ratios of the same SNPs in cell types without the eQTL (paired t test, P = 5.6 × 10⁻⁷) (fig. S7). Therefore, the eQTL cell-type specificity has been experimentally confirmed.

Shared associations among individual cell types increased slightly at relaxed significance thresholds for one cell type, because of the so-called “winner’s curse” (22–24) (fig. S8). This states that the effect sizes discovered when applying specific statistical significance thresholds are inflated compared with the true effect size. Consequently, the discovery sample usually achieves higher significance than replication samples. Even with relaxed thresholds, however, over half of the associations we detected remain cell-type specific. We selected significant SNP-probe pairs from one cell type and explored their nominal (uncorrected) P-value distribution in the other two cell types. These distributions were enriched for low P values and so reflected those associations that are shared between cell types (Fig. 2). When we removed SNP-probe associations with significant associations in the secondary cell type (i.e., shared associations at the same and lower significance threshold), the resulting nominal P-value distributions demonstrated only small enrichment for low P values. We quantified the fraction of significant cis eQTLs from one cell type that is not nominally significant (P > 0.05 before correction) in either of the other two cell types. We estimate that 54, 50, and 54% of cis eQTLs in fibroblasts, LCLs, and T cells, respectively, are cell-type specific, which amounts to 69% of all cis eQTLs at 0.001 PT. Therefore, the limited overlap of cis eQTLs between cell types is unlikely to result from the winner’s curse and supports the conclusion that a substantial fraction of eQTLs are cell-type specific.
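The conservative specificity estimate described above reduces to a simple count: an eQTL from the reference cell type is called cell-type specific only if its nominal P value exceeds 0.05 in both other cell types. A minimal sketch, using hypothetical P values and an assumed function name:

```python
def specific_fraction(nominal_pvalues_elsewhere, alpha=0.05):
    """Fraction of reference eQTLs with nominal P > alpha in every other cell type."""
    specific = sum(
        1 for pvals in nominal_pvalues_elsewhere if all(p > alpha for p in pvals)
    )
    return specific / len(nominal_pvalues_elsewhere)


# Hypothetical nominal P values in the two secondary cell types,
# one tuple per eQTL that is significant in the reference cell type.
toy_pvalues = [
    (0.40, 0.72),   # no signal elsewhere: cell-type specific
    (0.001, 0.30),  # nominally significant in one other cell type
    (0.20, 0.03),   # nominally significant in one other cell type
    (0.55, 0.09),   # no signal elsewhere: cell-type specific
    (1e-6, 1e-4),   # shared in both other cell types
]
fraction = specific_fraction(toy_pvalues)
```

Because any nominal signal in another cell type disqualifies an eQTL, this criterion yields a lower estimate of specificity than comparing lists of significant genes, which is why it addresses the winner's curse concern.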

Fig. 2

SNP-probe pair nominal (uncorrected) P-value distributions for the two secondary cell types (label next to y axis) conditional on the reference cell type eQTL (leftmost label), significant at 0.001 PT are shown. The panels on the horizontal axis correspond (from left to right) to (i) the full P-value distribution of the secondary cell type, (ii) the P-value distribution after excluding significant eQTLs at 0.001 PT in secondary cell type, and (iii) similar to (ii) at 0.01 PT.

As previously observed (7, 25), we found that the strength and density of cis eQTLs decay symmetrically with increasing distance from the corresponding gene’s TSS (Fig. 3A). To better understand the independent regulatory effects, we mapped eQTLs into recombination hotspot intervals and, subsequently, further controlled for linkage disequilibrium (19, 26). At the 0.001 PT, we observed that 5.1% of associated genes have more than one independent interval carrying an eQTL (table S3). To further dissect the regulatory variant sharing between genes, we compared the overlap of independent eQTLs (i.e., regulatory intervals, rather than genes) across cell types. When all intervals with a 0.001 PT eQTL were considered, only 6.9% were found to be shared across all three cell types. In addition, 9.7% were shared in two cell types, and 83.4% were cell-type specific (Fig. 4A and table S4). The degree of overlap increased as independent eQTLs for genes with shared expression associations across cell types were analyzed (Fig. 4, B and C). In all cases, however, a substantial fraction of independent eQTLs were cell-type specific.

Fig. 3

Localization of independent cis eQTLs. Distance to TSS of (A) all independent cis eQTLs in each cell type (0.001 PT); (B) cis eQTLs shared in all three cell types (0.001 PT); and (C) cell type–specific cis eQTLs (0.001 PT).

Fig. 4

Fine-scale overlap of regulatory signals in three cell types. (A) Cell type–shared and cell type–specific independent cis eQTLs (regulatory intervals) for all genes (n = 1007) with a significant association at the 0.001 PT. (B) Genes (n = 206) significant in at least two cell types. (C) Genes (n = 86) significant in all three cell types.

Cell type–shared eQTLs tend to have larger effects (Spearman ρ) and higher significance and to cluster tightly around the TSS (Fig. 3B and fig. S9). Cell type–specific eQTLs, however, have lower effect sizes and are more widely distributed around the TSS (Fig. 3C). This is in agreement with the finding that enhancer elements, which tend to be found at greater distances from the gene, show greater tissue-specificity than basic regulatory elements (27). The number of eQTLs per gene was positively correlated with number of transcripts per gene (Pearson’s correlation coefficient = 0.049, P = 0.117 at 0.001 PT and Pearson’s correlation coefficient = 0.105, P < 0.0001 at 0.01 PT), although the correlation reached significance only at the 0.01 PT. This suggests that regulatory complexity correlates with transcript complexity. Single-eQTL SNPs were also found to influence expression of multiple genes. At the 0.001 (and 0.01) PT, over 6% (19%) of eQTL SNPs were associated with the expression of more than one gene (fig. S10).

We used gene ontology (GO) (28) to compare the properties of cell type–specific and cell type–shared genes. For genes with cell type–specific eQTLs, we found an overrepresentation of functions linked to signal transducer activity, cell communication, development, behavior, cellular process, enzyme regulator activity, transcription regulator activity, and response to stimulus, which reflect processes likely to sculpt cell type–specific profiles. For eQTLs shared in all cell types, we found an overrepresentation of catalytic activity and transport properties (Fisher’s exact test, P < 0.05) (table S5).
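Overrepresentation of a GO term in a gene set is commonly assessed with Fisher's exact test on a 2 × 2 table, which for enrichment is the upper tail of a hypergeometric distribution. The sketch below assumes a one-sided test, which the text does not specify; the function name and the example counts are hypothetical and are not taken from table S5.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact P value for overrepresentation.

    N: genes in the background, K: background genes annotated with the term,
    n: genes in the set of interest, k: genes in that set with the term.
    Returns P(X >= k) for X hypergeometric with parameters (N, K, n).
    """
    upper = min(n, K)
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)


# Hypothetical counts: 12 of 40 genes in the set carry a term that
# annotates 60 of 1000 background genes.
p_enriched = fisher_enrichment_p(k=12, n=40, K=60, N=1000)
```

With many GO terms tested, the resulting P values would normally need a multiple-testing correction before being compared with a 0.05 cutoff.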

We have demonstrated that variants affecting gene regulation act predominantly in a cell type–specific manner, and even cell types as closely related as LCLs and T cells share only a minority of cis eQTLs. We estimate that 69 to 80% of regulatory variants are cell-type specific and that regulatory variant complexity correlates with transcript complexity, which implies that there are genotype-specific effects on alternative transcript choice. In addition, cell type–specific eQTLs have smaller effects and tend to localize at greater distances from the TSS, recapitulating enhancer element distributions. The signal of cell-type specificity was shown to be primarily due to differential use of regulatory elements of genes that are expressed in almost all cell types. As more tissues are interrogated, we expect diminishing returns in discovery of eQTLs, and it is possible that there is a minimum set of informative tissues for the majority of regulatory variants. Our study highlights the need for extensive interrogation of regulatory variation in multiple cell types and tissues to elucidate their differential functional properties.

Supporting Online Material

www.sciencemag.org/cgi/content/full/1174148/DC1

Materials and Methods

Figs. S1 to S10

Tables S1 to S5

References

  • * These authors contributed equally to this work.

  • Present address: Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, CH-1211, Switzerland.

References and Notes

  1. Materials and methods are available as supporting material on Science Online.
  2. We thank N. Hammond for technical help. We acknowledge financial support from the Wellcome Trust and NIH to E.T.D. and Infectigen Foundation, Swiss National Science Foundation and AnEUploidy EU to S.E.A. Gene expression data are deposited in NCBI’s Gene Expression Omnibus under accession number GSE17080.