Genome-Wide Association Analysis Identifies Loci for Type 2 Diabetes and Triglyceride Levels


Science  01 Jun 2007:
Vol. 316, Issue 5829, pp. 1331-1336
DOI: 10.1126/science.1142358


New strategies for prevention and treatment of type 2 diabetes (T2D) require improved insight into disease etiology. We analyzed 386,731 common single-nucleotide polymorphisms (SNPs) in 1464 patients with T2D and 1467 matched controls, each characterized for measures of glucose metabolism, lipids, obesity, and blood pressure. With collaborators (FUSION and WTCCC/UKT2D), we identified and confirmed three loci associated with T2D—in a noncoding region near CDKN2A and CDKN2B, in an intron of IGF2BP2, and an intron of CDKAL1—and replicated associations near HHEX and in SLC30A8 found by a recent whole-genome association study. We identified and confirmed association of a SNP in an intron of glucokinase regulatory protein (GCKR) with serum triglycerides. The discovery of associated variants in unsuspected genes and outside coding regions illustrates the ability of genome-wide association studies to provide potentially important clues to the pathogenesis of common diseases.

Type 2 diabetes, obesity, and cardiovascular risk factors are caused by a combination of genetic susceptibility, environment, behavior, and chance. Whole-genome association studies (WGAS) offer a new approach to gene discovery unbiased with regard to presumed functions or locations of causal variants. This approach rests on three premises: Fisher's theory for additive effects at common alleles (1); the observation that human heterozygosity is substantially attributable to common ancestral variants (2); and the hypothesis that variants influencing common, late-onset diseases of modernity may not have been subject to purifying selection. It has been made possible by genomic advances such as the human genome sequence, SNP and HapMap databases, and genotyping arrays (3).

We studied 1464 patients with T2D and 1467 controls from Finland and Sweden, each characterized for 18 clinical traits: anthropometric measures, glucose tolerance and insulin secretion, lipids and apolipoproteins, and blood pressure. The samples were both population-based [1022 T2D cases and 1075 euglycemic controls, matched on gender, age, body mass index, and region of origin] and family-based (326 sibships discordant for T2D; 442 cases and 392 euglycemic controls; tables S1 and S2).

Genotyping of 500,568 SNPs was attempted in each sample. Overall call rate for passing SNPs was 99.2%. After filtering rare and monomorphic variants (n = 69,696 SNPs) and applying stringent quality-control filters, high-quality genotypes for 386,731 common SNPs were obtained (4). To extend the set of putative causal alleles tested for association, we developed 284,968 additional multimarker (haplotype) tests based on these SNP genotypes (5, 6). The 671,699 allelic tests capture (correlation coefficient r² ≥ 0.8) 78% of common SNPs in HapMap CEU (3).

Each SNP and haplotype test was assessed for association to T2D and each of 18 traits with the software package PLINK. For T2D, a weighted meta-analysis was used to combine results for the population-based and family-based subsamples (4). For quantitative traits, multivariable linear or logistic regression with or without covariates was performed (4). Association results for each SNP, haplotype test, and phenotype are available.
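
One common way to combine results from two subsamples is a weighted Z-score scheme, sketched below. This is only an illustration under stated assumptions: the function names `combine_z` and `two_sided_p`, the weights proportional to the square root of subsample size, and the input Z scores are hypothetical. The weighting actually used in the study is described in the supporting material (4).

```python
from math import sqrt
from statistics import NormalDist


def combine_z(z_scores, sample_sizes):
    """Combine per-subsample Z scores into a single Z score.

    Assumption: weights are proportional to sqrt(sample size);
    the study's own weighting scheme may differ.
    """
    weights = [sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_p(z):
    """Two-sided P-value for a standard-normal Z score."""
    return 2.0 * NormalDist().cdf(-abs(z))


# Hypothetical Z scores for one SNP in the two subsamples. The sizes
# follow the text: 2097 population-based and 834 family-based subjects.
z_combined = combine_z([2.5, 1.0], [2097, 834])
p_combined = two_sided_p(z_combined)
```

With this weighting, the larger population-based subsample dominates the combined statistic, while the family-based subsample still contributes in proportion to its size.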

In genome-wide analysis involving hundreds of thousands of statistical tests, modest levels of bias imposed on the null distribution can overwhelm a small number of true results. We used three strategies to search for evidence of systematic bias from unrecognized population structure, the analytical approach, and genotyping artifacts (7, 8). First, we examined the distribution of P-values in the population-based sample, observing a close match to that expected for a null distribution (genomic inflation factor λGC = 1.05 for T2D). Second, we calculated association statistics using EIGENSTRAT, an independent method based on principal components analysis (9); P-values for T2D derived with the two methods were nearly identical (r² = 0.95, Fig. 1A). Third, 114 SNPs from the extreme tail of P-values for T2D were genotyped with an independent technology. Genotype concordance was 99.5%, indicating that even the extreme tail of low P-values is not substantially contaminated by genotyping artifacts.
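
The genomic inflation factor mentioned above can be illustrated with a short sketch. It assumes the common median-based definition (median observed 1-df chi-square statistic divided by the null median, about 0.455); the function names and the synthetic P-values are hypothetical and are not taken from the study.

```python
from statistics import NormalDist, median


def chi2_1df_from_p(p):
    """Convert a two-sided P-value to the equivalent 1-df chi-square statistic."""
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return z * z


def genomic_inflation(p_values):
    """Median observed chi-square divided by the null median (about 0.455)."""
    null_median = NormalDist().inv_cdf(0.75) ** 2
    observed_median = median(chi2_1df_from_p(p) for p in p_values)
    return observed_median / null_median


# Evenly spaced P-values mimic a perfectly uniform null distribution,
# so the inflation factor should be very close to 1.
uniform_p = [(i + 0.5) / 1000 for i in range(1000)]
lambda_null = genomic_inflation(uniform_p)

# Shifting every P-value downward mimics systematic inflation.
lambda_inflated = genomic_inflation([p ** 1.2 for p in uniform_p])
```

A value near 1, such as the 1.05 reported for T2D, indicates that the bulk of the test statistics behaves as expected under the null.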

Fig. 1.

P-value distribution for the association with type 2 diabetes. (A) P-values obtained from the Cochran-Mantel-Haenszel stratified test implemented in PLINK are plotted (as –log10 values) as a function of the corresponding P-value computed by EIGENSTRAT in the population-based case/control sample (n = 2097). These distributions are strongly correlated (r² = 0.95). (B) P-P plot for the combined (Z score) association analysis of type 2 diabetes in the population-based case/control sample and the discordant sibships (n = 2931). The P-values for the corresponding Z scores are plotted (as –log10 values) as a function of P-values from the expected (uniform) null distribution. The observed distribution matches the expected distribution closely and shows an excess in the tail at P < 10⁻³.

Although the observed P-value distribution closely matches expectation over most of its range (1.0 > P > 0.01), an excess of low P-values is observed (Fig. 1B and table S3). To evaluate the significance of this excess, we generated 1000 permuted whole-genome analyses in which phenotype data were randomized within matched case-control groups (4). For P-values between 0.01 and 0.001, 6370 SNPs were observed, as compared to an average of 5917 [95% confidence interval (CI): 5714 to 6128] in permuted scans (Z = 4.3; P < 0.001). For P < 10⁻⁴, 125 SNPs were observed, compared to 94 [95% CI: 61 to 121] in permuted scans (Z = 2.4, P < 0.02). These observations support a model in which there are few common variants with large effects, but a substantial number with modest effects of the sort that generate P-values between 0.01 and 10⁻⁷ in 3000 samples.
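
The permutation scheme described above can be sketched as follows. This is a minimal illustration, not the study's pipeline: the function names `permute_within_groups` and `excess_z`, the toy data, and the use of the sample standard deviation of permuted counts to form a Z score are all assumptions.

```python
import random
from statistics import mean, stdev


def permute_within_groups(phenotypes, groups, rng):
    """Shuffle case/control labels only among members of the same group."""
    indices_by_group = {}
    for index, group in enumerate(groups):
        indices_by_group.setdefault(group, []).append(index)

    permuted = list(phenotypes)
    for indices in indices_by_group.values():
        labels = [phenotypes[i] for i in indices]
        rng.shuffle(labels)
        for i, label in zip(indices, labels):
            permuted[i] = label
    return permuted


def excess_z(observed_count, permuted_counts):
    """Z score of an observed SNP count against counts from permuted scans."""
    return (observed_count - mean(permuted_counts)) / stdev(permuted_counts)


# Toy data: 1 = case, 0 = control, in three hypothetical matched groups.
phenotypes = [1, 0, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
permuted = permute_within_groups(phenotypes, groups, rng)
```

Shuffling within groups keeps the number of cases in each matched group fixed, so the permuted scans preserve the matching structure while breaking any link between genotype and phenotype.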

Given this distribution, and because WGAS are hypothesis generating, we sought replication in independent samples. An initial set of 107 SNPs (table S4) was selected on the basis of our study (n = 89) and by comparison of our results with WGAS of T2D (n = 18) by Wellcome Trust Case Control Consortium (WTCCC) (10) and Finland–United States Investigation of NIDDM Genetics (FUSION) (11). Each SNP was genotyped in 10,850 additional subjects (T2D and controls) from Sweden, Poland, and the United States (table S1) and analyzed for association to T2D under the same genetic model as the scan (4).

These results, with those from FUSION (11) and WTCCC/UKT2D (10, 12), identify SNPs at three previously unknown loci as influencing risk of T2D with P < 10⁻¹⁰ (Table 1 and tables S4 and S5).

Table 1.

Association results for type 2 diabetes. Odds ratios (OR), 95% confidence intervals (CI), and P-values are given for SNPs from Diabetes Genetics Initiative (DGI) scan, replication samples, and data from the WTCCC/UKT2D (10, 12) and FUSION (11) studies. Proxies include: for rs7754840, rs10946398 (r² = 1, WTCCC/UKT2D); for rs7903146, rs7901695 (r² = 0.92, WTCCC/UKT2D); and for rs5219, rs5215 (r² = 0.99, WTCCC/UKT2D).


A SNP on chromosome 9p (rs10811661), 125 kb from the nearest annotated genes (CDKN2A/CDKN2B), was selected on the basis of strong association to T2D in our WGAS (rank #51) (Table 1; Fig. 2A). Combined analysis of data from our scan and replication samples provides strong evidence for association: odds ratio (OR) = 1.20, 95% CI 1.12 to 1.28, P = 5.4 × 10⁻⁸. Independent evidence of association for the same SNP, phenotype, and genetic model was obtained by WTCCC/UKT2D (P = 10⁻⁷) and FUSION (P = 0.001) (10–12). No association with measured quantitative metabolic traits was observed in our scan or replication samples.

Fig. 2.

Regional plots of six confirmed associations. For each of the (A) CDKN2A/CDKN2B, (B) IGF2BP2, (C) CDKAL1, (D) TCF7L2, (E) HHEX regions associated with T2D, and (F) GCKR region associated with triglyceride levels, all genotyped SNPs in the DGI genome scan are plotted with their P-values (as –log10 values) as a function of genomic position (with NCBI Build 35). In each panel, the SNP with the most significant association in the DGI combined analysis is listed (blue diamond) and its initial P-value in the genome scan (red diamond). Estimated recombination rates (taken from HapMap) are plotted to reflect the local LD structure around the associated SNPs and their correlated proxies (red: r² ≥ 0.8; orange: 0.5 ≤ r² < 0.8; gray: 0.2 ≤ r² < 0.5; white: r² < 0.2). Gene annotations were taken from the University of California–Santa Cruz genome browser.

An intriguing aspect of this association is its location far from any annotated gene. The region of association is limited to a 9-kb region flanked by strong recombination hot-spots, in which there are multiple conserved noncoding sequences but no known genes or microRNAs. A member of the nearest gene cluster, cyclin-dependent kinase inhibitor-2A (CDKN2A), plays a role in pancreatic islet regenerative capacity (13).

SNPs in the second intron of IGF2BP2 were selected for replication on the basis of joint analysis of the three scans (10–12) (Fig. 2B). Evidence was weak in our initial scan (P = 0.034 for rs4402960), but pronounced in the replication samples (P = 5.5 × 10⁻⁹, Table 1). Strong evidence was obtained for the same SNP, phenotype, and genetic model by WTCCC/UKT2D (P = 10⁻⁴) and FUSION (P = 10⁻⁴) (10–12). These SNPs showed no association to measured quantitative metabolic traits in our scan or replication samples.

Insulin-like growth factor 2 binding protein 2 (IGF2BP2) belongs to a family of three mRNA binding proteins with affinity for leader elements in the untranslated regions of IGF-2 transcripts. Family members bind with weak sequence specificity and are implicated in transport of RNA targets to enable protein synthesis at specific locations in the cell (14). The IGFBP homolog is necessary for pancreas development in Xenopus (15), and IGF2BP3 transgenic mice exhibit acinar-ductal pancreatic metaplasia (16).

We selected a SNP in a 90-kb intron within CDKAL1 (rs7754840) for replication on the basis of nominal association in our scan, WTCCC (10), and FUSION (11) (Table 1). Analysis of the scan and replication samples (Fig. 2C) supports association under the same phenotype and genetic model (OR = 1.08, 95% CI 1.03 to 1.14, combined P = 0.0024; Table 1), as does evidence from WTCCC/UKT2D (10, 12) (P = 10⁻⁸) and FUSION (11) (P = 0.01) (Table 1). The risk allele was nominally associated with reduced insulin secretion in controls from our scan (P = 0.01 for insulinogenic index).
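
The relationship between an odds ratio, its confidence interval, and the P-value can be checked with a small worked example. The sketch below assumes a Wald-type interval that is symmetric on the log-odds scale, which may not be exactly how the reported values were derived; the function name `p_from_or_ci` is hypothetical.

```python
from math import log
from statistics import NormalDist


def p_from_or_ci(odds_ratio, ci_low, ci_high):
    """Approximate Z score and two-sided P from an OR and its 95% CI.

    Assumes the interval is symmetric on the log-odds scale (Wald interval).
    """
    z_975 = NormalDist().inv_cdf(0.975)  # about 1.96
    std_err = (log(ci_high) - log(ci_low)) / (2.0 * z_975)
    z = log(odds_ratio) / std_err
    return z, 2.0 * NormalDist().cdf(-abs(z))


# Values quoted in the text for rs7754840 (CDKAL1).
z, p = p_from_or_ci(1.08, 1.03, 1.14)
```

Under this approximation the quoted interval corresponds to a Z score near 3 and a P-value of the same order as the reported combined P = 0.0024; small differences are expected because the published OR and CI are rounded.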

Table 2.

Association results for lipid/apolipoprotein traits in the top 100 of the DGI genome scan.


CDKAL1 is homologous to CDK5RAP1, an inhibitor of cyclin-dependent kinase CDK5; CDK5 transduces glucotoxicity signals in pancreatic beta cells (17). As with the other variants, how SNPs in CDKAL1 might influence risk of T2D awaits further investigation.

Common variation in an intron of TCF7L2 has been reproducibly associated with T2D (18). In our WGAS, TCF7L2 was the third-ranked association (Fig. 2D, P < 3 × 10⁻⁶) and was among the top results in each of the three other well-powered whole-genome scans of T2D (10, 11, 19) (Table 1). The consistency of these findings suggests that the TCF7L2 variant has the single largest effect of a common SNP on T2D risk in European populations. Associations in KCNJ11 (20) and PPARG (21) were not strongly observed in any single scan, but across the three scans provided P < 10⁻¹⁰ and P < 10⁻⁶, respectively (Table 1).

In 2007, Sladek et al. (19) reported four previously unknown associations to T2D in a WGAS, two with particularly strong evidence of replication (HHEX and SLC30A8). We confirm association at HHEX (Table 1; Fig. 2E) in our scan (OR = 1.15, P = 0.01) and in replication genotyping (P = 10⁻³), as do WTCCC/UKT2D (P = 10⁻⁶) (10, 12) and FUSION (P = 0.03) (11). At the zinc transporter SLC30A8, our data were less compelling (P = 0.90 in our scan and P = 0.01 in replication samples), but convincing evidence was obtained by WTCCC/UKT2D (10, 12) (P = 10⁻³) and FUSION (P = 10⁻⁵) (11). We observed no evidence for association at LOC387761 (n = 7401, OR = 1.00, P = 0.93 for rs7480010) and EXT2-ALX4 (n = 7401, OR = 1.06, P = 0.12 for rs3740878), nor was evidence obtained by WTCCC (10) or FUSION (11).

We observed intriguing replication signals at additional loci (10, 11). For example, rs17044137 in FLJ393370 was associated with T2D in our scan (OR = 1.27, P = 3.7 × 10⁻⁴) and replication (OR = 1.09, P = 3.1 × 10⁻³), but not in WTCCC (10) or FUSION (11) (Table 1). Similarly, rs6698181 in PKN2 demonstrated evidence in our scan and replication samples (P = 5.3 × 10⁻⁵), and in FUSION (11) (P = 10⁻³), but not WTCCC (10) (P = 0.93). Genotyping in more samples is needed to resolve these and other hypotheses.

For validated loci, variability in significance across studies may appear surprising, and suggestive of heterogeneity. However, formal tests for heterogeneity in effect size were not significant (P > 0.05). Moreover, in a simulated association study of 1500 cases and 1500 controls, allele frequency of 20%, and an OR of 1.20 per copy, the median P-value was 10⁻⁴, but ∼5% of simulated P-values were > 0.025, and 5% were < 10⁻⁷. Thus, substantial variability in rank and significance is expected where power is modest, particularly if a SNP is selected based on the study with an extreme P-value.
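
The spread of P-values quoted above follows from ordinary sampling variation in the test statistic, as the sketch below illustrates. It is not the simulation performed in the study: it simply assumes that the Z score is normally distributed with unit variance around a fixed expected value, and the function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_sided_p(z):
    """Two-sided P-value for a standard-normal Z score."""
    return 2.0 * STD_NORMAL.cdf(-abs(z))


def p_value_spread(median_p, tail=0.05):
    """P-values at the unlucky and lucky `tail` quantiles, given the median P.

    Assumes Z is normal with unit variance around a fixed expected value
    (a simplification, not the simulation used in the study).
    """
    z_median = STD_NORMAL.inv_cdf(1.0 - median_p / 2.0)
    shift = STD_NORMAL.inv_cdf(1.0 - tail)  # about 1.645 for 5%
    return two_sided_p(z_median - shift), two_sided_p(z_median + shift)


# Median P-value of 10^-4, as quoted in the text.
unlucky_p, lucky_p = p_value_spread(1e-4)
```

Under this simplification, a study whose median P-value is 10⁻⁴ yields a P-value above roughly 0.025 in the unluckiest 5% of replicates and below 10⁻⁷ in the luckiest 5%, in line with the figures quoted in the text.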

We also performed genome-wide analyses for 18 clinical traits (table S2). The distribution of P-values was similar to that observed for T2D, with close match to expectation under the null hypothesis and a modest excess of signals in the tail (table S3). We observed strong evidence (P < 10⁻⁴, rank in the top 100) for six previously reported common variants that influence lipid levels (table S3).

A previously unknown association with triglycerides was observed for rs780094 (P = 3.7 × 10⁻⁸), explaining 1% of residual variance in triglyceride levels (Fig. 2F; fig. S1A). This single SNP was tested in 5217 individuals from the Malmö Diet and Cancer Study, Cardiovascular Arm (MDC-CVA); the association replicated (P = 8.7 × 10⁻⁸) (Table 2, fig. S1B). The association was observed by FUSION (P = 10⁻⁴ controls; P = 10⁻³ cases) (11).

The SNP rs780094 lies in a large block of linkage disequilibrium (LD) spanning 416 kb and 16 genes. The SNP resides, however, within a highly plausible biological candidate gene: glucokinase regulatory protein (GCKR). GCKR regulates glucokinase (GCK), the first glycolytic enzyme. Adenoviral-mediated overexpression of GCKR in mouse liver increased GCK activity and lowered fasting blood glucose (22); overexpression of GCK in liver led to lowered blood glucose and increased triglyceride levels (23, 24).

On the basis of these findings, we examined measures of glucose homeostasis. In both our scan and replication samples, the T allele of rs780094 trended toward association to lower glucose (P < 0.10 and P < 0.02, respectively), less insulin resistance (HOMA-IR P < 0.05 and P < 0.01), and lower risk of T2D (P < 0.20 and P < 0.03). The association of higher triglycerides with lower blood glucose reverses the correlation normally seen in humans, but is consistent with overexpression studies of GCKR and GCK in mouse models.

In summary, we carried out WGAS for T2D and 18 clinical traits. With collaborators we provide compelling evidence for associations at three previously unknown loci with risk of T2D, the first replications of two additional T2D loci, and a previously unknown association to triglyceride levels. Including long-recognized associations, our data provide strong support for 15 common variants as influencing T2D and lipid levels in European populations. The annotations of the new T2D genes suggest a primary role of the pancreatic beta cell, but much additional work will be required to develop and test this hypothesis.

Our results have general implications for genome-wide association studies of common diseases. The modest effect of each SNP demonstrates that large sample sizes will be required to discover and validate genetic risk factors for common disease. Although the eight T2D variants discussed in this report each conveys a substantial population attributable risk (5 to 27% at each locus), each contributes very modestly to overall variance in diabetes risk (0.04 to 0.5%, ∼2.3% combined across the eight SNPs). Thus, many more variants remain to be found as risk factors for T2D, and many questions remain about the balance between common and rare variants, SNPs and copy-number alterations, main effects and epistasis. Additional associated variants may be found in or near these loci, as has been the case for other examples (25–31).

The most notable aspect of this and other such studies may be the generation of new hypotheses. Before this work, few would have argued that these genes and noncoding genomic regions were a high priority for T2D research. Now, on the basis of their validated relationship to disease, it is evident that they should be explored and understood. The ability to discover etiological factors that fall outside previous biological hypotheses is a major motivation for unbiased genome-wide approaches and is well supported by these and other emerging data from genome-wide association studies.

Diabetes Genetics Initiative of Broad Institute of Harvard and MIT, Lund University, and Novartis Institutes of BioMedical Research:

Writing team: Richa Saxena1–6 (Team Leader), Benjamin F. Voight,1–3,5 Valeriya Lyssenko,7 Noël P. Burtt,1 Paul I. W. de Bakker,1–6 Hong Chen,8 Jeffrey J. Roix,8 Sekar Kathiresan,1,3,5 Joel N. Hirschhorn,1,6,9–11 Mark J. Daly,1–3,5 Thomas E. Hughes,8 Leif Groop,7,12 David Altshuler1–6 (Chair)

Project management: Noël P. Burtt,1 Leif Groop,7,12 Thomas E. Hughes,8 David Altshuler1–6

Study design: Richa Saxena1–6 and Valeriya Lyssenko7 (Team Leaders), Peter Almgren,7 Paul I. W. de Bakker,1–6 Noël P. Burtt,1 Jose C. Florez,1–6 Hong Chen,8 Joanne Meyer,8 Joel N. Hirschhorn,1,6,9–11 Mark J. Daly,1–3,5 Thomas E. Hughes,8 Leif Groop,7,12 David Altshuler1–6 (Chair)

Clinical characterization and phenotypes: Valeriya Lyssenko7 and Richa Saxena1–6 (Team Leaders), Peter Almgren,7 Kristin Ardlie,1 Kristina Bengtsson Boström,13 Noël P. Burtt,1 Hong Chen,8 Jose C. Florez,1–6 Bo Isomaa,14,15 Sekar Kathiresan,1,3,5 Guillaume Lettre,1,6,9–11 Ulf Lindblad,16 Helen N. Lyon,1,6,9–11 Olle Melander,7 Christopher Newton-Cheh,1–3,5 Peter Nilsson,17 Marju Orho-Melander,7 Lennart Råstam,16 Elizabeth K. Speliotes,1,3,6,9–11 Marja-Riitta Taskinen,12 Tiinamaija Tuomi,12,15 Benjamin F. Voight,1–3,5 David Altshuler,1–6 Joel N. Hirschhorn,1,6,9–11 Thomas E. Hughes,8 Leif Groop7,12 (Chair)

DNA sample QC and diabetes replication genotyping: Candace Guiducci1 and Valeriya Lyssenko7 (Team Leaders), Anna Berglund,7 Joyce Carlson,18 Lauren Gianniny,1 Rachel Hackett,1 Liselotte Hall,18 Johan Holmkvist,7 Esa Laurila,7 Marju Orho-Melander,7 Marketa Sjögren,7 Maria Sterner,18 Aarti Surti,1 Margareta Svensson,7 Malin Svensson,7 Ryan Tewhey,1 Noël P. Burtt1 (Chair)

Whole genome scan genotyping: Brendan Blumenstiel1 (Team Leader), Melissa Parkin,1 Matthew DeFelice,1 Candace Guiducci,1 Ryan Tewhey,1 Rachel Barry,1 Wendy Brodeur,1 Noël P. Burtt,1 Jody Camarata,1 Nancy Chia,1 Mary Fava,1 John Gibbons,1 Bob Handsaker,1 Claire Healy,1 Kieu Nguyen,1 Casey Gates,1 Carrie Sougnez,1 Diane Gage,1 Marcia Nizzari,1 David Altshuler,1–6 Stacey B. Gabriel1 (Chair)

GCKR replication genotyping and analysis (Malmö Diet and Cancer Study): Sekar Kathiresan1,3,5 (Team Leader), Candace Guiducci,1 Aarti Surti,1 Noël P. Burtt,1 Olle Melander,7 Marju Orho-Melander7 (Chair)

Statistical analysis: Benjamin F. Voight1–3,5 and Paul I. W. de Bakker1–6 (Team Leaders), Richa Saxena,1–6 Valeriya Lyssenko,7 Peter Almgren,7 Noël P. Burtt,1 Hong Chen,8 Gung-Wei Chirn,8 Qicheng Ma,8 Hemang Parikh,7 Delwood Richardson,8 Darrell Ricke,8 Jeffrey J. Roix,8 Leif Groop,7,12 Shaun Purcell,1,2 David Altshuler,1–6 Mark J. Daly1–3,5 (Chair)

1Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, MA 02142, USA. 2Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA 02114, USA. 3Department of Medicine, Massachusetts General Hospital, Boston, MA 02114, USA. 4Department of Molecular Biology, Massachusetts General Hospital, Boston, MA 02114, USA. 5Department of Medicine, Harvard Medical School, Boston, MA 02115, USA. 6Department of Genetics, Harvard Medical School, Boston, MA 02115, USA. 7Department of Clinical Sciences, Diabetes and Endocrinology Research Unit, University Hospital Malmö, Lund University, Malmö, Sweden. 8Diabetes and Metabolism Disease Area, Novartis Institutes for BioMedical Research, 100 Technology Square, Cambridge, MA 02139, USA. 9Department of Pediatrics, Harvard Medical School, Boston, MA 02115, USA. 10Division of Endocrinology, Children's Hospital, Boston, MA 02115, USA. 11Division of Genetics, Children's Hospital, Boston, MA 02115, USA. 12Department of Medicine, Helsinki University Hospital, University of Helsinki, Helsinki, Finland. 13Skaraborg Institute, Skövde, Sweden. 14Malmska Municipal Health Center and Hospital, Jakobstad, Finland. 15Folkhälsan Research Center, Helsinki, Finland. 16Department of Clinical Sciences, Community Medicine Research Unit, University Hospital Malmö, Lund University, Malmö, Sweden. 17Department of Clinical Sciences, Medicine Research Unit, University Hospital Malmö, Lund University, Malmö, Sweden. 18Clinical Chemistry, University Hospital Malmö, Lund University, Malmö, Sweden. 19Department of Psychiatry, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02115, USA.

Supporting Online Material

Materials and Methods

Figs. S1 and S2

Tables S1 to S6


