A Regulatory SNP Causes a Human Genetic Disease by Creating a New Transcriptional Promoter


Science  26 May 2006:
Vol. 312, Issue 5777, pp. 1215-1217
DOI: 10.1126/science.1126431


We describe a pathogenetic mechanism underlying a variant form of the inherited blood disorder α thalassemia. Association studies of affected individuals from Melanesia localized the disease trait to the telomeric region of human chromosome 16, which includes the α-globin gene cluster, but no molecular defects were detected by conventional approaches. After resequencing and using a combination of chromatin immunoprecipitation and expression analysis on a tiled oligonucleotide array, we identified a gain-of-function regulatory single-nucleotide polymorphism (rSNP) in a nongenic region between the α-globin genes and their upstream regulatory elements. The rSNP creates a new promoterlike element that interferes with normal activation of all downstream α-like globin genes. Thus, our work illustrates a strategy for distinguishing between neutral and functionally important rSNPs, and it also identifies a pathogenetic mechanism that could potentially underlie other genetic diseases.

The human α-globin cluster, located at the telomeric region of chromosome 16 (16p13.3), includes an embryonic gene (ζ), two minor α-like genes [αD (also called μ) and θ], two α genes (α2 and α1), and two pseudogenes (ψα1 and ψζ) (1, 2). Upstream of these genes are four highly conserved cis elements (MCS-R1 to MCS-R4), of which MCS-R2 (also known as HS-40) plays the major role in regulating expression of the cluster (2, 3) (Fig. 1). Previous analyses of mutations that down-regulate globin gene expression and cause thalassemia have elucidated many of the general mechanisms underlying human molecular disease (4). Down-regulation of one or two of the four α-globin genes (αα/αα) causes anemia with mild red blood cell changes, the so-called α thalassemia trait. However, when α-globin gene expression is reduced to less than ∼50% of normal, excess β-globin chains form tetramers (β4, called HbH), which precipitate in the red blood cell, causing a more severe form of anemia called HbH disease (5). In nearly all cases of α thalassemia, the molecular basis for the reduced levels of α-globin expression can be readily identified (4, 5).

Fig. 1.

Overview of the α-globin cluster and identification of a rSNP. The genes located in the telomeric region of chromosome 16 are numbered as in (1), and the globin genes are labeled. The VNTR (3′ hypervariable region) is shown as a red zigzag line. A deletion (15) removing the region containing the rSNP is shown as a black line. Below this, all DNAse1 hypersensitive sites (DHSs) and erythroid-specific sites (eDHSs) are shown (3, 10, 16). MCS-P/R summarizes all evolutionarily conserved promoter and regulatory sequences across this region (2). Probes used to profile ChIP products are shown in pink, and repeats are shown in green. Below this, all sequence differences between the (αα)T and wild-type αα chromosome are shown. “New Diffs” refers to newly identified sequence differences that are not known to be polymorphic SNPs. SNPs analyzed in genetic linkage studies described in this paper are shown in purple. The rSNP described here is shown as a black diamond in “All Diffs” and “New Diffs.” A dashed vertical line runs from these diamonds through the array data. Below, the patterns of gene expression recorded on a custom-tiled Affymetrix array spanning this telomeric region in primary erythroid cells from (A) a normal individual (αα/αα) and (B) patient L with the (αα)T/(αα)T genotype are shown. The peak of ζ-globin expression in the (αα)T chromosome results from cross-hybridization to the highly expressed abnormal transcripts across the homologous ψζ gene. (C) Estimates of the differences in RNA expression between normal and abnormal chromosomes, based on independent quantitative PCR (QPCR), are shown below (on a logarithmic scale). (D) Representation of how one or more of the conserved regulatory elements (contained within the region spanned by the horizontal black bar) normally interact with the α-globin promoters [αα] and how they are proposed to interact less effectively (dashed lines) in the abnormal (αα)T chromosome. The direction of transcription of the globin genes and the new promoter, created by the C allele of SNP 195, are indicated by the arrows.

We have studied 148 individuals from Melanesia with α thalassemia, including 5 with HbH disease, in whom none of the previously described molecular defects could be found. The pattern of inheritance suggested that individuals with HbH disease are homozygotes for a codominant defect, referred to here as (αα)T, causing α thalassemia with a predicted genotype of (αα)T/(αα)T (table S1). To determine which process in gene expression had been affected, we analyzed a Melanesian individual (patient L, table S1) with a well-defined phenotype of HbH disease. In situ RNA hybridization to detect primary transcripts in erythroid cells from patient L detected substantially fewer nuclear transcripts from the α-globin genes than from the β-globin genes (Fig. 2), which is consistent with a mutation reducing α-globin RNA transcription.

Fig. 2.

In situ RNA analysis demonstrating reduced primary α-globin transcripts in patient L. Nascent α-globin (red) and β-globin (green) transcripts in intermediate erythroblasts from a normal control and from patient L [with the (αα)T/(αα)T genotype] are shown. (Left) Representative nuclei show β-globin transcripts in both patient and control, but α-globin transcripts are present only in the normal control. (Right) The proportion of nuclei containing none, one, or two signals were recorded from the analysis of 100 cells.

DNA fluorescence in situ hybridization studies in two affected individuals showed that the α-globin cluster was present at its normal location at the tip of chromosome 16. Extensive analysis of the α-globin cluster and the surrounding 300 kb revealed no evidence for any deletions or chromosomal rearrangements in the patients with α thalassemia. Where tested, the pattern of DNA methylation appeared normal. Sequence analysis of the major (α2 and α1) and minor (αD and θ) α-like genes and their regulatory elements revealed only the wild-type sequences or known neutral single-nucleotide polymorphisms (SNPs).

Having excluded all currently known α thalassemia mutations, we reasoned that the Melanesian form was either due to a cis-acting mutation in a previously unrecognized regulatory element or resulted from a gain-of-function mutation that negatively regulates α-globin expression. Alternatively, it was possible that α thalassemia in these individuals was due to a trans-acting mutation. By analyzing linkage to a variable number of tandem repeats (VNTR) (6) located ∼8.5 kb from the α-globin genes (Fig. 1), we found that all individuals with the (αα)T mutation shared a common VNTR allele (fig. S1), demonstrating that this is a cis-linked defect. Further association studies, using known SNPs, showed that the (αα)T haplotype extends from the 16p telomere, with loss of association immediately downstream of the α-globin cluster (coordinate 168,467 in Fig. 1) defining the centromeric border of the region containing the cis-acting mutation. We estimated that the frequency of the (αα)T defect in the island population is ∼0.04 (fig. S1).
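To give a feel for what an allele frequency of ∼0.04 implies, the following minimal sketch computes the expected genotype proportions. It assumes Hardy-Weinberg equilibrium (random mating, no selection), which is an illustrative assumption made here and not a claim of the study; the function name `hardy_weinberg` and the ASCII genotype labels are likewise hypothetical.

```python
def hardy_weinberg(q: float) -> dict:
    """Expected genotype proportions for a biallelic locus.

    q is the frequency of the variant allele, here the (aa)T haplotype.
    Hardy-Weinberg equilibrium is an illustrative assumption, not a
    result reported in the text.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    p = 1.0 - q  # frequency of the normal haplotype
    return {
        "aa/aa": p * p,            # unaffected
        "aa/(aa)T": 2.0 * p * q,   # heterozygous carriers (thalassemia trait)
        "(aa)T/(aa)T": q * q,      # homozygotes (HbH disease)
    }


proportions = hardy_weinberg(0.04)
for genotype, fraction in proportions.items():
    print(f"{genotype}: {fraction:.4f}")
```

Under these assumptions roughly 8% of the population would carry one copy of the defect and about 0.16% would be homozygous, which is in keeping with HbH disease being much rarer than α thalassemia trait in the cohort.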

We therefore resequenced the (αα)T haplotype by isolating bacterial artificial chromosomes (BACs) from a library constructed from the peripheral blood DNA of patient L with the Melanesian type of HbH disease [(αα)T/(αα)T]. BACs spanning the α-globin cluster and the surrounding ∼213 kb of DNA (coordinates 21,059 to 234,236) were sequenced (DQ431198), and we identified 283 SNPs and/or sequence differences (Fig. 1) by comparison with the current wild-type sequence (National Center for Biotechnology Information database build 35, coordinates 1 to 223478), consistent with estimates of the frequency of SNPs throughout the genome (7). This now presented a situation analogous to a common, largely unsolved problem in human genetics: how to identify a functionally important single nucleotide change from all other SNPs within a relatively large (∼213 kb) genomic interval (8, 9).
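The density of sequence differences implied by these figures can be checked with simple arithmetic. The sketch below uses only the coordinates and count quoted above, and assumes for illustration that the 283 differences are spread across the full sequenced interval; the variable names are hypothetical.

```python
# Coordinates of the sequenced interval and the number of differences,
# taken from the text. Treating the coordinates as inclusive and the
# differences as spread over the whole interval are assumptions.
start, end = 21_059, 234_236
n_differences = 283

interval_bp = end - start + 1
bp_per_difference = interval_bp / n_differences
differences_per_kb = n_differences / (interval_bp / 1000)

print(f"interval length: {interval_bp} bp")
print(f"one difference per {bp_per_difference:.0f} bp")
print(f"{differences_per_kb:.2f} differences per kb")
```

This works out to roughly one difference every 750 bp, a figure the reader can set against the genome-wide estimates cited in (7).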

To search for functional changes associated with these SNPs, we constructed a tiled array representing all regions of nonrepetitive DNA throughout the terminal 223.5 kb of chromosome 16. RNA expression profiles obtained with the use of complementary DNA from normal (αα/αα) or mutant [(αα)T/(αα)T] erythroblasts were compared. Two prominent differences were observed in the mutant erythroblasts (Fig. 1). First, a major new peak of RNA transcription (beyond the quantitative range of the array) from the same DNA strand as α-globin (fig. S2) was observed between coordinates 149,682 and 153,390 (Fig. 1, A and B). Quantitative reverse transcription polymerase chain reaction (RT-PCR) showed that expression from this region was >1000-fold higher in the mutant than in the wild-type chromosome (Fig. 1C). Second, by RT-PCR we observed an ∼80-fold decrease in expression of the αD gene immediately downstream of this peak (Fig. 1, A and B). The decreased level of α2 and α1 gene expression detected by quantitative RT-PCR (table S2) was not detected on the array, again because globin expression lies beyond the quantitative range. No other substantial differences in the pattern of RNA expression were seen across the 223.5-kb region (Fig. 1, A and B).
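Because the quantitative PCR comparison is plotted on a logarithmic scale (Fig. 1C), it can help to see the two reported fold changes as log values. The sketch below is illustrative only: it treats ">1000-fold" as exactly 1000 and "∼80-fold decrease" as exactly 1/80, and the function name `log_fold_change` is hypothetical.

```python
import math


def log_fold_change(ratio: float, base: float = 10.0) -> float:
    """Log of an expression ratio (mutant / wild type) in the given base."""
    if ratio <= 0:
        raise ValueError("expression ratio must be positive")
    return math.log(ratio) / math.log(base)


# Illustrative point values standing in for the reported bounds.
new_peak = 1000.0        # ">1000-fold" higher in the mutant
alpha_d = 1.0 / 80.0     # "~80-fold" lower in the mutant

for label, ratio in (("new transcript", new_peak), ("alpha-D", alpha_d)):
    print(f"{label}: log10 = {log_fold_change(ratio):+.2f}, "
          f"log2 = {log_fold_change(ratio, 2.0):+.2f}")
```

On a log10 axis the new transcript sits about three units above the wild type and αD about two units below it, so the two effects span roughly five orders of magnitude between them.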

The region underlying this new peak of expression is unremarkable, containing 3.7 kb of poorly conserved, predominantly noncoding sequence, although the tail of the peak extends into the ψζ-globin gene. This region contains 17 SNPs, 10 of which have been previously characterized in nonthalassemic individuals. We therefore analyzed the segregation of the remaining seven SNPs and, as controls, six additional SNPs from nonrepetitive regions of the α cluster (Fig. 1), within affected families. In addition, we performed genetic linkage studies in 15 nonthalassemic Melanesian individuals (αα/αα), 22 with α thalassemia trait [αα/(αα)T], and 5 with HbH disease [(αα)T/(αα)T]. Six of the seven SNPs underlying the new peak of transcription were found on both the normal αα and abnormal (αα)T chromosomes. Only the C allele of SNP 195 (C or T, located at coordinate 149709) segregated with thalassemia in the affected families and showed complete association with the (αα)T haplotype (table S2). This allele was not found in a separate analysis of 131 nonthalassemic Melanesian individuals. SNP 195 changes the sequence 5′-TAATAA-3′ (T allele) to 5′-TGATAA-3′ (C allele), potentially creating a new binding site for the key erythroid transcription factor GATA-1. Conventional in vitro electromobility gel shift assays and supershifts, using an antibody to GATA-1, demonstrated that this SNP creates a potential GATA-1 binding site (fig. S3). A chromatin immunoprecipitation (ChIP) profile using quantitative real-time PCR across the α-globin cluster (coordinates 53195 to 185030) showed that in addition to binding the known regulatory elements, GATA-1 also binds at the C allele of SNP 195 in vivo (Fig. 3). The C allele also nucleates the binding of a pentameric erythroid complex including the transcription factors SCL, E2A, LMO2, and Ldb-1 (Fig. 3), which are frequently found with GATA-1 at erythroid regulatory elements (10, 11). ChIP profiles using antibodies that recognize modified histones [H4Ac, H3Ac, and H3K4me2 (Fig. 3 and fig. S4)] demonstrated that binding of GATA-1 at the C allele is associated with a new peak of active chromatin in the α-globin cluster. Finally, we showed that the C allele, unlike the T allele, binds RNA polymerase II (Fig. 3).
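The sequence change at SNP 195 can be illustrated with a short motif scan. The sketch below is a minimal illustration, not the analysis used in the study: it assumes the commonly cited GATA-1 consensus WGATAR (W = A or T, R = A or G), and the function names are hypothetical. It also shows the reverse complement of each hexamer, since the alleles are named C and T while the sequence as written changes A to G; reading the hexamer as the complementary strand is an inference made here.

```python
import re

# Assumed consensus for GATA-1 binding: WGATAR (W = A/T, R = A/G).
# The lookahead makes the scan report overlapping matches as well.
GATA_CONSENSUS = re.compile(r"(?=([AT]GATA[AG]))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def gata_sites(seq: str) -> list:
    """Find WGATAR matches on both strands.

    Returns (strand, start, motif) tuples; for the "-" strand the start
    index refers to the reverse-complemented sequence.
    """
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for match in GATA_CONSENSUS.finditer(s):
            hits.append((strand, match.start(), match.group(1)))
    return hits


# Hexamers quoted in the text for the two alleles of SNP 195.
alleles = {"T allele": "TAATAA", "C allele": "TGATAA"}
for name, hexamer in alleles.items():
    print(name, hexamer, "revcomp:", reverse_complement(hexamer),
          "GATA sites:", gata_sites(hexamer))
```

Only the C-allele hexamer matches the assumed consensus, and its reverse complement (TTATCA versus TTATTA for the T allele) differs by the C/T substitution that gives the alleles their names. A sequence match of this kind indicates only a candidate site; the gel shift and ChIP experiments described above are what establish binding.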

Fig. 3.

Chromatin immunoprecipitation demonstrating the acquisition of a new transcription factor binding site (arrowed). The new binding site is located at coordinate 149709. Names of transcription factors and chromatin modifications are shown at left. Chromatin immunoprecipitation was performed as previously described (10) using primers and antibodies described in the supporting online material (17). The degree of enrichment in a normal individual (black columns) and in an individual with the (αα)T/(αα)T genotype (white columns) is shown on the y axis, and coordinates of the regions sampled by QPCR are shown on the x axis. Asterisks indicate where insufficient primary cells were available for analysis.

Expression of the α-globin genes normally occurs late in erythropoiesis after what appears to be a well-defined order of transcription factor binding to the upstream regulatory elements (MCS-R1 to MCS-R4), followed by recruitment of the pre-initiation complex and RNA polymerase II. These events are thought to result in the formation of a DNA/protein complex including one or more of the regulatory elements and the α-globin promoter(s) (10). We and others have shown that the insertion of active heterologous promoters (such as PGK Neo) in some regions of the α-globin cluster can disrupt α-globin expression, probably as a result of preferential interaction of the heterologous promoter with the upstream elements, out-competing the endogenous α-globin promoters (12–14). SNP 195 creates a new promoterlike element between the upstream regulatory elements and their cognate promoters. This element, when activated, causes significant down-regulation of the αD, α2, and α1 genes that lie downstream (Fig. 1D), thereby causing α thalassemia.

These findings not only demonstrate an additional mechanism causing human genetic disease but also illustrate two important points when searching for SNPs that may influence gene expression (9). First, to distinguish functional from nonfunctional SNPs, it has been suggested that searches should be concentrated in areas of the genome likely to contain cis-regulatory elements (8) (such as multispecies conserved elements). The gain-of-function regulatory SNP (rSNP) identified here, located in a region of the α-globin cluster that we know may be deleted with no discernible effect on α-globin expression (Fig. 1) (15), demonstrates that SNPs in such areas should not be dismissed as of no potential importance. Second, the use of densely tiled arrays for analysis of transcription and ChIP profiles provides a rapid and efficient in vivo strategy to distinguish nonfunctional from functional rSNPs that may underlie the altered patterns of expression responsible for a wide range of human genetic diseases.

Supporting Online Material

Materials and Methods

Figs. S1 to S4

Tables S1 and S2


References and Notes
