Sequencing of 50 Human Exomes Reveals Adaptation to High Altitude


Science 02 Jul 2010:
Vol. 329, Issue 5987, pp. 75-78
DOI: 10.1126/science.1190371


Residents of the Tibetan Plateau show heritable adaptations to extreme altitude. We sequenced 50 exomes of ethnic Tibetans, encompassing coding sequences of 92% of human genes, with an average coverage of 18× per individual. Genes showing population-specific allele frequency changes, which represent strong candidates for altitude adaptation, were identified. The strongest signal of natural selection came from endothelial Per-Arnt-Sim (PAS) domain protein 1 (EPAS1), a transcription factor involved in response to hypoxia. One single-nucleotide polymorphism (SNP) at EPAS1 shows a 78% frequency difference between Tibetan and Han samples, representing the fastest allele frequency change observed at any human gene to date. This SNP’s association with erythrocyte abundance supports the role of EPAS1 in adaptation to hypoxia. Thus, a population genomic survey has revealed a functionally important locus in genetic adaptation to high altitude.

The expansion of humans into a vast range of environments may have involved both cultural and genetic adaptation. Among the most severe environmental challenges to confront human populations is the low oxygen availability of high-altitude regions such as the Tibetan Plateau. Many residents of this region live at elevations exceeding 4000 m, experiencing oxygen concentrations that are about 40% lower than those at sea level. Ethnic Tibetans possess heritable adaptations to their hypoxic environment, as indicated by birth weight (1), hemoglobin levels (2), and oxygen saturation of blood in infants (3) and adults after exercise (4). These results imply a history of natural selection for altitude adaptation, which may be detectable from a scan of genetic diversity across the genome.

We sequenced the exomes of 50 unrelated individuals from two villages in the Tibet Autonomous Region of China, both at least 4300 m in altitude (5). Exonic sequences were enriched with the NimbleGen (Madison, WI) 2.1M exon capture array (6), targeting 34 Mb of sequence from exons and flanking regions in nearly 20,000 genes. Sequencing was performed with the Illumina (San Diego, CA) Genome Analyzer II platform, and reads were aligned by using SOAP (7) to the reference human genome [National Center for Biotechnology Information (NCBI) Build 36.3].

Exomes were sequenced to a mean depth of 18× (table S1), which does not guarantee confident inference of individual genotypes. Therefore, we statistically estimated the probability of each possible genotype with a Bayesian algorithm (5) that also estimated single-nucleotide polymorphism (SNP) probabilities and population allele frequencies for each site. A total of 151,825 SNPs were inferred to have >50% probability of being variable within the Tibetan sample, and 101,668 had >99% SNP probability (table S2). Sanger sequencing validated 53 of 56 SNPs that had at least 95% SNP probability and minor allele frequencies between 3% and 50%. Allele frequency estimates showed an excess of low-frequency variants (fig. S1), particularly for nonsynonymous SNPs.
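The Bayesian genotype-calling algorithm is specified in the supporting material (5) and is not reproduced here. As a minimal illustration of why genotype uncertainty at ~18× is better carried as probabilities than as hard calls, the sketch below computes posterior genotype probabilities at one site. It assumes a deliberately simple model (independent reads, a symmetric per-base error rate, and a Hardy-Weinberg prior at a given population allele frequency); the function name and the default error rate are hypothetical choices, not the authors' parameters.

```python
from math import comb


def genotype_posteriors(n_ref, n_alt, alt_freq, error=0.01):
    """Posterior P(genotype | reads) for genotypes carrying 0, 1 or 2 alt alleles.

    Simplified model (assumption, not the paper's algorithm):
    - each read independently shows the alt base with a genotype-specific
      probability, using a symmetric per-base error rate;
    - the prior is Hardy-Weinberg at the population alt-allele frequency.
    """
    depth = n_ref + n_alt
    # Probability that a single read shows the alt allele, per genotype.
    p_alt_read = {0: error, 1: 0.5, 2: 1.0 - error}
    q = alt_freq
    prior = {0: (1 - q) ** 2, 1: 2 * q * (1 - q), 2: q ** 2}
    joint = {}
    for g in (0, 1, 2):
        p = p_alt_read[g]
        # Binomial likelihood of observing n_alt alt reads out of depth.
        likelihood = comb(depth, n_alt) * p ** n_alt * (1 - p) ** n_ref
        joint[g] = likelihood * prior[g]
    total = sum(joint.values())
    return {g: v / total for g, v in joint.items()}
```

At 18 reads split 9/9 the heterozygote dominates, while 18 reference reads make the homozygous-reference genotype nearly certain; at the lower depths that occur across an exome, the prior matters much more, which is the motivation for estimating genotypes and allele frequencies jointly.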

The exome data were compared with 40 genomes from ethnic Han individuals from Beijing [the HapMap CHB sample, part of the 1000 Genomes Project], sequenced to about fourfold coverage per individual. Beijing’s altitude is less than 50 m above sea level, and nearly all Han come from altitudes below 2000 m. The Han sample represents an appropriate comparison for the Tibetan sample on the basis of low genetic differentiation between these samples (FST = 0.026). The two Tibetan villages show minimal evidence of genetic structure (FST = 0.014), and we therefore treated them as one population for most analyses. We observed a strong covariance between Han and Tibetan allele frequencies (Fig. 1) but with an excess of SNPs at low frequency in the Han and moderate frequency in the Tibetans.
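The text does not state which FST estimator produced these values. As one common choice, the sketch below implements a Hudson-type estimator combined as a ratio of averages across SNPs, with a small-sample correction in the numerator. It assumes point estimates of allele frequencies and known numbers of sampled chromosomes, whereas the frequencies in this study were estimated probabilistically; treat it as an illustration of the quantity, not a reproduction of the reported numbers.

```python
def hudson_fst(freqs1, freqs2, n1, n2):
    """Ratio-of-averages Hudson-type FST across SNPs.

    freqs1, freqs2: sample frequencies of the same allele at each SNP.
    n1, n2: number of sampled chromosomes (2 x individuals) per population.
    """
    num = 0.0
    den = 0.0
    for p1, p2 in zip(freqs1, freqs2):
        # Squared frequency difference, minus expected sampling noise.
        num += ((p1 - p2) ** 2
                - p1 * (1 - p1) / (n1 - 1)
                - p2 * (1 - p2) / (n2 - 1))
        # Probability that two alleles drawn one from each population differ.
        den += p1 * (1 - p2) + p2 * (1 - p1)
    return num / den
```

For example, `hudson_fst([0.87], [0.09], 100, 80)` treats the EPAS1 frequencies quoted later in the text as point estimates in 50 Tibetan and 40 Han individuals and gives a single-SNP value of roughly 0.75, far above the genome-wide Tibetan-Han value. Summing numerators and denominators separately, rather than averaging per-SNP ratios, keeps rare variants from dominating a multi-SNP estimate.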

Fig. 1

Two-dimensional unfolded site frequency spectrum for SNPs in Tibetan (x axis) and Han (y axis) population samples. The number of SNPs detected is color-coded according to the logarithmic scale plotted on the right. Arrows indicate a pair of intronic SNPs from the EPAS1 gene that show strongly elevated derived allele frequencies in the Tibetan sample compared with the Han sample.

Population historical models were estimated (8) from the two-dimensional frequency spectrum of synonymous sites in the two populations. The best-fitting model suggested that the Tibetan and Han populations diverged 2750 years ago, with the Han population growing from a small initial size and the Tibetan population contracting from a large initial size (fig. S2). Migration was inferred from the Tibetan to the Han sample, with recent admixture in the opposite direction.
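Both Fig. 1 and the demographic inference rest on the two-dimensional unfolded site frequency spectrum. A minimal sketch of building it, assuming each SNP has already been polarized against an ancestral allele and reduced to integer derived-allele counts per sample (the study itself worked with probabilistic frequency estimates), is a simple count matrix. The annotation labels such as `"synonymous"` are hypothetical input conventions.

```python
def joint_sfs(sites, n1, n2, category="synonymous"):
    """Unfolded 2D site frequency spectrum.

    sites: iterable of (annotation, i, j), where i and j are derived-allele
    counts in sample 1 (n1 chromosomes) and sample 2 (n2 chromosomes).
    Cell [i][j] counts SNPs of the chosen annotation with those counts.
    """
    sfs = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for annotation, i, j in sites:
        if annotation != category:
            continue  # e.g. keep only synonymous sites for demographic fitting
        if not (0 <= i <= n1 and 0 <= j <= n2):
            raise ValueError(f"derived counts out of range: {(i, j)}")
        sfs[i][j] += 1
    return sfs
```

With 50 Tibetans and 40 Han, the matrix would be 101 × 81. Restricting to synonymous sites is the usual way to approximate neutrality, so that the fitted divergence time and population-size changes reflect demography rather than selection.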

Genes with strong frequency differences between populations are potential targets of natural selection. However, a simple ranking of FST values would not reveal which population was affected by selection. Therefore, we estimated population-specific allele frequency change by including a third, more distantly related population. We thus examined exome sequences from 200 Danish individuals, collected and analyzed as described for the Tibetan sample. By comparing the three pairwise FST values between these three samples, we can estimate the frequency change that occurred in the Tibetan population since its divergence from the Han population (5, 9). We found that this population branch statistic (PBS) has strong power to detect recent natural selection (fig. S3).
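The authoritative definition of PBS is in the supporting material (5, 9). The commonly cited form, reproduced here as our own reconstruction, first converts each pairwise FST into a divergence time T = −log(1 − FST) and then solves for the length of the Tibetan branch of the three-population tree: PBS = (T_TH + T_TD − T_HD)/2. A minimal sketch follows; the function names are hypothetical.

```python
import math


def branch_length(fst):
    """Convert a pairwise FST into a divergence time, T = -log(1 - FST)."""
    if not fst < 1.0:
        raise ValueError("FST must be < 1 for a finite branch length")
    return -math.log(1.0 - fst)


def pbs(fst_th, fst_td, fst_hd):
    """Population branch statistic for the Tibetan (T) branch, given pairwise
    FST between Tibetan-Han (TH), Tibetan-Danish (TD) and Han-Danish (HD)."""
    t_th = branch_length(fst_th)
    t_td = branch_length(fst_td)
    t_hd = branch_length(fst_hd)
    # Tree arithmetic: T_TH = b_T + b_H, T_TD = b_T + b_D, T_HD = b_H + b_D,
    # so b_T = (T_TH + T_TD - T_HD) / 2.
    return (t_th + t_td - t_hd) / 2.0
```

The comment spells out why three samples are needed: with only Tibetan and Han, T_TH = b_T + b_H cannot be split between the two branches, whereas the Danish outgroup supplies the two extra equations that isolate the Tibetan-specific change.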

Genes showing extreme Tibetan PBS values represent strong candidates for the genetic basis of altitude adaptation. The strongest such signals include several genes with known roles in oxygen transport and regulation (Table 1 and table S3). Overall, the 34 genes in our data set that fell under the gene ontology category “response to hypoxia” had significantly greater PBS values than the genome-wide average (P = 0.00796).
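The text does not say which test produced P = 0.00796 for the hypoxia gene set. One straightforward way to ask whether a gene set has unusually high PBS values is a one-sided permutation test against random gene sets of the same size; the sketch below shows that approach under the assumption that PBS values are available per gene. It is an illustration of the idea, not the authors' procedure, and the function name and defaults are hypothetical.

```python
import random


def gene_set_permutation_p(pbs_by_gene, gene_set, n_perm=10_000, seed=0):
    """One-sided permutation P value that genes in gene_set have a higher
    mean PBS than random sets of the same size drawn from all genes."""
    rng = random.Random(seed)
    genes = list(pbs_by_gene)
    in_set = [g for g in genes if g in gene_set]
    if not in_set:
        raise ValueError("no genes from gene_set are present in pbs_by_gene")
    k = len(in_set)
    observed = sum(pbs_by_gene[g] for g in in_set) / k
    values = [pbs_by_gene[g] for g in genes]
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(values, k)  # random gene set, no replacement
        if sum(sample) / k >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

Permuting gene labels keeps the genome-wide PBS distribution intact, but it ignores differences in gene length and SNP count, which affect PBS variance (Fig. 2A); a more careful version would draw random sets matched on the number of variable sites.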

Table 1

Genes with strongest frequency changes in the Tibetan population. The top 30 PBS values for the Tibetan branch are listed. Oxygen-related candidate genes within 100 kb of these loci are noted. For FXYD, F indicates Phe; Y, Tyr; D, Asp; and X, any amino acid.


The strongest signal of selection came from the endothelial Per-Arnt-Sim (PAS) domain protein 1 (EPAS1) gene. On the basis of frequency differences among the Danes, Han, and Tibetans, EPAS1 was inferred to have a very long Tibetan branch relative to other genes in the genome (Fig. 2). In order to confirm the action of natural selection, PBS values were compared against neutral simulations under our estimated demographic model. None of one million simulations surpassed the PBS value observed for EPAS1, and this result remained statistically significant after accounting for the number of genes tested (P < 0.02 after Bonferroni correction). Many other genes had uncorrected P values below 0.005 (Table 1), and, although none of these were statistically significant after correcting for multiple tests, the functional enrichment suggests that some of these genes may also contribute to altitude adaptation.
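The significance statement combines an empirical P value from neutral simulations with a Bonferroni correction. A minimal sketch of that arithmetic follows, using the standard add-one estimator for the empirical P value (an assumption; the text only reports that none of one million simulations exceeded the observed PBS).

```python
def empirical_p(observed, simulated):
    """Empirical one-sided P value with the add-one correction:
    (number of simulations >= observed + 1) / (number of simulations + 1)."""
    exceed = sum(1 for s in simulated if s >= observed)
    return (exceed + 1) / (len(simulated) + 1)


def bonferroni(p, n_tests):
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p * n_tests)
```

With zero exceedances in 1,000,000 simulations the empirical P value is 1/1,000,001, just under 10⁻⁶. Assuming on the order of 20,000 genes were tested (the capture targeted nearly 20,000 genes), `bonferroni(1 / 1_000_001, 20_000)` is just under 0.02, which matches the reported bound. Because no simulation exceeded the observed value, the true P value may be much smaller; more simulations would only tighten the bound.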

Fig. 2

Population-specific allele frequency change. (A) The distribution of FST-based PBS statistics for the Tibetan branches, according to the number of variable sites in each gene. Outlier genes are indicated in red. (B) The signal of selection on EPAS1: Genomic average FST-based branch lengths for Tibetan (T), Han (H), and Danish (D) branches (left) and branch lengths for EPAS1, indicating substantial differentiation along the Tibetan lineage (right).

EPAS1 is also known as hypoxia-inducible factor 2α (HIF-2α). HIF transcription factors consist of two subunits: one of three alternative α subunits (HIF-1α, HIF-2α/EPAS1, HIF-3α) dimerizes with a β subunit encoded by ARNT or ARNT2. HIF-1α and EPAS1 each act on a unique set of regulatory targets (10), and the narrower expression profile of EPAS1 includes adult and fetal lung, placenta, and vascular endothelial cells (11). A protein-stabilizing mutation in EPAS1 is associated with erythrocytosis (12), suggesting a link between EPAS1 and the regulation of red blood cell production.

Although our sequencing primarily targeted exons, some flanking intronic and untranslated region (UTR) sequence was included. The EPAS1 SNP with the greatest Tibetan-Han frequency difference was intronic (with a derived allele at 9% frequency in the Han and 87% in the Tibetan sample; table S4), whereas no amino acid–changing variant had a population frequency difference of greater than 6%. Selection may have acted directly on this variant, or another linked noncoding variant, to influence the regulation of EPAS1. Detailed molecular studies will be needed to investigate the direction and the magnitude of gene expression changes associated with this SNP, the tissues and developmental time points affected, and the downstream target genes that show altered regulation.

Associations between SNPs at EPAS1 and athletic performance have been demonstrated (13). Our data set contains a different set of SNPs, and we conducted association testing on the SNP with the most extreme frequency difference, located just upstream of the sixth exon. When tested for association with blood-related phenotypes, alleles at this SNP showed no relationship with oxygen saturation. However, significant associations were discovered for erythrocyte count (F test P = 0.00141) and for hemoglobin concentration (F test P = 0.00131), with significant or marginally significant P values for both traits when each village was tested separately (table S5). Comparison of the EPAS1 SNP to genotype data from 48 unlinked SNPs confirmed that its P value is a strong outlier (5) (fig. S4).
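The exact association model (covariates, genotype coding) is described in the supporting material rather than here. One simple form of an F test for a single SNP is a one-way ANOVA of the trait across genotype classes; the sketch below computes that F statistic and its degrees of freedom in plain Python, under the assumption that trait values are already grouped by genotype. The function name is hypothetical.

```python
def anova_f(groups):
    """One-way ANOVA F statistic with (between, within) degrees of freedom.

    groups: list of lists of trait values, one list per genotype class
    (empty classes, e.g. an unobserved homozygote, are dropped).
    """
    groups = [g for g in groups if g]
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of class means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individuals around their own class mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```

The P value is the upper tail of the F distribution with these degrees of freedom (for example via `scipy.stats.f.sf(f, df_between, df_within)` if SciPy is available). Testing each village separately, as in table S5, amounts to calling the same function on each village's data.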

The allele at high frequency in the Tibetan sample was associated with lower erythrocyte quantities and correspondingly lower hemoglobin levels (table S4). Because elevated erythrocyte production is a common response to hypoxic stress, it may be that carriers of the “Tibetan” allele of EPAS1 are able to maintain sufficient oxygenation of tissues at high altitude without the need for increased erythrocyte levels. Thus, the hematological differences observed here may not represent the phenotypic target of selection and could instead reflect a side effect of EPAS1-mediated adaptation to hypoxic conditions. Although the precise physiological mechanism remains to be discovered, our results suggest that the allele targeted by selection is likely to confer a functionally relevant adaptation to the hypoxic environment of high altitude.

We also identified components of adult and fetal hemoglobin (HBB and HBG2, respectively) as putatively under selection. These genes are located only ~20 kb apart (fig. S5), so their PBS values could reflect a single adaptive event. For both genes, the SNP with the strongest Tibetan-Han frequency difference is intronic. Although altered globin proteins have been found in some altitude-adapted species (14), in this case regulatory changes appear more likely. A parallel result was reported in Andean highlanders, with promoter variants at HBG2 varying with altitude and associated with a delayed transition from fetal to adult hemoglobin (15).

Aside from HBB, two other anemia-associated genes were identified: FANCA and PKLR, associated with erythrocyte production and maintenance, respectively (16, 17). We also identified genes associated with diseases linked to low oxygen during pregnancy or birth: schizophrenia (DISC1 and FXYD6) (18, 19) and epilepsy (OTX1) (20). However, the strong signal of selection affecting DISC1, along with C1orf124, might instead trace to a regulatory region of EGLN1, which lies between these loci (fig. S5) and functions in the hypoxia response pathway (21).

Other genes identified in this study are also located near candidate genes. OR10X1 and OR6Y1 are within ~60 kb of the SPTA1 gene (fig. S5), which is associated with erythrocyte shape (22). Additionally, the three histones implicated in this study (Table 1) are clustered around HFE (fig. S5), a gene involved in iron storage (23). That signals of selection extend to neighboring genes is consistent with recent and strong selection imposed by the hypoxic environment. Stronger frequency changes at flanking genes might be expected if adaptive mutations have targeted candidate gene regulatory regions that are not near common exonic polymorphisms.
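Table 1 flags oxygen-related candidate genes within 100 kb of each outlier locus, and the examples above (SPTA1, HFE, EGLN1) rely on the same proximity reasoning. A minimal sketch of that lookup measures the gap between two genomic intervals on the same chromosome. The gene names and coordinates in the usage check are hypothetical placeholders, not real annotations, and the sketch assumes both intervals use the same coordinate convention.

```python
def genes_within(locus, candidates, window=100_000):
    """Names of candidate genes lying within `window` bp of a locus.

    locus: (chrom, start, end).
    candidates: dict mapping gene name -> (chrom, start, end).
    """
    chrom, start, end = locus
    hits = []
    for name, (c, s, e) in candidates.items():
        if c != chrom:
            continue
        # Distance between the two intervals; 0 if they overlap.
        gap = max(0, s - end, start - e)
        if gap <= window:
            hits.append(name)
    return sorted(hits)
```

Measuring the gap between interval edges, rather than between gene midpoints, keeps long genes from being missed when only their far end lies close to the outlier locus.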

Of the genes identified here, only EGLN1 was mentioned in a recent SNP variation study in Andean highlanders (24). This result is consistent with the physiological differences observed between Tibetan and Andean populations (25), suggesting that these populations have taken largely distinct evolutionary paths in altitude adaptation.

Several loci previously studied in Himalayan populations showed no signs of selection in our data set (table S6), whereas EPAS1 has not been a focus of previous altitude research. Although EPAS1 may play an important role in the oxygen regulation pathway, this gene was identified on the basis of a noncandidate population genomic survey for natural selection, illustrating the utility of evolutionary inference in revealing functionally important loci.

Given our estimate that Han and Tibetans diverged 2750 years ago and experienced subsequent migration, it appears that our focal SNP at EPAS1 may have experienced a faster rate of frequency change than even the lactase persistence allele in northern Europe, which rose in frequency over the course of about 7500 years (26). EPAS1 may therefore represent the strongest instance of natural selection documented in a human population, and variation at this gene appears to have had important consequences for human survival and/or reproduction in the Tibetan region.

Supporting Online Material

Materials and Methods

Figs. S1 to S5

Tables S1 to S6


References and Notes

  5. Materials and methods are available as supporting material on Science Online.
  This research was funded by the National Natural Science Foundation of China (grants 30890032 and 30725008), the Ministry of Science and Technology of China (863 program, grants 2006AA02A302 and 2009AA022707; 973 program, grant 2006CB504103), the Shenzhen Municipal Government of China (grants JC200903190772A, CXB200903110066A, ZYC200903240077A, ZYC200903240076A, and ZYC200903240080A), the Ole Rømer grant from the Danish Natural Science Research Council, the Solexa project (272-07-0196), the Danish Strategic Research Council grant (2106-07-0021), the Lundbeck Foundation, the Swiss National Science Foundation (PBLAP3-124318), the U.S. NIH (R01MHG084695 and R01HG003229), the U.S. NSF (DBI-0906065), the Chinese Academy of Sciences (KSCX2-YW-R-76), and the Science and Technology Plan of the Tibet Autonomous Region (no. 2007-2-18). We are also indebted to many additional faculty and staff of BGI-Shenzhen who contributed to this teamwork and to X. Wang (South China University of Technology). The data have NCBI Short Read Archive accession no. SRA012603.
