Technical Comments

Comment on "Ongoing Adaptive Evolution of ASPM, a Brain Size Determinant in Homo sapiens"


Science  20 Apr 2007:
Vol. 316, Issue 5823, p. 370
DOI: 10.1126/science.1137568

Abstract

Mekel-Bobrov et al. (Reports, 9 September 2005, p. 1720) suggested that ASPM, a gene associated with microcephaly, underwent natural selection within the last 500 to 14,100 years. Their analyses based on comparison with computer simulations indicated that ASPM had an unusual pattern of variation. However, when we compare ASPM empirically to a large number of other loci, its variation is not unusual and does not support selection.

Mekel-Bobrov et al. (1) presented evidence that the ASPM (abnormal spindle-like microcephaly associated) gene has been subject to positive natural selection in European populations in the past ∼6000 years. The authors noted a haplotype of ∼40% frequency, which they argued had arisen too recently to be explained by genetic drift alone. They simulated a range of demographic histories and identified none that could produce a haplotype with such a high homozygote frequency. Because the detection of selection solely by comparison with simulated data has had a mixed record (2), we decided to assess the evidence empirically.

We sequenced ∼19 kb of ASPM in 16 European Americans (CEU) and 16 West Africans (YRI). We identified single-nucleotide polymorphisms (SNPs) using automated software (3) and assayed all the SNPs we discovered in 30 CEU and 30 YRI trios from the International Haplotype Map (HapMap) project (4). This SNP discovery strategy was identical to the one used to study 2.5 megabases (Mb) for the Encyclopedia of DNA Elements (ENCODE) project (4), with the same sequencing technique, SNP-identification software, and genotyping protocol. Thus, we could use ENCODE as a near-perfect empirical comparison data set. We could also integrate the data with HapMap to determine whether the pattern of long-range variation around ASPM was unusual.

To test for selection, we first carried out standard tests of the allele frequency spectrum (5–7), comparing with the ENCODE data to determine statistical significance. Comparison regions were matched in genetic distance and number of segregating sites, using a range of possible recombination rates. No test showed significant evidence for selection (Table 1 and table S1). The test that gave the strongest signal across ASPM as a whole was Tajima's D, with a nominal significance of P = 0.07 when we used the recombination rate of 1.9 cM/Mb from (1). The signal was weaker still (P = 0.18 to 0.22) when we reestimated recombination rates based on more recent data sets (table S2). We also calculated the test statistics individually for each of the three regions within ASPM (table S1). After correction for multiple hypothesis testing, again no test was statistically significant.
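As an illustration of the kind of frequency-spectrum statistic discussed above, the following is a minimal sketch of Tajima's D using the standard published formula. It is not the authors' pipeline: the function name `tajimas_d` and the 0/1 haplotype encoding are assumptions made for this example, and the sketch does not perform the matching against ENCODE regions.

```python
import math

def tajimas_d(haplotypes):
    """Tajima's D for equal-length haplotypes coded 0 (ancestral) / 1 (derived).

    Hypothetical helper for illustration; returns None when D is undefined.
    """
    n = len(haplotypes)
    if n < 4:
        return None  # the variance term vanishes for n < 4

    # Derived-allele count per site; keep only segregating sites.
    counts = [sum(site) for site in zip(*haplotypes)]
    seg = [k for k in counts if 0 < k < n]
    s = len(seg)
    if s == 0:
        return None

    # Mean number of pairwise differences (pi).
    pi = sum(2 * k * (n - k) for k in seg) / (n * (n - 1))

    # Constants from Tajima's formula.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * s + e2 * s * (s - 1)
    if variance <= 0:
        return None
    # Difference between pi and Watterson's estimator, scaled by its s.d.
    return (pi - s / a1) / math.sqrt(variance)


# Toy data (not from the study): an excess of rare variants gives D < 0,
# an excess of intermediate-frequency variants gives D > 0.
singletons = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
balanced = [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
print(tajimas_d(singletons), tajimas_d(balanced))
```

A strongly negative D reflects an excess of rare alleles, which is the pattern a recent selective sweep is expected to leave. In the comparison above, the value at ASPM is judged against the distribution of the same statistic in matched ENCODE windows rather than against a theoretical null.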

Table 1.

Empirical tests for selection based on comparing ∼19 kb of ASPM with ∼2.5 Mb of the ENCODE regions and testing for unusual skews in SNP allele frequencies. For each summary statistic, we report σ, defined as the number of standard deviations from the empirical mean in the ENCODE regions, and an empirical P value. To provide a single assessment of statistical significance corrected for having carried out six tests, we recorded the minimum P value for Tajima's D test, the four Fu and Li's tests, and the Fay and Wu's H test (min-P) and then compared empirically with the proportion of matched ENCODE regions that had such an extreme minimum P value.

The six statistics from Tajima's D through Fay and Wu's H, together with the corrected P value derived from them, are the frequency-spectrum-based tests.

| Recombination rate used for matching (cM/Mb)* | Tajima's D (σ, P value) | Fu and Li's D (σ, P value) | Fu and Li's D* (σ, P value) | Fu and Li's F (σ, P value) | Fu and Li's F* (σ, P value) | Fay and Wu's H (σ, P value) | P value corrected for multiple hypothesis testing | Test for an FST value as extreme as the largest in the region (P value) | Test for excess of homozygotes (P value) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Observed value at ASPM | -0.26 | 0.90 | 0.53 | 0.55 | 0.27 | -8.66 | | 0.41 | 7/60 |
| 0.12 | -1.35, 0.20 | -0.75, 0.30 | -1.26, 0.18 | -1.29, 0.21 | -1.61, 0.14 | -1.33, 0.26 | 0.41 | 0.31 | 0.25 |
| 0.5 | -1.35, 0.22 | -0.78, 0.30 | -1.25, 0.17 | -1.29, 0.22 | -1.60, 0.14 | -1.44, 0.20 | 0.38 | 0.37 | 0.21 |
| 1 | -1.40, 0.18 | -0.72, 0.30 | -1.25, 0.17 | -1.27, 0.22 | -1.60, 0.13 | -1.49, 0.19 | 0.31 | 0.43 | 0.16 |
| 1.9 | -1.79, 0.071 | -0.71, 0.31 | -1.22, 0.17 | -1.38, 0.20 | -1.71, 0.12 | -1.74, 0.14 | 0.21 | 0.47 | 0.12 |

* We compared ∼19 kb of ASPM with ∼2.5 Mb from the ENCODE regions, dividing the ENCODE regions into sections that were matched to the ASPM data with regard to the number of segregating sites and genetic distance span. When more sites were available in the matched ENCODE region, we averaged the statistic over 10 random subselections. We matched the genetic distance span, using information from the Oxford linkage disequilibrium-based genetic map (9). To test for robustness to errors in the recombination rate estimate for the ASPM region, we considered a range of estimates for the recombination rate, from 0.12 cM/Mb (the upper bound from table S2) to 1.9 cM/Mb, the value used by Mekel-Bobrov et al. (1). The windows used for comparison were defined by nucleating at each ENCODE SNP and then assessing whether there was a window of matched genetic distance with enough segregating sites for comparison, extending in the 3′ direction. This procedure for empirical matching induces some correlation among the windows (due to overlapping spans and linkage disequilibrium), but there is no expected bias in the P values, which are nonsignificant for all comparisons.
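The min-P correction described in the Table 1 caption can be sketched roughly as follows. This is an illustrative reading of the procedure, not the authors' code: the names `empirical_p` and `min_p_corrected` are hypothetical, the one-sided lower tail is an assumption (selection is expected to push these statistics negative), and each comparison region is scored against the full comparison set.

```python
def empirical_p(value, reference):
    """One-sided empirical P: fraction of reference values at least as low as `value`.

    The lower tail is an assumption made for this sketch.
    """
    return sum(r <= value for r in reference) / len(reference)


def min_p_corrected(target_stats, region_stats):
    """Corrected P for the smallest of several per-test empirical P values.

    target_stats: one statistic per test for the region of interest.
    region_stats: one list per comparison region, same test order.
    """
    n_tests = len(target_stats)
    # Reference distribution of each test across the comparison regions.
    columns = [[region[t] for region in region_stats] for t in range(n_tests)]

    def smallest_p(stats):
        return min(empirical_p(stats[t], columns[t]) for t in range(n_tests))

    target_min = smallest_p(target_stats)
    region_mins = [smallest_p(region) for region in region_stats]
    # Proportion of comparison regions whose minimum P is at least as extreme.
    return sum(m <= target_min for m in region_mins) / len(region_mins)


# Toy example with two tests and four comparison regions (values invented).
regions = [[-2, -1], [-1, -2], [0, 0], [1, 1]]
print(min_p_corrected([-1.5, 0.5], regions))  # 0.5
```

Because the minimum is taken over the same set of tests in every comparison region, the correction accounts for both the number of tests and the correlation among them without assuming independence.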

We next assessed whether the allele frequency differentiation between CEU and YRI at ASPM supported selection. The SNP A44871G showed an FST = 0.41 between Europeans and West Africans, putting it in the 95th percentile of ENCODE SNPs. After correcting for the number of SNPs in the region, >31% of matched ENCODE regions had at least one SNP with an FST as large, so this observation is not surprising (Table 1). Moreover, the worldwide frequency distribution seems to be in direct conflict with the suggestion that the G allele arose ∼6000 years ago. The allele exists at >50% frequency in Papua New Guinea Highlanders (1), who are thought to have diverged from Europeans ∼40,000 years ago (8).
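The logic of the region-level FST comparison can be sketched as below. The text does not state which FST estimator was used, so this example assumes a simple heterozygosity-based definition for two equally weighted populations. The function names and the toy FST values are hypothetical.

```python
def fst_two_pops(p1, p2):
    """FST = (HT - HS) / HT from allele frequencies in two populations.

    Assumes equal weighting of the two populations; the estimator used in
    the study may differ. Returns None if the site is monomorphic overall.
    """
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    p_bar = (p1 + p2) / 2
    ht = 2 * p_bar * (1 - p_bar)  # heterozygosity of the pooled population
    if ht == 0:
        return None
    return (ht - hs) / ht


def region_max_fst_p(observed_fst, regions):
    """Fraction of comparison regions containing at least one SNP with FST >= observed.

    regions: list of lists of per-SNP FST values, one list per matched region.
    """
    hits = sum(max(region) >= observed_fst for region in regions)
    return hits / len(regions)


# Toy per-SNP FST values for four matched regions (invented for illustration).
matched = [[0.10, 0.50], [0.05, 0.20], [0.45, 0.00], [0.30, 0.10]]
print(region_max_fst_p(0.41, matched))  # 0.5
```

Comparing against the maximum FST in each matched region is what corrects for the number of SNPs examined: a single SNP in the 95th percentile is unremarkable when a region contains many SNPs.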

    Next, we repeated the primary analysis of (1), testing for an excess of individuals with two identical copies of any haplotype (3) across the region. Confirming the original report, the haplotype marked by the G allele was the most common in CEU (Fig. 1A). However, this haplotype did not stand out strikingly from the rest as in (1) (compare with fig. S2). The significance of a homozygote excess depends on the regional recombination rate, because unbroken haplotypes are more surprising if the recombination rate is high. Even when applying the high recombination rate used by Mekel-Bobrov et al. (1.9 cM/Mb), the homozygote excess is not significant compared with empirical data (P = 0.12). The evidence becomes even weaker (P = 0.25) when we instead use updated and much lower recombination rate estimates for the region (table S2). The fact that there are well-supported recombination rates that decrease the strength of the signal greatly weakens the evidence for selection.

    Fig. 1.

    Linkage disequilibrium decay around A44871G in European Americans. (A) Haplotype frequency in European Americans (CEU). Blue bars are the derived haplotypes marked by the G allele. (B) Decay of extended haplotype homozygosity (EHH) around A44871G. (C) The significance of the LRH test at each marker is evaluated empirically by comparing with the genome-wide data from HapMap, matched with regard to breakdown of homozygosity. The most extreme P value of 0.03 is not striking when compared against the lowest P value seen in 1000 comparison regions, 90% of which show stronger evidence for selection at some distance. (D) The extent of the haplotype around the G allele (red dot, defined as the span for which EHH > 0.35), in comparison with alleles of matched frequency in CEU from HapMap on chromosome 1. This is well within the 95% central range of HapMap, whether plotted by physical distance (this figure) or genetic distance (fig. S3). To match the marker density of HapMap Phase I, we randomly dropped SNPs from ASPM until we had 1 SNP every 5 kb. With this lower density, the span of the G haplotype is 285 kb.

    We also assessed evidence for selection at ASPM by carrying out the long-range haplotype (LRH) test (9). This test assesses whether a haplotype is too young to have risen to its frequency without selection. The LRH test is not affected by uncertainty in recombination rate estimates. We compared LRH results for the A44871G polymorphism to SNPs of matched frequency in HapMap CEU (3, 10) (Fig. 1C). We observed at least as strong a signal for selection at 90% of the regions examined (3, 11). Several genome-wide surveys using similar methods also failed to find evidence for selection at ASPM in European-derived populations (4, 12, 13). The one survey that did find a signal near ASPM did so only in individuals of Chinese ancestry (13), failing to support the contention of (1) of recent selection in European history. Based on linkage disequilibrium (LD) breaking down within ∼100 kb on either side (Fig. 1B), we estimate that the G allele arose in European history at least tens of thousands of years ago and possibly more than 100,000 years ago (14) (table S3 and SOM Text). These dates are difficult to reconcile with selection ∼6000 years ago, as suggested in (1).
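    For readers unfamiliar with extended haplotype homozygosity (EHH), which underlies the LRH test, here is a minimal sketch using its usual definition: the probability that two randomly chosen chromosomes carrying the core allele are identical from the core out to a given marker. The function name `ehh`, the list-of-lists haplotype encoding, and the toy data are assumptions for illustration. The sketch omits the empirical comparison against frequency-matched HapMap SNPs.

```python
from collections import Counter
from math import comb

def ehh(haplotypes, core, allele, upto):
    """EHH for carriers of `allele` at index `core`, measured out to index `upto`.

    Hypothetical helper: haplotypes are equal-length sequences of alleles,
    and `core` / `upto` are marker indices. Returns None with < 2 carriers.
    """
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        return None
    lo, hi = sorted((core, upto))
    # Group carriers by their extended haplotype over the interval.
    groups = Counter(tuple(h[lo:hi + 1]) for h in carriers)
    # Fraction of carrier pairs that are identical over the interval.
    return sum(comb(c, 2) for c in groups.values()) / comb(n, 2)


# Toy haplotypes (invented): core marker at index 1, core allele 1.
haps = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
]
for marker in range(4):
    print(marker, ehh(haps, core=1, allele=1, upto=marker))
```

    EHH starts at 1 at the core and decays with distance as recombination and mutation break up the haplotype. The span over which it stays above a threshold (0.35 in Fig. 1D) gives a measure of haplotype extent that can be compared across alleles of matched frequency.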

    One explanation for the differences between our results and (1) is that we assessed significance through comparison with empirical data. Empirical comparisons are robust to difficult-to-model features of real data, such as failure to detect real polymorphisms in a sample, or to fully understand the complexity of population history. Methodologically, these results are also important, demonstrating that one should not only compare with computer simulations but also show that a region stands out empirically compared with data collected in the same way, to build a compelling case for natural selection (2, 15).

    Supporting Online Material

    www.sciencemag.org/cgi/content/full/316/5823/370b/DC1

    Materials and Methods

    Figs. S1 to S3

    Tables S1 to S3

    References

    Data Files S1 and S2

    References and Notes
