Technical Comments

Response to Comment on “Genetic Structure of Human Populations”

Science 20 Jun 2003:
Vol. 300, Issue 5627, pp. 1877c
DOI: 10.1126/science.1084688

Estimates of genetic variance components depend on the type of marker used, the definitions of geographic regions, the populations sampled within these regions, the relative sample sizes from the populations, and the way in which information is combined across loci. For microsatellite markers, estimates also depend on whether the quantity whose variance is partitioned is an allele-size variable or an indicator variable for allelic presence or absence. A main purpose of our variance component estimation was to provide insight into the fine-scale population structure analysis in (1). Because the structure algorithm uses only identity and nonidentity of alleles, descriptive statistics that employ allelic indicator variables are more appropriate for understanding the dependence of structure-based inference on the “level of difference” among groups than are statistics that use allele size.
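To make the distinction between the two kinds of variable concrete, here is a minimal sketch that partitions the variance at one locus into among- and within-population parts. It uses the standard one-level ANOVA moment estimator, once with the allele-size variable and once with allelic indicator variables. This is not the estimation procedure of (17). It assumes gene copies are independent observations, and the function names and the two toy populations are hypothetical.

```python
# A sketch: one-level variance partition for a single locus (hypothetical toy data).

def variance_shares(pops):
    """Return (within %, among %) for a list of populations.

    Each population is a list of observation vectors (tuples), one per gene copy.
    Uses the one-level ANOVA moment estimator; the among-population component
    is not truncated at zero.
    """
    def ss(obs):
        # Sum of squared deviations from the mean vector.
        dim = len(obs[0])
        mean = [sum(x[k] for x in obs) / len(obs) for k in range(dim)]
        return sum((x[k] - mean[k]) ** 2 for x in obs for k in range(dim))

    all_obs = [x for pop in pops for x in pop]
    n_total, n_pops = len(all_obs), len(pops)
    ss_within = sum(ss(pop) for pop in pops)
    ms_within = ss_within / (n_total - n_pops)
    ms_among = (ss(all_obs) - ss_within) / (n_pops - 1)
    # Coefficient for unequal population sample sizes.
    n0 = (n_total - sum(len(pop) ** 2 for pop in pops) / n_total) / (n_pops - 1)
    sigma2_within = ms_within
    sigma2_among = (ms_among - ms_within) / n0
    total = sigma2_within + sigma2_among
    return 100 * sigma2_within / total, 100 * sigma2_among / total


def size_encoding(alleles):
    """Allele-size variable: one coordinate, the repeat count."""
    return [(float(a),) for a in alleles]


def indicator_encoding(alleles, allele_list):
    """Allelic indicator variables: one 0/1 coordinate per distinct allele."""
    return [tuple(1.0 if a == b else 0.0 for b in allele_list) for a in alleles]


# Hypothetical repeat counts at one microsatellite locus in two populations.
pop_a = [10, 11, 10, 11]
pop_b = [11, 12, 11, 12]
allele_list = sorted(set(pop_a + pop_b))

within_size, _ = variance_shares([size_encoding(pop_a), size_encoding(pop_b)])
within_ind, _ = variance_shares(
    [indicator_encoding(pop_a, allele_list), indicator_encoding(pop_b, allele_list)]
)
print(f"within populations: size {within_size:.1f}%, indicators {within_ind:.1f}%")
```

For these toy data the within-population share is about 44.4% with allele sizes and about 88.9% with indicator variables. The size variable scores the 10-versus-12 contrast, which occurs only between populations, as larger than the one-repeat differences found within populations. The indicator variables score all pairs of distinct alleles alike.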

Excoffier and Hamilton (2) performed a complementary variance component analysis. They demonstrated that when the subset of our data corresponding to (3) is studied using allele sizes, as was done in (3), the resulting estimates are similar to those of (3). Their within-population variance component, smaller than that in (1), is consistent with the smaller estimate of (3) relative to microsatellite studies that used indicator variables (4–7). However, previous indicator-based studies of microsatellites and other markers have not all been in full agreement (1, 4–11), so a difference in the nature of the variable cannot be the sole source of differing estimates. First, the higher mutation rates of microsatellites, compared with those of other markers, have a homogenizing effect that probably explains some of the difference between our results and those of nonmicrosatellite indicator-based studies (12). Second, consistent with past observations (13), the high fraction of tetranucleotide loci in our data contributes to higher within-population variance component estimates (Table 1) than are seen in dinucleotide studies (3, 4, 7). Third, the estimates vary considerably across sampling schemes within regions. In several cases (3, 6, 7), past microsatellite samples that included multiple groups per region used populations that are among the most differentiated of the 52 groups in our data (Fig. 1). Any estimate computed with the well-separated populations that contribute to the 83.4% within-population variance component obtained by Excoffier and Hamilton (2) should therefore be regarded as a lower bound.
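The first point can be illustrated with a standard identity. This is a sketch under stated assumptions, not the analysis of (12). For Nei's G_ST = (H_T − H_S)/H_T, H_S is the mean within-population expected heterozygosity and H_T is the total expected heterozygosity. Because H_T ≤ 1, it follows that G_ST ≤ 1 − H_S. Loci with high mutation rates tend to have high H_S, so their between-population share is capped at a low value even when populations share no alleles. Here G_ST stands in for the among-population variance component, and the allele frequencies below are hypothetical.

```python
# A sketch of why high within-population diversity caps differentiation
# (hypothetical toy allele frequencies).

def expected_heterozygosity(freqs):
    """Expected heterozygosity, 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def nei_gst(pop_freqs):
    """Return (G_ST, H_S) for equally weighted populations.

    `pop_freqs` lists one frequency vector per population, all over the same alleles.
    """
    k = len(pop_freqs)
    h_s = sum(expected_heterozygosity(f) for f in pop_freqs) / k
    pooled = [sum(f[i] for f in pop_freqs) / k for i in range(len(pop_freqs[0]))]
    h_t = expected_heterozygosity(pooled)
    return (h_t - h_s) / h_t, h_s


# Two populations with no alleles in common but five equally common alleles each.
diverse = [[0.2] * 5 + [0.0] * 5, [0.0] * 5 + [0.2] * 5]
# Two populations fixed for different alleles.
fixed = [[1.0, 0.0], [0.0, 1.0]]

for label, freqs in (("diverse", diverse), ("fixed", fixed)):
    g, h_s = nei_gst(freqs)
    print(f"{label}: G_ST = {g:.3f}, bound 1 - H_S = {1 - h_s:.3f}")
```

The diverse pair shares no alleles, yet its G_ST is only about 0.111 (bound 0.2). The fixed pair reaches G_ST = 1.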

Table 1.

Analysis of molecular variance for 45 di-, 58 tri-, and 274 tetranucleotide loci from (1). The samples and estimation procedure (17) are the same as in (1). Variance components are given in percent, with 95% confidence intervals in parentheses. A dash indicates that the among-region component is not defined for the one-region design.

Sample      No. of   No. of       Repeat  Within             Among populations   Among
            regions  populations  size    populations        within regions      regions
World       1        52           2       92.2 (91.5, 92.8)  7.8 (7.2, 8.5)      –
                                  3       92.6 (91.8, 93.2)  7.4 (6.8, 8.2)      –
                                  4       95.4 (95.1, 95.6)  4.6 (4.4, 4.9)      –
World       5        52           2       90.1 (89.2, 90.9)  2.9 (2.7, 3.2)      7.0 (6.1, 7.8)
                                  3       90.5 (89.5, 91.3)  2.7 (2.4, 3.0)      6.8 (6.0, 7.8)
                                  4       94.3 (93.9, 94.6)  2.3 (2.2, 2.4)      3.4 (3.1, 3.7)
World       7        52           2       91.4 (90.7, 92.1)  2.8 (2.5, 3.1)      5.8 (5.1, 6.6)
                                  3       91.8 (90.9, 92.5)  2.5 (2.3, 2.8)      5.7 (5.0, 6.5)
                                  4       95.0 (94.7, 95.2)  2.3 (2.2, 2.4)      2.8 (2.5, 3.0)
World-B97   5        14           2       85.9 (84.7, 86.8)  5.7 (4.9, 6.7)      8.4 (7.2, 9.7)
                                  3       86.2 (84.8, 87.5)  4.9 (4.2, 5.7)      8.9 (7.7, 10.3)
                                  4       91.2 (90.7, 91.7)  4.9 (4.6, 5.2)      3.9 (3.4, 4.4)
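The three components in Table 1 come from a nested (hierarchical) partition: within populations, among populations within regions, and among regions. Below is a minimal sketch of that partition for a single locus, using moment estimators for a nested design with unequal sample sizes. It is not the program of (17). It does not reproduce how (17) combines information across loci or computes confidence intervals, and the function name and toy data are hypothetical. Observations may be allele sizes or indicator vectors, one per gene copy.

```python
# A sketch: three-level (nested) variance components for one locus.

def nested_shares(regions):
    """Return (within populations, among populations within regions, among regions) in %.

    `regions` is a list of regions; each region is a list of populations; each
    population is a list of observation vectors (tuples), one per gene copy.
    Negative component estimates are not truncated.
    """
    def ss(obs):
        # Sum of squared deviations from the mean vector.
        dim = len(obs[0])
        mean = [sum(x[k] for x in obs) / len(obs) for k in range(dim)]
        return sum((x[k] - mean[k]) ** 2 for x in obs for k in range(dim))

    pops = [pop for region in regions for pop in region]
    region_obs = [[x for pop in region for x in pop] for region in regions]
    all_obs = [x for obs in region_obs for x in obs]
    n, p, g = len(all_obs), len(pops), len(regions)

    ss_within_pops = sum(ss(pop) for pop in pops)
    ss_within_regions = sum(ss(obs) for obs in region_obs)
    ms_wp = ss_within_pops / (n - p)
    ms_ap = (ss_within_regions - ss_within_pops) / (p - g)
    ms_ar = (ss(all_obs) - ss_within_regions) / (g - 1)

    # Coefficients of the expected mean squares for unequal sample sizes.
    sq_by_region = sum(
        len(pop) ** 2 / len(obs) for region, obs in zip(regions, region_obs) for pop in region
    )
    c1 = (n - sq_by_region) / (p - g)
    c2 = (sq_by_region - sum(len(pop) ** 2 for pop in pops) / n) / (g - 1)
    c3 = (n - sum(len(obs) ** 2 for obs in region_obs) / n) / (g - 1)

    s_wp = ms_wp
    s_ap = (ms_ap - s_wp) / c1
    s_ar = (ms_ar - s_wp - c2 * s_ap) / c3
    total = s_wp + s_ap + s_ar
    return tuple(100 * s / total for s in (s_wp, s_ap, s_ar))


# Hypothetical scalar observations: two regions, two populations each.
toy = [
    [[(0.0,), (2.0,)], [(2.0,), (4.0,)]],
    [[(4.0,), (6.0,)], [(6.0,), (8.0,)]],
]
print(nested_shares(toy))  # (20.0, 10.0, 70.0)
```

With balanced toy data like this, the three coefficients reduce to simple sample sizes (here 2, 2, and 4).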

Fig. 1. Effect of sampling scheme on variance component estimates. Using the five-region design, variance components were estimated (17) as in (1) for each of 100,000 subsets of populations, sampled randomly from among the ∼3 × 10¹⁵ subsets of the 52 populations in (1) for which all five regions were represented. Variance component estimates for a 14-population subsample corresponding to Barbujani et al. (3), a 9-population subsample corresponding to Calafell et al. (7), and the full 52-population data of Rosenberg et al. (1) are marked (B), (C), and (R), respectively. Subsample (B) is the same as subsample (B97) in (1). Subsample (C) includes Biaka, Druze, Han, Japanese, Maya, Mbuti, Melanesian, Surui, and Yakut. For the within-population and among-population within-region components, (B) had more extreme values than all but four of the subsets. Similar results were obtained for random subsets that included at least two populations per region.
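The count of subsets in the caption follows from treating each region separately. A subset represents every region exactly when it contains a nonempty subset of each region's populations. The number of such subsets is therefore the product over regions of (2^(n_r) − 1), where n_r is the number of populations in region r. The same product structure gives a simple way to draw subsets uniformly at random.

The sketch below rests on several assumptions. It assumes uniform sampling, although the caption says only that subsets were sampled randomly. It uses hypothetical function names and toy region sizes, because the per-region population counts are not listed here. It also leaves the variance-component estimate for each subset as a separate step.

```python
import random
from math import prod


def count_covering_subsets(region_sizes):
    """Number of population subsets containing at least one population per region."""
    return prod(2 ** n - 1 for n in region_sizes)


def random_covering_subset(regions, rng):
    """Draw uniformly from all subsets that represent every region.

    Covering subsets are exactly the products of nonempty per-region subsets, so a
    uniform nonempty subset drawn independently in each region is uniform overall.
    """
    chosen = []
    for pops in regions:
        while True:
            # Include each population with probability 1/2; reject the empty set.
            pick = [pop for pop in pops if rng.random() < 0.5]
            if pick:
                break
        chosen.extend(pick)
    return chosen


def count_more_extreme(observed, estimates, lower=True):
    """How many subset estimates fall strictly below (or above) the observed value."""
    if lower:
        return sum(1 for v in estimates if v < observed)
    return sum(1 for v in estimates if v > observed)


# Hypothetical regions and population labels, for illustration only.
toy_regions = [["P1", "P2", "P3"], ["P4", "P5"], ["P6"]]
print(count_covering_subsets(len(r) for r in toy_regions))  # 7 * 3 * 1 = 21
rng = random.Random(2003)
print(random_covering_subset(toy_regions, rng))
```

To mirror the caption, one would estimate the variance components for each of many such draws. One would then use a count like `count_more_extreme` to see how many draws are more extreme than a chosen subsample such as (B).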

Allele sizes are important in microsatellite analysis, and typical studies, including our use of the data from (1) to investigate population divergence and expansion (14), employ both sizes and indicator variables. However, although they are often useful, stepwise mutation models with length-independent transition probabilities, which underlie the approach used in (2), poorly predict microsatellite allele size distributions in the human genome compared with length-dependent models (15). Because of this issue and the frequent occurrence of multistep mutations (16), the model of Excoffier and Hamilton cannot be regarded as the “right mutation model,” and the “minimum number of mutations separating the alleles” need not actually be minimal.
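The last point can be illustrated with a toy calculation under hypothetical step-size assumptions. If every mutation changes an allele by exactly one repeat unit, the fewest mutations separating alleles with a and b repeats is |a − b|. Once multistep mutations are allowed, fewer mutations can suffice. The sketch below finds the fewest mutations by breadth-first search over repeat counts, for a given set of allowed step sizes. It is not a model of human microsatellite mutation and does not implement the length-dependent models of (15). The function name and step sets are illustrative.

```python
from collections import deque


def fewest_mutations(a, b, step_sizes):
    """Fewest mutations turning repeat count a into b.

    Each mutation adds or removes one of `step_sizes` repeat units. Uses
    breadth-first search over repeat counts. The window
    [min(a, b) - max step, max(a, b) + max step] contains a shortest path,
    since the steps of any path can be reordered to stay within it.
    """
    if a == b:
        return 0
    largest = max(step_sizes)
    lo, hi = min(a, b) - largest, max(a, b) + largest
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        x, dist = queue.popleft()
        for step in step_sizes:
            for y in (x + step, x - step):
                if y == b:
                    return dist + 1
                if lo <= y <= hi and y not in seen:
                    seen.add(y)
                    queue.append((y, dist + 1))
    return None  # b is unreachable with these step sizes


print(fewest_mutations(10, 14, [1]))     # 4: one-step model, equals |a - b|
print(fewest_mutations(10, 14, [1, 2]))  # 2: two two-repeat mutations suffice
```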

Finally, the main finding from studies of genetic variance components, supported by diverse analyses whose exact estimates have differed, is that the within-population variance component is much larger than the other components. The relative importance of various influences on the estimates could potentially be evaluated by further statistical analysis of the variation in the variance component estimates themselves.
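Below is a small illustration of where such an analysis might start. It is a sketch, not an analysis reported here. For the full 52-population World sample, the within-population estimates of Table 1 form a balanced 3 × 3 layout of regional scheme by repeat size. Averaging over one factor therefore gives simple marginal comparisons. The World-B97 rows are left out because they use a different set of populations, and the names in the code are hypothetical.

```python
# Within-population components (%) from Table 1 for the 52-population World
# sample, keyed by (number of regions, repeat unit length: 2 = di-, 3 = tri-,
# 4 = tetranucleotide).
WITHIN_POP = {
    (1, 2): 92.2, (1, 3): 92.6, (1, 4): 95.4,
    (5, 2): 90.1, (5, 3): 90.5, (5, 4): 94.3,
    (7, 2): 91.4, (7, 3): 91.8, (7, 4): 95.0,
}


def marginal_means(table, axis):
    """Average the estimates over the other factor (balanced two-way layout)."""
    groups = {}
    for key, value in table.items():
        groups.setdefault(key[axis], []).append(value)
    return {level: sum(values) / len(values) for level, values in groups.items()}


by_regions = marginal_means(WITHIN_POP, 0)  # effect of the regional scheme
by_repeat = marginal_means(WITHIN_POP, 1)   # effect of repeat size
for level, mean in sorted(by_repeat.items()):
    print(f"repeat size {level}: {mean:.2f}%")
```

Averaged over the three regional schemes, the within-population component is 94.9% for tetranucleotide loci and about 91.2% for dinucleotide loci.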

References and Notes
