Special Reports

Demographic Histories and Patterns of Linkage Disequilibrium in Chinese and Indian Rhesus Macaques

Science  13 Apr 2007:
Vol. 316, Issue 5822, pp. 240-243
DOI: 10.1126/science.1140462

Abstract

To understand the demographic history of rhesus macaques (Macaca mulatta) and document the extent of linkage disequilibrium (LD) in the genome, we partially resequenced five Encyclopedia of DNA Elements regions in 9 Chinese and 38 captive-born Indian rhesus macaques. Population genetic analyses of the 1476 single-nucleotide polymorphisms discovered suggest that the two populations separated about 162,000 years ago, with the Chinese population tripling in size since then and the Indian population eventually shrinking by a factor of four. Using coalescent simulations, we confirmed that these inferred demographic events explain a much faster decay of LD in Chinese (r² ≈ 0.15 at 10 kilobases) versus Indian (r² ≈ 0.52 at 10 kilobases) macaque populations.

Rhesus macaques (Macaca mulatta) and humans shared a most recent common ancestor (MRCA) ∼25 million years ago (Ma), and our genomes differ at <7% of nucleotide bases (1). Rhesus and humans, therefore, share a large number of fundamental biological characteristics, including many underlying genetic and physiological processes that lead to disease. For this reason, rhesus macaques have become a model organism for vaccine research (2, 3), as well as studies of normal human physiology and disease. Although previous studies of genetic variation in rhesus have described >300 microsatellite polymorphisms (4, 5), identifying specific genetic risk factors for disease requires a much greater resolution of genetic variation across the genome.

The current geographic range of rhesus macaques is larger than that of any other nonhuman primate, stretching from western India and Pakistan to the eastern shores of China (Fig. 1). Fossil records suggest that the genus Macaca originated in northern Africa approximately 5.5 Ma, followed by migration through the Middle East and into northern India by ∼3 Ma (6). By ∼2 Ma, macaques had traversed most of China and reached the Indonesian archipelago, where the putative ancestral species of rhesus macaque, M. fascicularis, is thought to have originated (6, 7).

Fig. 1.

The current geographic range of rhesus macaques [green, redrawn from (20)] with the inferred demographic history and the sample locations superimposed. The geographic location of the MRCA is based on (4).

Previous studies of mitochondrial DNA (8), major histocompatibility complex (MHC) alleles (9), and single-nucleotide polymorphisms (SNPs) in gene-linked regions (10) suggest moderate levels of genetic differentiation between captive-born Indian and Chinese rhesus populations. Developing a more thorough understanding of genetic variation within and between these two populations has important implications for biomedical research. For example, when infected with the simian immunodeficiency virus, animals from Chinese populations develop AIDS-like symptoms more slowly than animals from Indian populations (3).

We have identified 1476 SNPs by sequencing >150 kb of DNA across five Encyclopedia of DNA Elements (ENCODE) (11–13) regions located on separate autosomal chromosomes in nine Chinese rhesus macaques (captive-born from wild-caught parents) and 38 captive-born Indian rhesus macaques. The Chinese animals derive from three distinct geographical sites, whereas the Indian animals came from three different colonies in the United States (Fig. 1). Individuals were chosen to represent rhesus macaque populations that are currently being studied by the international community and to minimize relatedness in the sample [with most individuals in the study being unrelated back to the founding of the colony into which they were born, and none having a shared grandparent (13)]. In our sample of 1476 SNPs discovered, only 486 (33%) were shared across both populations, whereas 604 were found only in the Chinese population (of 1090 SNPs observed there) and 386 were found only in the Indian population (of 872 SNPs observed there), so that the 990 population-specific SNPs split 61% Chinese to 39% Indian. The frequency distribution of derived mutations across SNPs [using DNA sequence from the ENCODE project for baboon, Papio cynocephalus anubis, to infer the putative ancestral state (13)] shows that the Chinese population harbors an excess of rare SNPs relative to a population of constant size, whereas the Indian population has too few rare and too many intermediate- and high-frequency–derived SNPs (Fig. 2A). The observed disparity in SNP density (7.25 SNPs per kb for Chinese versus 5.8 SNPs per kb for Indian) in the two populations suggests that the effective size of the Chinese population is much larger than that of the Indian population, given that the Indian sample size is four times as large as that of the Chinese.
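The counts in this paragraph can be checked with a few lines of arithmetic. The sketch below is illustrative only: it recomputes the sharing percentages and then applies Watterson's estimator, θ_W = S / a_n with a_n = Σ 1/i for i = 1 to n − 1, to the two SNP densities to show why a higher density in a roughly four-fold smaller sample points to a larger effective size. Watterson's estimator is not named in the text; using it here, treating every animal as two fully sequenced chromosomes, and ignoring missing data are assumptions of this example, not the authors' method.

```python
# SNP counts reported in the text.
shared = 486
chinese_only = 604
indian_only = 386

total = shared + chinese_only + indian_only      # all SNPs discovered
chinese_total = shared + chinese_only            # SNPs seen in Chinese animals
indian_total = shared + indian_only              # SNPs seen in Indian animals
private_total = chinese_only + indian_only       # population-specific SNPs


def pct(part, whole):
    """Percentage of `whole` made up by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


shares = {
    "shared_of_total": pct(shared, total),
    "chinese_only_of_private": pct(chinese_only, private_total),
    "indian_only_of_private": pct(indian_only, private_total),
    "chinese_only_of_chinese": pct(chinese_only, chinese_total),
    "indian_only_of_indian": pct(indian_only, indian_total),
}


def watterson_denominator(n_chromosomes):
    """a_n = sum of 1/i for i = 1 .. n-1."""
    return sum(1 / i for i in range(1, n_chromosomes))


def theta_w_per_kb(snps_per_kb, n_individuals):
    """Watterson's theta per kb, assuming two chromosomes per diploid animal
    and no missing data (an assumption of this sketch)."""
    return snps_per_kb / watterson_denominator(2 * n_individuals)


theta_chinese = theta_w_per_kb(7.25, 9)    # 18 chromosomes
theta_indian = theta_w_per_kb(5.8, 38)     # 76 chromosomes

print(shares)
print(round(theta_chinese, 2), round(theta_indian, 2))
```

With these counts, 61% and 39% correspond to the split of the 990 population-specific SNPs, whereas the population-specific share of each population's own SNPs is nearer 55% (Chinese) and 44% (Indian). The sample-size-corrected diversity is higher in the Chinese sample, in line with the paragraph's argument. Because θ_W assumes a constant-size neutral population, the values are only a rough guide here.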

Fig. 2.

(A) The marginal frequency spectrum of derived mutations for each population (shown as expected proportions in a subsample of 10 chromosomes by integrating over possible configurations of observed and missing data, with the total number of SNPs in parentheses) and the expected distribution under the standard neutral model (SNM) of constant size. (B) A “topographical map” of the joint site-frequency spectrum for the two populations, with darker tones representing frequency pairs with few SNPs, and lighter tones representing frequency pairs with many SNPs.
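For reference, the constant-size expectation used as the baseline in panel (A) has a simple closed form: under the standard neutral model, the expected number of SNPs at which the derived allele appears i times in a sample of n chromosomes is proportional to 1/i. The sketch below computes the resulting expected proportions for a subsample of 10 chromosomes. The function name is hypothetical, and the projection of observed data with missing genotypes down to 10 chromosomes is not reproduced here.

```python
def snm_expected_proportions(n_chromosomes):
    """Expected share of SNPs in each derived-allele count class
    (1 .. n-1) under the standard neutral model, where the expected
    number of SNPs in class i is proportional to 1/i."""
    weights = [1 / i for i in range(1, n_chromosomes)]
    total = sum(weights)
    return [w / total for w in weights]


expected = snm_expected_proportions(10)
for count, share in enumerate(expected, start=1):
    print(f"derived count {count}: {share:.3f}")
```

An excess of rare SNPs, as described for the Chinese sample, would show up as an observed singleton class above the first of these proportions; a deficit, as described for the Indian sample, would fall below it.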

We observed a moderate level of population structure between the Indian and Chinese samples, as measured by Wright's FST statistic (average FST = 0.14; SD = 0.11; range = –0.024 to 0.645) (Fig. 3A). Furthermore, the Bayesian clustering program STRUCTURE (14) clearly separates Chinese and Indian individuals when assuming two clusters (Fig. 3B), and considering more clusters does not significantly improve the fit of the model. We found only one Chinese individual with a marginal amount of Indian ancestry (8.5%, sampled from Suzhou) and eight Indian individuals with more than 5% Chinese ancestry [max 16.8%, including animals from all three primate centers (13)]. These low levels of admixture suggest that recurrent migration between the populations has been minimal. Moreover, the two populations were clearly distinguished by principal components analysis (15) along the first two axes of variation (Fig. 3C). Interestingly, the second component also separates one Chinese individual (sampled from Suzhou) from the others, which suggests that further population substructure may exist. Although this individual is not differentiated from other Chinese-origin animals in the STRUCTURE analysis, it may, nonetheless, harbor alleles from an unsampled Chinese subpopulation (i.e., the two wild-caught parents may be from different subpopulations).
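One common way to compute FST from average pairwise differences is FST = 1 − π_within / π_between, where π_within is the mean number of differences between sequences from the same population and π_between is the mean between sequences from different populations. The sketch below applies that form to toy haplotypes. The exact estimator and windowing used in the study are described in its supporting material (13) and may differ; the data and function names here are invented for illustration.

```python
from itertools import combinations, product


def pairwise_diff(a, b):
    """Number of sites at which two equal-length haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_within(haps):
    """Mean pairwise difference among haplotypes of one population."""
    pairs = list(combinations(haps, 2))
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)


def mean_between(haps1, haps2):
    """Mean pairwise difference across two populations."""
    pairs = list(product(haps1, haps2))
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)


def fst_pairwise(pop1, pop2):
    """FST = 1 - pi_within / pi_between, with pi_within averaged over
    the two populations. Can be slightly negative by chance."""
    within = (mean_within(pop1) + mean_within(pop2)) / 2
    between = mean_between(pop1, pop2)
    return 1 - within / between


# Toy haplotypes for one window (0 = ancestral, 1 = derived); not real data.
pop_a = ["0000", "0001", "0000"]
pop_b = ["1100", "1101", "1100"]

print(round(fst_pairwise(pop_a, pop_b), 3))
```

Because the within-population term is itself an estimate, windows with little differentiation can yield small negative values, which is consistent with the lower end of the reported range (–0.024).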

Fig. 3.

(A) The distribution of FST between Indian and Chinese rhesus, calculated with the average pairwise-difference across each nonoverlapping window (13). (B) STRUCTURE results. Individuals are represented by vertical lines, and sorted by their amount of Chinese ancestry (black vertical line separates animals with Indian and Chinese origins). Colors correspond to the proportion of an individual's ancestry attributable to a given population (blue, Indian; red, Chinese). (C) Principal component 1 (PC1) and PC2 separate Indian from Chinese individuals. PC2 also isolates a single Chinese individual [corresponding to an individual sampled from Suzhou and shown as the fourth individual from the right in (B)].

Using maximum likelihood under the assumption that the animals in this study form a random sample from their respective population (13), we fit a two-population demographic model to the joint distribution of SNP frequencies, or site-frequency spectrum, shown in Fig. 2B. Our model suggests that the Chinese population expanded by a factor of 3.3 and separated from the Indian population ∼162 thousand years ago (ka) (95% confidence interval, CI = 183 to 132 ka). After separating, the Indian population maintained its ancestral population size until ∼51 ka (95% CI = 72 to 21 ka), when it was reduced by a factor of 4.3. The population genetic model, although a very simplistic approximation to the rich and complex history of the species, fits the data well, as indicated by a goodness-of-fit test (P = 0.133). Coalescent simulations (13) on the basis of the inferred demographic history for Indian and Chinese rhesus macaques suggest that the MRCA of the two populations lived ∼1.94 Ma (SE 14 Ky). This estimate places the MRCA of rhesus near the divergence time from M. fascicularis, inferred from mitochondrial DNA to be 1.83 to 5 Ma (16, 17). Moreover, our simulations suggest that the effective size of the ancestral population of rhesus macaques was ∼73,070 (SE 231) individuals, implying that the current effective size of the Chinese population is ∼239,704, whereas the Indian population is estimated to be ∼17,014.
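As a quick consistency check, the three effective-size estimates imply expansion and contraction factors that can be compared with the rounded factors of 3.3 and 4.3 quoted for the model. The sketch below only divides the reported numbers; the variable names are illustrative, and no part of the likelihood fit or the coalescent simulations is reproduced.

```python
# Effective population sizes as reported in the text.
ancestral_ne = 73_070
chinese_ne = 239_704
indian_ne = 17_014

# Factor by which the Chinese population grew relative to the ancestor.
expansion_factor = chinese_ne / ancestral_ne

# Factor by which the Indian population shrank relative to the ancestor.
contraction_factor = ancestral_ne / indian_ne

# Present-day ratio of Chinese to Indian effective size.
chinese_to_indian = chinese_ne / indian_ne

print(round(expansion_factor, 2),
      round(contraction_factor, 2),
      round(chinese_to_indian, 2))
```

The implied factors round to 3.3 and 4.3, matching the model parameters, and together they put the Chinese effective size at roughly 14 times the Indian one.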

The recent demographic events that caused these differences in effective population sizes of Indian and Chinese rhesus macaques have also had a large impact on linkage disequilibrium (LD). To quantify the extent of LD in Indian and Chinese rhesus macaques, we measured the correlation coefficient (r²) of alleles from frequency-matched SNPs (13, 18). Figure 4 shows substantial differences between the Indian and Chinese rhesus macaque populations, which are more extreme than the patterns observed among humans. For example, within the Indian rhesus population, LD extends much further than LD observed for European humans, whereas the Chinese rhesus population shows little LD, even for SNPs that are physically very close. Coalescent simulations (13) show that the observed patterns of LD are consistent with our inferred demographic history of this species (shown in Fig. 4 as light blue and pink curves for Indian and Chinese rhesus, respectively). However, LD in the Indian population extends slightly further than expected. This observation may be consistent with recent admixture with a Burmese rhesus population not sampled in this study (8), because admixture between populations with allele frequency differences is known to generate long-range LD.
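For two biallelic SNPs, the squared correlation of alleles is D² / [p_A(1 − p_A) p_B(1 − p_B)], where D = p_AB − p_A p_B and p_AB is the frequency of haplotypes carrying the coded allele at both sites. The sketch below computes it from phased 0/1 haplotypes. Phased input is an assumption of this example; the frequency matching of SNP pairs and the treatment of genotype data used in the study (13, 18) are not reproduced, and the data are invented.

```python
def r_squared(snp_a, snp_b):
    """Squared allelic correlation between two biallelic SNPs, given
    equal-length lists of 0/1 alleles on the same phased chromosomes."""
    if len(snp_a) != len(snp_b):
        raise ValueError("both SNPs must cover the same chromosomes")
    n = len(snp_a)
    p_a = sum(snp_a) / n
    p_b = sum(snp_b) / n
    # Frequency of chromosomes carrying allele 1 at both sites.
    p_ab = sum(a * b for a, b in zip(snp_a, snp_b)) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a site is monomorphic")
    return d * d / denom


# Toy data: eight chromosomes typed at two SNPs (not real data).
snp1 = [1, 1, 1, 0, 0, 0, 1, 0]
snp2 = [1, 1, 0, 0, 0, 0, 1, 1]

print(r_squared(snp1, snp2))
```

Averaging such values over SNP pairs binned by physical distance gives an LD decay curve of the kind described for Figure 4.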

Fig. 4.

The decay of LD for Indian and Chinese rhesus macaques versus European and African humans (n = 9 for all samples), along with the decay of LD for 1000 neutral simulations of our inferred demographic history for rhesus macaque. Human data are from three ENCODE regions orthologous to the rhesus data (13, 21).

In this study, we analyzed noncoding data in rhesus macaques to characterize their underlying demographic history and to quantify the extent of LD relative to humans. The genetic differences that we have observed between Indian and Chinese rhesus macaques are consistent with a recent report on the distribution of SNPs in these populations (10), as well as previous studies of protein coding, microsatellite STR (short tandem repeat), MHC loci, and mitochondrial and Y-chromosome DNA haplotypes (8). Without samples from wild-caught Indian rhesus monkeys, however, these data must be regarded as estimates, because they may reflect a sampling bias toward those macaques that are available for study in the United States as a result of international restrictions on exportation of primates.

Extending these studies to whole-genome association mapping in captive-born animals could be fruitful for identifying genes involved in human diseases. On the basis of the patterns of LD that we have observed, such an association study would likely require many fewer markers to identify common disease-causing variants in rhesus macaques than in humans. Because LD in captive Indian rhesus macaque populations extends much further than in humans, a SNP map with roughly 1 SNP every 35 kb (82,000 SNPs total) would suffice to achieve the same threshold (r² = 0.4) as a marker every 6 kb in humans (13, 19). Furthermore, because LD decays much faster in Chinese rhesus monkeys than in humans, Chinese macaques provide an ideal platform for localizing mutations that are difficult to map in either Indian macaques or humans as a result of extensive LD among candidate mutations in a particular region.
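The marker counts follow from simple division. The sketch below recovers the genome length implied by 82,000 SNPs at 35-kb spacing and, purely for comparison, computes how many markers a 6-kb spacing would need over that same length. Applying the rhesus-derived length to the human spacing is an assumption of this example, not a figure from the text.

```python
def markers_needed(genome_kb, spacing_kb):
    """Number of evenly spaced markers needed to cover a genome."""
    return round(genome_kb / spacing_kb)


# Genome length implied by the text: 82,000 SNPs at one per 35 kb.
implied_genome_kb = 82_000 * 35

rhesus_markers = markers_needed(implied_genome_kb, 35)

# Hypothetical comparison: 6-kb spacing over the same genome length.
dense_markers = markers_needed(implied_genome_kb, 6)

print(implied_genome_kb, rhesus_markers, dense_markers)
```

Under that assumption, the 35-kb map needs a little under six times fewer markers (35 / 6 ≈ 5.8) than a 6-kb map.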
