Research Article

Haplotype Diversity and Linkage Disequilibrium at Human G6PD: Recent Origin of Alleles That Confer Malarial Resistance


Science  20 Jul 2001:
Vol. 293, Issue 5529, pp. 455-462
DOI: 10.1126/science.1061573


The frequencies of low-activity alleles of glucose-6-phosphate dehydrogenase in humans are highly correlated with the prevalence of malaria. These “deficiency” alleles are thought to provide reduced risk from infection by the Plasmodium parasite and are maintained at high frequency despite the hemopathologies that they cause. Haplotype analysis of “A−” and “Med” mutations at this locus indicates that they have evolved independently and have increased in frequency at a rate that is too rapid to be explained by random genetic drift. Statistical modeling indicates that the A− allele arose within the past 3840 to 11,760 years and the Med allele arose within the past 1600 to 6640 years. These results support the hypothesis that malaria has had a major impact on humans only since the introduction of agriculture within the past 10,000 years and provide a striking example of the signature of selection on the human genome.

Malaria, resulting from infection by the Plasmodium falciparum, P. vivax, P. malariae, or P. ovale parasites, is one of the leading causes of death in the global human population. Each year 500 million people suffer from malaria, resulting in about 2 million deaths. During the course of human evolution in regions where malaria is prevalent, naturally occurring genetic defense mechanisms against Plasmodium infection have evolved. Most of the human genes thought to reduce the risk of malarial infection are expressed in red blood cells or play a role in the immune system. These loci include human leukocyte antigen (HLA), α- and β-globin, Duffy factor (FY), tumor necrosis factor (TNF), and glucose-6-phosphate dehydrogenase (G6PD).

G6PD catalyzes the first step of the hexose monophosphate pathway and plays a critical role in glucose metabolism and in maintaining the balance between reduced and oxidized glutathione (important for coping with oxidative stress). G6PD enzyme deficiency, caused by mutations in the G6PD gene, is the most common enzymopathy of humans, affecting an estimated 400 million people and resulting in a number of hemopathologies, often triggered by certain foods (e.g., fava beans), drugs, or infection (1–3). The G6PD locus is located in the telomeric region of the long arm of the X chromosome (Xq28) and is flanked, about 300 kb away on either side, by the factor VIII and red/green color pigment genes. Nearly 400 G6PD variants have been identified on the basis of electrophoretic and biochemical properties (1). The normal-activity G6PD B variant is present worldwide, but other variants, particularly those resulting in enzyme deficiency, are restricted to specific geographic regions (e.g., G6PD A and A− in sub-Saharan Africa and G6PD Med in Southern Europe, the Middle East, and India), although they may occur at low frequency in regions where there has been recent gene flow (1, 2). At the molecular level, more than 130 different mutations that result in enzyme deficiency have been identified in the G6PD gene, nearly all of which are single-base substitutions that cause an amino acid substitution (1, 2, 4).

The distribution of G6PD deficiency is highly correlated with the distribution of current or past malaria endemicity. This observation led to the widely accepted hypothesis that G6PD deficiency confers reduced risk from infection by the Plasmodium parasite (2). This hypothesis is supported by the observation that patients with G6PD deficiency have lower P. falciparum parasite loads than controls and by in vitro studies showing that parasite growth is inhibited in the first few cycles of infection in G6PD-deficient cells [summarized in (2, 3)]. Additionally, a large case-control study of more than 2000 African children demonstrated that the most common form of G6PD deficiency in Africa (G6PD A−) is associated with a 46 to 58% reduction in risk of severe malaria for both female heterozygotes and male hemizygotes (5). Ruwende et al. (5) suggest that the selective advantage conferred by resistance to malarial infection is counterbalanced by a selective disadvantage arising from the hemopathologies that accompany enzyme deficiency. Thus, the genetic variability maintained at the G6PD locus appears to be an example of a balanced polymorphism that, along with the classic examples of sickle cell anemia and thalassemia, represents one of the best examples of natural selection acting on the human genome.

The G6PD gene spans about 18 kb and has 13 exons (Fig. 1). The three most common G6PD electrophoretic variants in Africa are G6PD B, which has normal enzyme activity (60 to 80% frequency range); G6PD A, which has 85% of normal enzyme activity (15 to 40% frequency range); and G6PD A−, which has 12% of normal enzyme activity (0 to 25% frequency range) (3, 5, 6). Only the A− variant is thought to provide protection against malarial infection in Africa (5). The G6PD Med variant has 3% of normal enzyme activity and usually ranges in frequency from 2 to 20%, but reaches 70% among Kurdish Jews (2).

Figure 1

Diagram of G6PD gene structure showing the location of the RFLPs and microsatellites used in the haplotype analysis. Exons are shown as solid boxes. The G6PD A allele results from an A to G transition at nucleotide 376 in exon 5, causing an amino acid change from Asn to Asp (58). The most common G6PD A− variant in Africa has the mutation at nucleotide 376 and a second G to A transition at nucleotide 202, causing a Val to Met amino acid change (59). The G6PD Med variant results from a mutation at nucleotide position 563, causing an amino acid change from Ser to Phe (13). The mutation resulting in G6PD A creates a Fok I site, the mutations resulting in G6PD A− create an Nla III site in addition to the Fok I site (8, 12), and the mutation resulting in G6PD Med creates an Mbo II site (13). G6PD B lacks these three restriction sites. The Sca I, Bsp HI, Pst I, and Bcl I RFLP sites detect noncoding, or silent, substitution mutations (8, 12). The Sca I and Bsp HI restriction sites were created by mismatch-containing primers (12). The (AC)n repeat, located 4.28 kb downstream of G6PD at GenBank sequence position 22,359 (14), is a highly compound repeat consisting of the sequence (TA)₅(AA)₁(TA)₆(CA)₆(CT)₁(CA)₁(TA)₁(CA)₁₀ (corresponding to a 178-bp allele) (60). The (AT)n repeat, located 11.07 kb downstream of G6PD at GenBank sequence position 29,191, consists of a perfect (AT)₁₄ repeat (corresponding to a 135-bp allele). The (CTT)n repeat, located 18.61 kb downstream of G6PD at GenBank sequence position 36,756, is a compound repeat consisting of the sequence (CTT)₁₁(ATT)₇ (corresponding to a 198-bp allele).
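The RFLP typing in this figure rests on a simple principle: a single substitution can create an enzyme's recognition sequence, so the presence or absence of a cut distinguishes alleles. The Python sketch below illustrates that principle by scanning both strands of a sequence for the recognition sequences of Nla III (CATG), Fok I (GGATG), and Mbo II (GAAGA). The toy fragments are hypothetical and are not G6PD sequence; the sketch does not model digestion, fragment sizing, or the mismatch-primer strategy used for the Sca I and Bsp HI sites.

```python
# Recognition sequences (5'->3') of the enzymes named in the caption.
RECOGNITION_SITES = {"Nla III": "CATG", "Fok I": "GGATG", "Mbo II": "GAAGA"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_site(seq, enzyme):
    """True if the enzyme's recognition sequence occurs on either strand.

    Fok I and Mbo II recognize non-palindromic sequences, so the reverse
    complement must be scanned as well; Nla III's CATG is palindromic.
    """
    seq = seq.upper()
    site = RECOGNITION_SITES[enzyme]
    return site in seq or site in reverse_complement(seq)


def rflp_profile(seq):
    """Score each site as '+' (present) or '-' (absent) in an amplicon."""
    return {enzyme: "+" if has_site(seq, enzyme) else "-"
            for enzyme in RECOGNITION_SITES}


# Hypothetical toy fragments: a single C->T change creates a Fok I site.
print(rflp_profile("AAGGACGAA"))
print(rflp_profile("AAGGATGAA"))
```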

To reconstruct the evolutionary history of G6PD deficiency mutations, we have identified three highly polymorphic microsatellite repeats within 19 kb of the G6PD locus. Using these microsatellites and restriction fragment length polymorphisms (RFLPs) within the G6PD gene, we have examined haplotype variability in geographically diverse human populations, originating from Africa, the Middle East, the Mediterranean, Europe, and Papua New Guinea (7), to estimate the age of G6PD alleles that confer resistance to malarial infection.

RFLP haplotype analysis.

The G6PD B, A, A−, and Med alleles can all be detected by polymerase chain reaction (PCR) and RFLP analysis (Fig. 1) (8). Frequencies of the A− allele in our sample of sub-Saharan African populations range from 3% to 19% (Table 1) and exhibit significant heterogeneity across populations (χ² = 17.57, 7 df, P < 0.03). In the North African, Middle Eastern, and Mediterranean populations, RFLP analysis to distinguish G6PD A, A−, or Med alleles was performed only in individuals with deficient enzyme activity levels (Table 1) (9). Therefore, we were not able to determine unbiased allele frequencies. Other studies have estimated the frequency of G6PD Med to range from ∼2 to 10% in these populations (10, 11). We refer to chromosomes outside of Africa without deficiency mutations as Norm to distinguish them from B chromosomes in Africa, because they have distinct patterns of haplotype variation and were ascertained differently. Only four polymorphisms in noncoding regions and one polymorphism at a synonymous site of the G6PD gene have been identified in human populations (1, 12). Only one of these RFLPs (Bcl I) is polymorphic outside of Africa (13) and has been analyzed together with the Mbo II RFLP that defines the Med allele. Our results confirm previous reports that Norm chromosomes are nearly always associated with the Bcl I (−) allele, whereas the Med allele is most frequently associated with the Bcl I (+) allele in North African, Middle Eastern, and Mediterranean populations, but with the Bcl I (−) allele in Eastern Indian populations (11, 13).
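The heterogeneity result above (χ² with 7 degrees of freedom across eight sub-Saharan populations) has the form of a contingency-table test. The sketch below assumes a standard Pearson χ² test of homogeneity on a populations × (A−, non-A−) table; the authors' exact procedure is not specified in this section, and the counts in the example are hypothetical placeholders, not the Table 1 data. The upper-tail probability is computed from the series for the regularized incomplete gamma function so that only the Python standard library is needed.

```python
import math


def chi_square_homogeneity(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c
    table of counts (rows = populations, columns = allele classes)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for row, r_tot in zip(table, row_totals):
        for obs, c_tot in zip(row, col_totals):
            expected = r_tot * c_tot / total
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


def chi2_sf(x, df):
    """Upper-tail chi-square probability, 1 - P(df/2, x/2), where P is the
    regularized lower incomplete gamma function evaluated by its series."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


# Hypothetical (A-, other) chromosome counts for eight populations.
counts = [[2, 58], [5, 45], [9, 41], [4, 66], [12, 52], [3, 37], [7, 33], [6, 54]]
stat, df = chi_square_homogeneity(counts)
print(f"chi2 = {stat:.2f}, df = {df}, P = {chi2_sf(stat, df):.3f}")
```

With eight populations and two allele classes the test has (8 − 1) × (2 − 1) = 7 degrees of freedom, matching the value reported in the text.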

Table 1

Population samples, number of chromosomes typed, and G6PD allele counts. G6PD A and A− variants in the sample of sub-Saharan African populations were detected by screening populations for the Fok I and Nla III restriction sites (8). The G6PD deficiency phenotype in the North African, Lebanese, Cypriot, and Italian populations was identified on the basis of enzyme activity levels (9), and therefore unbiased allele frequencies could not be obtained. N represents the number of chromosomes included in the study. For sub-Saharan African populations, chromosomes were typed and the frequency of G6PD B, A, and A− alleles was counted. Standard errors are shown in parentheses. For North African and non-African populations, chromosomes were typed and G6PD alleles were counted. # def M and # def F denote the number of males and females who had the G6PD deficiency phenotype; # Other indicates chromosomes for which we have not yet identified the mutation underlying the deficiency phenotype. PNG, Papua New Guineans.


Microsatellite analysis.

Few RFLP haplotypes have previously been identified at the G6PD locus because the RFLPs have low heterozygosity and there is strong linkage disequilibrium (LD) between markers located within a 3-kb region of the gene (12). Additionally, because only one RFLP is polymorphic outside of Africa, RFLP haplotype analyses were not informative for reconstructing the evolutionary history of the G6PD gene in non-African populations. Therefore, we screened 52,173 base pairs (bp) containing the G6PD gene and flanking sequence (14) for potentially variable microsatellite repeats. Three microsatellite repeats, referred to as AC, AT, and CTT, were identified within a 19-kb region downstream of the G6PD gene (Fig. 1). In total, we observed 10 (AC)n alleles (ranging from 164 to 188 bp), 26 (AT)n alleles (ranging from 125 to 179 bp), and 8 (CTT)n alleles (ranging from 195 to 216 bp) (15, 16). Allele frequencies and heterozygosity values for these microsatellites are presented in Appendix A (17). Allele number and heterozygosity levels are higher for the perfect AT repeat than for the interrupted repeats. Bootstrap samples of equal size from African and non-African populations revealed more alleles in African populations for the (AC)n and (AT)n repeats (P < 0.001) and higher heterozygosity levels in sub-Saharan African populations for all three microsatellite repeats (P < 0.001).
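The bootstrap comparison described above can be sketched as follows: both groups are resampled to a common size, because the number of alleles observed grows with sample size, and the fraction of replicates in which the first group fails to exceed the second serves as a one-sided empirical P value. Resampling with replacement, the unbiased heterozygosity estimator, the function names, and the allele-size lists are all assumptions for illustration; the authors' exact resampling scheme is not described here.

```python
import random
from collections import Counter


def expected_heterozygosity(sample):
    """Unbiased expected heterozygosity, n/(n-1) * (1 - sum p_i^2)."""
    n = len(sample)
    freqs = [count / n for count in Counter(sample).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


def bootstrap_compare(pop_a, pop_b, size, reps=1000, seed=1):
    """Resample both populations (with replacement) to the same size and
    return the fraction of replicates in which pop_a does NOT exceed pop_b
    in (i) number of distinct alleles and (ii) heterozygosity."""
    rng = random.Random(seed)
    not_more_alleles = not_higher_h = 0
    for _ in range(reps):
        a = rng.choices(pop_a, k=size)
        b = rng.choices(pop_b, k=size)
        not_more_alleles += len(set(a)) <= len(set(b))
        not_higher_h += expected_heterozygosity(a) <= expected_heterozygosity(b)
    return not_more_alleles / reps, not_higher_h / reps


# Hypothetical AT allele sizes (bp): a diverse sample and a restricted one.
diverse = list(range(125, 181, 2)) * 4
restricted = [137] * 80 + [139] * 10 + [141] * 10
print(bootstrap_compare(diverse, restricted, size=50))
```

With 1000 replicates the smallest nonzero P value this scheme can report is 0.001, which is why more replicates are needed to resolve smaller values.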

Microsatellite haplotypes and linkage disequilibrium.

Haplotypes consisting of the (AC)n, (AT)n, and (CTT)n microsatellites and the RFLPs distinguishing B, A, A−, Med, and Norm alleles were typed in 591 chromosomes from individuals originating from ethnically diverse sub-Saharan African, Tunisian, Lebanese, Cypriot, Italian, European, and Papua New Guinean populations (7). Chromosomes with Norm and Med alleles were further characterized by presence (+) or absence (−) of the Bcl I site. Linkage phase could be determined unambiguously in males, who make up ∼80% of the sample (Table 1). In multiply heterozygous females, linkage phase could not be determined unambiguously, so statistical inference had to be applied (18). A total of 149 distinct AC/AT/CTT haplotypes were identified and are presented in Appendix B (17).
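Phase ambiguity in females arises because each heterozygous locus can be assigned to either X chromosome. The hypothetical helper below only enumerates every haplotype pair consistent with an unphased multilocus genotype; a statistical method such as the one cited in (18) must then choose among these candidates, for example by weighting them with estimated population haplotype frequencies. The sketch does not implement that inference step, and the example genotype is invented.

```python
from itertools import product


def phase_resolutions(genotype):
    """All unordered haplotype pairs consistent with an unphased genotype.

    genotype: one (allele_1, allele_2) pair per locus. With h heterozygous
    loci there are 2**(h - 1) distinct resolutions (one if h == 0).
    """
    het = [i for i, (x, y) in enumerate(genotype) if x != y]
    pairs = set()
    for flips in product((False, True), repeat=len(het)):
        hap1 = [x for x, _ in genotype]
        hap2 = [y for _, y in genotype]
        for locus, flip in zip(het, flips):
            if flip:
                hap1[locus], hap2[locus] = hap2[locus], hap1[locus]
        # Sort so that (h1, h2) and (h2, h1) count as the same resolution.
        pairs.add(tuple(sorted((tuple(hap1), tuple(hap2)))))
    return sorted(pairs)


# Invented female genotype at (G6PD, AC, AT, CTT): heterozygous at three loci.
genotype = [("A-", "B"), (166, 182), (169, 141), (195, 195)]
for hap1, hap2 in phase_resolutions(genotype):
    print(hap1, hap2)
```

Because the number of candidate resolutions doubles with each additional heterozygous locus, adding highly polymorphic microsatellites increases the phasing burden in females even as it increases informativeness.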

Generally, the greatest haplotype diversity is found on B and A chromosomes from Africa (H = 0.96 ± 0.02 and 0.91 ± 0.04, respectively), with moderate levels of diversity on Norm/Bcl I(−) and Norm/Bcl I(+) chromosomes outside of Africa (H = 0.87 ± 0.03 and 0.86 ± 0.10, respectively) and the most restricted variability on A− (H = 0.72 ± 0.08), Med/Bcl I(+) (H = 0.18 ± 0.04), and Med/Bcl I(−) (H = 0.38 ± 0.15) chromosomes (Fig. 2) (19). Distinct patterns of microsatellite haplotype variability and of LD were associated with the various G6PD alleles (Figs. 2 and 3). G6PD A− alleles are always associated with a 166-bp AC allele, and G6PD A alleles are always associated with either a 164- or 166-bp AC allele. There are broad ranges of AT and CTT alleles on A chromosomes, whereas A− alleles are associated with only large-sized AT alleles (ranging from 165 to 179 bp in size) and nearly always with a 195-bp CTT allele. By contrast, B chromosomes from Africa have primarily large AC alleles, 176 to 186 bp in size (with 182- to 184-bp alleles most common), as well as a broad range of AT and CTT alleles.
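Haplotype diversity values such as those above are commonly computed with Nei's unbiased gene-diversity estimator. The sketch below assumes that estimator and Nei's sampling-variance formula; reference (19) is not reproduced here, so the reported ± values may have been obtained differently. As a check on the point estimate, it uses the Med/Bcl I(+) counts given later in the text (57 of 63 chromosomes carrying 182/151/210), under the additional assumption, made only for illustration, that each of the other six chromosomes carries a distinct haplotype. That gives H ≈ 0.18, in line with the value reported above.

```python
import math
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity and its standard error.

    H = n/(n-1) * (1 - sum x_i^2), with sampling variance
    V = 2/(n(n-1)) * {2(n-2)[sum x_i^3 - (sum x_i^2)^2] + sum x_i^2 - (sum x_i^2)^2},
    where x_i are the sample haplotype frequencies.
    """
    n = len(haplotypes)
    freqs = [count / n for count in Counter(haplotypes).values()]
    s2 = sum(x ** 2 for x in freqs)
    s3 = sum(x ** 3 for x in freqs)
    h = n / (n - 1) * (1.0 - s2)
    var = 2.0 / (n * (n - 1)) * (2 * (n - 2) * (s3 - s2 ** 2) + s2 - s2 ** 2)
    return h, math.sqrt(max(var, 0.0))


# Med/Bcl I(+) counts from the text: 57 of 63 chromosomes carry 182/151/210.
# Assumption for illustration only: the other six haplotypes are all distinct.
med = ["182/151/210"] * 57 + [f"other_{i}" for i in range(6)]
h, se = haplotype_diversity(med)
print(f"H = {h:.2f}")
```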

Figure 2

Relative frequencies of AC/AT/CTT microsatellite haplotypes on B chromosomes (n = 183), A chromosomes (n = 90), A− chromosomes (n = 42), Norm/Bcl I(−) chromosomes (n = 188), Norm/Bcl I(+) chromosomes (n = 17), Med/Bcl I(+) chromosomes (n = 63), and Med/Bcl I(−) chromosomes (n = 8). The 149 microsatellite haplotypes identified are ordered by size of the AC, then AT, then CTT repeats, and full haplotype identities and frequencies are given in Appendix B (17).

Figure 3

Plot of the distribution of microsatellite alleles on chromosomes with different G6PD alleles. Linkage disequilibrium between the G6PD alleles and microsatellites is indicated by the clustering of points. A: filled black circles; A−: filled red circles; B: open, dark blue triangles; Norm/Bcl I(−): open, magenta squares; Norm/Bcl I(+): open, light blue squares; Med/Bcl I(+): open, light green circles; Med/Bcl I(−): open, dark green circles.

Norm/Bcl I(−) chromosomes from Tunisia and outside of Africa appear to have a subset of the haplotype variability present on B chromosomes in sub-Saharan Africa, with only three common microsatellite haplotypes (containing a 178-bp AC allele, 137- to 141-bp AT alleles, and a 198-bp CTT allele) (Fig. 2). Only 17 Norm/Bcl I(+) chromosomes were in our sample, and the majority have a 182-bp AC allele, large-sized AT alleles (147 to 159 bp), and a 210-bp CTT allele. Most G6PD Med alleles were associated with the Bcl I(+) allele. Of the Med/Bcl I(+) chromosomes, 57 out of 63 carried the 182/151/210 haplotype. Of the six chromosomes that do not have this haplotype, five differ only by a single AC or AT repeat, and one (182/151/198) appears to be a recombinant at the CTT site. The 182/151/210 haplotype is very rare in the global sample and has been observed on only two Norm/Bcl I(−) chromosomes from Cyprus and Italy, consistent with a rapid expansion in frequency of G6PD Med alleles. The clustering of microsatellite alleles on A− and Med chromosomes indicates pairwise LD with the G6PD deficiency alleles that is highly significant for all three repeats (P < 0.01, Fisher's exact test) (Fig. 3) (20). The pattern of haplotype variability and LD was nearly identical in all populations sampled across geographically diverse regions, indicating a single common origin of the A− allele in Africa and the Med allele in the Mediterranean and Middle East.
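Pairwise LD between a deficiency allele and a single microsatellite allele can be tested by collapsing the data into a 2 × 2 table (for example, Med versus non-Med chromosomes, by presence or absence of the 182-bp AC allele) and applying Fisher's exact test. The sketch below is a standard-library implementation of the two-sided test; the collapse to a single allele and the example counts are assumptions, and the tests reported above may instead have used the full multi-allele table.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Conditions on both margins and sums the hypergeometric probabilities of
    every table that is no more probable than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # The small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Hypothetical counts: rows = Med / non-Med chromosomes,
# columns = carries / lacks the 182-bp AC allele.
print(fisher_exact_two_sided(60, 3, 40, 150))
```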

Eight Med/Bcl I(−) chromosomes were identified in the southern Italian population originating from the Calabria region (21). All carry microsatellite haplotypes identical to the most common haplotypes on Norm/Bcl I(−) chromosomes. Without more detailed haplotype analysis of markers upstream of G6PD, it is not possible to distinguish whether the Med/Bcl I(−) haplotypes arose from recombination between Med/Bcl I(+) and Norm/Bcl I(−) haplotypes or whether the Med allele arose independently on a Norm/Bcl I(−) background. All individuals with the Med/Bcl I(−) haplotype are also red-green color blind, indicating that LD extends at least 300 kbp downstream in these individuals (21).

RFLP/microsatellite haplotype analysis.

To reconstruct the evolutionary history of the A, A−, and B chromosomes in the sub-Saharan African populations, we selected a subset of these chromosomes for a more extensive RFLP/microsatellite haplotype analysis. In addition to the Fok I and Nla III sites that distinguish A, A−, and B alleles, four noncoding, or silent, RFLPs located within a 3-kb region of the G6PD gene were typed (Fig. 1). Two distinct clades are formed by the A/A− chromosomes and the B chromosomes (Fig. 4). All A/A− chromosomes carry a 164- to 166-bp AC allele, and all but one B chromosome possesses a 176- to 186-bp AC allele. B chromosomes have the greatest RFLP haplotype diversity and also have high levels of microsatellite haplotype diversity on the two most common G6PD B haplotypes. This observation supports the hypothesis that the B allele is ancestral, as indicated by sequence data from humans and chimpanzees (6). On the A chromosomes, one major RFLP haplotype, “− + − − + −”, has high levels of microsatellite haplotype diversity and is most likely the ancestral A haplotype. It can be derived from the most common B haplotype, “− − − + + −”, by two mutational steps. All but 2 of the 42 A− chromosomes have a single RFLP haplotype, “+ + + − + −”, and have reduced microsatellite haplotype diversity, consistent with the hypothesis of a recent origin of the A− allele from an A chromosome (12).
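The statement that the ancestral A haplotype is two mutational steps from the most common B haplotype can be checked by comparing the RFLP strings site by site. The sketch below assumes that the six symbols in each haplotype string follow the site order given in the Figure 4 legend (Nla III, Fok I, Sca I, Bsp HI, Pst I, Bcl I); it counts single-site differences only and does not model recurrent mutation or the recombination events that the network may also include.

```python
# Assumed site order, taken from the Figure 4 legend.
SITES = ("Nla III", "Fok I", "Sca I", "Bsp HI", "Pst I", "Bcl I")


def parse_haplotype(text):
    """Split a haplotype string such as '- + - - + -' into site states,
    accepting either ASCII '-' or the Unicode minus sign used in the text."""
    states = text.replace("\u2212", "-").split()
    if len(states) != len(SITES):
        raise ValueError(f"expected {len(SITES)} sites, got {len(states)}")
    return states


def differing_sites(h1, h2):
    """Names of the restriction sites at which two RFLP haplotypes differ."""
    pairs = zip(SITES, parse_haplotype(h1), parse_haplotype(h2))
    return [site for site, a, b in pairs if a != b]


common_b = "- - - + + -"
ancestral_a = "- + - - + -"
a_minus = "+ + + - + -"
print(differing_sites(common_b, ancestral_a))  # two sites
print(differing_sites(ancestral_a, a_minus))   # two sites
```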

Figure 4

RFLP/microsatellite haplotype network. Each circle represents an RFLP haplotype indicated by the presence (+) or absence (−) of restriction sites for the Nla III, Fok I, Sca I, Bsp HI, Pst I, and Bcl I RFLPs (Fig. 1), and the size of the circle is approximately proportional to the number of chromosomes observed. Each line between pie charts indicates a single mutation and/or recombination event, and the length of the line is not correlated with the number of mutational events separating haplotypes. Microsatellite haplotype diversity on each RFLP haplotype background is indicated by slices of the pie chart. Microsatellite haplotypes are color coded as follows on the basis of the size of the AC repeat: 164-bp alleles (light blue), 166-bp alleles (dark blue), 168-bp alleles (darkest blue), 176-bp alleles (light yellow), 178-bp alleles (dark yellow), 182-bp alleles (orange), and 184-bp alleles (red). Of the 250 chromosomes typed for all nine polymorphisms, only the 186 chromosomes that had no missing data and could be unambiguously phased are included in the network analysis. In total, 97 B, 61 A, and 28 A− chromosomes are included in the cladogram.

Coalescent simulations to assess fit to strict neutrality.

Three features that stand out in both the African and Mediterranean data are reduced microsatellite variability in the A− (and Med) clades, striking patterns of LD between the major G6PD alleles and the microsatellites, and AT and CTT microsatellite allele sizes in the A− (and Med) clades that are distinct from those in the rest of the genealogy. Coalescence analysis was applied to determine whether patterns of variation in 315 African haplotypes and 294 non-African haplotypes are consistent with a neutral model (22). The coalescent simulations generated null distributions under neutrality for haplotype diversity within the A− or Med allelic lineages. The simulations generally produced levels of microsatellite variation in the A− clade that were greater than those observed in the population samples (Table 2). In the population samples, the A− clade also had significantly fewer microsatellite alleles, lower variance in allele size, and higher maximum LD (δmax) than those generated by the neutral coalescent model. An analysis of variance (ANOVA) F statistic (22) also indicated that observed AT microsatellite alleles exhibited significantly greater differences in allele sizes between A− and B clades than would be expected under neutrality. For the Mediterranean data, all three microsatellite loci differ significantly from the neutral coalescent tree for the number of observed alleles, the microsatellite size variance, δmax, and the F statistic. Overall, the A− and Med clades are far more constrained in their variability and exhibit greater LD than the neutral coalescent would predict. Hence, forces other than drift have resulted in a rapid expansion of these alleles, giving them a relatively high frequency and broad geographic distribution without sufficient time to generate within-clade heterogeneity (23).
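A minimal neutral null model of the kind described above can be sketched with the Kingman coalescent and a symmetric stepwise mutation model for microsatellites. The version below is a simplification and not the authors' simulation (22): it treats the clade of interest as a random neutral sample of n chromosomes, takes the scaled mutation rate θ as a free parameter, and computes two of the four Table 2 statistics (number of alleles and size variance) as one-sided empirical P values. Allele sizes must be supplied in repeat units, and the observed sizes in the example are hypothetical. Because only the order of events matters for allele sizes, the simulation steps through competing coalescence and mutation events without drawing branch lengths explicitly.

```python
import random


def simulate_smm_coalescent(n, theta, rng):
    """Allele sizes (repeat units, relative to the root) for n sampled
    chromosomes under a neutral Kingman coalescent with a symmetric
    stepwise mutation model.

    With k lineages, coalescence occurs at rate k(k-1)/2 and mutation at
    rate k*theta/2, so only the relative rates decide the next event.
    """
    lineages = [[i] for i in range(n)]  # sample indices below each lineage
    sizes = [0] * n
    while len(lineages) > 1:
        k = len(lineages)
        coal_rate = k * (k - 1) / 2
        mut_rate = k * theta / 2
        if rng.random() < mut_rate / (coal_rate + mut_rate):
            # A one-step mutation on this branch shifts every descendant sample.
            step = rng.choice((-1, 1))
            for i in rng.choice(lineages):
                sizes[i] += step
        else:
            i, j = sorted(rng.sample(range(k), 2))
            merged = lineages.pop(j) + lineages.pop(i)  # pop higher index first
            lineages.append(merged)
    return sizes


def sample_variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def neutral_p_values(observed_sizes, theta, reps=1000, seed=0):
    """Fraction of neutral simulations with as few distinct alleles, and as
    small a size variance, as the observed sample (sizes in repeat units)."""
    rng = random.Random(seed)
    n = len(observed_sizes)
    obs_alleles = len(set(observed_sizes))
    obs_var = sample_variance(observed_sizes)
    few_alleles = low_var = 0
    for _ in range(reps):
        sim = simulate_smm_coalescent(n, theta, rng)
        few_alleles += len(set(sim)) <= obs_alleles
        low_var += sample_variance(sim) <= obs_var
    return few_alleles / reps, low_var / reps


# Hypothetical AT repeat counts for a low-diversity clade.
observed = [22, 22, 22, 23, 22, 21, 22, 22, 22, 22, 22, 23]
print(neutral_p_values(observed, theta=5.0, reps=500))
```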

Table 2

Coalescence simulations. Results of coalescence simulations designed to test the correspondence of the four given sample statistics for the A− clade in the African sample and the Med-containing samples to those generated by a neutral coalescent (22). n alleles is the number of distinct microsatellite alleles in the given class; STR variance is the variance in microsatellite allele size; δmax is the LD value for the microsatellite allele in strongest LD with the A− or Med allele; and F ANOVA is the F statistic from the ANOVA contrasting the microsatellite size of A− versus non-A− alleles (and Med versus non-Med alleles). Table entries are the respective statistics obtained from the actual data, and the asterisks indicate the significance of the deviation of these observed figures from the null distributions generated by the coalescent.


Age estimates of the A− and Med alleles.

Although methods exist to incorporate natural selection into the framework of the coalescent (24, 25), these methods are not easily adapted to infer the age of an allele when positive selection is present. Therefore, we simulated a Poisson branching process in a growing population to estimate the age of the deficiency alleles (A− and Med), drawing parameters from prescribed prior distributions and subjecting each simulation run to rejection criteria (26, 27). In these simulations, the A− deficiency mutation was assumed to occur uniquely within sub-Saharan Africa, and the Med deficiency was assumed to occur uniquely in the Mediterranean or north African populations. Our analysis required the following parameters: the mutation rate (μ) at the microsatellite, the selective advantage (s) of genotypes bearing A− (or Med) alleles, and the recombination rate (r) between the G6PD locus and the microsatellites. A fitness of 1 + s was assumed for the A−/Y males relative to non-A−/Y males and for the A−/non-A− heterozygous females relative to the non-A−/non-A− females. We used both a dominance model, where A−/A− homozygous females had a fitness 1 + s, and an overdominance model, where A−/A− females had a fitness 1 (28).
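The branching-process idea can be illustrated with a much simpler model than the one used here. In the sketch below, each copy of a new mutation leaves a Poisson(1 + s) number of copies per generation; s is drawn from a uniform prior, runs in which the lineage is lost or fails to reach a target number of copies are rejected, and the accepted runs give an approximate posterior distribution of the time since the mutation arose. The prior bounds, target copy number, and 25-year generation time are hypothetical, and the sketch omits population growth, X-linked inheritance, the dominance and overdominance models, and the microsatellite mutation and recombination data on which the authors' rejection criteria (26, 27) are based. Large Poisson means are approximated by a rounded normal draw to stay within the standard library.

```python
import math
import random


def poisson(lam, rng):
    """Poisson draw: Knuth's product-of-uniforms method for small means,
    a rounded normal approximation for large means."""
    if lam < 30:
        limit = math.exp(-lam)
        k, p = 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        return k
    return max(0, round(rng.gauss(lam, math.sqrt(lam))))


def generations_to_target(s, target_copies, rng, max_generations=3000):
    """Generations for a single new mutant to reach target_copies, or None.

    Each copy leaves Poisson(1 + s) copies, so the next total is
    Poisson(copies * (1 + s)). Returns None if the lineage is lost or has
    not reached the target within max_generations.
    """
    copies, t = 1, 0
    while 0 < copies < target_copies and t < max_generations:
        copies = poisson(copies * (1 + s), rng)
        t += 1
    return t if copies >= target_copies else None


def posterior_ages(target_copies, s_prior=(0.01, 0.2), gen_years=25,
                   runs=1000, seed=0):
    """Rejection sampler: draw s from a uniform prior, simulate, and keep
    runs that reach the target. Returns sorted ages in years."""
    rng = random.Random(seed)
    ages = []
    for _ in range(runs):
        s = rng.uniform(*s_prior)
        t = generations_to_target(s, target_copies, rng)
        if t is not None:
            ages.append(t * gen_years)
    return sorted(ages)


def summarize(ages):
    """Mean and an approximate central 95% interval of a sorted sample."""
    n = len(ages)
    mean = sum(ages) / n
    return mean, ages[int(0.025 * (n - 1))], ages[int(0.975 * (n - 1))]


print(summarize(posterior_ages(target_copies=10_000)))
```

Conditioning on the lineage reaching the target is what makes the accepted values a posterior rather than a prior: runs with small s are more often lost, so they are under-represented among accepted runs even though they would take longer to reach the target.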

The credibility intervals for the mean mutation rate of microsatellite loci, the selection coefficient for deficiency alleles, and the rate of recombination to the closest microsatellite locus were very similar for the A− and Med chromosomes (Table 3). The local rate of recombination in this region of the X chromosome is ∼1.64 cM/Mbp (29), which corresponds to a recombination rate of 2.95 × 10⁻⁴ for an 18-kb span, a figure that is within the credibility interval. Although mutation and recombination rates appear consistent across the A− and Med alleles, the A− lineage appears to be older than the Med lineage (Table 3). The mean age of the A− allele in these runs was 6357 years, with a 95% credibility interval extending from 3840 to 11,760 years. For the Med alleles, the mean age was 3330 years with a 95% credibility interval of 1600 to 6640 years (30).
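The recombination figure above follows from simple arithmetic: 1.64 cM/Mb × 0.018 Mb = 0.0295 cM, and at short distances 1 cM corresponds to about 1% recombination, giving r ≈ 2.95 × 10⁻⁴. The sketch below reproduces that conversion and, to illustrate why LD can persist over the estimated age, computes the expected fraction of chromosomes with no recombination between G6PD and a marker 18 kb away over the mean A− age. The 25-year generation time is an assumption, and the calculation treats r as a per-generation rate, ignoring that X chromosomes recombine only in female meioses.

```python
def recombination_fraction(cm_per_mb, distance_kb):
    """Per-meiosis recombination fraction for a short interval, using the
    small-distance approximation 1 cM ~ 1% recombination."""
    return cm_per_mb * (distance_kb / 1000.0) / 100.0


r = recombination_fraction(1.64, 18)  # ~2.95e-4, as stated in the text

# Hypothetical 25-year generation time (not stated in this section).
generations = 6357 / 25
# Expected fraction of descendant chromosomes with no recombination between
# G6PD and the marker, treating r as a per-generation rate.
unrecombined = (1 - r) ** generations
print(f"r = {r:.3g}, generations = {generations:.0f}, unrecombined = {unrecombined:.3f}")
```

Under these assumptions more than 90% of A− chromosomes would still carry the ancestral allele at an 18-kb marker, which is consistent with the strong LD observed.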

Table 3

Mean and 95% credibility intervals of allele age and model parameters. Results of simulations of a Poisson branching process incorporating selection for the A− (and Med) deficiency chromosomes (28). Table entries are the means of the posterior distributions obtained from the simulations, and figures in parentheses are the 95% credibility intervals. Although we expect the mutation rate and recombination rate to be the same in Africa as in the Mediterranean, these two parameters were estimated independently in the two samples.


Evolutionary history of the G6PD locus.

Our RFLP and microsatellite haplotype analyses support the hypothesis that B alleles are ancestral and that A alleles are more recently derived (1, 6, 12) (Fig. 4). The data are also consistent with the expectation that the highly compound AC repeat should be more stable than the AT and CTT repeats and should, therefore, remain in LD with the G6PD alleles. However, there has been sufficient time to accumulate considerable variation at the AT and CTT microsatellites on both B and A chromosomes as a result of microsatellite mutation and/or recombination. The maintenance of LD between G6PD deficiency alleles and the microsatellite alleles is consistent with previous reports indicating strong LD in this part of the X chromosome (21), as well as a recent origin of the G6PD deficiency alleles. Sequence analysis of a 5-kb region of the G6PD gene in 50 African individuals and in chimpanzees indicates that although the B allele currently dominates in frequency worldwide and is the inferred ancestral state, A chromosomes also show high levels of nucleotide variation (31), suggesting that the A allele was historically greater in frequency. This observation is also consistent with the high level of microsatellite variation linked to the A haplotype class in this current study. The maintenance of two distinct clades (B and A/A−) at the G6PD locus, as indicated by sequence and microsatellite haplotype analysis, could indicate historical balancing or directional selection acting on B and/or A alleles (or at closely flanking loci).

RFLP and microsatellite haplotype analyses suggest that the G6PD A− allele arose on an A chromosome containing a 166/169/195 haplotype (the most common A− haplotype) and then spread rapidly across a broad geographic range in Africa, with time for only a limited amount of variation at the AT repeat to accumulate. The similar pattern of haplotype variability and LD across geographically diverse African populations at the G6PD locus contrasts markedly with the divergent pattern of haplotype variation and LD observed across African populations at the CD4, DM, PLAT, and PAH loci (31–35) and likely reflects the effects of selection at the G6PD locus. The few A and A− chromosomes observed outside Africa have patterns of haplotype variation identical to those observed in Africa and likely originate from recent gene flow from Africa.

The Norm chromosomes outside Africa may descend from a subset of the B chromosomes that were carried by a small founding population(s) during the migration of modern humans out of Africa within the past 100,000 years (32). Genetic drift at the time of this founding event may have resulted in the distinct pattern of haplotype variability observed on normal chromosomes outside Africa (Fig. 2). The Med mutation most likely arose on a normal chromosome with a 182/151/210 haplotype background [possibly on a Norm/Bcl I(+) chromosome] and spread rapidly throughout the Middle East and Mediterranean region. The presence of Med and A− alleles on distinct microsatellite haplotypes supports the conclusion that they arose independently. The high frequency and broad geographic range of these deficiency mutations, in the face of low haplotype variability and high LD, is inconsistent with a model of neutrality. Rather, our results support the hypothesis that the A− and Med mutations have attained high frequency as a result of selection at this locus, most likely in response to malaria infection caused by the Plasmodium parasite. Thus, the pattern of haplotype variability and LD at G6PD represents an excellent example of the signature of selection on the human genome.

Origins of malarial resistance in humans.

The high variability and mutation rate of the three microsatellite markers at G6PD make it possible to obtain a reasonably well-bounded estimate of the origin of alleles that confer protection against malaria. We estimate that the A− mutation arose within the past 3840 to 11,760 years. This estimate is consistent with archaeological and historical documents indicating that malaria has had a significant impact on humans only within the past 10,000 years, coincident with the origination and spread of agriculture in the Middle East and Africa (36–39). According to Livingstone (36, 37), the introduction of slash-and-burn agriculture in West Africa about 2000 to 4000 years ago resulted in the clearing of tropical forests and an increase in sunlit pools of water. The increased number of Anopheles gambiae breeding places resulted in an increase in the population density of A. gambiae, the major vector of P. falciparum, the Plasmodium species associated with more severe, stable, hyperendemic malaria (38). Additionally, agriculture enabled increased human population density, facilitating the spread of malaria.

However, a number of factors may have caused malaria to become hyperendemic slightly earlier in Africa, as our date estimate suggests. Africa underwent an increase in both temperature and humidity between 12,000 and 7000 years ago, along with a concurrent increase in the number of sunlit lakes and ponds (40, 41); these conditions support the spread and rapid adaptive speciation of the A. gambiae vector (42). Two other pieces of evidence also indicate an earlier increase in human population density in the Sahara and northeast Africa, which would have increased the importance of malaria as a selective agent. First, plant and animal domestication originated about 8000 to 10,000 years ago in this region (37, 40), leading to conditions that could facilitate the spread of infectious disease. Second, archaeological evidence indicates denser and more permanent populations around lakeshores owing to the spread of fishing industries, as well as to incipient cattle domestication in these regions (40, 43). These settlements on or near lakeshores and water pools could have provided adequate preconditions for the spread of mosquito-borne pathogens (40).

Our date estimates are consistent with studies of genetic diversity in the P. falciparum genome suggesting a recent population bottleneck followed by rapid population expansion within the past 5000 to 50,000 years (44) [but for an alternative perspective see (45)]. Our date estimates are also consistent with studies of genetic diversity in the A. gambiae genome, suggesting rapid adaptive speciation and the emergence of more anthropophilic taxa within the past 10,000 years (42). Although mild forms of malaria may have existed in humans throughout much of their evolutionary history, our data suggest that more severe malaria did not become hyperendemic until the past 10,000 years, likely in response to climatic and/or cultural changes that facilitated population expansion and diversification of the Anopheles vector, the P. falciparum parasite, and the human host.

The more recent spread of the Med allele within the past 1600 to 6640 years is consistent with historical Greek and Egyptian documents indicating that, despite the earlier presence of milder forms of malaria resulting from infection by Plasmodium malariae and P. vivax, the more severe P. falciparum malaria may not have been prevalent in the Mediterranean until after 500 B.C. (39). Thus, the selective pressure of severe malarial infection may have increased more recently in the Mediterranean region. It is possible that the recent and rapid spread of the Med allele across a broad geographic region corresponds to the spread of agriculture during a Neolithic expansion and migration across Europe from the Middle East 10,000 to 5000 years ago (46). However, our date estimate suggests that this mutation could have been spread by more recent migration events, perhaps as a result of the extensive trade routes and colonizations of the Greeks into these regions in the first several millennia B.C. (11, 47). It is even conceivable that the Med mutation was spread throughout this region by the army of Alexander the Great, which invaded and conquered territories ranging from the Mediterranean to India, the Middle East, and even North Africa during the fourth century B.C. (47). Thus, the study of polymorphism at the G6PD locus demonstrates how the environment, culture, genes, and history interact to shape variation in the modern human genome.

Note added in proof: Since the submission of this manuscript, Saunders et al. have reported sequencing 41 G6PD alleles and similarly found low A− haplotype diversity and a recent time of origin of the A− allele (61).

  • * To whom correspondence should be addressed. E-mail: st130{at}

  • Temporary address: Montreal Genome Center, Montreal General Hospital Research Institute, Montreal, Quebec H3G 1A4, Canada.

