Special Reviews

Genetic Clues to Dispersal in Human Populations: Retracing the Past from the Present


Science  02 Mar 2001:
Vol. 291, Issue 5509, pp. 1742-1748
DOI: 10.1126/science.1058948


Ongoing debate about proper interpretation of DNA sequence polymorphisms and their ability to reconstruct human population history illustrates an important change in perspective that we have achieved in the past 20 years of population genetics. To what extent does the history of a locus represent the history of a population? Tools originally developed for molecular systematics, where genetic lineages have been separated by speciation events, are routinely applied to the analysis of variation within our species, with conflicting results. Because of automated technologies and linkage analysis, we are poised to harvest a wealth of information about our past, if we are successful in moving beyond a current polarization regarding models of human evolution. True resolution will not come from fossil or archaeological evidence alone; the realistic and appropriate application of genetic models for analysis of population structure is also necessary. Three examples from different dispersal events are highlighted here.

Studies of single-nucleotide polymorphisms (SNPs), as molecular genetic markers for mapping common disease genes (1, 2), have reconfirmed the importance of human population structure. It was originally estimated that about one difference per every 1 kb would exist in the human genome, and two broad surveys (3, 4) now suggest that for protein coding regions, this number is likely to be one difference per every 1200 base pairs. Some of these differences will be common around the world, but others will only be associated with local populations. Are there general predictions or principles to assist our interpretation of what is likely to be mere noise and what is genetically important? We know that the history of a gene is easier to determine than an accurate history of a population, because a particular pattern of variation can have multiple evolutionary causes.

In the SNP studies, Africans as a group show greater diversity of alleles and more unique alleles once ascertainment biases have been controlled for, consistent with the antiquity of this gene pool among humans. Comparisons with nonhuman primates also demonstrate that it is possible to use the same DNA chip technology to identify the ancestral state of many common alleles (5), so that the frequency of a particular polymorphism can be used to infer the time that the allele arose, as predicted by theory (6). Yet, a recent study of DNA from Australian skeletal populations has again questioned the African origin of our species and suggested that we are still confused about population dynamics, bottlenecks, and migrations (7).
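The link between the frequency of an allele and the time since it arose can be illustrated with a standard neutral-theory approximation, often attributed to Kimura and Ohta: in a constant-size diploid population, the expected age of a neutral allele now at frequency p is about -4·N_e·p·ln(p)/(1 - p) generations. The sketch below is only an illustration under those assumptions. The function name, the effective size of 10,000, and the example frequencies are hypothetical choices made here, and the formula is not claimed to be the specific result cited as (6).

```python
import math

def expected_allele_age_generations(p, n_e):
    """Expected age, in generations, of a neutral allele at frequency p.

    Assumes a constant-size diploid population of effective size n_e and
    strict neutrality (diffusion approximation).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("allele frequency p must lie strictly between 0 and 1")
    if n_e <= 0:
        raise ValueError("effective population size must be positive")
    return -4.0 * n_e * p * math.log(p) / (1.0 - p)

# Hypothetical illustration: rarer derived alleles are expected to be younger.
N_E = 10_000  # assumed effective population size
for freq in (0.01, 0.1, 0.5):
    age = expected_allele_age_generations(freq, N_E)
    print(f"frequency {freq:.2f}: about {age:,.0f} generations")
```

Selection, population growth, and subdivision all violate these assumptions, so estimates of this kind are best read as rough orders of magnitude.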

Allele frequencies, first generated from classical markers and more recently with microsatellites, had been the common currency used to compare population isolates because they generated data to estimate gene flow and population subdivision. In order to escape the biases of natural selection that might drive alleles to fixation, population geneticists flocked to studies of neutral loci, hoping to establish a more accurate time scale for the origin and spread of modern human genetic diversity. Now, we should consider whether this information, gained from gene genealogies of hemoglobins, low-recombining portions of the X, mitochondria, and Y sequences, can ever generate a consistent picture when compared to frequency data. Can we map the rise and fall of particular alleles at a given locus and integrate this information across an entire chromosome?

In the SNP era, genetic isolates have reasserted their importance. This follows because the age of an allele governs its utility for linkage studies in the search for complex disease genes (and one would predict that a recently appearing SNP is associated with a greater length of shared DNA segment). A single round of sexual recombination will alter the genotype assayed by array technology, and correlations between linked loci can decay quickly with each meiotic round. The race to find a gene can also become a race against the forces of admixture and acculturation, which tend to homogenize the human gene pool.
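The decay of correlations between linked loci with each meiotic round can be made concrete with the textbook expectation for linkage disequilibrium under random mating, D_t = D_0·(1 - r)^t, where r is the recombination fraction between the two loci. The following is a minimal sketch under that idealized model; the function names and example values are illustrative assumptions, not quantities from the studies discussed.

```python
import math

def ld_after_generations(d0, r, t):
    """Expected linkage disequilibrium after t generations of random mating."""
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction r must lie in [0, 0.5]")
    return d0 * (1.0 - r) ** t

def generations_until_fraction(r, fraction):
    """Smallest whole number of generations after which D is at most
    `fraction` of its starting value."""
    if not 0.0 < r <= 0.5:
        raise ValueError("recombination fraction r must lie in (0, 0.5]")
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return math.ceil(math.log(fraction) / math.log(1.0 - r))

# Unlinked loci (r = 0.5) lose half their association in one generation;
# tightly linked markers keep it far longer, which is why young alleles
# sit on longer shared segments.
for r in (0.5, 0.01, 0.0001):
    print(f"r = {r}: half of D is lost within {generations_until_fraction(r, 0.5)} generations")
```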

Understanding forces that generate and govern the persistence of sequence polymorphism in our genome often begins with insights gained from the study of human isolates, where genetic founders are few, in the range of 10 to 100 individuals, and separation has persisted for hundreds of generations. We think that modern human populations expanded somewhere between 50,000 and 200,000 years before the present (yr B.P.) from an effective population size of ∼10,000 individuals (8). Most current neutral polymorphisms are <800,000 years old (9). For nonrecombining regions, gene genealogies are insights into population history, especially changing population sizes and recent postglacial expansion. Methods for analysis have been tested with very large data sets, through simulations and experiments. Although ancient DNA studies once promised spectacular results, a more sobering truth continues to emphasize that modern populations will be the main source of information about past evolutionary forces (10).
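One way to see why an effective size of about 10,000 and an upper age of several hundred thousand years sit comfortably together is the standard neutral coalescent expectation for a diploid autosomal locus: while k lineages remain, the expected waiting time to the next coalescence is 4·N_e/(k(k - 1)) generations, so the expected time to the MRCA of n sampled lineages is 4·N_e·(1 - 1/n). The sketch below assumes a constant population size, neutrality, and a 20-year generation time; the generation time and the function name are assumptions introduced here, and this is not claimed to be how the figure cited as (9) was derived.

```python
def expected_tmrca_generations(n_lineages, n_e):
    """Expected generations back to the MRCA of n sampled lineages.

    Standard neutral coalescent, constant-size diploid population:
    sums the expected waiting time 4*n_e / (k*(k-1)) while k lineages remain.
    """
    if n_lineages < 2:
        raise ValueError("need at least two sampled lineages")
    if n_e <= 0:
        raise ValueError("effective population size must be positive")
    return sum(4.0 * n_e / (k * (k - 1)) for k in range(2, n_lineages + 1))

N_E = 10_000           # effective size quoted in the text
GENERATION_YEARS = 20  # assumed generation time, not from the text

for n in (2, 10, 100):
    gens = expected_tmrca_generations(n, N_E)
    print(f"n = {n}: {gens:,.0f} generations, about {gens * GENERATION_YEARS:,.0f} years")
```

Under these assumptions the expectation approaches 4·N_e generations as the sample grows, which is 800,000 years at 20 years per generation. Uniparentally inherited loci such as mtDNA and the Y chromosome have a smaller effective size and correspondingly shallower expected genealogies.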

Temporal and Spatial Frameworks for Generating Novel Human Genetic Polymorphisms

Time has usually been a contentious factor in the study of human populations, particularly in the popular press, because of the bias to reward studies that highlight the oldest, the newest, the largest, or the most distinctive (11). In the past, anthropologists and evolutionary biologists concentrated on investigating human population diversity over space and time by using variability in skull shape, facial features, skin color, stature, and body form. Both Darwin (12) and Huxley (13) anxiously looked forward to an era when a full appreciation of human evolution would incorporate the global perspective of geographically distinct humans from different homelands. They reasoned that migration, colonization, and isolation, along with adaptation, held the clues to resolving apparent discrepancies between macro- and microevolutionary patterns seen in the human fossil record known to their time.

However, biometric studies, beginning with the work published by Francis Galton in the same year as Gregor Mendel's experiments (1865), yielded little useful information for human biologists interested in time (either absolute or number of generations) and the direct genetic links between geographically separated groups. Rather than identifying populations that shared genes identical by descent, these methods produced groupings that tended to be arbitrary, riddled with convergence, and uninformative for testing archaeological or linguistic hypotheses. In the words of a traditional Hawaiian proverb, biometric methods usually produced “He pili nakekeke” (“A relationship that fits so loosely it rattles”).

From the early work of population geneticists who sought to discern patterns of historical population movements from the blood groups, serum proteins, and enzymes of a few hundred donors (14–16), we have progressed to megabase sequence comparisons from tens of thousands of individuals with increasingly fine spatial as well as temporal resolution. Highly informative loci, enzymatically amplified from saliva or hair samples collected under field conditions, have freed us from a reliance on population inferences based on acculturated or urbanized groups (17). Whether the discussion centers on typical nuclear sequences transmitted in a traditional Mendelian fashion or the uniparental transmission of Y-chromosomal loci and mitochondrial DNA (mtDNA), what characterizes all these studies is a shift from the discovery of individual molecular characters themselves to a focus on the populations carrying them.

Molecular medical genetics has rediscovered a basic feature of our species, the structured population, which was first quantified in 1901 by Landsteiner with the discovery of ABO blood groups. Population structure is caused by assortative mating, variation in reproductive success, and limited dispersal. In response to novel selective forces, populations either restrict their ranges, go extinct, or adapt. Two consequences of these facts are that common alleles increase in frequency and that rare alleles arise through mutation and, if adaptive, may be spread throughout the larger gene pool if time and gene flow are sufficient. But new alleles can also spread by genetic drift, making definitive statements about past forces speculative. Although it follows that frequencies and distributions of genetic polymorphism can reveal the past selective forces our direct ancestors faced (3, 4), estimating the duration and strength of those forces is still a challenge. Coalescence analysis then considers when the most recent common ancestor (MRCA) of a given group of DNA donors lived, based on allelic genealogies (18). Advances in statistical methods can also allow us to infer whether genetic differentiation has accumulated at a locus in a subpopulation because of recent drift or long-term historical isolation (19,20).

Conceptual Unification of Human Dispersal Studies

Although it may prove difficult to identify the reason that a particular polymorphism becomes fixed or wanes in frequency, describing the pattern of polymorphism as it exists has become simpler. In addition, a temporal baseline for calibrating the initial dispersal of our species from its genetic and geographic population source is more secure. Older models and representations of how genetic diversity was dynamically partitioned had been interpreted to fit a model of gradual transformation of some species of archaic humans spread across four continents for 2 million years. Strong genetic evidence, based on multiple loci, now exists for a single, recent (<200,000 yr B.P.) origin of modern humans in Africa. The basic pattern of human genetic diversity being distributed within rather than between continental groups, recognized in the 1970s by Richard Lewontin using indirect measures, has been upheld and affirmed. What is astonishing is the preservation of the African pattern across SNPs, microsatellite loci, Y chromosomes, and mtDNA haplotypes. Genetic systems that do not conform to this pattern [for example, β-hemoglobins and the major histocompatibility complex (MHC)] are commonly targets of intense natural selection.
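The apportionment of diversity described above can be illustrated by splitting total expected heterozygosity at a single biallelic locus into a within-group share and a between-group share, the latter being F_ST in the sense of (H_T - H_S)/H_T. This is a minimal sketch with equally weighted groups and invented allele frequencies. Lewontin's own analysis used a different diversity measure, so heterozygosity is a stand-in chosen here for simplicity.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)

def apportion_diversity(group_freqs):
    """Return (within_share, between_share) of total heterozygosity.

    group_freqs: allele frequencies of the same allele in each group,
    with groups weighted equally. between_share equals F_ST.
    """
    if len(group_freqs) < 2:
        raise ValueError("need at least two groups")
    h_s = sum(expected_heterozygosity(p) for p in group_freqs) / len(group_freqs)
    p_bar = sum(group_freqs) / len(group_freqs)
    h_t = expected_heterozygosity(p_bar)
    if h_t == 0.0:
        raise ValueError("locus is monomorphic in the pooled sample")
    f_st = (h_t - h_s) / h_t
    return h_s / h_t, f_st

# Hypothetical frequencies for three continental groups.
within, between = apportion_diversity([0.4, 0.5, 0.6])
print(f"within groups: {within:.1%}, between groups: {between:.1%}")
```

Even with frequencies that differ visibly between groups, most of the diversity in this toy example lies within groups, which is the qualitative pattern the text describes.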

This evidence reflects another fundamental shift in our perspective about the past. Previous summaries of divergence based on indirectly estimated data from serum proteins and blood groups had large standard errors or were open to charges of intentional bias (21, 22). Paleoanthropologists were also deeply polarized over what constituted direct fossil evidence (23), so that communication about how to test existing models effectively was inhibited between the sciences and the humanities. Studies of human dispersal and modern diversity became increasingly locus driven, contentious, and cut off from their intellectual roots in ecology and population biology (24).

It is possible that controversies in human evolution are simply driven by conflicts apparent to specialists in allied fields who exercise self-awareness and do not bring such bias to the study of nonhuman species (7, 11). However, consensus also exists that human populations, with their abilities to disperse widely and rapidly, present real enigmas for population biologists to model. It may be useful to note that in contrast to the disputes centering on the genetic isolation and persistence of local human populations, researchers working on similar problems in nonhuman species took a different approach. They applied methodologies to model the range of a dispersed species as a connected lattice, followed metapopulation dynamics of isolates over a long time scale, and evaluated the data for goodness of fit to proposed models (25). This approach incorporated a modern understanding of the extinction process in local populations, derived from the biogeographic tradition of ecologist Robert MacArthur (26), and revealed the dangers of focusing too closely on a few, well-described subpopulations for long-term inferences about local stability and persistence. Apparent local continuity of a population is often the result of continual immigration and replacement.

The initial dispersal of modern humans from Africa, on a route through North and East Africa, has now been documented, following the African mitochondrial haplogroup M (a group of linked nucleotide sequence variants that share common substitutions), into Saudi Arabia and then western India (27). This pattern is supported by whole mitochondrial genome sequencing (28) and Y haplotype analysis with the M168 mutation (29). Allelic genealogies confirm that Africa has apparently acted as a source population throughout the history of our species, where humans achieved high rates of reproductive success that allowed them to expand and fill an abundance of ecological niches, even those held for millions of years by other numerically abundant taxa such as apes and monkeys.

To the morphologist studying adaptation or the clinical geneticist following disease penetrance, neutral alleles are of little direct interest. Yet, to the evolutionary geneticist concerned with setting a molecular clock to time a dispersal event on the basis of estimated rates of mutation and fixation, they are crucial. This disparity is at the heart of our problem in understanding dispersal dynamics of humans. Clearly, much polymorphism is retained in population isolates, but estimating the age of the isolate on the basis of the age of a particular allele it carries is not straightforward, because some polymorphisms are even transspecies, as in the MHC.

The evolutionary processes interacting to maintain certain populations as sources for replenishing the species' genetic variation on a wide geographic scale, while other populations acted as sinks of allelic diversity, maintained only by immigration with allelic replacement (30), have been de-emphasized when it comes to considering human populations. Instead, a view has been widely promoted by some anthropologists who think that in the terminal stages of human evolution leading to modern people, culture as revealed by stone tool assemblages was overwhelmingly similar from the tip of South Africa to the Manchurian peninsula (31). However, dominant cultures need not imply uniform genetic polymorphisms among their members, as our television commercials demonstrating the power of the Internet show us.

Understanding patterns of DNA sequence variation among people now requires a multidisciplinary perspective embracing evolutionary ecology, genetics, and paleoanthropology. However, the newly emerging field of behavioral ecology, where life history, demography, and population genetics find a fresh interface with theory governing mate choice and sexual selection, can lead to a deeper conceptual unification of studies centering on human dispersal. These contribute the rich texture to discussions of population persistence and the positive correlations between different measures of spatial distribution (32). With the efficient choice of genetic markers for particular problems (33), we are now set to explore the full multidimensional cultural and biological niche occupied by modern humans.

The Linkage Between Population Genetics, Archaeology, and the Spread of Language Families

For most of the 20th century, the fields of genetics, linguistics, and archaeology evolved in parallel, and only in the past 15 years has a substantial change, referred to as the “Emerging Synthesis” by archaeologist Colin Renfrew, been attempted. Three topics illustrate how this change is coming about: the post-African expansion of modern humans into Asia and Australia, the complex mixing of populations in Oceania, and the spread of immigrants across Beringia into the Americas.

Post-African expansion of modern humans. Humans leaving Africa for Eurasia may have taken a coastal route across Saudi Arabia, through Iraq and Iran, to Pakistan, along Indian coastlines, and then further across East Asia until they reached Southeast Asian island regions that were in various stages of sea level change (Sundaland) corresponding to glacial/interglacial cycles. Today, these regions are settled by people speaking mainly the Pacific, Austric, and Eurasian/American language superfamilies (34), and account for most of the world's current population (Fig. 1).

Figure 1

Dispersal of modern humans out of Africa following a hypothetical coastal route across South Asia in the initial event, with associated major mtDNA and Y chromosome haplotypes.

Figure 2

Map showing the correlation between the Austronesian language family and the use of outrigger canoes [modified from Roger Green's essay on the Lapita cultural complex (51)]. The dashed line shows the distribution of Austronesian languages, and the solid line shows the distribution of outrigger canoes.

Figure 3

Diagram illustrating how different parts of a canoe can be lined up with specific stars or constellations in order to steer on a specific course [from (52)]. Skilled Micronesian navigators can align almost any part of a canoe, bow to stern, with the stars for wayfinding. This particular orientation is used to sail from Saipan in the Northern Marianas to Pikelot in the Carolines, a distance of ∼400 nautical miles.

For dispersalists who continued southeast, the indeterminate phrase non-Austronesian language (NAN) is often invoked, associated with linguistic groups so different that they often cannot be linked to each other. Human occupation of Australia dates to ∼60,000 yr B.P., and modern humans were in New Guinea by at least 45,000 yr B.P. (35). Any date derived from molecular genetics for a locus that estimates the worldwide MRCA of non-African deep lineages as much younger than this archaeological date clearly suffers from calibration problems, because human occupation of Australasia has been continuous and admixture with other groups is easily detected with SNPs or classical markers (14).

Modern Pakistani, Indian, and Sinhalese donors, examined for combinations of mini- and microsatellite loci, along with a number of Y chromosome and mtDNA markers (24), show varying degrees of diversity, which is expected from their geographic position and ability to receive waves of migrants pulsing from Africa and West Asia at different times. The DYS287 (Y chromosome Alu insertion) polymorphism also clearly demonstrates the gradual decline in insert-positive Y chromosomes from Africa to East Asia, reaching a transition point from polymorphic levels (1 to 5%) to private polymorphism in Pakistan. Tribal populations in southern India today are strongly structured and show low heterozygosity, along with evidence of genetic drift at nuclear loci, in contrast to the culturally recent caste populations of the north. The pattern is expected, based on the observation from mtDNA mismatch distributions that farming populations are largely unimodal and expanding, whereas hunter-gatherer populations are multimodal and continually face population bottlenecks (32).
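A mismatch distribution is simply the frequency distribution of the number of nucleotide differences between all pairs of sequences in a sample. A smooth single peak is the signature usually read as expansion, and a ragged, multi-peaked shape as a history of bottlenecks. The sketch below shows how such a distribution is tallied. The aligned sequences are invented for illustration and the function names are assumptions introduced here.

```python
from collections import Counter
from itertools import combinations

def pairwise_differences(seq_a, seq_b):
    """Count positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))

def mismatch_distribution(sequences):
    """Map each pairwise-difference count to its relative frequency."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(
        pairwise_differences(a, b) for a, b in combinations(sequences, 2)
    )
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)}

# Hypothetical aligned mtDNA fragments (not real data).
sample = ["ACGTACGT", "ACGTACGA", "ACGTACGT", "TCGTACGA"]
for diffs, freq in mismatch_distribution(sample).items():
    print(f"{diffs} differences: {freq:.2f}")
```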

Further subtyping of Y polymorphisms at markers SRY 1532, M9, and 92R7 also reveals evidence of repeated influxes of men carrying different Y chromosome haplotypes into South Asia. Y variants, shared across Europe and Asia today, imply that considerable local variation in frequency may be associated with the disintegration and dispersal of the ancient Eurasiatic language family, along with interactions between NAN and Austric, and later, Austronesian speakers.

However, such variation in the Y chromosome was missing from the men sampled from Papua New Guinea, the Trobriand Islands, and Australia, even though mtDNA data for these regions indicate widespread diversity and survival of some of the deepest non-African matrilines discovered so far among modern humans (7). The SNP study (14) of this region also emphasizes the independent genetic histories of Australia and Papua New Guinea. A striking difference between the persistence patterns of autosomal and biparental versus male and female–transmitted markers highlights the continued need to investigate how variance in reproductive success and sexual selection actually works to shape the distribution and abundance of genetic markers in humans.

Oceanic populations. At the other extreme of time, the most recent expansion of a post-agriculture population is associated with the spread of the Austronesian language family, dating to the past 6000 years. Although humans had reached some areas of Near Oceania by 35,000 yr B.P. in the first dispersal wave out of Africa, there was an apparent pause of almost 30,000 years until the next group of migrants entered the area with the technology and motivation to continue voyaging past the Bismarck Archipelago (36). The settlement of Polynesia and Micronesia thus represents the last major prehistoric human dispersal event (Figs. 2 and 3).

Except for the New Guinea highlands, archaeological evidence for Neolithic tools, pottery, farming, or domestic animals is missing until almost 6000 years ago from everywhere else in the Pacific. By 7000 yr B.P., rice and millet cultivation was well under way in central China and had spread to Vietnam, Thailand, and the southern coast of China. These elements appear in Taiwan about 6000 yr B.P., when some agriculturists crossed over the Formosa Strait and then spread their culture across isolated coastal areas of Near Oceania, to finally reach Remote Oceanian islands such as New Zealand by 1200 A.D. (36). This evidence has been summarized by Jared Diamond as “the Polynesian Express from Taiwan” (37) but has been criticized by others as an oversimplification of the process (38, 39). What group of people carried their culture so rapidly and thoroughly out into the Pacific? A lively debate today centers around this question, in part, owing to the antiquity of settlement in island Melanesia, a new linguistic analysis, and evidence of allelic genealogies segregating in Remote Oceanian populations that indicate admixture with Melanesians (14, 17, 40).

It has been difficult for many biologists to both amply sample populations in Remote Oceania and avoid stereotyping those groups that are used as donors, because geography and acculturation processes work in somewhat opposite ways. Distance normally increases isolation, but many Pacific island cultures are open and inclusive, with words borrowed from a polyglot of influences. Statistical methods used on genetic data must be powerful enough to differentiate between initial voyages of colonization and secondary contact that was postcolonization (14, 17). This is because Polynesian navigation systems and traditional forms of wayfinding were developed by a culture that viewed vast expanses of open ocean as a superhighway where transportation was regulated according to seasons and predictable currents, and not an insurmountable barrier. Linguistic terms for flight of birds and bats were similar to the terms used for sailing across the water, and the words to describe parts of a sail often had roots in the same words for important constellations. It was therefore logical that language became the basis for assembling one hypothesis about the spread of Austronesian speakers, associated with the rapid colonization of the Pacific.

Robert Blust's ongoing compilation of an Austronesian dictionary (41) placed 9 of 10 primary branches of Austronesian languages in Taiwan and stimulated two linguists to attempt to map a linguistic phylogeny of Austronesian languages onto a hypothetical model of island way stations, using the Diamond hypothesis (42). Taiwan was the source, and isolated Polynesian atolls were the population sink. This analysis deliberately discounted the potential influences between Melanesians and proto-Polynesians, concerning itself primarily with relationships among Austronesian speakers. These linguists employed a parsimony-based tree-building methodology common in phylogenetic analysis of DNA sequences, using individual language elements instead of nucleotides at a given position. Their consensus tree grouped Austronesian languages in an order that mimicked the archaeological record of island settlement from a projected homeland in Taiwan.
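The idea of scoring language elements the way nucleotides are scored can be sketched with Fitch's small-parsimony count, which gives the minimum number of state changes a character needs on a fixed tree. The tree with the smallest total over all characters is preferred. This is an illustration of the general technique only, not a reconstruction of the linguists' actual procedure or data: the language labels, the presence/absence characters, and both candidate trees are hypothetical.

```python
def fitch_changes(tree, states):
    """Minimum number of state changes for one character on a rooted binary tree.

    tree: a leaf name (str) or a 2-tuple of subtrees.
    states: dict mapping each leaf name to its observed state.
    """
    def visit(node):
        if isinstance(node, str):
            return {states[node]}, 0
        left, right = node
        left_set, left_cost = visit(left)
        right_set, right_cost = visit(right)
        shared = left_set & right_set
        if shared:
            return shared, left_cost + right_cost
        # No shared state: one change is required on this part of the tree.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]

def tree_length(tree, characters):
    """Total parsimony length of a tree over all characters."""
    return sum(fitch_changes(tree, ch) for ch in characters)

# Hypothetical characters: 1 = a given word form is present, 0 = absent.
characters = [
    {"lang_a": 1, "lang_b": 1, "lang_c": 0, "lang_d": 0},
    {"lang_a": 1, "lang_b": 1, "lang_c": 1, "lang_d": 0},
]
tree_1 = (("lang_a", "lang_b"), ("lang_c", "lang_d"))
tree_2 = (("lang_a", "lang_c"), ("lang_b", "lang_d"))
for name, tree in (("tree_1", tree_1), ("tree_2", tree_2)):
    print(name, "requires", tree_length(tree, characters), "changes")
```

A full analysis would also search over tree shapes and deal with borrowing between languages, which plain parsimony treats as independent change.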

Mitochondrial DNA data are consistent with this proposal, but they cannot conclusively rule out that island Southeast Asia (in a triangle formed by Taiwan, Sumatra, and Timor) would also be a suitable homeland for the proto-Polynesians. Recent Y haplotype analysis of 551 donors representing 36 Asian populations suggests that the colonization of Taiwan was a separate event from the migration of people into Polynesia, because separate subsets of haplotypes are found in each group, with no substantial contribution of Melanesian Y haplotypes to Polynesian populations (42). However, analysis (14) confirms that two Y chromosome haplotypes common in Southeast Asia can be found among coastal and island Melanesian populations. Additional sampling of mitochondrial variation among island Melanesia suggests that female-mediated gene flow is common among coastal populations (43), with the island of Pohnpei acting as a hub for gene flow between Remote and Near Oceanian populations (17), perhaps centering on the trade in kava, a root plant with a variety of recreational, sacred, and medicinal uses.

Source-sink population dynamics may differ by sex for Pacific islanders, depending on exposure to infectious diseases and the resources each needed to migrate, because genetic drift predicts similar degrees of lineage loss with equivalent effective population sizes. Three possibilities come easily to mind. First, there may be differential exposure to water- or insect-borne pathogens based on activities restricted to men versus women. Second, females may require loaded, double-hulled voyaging canoes to effectively disperse, settle, and spread their alleles. In contrast, males may have ample opportunities for extrapair copulations during short fishing or trading trips, but repeated and sustained hostilities between groups could periodically sweep certain male lineages from populations. Third, even if sufficient natural resources existed to support large populations of humans on some high islands, societies with strong hierarchical structures for apportioning status might limit the mating opportunities of most men, resulting in strong reproductive skews. In these cases, resident females may experience fewer problems expanding their genealogical lineages through their resident daughters.

The first Americans. An intermediate time scale between the most recent and the most ancient human dispersal events, along with variable levels of population structure and diversity, characterizes the groups that contributed to the peopling of the Americas. Many experts expected that this problem would be the first big human dispersal issue to be solved conclusively with molecular data, but that optimism has not been borne out. The exact timing and the number of waves of immigrants that shaped population and linguistic diversity of indigenous people in the Americas are still controversial. Mitochondrial DNA data showing four major haplogroups were alternately interpreted to favor one recent, continuous, and sustained immigration; multiple movements separated by thousands of years that corresponded to different language families immigrating in; or a single wave starting 43,000 yr B.P. (44).

Only one thing is certain: the “Clovis First” archaeological model of a late entry of migrants into North America is unsupported by the bulk of new archaeological and genetic evidence, and coastal routes of expansion have become more popular among archaeologists (45). Some ancient American skeletons additionally display puzzling morphological features, most notably Kennewick Man from the Pacific Northwest (∼9500 years old) and the Spirit Cave Mummy from North America's Great Basin (∼10,600 years old). Kennewick, in one cranial reconstruction (46), has been closely linked to the Ainu of northern Japan, rather than to Mongolians, Native Americans, or Siberians. As new markers, such as additional X group haplotypes, Alu insertion sequences, and Y polymorphisms, are identified and tracked, we will likely have greater insight into this problem as well as resolve the issue of possible direct but limited contact between coastal Native American populations and Pacific islanders.

The Native American populations remain important from a population genetic perspective because they unite the concepts of genetic population bottlenecks, geographic isolation of founder populations, and linguistic diversification, as first noted in the early 1970s (16). Yet, studies of some well-defined linguistic isolates, with populations under 4000 individuals and demographic histories reconstructed and observed during 250 years of field work, show no evidence of profound population bottlenecks (47) and demonstrate a discordance between genetic and linguistic classifications. We would predict that in isolation, fluctuating population sizes should result in the random extinction of lineages through genetic drift. Kinship and cultural systems that stress exogamy are therefore critically important in counteracting the effects of inbreeding depression during extensive periods of isolation from neighbors. In the case of Amazonian tribal populations, the intermittent use of fishing camps along river tributaries may have contributed to increased gene flow during years characterized by heavy rainfall in the early Holocene (after 12,000 yr B.P.). Local isolates in the tropical Americas clearly fall into the metapopulation model of dynamic subpopulations periodically driven to extinction and replaced or episodically enriched from stable source populations in richer habitats (48).
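The prediction that fluctuating sizes erode lineages through drift follows from two textbook results: the long-term effective size of a population that changes size is approximately the harmonic mean of its per-generation sizes, which is dominated by the smallest values, and expected heterozygosity decays by a factor of (1 - 1/(2·N_e)) per generation. The sketch below illustrates both under idealized assumptions (diploid, random mating, no immigration). The size series and function names are hypothetical and are not taken from the isolates studied in (47).

```python
def harmonic_mean_effective_size(sizes_per_generation):
    """Long-term effective size as the harmonic mean of per-generation sizes."""
    if not sizes_per_generation:
        raise ValueError("need at least one generation")
    if any(n <= 0 for n in sizes_per_generation):
        raise ValueError("population sizes must be positive")
    return len(sizes_per_generation) / sum(1.0 / n for n in sizes_per_generation)

def heterozygosity_retained(n_e, generations):
    """Expected fraction of heterozygosity remaining after drift alone."""
    if n_e <= 0 or generations < 0:
        raise ValueError("n_e must be positive and generations non-negative")
    return (1.0 - 1.0 / (2.0 * n_e)) ** generations

# Hypothetical isolate: usually 4000 breeding adults, with one crash to 400.
sizes = [4000, 4000, 400, 4000]
n_e = harmonic_mean_effective_size(sizes)
print(f"long-term effective size: about {n_e:.0f}")
print(f"heterozygosity retained after 100 generations: {heterozygosity_retained(n_e, 100):.3f}")
```

Gene flow from neighboring groups, which the exogamy and fishing-camp arguments invoke, is exactly what this closed-population sketch leaves out.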


We can contemplate the expansion of our understanding of the partitioning of genetic diversity in human isolates to include Neandertal populations, representing a clade of archaic humans in western Europe, the Middle East, and central Asia that showed polytypic morphologies akin to modern humans. With the cloning of sequences from Vindija cave in Croatia (49), we now have information from three of these archaic humans showing that as an assemblage compared to modern humans, they display mtDNA diversity within the range of that noted for modern African and Oceanian populations. Even though there is no control for geography or age of the fossils, this level of diversity is still far lower than comparable estimates for either common chimpanzees or gorillas.

The result underscores our notion that human effective population sizes remained small until quite recently, repeated extinctions and founder events were common, and shared cultural innovations unique to our species affected reproductive success, and perhaps even linguistic diversification. Innovations that were intellectual but did not result in simultaneous changes in the material record must be identified in order to account for the rapid spread of modern humans into all regions of the planet. Demographic expansion implies children, yet more pages are devoted to illustrating the manufacture and use of ancient weapons (including those that are small in size and made of delicate raw materials) than to describing the diversity of toys for infants in archaeological texts of any era.

In the next 5 years, a synthetic view of the human gene pool will emerge from global patterns of genetic diversity scored on DNA chips. It is easy to forget how late in history our quantitative understanding of human population structure has been obtained and how our view of the fossil record for humans is now altered. Even though we share a close common ancestry, the human gene pool is profoundly different from that of chimpanzees, where a single social group may show more mtDNA variation than what has been assayed for all of humanity (50). Knowledge about the past, as reconstructed with population genetic principles, will come directly from the screening for human mutations. It should continue to humble us that the technology for detecting variation and extracting once-lost secrets is far in advance of the ethical problems posed by the possession of such knowledge.

