Multiproxy evidence highlights a complex evolutionary legacy of maize in South America


Science  14 Dec 2018:
Vol. 362, Issue 6420, pp. 1309-1313
DOI: 10.1126/science.aav0207

The complexity of maize domestication

Maize originated in what is now central Mexico about 9000 years ago and spread throughout the Americas before European contact. Kistler et al. applied genomic analysis to ancient and extant South American maize lineages to investigate the genetic changes that accompanied domestication (see the Perspective by Zeder). The origin of modern maize cultivars likely involved a “semidomesticated” lineage that moved out of Mexico. Later improvements then occurred among multiple South American populations, including those in southwestern Amazonia.

Science, this issue p. 1309; see also p. 1246


Domesticated maize evolved from wild teosinte under human influences in Mexico beginning around 9000 years before the present (yr B.P.), traversed Central America by ~7500 yr B.P., and spread into South America by ~6500 yr B.P. Landrace and archaeological maize genomes from South America suggest that the ancestral population to South American maize was brought out of the domestication center in Mexico and became isolated from the wild teosinte gene pool before traits of domesticated maize were fixed. Deeply structured lineages then evolved within South America out of this partially domesticated progenitor population. Genomic, linguistic, archaeological, and paleoecological data suggest that the southwestern Amazon was a secondary improvement center for partially domesticated maize. Multiple waves of human-mediated dispersal are responsible for the diversity and biogeography of modern South American maize.

Maize (Zea mays ssp. mays) evolved from wild Balsas teosinte (Z. mays ssp. parviglumis, hereafter parviglumis) in modern-day lowland Mexico beginning around 9000 years ago (1) and spread to dominate food production systems throughout much of the Americas by the beginning of European colonization in the 15th century. Archaeological and genetic data from ancient DNA studies have highlighted aspects of maize natural history, including the evolution and fixation of agricultural traits and adaptation of maize to diverse new environments (2–6). Archaeological remains establish that maize was brought to the southwestern United States and the Colorado Plateau by ~4000 years before the present (yr B.P.) (7). Moving southward, it traversed Panama by ~7500 yr B.P. (8) and arrived in coastal Peru (9), the Andes (10), and the lowland Bolivian Amazon (11) between ~6500 and 6300 yr B.P. (Fig. 1 and table S1). Today, maize is a staple food species, yielding over 6% of all food calories for humans, plus more in livestock feed and processed foods (12).

Fig. 1 Distribution and ancestry proportions of maize genomes and principal components analysis (PCA) of maize and parviglumis genomes.

Pie colors reflect ancestral proportions estimated by means of model-based clustering (k = 5) of modern maize genomes (15). Archaeological genomes were projected onto the PCA to mitigate degradation biases (15). Dates reflect early regional maize archaeobotanical remains (table S1 and fig. S1). C., Central; Mex., Mexico; PC1, first principal component; PC2, second principal component.

Maize domestication is thought to have occurred once, with little subsequent gene flow from parviglumis (13, 14). However, archaeogenomic evidence reveals maize was only partially domesticated in Mexico by ~5300 yr B.P. (2, 3), carrying a mixture of wild-type and maize-like alleles at loci involved in the domestication syndrome. For example, the domestic-type TGA1 gene variant responsible for eliminating the tough teosinte fruitcase was already present by this time period (2), whereas other loci associated with changes to seed dispersal and starch production during domestication still carried wild-type variants (2, 3). The state of partial domestication sets these archaeogenomes apart from modern fully domesticated maize, which carries a complete, stable set of domestication alleles conferring the domesticated phenotype. This partially domesticated maize was grown in Mexico well after maize had become established in South America, which raises the question of how South American maize came to possess the full complement of fixed domestication traits. To reconcile archaeobotanical and genomic data concerning the domestication and dispersal history of maize in South America, we sequenced maize genomes from 40 indigenous landraces and 9 archaeological samples from South America (Fig. 1 and tables S2 and S3) and analyzed them alongside published modern (n = 68) and ancient (n = 2) maize and teosinte genomes (15).

Model-based clustering highlights extensive admixture and population overlap between maize populations, but we observe several robust lineages (15) (Fig. 1): (i) the Andes and the Pacific coast of South America; (ii) lowland South America, including the Amazon and Brazilian Savanna; (iii) North America north of the domestication center; and (iv) highland Mexico and Central America, previously observed to contain introgression from wild Z. mays ssp. mexicana (14, 16). We also observe a widespread “Pan-American” lineage spanning from northern Mexico into lowland South America. In a previous analysis based on multiple nuclear microsatellites, maize formed a monophyletic subset of teosinte, with South American lineages as the most derived elements in a phylogenetic tree (13). This pattern has been interpreted as evidence for a single episode of domestication followed by dispersal culminating in the Andes after maize became established throughout the rest of the range of cultivation (13). However, archaeological evidence for persistent maize cultivation indicates it was established in numerous locations throughout South America by ~6500 to 4000 yr B.P. regionally. On the basis of this information, we propose that South American maize was carried away from the Mesoamerican domestication center soon after initial stages of domestication and may have been one of several partially domesticated maize lineages that independently fissioned from the primary gene pool after the onset of domestication in Mexico (Fig. 2).

Fig. 2 A stratified domestication model for maize.

(A) Schematic comparing the conventional domestication model under which maize became fully domesticated and then dispersed throughout the Americas, versus a stratified domestication model in which partially domesticated subpopulations became reproductively isolated before the fixation of the domestication syndrome. (B) f4 statistics demonstrating excess allele sharing between the Pan-American lineage and wild parviglumis compared with other maize, revealing nonuniform crop-wild gene flow after initial domestication. Bars are three standard errors under a block jackknife (15). (C) Bar plot of enriched parviglumis contributions to ancestry near domestication genes, in which each bar is a parviglumis genome contributing to South American maize (blue) or other maize (red) Ddom enrichment. Geographic segregation in Ddom enrichment among parviglumis genomes suggests that the domestication syndrome was not yet fixed in a common domesticated ancestor of modern maize.

Using f4 statistics (17), we observe asymmetry in parviglumis ancestry among modern maize populations (Fig. 2). This reveals that maize-parviglumis gene flow was ongoing in some lineages after others became reproductively isolated. Whereas later gene flow from Z. mays ssp. mexicana, a highland subspecies of teosinte, is well documented in some maize (6, 14, 16), this finding contradicts the assumption that dispersal and diversification throughout the Americas happened only after the severance of gene flow from parviglumis (13, 14). Thus, while South American maize became reproductively isolated from the wild progenitor when it was carried away from the domestication center, maize lineages remaining in Mexico underwent continued crop-wild gene flow before diversifying into extant landraces over subsequent millennia. The Pan-American lineage shows excess shared ancestry with parviglumis relative to all other major groups (Fig. 2B), suggesting that this group emerged from the domestication center and dispersed after other maize lineages became regionally established. Because the Pan-American lineage carries excess parviglumis ancestry relative to the strictly South American lineages, it appears to represent a second episode of maize dispersal from Mesoamerica, reinforcing two major waves of maize movement into South America as previously suggested (5).
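To make the f4 comparison concrete, the following is a minimal sketch of how an f4 statistic and a block-jackknife standard error can be computed from per-site allele frequencies. It is not the authors' in-house code (30): the function names, the near-equal contiguous blocks, and the unweighted delete-one jackknife are simplifying assumptions made for illustration.

```python
from math import sqrt


def f4(freqs_a, freqs_b, freqs_c, freqs_d):
    """f4(A, B; C, D): mean over sites of (a - b) * (c - d).

    Each argument is a list of per-site allele frequencies (0.0 to 1.0)
    for one population, with sites in the same order in all four lists.
    """
    products = [(a - b) * (c - d)
                for a, b, c, d in zip(freqs_a, freqs_b, freqs_c, freqs_d)]
    return sum(products) / len(products)


def block_jackknife_f4(freqs_a, freqs_b, freqs_c, freqs_d, n_blocks):
    """Return (estimate, standard_error) from a delete-one block jackknife.

    Assumption: sites are ordered along the genome, so contiguous blocks
    approximate independent genomic segments. Blocks are near-equal in
    size and the jackknife is unweighted (a simplification).
    """
    products = [(a - b) * (c - d)
                for a, b, c, d in zip(freqs_a, freqs_b, freqs_c, freqs_d)]
    n = len(products)
    if n_blocks < 2 or n < n_blocks:
        raise ValueError("need at least 2 blocks and one site per block")

    total = sum(products)
    estimate = total / n

    # Contiguous block boundaries: 0 = b0 < b1 < ... < b_k = n.
    bounds = [i * n // n_blocks for i in range(n_blocks + 1)]

    # Recompute the statistic with each block left out in turn.
    leave_one_out = []
    for start, end in zip(bounds, bounds[1:]):
        block_sum = sum(products[start:end])
        leave_one_out.append((total - block_sum) / (n - (end - start)))

    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (x - mean_loo) ** 2 for x in leave_one_out)
    return estimate, sqrt(variance)
```

Under this sketch, an f4(A, B; C, D) estimate close to zero is what one would expect if populations A and B are symmetrically related to C and D, whereas an estimate lying more than about three standard errors from zero indicates excess allele sharing on one side.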

The genomes of two ancient maize cobs from the Tehuacan Valley of Mexico at ~5300 yr B.P. recently revealed a state of partial domestication, a mixture of maize- and parviglumis-like alleles at loci involved in domestication (2, 3). This is puzzling, given the sustained use of domesticated maize from ~6500 yr B.P. onward in South America (Fig. 1 and table S1) (11, 18). However, principal components analysis and f3 statistics reveal considerable genomic distance between these two Mesoamerican archaeogenomes (Fig. 1 and fig. S2), and f3 statistics confirm that the SM10 genome (3) is more maize-like, whereas the Tehuacan162 genome (2) is more parviglumis-like (fig. S2). In total, the two genomes are from the same region and time period, and both are partially domesticated, but otherwise, they appear to represent independent samples out of a diverse semidomesticated population containing an array of domestic and wild-type alleles.

Given the state of partial domestication observed in the Tehuacan and San Marcos genomes (2, 3), early South American maize emerging from their common ancestral population would likely also have been a partially domesticated form of maize containing an assortment of wild and domestic alleles. This ancestral population likely harbored the building blocks for fully domesticated maize but lacked the allelic fixation and linkage of the modern domesticated crop. We expect that in this ancestral semidomesticated population, domestication loci under ongoing selection would have been continually decoupled from their chromosomal neighborhood through recombination (19, 20), resulting in an enrichment of the original parviglumis genomic background near domestication genes relative to its genome-wide retention. If the domestication syndrome was fully established in the common ancestor of all extant maize, no modern parviglumis genome should carry this enriched affinity to domestication loci to differing degrees in different maize lineages, because the same background would have become fixed in their common ancestor. However, if South American maize became isolated while fundamental domestication was still ongoing, as we hypothesize, then components of the parviglumis genomic background are expected to differ between early stratified maize lineages. Therefore in this case, modern parviglumis genomes would carry a specifically South American or non–South American affinity for the enriched wild-type background near domestication loci.

We compared D-statistics (21) across the whole genome (DWG) and within 10 kb of 186 known domestication loci (Ddom) to test for these asymmetrical parviglumis contributions between pairs of extant South American and non–South American maize around domestication genes (15). We found that parviglumis enrichment associated with domestication is highly patterned among major ancestry groups, with several parviglumis genomes associated exclusively with either South American or non–South American Ddom enrichment and a significant association with ancestry overall (Fig. 2C; χ² test, P = 2.74 × 10⁻⁶). That is, we observe that parviglumis ancestry is enriched near domestication genes in a pattern demonstrating that domestication-associated selection was still ongoing after the stratification of the major extant lineages from their semidomesticated ancestral population. This pattern validates a model in which the ancestral population in South America was itself only partially domesticated during its dispersal away from the domestication center.
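The contrast between DWG and Ddom can be sketched as follows. This is an illustrative sketch rather than the published pipeline: the frequency-based ABBA-BABA form of the D-statistic is standard, but the record layout, the function names, and the assignment of populations to P1, P2, P3, and the outgroup are assumptions. The actual test configuration is described in (15).

```python
def d_statistic(sites):
    """Frequency-based ABBA-BABA D-statistic.

    Each site is a tuple (chrom, pos, p1, p2, p3, p_out) holding
    derived-allele frequencies in populations P1, P2, P3 and the outgroup.
    Positive D indicates excess allele sharing between P2 and P3,
    negative D between P1 and P3.
    """
    abba = 0.0
    baba = 0.0
    for _chrom, _pos, p1, p2, p3, p_out in sites:
        abba += (1 - p1) * p2 * p3 * (1 - p_out)
        baba += p1 * (1 - p2) * p3 * (1 - p_out)
    if abba + baba == 0:
        raise ValueError("no informative sites")
    return (abba - baba) / (abba + baba)


def sites_near_loci(sites, loci, window=10_000):
    """Keep sites within `window` bp of any locus (chrom, start, end)."""
    near = []
    for site in sites:
        chrom, pos = site[0], site[1]
        if any(chrom == locus_chrom and start - window <= pos <= end + window
               for locus_chrom, start, end in loci):
            near.append(site)
    return near


def d_genome_and_domestication(sites, loci, window=10_000):
    """Return (D over all sites, D over sites near the given loci)."""
    return d_statistic(sites), d_statistic(sites_near_loci(sites, loci, window))
```

In this framing, a given parviglumis genome placed at P3 would show domestication-associated enrichment when the D value near domestication loci departs from the genome-wide value in a consistent direction. How that departure is scored and tested is left to the methods in (15).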

In total, we find support for a model of stratified domestication in maize (Fig. 2). The initial stages of maize domestication likely occurred only once within a diverse wild Balsas River basin gene pool, as previously suggested (13). However, before the domestication syndrome was fixed and stable, multiple lineages separated, and selection pressures on domestication loci continued independently outside of the primary domestication center. Some of these divergent semidomesticated populations likely led to terminal lineages lacking sufficient diversity and ecological context to continue the domestication process. Others, like ancestral South American maize, evolved into fully domesticated lineages under continuing anthropogenic pressures.

The earliest evidence places maize in the southwestern Amazon by ~6500 yr B.P. (11), a region serving as a geographic interface of the lowland and Andean-Pacific genetic lineages (Fig. 1). We hypothesize that the southwestern Amazon may have been a secondary improvement center for the partially domesticated crop before the divergence of the two South American groups. When maize arrived, southwestern Amazonia was a plant domestication hotspot (22). Additionally, microfossil assemblages (11, 22) reveal the presence of polyculture (mixed cropping) from ~6500 yr B.P. onward, such that a new crop species could be integrated into existing food production systems supporting domestication activities.

Pollen and phytolith data demonstrate a west-to-east pattern of maize expansion across the Amazon and show that maize was consistently present from ~4300 yr B.P. onward in the eastern Amazon (18). Initially, maize in the eastern Amazon was part of a polyculture agroforestry system combining annual crop cultivation with wild resource use and low-level management through burning (18). Maize cultivation proceeded alongside the progressive enrichment of edible forest species and subsequent waves of new crop arrivals, including sweet potato (~3200 yr B.P.), manioc (~2250 yr B.P.), and squash (~600 yr B.P.). The development of anthropogenically enriched Amazonian Dark Earth soils ~2000 yr B.P. (23) enabled the expansion and intensification of maize cultivation, likely increasing carrying capacity to sustain growing populations in the eastern Amazon (18). The extant endemic maize lineage in lowland South America likely originated with this long-term process involving millennia of evolving land-use practices.

Several landraces and two archaeogenomes (~700 yr B.P.) in eastern Brazil also show strong genetic links to Andean maize near the southwestern Amazon (Fig. 3). This pattern closely mirrors linguistic patterns linking Andean, Amazonian, and eastern Brazilian maize cultivation and suggests a second major west-to-east cultural expansion of maize traditions. A loanword for maize with possible Andean origins was transmitted from Amazonian Arawak languages, most likely originating in southwest Amazonia (24), into Macro-Jê stock languages in the Brazilian savanna and Atlantic coast (24) (fig. S3). Archaeological evidence suggests this expansion occurred ~1200 to 1000 yr B.P. with the spread of a cultural horizon of geometric enclosures and mound ring villages throughout southern Amazonia and ring villages in the central Brazilian savannas and the Atlantic coast (Fig. 3 and fig. S4) (25–27). This process is roughly contemporaneous with archaeological Andean-admixed genomes in the area. Thus, Arawak speakers likely brought nonlocal Andean-Pacific maize lineages into a landscape where maize was an established component of long-term land management and food production strategies.

Fig. 3 Genomic relatedness overlapping linguistic and archaeological patterns in lowland South America.

Maize genomes with ≥50% Andean-Pacific ancestry and ≥99% South American ancestry are connected by lines with the two other genomes with which they share the highest outgroup-f3 value. Geometric enclosures and mound ring villages of southern Amazonia broadly coincide with the expansion of Arawak languages, whereas the Uru and Aratu ring villages coincide with the distribution of Macro-Jê languages (15) (figs. S3 and S4). Only the earliest regional dates for each archaeological tradition are shown (see table S4). Macro-Jê languages borrowing an Arawak loanword for “maize” are based on (24). Arawak homeland is shown approximately in the modern location of Apurinã, in accordance with (29).
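The linking rule in this caption (each genome joined to the two others with the highest outgroup-f3 value) can be illustrated with a short sketch. It is an assumption-laden simplification, not the authors' code: the function names are hypothetical, the inputs are per-site allele frequencies, and the finite-sample corrections used by standard f3 estimators are omitted.

```python
def outgroup_f3(freqs_out, freqs_a, freqs_b):
    """Outgroup f3(O; A, B): mean over sites of (o - a) * (o - b).

    Larger values indicate more genetic drift shared by A and B since
    their split from the outgroup O, i.e. closer relatedness.
    """
    products = [(o - a) * (o - b)
                for o, a, b in zip(freqs_out, freqs_a, freqs_b)]
    return sum(products) / len(products)


def two_nearest(freqs_by_sample, freqs_out):
    """Map each sample to the two other samples with the highest outgroup f3.

    freqs_by_sample: dict of sample name -> list of per-site frequencies.
    Ties are broken by sample name (an arbitrary choice for this sketch).
    """
    links = {}
    for name, freqs_a in freqs_by_sample.items():
        scored = sorted(
            ((outgroup_f3(freqs_out, freqs_a, freqs_b), other)
             for other, freqs_b in freqs_by_sample.items() if other != name),
            reverse=True)
        links[name] = [other for _score, other in scored[:2]]
    return links
```

Drawing a line from each sample to its two entries in the returned mapping reproduces the kind of nearest-neighbor network the caption describes, once the ancestry thresholds (≥50% Andean-Pacific, ≥99% South American) have been applied to select which samples to include.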

Finally, we quantified the mutation load in maize genomes, that is, the accumulation of potentially deleterious alleles due to drift and selection (16), using a phylogenetic framework to estimate evolutionary constraint (15). We observe that South American lineages carry a higher mutation load than other maize lineages. Mutation load increases linearly with distance from the domestication center and is linked with ancestry, and the Andean-Pacific group carries the highest burden of potentially deleterious variants (Fig. 4) (15). The mutation load in the Andes has been attributed to selection for high-altitude adaptations (16), but the elevated mutation load in lowland maize also suggests a history of shared selection and drift effects prior to highland adaptation. These processes would likely have included a founder episode as maize was carried into South America, persistent selection pressures for regional adaptation, and the latter stages of domestication after isolation from the founding gene pool. We also find that Andean and Pacific maize from ~1000 yr B.P. to the early colonial period has a low mutation load compared with its modern Andean-Pacific counterparts (Wilcoxon P = 0.002477) (15) (Fig. 4), although still elevated compared with non–South American lineages. It is possible that Andean maize experienced a wave of deleterious allele accumulation as human and crop populations were disrupted by changes caused by the arrival of Europeans (28). Alternatively, the increasing mutation load in modern crops could represent the ongoing effects of burdensome allele accumulation over nine millennia of human intervention.

Fig. 4 Genome-wide mutation load across ancestry groups (non-admixed samples only in top panel) and load compared with distance to the domestication center.

Mutation load is calculated as a proportion of the theoretical maximum load over observed single-nucleotide polymorphisms, and ancient load scores are rescaled for missingness using a Procrustes transformation (15). Euclidean distance in degrees to the Balsas River valley is shown. And./Pac., Andean-Pacific.
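The load calculation described above can be illustrated with a small sketch. It assumes that each observed SNP has a nonnegative constraint score (for example, a GERP-like score) and that a sample's genotype is recorded as a count of derived alleles (0, 1, or 2, or None when missing). Both the scoring and the diploid maximum are assumptions for illustration. The authors' actual procedure, including the Procrustes rescaling of ancient samples, is given in (15) and is not reproduced here.

```python
def mutation_load(scores, derived_counts, ploidy=2):
    """Load as a proportion of the theoretical maximum over observed SNPs.

    scores: per-SNP constraint scores (assumed nonnegative; higher means
        a derived allele is more likely to be deleterious).
    derived_counts: per-SNP derived-allele counts for one sample
        (0..ploidy), or None where the genotype is missing.
    The maximum is the load of a hypothetical sample carrying the derived
    allele on every chromosome copy at every observed SNP.
    """
    observed = 0.0
    maximum = 0.0
    for score, count in zip(scores, derived_counts):
        if count is None:
            continue  # missing sites contribute to neither sum
        observed += score * count
        maximum += score * ploidy
    if maximum == 0:
        raise ValueError("no scored SNPs observed for this sample")
    return observed / maximum
```

Because missing sites are dropped from both the numerator and the denominator, samples with different coverage remain on a common 0-to-1 scale, although low-coverage ancient genomes may still need the additional rescaling described in the caption.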

Supplementary Materials

Materials and Methods

Figs. S1 to S4

Tables S1 to S4

References (31–94)

References and Notes

  1. Supplementary materials are available online.
Acknowledgments: We thank Admera Health for assistance with sequence data collection and D. Piperno for comments on the manuscript. Funding: Work was supported by Natural Environment Research Council Independent Research Fellowship NE/L012030/1 to L.K., and a sub-award from Science and Technology Facilities Council grant ST/K001760/1 (PI Thomas Meagher, co-I Peter Kille) to L.K. and R.G.A. Author contributions: Study conceptualization and design: L.K., F.O.F, and R.G.A.; Sample acquisition: F.O.F., A.P.P., C.G., B.A., and M.T.P.G.; Genomic data collection: L.K., F.O.F., O.S., N.W., and R.R.M.; Genomic data analysis: L.K. and N.A.S.P.; Archaeology and linguistic background and interpretation: J.G.S., S.Y.M., F.O.F., F.M.C., and E.R.R.; Interpretation and integration of results: L.K., S.Y.M., J.G.S., F.M.C., J.R.-M., N.W., F.O.F., and R.G.A.; Visualization: L.K., S.Y.M., J.G.S., H.L., and N.A.S.P.; Manuscript drafting: L.K., S.Y.M., and J.G.S., with input from N.A.S.P., F.O.F., and R.G.A. All authors reviewed and contributed to the final manuscript. Competing interests: We declare no competing interests. Data and materials availability: Raw sequence data, NCBI Sequence Read Archive accession SRP152500. In-house scripts for data handling and analysis (allele frequency estimation, f and D statistic calculation, genome alignment conformation for mutation load analysis, and exclusion amplification duplicate removal), genome-wide GERP scoring details, genomic mappability bed file, SNP calls, and mapDamage results are available in (30). Germplasm for newly sequenced maize landraces is curated at the Embrapa gene bank in Brasilia, Brazil, and Programa Cooperativo de Investigaciones en Maíz in Peru, which provided sample material for this study to F.O.F. and C.G. 
Archaeological samples from Santa, Chorrillos, Ica, and Jujuy were originally obtained from the PSUM Archaeological Project, Paurarku Archaeological Project and Samaca Archaeological Project, facilitated by archaeologists V. Pimentel, K. Lane, D. Beresford-Jones, and H. Yacobaccio.
