Mapping the Origins and Expansion of the Indo-European Language Family


Science, 24 Aug 2012, Vol. 337, Issue 6097, pp. 957-960
DOI: 10.1126/science.1219669
  • Fig. 1

    Inferred geographic origin of the Indo-European language family. (A) Map showing the estimated posterior distribution for the location of the root of the Indo-European language tree under the relaxed random walk (RRW) analysis. Locations sampled by Markov chain Monte Carlo (MCMC) are plotted in translucent red, so darker areas correspond to higher probability mass. (B) The same distribution under a landscape-based analysis in which movement from land into water is 100 times less likely than movement from land to land (see fig. S5 for results under the other landscape-based models). The blue polygons delineate the proposed origin area under the steppe hypothesis: dark blue represents the initially suggested Kurgan homeland (6) (steppe I), and light blue denotes a later version of the steppe hypothesis (7) (steppe II). The yellow polygon delineates the proposed origin under the Anatolian hypothesis (11). A green star in the steppe region marks the centroid of the sampled languages.
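Panel (A) marks the centroid of the sampled languages with a green star. The caption does not say how that centroid was computed. One common way to average latitude/longitude points is to convert each to a 3-D unit vector, take the mean vector, and project it back onto the sphere. The sketch below assumes that method, and the coordinates in it are hypothetical placeholders rather than the study's language locations.

```python
import math


def spherical_centroid(points):
    """Centroid of (lat, lon) points in degrees via the mean of 3-D unit vectors.

    Assumption: this is one standard way to average geographic positions;
    the article does not state which method produced the green star in Fig. 1.
    The result is undefined if the unit vectors cancel out exactly.
    """
    x = y = z = 0.0
    for lat, lon in points:
        phi, lam = math.radians(lat), math.radians(lon)
        x += math.cos(phi) * math.cos(lam)
        y += math.cos(phi) * math.sin(lam)
        z += math.sin(phi)
    n = len(points)
    x, y, z = x / n, y / n, z / n
    # Project the mean vector back onto the sphere as latitude and longitude.
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon


# Hypothetical language locations (lat, lon), for illustration only.
sample = [(52.0, 0.0), (40.0, 20.0), (35.0, 60.0)]
print(spherical_centroid(sample))
```

Averaging unit vectors rather than raw degrees avoids distortions at high latitudes and across the antimeridian.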

  • Fig. 2

    Map and maximum clade credibility tree showing the diversification of the major Indo-European subfamilies. The tree shows when the major branches emerged and how they subsequently diversified. The inferred location at the root of each subfamily is shown on the map, colored to match the corresponding branches on the tree. Albanian, Armenian, and Greek subfamilies are shown separately for clarity (inset). Contours represent the 95% (largest), 75%, and 50% highest posterior density (HPD) regions, based on kernel density estimates (15).
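The HPD contours in Fig. 2 come from kernel density estimates over sampled locations. A simple way to turn a density estimate into an HPD region is to evaluate the density at every sample, sort those values, and pick the level above which the target fraction of samples lies. The sketch below assumes an isotropic Gaussian kernel with a hand-picked bandwidth; the estimator cited as (15) may differ, and the draws are hypothetical.

```python
import math
import random


def gaussian_kde_2d(samples, bandwidth):
    """Isotropic 2-D Gaussian kernel density estimate over (x, y) samples.

    Assumption: one fixed, hand-picked bandwidth; the estimator cited in
    the caption (15) may use a different kernel or bandwidth rule.
    """
    n = len(samples)
    h2 = bandwidth ** 2
    norm = 1.0 / (n * 2.0 * math.pi * h2)

    def density(x, y):
        total = 0.0
        for sx, sy in samples:
            total += math.exp(-((x - sx) ** 2 + (y - sy) ** 2) / (2.0 * h2))
        return norm * total

    return density


def hpd_level(samples, density, mass):
    """Density level whose superlevel set contains at least `mass` of the samples.

    The HPD region for `mass` is then {p : density(p) >= level}; a contour
    plotter would draw the boundary of that set.
    """
    levels = sorted((density(x, y) for x, y in samples), reverse=True)
    k = max(1, math.ceil(mass * len(levels)))
    return levels[k - 1]


# Hypothetical MCMC draws of a node location (x, y), for illustration only.
rng = random.Random(1)
draws = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(300)]
kde = gaussian_kde_2d(draws, bandwidth=0.3)
levels = {m: hpd_level(draws, kde, m) for m in (0.50, 0.75, 0.95)}
for m, lvl in levels.items():
    inside = sum(kde(x, y) >= lvl for x, y in draws) / len(draws)
    print(f"{m:.0%} HPD: density >= {lvl:.4f} ({inside:.1%} of draws inside)")
```

Because a larger credible mass needs a lower density cutoff, the 95% contour always encloses the 75% and 50% contours, matching the nesting described in the caption.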

  • Table 1

    Bayes factors comparing support for the Anatolian and steppe hypotheses. We estimated Bayes factors directly, using expectations of a root model indicator function taken over the MCMC samples drawn from the posterior and prior of each hypothesis. Bayes factors greater than 1 favor an Anatolian origin. A Bayes factor of 5 to 20 is taken as substantial support, greater than 20 as strong support, and greater than 100 as decisive (30).

    Phylogeographic analysis | Bayes factor, Anatolian vs. steppe I | Bayes factor, Anatolian vs. steppe II
    RRW: All languages | 175.0 | 159.3
    RRW: Ancient languages only | 1404.2 | 1582.6
    RRW: Contemporary languages only | 12.0 | 11.4
    Landscape aware: Diffusion | 298.2 | 141.9
    Landscape aware: Migration from land into water less likely than from land to land by a factor of 10 | 197.7 | 92.3
    Landscape aware: Migration from land into water less likely than from land to land by a factor of 100 | 337.3 | 161.0
    Landscape aware: Sailor | 236.0 | 111.7
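One reading of the Table 1 caption is that each Bayes factor is the posterior odds of a root in the Anatolian region versus the steppe region, divided by the corresponding prior odds. Each odds term is estimated as the fraction of MCMC root samples falling inside each polygon. The sketch below assumes that reading, treats coordinates as planar, and uses hypothetical square regions and samples in place of the real polygons and chains. It also maps values onto the support scale quoted in the caption.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test for an (x, y) point against a list of (x, y) vertices.

    Treats coordinates as planar, which is adequate only for small regions.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does the horizontal ray from (x, y) toward +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def indicator_mean(roots, polygon):
    """Fraction of sampled root locations falling inside the polygon."""
    return sum(point_in_polygon(r, polygon) for r in roots) / len(roots)


def bayes_factor(posterior_roots, prior_roots, region_a, region_b):
    """Posterior odds of A vs. B divided by prior odds of A vs. B.

    Raises ZeroDivisionError if a region needed in a denominator gets no samples.
    """
    post_odds = indicator_mean(posterior_roots, region_a) / indicator_mean(posterior_roots, region_b)
    prior_odds = indicator_mean(prior_roots, region_a) / indicator_mean(prior_roots, region_b)
    return post_odds / prior_odds


def interpret(bf):
    """Support for hypothesis A on the scale given in the Table 1 caption."""
    if bf > 100:
        return "decisive"
    if bf > 20:
        return "strong"
    if bf >= 5:
        return "substantial"
    return "below substantial"


# Hypothetical regions and root samples, for illustration only.
anatolia = [(0, 0), (1, 0), (1, 1), (0, 1)]
steppe = [(2, 0), (3, 0), (3, 1), (2, 1)]
posterior = [(0.5, 0.5)] * 8 + [(2.5, 0.5)] * 2
prior = [(0.5, 0.5)] * 5 + [(2.5, 0.5)] * 5
print(bayes_factor(posterior, prior, anatolia, steppe))

# Classifying a few reported values from Table 1.
for label, value in [("RRW: All languages, steppe I", 175.0),
                     ("RRW: Contemporary only, steppe I", 12.0),
                     ("Land-to-water factor 10, steppe II", 92.3)]:
    print(label, interpret(value))
```

Dividing by the prior odds corrects for the polygons having different sizes, so a larger region is not favored simply because it catches more prior samples.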

Additional Files

  • Podcast Interview

    From the Science Podcast: Science's Isabelle Boni speaks with Quentin D. Atkinson about the development and spread of Indo-European languages.

    The contents of this podcast interview represent the opinion of the author and may go beyond the content of the published paper.

  • Mapping the Origins and Expansion of the Indo-European Language Family

    Remco Bouckaert, Philippe Lemey, Michael Dunn, Simon J. Greenhill, Alexander V. Alekseyenko, Alexei J. Drummond, Russell D. Gray, Marc A. Suchard, Quentin D. Atkinson

    Materials/Methods, Supporting Text, Tables, Figures, and/or References

    Supplement contents:
    • Materials and Methods
    • References
    • Figs. S1 to S12
    • Tables S1 to S5
    • Legend for movie S1

    Images, Video, and Other Media

    Movie S1
    Movie showing the expansion of the Indo-European languages through time. Contours on the map represent the 95% highest posterior density (HPD) region of the estimated range of the Indo-European languages.

    Additional Data

    BEAST XML file
    BEAST input file for the full relaxed random walk analysis, including age constraints, location data, and cognate information.
    NEXUS tree file
    Maximum clade credibility tree for the full relaxed random walk analysis, annotated with summaries of node height estimates, location estimates, posterior probabilities, and rate variation in spatial diffusion and cognate substitution. The file contains two versions of the tree: one with the scaling factor for spatial rate variation ("spatialRate") and one with the scaling factor for cognate substitution rate variation ("cognateRate"). These trees and their annotated information can easily be visualized in FigTree (
    Correction (20 December 2013): Figure S13 has been added and other changes have been made as described in the correction in the 20 December 2013 issue, page 1446.
