Research Article

Human sound systems are shaped by post-Neolithic changes in bite configuration


Science  15 Mar 2019:
Vol. 363, Issue 6432, eaav3218
DOI: 10.1126/science.aav3218

The first fricatives

In 1985, the linguist Charles Hockett proposed that the use of teeth and jaws as tools in hunter-gatherer populations makes consonants produced with lower lip and upper teeth (“f” and “v” sounds) hard to produce. He thus conjectured that these sounds were a recent innovation in human language. Blasi et al. combined paleoanthropology, speech sciences, historical linguistics, and methods from evolutionary biology to provide evidence for a Neolithic global change in the sound systems of the world's languages. Spoken languages have thus been shaped by changes in the human bite configuration owing to changes in dietary and behavioral practices since the Neolithic.

Science, this issue p. eaav3218

Structured Abstract


Human speech manifests itself in spectacular diversity, ranging from ubiquitous sounds such as “m” and “a” to the rare click consonants in some languages of southern Africa. This range is generally thought to have been fixed by biological constraints since at least the emergence of Homo sapiens. At the same time, the abundance of each sound in the languages of the world is commonly taken to depend on how easy the sound is to produce, perceive, and learn. This dependency is also regarded as fixed at the species level.


Given this dependency, we expect that any change in the human apparatus for production, perception, or learning affects the probability—or even the range—of the sounds that languages have. Paleoanthropological evidence suggests that the production apparatus has undergone a fundamental change of just this kind since the Neolithic. Although humans generally start out with vertical and horizontal overlap in their bite configuration (overbite and overjet, respectively), masticatory exertion in the Paleolithic gave rise to an edge-to-edge bite after adolescence. Preservation of overbite and overjet began to persist long into adulthood only with the softer diets that started to become prevalent in the wake of agriculture and intensified food processing. We hypothesize that this post-Neolithic decline of edge-to-edge bite enabled the innovation and spread of a new class of speech sounds that is now present in nearly half of the world’s languages: labiodentals, produced by positioning the lower lip against the upper teeth, such as in “f” or “v.”


Biomechanical models of the speech apparatus show that labiodentals incur about 30% less muscular effort in the overbite and overjet configuration than in the edge-to-edge bite configuration. This difference is not present in similar articulations that place the upper lip, instead of the teeth, against the lower lip (as in bilabial “m,” “w,” or “p”). Our models also show that the overbite and overjet configuration reduces the incidental tooth/lip distance in bilabial articulations to 24 to 70% of their original values, inviting accidental production of labiodentals. The joint effect of a decrease in muscular effort and an increase in accidental production predicts a higher probability of labiodentals in the language of populations where overbite and overjet persist into adulthood. When the persistence of overbite and overjet in a population is approximated by the prevalence of agriculturally produced food, we find that societies described as hunter-gatherers indeed have, on average, only about one-fourth the number of labiodentals exhibited by food-producing societies, after controlling for spatial and phylogenetic correlation. When the persistence is approximated by the increase in food-processing technology over the history of one well-researched language family, Indo-European, we likewise observe a steady increase of the reconstructed probability of labiodental sounds, from a median estimate of about 3% in the proto-language (6000 to 8000 years ago) to a presence of 76% in extant languages.


Our findings reveal that the transition from prehistoric foragers to contemporary societies has had an impact on the human speech apparatus, and therefore on our species’ main mode of communication and social differentiation: spoken language.

Labiodentals depend on bite configuration.

Biomechanical modeling shows that labiodental sounds like “f” are easier to produce (and to accidentally arise) under overbite and overjet (A) than under the edge-to-edge bite (B) that prevailed before the Neolithic (C). Overbite and overjet persisted only when exposed to the softer diets that became characteristic with food production (D versus E) and more recently with intensified food processing (F). Both developments led to a spread of labiodental sounds.


Linguistic diversity, now and in the past, is widely regarded to be independent of biological changes that took place after the emergence of Homo sapiens. We show converging evidence from paleoanthropology, speech biomechanics, ethnography, and historical linguistics that labiodental sounds (such as “f” and “v”) were innovated after the Neolithic. Changes in diet attributable to food-processing technologies modified the human bite from an edge-to-edge configuration to one that preserves adolescent overbite and overjet into adulthood. This change favored the emergence and maintenance of labiodentals. Our findings suggest that language is shaped not only by the contingencies of its history, but also by culturally induced changes in human biology.

Speech is the chief mode of human communication. Its origin reaches deep into the hominin lineage: Critical components such as a continuously descended larynx, a modern hyoid bone, and breathing control were already in place around half a million years ago (1), building on even older capabilities of the primate vocal tract (2). Yet human speech manifests itself in a bewildering diversity of thousands of different sounds attested across the ~7000 extant languages today (3), ranging from the almost ubiquitous point vowels (i, u, and a in English) to the rare click consonants in some of the languages of southern Africa. The mechanisms that generate and maintain this diversity are mainly ascribed to errors in speech production or perception, coupled with sociolinguistic diffusion (4–6). The uniformitarian assumption in linguistics (7) takes these mechanisms to have become fixed with the emergence of anatomically modern humans. Current theories take this assumption one step further and expect that not only the mechanisms, but also the ecological conditions under which they apply and the linguistic patterns they produce, have stable probabilities (8, 9). When their distributions reach stationarity, sounds and grammar structures are expected to have the same probabilities in all languages (10).

Notwithstanding these ideas, linguist Charles Hockett conjectured that labiodentals—speech sounds including “f” and “v” (see Fig. 1)—are overwhelmingly absent in languages whose speakers live from hunting and gathering, because the associated heavy-wear diet induces an edge-to-edge bite that makes the articulation of labiodentals effortful, which would suggest a post-Neolithic change in speech (11). Hockett’s hypothesis was rejected at the time on the grounds that wear explains bite configuration only partially, and that edge-to-edge bite became less common considerably later than agriculture (12). However, as we show below, recent anthropological evidence has demonstrated that tooth wear (and masticatory exertion more generally) is indeed the principal mechanism of post-adolescent bite change, and that despite considerable variation, there has been an overall decrease of edge-to-edge bite since the Neolithic.

Fig. 1 Mid-sagittal and forward-facing schematics of labiodental and bilabial strictures.

Mid-sagittal views illustrate the passive and active articulators including the supralaryngeal vocal tract, hard and soft palates, nasal cavity, tongue, and lips. (A) In the labiodental stricture, the bottom lip raises to make contact with the upper teeth. (B) In the bilabial stricture, both lips make contact to form closure. (C) Labiodental is visually distinctive by the presence of the upper teeth. (D) True bilabial in which the upper and lower lips are aligned and make contact; the teeth are not visible.

Humans generally start out with horizontal and vertical overlap in their bite configuration (overjet and overbite, respectively), not only in deciduous but also in adolescent dentition. However, substantial tooth wear during the lifespan induces loss of hard tissue and flattening of the occlusal plane and interproximal surfaces. This loss of hard tissue and modification of dental occlusion triggers three major compensatory processes: continuous dental eruption (movement of teeth to compensate for their lost height), mesial drift (migration of teeth in the alveolar bone in the mesial direction to compensate for their loss of interproximal surfaces), and lingual tipping (tilt of anterior teeth in the lingual direction to compensate for incisal and occlusal wear) (13, 14). These processes eventually develop into edge-to-edge bite, such that anterior teeth lose overbite and overjet, forming a tight (and flat) incisal surface contact with few or no irregularities (see Fig. 2). Although the process is modulated by population-specific factors, wear increases monotonically with age (15). This post-adolescent modification is not only characteristic of anatomically modern humans; it likely has its roots far back in the Homo genus (14, 16), where substantial wear effects are found across the documented species (17).

Fig. 2 Adult skulls displaying edge-to-edge bite versus overbite and overjet.

(A) Female, ~30 years old, Arene Candide Cave (Italy), late Upper Paleolithic, displaying edge-to-edge bite. (B) Female, ~30 years old, Schela Cladovei (Romania), Mesolithic, displaying edge-to-edge bite. (C) Male, ~40 years old, Hainburg (Austria), Early Bronze Age (~3600 BP), displaying overbite and overjet. Images are not to scale. [Photo credits: (A) David Frayer, Department of Anthropology, University of Kansas, USA; (B) Mihai Constantinescu, Institutul de Antropologie “Fr. J. Rainer,” Bucharest, Romania; (C) Karin Wiltschke-Schrotta, Department of Anthropology, Naturhistorisches Museum Wien, Austria]

In contrast, individuals living in most contemporary populations typically preserve overbite and overjet long into adulthood (13, 18). Although the sources of occlusal wear are diverse (19), the most common cause of tooth abrasion is food, which produces wear over the entire occlusal surface. Soft diets common in most contemporary populations reduce the number of bites for chewing and the exposure of tooth surfaces to friction due to exogenous material contact, thus reducing wear throughout the lifetime of the individual (14). Moreover, soft diets exert less biomechanical pressure on the jaw. This leads to a shorter mandible (20, 21), representing another potential cause of overbite and overjet preservation. Softening food through cooking and the use of preservation techniques was considerably facilitated by the invention of pottery (22, 23), for which the earliest evidence is found in the Epipaleolithic of eastern Asia (24, 25). Pottery use markedly increased with agriculture and the associated need for storage; therefore, these developments made softer diets more widely accessible. As a result, edge-to-edge bite has become exceedingly rare in populations with access to softer food, and the persistence of overbite and overjet is now largely considered the norm.

Biomechanical modeling of labiodental production

Hockett’s hypothesis suggests that the more recent bite configuration (with overbite and overjet) eases and makes more likely the articulation of labiodentals. To assess the cost of articulation, we adapted a biomechanical simulation of the orofacial structures and musculature to compare overbite and overjet with the edge-to-edge bite configuration using ArtiSynth (26) (see methods). Labiodental stricture is defined using the inverse model (27), which computes the muscle activity required to achieve preestablished time-varying targets. One inverse target is defined for a midline node of the facial mesh found on the superior-posterior surface of the lower lip. This target starts at rest, then is moved to a midsagittal reference vertex on the tip of the central incisors of the maxilla mesh, and then, after a brief sustain, finally returns to its resting position (see Movies 1 and 2 and figs. S1 and S2). Two additional targets are used on either side of the upper lip to assist in raising it slightly. Following earlier work (28, 29), we measure articulatory effort as the integral of the force output of all muscles active in the simulation over time, expressed as a percentage of the total maximum force generation property of all musculature in the model (which is the same in both models).
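The effort measure described above can be illustrated with a minimal Python sketch. It is not the ArtiSynth implementation: the function names, the muscle labels, and the force traces are hypothetical, and the normalization by total maximum force times simulation duration is an assumption made here so that the result is a dimensionless percentage.

```python
def trapezoid(values, times):
    """Integrate sampled values over time with the trapezoid rule."""
    return sum(
        (times[i + 1] - times[i]) * (values[i] + values[i + 1]) / 2.0
        for i in range(len(times) - 1)
    )


def articulatory_effort(times, force_by_muscle, max_force_by_muscle):
    """Return effort as a percentage of the model's total force capacity.

    Assumption: the time integral of summed muscle force is divided by
    (sum of maximum forces) x (simulation duration).
    """
    summed_force = [
        sum(trace[i] for trace in force_by_muscle.values())
        for i in range(len(times))
    ]
    capacity = sum(max_force_by_muscle.values()) * (times[-1] - times[0])
    return 100.0 * trapezoid(summed_force, times) / capacity


def relative_saving(effort_reference, effort_other):
    """Percentage by which effort_other is lower than effort_reference."""
    return 100.0 * (effort_reference - effort_other) / effort_reference


# Hypothetical force traces (arbitrary units), for illustration only.
times = [0.0, 1.0, 2.0]
forces = {"mentalis": [0.0, 2.0, 0.0], "zygomaticus": [0.0, 2.0, 0.0]}
max_forces = {"mentalis": 4.0, "zygomaticus": 4.0}

effort = articulatory_effort(times, forces, max_forces)
print(f"effort: {effort:.1f}% of capacity")
```

Because the total maximum force is the same in both bite models, two efforts computed this way can be compared directly, for example with `relative_saving(effort_edge_to_edge, effort_overbite_overjet)`.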

Movie 1. Simulation of the production of a labiodental fricative (“f” or “v”) under overbite and overjet.

Here and in the other movies, the cyan dots in the lips serve as reference for the measurement of the proximity between lips and anterior upper teeth during the articulation.

Movie 2. Simulation of the production of a labiodental fricative (“f” or “v”) under edge-to-edge bite.

The simulation results indicate that labiodental production in the overbite and overjet condition is approximately 29% less costly than in the edge-to-edge condition (see Fig. 3A). The model also indicates differential force exertion by specific muscles (Fig. 3B). The overbite and overjet condition shows less force for most muscles than the edge-to-edge condition. Among the muscles that show more force in the edge-to-edge bite, the mentalis is important in drawing the lower lip toward the upper incisors. The zygomatic muscles are highly active as well and may complement lower lip retraction and raising, although they primarily act on the corners of the mouth. The lip raisers (levator labii superioris and levator anguli oris) help to clear the upper oral vestibule (preventing collision between the upper and lower lips) to admit the elevation of the lower lip to the maxillary incisors, but the levator anguli oris muscles may also help in drawing the lower lip upward (even though they also act primarily on the corners of the mouth).

Fig. 3 Relative muscle effort in the production of labiodentals between the edge-to-edge and the overbite and overjet bite configurations.

(A) Sum of the total muscle force expressed as a percentage of the total maximum force of all muscles in the model. (B) Specific effort by muscle. Overall, labiodental articulation incurs less muscular effort in the overbite and overjet configuration than in the edge-to-edge configuration.

To determine whether these differences are specific or particularly salient in the articulation of labiodentals, rather than general consequences of bite configuration, we performed a second set of simulations where we examined the production of a pair of bilabial segments (a stop and an approximant such as “p” and “w”), which are close to labiodentals in place of articulation (see Fig. 1) and which should also be influenced by the positioning of the teeth and the jaw. The two bilabial segments differ in their degree of stricture, with the stop having complete closure of the lips, and the approximant having just a slightly narrowed lip aperture (see Movies 3 to 6). Interestingly, for both of these articulations, the overbite and overjet condition actually requires more muscle effort than the edge-to-edge condition (see Table 1), resulting from the fact that the lips are closer together in the edge-to-edge case. However, the relative increase in effort of labiodental as opposed to bilabial articulation is weaker in the overbite and overjet condition. Together, these results suggest that the transition to an overbite and overjet configuration has a distinct effect on the ease of labiodentals, while bilabials become more difficult.

Movie 3. Simulation of the production of a bilabial stop (“p” or “b”) under overbite and overjet.

Movie 4. Simulation of the production of a bilabial stop (“p” or “b”) under edge-to-edge bite.

Movie 5. Simulation of the production of a bilabial approximant (“w”) under overbite and overjet.

Movie 6. Simulation of the production of a bilabial approximant (“w”) under edge-to-edge bite.

Table 1 Relative total muscle effort of labiodental fricatives, bilabial stops, and bilabial approximants in the biomechanical model under the overbite and overjet configuration and the edge-to-edge configuration.

The total cost of a labiodental fricative in the edge-to-edge bite is used as the unit of comparison, with estimates rounded to the nearest decimal. Labiodentals require more effort than bilabials in either bite configuration, but the effort increases more in the edge-to-edge configuration than in the overbite and overjet configuration.


Our models furthermore suggest that bilabial strictures in the overbite and overjet condition display greater incidental labiodental stricture (i.e., less distance between the lower lip and the central maxillary incisors) than in the edge-to-edge case. At the point of maximal bilabial stricture, for bilabial stops, the teeth and lips are 0.8 mm away from a baseline labiodental trajectory in the overbite and overjet condition, in contrast to 3.4 mm in the edge-to-edge condition; for bilabial approximants, the teeth and lips are 5.2 mm away from a baseline labiodental trajectory in the overbite and overjet condition, versus 7.0 mm in the edge-to-edge condition (Fig. 4). This suggests that bilabial targets may be more prone to accidental realization as labiodentals under the overbite and overjet condition. Consistent with this, in the extreme case of class II malocclusion (i.e., excessive overjet) in contemporary populations, labiodentals are sometimes used as substitutes for bilabials because of difficulty achieving contact between the upper and lower lip (30).
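The reported distances can be turned into ratios with a few lines of Python. The four distances are taken from the paragraph above; the dictionary layout and the function name are illustrative choices.

```python
# Tooth-lip distances (mm) at maximal bilabial stricture, as reported in the text.
distances_mm = {
    "bilabial stop": {"overbite_overjet": 0.8, "edge_to_edge": 3.4},
    "bilabial approximant": {"overbite_overjet": 5.2, "edge_to_edge": 7.0},
}


def distance_ratio_percent(overbite_overjet, edge_to_edge):
    """Overbite/overjet distance as a percentage of the edge-to-edge distance."""
    return 100.0 * overbite_overjet / edge_to_edge


ratios = {
    sound: distance_ratio_percent(d["overbite_overjet"], d["edge_to_edge"])
    for sound, d in distances_mm.items()
}

for sound, ratio in ratios.items():
    print(f"{sound}: {ratio:.0f}% of the edge-to-edge distance")
```

With these rounded inputs the stop comes out at about 24% and the approximant at about 74%, which is close to, though not identical with, the upper end of the 24 to 70% range quoted in the abstract.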

Fig. 4 Tooth-lip distance during production across articulations and bite configurations, defined as the distance between the lower lip and upper incisors.

Note that in bilabials, this distance results merely as a by-product of the main stricture of these sounds, which is between the lower and upper lips (not shown in the figure). In the overbite and overjet condition, the approximant and stop bilabials follow a trajectory similar to that of labiodental fricatives.

Biomechanical effects on language change

As in other aspects of language production (31), relative differences in effort and error lead to systematic biases in production frequencies that in turn shape the probability of perceptual recategorization and change over time, especially when errors are perceptually salient (6, 28, 32, 33). Our biomechanical models suggest that such processes were likely to have affected labiodentals. The post-Neolithic emergence of overbite and overjet persistence led to reduced effort when producing labiodentals, and at the same time it increased the risk of accidental labiodental articulation. The resulting labiodentals are indeed perceptually highly distinctive, both aurally (34–36) and visually (37) (see Fig. 1). Given this, we hypothesize that labiodentals became likely to establish themselves and spread in populations with overbite and overjet persistence.

Furthermore, consistent with findings in historical linguistics (38), our biomechanical model suggests that bilabials are a common source of labiodentals. Given the fact that bilabials are present in the vast majority of languages [in the PHOIBLE database (3), 95% have “m,” 87% have “p,” 71% have “b”], a transition to labiodentals is therefore expected to be frequent in populations with overbite and overjet persistence. However, our models also show that bilabials incur less effort overall, under either bite configuration (see Table 1). Moreover, bilabials benefit from positive biases arising from biomechanical saturation of lip contact (39, 40) and other physical domains of speech, such as quantal effects in articulatory-acoustic relations and acoustic-auditory properties (41). Therefore, we expect bilabial articulations to remain abundant in the overbite and overjet condition, despite the emergence of labiodentals. More specifically, when bilabials develop into labiodentals, we expect this to happen only in certain positions (e.g., only word-internally), leaving other bilabials (e.g., word-initially) in place; wholesale replacement is expected to be compensated by new bilabials derived from other sources.

Change-relevant biases of the production and perception system are generally small (29, 42, 43), and the findings from our biomechanical models are no exception to this. Furthermore, biases are typically attenuated by additional factors, such as word structure and other aspects of the linguistic system (44), as well as by the complex social diffusion mechanisms that characterize language change (5, 8). However, virtually every word and every articulation of a sound constitutes a trial for potential language change. We therefore expect that over generations of speakers, change-relevant biases leave clear signals when tested in sufficiently large cross-linguistic datasets.

We tested our hypotheses as described below. In the absence of detailed global registers of bite configuration through time, we used two independent proxies to assess the predicted difference in the probability of labiodentals.

Worldwide association between subsistence type and labiodentals

Although the relation between subsistence and bite is mediated by both dietary and nondietary factors (45), food-producing societies are associated with less extensive tooth wear in comparison to hunter-gatherers (19), particularly in the anterior teeth (46). This in turn predicts that food-producing populations are more prone to develop and maintain labiodentals than hunter-gatherer populations. To test this prediction, we used a global dataset of phonological inventories (3) along with associated information on subsistence of the corresponding populations (47, 48) (see methods and the map on the summary page). Labiodentals in our sample include fricatives (“f,” “v”), affricates (“pf,” “bv”), a nasal (“ɱ”), a tap (“ⱱ”), and an approximant (“ʋ”). The distribution of these sounds is heavily skewed globally: 49% of languages sampled have “f,” 37% “v,” and 2% “pf,” while the rest (“bv,” “ɱ,” “ⱱ,” “ʋ”) occur in no more than 1% of languages (n = 1672) (3).

For modeling the effect of subsistence on the number of labiodentals, we adopted a Bayesian mixed-effects Poisson regression model. The model includes random intercepts and slopes for language family and area in order to control for phylogenetic and spatial correlation. As a further control covariate, we included the total number of nonlabiodental segments, because it is important to guarantee that any patterns found in relation to labiodentals cannot be directly explained by an overall increase or decrease of the total number of segments.
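The structure of such a model can be sketched in Python as a function that returns the expected labiodental count for one language. This is only an illustration of a log-link Poisson predictor with family and area intercepts and slopes; the parameter names and values are hypothetical, and entering the control covariate as a raw count is an assumption, not something the text specifies.

```python
import math


def expected_labiodentals(hunter_gatherer, n_other_segments, fixed, family, area):
    """Expected labiodental count under a log-link Poisson model.

    The family and area terms stand in for random intercepts and random
    slopes (on the hunter-gatherer indicator) for one language.
    """
    hg = 1.0 if hunter_gatherer else 0.0
    log_mu = (
        fixed["intercept"] + family["intercept"] + area["intercept"]
        + (fixed["hunter_gatherer"] + family["slope"] + area["slope"]) * hg
        + fixed["other_segments"] * n_other_segments
    )
    return math.exp(log_mu)


# Hypothetical parameter values, chosen only to exercise the function.
fixed = {"intercept": 0.2, "hunter_gatherer": -1.0, "other_segments": 0.01}
family = {"intercept": 0.1, "slope": 0.0}
area = {"intercept": -0.3, "slope": 0.0}

food_producer = expected_labiodentals(False, 30, fixed, family, area)
forager = expected_labiodentals(True, 30, fixed, family, area)
ratio = forager / food_producer
print(f"expected-count ratio (forager / food producer): {ratio:.3f}")
```

With the log link, the ratio of expected counts between the two subsistence types depends only on the subsistence coefficient and slopes, not on the intercepts or the control covariate.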

We found that, on average, hunter-gatherer societies have only about 27% of the number of labiodentals exhibited by food-producing societies [GMR dataset: λ = –1.31, 95% credible interval: (–2.8, –0.27); AUTOTYP dataset: λ = –1.41, 95% credible interval: (–3.37, –0.08); see Fig. 5 and methods]. As a way of evaluating the support for our model, we performed pseudo-Bayesian model averaging (49). We compared three nested models: the full model, a model without a subsistence population effect but with all subsistence group effects, and a model with no fixed or random effects for subsistence (the baseline condition). The results indicate that the first two models leverage more than 90% of the total weight (see fig. S3), thus lending support to the notion that subsistence plays a role in accurately predicting labiodental counts.
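Two pieces of arithmetic behind this paragraph can be sketched in Python. First, under a log link a coefficient converts to a ratio of expected counts by exponentiation; the coefficients and interval bounds below are the ones reported above. Second, in its common formulation pseudo-Bayesian model averaging assigns each model a weight proportional to the exponential of its estimated expected log predictive density (elpd); the elpd values used here are hypothetical, and whether the authors used a bootstrapped variant is not stated in this section.

```python
import math


def rate_ratio(coefficient):
    """Convert a log-link Poisson coefficient to a ratio of expected counts."""
    return math.exp(coefficient)


# Coefficients and 95% credible intervals as reported in the text.
estimates = {
    "GMR": {"coef": -1.31, "interval": (-2.8, -0.27)},
    "AUTOTYP": {"coef": -1.41, "interval": (-3.37, -0.08)},
}

for name, est in estimates.items():
    low, high = (rate_ratio(bound) for bound in est["interval"])
    print(f"{name}: ratio {rate_ratio(est['coef']):.2f} "
          f"(interval {low:.2f} to {high:.2f})")


def pseudo_bma_weights(elpd_values):
    """Weights proportional to exp(elpd), computed stably via the maximum."""
    top = max(elpd_values)
    unnormalized = [math.exp(value - top) for value in elpd_values]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical elpd estimates: full model, reduced model, baseline.
weights = pseudo_bma_weights([-1000.0, -1000.5, -1003.0])
print([round(w, 3) for w in weights])
```

The GMR coefficient of –1.31 corresponds to a ratio of about 0.27, which is the 27% figure quoted above.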

Fig. 5 Posterior distributions of target parameters in a Bayesian Poisson regression model with log link function for two subsistence databases, GMR and AUTOTYP.

Vertical lines indicate median (blue) and mean (orange) of each distribution. The number of nonlabiodental segments was used as control. The target parameters comprise the main fixed effect of hunter-gatherer subsistence (with inverted sign so as to maximize comparability between parameters) and four random effects, intercepts, and slopes for both geographic area and linguistic family.

Full models for both datasets yield empirically adequate posterior predictive distributions of labiodentals when compared with the observations (see fig. S4). Although some underdispersion can be detected in the data in comparison to the Poisson model, a Conway-Maxwell-Poisson model tailored for underdispersed distributions yields similar results (see methods). In addition to testing the association in the whole datasets, we performed the same analyses under more stringent conditions against our hypothesis by removing all observations in Australia, because fricatives and labiodentals are very scarce in the region (50, 51), and by using the total number of nonlabiodental fricatives (rather than the total number of nonlabiodental segments) as a main control. All of these alternative models yield smaller estimates for the mean posterior values, but nonetheless they coincide in showing that subsistence carries predictive power (see figs. S5 and S6).
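For readers unfamiliar with the Conway-Maxwell-Poisson distribution, the following Python sketch shows how its extra parameter handles underdispersion. The probability of a count y is proportional to λ^y / (y!)^ν; ν = 1 recovers the Poisson distribution and ν > 1 yields a variance smaller than the mean. The parameter values and the truncation point are illustrative assumptions, and the sketch does not reproduce the authors' regression model.

```python
import math


def cmp_pmf(lam, nu, max_count=200):
    """Conway-Maxwell-Poisson probabilities for counts 0..max_count.

    The infinite normalizing sum is truncated at max_count (an assumption
    that is harmless for the small rates used here).
    """
    log_weights = [
        y * math.log(lam) - nu * math.lgamma(y + 1) for y in range(max_count + 1)
    ]
    top = max(log_weights)
    weights = [math.exp(value - top) for value in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


def mean_and_variance(pmf):
    """Mean and variance of a distribution over counts 0, 1, 2, ..."""
    mean = sum(y * p for y, p in enumerate(pmf))
    variance = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
    return mean, variance


poisson_mean, poisson_var = mean_and_variance(cmp_pmf(2.0, 1.0))
under_mean, under_var = mean_and_variance(cmp_pmf(2.0, 2.0))
print(f"nu = 1: mean {poisson_mean:.3f}, variance {poisson_var:.3f}")
print(f"nu = 2: mean {under_mean:.3f}, variance {under_var:.3f}")
```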

Random intercepts for language family and area are consistently estimated to play a role in the model (see Fig. 5). Random slopes are more spread, yet their means and medians are of a magnitude roughly comparable to that of random intercepts. The mean posterior coefficient of subsistence on the presence of labiodentals, while displaying a wide posterior, is comparable in magnitude to the characteristic differences between linguistic families and areas—in other words, the models suggest that differences in subsistence have as substantial an impact on labiodentals as do the differences among families or geographical areas. This association stands out against the fact that subsistence type has no other known impact on linguistic structure (52, 53).

Although these statistical trends are robust on a worldwide scale, we also examined in more detail the languages spoken by native populations of Greenland, southern Africa, and Australia because for these groups, heavy anterior tooth wear and concomitant edge-to-edge bite are particularly well documented (54–57), while at the same time they represent vastly different areas and cultural traditions. We expect that the languages spoken in these three areas will either lack labiodentals, or in the cases where labiodentals do exist, they will be attributable to recent contact, for example through borrowing words from populations that are less exposed to masticatory exertion.

This expectation is borne out even though the languages in the target areas have very different sound inventories. In some cases, labiodentals are reported, but closer inspection shows that these are artifacts of orthographic practices rather than actual labiodental sounds (58) (see supplementary text). In the few cases where labiodentals do exist, they tend to be the result of recent borrowings through contact with European languages that have them (Fig. 6).

Fig. 6 Languages spoken in Greenland, southern Africa (Khoisan), and Australia that have gained labiodentals through language contact.

(A) West Greenlandic has acquired a labiodental through contact with Danish. (B) Some Khoisan languages have labiodentals through contact with Germanic languages such as Afrikaans. (C) Two languages in Australia have labiodentals (3), Kunjen and Ngan'gikurrungkurr, the latter through contact with English.

In Greenland, three dialects bordering on mutual unintelligibility (59) are spoken on the northern, eastern, and southwestern coasts. Both Northwestern and East Greenlandic (which are smaller and endangered) lack labiodental contrasts (60). These dialects lack official standard status, spoken or literary, and their speakers reportedly resist assimilation (61). West Greenlandic, however, is spoken in the hospitable area on the southwest coast, where its people (roughly 45,000 today) have been in sustained contact with Europeans since the 18th century, including Danes, Germans, and Norwegians. West Greenlandic has long been documented as having bilabial fricatives (62–64), but only recently has it been documented as having a labiodental, which varies in pronunciation with some older speakers producing it bilabially (65). The voiceless labiodental fricative only occurs in Danish loanwords such as “filmi,” which suggests that a labiodental contrast is being acquired through prolonged language contact.

The non-Bantu languages spoken in southern Africa, collectively referred to as Khoisan, provide another example of recent language contact situations that led to the adoption of labiodentals. Although Khoisan-speaking people share a common ancestry before the southward expansion of Bantu groups into their range (66), linguistically they fall into several unrelated families (67). In Khoekhoe, the largest Khoisan language, “f” and “v” are found in loanwords from Germanic languages (68), barring a “v” that serves as an allophonic variant of “b” in root medial position (69). Similarly, within the San languages (i.e., Tuu and Kx’a), the only words containing “f” and “v” are loans (70, 71). Sandawe, a click language spoken in Tanzania that might be distantly related to some of the Khoisan languages, has the labiodental fricative “f,” but it occurs in only 5 of 1450 dictionary entries (72, 73). In general, labiodentals are absent across the board in the description of Khoisan languages. This is particularly remarkable because these languages possess the largest consonant inventories worldwide (3).

In the Australian languages sampled (N = 343), there are only two languages, Kunjen and Ngan’gikurrungkurr, that reportedly contain a labiodental “f.” Kunjen (Oykangand dialect) has a labiodental fricative “f” with allophones (“f,” “v,” “ϕ,” “β”) (74). In general, dialects of Kunjen have phonological characteristics that are atypical of other Australian languages (74). Ngan’gikurrungkurr has a fricative/stop contrast in which the fricative is usually pronounced bilabially by older speakers but labiodentally by younger speakers (75, 76). This variation across age suggests the effects of increasing influence from English, in parallel to what is observed for West Greenlandic.

The absence of labiodentals in Australia has also been argued to derive from a general constraint against fricatives, potentially linked to chronic otitis media during language acquisition and its detrimental effects on fricative perception (51). However, a general fricative constraint is unlikely to account for the facts alone, for three reasons. First, the incidence of fricatives in Australia is in fact unclear, as there is a sizable number of languages with nonlabiodental fricatives (50). Second, the effect of subsistence type on labiodentals remains even when we explicitly control for the number of fricatives in our statistical model. Third, labiodentals need not be fricatives. Nonfricative labiodentals, such as labiodental taps (as in the realization of German “w”) or labiodental approximants (as in the realization of “v” in some variants of Hindi), are options that could have easily arisen in Australia if there were no bias against labiodentals and only a bias against fricatives.

Increase of labiodentals during the history of Indo-European

We also investigated whether the spread of overbite and overjet persistence was influenced by the development of agricultural and food processing technology over historical time. Our prediction was that over the past few thousand years, the increase in the production and availability of softer diets caused a gradual rise in the probability of developing and maintaining labiodentals.

To test this prediction, we reconstructed the evolution of labiodentals in high-resolution phylogenies of the Indo-European language family. This family is an ideal test case for two reasons. First, the sheer size of the family makes it possible to pick up subtle statistical signals of language change while its wide geographic extent—ranging from Iceland to eastern India—ensures that any such signal is not simply due to the contingencies of local history and language contact. Second, both the linguistic evolution and the cultural evolution of the Indo-European family are well understood, with sources of evidence in unparalleled richness. On the side of language, a century of detailed research on the relationship among the sound systems in the family (77) makes it possible to reconstruct individual articulations with probabilistic models. Critically, we even have detailed records on how sounds were produced from more than 2500 years ago (78), so models can be reliably calibrated in time.

On the side of culture, the evolution of food processing is also relatively well known. There are ample records of processed dairy products and cereals in societies as old as 3500 BP throughout the Indo-European family, reflecting traditions reaching back even earlier (79, 80). In line with this, the archaeological record of skulls offers evidence of overbite and overjet persistence as early as 4300 years ago in Pakistan (81), 3600 years ago in Europe (82), and 2400 years ago in Central India (83). For the western part of the Indo-European family, the Greco-Roman tradition includes strong intensification of food processing in the form of water-driven milling. Industrial milling started at least 2300 years ago and led to a massive spread of softer diets (84–86). Together with a growth in dairy processing, this spread even left a trace in several biological pathways of European populations (87). Although an ascertainment bias cannot be excluded, these findings on food processing led us to expect that overbite and overjet persistence, and hence the spread of labiodentals, was particularly prominent in the western part of Indo-European since antiquity.

We based our reconstruction of articulation change on sets of sounds that correspond to each other historically, as traceable through cognate words. Historical linguistics has established 10 such correspondence sets that include labiodentals in at least one daughter language. For example, one set relates Italian “p” to English “f,” as attested in the cognacy of words such as “padre” in Italian and “father” in English; another set relates Italian “v” to English “k,” as attested by words such as Italian “venire” and English “come” (77). These sets are conventionally denoted by symbols that hypothesize their value in the proto-language—for example, *p for the {p, f, …} set and *gw for the {v, k, …} set—but the actual proto-articulation is not in fact established (see methods). To reconstruct the proto-articulation, we applied Bayesian phylogenetic methods to each of the 10 sets, estimating the probability of labiodental versus nonlabiodental articulation over time.

Stochastic character mapping (88–90) suggests that in all 10 sets, labiodental articulations are considerably less likely in earlier time periods of the family (Fig. 7). The median probability of a labiodental articulation at the root is about 3% [a result that converges with estimates from BayesTraits (91) (supplementary text and fig. S12)]. There is only one correspondence set that reaches 50%. This is the set *w, as reflected in English “wind” or its Latin cognate “ventus,” which frequently changes back and forth so that reconstruction becomes highly uncertain (see Fig. 7B, supplementary materials, and figs. S11, S17, and S18). Probabilities well above 50% emerge only considerably later, slowly starting between 6000 and 4000 years ago. Under an alternative phylogeny with a younger root estimate (92), the probability estimates at the root are similar, but the increase of labiodentals is estimated to have started only between 3500 and 4500 years ago (see supplementary materials and fig. S33). Allowing for the uncertainty in the phylogenies, both estimates are consistent with the time range in which dairy products and cereal are documented to have become prominent in early Indo-European societies.

Fig. 7 Estimated probabilities of labiodental articulation across sets of historically related sounds in Indo-European.

Sets are labeled by traditional conventions in Indo-European studies (e.g., *p is the set that groups “p” in Italian with “f” in English; *gʷ is the set that groups Italian “v” with English “k”). (A) Extant distribution of labiodental (red) versus nonlabiodental (blue) articulation of cognate sounds (an open square means that the actual articulation is unknown), mapped to one of two phylogenies (104) used in our models (for the other phylogeny, see fig. S8). (B) Estimated labiodental probability as inferred by stochastic character mapping (88–90). Languages and clades are ordered as in (A). (C) Traitgram of the simultaneous increase in labiodental probability across correspondence sets. For zoomed-in displays on each individual correspondeme, see figs. S13 to S32; for estimates based on an alternative phylogeny (92), see fig. S33.

Under either phylogeny, our models furthermore suggest a steep rise of labiodental probabilities after about 2500 years ago. This rise affects several correspondence sets and is particularly pronounced in the Italo-Celtic (from Irish to French in the lower part of Fig. 7A), Germanic (from Gothic to German), and Greek branches. This fits with the strong impact of industrial milling in the western part of the family that began at around the same time. Beyond this, there are other geographical and branch-specific patterns that emerge from our model, but further research is needed to disentangle the interplay of linguistic and social factors that may have accelerated or slowed down individual developments.

Note that the low probability of ancestral labiodentals and the slow rise of this probability do not simply follow from the modern frequency of these sounds. Consider, for example, the first two sets in Fig. 7A: *bh as reflected, for example, in English “brother” and “navel” (with labiodentals depending on position) and *p as reflected in “father” and “have.” These sets have labiodental reflexes in 52% and 46% of the extant languages in our sample, respectively, but their labiodental probability at the root is only 14% and 18% (or 9% and 24% in the alternative phylogeny). This means that they are much more likely to have been articulated as nonlabiodental sounds, possibly indeed as “bh” and “p,” in line with traditional hypotheses (77).


Our findings suggest that the wane of edge-to-edge bite configuration since the Neolithic gradually facilitated the emergence and spread of labiodental sounds in languages. Specifically, we find a substantial difference in labiodental production effort and production stability between bite configurations, a well-established mechanism of bite change resulting from wear, a worldwide association between subsistence-induced diet differences and the presence of labiodentals, and a recent increase of labiodentals driven by diet changes in a large and well-studied language family spanning at least six or seven millennia.

Although more work is now needed in reconstructing the precise time course of labiodentals in other language families of the world, our findings open prospects for embedding such work in a more comprehensive, cross-disciplinary view of language dynamics. Specifically, our results suggest that the global socioeconomic history of food affected not only bite configuration, but also humanity’s most distinctive marker of social differentiation, language. If labiodentals served as an index of social class as a result of diet, prestige-driven processes of language change (93, 94) predict their potential to become selected as a general trait of a language. Thus, by combining sociolinguistic and anthropological work, it becomes possible to tackle not only how a change in language has taken place through time, but also why it has taken place, thus addressing an old yet unresolved problem (95).

Our studies reveal that the range and probabilities of speech sounds found across languages are not independent of large-scale changes in human ecology and biology, and thus we can no longer take for granted that the diversity of speech has remained stable since the emergence of Homo sapiens. As such, claims of language universals, deep linguistic history, and language evolution cannot rely on a uniformitarian assumption (96) without considering the wider anthropological context of language.


Biomechanical modeling of labiodental production

We examined orofacial biomechanics using the ArtiSynth biomechanical modeling toolkit (26). ArtiSynth has previously been used to examine how (biomechanical) articulatory effort influences speech variation (27, 29). Here, our model consists of a three-dimensional finite-element face (40) integrated with rigid body skeletal structure for the maxilla, mandible, and hyoid bone (40, 97, 98) and connected via point-to-point axial muscles. The geometries, material properties, and coupling are based on these source models. Details on the setup and geometry of the bite models and the articulations can be found in the supplementary text.

All simulations feature an onset (narrowing/constricting) phase from 0.0 to 0.2 s, followed by a sustain phase from 0.2 to 0.3 s, and then a release phase from 0.3 to 0.5 s. The bilabial approximant design has temporal targets for four nodes, three on the edge of the lower lip (one medial and two closer to the corners of the mouth) and a medial one on the upper lip. These cause the lips to protrude slightly and compress in a lip-rounding movement (similar to common productions of “w”). The medial node targets were specified to be lower by 1 mm and to be separated by 4 mm at maximal stricture (from 0.2 to 0.3 s). The bilabial stop design specifies that a single medial node on the edge of the lower lip will move 1.5 mm past a projected location on the upper lip (and hence lying in the upper lip), causing compression of the upper and lower lips. Finally, the labiodental fricative design has temporal targets for three nodes. One of these specifies that a medial node on the edge of the lower lip moves to the location of a reference vertex on the maxilla mesh situated at the point where the upper central incisors meet. To prevent contact between the lower and upper lips during this movement (which is especially important in the edge-to-edge case), the upper lip is raised (and slightly advanced) by targets for two laterally situated nodes. Note that both conditions were subject to the same set of relative inverse target specifications to ensure full comparability. Also, note that the simulations were only of the articulatory-biomechanical properties (i.e., aerodynamics and acoustics were not simulated).
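The three-phase timing described above can be written down as a small lookup. This is only an illustrative sketch, not part of the ArtiSynth setup: the phase boundaries come from the text, whereas the names `PHASES` and `phase_at` and the choice to treat each boundary instant as belonging to the later phase are assumptions made here.

```python
# Phase boundaries in seconds, as stated in the text.
# The names and the half-open interval convention are illustrative assumptions.
PHASES = [
    ("onset", 0.0, 0.2),    # narrowing/constricting
    ("sustain", 0.2, 0.3),  # maximal stricture is held
    ("release", 0.3, 0.5),
]


def phase_at(t):
    """Return the phase name at time t (seconds), or None outside the simulation."""
    for name, start, end in PHASES:
        if start <= t < end:
            return name
    # Treat the final instant of the simulation as part of the last phase.
    if t == PHASES[-1][2]:
        return PHASES[-1][0]
    return None
```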

Worldwide association between subsistence type and labiodentals

We used the phonological inventory data and the phonetic feature system from PHOIBLE Online (3). For the subsistence data, we used two sources: (i) the data from AUTOTYP (47) where speaker populations are classified as “hunting/fishing/gathering/foraging” versus “food production” based on the majority of the population’s diet until recently, and (ii) the list of languages spoken by hunter-gatherers (48) with the extra condition that all Australian societies are coded as hunter-gatherers. This list is meant by the compilers to exhaust the known hunter-gatherer languages of the world, and we therefore assumed that all languages that are in PHOIBLE but not in the list are spoken by food-producing societies. We refer to this list as GMR. The AUTOTYP list has positive specifications for both subsistence types and is therefore more reliable, whereas the GMR sample possibly underestimates hunter-gatherer societies. However, because GMR is about five times the size of AUTOTYP (N = 2030 versus N = 406), we performed all analyses on each list separately, assessing convergence of the evidence.
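The coding rule for the GMR list can be sketched as follows. The function name and the language identifiers are hypothetical placeholders, not entries from PHOIBLE or from the list in (48); the sketch only illustrates the closed-world assumption that any language absent from the hunter-gatherer list is treated as food-producing, with all Australian languages coded as hunter-gatherer.

```python
def code_subsistence(phoible_languages, gmr_hunter_gatherers, australian_languages):
    """Map each language to 'hunter-gatherer' or 'food-producing' under the GMR rule.

    A language is hunter-gatherer if it is on the GMR list or is Australian;
    every other language in the inventory sample is assumed to be food-producing.
    """
    hunter_gatherers = set(gmr_hunter_gatherers) | set(australian_languages)
    return {
        lang: "hunter-gatherer" if lang in hunter_gatherers else "food-producing"
        for lang in phoible_languages
    }


# Hypothetical identifiers, for illustration only.
coding = code_subsistence(
    phoible_languages=["lang_a", "lang_b", "lang_c", "lang_d"],
    gmr_hunter_gatherers=["lang_a"],
    australian_languages=["lang_c"],
)
```

Because absence from the list is read as a positive classification, any hunter-gatherer language missing from the list is silently coded as food-producing, which is why this coding may underestimate hunter-gatherer societies.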

We included the language family from Glottolog (99) and different regions (North America, South America, Mesoamerica, Africa, Papua New Guinea, western and southwestern Eurasia, south and southeast Asia, Pacific, Australia, and north central Asia) from AUTOTYP (47) as random intercepts and random slopes. Language family was preprocessed in two ways to improve convergence and interpretability: For the intercept, all language families represented by a single language were aggregated into a common dummy category, and for the slope we performed the same procedure for all families that have uniform subsistence. Because of this coding decision, the levels of the random intercept and slope of language family did not always match. As a result, we did not model the correlation between the random intercept and the random slope for language family.
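The two preprocessing steps for language family can be illustrated with a short sketch. The function names, the dummy labels, and the toy data are assumptions for illustration; the original analysis was carried out in R and its exact implementation is not shown here.

```python
from collections import Counter


def collapse_singletons(families, dummy="SINGLETON"):
    """Intercept coding: families represented by one language share a dummy level."""
    counts = Counter(families)
    return [fam if counts[fam] > 1 else dummy for fam in families]


def collapse_uniform(families, subsistence, dummy="UNIFORM"):
    """Slope coding: families whose languages all share one subsistence type
    share a dummy level, since a within-family slope cannot be estimated."""
    types_by_family = {}
    for fam, sub in zip(families, subsistence):
        types_by_family.setdefault(fam, set()).add(sub)
    return [fam if len(types_by_family[fam]) > 1 else dummy for fam in families]


# Toy data: one row per language (hypothetical family labels).
families = ["A", "A", "B", "C", "C"]
subsistence = ["hg", "fp", "fp", "fp", "fp"]

intercept_levels = collapse_singletons(families)
slope_levels = collapse_uniform(families, subsistence)
```

In this toy example family C keeps its own intercept level but is merged for the slope, which shows how the two sets of levels can fail to match.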

We carried out the statistical evaluation in a Bayesian regression framework (100), using weakly informative normal priors for the fixed (population-level) effects (mean = 0, SD = 5), a half–Student t distribution with three degrees of freedom and ad hoc large variance for the prior on the variance of the random (group-level) effects, and a flat distribution on the correlation matrix of the random effects for region. We ran four chains with 4500 iterations each (out of which 1000 were for warm-up). In all cases, all of the parameters showed that convergence had been achieved as measured by the Gelman-Rubin statistic (101). The posterior distributions for the target parameters for all three conditions (all data, all data minus Australia, all data minus Australia and nonlabiodental fricatives as control instead of nonlabiodental segments) in the two datasets can be seen in figs. S4 and S5.
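For readers unfamiliar with the convergence diagnostic, the following sketch computes the classic Gelman-Rubin statistic from several chains of draws for one parameter. It is an assumption that this textbook form matches what the fitting software reports, since implementations often use split-chain or rank-normalized variants; the simulated draws are placeholders, not posterior samples from the models above. The chain count and iteration numbers are taken from the text.

```python
import random
from statistics import mean, variance


def gelman_rubin(chains):
    """Classic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    within = mean([variance(chain) for chain in chains])          # W
    between = n * variance([mean(chain) for chain in chains])     # B
    pooled = (n - 1) / n * within + between / n                   # pooled variance estimate
    return (pooled / within) ** 0.5


N_CHAINS = 4
ITERATIONS = 4500
WARMUP = 1000
KEPT = ITERATIONS - WARMUP  # post-warm-up draws per chain

rng = random.Random(0)
# Placeholder draws: four chains sampling the same distribution.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(KEPT)] for _ in range(N_CHAINS)]
# Placeholder draws: two chains stuck in different regions.
stuck = [[rng.gauss(mu, 1.0) for _ in range(1000)] for mu in (0.0, 5.0)]

rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
```

Values close to 1 indicate that between-chain and within-chain variability agree, whereas chains that explore different regions give values well above 1.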

The posterior predictive distributions of the models adequately approximated the empirical distributions of labiodentals (see fig. S6), although the empirical distributions seemed to be underdispersed relative to the Poisson model. To evaluate whether this potential difference in the distribution of the response variable could affect the overall conclusions, we compared the Poisson model with a Conway-Maxwell-Poisson (CMP) model [which allows for an extra parameter k controlling for the excess (k > 1) or deficit (k < 1) of variance in relation to the basic Poisson model, which is effectively k = 1]. We compared these models in a frequentist framework because an efficient package is available for this purpose (102). Keeping in mind the limitations of model selection in an M-open context (49), the Akaike information criterion showed that the best fit for the GMR data is CMP (followed by the Poisson model with ΔAIC = 95.5), whereas in the AUTOTYP data the basic Poisson model comes out on top (followed closely by CMP with ΔAIC = 1.9). As expected, the CMP model revealed underdispersion (AUTOTYP: k = 0.92, GMR: k = 0.53). However, in both datasets, the effect of subsistence remained negative and significant at α = 0.05. Furthermore, comparing the fitted values of the CMP and the Bayesian Poisson model yielded a consistent picture (see fig. S7).
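To illustrate how a CMP distribution can capture a deficit of variance, the sketch below evaluates the textbook CMP probability mass function, P(X = x) ∝ λ^x / (x!)^ν, and its variance-to-mean ratio. Note the assumption: this uses the standard ν parameterization, in which ν = 1 recovers the Poisson distribution and ν > 1 yields underdispersion. The mapping between ν and the parameter k reported above depends on the package used (102) and is not assumed here. The truncation point and the parameter values are illustrative.

```python
import math


def cmp_pmf(lam, nu, max_x=200):
    """CMP probabilities for x = 0..max_x, normalized over the truncated support."""
    # Work in log space to avoid overflow in lam**x and factorials.
    log_w = [x * math.log(lam) - nu * math.lgamma(x + 1) for x in range(max_x + 1)]
    top = max(log_w)
    weights = [math.exp(v - top) for v in log_w]
    total = sum(weights)
    return [w / total for w in weights]


def dispersion_index(lam, nu):
    """Variance divided by mean; equals 1 for a Poisson distribution."""
    probs = cmp_pmf(lam, nu)
    m = sum(x * p for x, p in enumerate(probs))
    v = sum((x - m) ** 2 * p for x, p in enumerate(probs))
    return v / m


poisson_like = dispersion_index(3.0, 1.0)    # nu = 1: Poisson special case
underdispersed = dispersion_index(3.0, 2.0)  # nu > 1: variance below the mean
overdispersed = dispersion_index(3.0, 0.5)   # nu < 1: variance above the mean
```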

The analyses were performed with the R statistical software version 3.4.3 (103). Code and data are available at

Phylogenetic study

The supplementary text lists all correspondence sets used here, with cognate words that support each correspondence. See Fig. 7 for how these map to a phylogeny of Indo-European (104) and fig. S8 for an alternative phylogeny (92). Reconstructing the actual sound at the origin of any such set is nontrivial. Traditional attempts are denoted by an asterisk (e.g., *gw for the {v, k, ...} set), and they rely on considerations of maximum parsimony, coupled with a tacit assumption that the proto-articulations closely reflect what is reported in the earliest descriptions of languages such as Latin or Ancient Greek. However, maximum-parsimony methods are ill-suited because sound changes can be completely reversed within relatively short time spans. For example, the Germanic nasal “m” became “f” before “n” in early Old Swedish nafn (“name”) but reverted to namn soon after (105). Moreover, even the earliest attestations are approximately only half as old as Proto-Indo-European (92, 104), leaving substantial uncertainty in the early history.

We therefore took a fresh approach and modeled sound change as continuous-time Markov chains (CTMC), allowing for reversals and different transition rates between states. We fit CTMC models using the Markov chain Monte Carlo (MCMC) sampling implemented in BayesTraits version 2 (91). We then took the best-fitting models to estimate ancestral values and to reconstruct values for each time interval of Indo-European. For this we applied stochastic character mapping (88–90), a method that has proven valid for linguistic reconstruction elsewhere (106). We report details of parameter choices, rate estimates, and model fits in the supplementary text. An R (103) script performing all analyses and the input data are available at
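As a minimal illustration of the modeling idea (not the BayesTraits implementation), a two-state CTMC with states nonlabiodental (0) and labiodental (1) has a closed-form transition matrix. The rates `gain` (0 to 1) and `loss` (1 to 0) may differ, and a nonzero `loss` is what permits reversals. The function name and the rate values are hypothetical and are not estimates from the study.

```python
import math


def transition_probs(gain, loss, t):
    """Transition matrix P(t) of a two-state CTMC.

    State 0 = nonlabiodental, state 1 = labiodental.
    gain is the 0 -> 1 rate, loss is the 1 -> 0 rate, t is elapsed time
    in the same units as the rates. Row i gives probabilities of ending
    in state 0 or 1 when starting in state i.
    """
    total = gain + loss
    decay = math.exp(-total * t)
    p01 = gain / total * (1.0 - decay)
    p10 = loss / total * (1.0 - decay)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


# Hypothetical rates per millennium, for illustration only.
GAIN, LOSS = 0.2, 0.1
p_short = transition_probs(GAIN, LOSS, 1.0)
p_long = transition_probs(GAIN, LOSS, 1000.0)
```

Over long time spans the probability of the labiodental state approaches gain / (gain + loss) regardless of the starting state, so a low root probability under such a model reflects the data along the tree rather than the long-run equilibrium alone.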

Supplementary Materials

Supplementary Text

Figs. S1 to S33

References (107–143)

References and Notes

Acknowledgments: We thank A. Margvelashvili for helpful advice on the paleoanthropological literature, Y. Kaifu for the Jomon skull image, D. Frayer for the Upper Paleolithic skull image, A. D. Soficaru and M. Constantinescu for the Mesolithic skull image, and K. Wiltschke-Schrotta for the Bronze Age skull image. S.R.M. thanks I. Stavness, S. Fels, J. Lloyd, P. Anderson, A. Sanchez, and B. Gick, among many others, for the use of ArtiSynth. We also thank C. Anderson, C. Bowern, C. Cathcart, R. Corruccini, N. von Cramon-Taubadel, T. Güldemann, P. Heggarty, J. Mansfield, D. McCloy, H. Nakagawa, E. Round, S. Wichmann, R. Wright, and C. Zollikofer for helpful comments and suggestions. The views expressed in this article are those of the authors and do not necessarily reflect the views of the acknowledged, funding agencies, or the authors’ institutions. Funding: Supported by NWO VIDI grant 276-70-022, an EURIAS fellowship 2017–2018, and an IDEXLyon Fellowship 2018–2021 (D.D.) and by a subsidy of the Russian Government to support the Programme of Competitive Development of Kazan Federal University (D.E.B.). Author contributions: D.E.B., S.M., and B.B. conceived the research; D.E.B., D.D., and P.W. surveyed the paleoanthropological literature; S.R.M. conducted the biomechanical simulations; S.M. collected the phonological inventory data, processed it for statistical analysis, and performed the qualitative analysis with D.E.B.; D.E.B. performed the statistical analyses of worldwide data; B.B. and P.W. undertook the phylogenetic analysis; and all authors discussed the results and contributed to the final version of the paper. Competing interests: The authors declare no competing interests. Data and materials availability: All data are available in the main text or the supplementary materials.
