On the Origin of Internal Structure of Word Forms


Science  21 Apr 2000:
Vol. 288, Issue 5465, pp. 527-531
DOI: 10.1126/science.288.5465.527


This study shows that a corpus of proto-word forms shares four sequential sound patterns with words of modern languages and the first words of infants. Three of the patterns involve intrasyllabic consonant-vowel (CV) co-occurrence: labial (lip) consonants with central vowels, coronal (tongue front) consonants with front vowels, and dorsal (tongue back) consonants with back vowels. The fourth pattern is an intersyllabic preference for initiating words with a labial consonant–vowel–coronal consonant sequence (LC). The CV effects may be primarily biomechanically motivated. The LC effect may be self-organizational, with multivariate causality. The findings support the hypothesis that these four patterns were basic to the origin of words.

The most basic unit of language is the word—the minimal stand-alone pairing of meaning and sound structure. But what is the nature of this pairing? Apart from those few words that are indubitably onomatopoetic, linguists consider the pairing to be primarily “arbitrary” (1)—that is, they believe that a word's conceptual structure does not impose a particular sound structure on its spoken form across languages. But if the conceptual structure, or meaning, of a word does not determine its sound pattern, what does? Oddly, scant attention has been paid to how the spoken forms of words originate. Are there determining factors inherent in the very production of sound structures of words, beyond their well-known tendency to alternate between consonants and vowels, thus forming syllables (e.g., “to-ma-to”)? We have addressed this question by first looking at speech-related behavior at its simplest: in infants' babbling and in their first words.

We conducted statistical studies of the babbling of six infants (2, 3) and the first words of 10 infants (4–7) in an English-speaking environment. Four potentially universal organizational patterns emerged. Three of them were intrasyllabic (CV) co-occurrence patterns: labial (lip) consonants with central vowels, coronal (tongue front) consonants with front vowels, and dorsal (tongue back) consonants with back vowels (Fig. 1). Table 1 shows the mean observed-to-expected ratios for these patterns. Three additional studies using our specific methodology found the same effects in groups of five French, Swedish, and Japanese infants (8), in seven infants in an Ecuadorian-Quichua environment (9), and in one of two infants in a Brazilian-Portuguese environment (10–15). If babbling (and, to a lesser extent, first words) does tend to be similar across cultures (16), these patterns may be virtually universal in infants.
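The observed-to-expected ratios in Table 1 compare how often a CV pairing occurs with how often it would occur if consonant and vowel were chosen independently, given their overall frequencies in the corpus. The sketch below is a minimal illustration of that computation, assuming the CV sequences have already been extracted and coded by consonant class and vowel class; the function name `observed_to_expected` and the toy corpus are hypothetical, not the authors' software or data.

```python
from collections import Counter


def observed_to_expected(pairs):
    """Observed-to-expected ratio for every consonant-class x vowel-class cell.

    `pairs` is a list of (consonant_class, vowel_class) tuples, one per CV
    sequence. The expected count of a cell is N * P(consonant) * P(vowel),
    with both probabilities taken from the same corpus, so the ratio
    simplifies to observed * N / (consonant_count * vowel_count).
    """
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one CV pair")
    consonant_counts = Counter(c for c, _ in pairs)
    vowel_counts = Counter(v for _, v in pairs)
    observed = Counter(pairs)  # cells that never occur count as 0
    return {
        (c, v): observed[(c, v)] * n / (c_n * v_n)
        for c, c_n in consonant_counts.items()
        for v, v_n in vowel_counts.items()
    }


# Hypothetical toy corpus of 10 CV sequences (not data from the study).
pairs = (
    [("labial", "central")] * 4
    + [("coronal", "front")] * 3
    + [("dorsal", "back")] * 2
    + [("labial", "front")]
)
ratios = observed_to_expected(pairs)
print(ratios[("coronal", "front")])  # 3 * 10 / (3 * 4) = 2.5
```

A ratio above 1 marks a pairing that is more frequent than independence predicts; whether that excess is significant is a separate test.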

Figure 1

A schematic view of the articulatory component of the speech apparatus, in which the three arrows symbolize the three intrasyllabic CV co-occurrence patterns. The labial consonants involve lip closure and consist (in English) of the stop consonants that occur at the beginning of the words “pat” and “bat” and the nasal consonant at the beginning of “mat.” The coronal consonants involve closure in the anterior part of the mouth (tongue against the hard palate) and consist of the stop consonants at the beginning of the words “tail” and “dale” and the nasal consonant at the beginning of “nail.” The dorsal consonants involve closure in the region of the soft palate and consist of the stop consonants at the beginning of the words “coat” and “goat.” In studies of infants, consonants are restricted to stop consonants and nasals because they occur most frequently in babbling and early speech. The terms “front,” “central,” and “back” for vowels are conventional terms referring to the position of the tongue in the horizontal plane. Examples of the three types of CV sequences are underlined in the three words shown next to the arrows. (The first vowel in the example “dada” is the vowel in “dad.”) Pronunciations of these three words by an American adult, and babbling episodes containing the three CV sequences shown, can be heard at Science Online (

Table 1

CV co-occurrence patterns in the babbling of six infants (3), the first words of 10 infants (5), and the words of 10 languages (26). The expected frequency used to compute each observed-to-expected ratio was the number of instances of a given CV form predicted from the relative frequencies of its consonant and its vowel in the entire corpus. For example, if the consonant of interest made up 0.2 of all consonants and the vowel made up 0.4 of all vowels, the expected proportion of that CV pattern would be 0.2 × 0.4 = 0.08 of all CV sequences, and its expected frequency would be 0.08 times the total number of CV sequences. Babbling data are mean ratios for the six infants, based on a total of 12,471 CV sequences taken from any position in babbling utterances in which they occurred (e.g., “babababa” would contain four instances). All 18 instances of the three CV patterns of interest (3 patterns × 6 infants) were significantly above chance, whereas only 9 of 36 other instances were [χ² (N = 54, df = 1) = 27.0, P < 0.0001]. First-word data are mean ratios for 10 infants, based on a total of 5635 CV sequences taken from any position in a word in which they occurred. Of 30 instances of the three CV patterns of interest, 27 were above chance, whereas only 15 of 60 other instances were [χ² (N = 90, df = 1) = 33.94, P < 0.0001]. Language data are mean ratios for 10 languages, based on a total of 12,360 CV sequences occurring as the first two sounds of dictionary words that began with a CVC sequence. The languages were English, Estonian, French, German, Hebrew, Japanese, New Zealand Maori, Quichua, Spanish, and Swahili. Of 30 instances of the patterns of interest, 22 were above chance, whereas only 16 of 60 other instances were [χ² (N = 90, df = 1) = 17.72, P < 0.0001]. Except for the three categories of interest, no single category was consistently above chance in the three corpora.
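The bracketed statistics can be read as a Pearson chi-square on a 2 × 2 table that cross-classifies cells (patterns of interest versus all other consonant-vowel cells) by outcome (above chance or not). That reading is our assumption, since the caption does not name the exact procedure, although the sketch below reproduces the babbling value of 27.0 exactly. The helper names are hypothetical, and the P value uses the standard identity that a chi-square variable with 1 degree of freedom has upper tail erfc(sqrt(x/2)).

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    No continuity correction is applied. Assumes every row and column total
    is positive, so no expected count is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_df1(stat):
    """Upper-tail P value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Babbling counts from the Table 1 caption.
# Rows: cells for the 3 patterns of interest, all other cells.
# Columns: above chance, not above chance.
babbling = [[18, 0], [9, 27]]
stat, df = pearson_chi_square(babbling)
print(stat, df, p_value_df1(stat) < 1e-4)  # 27.0 1 True
```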


The fourth pattern was intersyllabic. It is not present in babbling but emerges in the first words. Seven reports from five language communities reveal a tendency to begin a word with a labial stop consonant, then, after the vowel, to produce a coronal stop consonant (an LVC sequence, henceforth abbreviated LC) (17). This so-called “fronting” (18, 19) tendency is so strong in some infants that they even produce it when the word they are attempting has the opposite (CL) sequence, as in “pot” for “top” (20). In our study of 10 infants in an English-language environment (3), nine of them showed this pattern; the 10th showed no preference (21). The mean ratio of the number of LC sequences to the number of CL sequences was 2.55.
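The LC measure comes down to counting words whose first consonant is labial and whose consonant after the vowel is coronal, and comparing that count with the reverse (CL) order. A minimal sketch, assuming words are given in a simplified one-letter-per-sound transcription and mirroring the CVC-initial restriction described for the language data in Table 1. Treating the stops and nasals of Fig. 1 as the labial and coronal classes is our assumption (the paragraph above names only stops), and `lc_cl_counts` and `lc_cl_ratio` are hypothetical names. English spelling stands in for a transcription only in the toy example.

```python
# Hypothetical class sets, following the stops and nasals listed in Fig. 1.
LABIAL = set("pbm")
CORONAL = set("tdn")
VOWELS = set("aeiou")


def lc_cl_counts(words):
    """Count CVC-initial words of the form labial-V-coronal (LC) and coronal-V-labial (CL)."""
    lc = cl = 0
    for word in words:
        w = word.lower()
        if len(w) < 3 or w[1] not in VOWELS:
            continue  # not CVC-initial in this simplified scheme
        first, second = w[0], w[2]
        if first in LABIAL and second in CORONAL:
            lc += 1
        elif first in CORONAL and second in LABIAL:
            cl += 1
    return lc, cl


def lc_cl_ratio(lc, cl):
    """Ratio of LC to CL words; infinite when no CL words occur."""
    return lc / cl if cl else float("inf")


# Toy word list (hypothetical): "dog" has a dorsal second consonant and is ignored.
lc, cl = lc_cl_counts(["bat", "pot", "mad", "tap", "dog", "nap", "bin"])
print(lc, cl, lc_cl_ratio(lc, cl))  # 4 2 2.0
```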

Why do these particular patterns occur? According to the frame/content theory of the evolution of speech, described elsewhere (22), the CV sequence in all three consonant-vowel co-occurrence patterns has the same basis as the closed-open alternation of the mouth in all speech (23). That basis is a basic movement, or “frame,” provided by biphasic cycles of mandibular (jaw) oscillation: elevation for consonants, depression for vowels. In the labial-central CV co-occurrences, the frame may be the sole cause of the CV pairing, hence the term “pure frames” (22). In these cases, the closing phase of mandibular oscillation, acting alone, could produce lip closure, and the opening phase, also acting alone, could produce a central vowel, because the tongue remains in its resting position in the center of the mouth. This simple form may have been the most basic protosyllable type. The same frame may provide the underlying consonant-vowel alternation in the two CV patterns in which the tongue makes both the consonant and the vowel: the coronal-front and dorsal-back patterns. For these pairings, however, the tongue also adopts a relatively static nonresting position on the front-back axis, a position shared by the consonant and the vowel.
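The frame account makes a statistical prediction that a toy model can make concrete: if some syllables keep a single tongue setting through both the closing and opening phases of the jaw cycle, the matched consonant-vowel cells should exceed their expected frequencies. The sketch below assumes three equally likely settings (at rest, front, back) and uniform, independent consonant and vowel choices for all other syllables. These parameters are invented for illustration, are not fitted to any corpus, and the model does not address the LC effect. Working through the algebra, each matched cell has an observed-to-expected ratio of 1 + 2p and each unmatched cell 1 - p, where p is the share of static-tongue syllables.

```python
from fractions import Fraction

# Hypothetical static tongue settings and the CV pairing each one yields.
# "rest" is the pure-frame case: lip closure plus a central vowel.
SETTINGS = {
    "rest": ("labial", "central"),
    "front": ("coronal", "front"),
    "back": ("dorsal", "back"),
}
CONSONANTS = ["labial", "coronal", "dorsal"]
VOWELS = ["central", "front", "back"]
MATCHED = set(SETTINGS.values())


def toy_frame_ratios(p_static):
    """Exact observed-to-expected ratios in a toy mixture model.

    With probability p_static a syllable keeps one of three equally likely
    tongue settings through both phases of the jaw cycle; otherwise its
    consonant and vowel classes are drawn independently and uniformly.
    """
    p = Fraction(p_static)
    joint = {}
    for c in CONSONANTS:
        for v in VOWELS:
            prob = (1 - p) * Fraction(1, 9)  # independent component
            if (c, v) in MATCHED:
                prob += p * Fraction(1, 3)  # static-tongue component
            joint[(c, v)] = prob
    p_cons = {c: sum(joint[(c, v)] for v in VOWELS) for c in CONSONANTS}
    p_vow = {v: sum(joint[(c, v)] for c in CONSONANTS) for v in VOWELS}
    return {(c, v): joint[(c, v)] / (p_cons[c] * p_vow[v]) for (c, v) in joint}


ratios = toy_frame_ratios(Fraction(1, 2))
print(ratios[("labial", "central")], ratios[("labial", "front")])  # 2 1/2
```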

According to frame/content theory, the use of the frame may have been the first stage in the evolution of speech. Then, in a subsequent “content” stage, the modern capacity to program successive frames with different consonants and vowels—an activity often involving considerable consonant-to-vowel tongue movement—could have evolved. The LC pattern is the first systematic move toward intersyllabic frame differentiation in infants. In babbling, infants tend to simply repeat the same syllable (e.g., “bababa”)—a case of frame reiteration. But according to the well-accepted “obligatory contour principle” of phonological theory (24), languages tend to favor a discontinuous intersyllabic pattern—one that requires speakers to produce a different consonant and/or vowel in successive syllables. The production of the LC sequence in infants is a momentous event because it is the first systematic step in moving from relatively obligatory repetition of the same CV cycle to relatively obligatory nonrepetition.

The LC sequence effect is different from the CV co-occurrence effects in one important respect. Linguists would describe the CV patterns as “continuous” because they involve relations between adjacent sounds. Such patterns can involve a single biomechanical effect operating across neighboring sounds, such as those in the two lingual CV patterns, coronal-front and dorsal-back. But the LC pattern is discontinuous because its two components are temporally separated from each other by the intervening vowel. Thus, unlike the CV co-occurrence effects, the LC effect cannot have any single biomechanical cause.

How, then, can it be explained? One possible explanation begins with the proposition that it is easier to make a labial consonant than a coronal consonant. As discussed earlier, the labial consonant may result from the most basic movement in speech, the mandibular frame (23), acting alone, whereas an additional movement—of the tongue—is needed to reach the tongue-front position for a coronal consonant.

Two other facts also suggest that labial consonants are easier for infants to make than coronals. First, studies in several language environments have shown that when infants enter the first-word stage, the frequency of labial consonants increases while that of coronals decreases (25), even though languages tend to have more coronals than labials. We interpret this as a regression to easier production forms when an infant begins the complex task of interfacing the hitherto autonomous output system with a new cognitive structure, the mental lexicon (26). Second, infants whose babbling and early attempts at speech have been prevented by a tracheostomy strongly prefer labial consonants in their first post-tracheostomy vocalizations, even when they have had a normal history of listening to speech (27).

Some findings, we concede, could be taken as supporting the contrary view—that coronal consonants are easier to make than labials. Coronals certainly occur more frequently in babbling than labials (16) and are known to be generally more frequent in languages. But the fact that hearing-impaired infants produce few coronals (16) suggests that the high frequency of coronals in the babbling of hearing infants stems, at least in part, from their being heard so often in the ambient language. In addition, they may be more frequent in the typical language because the tongue tip becomes the most versatile component of the speech production mechanism in adults, even though it is unlikely that it is used independently of the tongue body in babbling or in early speech.

Why might it be advantageous to start with an easy action rather than to end with one? The existence of functionally separable subsystems for initiation versus continuation of movements is well known in motor system neurophysiology and clinical neurology (28). A separable initiation component presumably evolves because of problems unique to voluntary (nonreflexive) initiation of movement. The complexity of the process of initiation of voluntary movement in humans is suggested by the existence of the Bereitschaftspotential, a frontal-lobe negativity beginning about 800 ms before movement onset. This electrical pattern is considered to be a reflection of the brain activity “necessary to provide the spatiotemporal functions and programs for self-generated activity (in contrast to stimulus-dependent movements)” (29).

Bringing these various threads together, we hypothesize that the LC effect reflects infants' tendency to start a word in an easy way and then add a tongue movement. The tendency may be self-organizational (13, 30, 31) in that it is an emergent consequence of the problem space in which infants find themselves. This problem space involves four kinds of variables: (i) biomechanical factors related to the frame and constraints on changing tongue position, (ii) movement control factors related to initiation of action, (iii) cognitive factors related to the mental lexicon, and (iv) the presence of a complex, culturally specific adult speech model to be assimilated. A prediction from this hypothesis is that, when compared with hearing infants, hearing-impaired infants—who, as mentioned, produce few coronals—will have an unusually high ratio of words with an LC pattern to words with a CL pattern.

Although infant speech patterns are certainly simpler than the patterns of adults correctly speaking their native language, it is important to ask whether the four patterns we have discussed in infants remain present in languages. If so, they may have fundamental importance with respect to the nature of speech—even, perhaps, its origin. Alternatively, they could simply reflect transient problems of the speech acquisition process that leave no traces in mature systems. So far, there has been little suspicion that patterns like these are consistently present in languages.

We have found that CV co-occurrence patterns remain surprisingly strong in languages. Our combined analysis (32) of the only two cross-language studies of this question that we are aware of (33, 34) showed evidence for the two CV patterns in which the tongue participates in both parts of the effect (coronal-front and dorsal-back), but not for the labial-central pattern, in a set of 10 languages (Finnish, Turkish, Latin, Latvian, Setswana, Hawaiian, Rotokas, Pirahã, Kadazan, and Shipibo). In our subsequent analysis of dictionary counts of words in 10 more languages (Table 1) (25), we found all three CV co-occurrence patterns. The labial-central and coronal-front patterns were found in seven languages; the dorsal-back pattern was found in eight.

In our earlier systematic cross-language study of the LC effect (21), we found that it is present in the sample of 10 languages in Table 1. Nine of the 10 languages showed the trend, eight of them at statistically significant levels. The mean ratio of LC to CL sequences was 2.23.

So how should we regard these previously unsuspected phenomena—the presence of the three CV co-occurrence patterns and the LC pattern in languages as well as in infant speech? In the case of the CV co-occurrences, the finding of the labial-central pattern, even in adults, provides additional support for the assumption that the mandibular cycle is fundamental to speech. And the finding of the coronal-front and dorsal-back effects suggests that a constraint against extreme tongue movements during frame production might also be quite fundamental. The LC pattern might have emerged early in the history of speech as a result of self-organization, just as it may emerge for this reason in infants. Because the LC pattern is easier to produce than the reverse (CL) form, instances of it may have occurred more often, making it more likely to be linked with a concept to form an early word.

An additional step in evaluating whether these patterns are indeed relevant to the origin of speech is to ask whether proto-words have them as well. [“Proto-words” are hypothetical words of earlier language(s) from which the sound structure of present-day words derived.] Are any of the four patterns we have seen in infants and languages also present in words that have direct implications for historical linguistics? Bengtson and Ruhlen (35) provide material that allows an approach to this question. They have presented global etymologies for 27 sets of cognates, that is, “similar words in different languages that are presumed to derive from a common source” (36, 37). They contend that the striking similarities between words denoting a particular basic concept across language families prove monogenesis, that is, a single origin for the world's languages (38). They also contend that there is often at most a minimal difference between members of a present-day word set and the proto-word from which the members descended.

Table 2 shows these etymologies. Table 3 shows CV co-occurrence patterns in this corpus, as well as the frequencies of the possible consonant-(vowel)-consonant sequences. Remarkably, all three CV co-occurrence patterns favored by infants and languages are strongly favored, even in this extremely small protolanguage corpus. And the LC sequence is much more frequent than the CL sequence as well.

Table 2

Bengtson and Ruhlen's 27 global etymologies (35). Numbers of language families represented in particular etymologies range from 7 to 24 (mean = 14). In the simplified notation used here, the letters P, B, M, T, D, N, K, and G are equivalent to their lowercase counterparts in Fig. 1. The vowel A is a central vowel roughly equivalent to the one in “box.” The other vowel symbols designate vowels in the following manner: I as in “beet,” E as in “bait,” U as in “boot,” and O as in “boat.” Sounds in parentheses designate optional forms that were not included in the present analysis. Some additional similarities and differences between this corpus and infant corpora are noteworthy because of their relevance to the possibility that early hominid speech might have been more like modern infant speech than like modern adult speech. One similarity is the relatively large number of stop consonants and nasals and the paucity of other consonants. Another is the favoring of the low vowel A. In contrast to these similarities, the lack of repetition of any consonant in successive syllables is a marked departure from babbling and early speech, in which consonant repetition is characteristic. In languages, intersyllabic consonant repetition, although relatively rare, does occur at about 67% of the frequency expected by chance (26). In addition, whereas dorsal consonants are relatively rare in babbling and first words, they are very frequent in the proto-language corpus. Lack of consonant repetition and a high frequency of dorsal consonants would not be expected in a first language if ontogenetic patterns are valid cues to first word structure.

Table 3

Observed-to-expected ratios of CV combinations and consonant-(vowel)-consonant sequences in the 27 global etymologies. Because of the small size of the database, all consonants are used in the analyses, not just stop consonants and nasals as in infants and languages. The χ² analysis of the overall distribution of CV co-occurrences is significant [χ² (N = 46, df = 4) = 9.63, P < 0.05]. The frequency distribution of LC and CL sequences is significant (binomial test, P < 0.05).
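The binomial test named in the Table 3 caption asks whether LC sequences outnumber CL sequences by more than chance would allow. Assuming a null hypothesis in which LC and CL are equally likely (p = 0.5; our assumption, since the caption does not state it), an exact two-sided version can be sketched as follows. The LC and CL counts for the proto-language corpus are not given in this text, so the usage line uses hypothetical counts, and `binomial_two_sided` is a hypothetical name. SciPy's `scipy.stats.binomtest` provides a library implementation of the same kind of exact test.

```python
from math import comb


def binomial_two_sided(k, n, p=0.5):
    """Exact two-sided binomial test P value.

    Sums the probabilities of every outcome that is no more likely than the
    observed count k under the null hypothesis (success probability p).
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-7)  # tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))


# Hypothetical counts: 9 LC sequences versus 1 CL sequence.
print(binomial_two_sided(9, 10))  # 22/1024 = 0.021484375
```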


If the finding of not only the three CV co-occurrence patterns but also the LC effect in infant, language, and proto-language corpora means that these patterns are indeed basic to the origin of speech, then the controversial method of “multilateral comparison,” pioneered by Greenberg (39–41) and used by Bengtson and Ruhlen to construct their proto-language corpus, gains validity. The finding that many individual words in their corpus not only exhibit one or another instance of these basic patterns, but also have similar meanings across many language families, supports the theory that there was, in fact, one original language—one Mother Tongue. Moreover, the presence of so many instances of these apparently basic patterns in the proto-language corpus challenges the prevailing view of Dixon and others that rapid and nonreversible diachronic change in language makes the form of any language that existed more than 5000 years ago totally unavailable for reconstruction (42, 43). At a methodological level, the statistical approach used to uncover the basic patterns reported here may prove a useful tool in the study of the history of languages. It could, for example, be used to evaluate the frequent claim of Goddard and others that sound correspondences of words of similar meanings across language families arise simply by chance (44).
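One generic way to ask whether a co-occurrence excess could have arisen by chance, in the spirit of the suggestion above, is a permutation test. This is a technique we swap in for illustration, not the procedure used in this study, and it tests co-occurrence within a single corpus rather than correspondences across language families. Shuffling vowels across consonants keeps both marginal frequencies fixed, which is the same baseline that underlies the expected frequencies in Table 1. All names and the toy corpus are hypothetical.

```python
import random


def matched_count(pairs, matched):
    """Number of CV pairs that fall in one of the `matched` cells."""
    return sum(1 for pair in pairs if pair in matched)


def permutation_p_value(pairs, matched, n_perm=2000, seed=0):
    """One-sided permutation P value for an excess of matched CV pairs.

    Vowels are shuffled across consonants, which keeps both marginal
    frequencies fixed while breaking any link between a consonant and its vowel.
    """
    rng = random.Random(seed)
    consonants = [c for c, _ in pairs]
    vowels = [v for _, v in pairs]
    observed = matched_count(pairs, matched)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(vowels)
        if matched_count(zip(consonants, vowels), matched) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one so the estimate is never 0


MATCHED = {("labial", "central"), ("coronal", "front"), ("dorsal", "back")}
# Hypothetical, perfectly patterned corpus: every CV pair is a matched cell.
corpus = (
    [("labial", "central")] * 10
    + [("coronal", "front")] * 10
    + [("dorsal", "back")] * 10
)
print(permutation_p_value(corpus, MATCHED) < 0.01)  # True
```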

Our findings here concur with the frame/content theory regarding the origin of the serial organization of speech. According to this theory, simple biomechanical properties of the vocal apparatus (e.g., the mandibular cycle and static tongue postures), plus their interaction with the contingencies of movement initiation and the culturally mediated cognitive demands of word formation, have played a key role in both the acquisition and evolution of speech. This self-organizational view is in sharp contrast to the currently orthodox view, based on Chomsky's notion of a universal grammar (45), according to which speech results from a specific genetic substrate for both speech sounds and their organizational patterns. In our view, the crossing of the Rubicon for true speech was not achieved by sudden genetic change. Instead, it was the result of a two-stage development. The first stage involved ancestral hominids borrowing simple available biomechanical properties of the system—the frame, together with static, nonresting tongue configurations—by means of classic Darwinian descent with modification, to give the three CV co-occurrence patterns. Then, an initial increase in intersyllabic serial complexity was achieved by means of the LC pattern, as a result of a self-organizational interaction of biomechanics, movement initiation constraints, and culturally mediated cognition (46–48).

* To whom correspondence should be addressed. E-mail: macneilage{at}


