Language Discrimination by Human Newborns and by Cotton-Top Tamarin Monkeys

Science  14 Apr 2000:
Vol. 288, Issue 5464, pp. 349-351
DOI: 10.1126/science.288.5464.349


Humans, alone among animals, make meaningful use of spoken language. What is unclear, however, is whether this capacity depends on a unique constellation of perceptual and neurobiological mechanisms or whether a subset of such mechanisms is shared with other organisms. To explore this problem, parallel experiments were conducted on human newborns and cotton-top tamarin monkeys to assess their ability to discriminate unfamiliar languages. A habituation-dishabituation procedure was used to show that human newborns and tamarins can discriminate sentences from Dutch and Japanese, but not if the sentences are played backward, indicating that the cues for discrimination are not present in backward speech. This suggests that the human newborns' tuning to certain properties of speech relies on general processes of the primate auditory system.

A fundamental question in the study of language evolution and acquisition is the extent to which humans are innately endowed with specialized capacities to comprehend and produce speech. Theoretical arguments hold that language acquisition must be based on an innately specified language faculty (1, 2), but the precise nature and extent of this “language organ” is mainly an empirical matter, one that requires studies of human newborns as well as nonhuman animals (3–5). With respect to humans, we already know that newborns as young as 4 days old can discriminate phonemes categorically (6) and perceive well-formed syllables as units (7–9). They are also sensitive to the rhythm of speech: newborns distinguish sentences from languages that have different rhythmic properties, but not from languages that share the same rhythmic structure (10, 11). However, newborns do not discriminate languages when speech is played backward (10), and neurophysiological studies suggest that both infants and adults process natural speech differently from backward speech (12, 13). All these studies indicate that humans are born with capacities that facilitate language acquisition and that seem well attuned to the properties of speech. Studies of nonhuman animals, however, show that some of these capacities may predate our hominid origins. For example, insects, birds, nonprimate mammals, and primates process their own species-typical sounds in a categorical manner, and some of these species perceive speech categorically (14–18).

Our aim here is to extend the comparative study of speech perception in three directions. First, using the same design and the same material, we have conducted joint experiments on human newborns and on monkeys. Second, whereas most studies of nonhuman animal speech perception involve extensive training before testing on a generalization task, our experimental approach—the habituation-dishabituation paradigm—involves no training and parallels the method used in studies of infant speech perception. Thus, conditions are met to appropriately compare the two populations. Third, most studies of speech processing in animals involve tests of phonemic perception. Here, we extend the analysis to sentence perception, thereby setting up a much broader range of perceptual problems.

Our experiments were run on human newborns and cotton-top tamarin monkeys (Saguinus oedipus oedipus). The stimuli consisted of 20 sentences in Japanese and 20 sentences in Dutch uttered by four female native speakers of each language. Conditions in which the two languages were pitted against one another were compared with conditions in which speakers of the same language were contrasted. In addition, sentences within a session were played either forward or backward. To more readily control for prosodic features of the signal, we reran all conditions with synthesized exemplars of the original sentences, created with the MBROLA diphone synthesizer (19). Phoneme duration and fundamental frequency were preserved, whereas the phonetic inventory was narrowed to one phoneme per manner of articulation: all fricatives were synthesized as /s/, vowels as /a/, liquids as /l/, plosives as /t/, nasals as /n/, and glides as /j/. Thus, each synthesized sentence preserved only the prosodic characteristics of its natural counterpart while eliminating lexical and phonetic information (20).
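The phonetic collapsing described above can be sketched as a simple substitution: each phoneme is replaced by the single representative of its manner class, so segment durations and fundamental frequency (handled separately by the synthesizer) carry all the remaining information. The phoneme-to-manner table below is an illustrative fragment, not the full Dutch or Japanese inventory:

```python
# Sketch of the prosody-preserving resynthesis mapping. Only the six
# manner-class targets are taken from the text; the MANNER table is an
# illustrative fragment, not a complete phoneme inventory.
TARGET = {"fricative": "s", "vowel": "a", "liquid": "l",
          "plosive": "t", "nasal": "n", "glide": "j"}
MANNER = {"f": "fricative", "z": "fricative", "s": "fricative",
          "i": "vowel", "e": "vowel", "o": "vowel", "a": "vowel",
          "r": "liquid", "l": "liquid",
          "k": "plosive", "p": "plosive", "t": "plosive",
          "m": "nasal", "n": "nasal",
          "w": "glide", "j": "glide"}

def collapse(phonemes):
    """Map a phoneme sequence to its prosody-only counterpart."""
    return [TARGET[MANNER[p]] for p in phonemes]

print(collapse(["k", "o", "m", "e", "r"]))  # → ['t', 'a', 'n', 'a', 'l']
```

Applied sentence by sentence, such a mapping removes lexical and phonetic identity while the timing and pitch contour of the original utterance survive.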

We tested newborns with the high-amplitude sucking procedure in a habituation/dishabituation design. Sentence playback was contingent on the newborns' sucking on a pacifier. In the language change condition, newborns were habituated to 10 sentences uttered by two speakers of one language and then switched to 10 sentences uttered by two different speakers of the other language. In the speaker change condition, newborns were habituated to 10 sentences uttered by two speakers of one language and then switched to two different speakers of the same language. A significant increase in sucking after the language change, compared with the speaker change, is taken as evidence that newborns perceive a difference between the two languages (21).

We tested 32 newborns (22) on the natural language-forward experiment: 16 in the language change condition and 16 in the speaker change condition. Figure 1A shows that the two groups did not differ significantly and thus that newborns failed to discriminate the two languages (F(1,29) < 1) (23). This result appears to conflict with previous experimental work showing that newborns discriminate English and Japanese. However, our experiment exposes newborns to considerable speaker variability (four voices) (24), a factor previously shown to impair the discrimination abilities of infants (25). If speaker variability is responsible for the absence of discrimination, then we would predict successful discrimination with fewer speakers. To test this possibility, we ran a second experiment using synthesized speech, thereby reducing the number of voices to one, that of the speech synthesizer (26).

Figure 1

Average number of high amplitude sucks per minute for babies in the control (speaker change, dotted lines) and experimental (speaker and language change, solid lines) groups. Minutes are numbered from the time of change. Error bars represent ±1 SEM. (A) Natural sentences played forward. (B) Same sentences synthesized. (C) Same sentences synthesized and played backward.

We tested 32 additional newborns (27) on the forward language and speaker discrimination using the synthesized versions of the original sentences. Figure 1B shows that newborns in the language change condition increased their sucking significantly more during the 2 min after the switch than newborns in the speaker change condition (F(1,29) = 6.3, P = 0.018). This indicates that, relying exclusively on prosodic cues, newborns discriminate sentences of Dutch from sentences of Japanese. Moreover, this result shows that the failure of newborns to discriminate in experiment 1A was probably due to speaker variability.

To determine the specificity of the newborns' capacity to discriminate languages, we tested 32 more newborns with the same synthesized sentences played backward (28). Figure 1C shows that newborns fail to discriminate languages played backward (F(1,29) < 1) (29). Moreover, the interaction between experiments 1B and 1C (forward vs. backward) is marginally significant (F(1,59) = 3.6, P = 0.06). The finding that newborns discriminate two nonnative languages played forward but not backward suggests that the newborns' language discrimination capacity may depend on specific properties of speech that are eliminated when the signal is played backward. However, before drawing such a conclusion, it is important to directly assess the speech specificity of this capacity by testing it on another species.

We tested cotton-top tamarins (n = 13) with the same stimulus set as the newborns. Instead of sucking rate, however, we used a head orientation response toward the loudspeaker. During the habituation phase, a tamarin was presented with sentences uttered by two speakers in one language and then tested with a sentence uttered by a different speaker, either in the same language (speaker change condition) or in the other language (language change condition). Recovery of orientation toward the loudspeaker was interpreted as an indication that the tamarin perceived a difference between the habituation and test stimuli (30).

Experiment 2A involved natural sentences of Dutch and Japanese played either forward or backward (31). Figure 2A shows that 10 of 13 tamarins (P < 0.05; binomial test) dishabituated in the language change condition, whereas only 5 of 13 dishabituated to the speaker change (P = 0.87). The difference between language and speaker change is significant (P < 0.05; χ2 test). This result suggests that the tamarins discriminated Dutch from Japanese regardless of speaker variation. Surprisingly, such a pattern was not observed when the sentences were played backward: only 5 of 13 tamarins dishabituated to the backward language change (P = 0.87); this pattern is not significantly different from the speaker change condition (P > 0.2). These results parallel those obtained with newborns on the synthetic stimuli.
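The binomial probabilities reported above can be checked with a short calculation. This is a sketch assuming a one-tailed exact binomial test against chance responding (p = 0.5), which reproduces the reported values:

```python
from math import comb

def binom_p_ge(k, n, p=0.5):
    """One-tailed exact binomial probability of observing
    k or more successes in n trials with success probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 10 of 13 tamarins dishabituating to the language change:
print(round(binom_p_ge(10, 13), 3))  # → 0.046  (P < 0.05)
# 5 of 13 dishabituating to the speaker change:
print(round(binom_p_ge(5, 13), 2))   # → 0.87
```

Under this reading, a positive response from 10 or more of the 13 subjects is unlikely by chance alone, whereas 5 of 13 is entirely compatible with chance.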

Figure 2

Number of tamarins responding positively (white bars) and negatively (hatched bars) to test sentence depending on condition: language or speaker change, sentences played forward or backward. (A) Natural sentences. (B) Synthesized sentences. (C) Data from experiments 2A and 2B pooled together. *P < 0.05. **P < 0.01.

In experiment 2B, we tested the same tamarins on both the speaker and the language conditions but with synthesized sentences. Figure 2B shows that 10 of 13 tamarins dishabituated to the forward language change (P < 0.05). Although the number of subjects dishabituating to the speaker change failed to reach statistical significance (P = 0.29), the increased numbers in this condition led to a nonsignificant difference between language and speaker change for the synthesized sentences (P > 0.3). For backward sentences, subjects failed to show a statistically significant level of dishabituation to either the language or the speaker change (P = 0.29 and P = 0.13). Experiment 2B suggests that the ability of tamarins to discriminate Dutch and Japanese is diminished when only prosodic cues are available.

When the data from experiments 2A and 2B are pooled (Fig. 2C), the overall result is clear: when sentences are played forward, tamarins significantly dishabituate to the language change (P = 0.005) but not to the speaker change (P = 0.58), and the difference between the language and speaker changes is significant (P < 0.05). When sentences are played backward, no such effect is observed. This overall result parallels that obtained with human newborns: both species discriminate sentences of Dutch and Japanese played forward but not backward.
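The pooled language-versus-speaker comparison can likewise be verified. This is a sketch assuming a 2 × 2 Pearson χ² test without continuity correction on pooled counts of 20 of 26 subjects dishabituating to the language change versus 13 of 26 to the speaker change (the speaker-change count is inferred from the reported P = 0.58, not stated directly in the text):

```python
from math import erfc, sqrt

def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the
    2x2 table [[a, b], [c, d]], plus its df = 1 p-value computed via
    the identity p = erfc(sqrt(chi2 / 2))."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, erfc(sqrt(chi2 / 2))

# Pooled forward trials: language change 20/26 vs. speaker change 13/26
chi2, p = chi2_2x2(20, 6, 13, 13)
print(round(chi2, 2), round(p, 3))  # → 4.06 0.044
```

The resulting p-value falls just below 0.05, consistent with the significant language-versus-speaker difference reported for the pooled data.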

The pattern of our results suggests striking similarities as well as differences between the monkey and the human auditory systems. First, we have shown that tamarins, like human newborns, can process not just isolated syllables but whole strings of continuous speech, extracting enough information to discriminate Dutch from Japanese. Second, their ability to do so despite speaker variability suggests that they can form auditory equivalence classes, that is, extract abstract linguistic invariants from highly variable acoustic shapes (17, 32). Third, the fact that, like newborns, tamarins fail to discriminate when speech is played backward suggests that their language discrimination capacity relies not on trivial low-level cues but on quite specific properties of speech. Because tamarins have not evolved to process speech, we infer in turn that at least some aspects of human speech perception may have built on preexisting sensitivities of the primate auditory system. Finally, unlike newborns, tamarins fail to discriminate the language change more than the speaker change when speech is resynthesized. This leaves open the possibility that human newborns and tamarins are not responding to exactly the same cues in the sentences: tamarins might be more sensitive to phonetic than to prosodic contrasts.

  • * To whom correspondence should be addressed. E-mail: f.ramus{at}

  • Present address: Institute of Cognitive Neuroscience, 17 Queen Square, London WC1N 3AR, UK.
