Computational Constraints on Syntactic Processing in a Nonhuman Primate

Science  16 Jan 2004:
Vol. 303, Issue 5656, pp. 377-380
DOI: 10.1126/science.1089401


The capacity to generate a limitless range of meaningful expressions from a finite set of elements differentiates human language from other animal communication systems. Rule systems capable of generating an infinite set of outputs (“grammars”) vary in generative power. The weakest possess only local organizational principles, with regularities limited to neighboring units. We used a familiarization/discrimination paradigm to demonstrate that monkeys can spontaneously master such grammars. However, human language entails more sophisticated grammars, incorporating hierarchical structure. Monkeys tested with the same methods, syllables, and sequence lengths were unable to master a grammar at this higher, “phrase structure grammar” level.

Syntax is one key component of human language, with no known equivalent in animal communication systems. The limitless expressive power of human language requires structures, termed phrases or sentences, above the word level (or, by analogy, above the single call level in animals). Linguistic syntax involves the rearrangement and permutation of such abstract hierarchical structures, often with concomitant changes in meaning. The production and perception of these hierarchical syntactic structures is a core capability underlying human linguistic competence. This level of organization goes far beyond the simple concatenation procedures sometimes called “syntax” in animal communication (1–3). However, the evolution of the language faculty presumably involved the incorporation of some ancestral primate cognitive capabilities. Thus, a critical question is whether hierarchical processing was one of these preexisting abilities, perhaps evolved to serve noncommunicative functions (e.g., motor control, number, or social cognition) (4–12).

Rule systems capable of generating infinite sets of sequences (“grammars”) are arranged in a mathematical hierarchy of increasing generative power, termed the Chomsky hierarchy (13, 14). The weakest class in this hierarchy are finite state grammars (FSGs), which can be fully specified by transition probabilities between a finite number of “states” (e.g., corresponding to words or calls). Recent evidence suggests that parsing procedures at this superficial level of complexity are spontaneously available to both human infants and nonhuman primates (3, 15–19). However, FSGs are inadequate to generate all the structures of any human language (13, 20), because all languages minimally require procedures at the next level of complexity, termed phrase structure grammars (or PSGs, see 21). In addition to concatenating items like an FSG, a PSG can embed strings within other strings, thus creating complex hierarchical structures (“phrase structures”), and long-distance dependencies. For example, in English, the word “if” is typically followed by the word “then,” but any arbitrary number of words or phrases can be inserted between them. Such constructions (and many others) demand more sophisticated parsing capabilities, including a perceptual ability to recognize these structures and an open-ended memory to store them. There is a broad consensus in linguistics and machine learning that PSGs are more powerful than FSGs and that grammars above the FSG level are, minimally, a crucial component of all human languages (14, 22, 23). Though such abilities are available to all normal humans, it is currently unknown whether parsing abilities above the FSG level are available to nonhuman animals.
We used a familiarization/discrimination procedure to address this issue in cotton-top tamarins (Saguinus oedipus), a New World primate species that has previously demonstrated successful discrimination of linguistic stimuli according to rhythmic class, along with a capacity to grasp transitional probabilities and abstract rules implicit in speech stimuli (17, 18, 24).

The infinite nature of grammars renders empirical tests of their comprehension problematic (20, 25). Because limited output from a PSG can always be approximated by a more complicated FSG (at the limit, a memorized list of exemplars), it is difficult to prove conclusively that subjects have learned the former. This is equally true for human or animal subjects. However, failure to master a grammar (as demonstrated by a failure to distinguish grammatical from ungrammatical strings) can be empirically confirmed. Of course, such a failure could occur for myriad reasons, and it is thus imperative to demonstrate success on a similar task, matched in all extraneous respects, before concluding that particular computational constraints are at work. Thus, based on Chomsky's original discussion (13, 14) we created two grammars, which were used to generate meaningless auditory strings consisting of sampled consonant-vowel (CV) speech syllables. Previous research demonstrates that such syllabic speech streams are readily attended to and processed by cotton-top tamarins without training (17, 24). The two grammars were designed to equate extraneous nongrammatical variables and, thus, to differ specifically in their capacity to generate hierarchical phrase structure.

Each grammar created structures out of two classes of sounds, A and B, each of which was represented by eight different CV syllables (26) (Audio 1 to 8). The A and B classes were perceptually clearly distinguishable to both monkeys and humans: different syllables were spoken by a female (A) and a male (B) and were differentiated by voice pitch (> 1 octave difference), phonetic identity, average formant frequencies, and various other aspects of the voice source. For any given string, the particular syllable from each class was chosen at random. Crucially, syllables for each class were sampled without replacement, because otherwise the possibility of exact acoustic repetitions in the PSG and not in the FSG would make the two grammars distinguishable on superficial grounds. The FSG was (AB)^n, in which a random “A” syllable was always followed by a single random “B” syllable, and such pairs were repeated n times. The corresponding PSG, termed A^nB^n, generated strings with matched numbers of A and B syllables. In this grammar, n sequential “A” syllables must be followed by precisely n “B” syllables. We chose the A^nB^n grammar because it is the simplest PSG that cannot, in principle, be approximated with an FSG but that can easily be brought into correspondence with a simple FSG in all nongrammatical respects, as required for our experiment. Further, this grammar is trivially easy for humans to learn. The A^nB^n grammar produces center-embedded constructions that, although less common in human language than other (e.g., right-branching) structures, are ubiquitous in mathematics (e.g., nested parentheses in formulas) or computer programming languages (e.g., BEGIN-END statements). Like any PSG, the A^nB^n grammar requires additional computational machinery beyond a finite-state automaton. In computer science terminology, this addition would minimally be a push-down stack.
In psychological terms, it requires some way to recognize a correspondence between either the groups formed by the As and Bs (e.g., counting) or between specific As and corresponding Bs (e.g., long-distance dependencies). This PSG thus provides the ideal grammar for the empirical issue addressed by this study by allowing us to focus on the generative power of the system without introducing extraneous performance variables (e.g., memory capacity or referentiality).
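
The difference in computational machinery can be made concrete with a minimal sketch, which is not part of the original study. It assumes each string has already been reduced to its sequence of class labels ("A" or "B"); the function names `accepts_fsg` and `accepts_psg` are hypothetical. The alternating grammar is checked with two states and no memory, whereas the nested grammar is checked here with a stack, the minimal addition named above.

```python
def accepts_fsg(classes: str) -> bool:
    """Finite-state check for the alternating grammar: one or more A-B pairs."""
    # Two states suffice: waiting for an A, or waiting for the B that completes a pair.
    state = "expect_A"
    for c in classes:
        if state == "expect_A" and c == "A":
            state = "expect_B"
        elif state == "expect_B" and c == "B":
            state = "expect_A"
        else:
            return False
    # Accept only a non-empty string that ends on a completed pair.
    return state == "expect_A" and len(classes) > 0


def accepts_psg(classes: str) -> bool:
    """Push-down check for the nested grammar: n As followed by exactly n Bs."""
    stack = []
    seen_b = False
    for c in classes:
        if c == "A":
            if seen_b:
                return False  # an A after any B breaks the nesting
            stack.append(c)   # remember each A until it is matched
        elif c == "B":
            seen_b = True
            if not stack:
                return False  # more Bs than As
            stack.pop()       # match this B against a stored A
        else:
            return False
    # Every A must have been matched, and at least one pair must exist.
    return seen_b and not stack
```

Under these assumptions, `accepts_fsg("ABAB")` and `accepts_psg("AABB")` hold, while each checker rejects the other grammar's strings for n of two or three. A single counter would serve in place of the stack for this particular grammar, which corresponds to the counting strategy mentioned above.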

Although each of these grammars can theoretically generate infinite numbers of strings of infinite length, memory limitations will impose limits on subjects' practical ability to parse strings. Because previous work demonstrates that tamarins can readily remember and precisely discriminate among strings up to three syllables in length (27), we restricted n to be two or three in both of the above grammars. Sixty-four random strings were generated by each grammar, with 60 used for exposure and 4 different strings for testing (26).
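
The stimulus construction described above might be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: it assumes that sampling without replacement applies within each string, that the 64 strings are distinct, and that n is drawn with equal probability from two and three. The syllable inventories are those listed in the Fig. 1 caption; the function names are hypothetical.

```python
import random

# Syllable inventories as listed in the Fig. 1 caption.
A_SYLLABLES = ["ba", "di", "yo", "tu", "la", "mi", "no", "wu"]
B_SYLLABLES = ["pa", "li", "mo", "nu", "ka", "bi", "do", "gu"]


def make_string(grammar: str, n: int, rng: random.Random) -> tuple:
    """Build one string of 2n syllables for the given grammar ("FSG" or "PSG")."""
    # Assumption: sampling without replacement within a string, so no
    # syllable is repeated exactly in either grammar.
    a = rng.sample(A_SYLLABLES, n)
    b = rng.sample(B_SYLLABLES, n)
    if grammar == "FSG":
        # Alternating pairs: A B A B ...
        return tuple(s for pair in zip(a, b) for s in pair)
    if grammar == "PSG":
        # All As first, then the same number of Bs.
        return tuple(a + b)
    raise ValueError("grammar must be 'FSG' or 'PSG'")


def make_stimulus_set(grammar: str, rng: random.Random, total: int = 64, n_test: int = 4):
    """Return (exposure, test) lists of distinct strings with n of two or three."""
    strings = set()
    while len(strings) < total:
        # Assumption: n = 2 and n = 3 are equally likely.
        strings.add(make_string(grammar, rng.choice([2, 3]), rng))
    ordered = sorted(strings)  # fixed order so a seeded run is reproducible
    rng.shuffle(ordered)
    return ordered[: total - n_test], ordered[total - n_test :]
```

For example, `make_stimulus_set("PSG", random.Random(0))` returns 60 exposure strings and 4 held-out test strings, each four or six syllables long.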

Our testing method has been previously described in detail (17). Briefly, the tamarin colony was pseudorandomly divided into two groups, one per grammar. Each group included a mixture of sexes and ages (all adult). All of the monkeys in a particular group were simultaneously exposed in their home cages to 20 min of repeated playback of 60 different grammar-consistent strings, in random order, during the evening. They were then tested individually the next morning in a sound chamber. Testing started with a re-familiarization phase, when random stimuli from the previous evening's session were again played back for 2 min while the animal was fed treats (at a rate determined by the animal's feeding, and uncorrelated with stimulus presentation). We then closed the sound chamber door, started video monitoring and recording, and began playback of the test stimuli. No food was delivered during testing. Playback was initiated by the observer when the animal was looking down and away from the loudspeaker, and latency and duration of looking (orientation towards the loudspeaker; Fig. 1B) were later scored blind to condition from the digitized video (>90% reliability). Each animal (regardless of the grammar on which they were trained) was tested with the same eight stimuli in random order. Four were novel stimuli consistent with the training grammar, whereas the other four were violations (but consistent with the other grammar).

Fig. 1.

Stimuli and familiarization-discrimination paradigm. (A) Examples of the stimuli for the FSG and PSG used here. Grammars were matched for length, composition, loudness, and other acoustic features, and testing and evaluation procedures were identical for the two grammars. A and B stimulus classes were spoken by different speakers, a female (denoted by boldface) and male (normal font), and thus differed considerably in pitch, as well as phonetic identity and other acoustic variables. Possible A syllables were {ba di yo tu la mi no wu}. Possible B syllables were {pa li mo nu ka bi do gu}. (B) We quantified a cotton-top tamarin's propensity to orient toward a stimulus by evaluating responses to stimuli (“look” or “no look”) in videos offline and blind to stimulus identity. The stimuli were either consistent with or violated the rules implicit in a previous set of familiarization strings.

Tamarins easily mastered the FSG, as demonstrated by a significant increase in looking to stimuli that violated the rules of the grammar (N = 10 monkeys, mean of 72% looking to violations but 34% looking to grammatically consistent novel stimuli, Wilcoxon signed rank test, P < 0.007; Fig. 2). At an individual level, 9 of 10 monkeys looked more to violations than consistent stimuli. Thus, the simple alternating sequential pattern embodied in this grammar was spontaneously perceived and remembered, and novel stimuli following the familiar pattern elicited less attention than novel stimuli violating it. This success demonstrates that the acoustic cues differentiating the two syllable classes were salient to our tamarin subjects. More importantly, the ability to learn the rule governing the construction of an acoustic sequence, without any explicit training, indicates that tamarins are sensitive to regularities in an acoustic stream and can recognize novel strings as consistent with past inputs. This finding is consistent with previous research suggesting that monkeys are able, with or without training, to discover the rules governing sequential patterns in auditory and visual stimuli (17, 18, 28, 29).
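
For readers unfamiliar with the statistic, a paired comparison of this kind can be sketched as below. The report does not say how the Wilcoxon signed rank test was computed, so this is only one reasonable reading: an exact two-sided version that drops zero differences and gives tied absolute differences their average rank. The function name is hypothetical, and no data from the study are reproduced.

```python
from itertools import product


def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (W_plus, p_value). Zero differences are dropped and tied
    absolute differences receive average ranks.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    m = len(diffs)
    if m == 0:
        return 0.0, 1.0

    # Rank the absolute differences, averaging ranks within ties.
    order = sorted(range(m), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * m
    i = 0
    while i < m:
        j = i
        while j + 1 < m and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1

    # Test statistic: sum of the ranks attached to positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    center = sum(ranks) / 2  # expected value of W_plus under the null
    observed = abs(w_plus - center)

    # Under the null each difference is equally likely to be positive or
    # negative, so enumerate all 2**m sign assignments.
    extreme = 0
    for signs in product((0, 1), repeat=m):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - center) >= observed - 1e-9:
            extreme += 1
    return w_plus, extreme / 2 ** m
```

Here `x` would hold each animal's proportion of looks to violations and `y` the same animal's proportion of looks to consistent stimuli. With ten animals the enumeration covers at most 1024 sign patterns, so the exact calculation is inexpensive.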

Fig. 2.

Experimental evidence that monkeys can master FSGs but not PSGs. (Left) Humans exposed to an FSG with only local sequential structure (top) or a PSG with hierarchical structure (bottom) rejected violations as “different” and accepted consistent stimuli as “same.” Asterisk, significant difference. (Right) Monkeys exposed to the same FSG (top) oriented significantly more often to violations and did not orient to novel strings consistent in structure with the familiar strings. However, when exposed to the PSG (bottom), monkeys failed to discriminate between consistent and inconsistent strings, looking at a similar (random baseline) level to both sets of stimuli. N.S., no significant difference.

In contrast, tamarins failed to master the PSG, displaying an equivalent rate of looking to both consistent and inconsistent strings (N = 10 monkeys, 29% looks to inconsistent and 31% looks to consistent stimuli; Fig. 2). No monkey looked at more than half of the violations. The failure to master the PSG cannot be due to extraneous factors such as stimulus length, loudness, or other acoustic factors; inability to perceive the A and B classes; or differences in exposure, testing, or evaluation procedures, all of which were consistent between the two grammars. All of the test subjects had equivalent experience in this testing situation, and successfully mastered many other tasks in this laboratory. The pattern of results is what one would expect if tamarins attempted to parse the PSG strings by building an FSG structure [based on simple transitional probabilities, an ability of tamarins documented both here and elsewhere (17, 19)]. Furthermore, in two other attempts to test tamarins on this PSG with the use of slight modifications of stimulus type and/or testing procedures, we have similarly found no ability to master this rule (30). Thus, it appears that cotton-top tamarins have difficulties in spontaneously learning a rule of this type, despite their demonstrated ability to master FSGs equivalent in every respect except for hierarchical structure.

An alternative explanation for these results might be that tamarins fail the PSG because their ability to differentiate successive items is limited to runs of two. If this were true, it would account for the asymmetric results we obtained because they would be able to encode AB AB AB patterns but be unable to process the longer runs of AAA BBB. However, a subanalysis gave the same pattern of results even when n was limited to two (ABAB versus AABB): tamarins clearly discriminated violations from consistent stimuli in the FSG grammar (Wilcoxon signed rank, P < 0.02) but failed to discriminate these in the PSG (Wilcoxon signed rank, P = 0.68). The data are thus inconsistent with this alternative hypothesis.

In sharp contrast to the monkeys, adult humans tested with these same grammars showed rapid learning of either grammar (with under 3 min of exposure), and were easily able to discriminate grammatical from nongrammatical stimuli for both grammars (Fig. 2). Undergraduate subjects were passively exposed to the same training stimuli as the tamarins, and then were tested on the same test stimuli (26). Subjects scored 93% correct on the FSG and 85% on the PSG, indicating that adult humans can easily distinguish between and master either grammar under the same experimental conditions in which the monkeys failed on the PSG. These data are consistent with other experimental findings that humans can learn a PSG and appear to prefer phrase-structured input (20, 31, 32) and with the widely accepted theoretical claim that human languages demand acquisition of rule systems at the PSG level (13).

These results suggest that, despite a clear ability to process sequential regularities in acoustic strings, tamarins are unable to process a simple phrase structure, where components at one portion of a string are related to other components some distance away. Because earlier work with this species using the same paradigm demonstrates that these animals are perfectly capable of storing and recalling at least three separate stimuli and comparing them with subsequent strings, this computational limitation does not result from some lower level limitation on memory, attention, or number discrimination. Further work will be necessary using other methods (e.g., training and reinforcement), different grammars, and other species (e.g., apes) before any broad conclusions can be drawn about nonhuman primate limitations. It is also possible that nonprimates such as songbirds, which have some rule-based structure in their songs, would fare better at the task developed here. However, the current findings suggest that tamarins suffer from a specific and fundamental computational limitation on their ability to spontaneously recognize or remember hierarchically organized acoustic structures. Put differently, the limitation we have demonstrated might indicate an over-reliance on superficial aspects of stimuli, which prevents tamarins from perceiving more abstract relations available in the signal, as has been suggested by previous work on primate auditory perception (33). If nonhumans are “stuck” trying to interpret PSG-generated stimuli at the FSG level, it would make PSG stimuli seem much more complex to them and perhaps even unlearnable in finite time. Though the evolution of well-developed hierarchical processing abilities in humans might have benefited many aspects of cognition (e.g., spatial navigation, tool use, or social cognition), this capability is one of the crucial requirements for mastering any human language. 
Thus, the acquisition of hierarchical processing ability may have represented a critical juncture in the evolution of the human language faculty.

Supporting Online Material

Materials and Methods

Audios S1 to S8
