Perspective: Neuroscience

Does Grammar Start Where Statistics Stop?


Science  18 Oct 2002:
Vol. 298, Issue 5593, pp. 553-554
DOI: 10.1126/science.1078094

Languages exhibit statistical structure—that is, they show inhomogeneities in the distribution of sounds, words, and phrases. The importance of this type of structure in learning a language is a matter of intense debate, and is tackled by Peña et al. (1) on page 604 of this issue. The debate originated with Chomsky's 1957 discussion of the sentence “Colorless green ideas sleep furiously” (2). This sentence can be immediately recognized as being well formed (compared to “Ideas colorless sleep furiously green”) even though in both cases the probability that these words have previously occurred in this order is close to zero. Chomsky concluded that the statistical properties of language are not central to the characterization of linguistic knowledge, an insight that became part of the foundation of modern linguistic theory. Whether statistical properties are important in language acquisition was largely set aside. Instead, research focused on how the child converges on the rules and other components of grammar using a combination of deductive (nonstatistical) reasoning and innate knowledge (3).

Recently, there has been a resurgence of interest in statistical learning, with evidence showing that infants and young children incorporate statistical cues when learning about the sounds of a language, vocabulary, and the structures in which words occur (4–6). These findings complement evidence from adults demonstrating the use of statistical information in comprehending and producing utterances, suggesting that similar mechanisms may underlie the learning and use of language (7, 8).

Although this research establishes that statistical information is used in language acquisition, the extent to which acquisition can be explained in these terms is not yet known. Peña et al. (1) suggest one possibility: Perhaps there are both statistical processes (based on frequency and distribution of elements in language) and grammatical processes (for example, learning and using rules). Statistical learning may be limited to simpler problems such as learning the sounds of a language and building a lexicon. In contrast, the complexities of grammar may require other nonstatistical procedures. Thus, it seems that learning grammar begins where statistical learning ends.

This conciliatory view is appealing because it preserves the main tenets of the grammar approach while apparently accommodating evidence about statistical learning. In practice, however, it turns out to be difficult to establish a boundary between “grammatical” and “statistical” learning. Any corpus of linguistic stimuli contains a vast array of cues and potential generalizations. Even in carefully designed experiments, conditions intended to isolate grammatical processes may introduce correlated statistical cues that would support performance. For example, in the Peña et al. study, adults listened to a continuous stream of nonsense words (see the table). According to the authors, the subjects could extract statistical regularities from the speech stream, but they could formulate rules only when brief pauses were added at word boundaries. Although the language supplied to subjects by Peña et al. consisted of only nine words, the corpus derived by concatenating these words afforded a large number of generalizations about the syllable sequences (some of which are shown in the table). Peña et al.'s conclusions about grammatical learning concentrated on some properties of the syllable sequences, but other properties could also have cued subjects' responses.
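The kind of statistical regularity at issue can be made concrete. A minimal sketch, assuming a toy lexicon of invented three-syllable words (these are illustrative, not the actual stimuli from the study), shows how transitional probabilities between adjacent syllables distinguish within-word transitions from those spanning a word boundary:

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical three-syllable words (illustrative only, not the
# stimuli used by Peña et al.). No syllable appears in two words.
words = [("pa", "bi", "ku"), ("ti", "bu", "do"), ("go", "la", "tu")]

# Concatenate randomly chosen words into a continuous syllable stream,
# mimicking a familiarization phase with no acoustic word boundaries.
stream = [syl for _ in range(300) for syl in random.choice(words)]

# Transitional probability P(next | current) = count(pair) / count(current).
pairs = Counter(zip(stream, stream[1:]))
counts = Counter(stream[:-1])

def tp(s1, s2):
    return pairs[(s1, s2)] / counts[s1]

# Within-word transitions are perfectly predictive in this toy lexicon
# (TP = 1.0), while transitions spanning a word boundary hover near 1/3,
# so dips in TP mark candidate word boundaries.
```

Nothing in this computation requires rules or labeled boundaries; the dips in transitional probability fall out of the distributional structure alone, which is why correlated cues of this kind are hard to exclude from "grammatical" conditions.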

Two previous attempts to isolate a distinct grammatical form of learning (9, 10) raised similar concerns. In each case subsequent analyses suggested that the behavior could instead have arisen from statistical regularities that occurred simultaneously with the grammatical patterns (11–13). Importantly, these additional findings (like the analysis presented in the table) do not show that grammatical learning does not exist, but rather that statistical learning could also account for the results. Such findings also illustrate the difficulty of working back from observed behavior to the underlying regularities that gave rise to it (14). Knowing how many distinct procedures are involved in learning a language is a critical issue; resolving it will require advances on both the “statistical” and “grammatical” fronts.

Discussions of statistical learning need to consider two questions illustrated by the “colorless” sentence. First, what kinds of statistics are people, particularly infants, capable of computing? As in the “colorless” example, most research has investigated the transition probabilities between words or syllables. The Peña et al. study is a welcome step forward insofar as it addresses questions concerning nonadjacent elements (15). Adult learners can track various types of statistics, including some second-order probabilities and long-distance dependencies (14, 16), but the limits on these capacities and whether infants have similar capacities have not been determined. It is also unclear how well such learning mechanisms fit the demands posed by human languages. However, recent results suggest that statistical patterns that occur in natural languages are acquired more readily than patterns not found in natural languages (16–18). Thus, constraints on learning may play a positive role in explaining why language learners acquire only some of the many generalizations afforded by natural language.

(A) A section of the speech stream from the Peña et al. study (1). The three word families are illustrated in different colors. (B) Ten of the generalizations available from the input in panel A, with their probabilities of occurrence in the familiarization stream. (C) The forced choice alternatives in Peña et al.'s experiment 1. The word alternative is favored over the part-word (a sequence that spans a word boundary) by Peña et al.'s AXC rule (syllable A, followed by syllable C, with an intervening syllable X) as well as by more of the generalizations from panel B. (D) The forced choice alternatives in experiment 2. The rule-word (a new word generated by the AXC rule) and part-word alternatives were chosen equally often. In subsequent studies, introduction of brief pauses between words in the familiarization phase increased choice of rule-words over part-words, because, according to Peña et al., the pauses switched subjects into a rule-learning mode. The pauses also simplify the word-segmentation task, increase the salience of properties 2–4, and make the part-words less like the familiarization words.
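The nonadjacent AXC regularity described in the caption can itself be expressed statistically. In the sketch below, the syllable families are hypothetical stand-ins loosely modeled on the design (not the study's actual syllables): within each family the first syllable fully predicts the third, skipping over a variable middle:

```python
from collections import Counter

# Hypothetical AXC families (illustrative, not the actual stimuli):
# each frame (A, C) combines with each of three middle syllables X.
families = [("pu", "ki"), ("be", "ga"), ("ta", "du")]
middles = ["li", "ra", "fo"]
words = [(a, x, c) for (a, c) in families for x in middles]  # nine words

# Concatenate the nine words repeatedly into a familiarization stream.
stream = [syl for w in words * 100 for syl in w]

# Nonadjacent transitional probability P(C | A), skipping one syllable.
skip_pairs = Counter(zip(stream, stream[2:]))
firsts = Counter(stream[:-2])

def skip_tp(a, c):
    return skip_pairs[(a, c)] / firsts[a]

# Within a family, A predicts C perfectly (skip_tp = 1.0) even though
# the adjacent transition from A to the middle syllable X is only 1/3,
# so the AXC "rule" coincides with a trackable nonadjacent statistic.
```

This is precisely the difficulty the table illustrates: a learner who chooses rule-words over part-words may be applying an AXC rule, or may simply be tracking the nonadjacent statistic that the rule condition leaves intact.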

The second question asks: Over what types of information are statistics learned? Chomsky's example assumes that people are computing statistics at a single level of linguistic structure—between words in the “colorless” example, and between syllables in the Peña et al. study. But language exhibits structure at multiple levels, each of which has its own statistical character. The “colorless” sentence is less puzzling when one looks beyond transition probabilities to other information that is used in comprehension. For example, words fall into general types—green is a property or adjective, and sleep is an action or verb—that exhibit characteristic distributions. The “colorless” sentence conforms to these distributions in English, whereas “Ideas colorless sleep furiously green” does not (19).
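The class-level point can be sketched with a toy tagged corpus (the sentences and class tags below are invented for illustration, not drawn from any real corpus or tagger). Statistics computed over word classes, rather than individual words, license the "colorless" sentence while ruling out its scrambled counterpart:

```python
from collections import Counter

# Hypothetical tagged corpus (illustrative only): each word is paired
# with an assumed word-class tag.
tagged_corpus = [
    ("the", "DET"), ("old", "ADJ"), ("green", "ADJ"), ("dogs", "NOUN"),
    ("sleep", "VERB"), ("soundly", "ADV"),
    ("big", "ADJ"), ("red", "ADJ"), ("cars", "NOUN"), ("move", "VERB"),
    ("fast", "ADV"),
]

# Bigram statistics over classes, not words.
tags = [t for _, t in tagged_corpus]
bigrams = Counter(zip(tags, tags[1:]))

def seen(t1, t2):
    return bigrams[(t1, t2)] > 0

# "Colorless green ideas sleep furiously" -> ADJ ADJ NOUN VERB ADV:
# every class bigram is attested in the toy corpus.
well_formed = ["ADJ", "ADJ", "NOUN", "VERB", "ADV"]

# "Ideas colorless sleep furiously green" -> NOUN ADJ VERB ADV ADJ:
# at least one class bigram never occurs.
scrambled = ["NOUN", "ADJ", "VERB", "ADV", "ADJ"]
```

Even though neither word string has ever occurred, the first conforms to the distribution of classes and the second does not, which is the sense in which the "colorless" sentence stops being a counterexample to statistical accounts.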

Our understanding of the contribution of statistical learning is limited by incomplete knowledge of the kinds of statistics infants encode and whether these are the ones relevant to natural language. This view also leaves open the critical question of why only humans acquire language, as many other species are capable of simple forms of statistical learning. Again, there are parallel statistical and grammatical interpretations to explain this fact. For example, the statistics of natural language, which involve correlations over multiple types of information simultaneously, may be too complex for other species to learn; alternatively, other species may lack innate grammatical capacities that make language learning possible.

Studies centered on understanding rule learning raise two major unresolved issues. First, the distinction between a rule and a statistical generalization remains unclear. Many of the regularities summarized in the table could be called either rules or statistics. Second, how do infants actually find rules in the speech they hear? Is there a procedure that would yield the right rules under realistic learning conditions? The evidence for rule learning is mostly negative: cases where learning occurs but there is no obvious statistical explanation. A theory explaining how rule learners arrive at exactly the correct generalizations given the complexities of their experience would represent substantial progress.

The cascade of potential learning cues and generalizations implicit in the miniature language studied by Peña et al. underscores the difficulties in determining how learners acquire the vastly richer structure of natural languages. To some extent this problem may be solved by grammar-specific forms of knowledge or learning. Statistical learning offers another potential explanation insofar as languages may exhibit only those structures that learners are able to track. Thus, the structure of language may have resulted in part from constraints imposed by the limits of human learning.

