Rule Learning by Seven-Month-Old Infants


Science  01 Jan 1999:
Vol. 283, Issue 5398, pp. 77-80
DOI: 10.1126/science.283.5398.77


A fundamental task of language acquisition is to extract abstract algebraic rules. Three experiments show that 7-month-old infants attend longer to sentences with unfamiliar structures than to sentences with familiar structures. The design of the artificial language task used in these experiments ensured that this discrimination could not be performed by counting, by a system that is sensitive only to transitional probabilities, or by a popular class of simple neural network models. Instead, these results suggest that infants can represent, extract, and generalize abstract algebraic rules.

What learning mechanisms are available to infants on the cusp of language learning? One learning mechanism that young infants can exploit is statistical in nature. For example, Saffran et al. (1) found that the looking behaviors of 8-month-old infants indicated a sensitivity to statistical information inherent in sequences of speech sounds produced in an artificial language—for example, transitional probabilities, which are estimates of how likely one item is to follow another. In the corpus of sentences “The boy loves apples. The boy loves oranges.” the transitional probability between the words “the” and “boy” is 1.0 but the transitional probability between the words “loves” and “apples” is 1/2 = 0.5.
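The two values in this example can be reproduced with a short calculation. The following is a minimal sketch, not a procedure from the studies cited: the function name `transitional_probabilities` is hypothetical, and it assumes that transitions are counted only between adjacent words within a sentence, ignoring case and the final period.

```python
from collections import Counter

def transitional_probabilities(sentences):
    """Estimate P(next word | current word) from adjacent pairs within each sentence.

    Assumption for this sketch: transitions across sentence boundaries are ignored.
    """
    pair_counts = Counter()
    first_counts = Counter()
    for sentence in sentences:
        words = sentence.lower().rstrip(".").split()
        for current, following in zip(words, words[1:]):
            pair_counts[(current, following)] += 1
            first_counts[current] += 1
    return {
        pair: count / first_counts[pair[0]]
        for pair, count in pair_counts.items()
    }

corpus = ["The boy loves apples.", "The boy loves oranges."]
tp = transitional_probabilities(corpus)
print(tp[("the", "boy")])       # 1.0
print(tp[("loves", "apples")])  # 0.5
```

In this corpus "the" is always followed by "boy", whereas "loves" is followed by "apples" in only one of its two occurrences.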

It has been suggested that mechanisms that track statistical information, or connectionist models that rely on similar sorts of information [for example, the simple recurrent network (SRN) (2)], may suffice for language learning (3). The alternative possibility considered here is that children might possess at least two learning mechanisms, one for learning statistical information and another for learning “algebraic” rules (4)—open-ended abstract relationships for which we can substitute arbitrary items. For instance, we can substitute any value of x into the equation y = x + 2. Similarly, if we know that in English a sentence can be formed by concatenating any plural noun phrase with any verb phrase with plural agreement, then as soon as we discover that “the three blickets” is a well-formed plural noun phrase and that “reminded Sam of Tibetan art” is a well-formed verb phrase with plural agreement, we can infer that “The three blickets reminded Sam of Tibetan art.” is a well-formed sentence.

To date, however, there has been no direct empirical test for determining whether young infants can actually learn simplified versions of such algebraic rules. A number of previous experiments drawn from the literature of speech perception (not aimed at the question of rule learning) are consistent with the possibility that infants might learn algebraic rules, but each of these prior experiments could be accounted for by a system that extracted only statistical tendencies. For example, infants who are habituated to a series of two-syllable words attend longer when confronted with a three-syllable word (5). An infant who attended longer to a three-syllable word might have noticed a violation of a rule (for example, “all the words here are two syllables”), but an infant could also have succeeded with a statistical device that noted that the three-syllable word had more syllables than the average number of syllables in the preceding utterance. Similarly, Gomez and Gerken (6) found that infants who were habituated to a set of sentences constructed from an artificial grammar (VOT-PEL-JIC; PEL-TAM-PEL-JIC) could distinguish new sentences that were consistent with this grammar (VOT-PEL-TAM-PEL-JIC) from new sentences that were not consistent (VOT-TAM-PEL-RUD-JIC). Such learning might reflect the acquisition of rules, but because all the test sentences were constructed with the same words as the habituation sentences (albeit rearranged), the test sentences could be distinguished on the basis of statistical information such as transitional probabilities (for example, in the training corpus, VOT was never followed by TAM), without recourse to a rule.

We tested infants in three experiments in which simple statistical or counting mechanisms would not suffice to learn the rule that was generating the sequences of words. In each experiment, infants were habituated to three-word sentences constructed from an artificial language (7) and then tested on three-word sentences composed entirely of artificial words that did not appear in the habituation. The test sentences varied as to whether they were consistent or inconsistent with the grammar of the habituation sentences. Because none of the test words appeared in the habituation phase, infants could not distinguish the test sentences based on transitional probabilities, and because the test sentences were the same length and were generated by a computer, the infant could not distinguish them based on statistical properties such as number of syllables or prosody.

We tested infants with the familiarization preference procedure as adapted by Saffran et al. (1, 8, 9); if infants can abstract the underlying structure and generalize it to novel words, they should attend longer during presentation of the inconsistent items than during presentation of consistent items.

Subjects were 7-month-old infants, who were younger than those studied by Saffran et al. but still old enough to be able to distinguish words in a fluent stream of speech (8). In the first experiment, 16 infants were randomly assigned to either an “ABA” condition or an “ABB” condition. In the ABA condition, infants were familiarized with a 2-min speech sample (10) containing three repetitions of each of 16 three-word sentences that followed an ABA grammar, such as “ga ti ga” and “li na li.” In condition ABB, infants were familiarized with a comparable speech sample in which all training sentences followed an ABB grammar, such as “ga ti ti” and “li na na” (11).

In the test phase, we presented infants with 12 sentences that consisted entirely of new words, such as “wo fe wo” or “wo fe fe” (12). Half the test trials were “consistent sentences,” constructed from the same grammar as the one with which the infant was familiarized (an ABA test sentence for infants trained in the ABA condition and an ABB sentence for infants trained in the ABB condition), and half the test trials were “inconsistent sentences” that were constructed from the grammar on which the infant was not trained (13).
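The relation between habituation and test items can be made concrete with a small sketch. It uses only the word pairs quoted above, not the full stimulus lists (which are given in the notes and not reproduced here), and the helper `make_sentence` is a hypothetical name introduced for illustration.

```python
def make_sentence(pattern, a_word, b_word):
    """Fill a pattern such as 'ABA' or 'ABB' with two concrete words."""
    slots = {"A": a_word, "B": b_word}
    return " ".join(slots[symbol] for symbol in pattern)

# Only the word pairs quoted in the text; an illustrative subset, not the full lists.
habituation_pairs = [("ga", "ti"), ("li", "na")]
test_pair = ("wo", "fe")

for condition in ("ABA", "ABB"):
    other = "ABB" if condition == "ABA" else "ABA"
    habituation = [make_sentence(condition, a, b) for a, b in habituation_pairs]
    consistent = make_sentence(condition, *test_pair)    # same grammar, new words
    inconsistent = make_sentence(other, *test_pair)      # other grammar, new words
    print(condition, habituation, consistent, inconsistent)
```

For an infant in the ABA condition, “wo fe wo” is the consistent item and “wo fe fe” the inconsistent one; the assignment reverses in the ABB condition, so the same test sentences serve both roles across conditions.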

We found that 15 of 16 infants showed a preference for the inconsistent sentences (14), which was indicated by their looking longer at the flashing side light during presentations of those sentences (15) (Table 1).

Table 1

Mean time spent looking in the direction of the consistent and inconsistent stimuli in each condition for experiments 1, 2, and 3, and significance tests comparing the listening times. Mean ages of the infants tested were 6 months 27 days (median, 6 months 24 days) in experiment 1, 7 months 1 day (median, 7 months) in experiment 2, and 7 months (median, 7 months 2 days) in experiment 3.


Although each of the test words in experiment 1 was new, the sequence of phonetic features in the test overlapped to some extent with the sequence of phonetic features in the habituation items. For example, in the ABA condition three habituation sentences contained a word starting with a voiced consonant followed by a word starting with an unvoiced consonant. Each of these three sequences ended with a word that contained a voiced consonant. An infant who was thus expecting the sequence voiced-unvoiced-voiced would be surprised by the inconsistent test items (each of which was voiced-unvoiced-unvoiced) but not by the consistent items (each of which was voiced-unvoiced-voiced). To rule out the possibility that infants might rely on learning sequences of particular phonetic features rather than deriving a more abstract rule, we conducted a second experiment with the same grammars as in the first experiment but with a more carefully constructed set of words. In experiment 2, then, the set of phonetic features that distinguished the test words from each other did not distinguish the words that appeared in the habituation sentences (16). For example, the test words varied in the feature of voicing (for example, if the “A” word was +voiced, the “B” word was −voiced), whereas the habituation words did not vary on the feature of voicing (they were all +voiced). Thus, the habituation items provided no direct information about the relationship between voiced and unvoiced consonants; the same holds for each of the phonetic features that varied in the test items. As in experiment 1, 15 of 16 infants looked longer during the presentation of the inconsistent items than during the presentation of the consistent items (17) (Table 1).

Rather than encoding the entire ABA or ABB rule, the infants could have habituated to a single property that distinguishes these grammars. Strings from the ABB grammar contain immediately reduplicated elements (for example, “ti ti”), whereas strings from the ABA grammar do not. In a third experiment, we compared sentences constructed from the ABB grammar with sentences constructed from an AAB grammar (18, 19); because reduplication was contained in both grammars, the infants could not distinguish these grammars solely on the basis of information about reduplication (20). As in the first two experiments, infants (this time, 16 of 16) looked longer during presentation of the inconsistent items than during presentation of the consistent items (21) (Table 1).

Our results do not call into question the existence of statistical learning mechanisms but show that such mechanisms do not exhaust the child's repertoire of learning mechanisms. A system that was sensitive only to transitional probabilities between words could not account for any of these results, because all the words in the test sentences are novel and, hence, their transitional probabilities (with respect to the familiarization corpus) are all zero. Similarly, a system that noted discrepancies with stored sequences of words could not account for the results in any of the three experiments, because both the consistent items and the inconsistent items differ from any stored sequences of words. A system that noted discrepancies with stored sequences of phonetic features could account for the results in experiment 1 but not those in experiments 2 and 3. A system that could count the number of reduplicated elements and notice sentences that differ in the number of reduplicated elements could account for the results in experiments 1 and 2, but it could not account for infants' performance in experiment 3.
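Two of these alternatives can be illustrated with a toy check. This is a sketch under stated assumptions rather than a model of infant learning: the function names are hypothetical, the habituation and ABA/ABB test sentences are the examples quoted in the text, and the AAB item “wo wo fe” is constructed here purely for illustration.

```python
def count_immediate_repeats(sentence):
    """Count adjacent positions that hold the same word (reduplication)."""
    words = sentence.split()
    return sum(first == second for first, second in zip(words, words[1:]))

def adjacent_pairs(sentences):
    """Collect every adjacent word pair attested in a list of sentences."""
    pairs = set()
    for sentence in sentences:
        words = sentence.split()
        pairs.update(zip(words, words[1:]))
    return pairs

habituation_abb = ["ga ti ti", "li na na"]  # ABB examples quoted in the text
test_items = {
    "ABA": "wo fe wo",  # quoted in the text
    "ABB": "wo fe fe",  # quoted in the text
    "AAB": "wo wo fe",  # hypothetical item, built for illustration only
}

attested = adjacent_pairs(habituation_abb)
for grammar, sentence in test_items.items():
    # True when no word-to-word transition in the test item was ever heard.
    all_transitions_novel = adjacent_pairs([sentence]).isdisjoint(attested)
    print(grammar, count_immediate_repeats(sentence), all_transitions_novel)
```

Every transition in every test item is unattested in habituation, so transition statistics alone cannot separate the items. The repeat count separates ABA (0) from ABB (1) but assigns the same value to ABB and AAB (1 each), which is why a reduplication counter fails for experiment 3.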

Likewise, we found in a series of simulations that the SRN is unable to distinguish the inconsistent and consistent sentences, because the network, which represents knowledge in terms of a set of connection weights, learns by altering network connection weights for each word independently (22). As a result, there is no generalization to novel words. Such networks can simulate knowledge of grammatical rules only by being trained on all items to which they apply; consequently, such mechanisms cannot account for how humans generalize rules to new items that do not overlap with the items that appeared in training (23, 24).

We propose that a system that could account for our results is one in which infants extract abstract algebra-like rules that represent relationships between placeholders (variables), such as “the first item X is the same as the third item Y,” or more generally, that “item I is the same as item J.” Our results appear to show that, in addition to having the capacity to represent such rules, infants have the ability to extract those rules rapidly from small amounts of input and to generalize those rules to novel instances. If our position is correct, then infants possess at least two distinct tools for learning about the world and attacking the problem of learning language: one device that tracks statistical relationships such as transitional probabilities and another that manipulates variables, allowing children to learn rules. Even taken together, these tools are unlikely to be sufficient for learning language, but both may be necessary prerequisites.
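A rule of the form “item I is the same as item J” can be written down as a relation between positions rather than between particular words. The sketch below is only an illustration of that idea and makes no claim about the mechanism infants actually use; all function names are hypothetical, positions are numbered from 0, and the AAB item is again constructed for illustration.

```python
from itertools import combinations

def identity_relations(sentence):
    """Return the position pairs (i, j), i < j, whose words are identical."""
    words = sentence.split()
    return frozenset(
        (i, j)
        for i, j in combinations(range(len(words)), 2)
        if words[i] == words[j]
    )

def extract_rule(habituation):
    """Keep the identity relations shared by every habituation sentence.

    Assumes a non-empty list of habituation sentences.
    """
    return frozenset.intersection(*(identity_relations(s) for s in habituation))

def is_consistent(rule, sentence):
    """A sentence fits the rule when it shows exactly the same identities."""
    return identity_relations(sentence) == rule

aba_rule = extract_rule(["ga ti ga", "li na li"])  # {(0, 2)}: first equals third
abb_rule = extract_rule(["ga ti ti", "li na na"])  # {(1, 2)}: second equals third

print(is_consistent(aba_rule, "wo fe wo"))  # True
print(is_consistent(aba_rule, "wo fe fe"))  # False
print(is_consistent(abb_rule, "wo wo fe"))  # False
```

Because the rule refers only to positions, it applies to words that never occurred in habituation, and it separates ABB from AAB even though both contain a reduplicated element.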

* To whom correspondence should be addressed. E-mail: gary.marcus{at}

