Hearing Sounds, Understanding Actions: Action Representation in Mirror Neurons


Science  02 Aug 2002:
Vol. 297, Issue 5582, pp. 846-848
DOI: 10.1126/science.1070311


Many object-related actions can be recognized by their sound. We found neurons in monkey premotor cortex that discharge when the animal performs a specific action and when it hears the related sound. Most of the neurons also discharge when the monkey observes the same action. These audiovisual mirror neurons code actions independently of whether these actions are performed, heard, or seen. This discovery in the monkey homolog of Broca's area might shed light on the origin of language: audiovisual mirror neurons code abstract contents—the meaning of actions—and have the auditory access typical of human language to these contents.

In the monkey ventral premotor cortex (area F5, Fig. 1A), there are neurons that discharge both when the monkey performs a specific action and when it observes another individual performing a similar action (mirror neurons) (1–3). We investigated whether there are neurons in F5 that discharge when the monkey performs a specific hand action and also when it hears the corresponding action-related sounds.

Figure 1

(A) Lateral view of macaque brain with the location of area F5, shaded in gray. Major sulci: a, arcuate; c, central; ip, intraparietal; s, sylvian sulcus. (B) Two examples of neurons responding to the sound of actions. Rastergrams are shown together with spike density functions. Text above each rastergram describes the sound or action used to test the neuron. Vertical lines indicate the time when the sound occurred. Traces under the spike density functions in S and in CS conditions are oscillograms of the sounds used to test the neurons. Only 1 of the 10 different instances of the sounds is shown.
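The spike density functions mentioned in the caption can be illustrated with a minimal sketch. The text does not state how they were computed, so the Gaussian kernel, its 20 ms width, the function names, and the spike times below are all assumptions made for illustration only.

```python
import math

def spike_density(spike_times_s, t_grid_s, sigma_s=0.02):
    """Sum a unit-area Gaussian kernel over the spikes of one trial.

    The result is in spikes/s. The kernel shape and sigma_s are
    assumptions; the article does not specify them.
    """
    norm = 1.0 / (sigma_s * math.sqrt(2.0 * math.pi))
    return [
        sum(norm * math.exp(-0.5 * ((t - s) / sigma_s) ** 2) for s in spike_times_s)
        for t in t_grid_s
    ]

def trial_averaged_density(trials, t_grid_s, sigma_s=0.02):
    """Average the per-trial densities point by point across trials."""
    per_trial = [spike_density(trial, t_grid_s, sigma_s) for trial in trials]
    return [sum(values) / len(per_trial) for values in zip(*per_trial)]

# Hypothetical spike times (seconds, relative to sound onset at t = 0).
trials = [[-0.30, 0.05, 0.08, 0.12], [0.04, 0.09, 0.40], [0.06, 0.10, 0.11]]
grid = [i / 1000.0 for i in range(-500, 1001, 10)]  # -0.5 s to 1.0 s in 10 ms steps
density = trial_averaged_density(trials, grid)
```

Aligning every trial to the sound onset before averaging is what lets the vertical line in each panel mark the same event across rows of the rastergram.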

The experiments were carried out in three awake macaque monkeys (Macaca nemestrina) (4). In total, 497 neurons were recorded and their motor and visual properties were assessed (1–3). In an initial group of neurons (n = 211), we studied auditory properties by using sounds produced by the experimenter's actions and non–action-related sounds; in another group (n = 286), we used digitized action-related sounds (4).

Sixty-three neurons (13%) discharged both when the monkey performed a hand action and when it heard the action-related sound (5). An example of such a neuron is shown in Fig. 1B (neuron 1). This neuron responded to the vision and sound of a tearing action (paper ripping; V+S). The sound of the same action performed out of the monkey's sight was equally effective (S). Sounds that were non–action-related (white noise, monkey calls) did not evoke any excitatory response (control sounds: CS1, CS2). In fact, as often occurs in F5 neurons during strong arousal, the firing rate decreased.
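The neuron counts reported so far can be checked with a few lines of arithmetic. The variable names are illustrative; the numbers are those given in the text, and the sketch assumes the 13% is taken over all 497 recorded neurons.

```python
# Counts as reported in the text.
recorded_total = 497   # all neurons recorded
group_natural = 211    # tested with sounds produced by the experimenter's actions
group_digitized = 286  # tested with digitized action-related sounds
audio_motor = 63       # discharged during action execution and to the action sound

# The two test groups should account for every recorded neuron.
assert group_natural + group_digitized == recorded_total

fraction = audio_motor / recorded_total
print(f"{fraction:.1%} of recorded neurons")  # 12.7%, reported as 13%
```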

Another example is presented in Fig. 1B (neuron 2). This neuron responded to the vision and sound of a hand dropping a stick (V+S). The response was also present when the monkey heard the sound of the stick hitting the floor (S). Non–action-related arousing sounds did not produce any consistent excitation (CS1, CS2).

The effect of non–action-related sounds was statistically assessed in 32 neurons, which were tested with a variety of arousing and emotional sounds such as loud noises and animal calls (4). Except for five neurons that were weakly activated in response to non–action-related sounds, none of the neurons responded significantly (P > 0.05) to these stimuli.

Although monkeys can perform a variety of hand actions that produce sound, breaking and tearing actions are by far the most frequent. Neuronal behavior reflected this fact. The sound of an object breaking and of paper ripping were the most effective stimuli for most tested neurons. Table 1 illustrates the types of effective action-related sounds and the number of neurons that best responded to them.

Table 1

Type of actions effective in acoustically triggering F5 neurons.


Having established that area F5 contains neurons that specifically respond to action-related auditory stimuli, we addressed the issue of their capacity to differentiate actions on the basis of auditory and visual characteristics (4). For this purpose, we studied a set of neurons (n = 33) with an experimental design in which two hand actions were randomly presented in vision-and-sound, sound-only, vision-only, and motor conditions (monkeys performing object-directed actions). Twenty-nine neurons showed auditory selectivity (Table 2). Of them, 22 were also visually selective for the same action (audiovisual mirror neurons). Three neurons were visually unselective, and four did not show a significant visual response. Two examples of neurons showing auditory selectivity are presented in Fig. 2A.

Figure 2

(A) Responses of two neurons selectively activated by action-related sounds. Responses to two different actions for each neuron are shown, with the best action in black. Vertical lines indicate the time when the sound occurred (V+S, S). In the vision-only condition (V), vertical lines indicate the time when the sound would have occurred with natural, unmodified stimuli (4). In the motor condition (M), vertical lines indicate the moment when the monkey touched the object. Traces under the spike-density functions in the sound-only conditions are oscillograms of the sounds played back to test the neurons. Only 1 of the 10 different instances of the sounds is shown. For neuron 3, the multivariate analysis of variance showed auditory, visual, and motor selectivity. For neuron 4, the same selectivity was found. For peanut breaking, the effectiveness of visual stimuli was demonstrated by the significantly (P < 0.05) larger response in the V+S condition with respect to V and S conditions. Other conventions are as in Fig. 1B. (B) Responses of the population of tested neurons in the vision-and-sound, vision-only, sound-only, and motor conditions (from top to bottom, as indicated by initials on the left). Responses to the most effective stimuli are shown in black, and those to the poorly effective stimuli are in gray. Vertical lines indicate the auditory response onset in the population. The x axes are the same as in (A), whereas the y axes are in normalized units for the population.

Table 2

Neurons classified according to visual and auditory properties. Classification is based on results of the vision × sound × action multivariate analysis of variance (MANOVA) (4). All significance is taken at P < 0.05.


Neuron 3 discharged when the monkey observed the experimenter breaking a peanut (V+S, and V) and when the monkey heard the peanut being broken without seeing the action (S). The neuron also discharged when the monkey made the same action (M). Grasping a ring and the resulting sound of this action (4) evoked small responses. A statistical criterion (4) yielded both auditory and visual selectivity for this neuron. By analyzing the vision-only and sound-only conditions separately (Newman–Keuls test, P < 0.05), selectivity was apparent in both cases.

Neuron 4 is another example of a selective audiovisual mirror neuron. This neuron responded vigorously when the monkey broke a peanut and much less when it ripped a sheet of paper (M). This selectivity was also observed when the monkey saw and heard the experimenter breaking a peanut (V+S). The sound alone of breaking a peanut produced a significant but smaller response (S), thus showing the importance of the visual modality for this neuron. However, the vision of breaking a peanut without the natural sound triggered no response. Indeed, the sound of a peanut breaking is an important signal that the operation is successful. The behavior of neuron 4 reflects this phenomenon. Paper ripping produced small responses in all conditions.

For 16 neurons, the intensity of the discharge in vision-only, sound-only, and vision-and-sound conditions did not differ significantly (Newman–Keuls, P > 0.05, as for neuron 3). In 10 of the remaining neurons, the response in V+S was significantly larger than that in S (Newman–Keuls, P < 0.05, as for neuron 4). The latter neurons required both modalities to describe the action event, which reflects what normally occurs in nature, where, within a social environment, vision and sound of hand actions are typically coupled. Finally, in the remaining three neurons the response to sound alone was the strongest.
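The two breakdowns of the auditory-selective neurons, by visual property and by comparison across modalities, can be tallied as follows. This is a sketch with illustrative names. It assumes that the groups of 16, 10, and 3 neurons partition the 29 auditory-selective neurons, which the counts suggest but the text does not state outright.

```python
auditory_selective = 29  # of the 33 neurons tested with two actions

# Breakdown by visual property, as reported with Table 2.
by_visual_property = {
    "visually selective for the same action": 22,
    "visually unselective": 3,
    "no significant visual response": 4,
}

# Breakdown by response intensity across conditions (assumed to cover the same 29).
by_modality_comparison = {
    "V, S, and V+S not significantly different": 16,
    "V+S significantly larger than S": 10,
    "sound alone strongest": 3,
}

def shares(breakdown, total):
    """Return each category's share of total, after checking the counts add up."""
    if sum(breakdown.values()) != total:
        raise ValueError("category counts do not sum to the stated total")
    return {label: count / total for label, count in breakdown.items()}

for breakdown in (by_visual_property, by_modality_comparison):
    for label, share in shares(breakdown, auditory_selective).items():
        print(f"{label}: {share:.0%}")
```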

A population analysis (Fig. 2B, rightmost column) based on all 33 neurons analyzed confirmed the data observed in individual neurons (4). The population of neurons responded to the sound of actions and discriminated between the sounds of different actions. The actions whose sounds were preferred were also the actions that produced the strongest vision-only and motor responses.

In conclusion, area F5 contains a population of neurons, audiovisual mirror neurons, that discharge not just to the execution or observation of a specific action but also when this action can only be heard. Multimodal neurons have been described in several cortical areas and subcortical centers, including the superior temporal sulcus region (6–8), the ventral premotor cortex (9–14), and the superior colliculus (15). These neurons, however, responded to specific stimulus locations or directions of movement. The neurons described here differ in that they code not space, or some spatial characteristic of stimuli, but actions, even when those actions are only heard.

A further difference is that audiovisual mirror neurons also discharge during execution of specific motor actions. Therefore, they are part of the vocabulary of action previously described in area F5. This vocabulary contains not only schemas on how an action should be executed (for example, grip selection) but also the action ideas—that is, actions expressed in terms of their goals (for example, grasp, hold, or break) (16). Audiovisual mirror neurons could be used, therefore, to plan/execute actions (as in our motor conditions) and to recognize the actions of others (as in our sensory conditions), even if only heard, by evoking motor ideas.

Mirror neurons may be a key to gestural communication (17). The activity of ripping neurons in my brain leads me (if the circumstances are appropriate) to rip a sheet of paper. This overt action will activate your F5 ripping mirror neurons. The action becomes information. This information can be decoded in your brain thanks to the matching properties of your mirror neurons. What is intriguing about the discovery of audiovisual mirror neurons is that they are observed in an area that appears to be the homolog of human Broca's area (area 44) (18). The recent demonstration of a left-right asymmetry in the ventral premotor cortex of great apes (19) indicates that the human motor speech area is the result of a long evolutionary process, already started in nonhuman primates. The discovery of audiovisual mirror neurons in this location may shed light on the evolution of spoken language for two main reasons: First, these neurons have the capacity to represent action contents; second, they have auditory access to these contents so characteristic of human language.

Supporting Online Material

Materials and Methods

  • * To whom correspondence should be addressed. E-mail: giacomo.rizzolatti{at}

