Activation of Auditory Cortex During Silent Lipreading


Science  25 Apr 1997:
Vol. 276, Issue 5312, pp. 593-596
DOI: 10.1126/science.276.5312.593


Watching a speaker’s lips during face-to-face conversation (lipreading) markedly improves speech perception, particularly in noisy conditions. With functional magnetic resonance imaging it was found that these linguistic visual cues are sufficient to activate auditory cortex in normal hearing individuals in the absence of auditory speech sounds. Two further experiments suggest that these auditory cortical areas are not engaged when an individual is viewing nonlinguistic facial movements but appear to be activated by silent meaningless speechlike movements (pseudospeech). This supports psycholinguistic evidence that seen speech influences the perception of heard speech at a prelexical stage.

During face-to-face conversation, the perception of speech is reliably improved by watching the speaker’s lips moving (lipreading) as the words are spoken (1), particularly in noisy surroundings (2). The influence of these visual cues on auditory speech perception is usually outside the observer’s awareness but becomes apparent when they are not synchronous with heard speech. This is experienced, for example, when watching a poorly dubbed movie, and is evidenced experimentally by the McGurk effect when an auditory percept is modified by lipreading (3).

Although research with positron emission tomography (PET) and functional magnetic resonance imaging (fMRI) has refined the cerebral localization of auditory speech perception (4), the regions involved in the visual perception of articulatory movements from a speaker’s face have not yet been precisely identified. How information from these distinct modalities is integrated to produce coherent and unified perception of speech during ordinary face-to-face conversation is an important question. The level at which these visual cues exert an influence on auditory speech perception is uncertain, but psychophysical evidence suggests that audiovisual integration of linguistic signals occurs before the stage of word identification, referred to as the prelexical level, and possibly at the stage of phonetic categorization (5).

In fMRI studies of normal hearing individuals we compared cerebral regions activated in silent lipreading with those activated during heard speech in the absence of visual cues to find out whether there is a common pathway by which information in visual and auditory modalities is integrated during face-to-face conversation. In two further experiments, we manipulated the linguistic specificity of these visual cues to explore at what stage dynamic facial gestures might influence auditory speech perception. For all experiments we used a design in which contrasting 30-s epochs of experimental (ON) and baseline (OFF) conditions were alternated over a total scanning time of 5 min (6). Differential activation between ON and OFF periods was estimated by subsequent analysis (7).
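The alternating design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis actually used in the study (which is given in note 7). The 30-s epoch length and 5-min total come from the text; the 3-s sampling interval, the choice of OFF as the first epoch, and the function names `boxcar` and `fundamental_power` are hypothetical. The second function fits a sine and cosine at the ON/OFF alternation frequency to a single time series and returns the squared amplitude of that fit, one simple way to quantify a periodic ON-versus-OFF signal difference.

```python
import math

EPOCH_S = 30.0   # from the text: 30-s ON and OFF epochs
TOTAL_S = 300.0  # from the text: 5 min total scanning time
TR_S = 3.0       # ASSUMPTION: sampling interval; not stated in this section


def boxcar(epoch_s=EPOCH_S, total_s=TOTAL_S, tr_s=TR_S, start_on=False):
    """Return one 0/1 label per acquired volume (1 = ON, 0 = OFF).

    start_on=False is an ASSUMPTION: the text does not say which
    condition came first.
    """
    n = int(round(total_s / tr_s))
    labels = []
    for i in range(n):
        epoch_index = int((i * tr_s) // epoch_s)
        # Even-numbered epochs take the starting condition.
        is_on = (epoch_index % 2 == 0) == start_on
        labels.append(1 if is_on else 0)
    return labels


def fundamental_power(signal, epoch_s=EPOCH_S, tr_s=TR_S):
    """Squared amplitude of the sinusoid at the ON/OFF alternation frequency.

    Projects the series onto sine and cosine with period 2 * epoch_s.
    The closed-form projection is only a least-squares fit when the
    series spans a whole number of ON/OFF cycles on a uniform grid.
    """
    n = len(signal)
    omega = 2.0 * math.pi / (2.0 * epoch_s)
    b = 2.0 / n * sum(y * math.sin(omega * i * tr_s) for i, y in enumerate(signal))
    c = 2.0 / n * sum(y * math.cos(omega * i * tr_s) for i, y in enumerate(signal))
    return b * b + c * c
```

With the assumed 3-s sampling interval, `boxcar()` yields 100 labels, half ON and half OFF, in blocks of 10. A voxel whose signal does not vary with the design gives a `fundamental_power` near zero, whereas one that follows the ON/OFF alternation gives a clearly positive value. Dividing such a power estimate by its standard error would be needed to obtain a standardized statistic; that step is omitted here.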

In experiment 1 the localization of brain areas involved in auditory speech perception was confirmed in five right-handed volunteers. During the ON condition, participants listened to spoken words presented through headphones and were asked to repeat silently to themselves each word as it was heard (8). During the OFF condition, there was no auditory stimulation, but participants were instructed to rehearse silently the number “one” at 2-s intervals, the same rate at which the words were presented aloud in the ON condition. These instructions were intended both to focus participants’ attention on the stimuli in the ON condition and to activate cortical regions involved in internally generated speech consistently during both conditions. The comparison of these two conditions (Table 1) yielded bilateral activation of Brodmann areas (BA) 41, 42, and 22, previously shown to be involved in auditory speech perception (4). Activation in these auditory regions was more extensive in the left hemisphere, consistent with its dominant role in language processing.

Table 1

Major regional foci of differential activation (23). FPQ, fundamental power quotient.


Experiment 2 was designed to identify in the same five individuals the brain regions activated during silent lipreading. In the ON (lipreading) condition, participants watched a videotape of a face silently mouthing numbers at a rate of one number every 2 s and were instructed to repeat silently the numbers they saw being mouthed (9). In the OFF condition, participants viewed a static face and were asked to repeat silently to themselves the number “one” at 2-s intervals. The following brain regions demonstrated a significant signal increase bilaterally during the ON (lipreading) condition: extrastriate cortex (BA 19), inferoposterior temporal lobe (BA 37), angular gyrus (BA 39), and of specific interest, superior temporal gyri including BA 41, 42, and 22 (primary auditory and auditory association cortices, respectively) (Fig. 1 and Table 1).

Figure 1

Voxels colored purple indicate brain areas activated by silent lipreading in experiment 2 (A) and its replication (B) overlaid on areas activated during auditory speech perception in experiment 1 (blue voxels). Yellow voxels indicate regions activated in common by silent lipreading and heard speech. These generic brain activation maps are superimposed on spoiled GRASS MR images centered at 1 mm (left), 6.5 mm (center), and 12 mm (right) above the intercommissural (AC-PC) line. The left side of each image corresponds to the right side of the brain.

These areas may subserve the component processes activated during silent lipreading. The extrastriate cortex and inferoposterior temporal lobe (which includes area V5) have been implicated in the detection of coherent visual movement (10), and activation of this region can be related to the contrast between viewing moving and still lips in the two conditions. The angular gyrus is involved in the mapping of visually presented inputs (including words and numbers) to the appropriate linguistic representations (11), and in this experiment, it may be involved in mapping facial speech cues to their appropriate verbal representation. The most intriguing finding was the activation of lateral temporal auditory cortex during silent lipreading. These areas overlapped considerably with those active during auditory speech processing (4) in these same individuals during experiment 1. However, in experiment 2 there was no auditory input other than the background scanner noise, which was constant in both conditions. The neural substrate common to heard and seen speech is illustrated in Fig. 1A.

This result provides a possible physiological basis for the enhancing effects of visual cues on auditory speech perception and the McGurk illusion (12). Furthermore, activation of primary auditory cortex during lipreading suggests that these visual cues may influence the perception of heard speech before speech sounds are categorized in auditory association cortex into distinct phonemes (13). The direct activation of auditory cortex by information from another modality may, in this instance, be a consequence of the early development of a cross-modal process because, especially for infants, heard speech is usually accompanied by the sight of the speaker (14).

To further examine the components of the response to silent lipreading, we manipulated the stimuli in the OFF (baseline) condition to engage initially the detection of lip movements per se (experiment 3) and then the perception of lip and mouth movements that resemble real speech (experiment 4) (Table 2). In both experiments the ON condition involved lipreading and silent repetition of the mouthed numbers. Five new participants were recruited for this study. These individuals also completed a refined version of experiment 2 intended to replicate our original finding of auditory cortical activation during silent lipreading (15) (Fig. 1B).

Table 2

Experimental design for experiments 2 through 4.


In experiment 3, participants were presented during the OFF condition with examples of facial gurning (consisting of bilateral closed-mouth gestures or twitches of the lower face) produced at the same rate as the mouthed numbers in the ON condition. They were asked to attend closely to the stimuli and to count silently the number of facial gestures they saw. This contrast was designed to investigate whether activation of temporal cortex during silent lipreading might simply be a consequence of visually perceiving motion from the lower face. However, the persistence of differential activation of temporal cortex bilaterally during the ON (lipreading) condition suggests that the complex lower facial movements present in the OFF condition do not activate the auditory sites involved in silent lipreading. Bilateral activation of posterior cingulate cortex (BA 30) and the medial frontal lobe and frontal pole (BA 32 and 10) was observed during the OFF condition (facial gurning). These regions have been implicated in attention-demanding tasks (16) and may relate to the unfamiliar nature of gurning stimuli by comparison with familiar facial speech movements.

The aim of experiment 4 was to determine whether auditory cortex could be activated by visual perception of lip movements that were phonologically plausible (visible pseudospeech) but did not form coherent words (17). In the OFF condition, participants again counted silently the number of pseudospeech movements they saw. Under these conditions there was no net superior temporal activation, suggesting that visible pseudospeech may engage similar cortical regions to those used in normal lipreading. This finding supports the suggestion that linguistic facial gestures influence heard speech at a prelexical level. Bilateral activation of the insula (left > right) was detected during pseudospeech, which might be expected given the increased demand placed on phonological processing in the absence of semantic context, and is consistent with a role for the insula in articulatory processing (18). Activation in the amygdala probably relates to the heightened emotional salience of open- as opposed to close-mouthed facial expressions (19) or expressive movements in general (20).

In summary, these experiments suggest that silent lipreading activates auditory cortical sites also engaged during the perception of heard speech. In addition, it appears that auditory cortex may be similarly activated by visible pseudospeech but not by nonlinguistic closed-mouth movements. This adds physiological support to the psychological evidence that lipreading modulates the perception of auditory speech at a prelexical level (5, 21) and most likely at the stage of phonetic classification.

  • * To whom correspondence should be addressed. E-mail: gemma.calvert{at}

