Report

Contextually Evoked Object-Specific Responses in Human Visual Cortex


Science  02 Apr 2004:
Vol. 304, Issue 5667, pp. 115-117
DOI: 10.1126/science.1093110

Abstract

Human visual recognition processes are remarkably robust and can function effectively even under highly degraded viewing conditions. Contextual information may play a critical role in such circumstances. Here, we provide neurophysiological evidence that contextual cues can elicit object-specific neural responses, which have hitherto been believed to be based on intrinsic cues alone. Specifically, we find that the “fusiform face area” (FFA) maintains its selectivity for faces without regard to whether the faces are defined intrinsically or contextually. This finding further elucidates the role of the FFA and reveals neural correlates of contextual processing in the service of robust object recognition.

In the marathon scene in Fig. 1, it is simple for us to locate the athletes' faces. However, the ease with which we accomplish this task belies its complexity. Although some of the faces in the image can be classified as such via their local intrinsic information (the pattern of eyes, nose, and mouth), for many others, the intrinsic information is almost entirely missing. The latter rely on contextual cues, such as the accompanying bodies, for their definition as faces. The importance of contextual cues in determining how an object is interpreted has been demonstrated in many behavioral studies (1–3) and in the work of several artists (Fig. 1D). From these demonstrations, it is apparent that perceptually, both intrinsic and contextual cues can be effective for defining object identity. A large body of work has probed the neural correlates of intrinsically defined object perception (4–8). When intrinsic information is present, neural responses can be modulated based on how interpretable it is (9). However, the neural correlates of contextual influences on object perception remain largely unknown. Previous studies have attempted to discover brain regions generically associated with the processing of contextual relations (10); however, we investigated the contribution of context to object representations themselves. In particular, we asked whether object-specific responses, which have been shown to be driven by intrinsic information, can alternatively be elicited by contextual cues alone when the intrinsic information is highly impoverished.

Fig. 1.

Objects can be defined by both intrinsic and contextual cues. In (A), some of the runners' faces have enough intrinsic information to permit their classification as faces based on local image structure (B); others almost entirely lack such intrinsic cues (C) and rely on contextual information. In Magritte's painting, The Idea (D), context helps shape our interpretation of the apple as a proxy for a head. [Credits: (A) Keene State College, Keene, NH; (D) Reprinted with permission ©2004, C. Herscovici, Brussels/Artists Rights Society (ARS), New York]

We chose the domain of faces to investigate the influence of context on object-specific neural responses. Convergent evidence suggests that there is an area in the ventral temporal lobe (dubbed the “fusiform face area” or FFA) that is involved in the processing of faces (11–15). We investigated how the activity of this area is modulated by contextual cues using functional magnetic resonance imaging (fMRI) techniques. We hypothesized that if context information is incorporated in facial representations, we would observe fusiform activity to stimuli in which faces are implied by context even in the absence of intrinsic facial information. We therefore created six stimulus categories, designated “a” through “f” for ease of reference. Stimulus a was images of bodies with highly degraded faces. The contextual body cues imply the presence of faces even though the intrinsic facial information is obliterated. Control conditions were as follows: b, images of bodies and degraded faces arranged in an incorrect spatial configuration; c, images of degraded faces alone (without bodies); d, images of bodies alone (with heads removed); e, clear images of faces; and f, images of natural scenes containing no faces or bodies. A sample set of stimuli used in our experiment is shown in Fig. 2.

Fig. 2.

Sample stimuli used in our experiments. The stimuli were designed to allow us to examine the contribution of relevant contextual cues in the perception of objects, here faces. We investigated whether the presence of context changes activation relative to the isolated object (degraded or clear) when (A) the context is relevant, (B) the arrangement of contextual cues is such as not to suggest the object, or (F) the image shows a different object or a scene and under other control conditions (C, D, and E).
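
As a reading aid, the logic of the six stimulus categories can be laid out as a small lookup table. This is only an illustrative sketch: the field names, the `Condition` class, and the `face_cue` helper are hypothetical and do not come from the study's materials, and the three-way classification is a simplification of the design described in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Condition:
    key: str
    face: Optional[str]                   # "clear", "degraded", or None (no face shown)
    body: bool                            # whether a body is shown
    correct_arrangement: Optional[bool]   # only meaningful when face and body co-occur


# The six categories as described in the text (encoding is an assumption).
CONDITIONS = {
    "a": Condition("a", "degraded", True, True),    # bodies with degraded faces
    "b": Condition("b", "degraded", True, False),   # same parts, incorrect configuration
    "c": Condition("c", "degraded", False, None),   # degraded faces alone
    "d": Condition("d", None, True, None),          # bodies alone, heads removed
    "e": Condition("e", "clear", False, None),      # clear faces
    "f": Condition("f", None, False, None),         # natural scenes, no faces or bodies
}


def face_cue(cond: Condition) -> str:
    """Classify how, if at all, a face is defined in a stimulus category."""
    if cond.face == "clear":
        return "intrinsic"
    if cond.face == "degraded" and cond.body and cond.correct_arrangement:
        return "contextual"
    return "none"


if __name__ == "__main__":
    for key, cond in CONDITIONS.items():
        print(key, face_cue(cond))
```

Under this encoding only category a counts as contextually defined and only category e as intrinsically defined; the remaining four serve as controls.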

While being scanned in a 3.0 Tesla (3T) magnetic resonance imaging (MRI) machine, nine adult subjects viewed images and performed a one-back task, signaling a repeated image with a button press. Stimuli were grouped into 20-s blocks of a single stimulus condition, and each imaging run consisted of one block each of all six conditions, interleaved with 12-s fixation blocks (17).
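
The block timing described above can be sketched as a simple timeline. The 20-s and 12-s durations and the six conditions come from the text; everything else is an assumption made for illustration, including the block order, the choice to place a fixation block before the first and after the last stimulus block, and the names `Block`, `build_run`, and `run_length_s`.

```python
from dataclasses import dataclass

STIM_S = 20   # stimulus block duration in seconds (from the text)
FIX_S = 12    # fixation block duration in seconds (from the text)
CONDITIONS = ["a", "b", "c", "d", "e", "f"]


@dataclass(frozen=True)
class Block:
    label: str
    onset_s: int
    duration_s: int


def build_run(order, lead_fixation=True, trail_fixation=True):
    """Return the block timeline for one run.

    Fixation always separates consecutive stimulus blocks. Whether a run
    also starts and ends with fixation is an assumption, exposed as flags.
    """
    blocks = []
    t = 0
    for i, cond in enumerate(order):
        if i > 0 or lead_fixation:
            blocks.append(Block("fixation", t, FIX_S))
            t += FIX_S
        blocks.append(Block(cond, t, STIM_S))
        t += STIM_S
    if trail_fixation:
        blocks.append(Block("fixation", t, FIX_S))
    return blocks


def run_length_s(blocks):
    """Total run duration in seconds."""
    last = blocks[-1]
    return last.onset_s + last.duration_s


if __name__ == "__main__":
    for block in build_run(CONDITIONS):
        print(f"{block.onset_s:4d} s  {block.label:8s}  {block.duration_s} s")
```

With leading and trailing fixation, one run under these assumptions lasts 6 × 20 s + 7 × 12 s = 204 s; without them it lasts 180 s.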

Once we localized the FFA for each subject, we compared the activations obtained with contextually defined faces versus those corresponding to the various control conditions. Our main results are summarized in Fig. 3. Contextually defined faces activated the FFA at least as well as intrinsically defined faces. FFA activity evoked by images of contextually defined faces was significantly greater than that evoked by either degraded faces alone, bodies alone, or bodies with degraded faces placed in an incorrect spatial arrangement (Dunnett's test, P < 0.0025, corrected for multiple comparisons). This suggests that the high level of FFA activity observed for the contextually defined face condition cannot be explained by the simple presence of bodies or degraded faces in the images or by some spatially nonspecific combination of the two features [all of which conditions were not significantly different from the activity evoked by scenes without faces; Tukey's “honestly significant difference” (HSD), P < 0.1]. In addition, as Fig. 3A indicates, intrinsically defined faces and contextually defined ones led to activation of similar loci. These data support the idea that facial representations underlying FFA activity are based not only on intrinsic facial cues but incorporate contextual information as well.

Fig. 3.

Results showing the influence of contextual cues on face-specific responses in the human extrastriate cortex. (A) Loci of activation. The yellow outline indicates the FFA region of interest as defined via responses to clear images of faces (as compared with images of scenes) in independent localizer scans. The red voxels are the regions activated in a voxelwise comparison between activity evoked by contextually defined faces and scenes. (B) The average hemodynamic responses within the fusiform face area as a function of stimulus type. The presence of correctly placed contextual cues compensates for extreme degradations in intrinsic object information (here, a greatly blurred face) to lead to activations that are comparable to those corresponding to clear images of the object alone. The degraded face image on its own, the contextual cues by themselves, and even the two together in an incorrect spatial arrangement are unable to elicit responses statistically different from those obtained with a control condition of assorted natural scenes.

Several factors, such as imagery, attention, and low-level image attributes, are known to modulate FFA activation and, therefore, could be offered as alternative explanations for our results. First, imagining faces increases activity in FFA (18, 19). However, mental imagery does not explain our results. After the first stimulus block, subjects knew that the blurred blobs represented heads. If subjects were simply imagining facial features when degraded faces were shown, we would expect to see an increase in activation when the degraded faces were presented in isolation. However, no such increase was observed. Also, in a post-experiment questionnaire, all subjects indicated that they did not imagine facial features when presented with the degraded face images. Second, increased attention to faces increases FFA activity (20), which makes attentional modulation a candidate hypothesis for explaining our data. However, this hypothesis is also unlikely to account for our results, because there is no a priori reason why degraded faces alone should elicit less attention than degraded faces on bodies. In addition, subjects were required to perform a one-back matching task, which obligated attention to all stimuli, and which demanded, if anything, more attention to the degraded faces by themselves than when they were presented with bodies. Finally, one might argue that certain low-level features might account for our results. However, because the same images of bodies and blurred faces were used for all the conditions, it is unlikely that differences in low-level features could account for the overall pattern of results observed. Thus, it appears likely that the FFA's response was being modulated primarily by the addition of relevant contextual information.

It is instructive to compare responses to contextually defined faces with those obtained with clear faces on bodies. In a supplementary study designed to examine this issue, we found that FFA responses with clear faces on bodies did not differ significantly (P > 0.1, Tukey's HSD) from those with the clear faces alone or the contextually defined faces, but they were significantly higher relative to the degraded faces (P < 0.005, Tukey's HSD). A similar comparison between faces and faces on bodies has been reported in (21), with analogous results. These data, showing that clear faces, clear faces on bodies, and contextually defined faces lead to similar levels of FFA activation, reinforce the idea that contextual cues can be as effective as intrinsic ones in driving object-specific neural responses.

This finding provides direct neural evidence of the role of contextual cues in individual object processing. However, the precise nature of this influence remains an open question. One possibility is that contextual information is integrated into object perception via high-level semantic processes (22). In such a scenario, semantic knowledge about the likely identity of a degraded object exerts a top-down influence on visual object representations, either activating these representations or obligating elaborated analysis of the degraded stimulus. However, the low activity observed with degraded faces on their own, which subjects knew were faces after the first stimulus block, suggests that semantic knowledge of a degraded stimulus's identity is not necessarily sufficient to elicit object-selective activity. A different possibility is that contextual cues participate more directly in object representations, with the representations themselves containing some embodiment of likely contexts in which a given object might occur. This view would imply an expanded role for the FFA, in which activity is not simply selective for faces per se, but for a range of correlated features, both intrinsic and extrinsic. It has been proposed that one of the overarching functions of the hierarchy of visual brain areas is to progressively abstract away statistical regularities in the environment (23–25). Such regularities range from correlations between the luminance of adjacent regions of scenes in early visual areas (26, 27), to more complex regularities across different views of an object in higher visual areas (28–30). Within this framework, contextual cues might represent another, more highly abstracted statistical regularity in the visual world that the brain has adapted to exploit. More generally, the statistical perspective not only highlights the need to go beyond the conventional conceptualization of object processing based on intrinsic cues, but also allows us to unify the notions of intrinsic and contextual cues associated with an object.

Supporting Online Material

www.sciencemag.org/cgi/content/full/304/5667/115/DC1

Materials and Methods

References

References and Notes
