Research Article

Distributed and Overlapping Representations of Faces and Objects in Ventral Temporal Cortex


Science  28 Sep 2001:
Vol. 293, Issue 5539, pp. 2425-2430
DOI: 10.1126/science.1063736

Abstract

The functional architecture of the object vision pathway in the human brain was investigated using functional magnetic resonance imaging to measure patterns of response in ventral temporal cortex while subjects viewed faces, cats, five categories of man-made objects, and nonsense pictures. A distinct pattern of response was found for each stimulus category. The distinctiveness of the response to a given category was not due simply to the regions that responded maximally to that category, because the category being viewed also could be identified on the basis of the pattern of response when those regions were excluded from the analysis. Patterns of response that discriminated among all categories were found even within cortical regions that responded maximally to only one category. These results indicate that the representations of faces and objects in ventral temporal cortex are widely distributed and overlapping.

The ventral object vision pathway in the human brain has the capacity to generate distinct representations for a virtually unlimited variety of individual faces and objects, but the functional architecture that embodies this capacity is a matter of intense debate. Single-cell recording studies in the nonhuman primate have demonstrated differential tuning of individual neurons in temporal cortex to faces, whole objects, and complex object form features (1–3). Although columns containing cells that respond selectively to faces or similar features tend to cluster together, these studies have not revealed any consistent larger scale organization for object representation. Numerous computational models for object recognition have been developed (4), but the correspondence between these models and the neural architecture of the ventral object vision pathway is uncertain.

Unlike single-cell studies, functional brain imaging has revealed a large-scale spatial organization for specialization within the ventral object vision pathway, as demonstrated by differential patterns of response, i.e., increases in neural activity indicated by localized increases in blood oxygenation, to faces and other categories of objects in ventral temporal cortex (5–17). Models for this functional architecture fall into three classes. One model proposes that ventral temporal cortex contains a limited number of areas that are specialized for representing specific categories of stimuli (5–8). Thus far, two specialized areas have been described: the fusiform face area (FFA) and the parahippocampal place area (PPA) (Fig. 1). A second model proposes that different areas in ventral temporal cortex are specialized for different types of perceptual processes (9–11). In particular, this model proposes that the FFA is specialized for expert visual recognition of individual exemplars from any object category, not just faces. The third model proposes that the representations of faces and different categories of objects are widely distributed and overlapping (12–15). According to this model, which we have named “object form topography,” ventral temporal cortex has a topographically organized representation of attributes of form that underlie face and object recognition. The representation of a face or object is reflected by a distinct pattern of response across a wide expanse of cortex in which both large- and small-amplitude responses carry information about object appearance. Unlike the other models, object form topography predicts how all categories might evoke distinct patterns of response in ventral temporal cortex and, thereby, provides an explicit account for how this cortex can produce unique representations for a virtually unlimited number of categories.

Figure 1

Schematic diagram illustrating the locations of the fusiform face area (FFA), which also has been implicated in expert visual recognition, and the parahippocampal place area (PPA) on the ventral surface of the right temporal lobe. In most brains, these areas are bilateral.

We tested our model by investigating the patterns of response evoked in ventral temporal cortex by faces and multiple categories of objects. Our model predicts that each category elicits a distinct pattern of response in ventral temporal cortex that is also evident in the cortex that responds maximally to other categories.

Analysis of patterns of neural response to object categories.

Patterns of response were measured with functional magnetic resonance imaging (fMRI) in six subjects while they viewed pictures of faces, cats, five categories of man-made objects (houses, chairs, scissors, shoes, and bottles), and control, nonsense images (18) (Fig. 2). The data were analyzed to determine whether each stimulus category evoked a pattern of response in the ventral object vision pathway that could be distinguished from the patterns of response evoked by all other individual categories. Patterns of response were examined in ventral temporal object-selective cortex, defined as those voxels with responses that differed significantly by category. The data for each subject were split into two sets, namely even and odd runs. We then determined whether the stimulus category that a subject was viewing could be identified by examining the similarity between the patterns of response evoked by each category on even and odd runs (19).
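The split of each subject's data into even and odd runs can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function name `split_half_patterns`, the data layout (one dictionary per run, mapping a category to a list of voxel responses), the numbering of runs from 1, and the toy values are all assumptions made for the example.

```python
from collections import defaultdict


def split_half_patterns(runs):
    """Average voxel patterns separately over even- and odd-numbered runs.

    runs: list of dicts, one per run, mapping category -> list of voxel
    responses. Runs are assumed to be numbered starting from 1.
    Returns (even, odd), each mapping category -> mean pattern.
    """
    halves = {"even": defaultdict(list), "odd": defaultdict(list)}
    for run_number, run in enumerate(runs, start=1):
        half = "even" if run_number % 2 == 0 else "odd"
        for category, pattern in run.items():
            halves[half][category].append(pattern)

    def mean_pattern(patterns):
        # Average voxel by voxel across the runs in one half.
        return [sum(values) / len(patterns) for values in zip(*patterns)]

    even = {c: mean_pattern(p) for c, p in halves["even"].items()}
    odd = {c: mean_pattern(p) for c, p in halves["odd"].items()}
    return even, odd


# Toy data: four runs, two categories, two voxels (hypothetical values).
runs = [
    {"faces": [1.0, 0.0], "houses": [0.0, 2.0]},  # run 1 (odd)
    {"faces": [3.0, 1.0], "houses": [1.0, 4.0]},  # run 2 (even)
    {"faces": [2.0, 1.0], "houses": [1.0, 3.0]},  # run 3 (odd)
    {"faces": [5.0, 3.0], "houses": [3.0, 6.0]},  # run 4 (even)
]
even, odd = split_half_patterns(runs)
```

The two halves share no runs, so a similarity between the even and odd patterns for a category reflects the reliability of that pattern and not noise common to a single measurement.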

Figure 2

Examples of stimuli. Subjects performed a one-back repetition detection task in which repetitions of meaningful pictures were different views of the same face or object.

Correlations between patterns of response served as indices of similarity (Fig. 3). For example, to determine whether the pattern of response to one category, such as chairs, could be distinguished from the pattern of response to a different category, such as shoes, the correlation between the pattern of response to chairs on even runs and the response to chairs on odd runs (within-category correlation) was compared with the correlation between the response to chairs on even runs and the response to shoes on odd runs (between-category correlation).
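A single pairwise comparison of this kind can be illustrated with a short sketch. The voxel values below are invented for the example, and the helper `pearson` is a plain implementation of the Pearson correlation coefficient written here for self-containment; neither is taken from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length response patterns."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


# Hypothetical normalized responses in four voxels.
chairs_even = [0.9, -0.4, 0.2, -0.7]
chairs_odd = [0.8, -0.5, 0.3, -0.6]
shoes_odd = [-0.3, 0.6, -0.5, 0.2]

within = pearson(chairs_even, chairs_odd)   # chairs vs. chairs
between = pearson(chairs_even, shoes_odd)   # chairs vs. shoes

# The category counts as correctly identified when the within-category
# correlation exceeds the between-category correlation.
identified = within > between
```

Only the ordering of the two correlations matters for the identification, so the comparison does not depend on any fixed correlation threshold.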

Figure 3

The category specificity of patterns of response was analyzed with pairwise contrasts between within-category and between-category correlations. The pattern of response to each category was measured separately from data obtained on even-numbered and odd-numbered runs in each individual subject. These patterns were normalized to a mean of zero in each voxel across categories by subtracting the mean response across all categories. Brain images shown here are the normalized patterns of response in two axial slices in a single subject. The left side of the brain is on the left side of each image. Responses in all object-selective voxels in ventral temporal cortex are shown. For each pairwise comparison, the within-category correlation is compared with one between-category correlation. (A) Comparisons between the patterns of response to faces and houses in one subject. The within-category correlations for faces (r = 0.81) and houses (r = 0.87) are both markedly larger than the between-category correlations, yielding correct identifications of the category being viewed. (B) Comparisons between the patterns of response to chairs and shoes in the same subject. The category being viewed was identified correctly for all comparisons. (C) Mean response across all categories relative to a resting baseline.
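The normalization step described in this caption, subtracting each voxel's mean response across categories, might be written as below. The function name `normalize_across_categories` and the toy values are assumptions for illustration only.

```python
def normalize_across_categories(patterns):
    """Subtract, in each voxel, the mean response across all categories.

    patterns: dict mapping category -> list of voxel responses.
    Returns a dict of the same shape in which every voxel has a mean of
    zero across categories.
    """
    categories = list(patterns)
    n_voxels = len(patterns[categories[0]])
    voxel_means = [
        sum(patterns[c][v] for c in categories) / len(categories)
        for v in range(n_voxels)
    ]
    return {
        c: [patterns[c][v] - voxel_means[v] for v in range(n_voxels)]
        for c in categories
    }


# Hypothetical raw responses: three categories, three voxels.
raw = {
    "faces": [3.0, 1.0, 2.0],
    "houses": [1.0, 3.0, 2.0],
    "chairs": [2.0, 2.0, 2.0],
}
normalized = normalize_across_categories(raw)
```

One plausible reading of this step is that it removes the response component shared by all categories (panel C), so that correlations reflect differences among categories and are not inflated by a common mean pattern.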

Distinct patterns of neural response for multiple categories of objects.

The pattern of response in object-selective ventral temporal cortex correctly identified the category being viewed in 96% of pairwise comparisons (20). The pattern of response indicated when subjects were viewing faces, houses, and scrambled pictures with no errors (Table 1). Identification accuracy for the small man-made objects (bottles, scissors, shoes, and chairs) was significantly better than chance for each category (21).
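An identification accuracy of this kind can be computed by tallying pairwise comparisons, as in the sketch below. The exact tally used in the study is given in note (20), which is not reproduced here; this example assumes that, for each ordered pair of categories, the within-category correlation is compared with both possible between-category correlations (even versus odd and odd versus even). The function name `pairwise_accuracy` and the three-category toy data are likewise assumptions.

```python
import math
from itertools import permutations


def pearson(x, y):
    """Pearson correlation between two equal-length response patterns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def pairwise_accuracy(even, odd):
    """Fraction of comparisons in which within- exceeds between-category r.

    Assumed scheme: for each category a and each other category b, the
    within-category correlation for a is compared with two
    between-category correlations.
    """
    correct = 0
    total = 0
    for a, b in permutations(even, 2):
        within = pearson(even[a], odd[a])
        betweens = (pearson(even[a], odd[b]), pearson(odd[a], even[b]))
        for between in betweens:
            correct += int(within > between)
            total += 1
    return correct / total


# Hypothetical normalized patterns: three categories, four voxels.
even = {
    "faces": [1.0, -1.0, 0.1, -0.1],
    "houses": [-1.0, 1.0, -0.2, 0.2],
    "chairs": [0.0, 0.1, 1.0, -1.1],
}
odd = {
    "faces": [0.9, -1.1, 0.0, 0.2],
    "houses": [-0.8, 1.1, -0.1, -0.2],
    "chairs": [0.1, -0.1, 0.9, -0.9],
}
accuracy = pairwise_accuracy(even, odd)
```

Because each comparison has two possible outcomes, chance performance under this scheme is 50%.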

Table 1

Accuracy of identification of the category being viewed based on the patterns of response evoked in ventral temporal cortex. Accuracies are the percentage of comparisons between two categories that correctly identified which category was being viewed.


Category identification based on patterns of nonmaximal responses.

Although these results suggest that category-specific patterns of response are distributed and overlapping, higher within-category correlations could be due simply to the regions that reliably respond maximally to each category, with no information about a specific category carried by the pattern of response in cortex that responded maximally to other categories. To test whether the patterns of nonmaximal responses carry category-related information, we analyzed whether each stimulus category evoked a distinct pattern of response in cortex that responded maximally to other categories. For each comparison between patterns of response evoked by two categories, all of the voxels that responded maximally to either category in either half of the data were excluded from the calculation of correlations (22). The specificity of the pattern of response to each category was barely diminished by thus restricting the analysis (Fig. 4), with a mean accuracy of 94% for identifying the category being viewed (Table 1) (23).
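The exclusion of maximally responsive voxels can be sketched as follows. The precise criterion is described in note (22), which is not reproduced here; this example assumes that a voxel "responds maximally" to the category with the largest response in that voxel, and it correlates raw responses for brevity, whereas the analysis in the study used normalized patterns. All names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length response patterns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def preferred_category(patterns, voxel):
    """Category with the largest response in the given voxel (assumed criterion)."""
    return max(patterns, key=lambda c: patterns[c][voxel])


def nonmaximal_voxels(even, odd, a, b):
    """Voxels that respond maximally to neither a nor b in either half."""
    keep = []
    for v in range(len(even[a])):
        winners = {preferred_category(even, v), preferred_category(odd, v)}
        if a not in winners and b not in winners:
            keep.append(v)
    return keep


def restrict(pattern, voxels):
    """Keep only the responses in the listed voxels."""
    return [pattern[v] for v in voxels]


# Hypothetical raw responses: three categories, six voxels.
even = {
    "faces": [5.0, 1.0, 2.0, 1.0, 3.0, 1.5],
    "houses": [1.0, 5.0, 1.0, 2.5, 1.0, 3.0],
    "chairs": [2.0, 2.0, 4.0, 4.0, 4.0, 4.0],
}
odd = {
    "faces": [4.8, 1.2, 2.2, 0.8, 2.9, 1.4],
    "houses": [1.1, 4.9, 0.9, 2.6, 1.2, 3.1],
    "chairs": [2.1, 1.9, 3.9, 4.1, 4.2, 3.8],
}

# Compare faces with houses using only voxels that prefer another category.
keep = nonmaximal_voxels(even, odd, "faces", "houses")
within = pearson(restrict(even["faces"], keep), restrict(odd["faces"], keep))
between = pearson(restrict(even["faces"], keep), restrict(odd["houses"], keep))
```

In this toy case the face pattern remains identifiable from voxels that prefer chairs, which is the kind of outcome the restricted analysis is designed to detect.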

Figure 4

Mean within-category and between-category correlations (±SE) between patterns of response across all subjects for all ventral temporal object-selective cortex (red and dark blue) and for ventral temporal cortex excluding the cortex that responded maximally to either of two categories being compared (orange and light blue). The SE of within-category correlations after excluding maximally responsive cortex was based on the mean correlation across 14 pairwise comparisons for each subject.

Patterns of response within cortical regions that respond maximally to one category.

These results indicate that the category specificity of responses in ventral temporal cortex is not restricted solely to regions that respond maximally to certain stimuli, thus raising the question of whether the representation of faces and objects in this cortex has a topographic organization that exists with a finer spatial resolution than that defined by such regions. To investigate whether the category specificity of response exists at this finer spatial resolution, we examined the patterns of response within regions that responded maximally to a single category or a small set of categories (22) (Table 1). Within only the cortex that responded maximally to houses, the pattern of response correctly identified the category being viewed with 93% accuracy. Within only the cortex that responded maximally to small, man-made objects, the pattern of response identified the category being viewed with 94% accuracy. Even within the much smaller region that responded maximally to faces, the pattern of response identified the category being viewed with 83% accuracy, and accuracies were significantly better than chance for all categories except shoes. Similarly, the pattern of response within the region that responded maximally to cats identified the category being viewed with 85% accuracy, with accuracies that were better than chance for all categories except bottles.

These results demonstrate that the pattern of response in ventral temporal cortex carries information about the type of object being viewed, even in cortex that responds maximally to other categories, but the nature of this information is unknown. To examine whether this information concerns only low-level features of gray-scale photographs that are shared by a category, such as mean luminance, mean contrast, and spatial frequencies, we reanalyzed data from a previous study in which subjects viewed photographs and line drawings of three categories (faces, houses, and chairs) (13). We examined whether the pattern of response to a category of line drawings can be identified on the basis of its similarity to responses to photographs of the same and different categories and, conversely, whether the pattern of response to a category of photographs can be identified on the basis of its similarity to responses to line drawings. The results of this reanalysis showed that similarities between patterns of response to photographs and line drawings of the same category correctly identified the category being viewed, even when the analysis was restricted to cortex that did not respond maximally to either of the categories being discriminated (96% correct pairwise discriminations) [for detailed results, see supplemental material (24)]. This result shows that patterns of nonmaximal responses do not represent low-level features that are specific to the type of stimuli, such as photographs, but, rather, appear to reflect information that is more definitive of object category.

Discussion.

These findings demonstrate distinct patterns of response in ventral temporal cortex for multiple categories of objects, including different types of small man-made objects. This specificity is not restricted to categories for which dedicated systems might have evolved because of their biological significance. The specificity of the pattern of response for each category was a property of a much greater extent of object-selective cortex in the ventral temporal lobe than the sector that responded maximally to that category. The category being viewed still could be identified when the cortex that responded maximally to that category was excluded from the analysis. This result indicates that the representations of faces and objects in ventral temporal cortex are widely distributed and overlapping and that small or submaximal responses are an integral part of these representations. When the analysis was further restricted to regions that responded maximally to a single category (houses, faces, or cats) or a small number of categories (i.e., man-made objects—bottles, scissors, shoes, and chairs), the patterns of response to other categories within these regions were still significantly distinct. This result suggests that regions such as the “parahippocampal place area” or the “fusiform face area” are not dedicated to representing only spatial arrangements or human faces but, rather, are part of a more extended representation for all objects (25).

Object form topography.

We have shown in previous studies that the maximally responsive regions for several of these categories—faces, houses, chairs, animals, and tools—have a consistent topography across individuals (12–14). Here, we show that the topographic arrangement of the full pattern of response is consistent within subjects, but we are not able to perform a similar correlational analysis across subjects because current methods for warping individual brains to a common shape are inadequate at this level of detail. The spatial resolution of this topography is smaller than that defined by category-selective areas (>1 cm), because category-related patterns can be discerned within these areas, and greater than that of randomly arranged single columns or small clusters of columns (<1 mm), because of the spatial resolution of the fMRI images in this study (>3.5 mm). Single-unit recording studies in the monkey have suggested the existence of a columnar organization for representations of complex features of form but have not revealed any larger scale topographic arrangement (2).

We have proposed the term object form topography for the topographic organization of the distributed representation of faces and objects in ventral temporal cortex. Our results demonstrate a spatially organized functional architecture within subregions of ventral temporal cortex that are defined by a maximal response to a single object category. This architecture may be analogous to that found within early visual areas, such as V1, which contain spatially organized maps of simpler visual features, such as retinotopic location, edge orientation, and color. Object form topography presumably reflects how the more complex attributes of visual appearance that underlie object and face recognition are related visually, structurally, or semantically.

Population encoding of visual appearance.

The representation of a face or object involves the concerted neural activity in a widely distributed cortical space. Our analysis shows that the pattern of large and small responses, not just the location of large responses, carries category-related information, suggesting that small responses are an integral part of the representation. Population responses in simpler systems, such as color vision, similarly rely on both large and small responses to determine the quality of the integrated percept. In color vision, the perceptual quality of a hue that evokes a maximal red response in red-green neurons is also dependent on small responses in yellow-blue neurons that determine whether that hue is perceived as being more orange or violet (26–28).

What attributes of visual appearance could underlie a population encoding of objects and faces in which both large and small responses determine the quality of the integrated percept? Others have suggested that these attributes may be two-dimensional, view-dependent (2) or three-dimensional, structural (29) primitives that make up an alphabet for shape recognition. In a representation based on primitive features, however, a face or object would be specified by the strong responses that indicate the features that are present in the stimulus, not by a combination of large and small responses. Another possibility is suggested by models of face appearance and perception that describe a face on the basis of a small number of dimensions, operationalized as principal or independent components, that capture how configurations of features typically covary across faces (30–32). In a representation based on continuous dimensions, small and intermediate responses would be as important as large responses for specifying the location of a vector in feature space that best describes the appearance of a perceived face. Psychophysical evidence (33, 34) supports the proposal that the neural representation of faces may be based on such dimensions, represented as opponent processes referenced to the population mean. This opponent process model demonstrates how a limited number of channels can represent complex variations of form, such as those that distinguish one face from another, and suggests, further, how a limited cortical space could represent an unlimited variety of faces.

Our analysis did not reveal any sectors of ventral temporal cortex that did not convey information about discriminations among several stimulus categories, which leaves open the question of how lesions can cause selective impairments for recognizing individual faces (prosopagnosia) or discriminating between members of a single category of objects (35–37). Our results do not address the distribution or spatial scale of patterns of response that discriminate between exemplars within a category. Moreover, it is unclear whether any of these syndromes can be caused by a restricted lesion in a ventral temporal region that responds maximally to one category (38).

A population encoding based on the pattern of large and small responses in a wide expanse of cortex has the capacity to produce unique representations of a virtually unlimited number of object categories. Models of the functional architecture of ventral extrastriate cortex that analyze only mean responses in regions that are putatively specialized for restricted categories of stimuli (faces and places) (5, 7) or specific perceptual processes (visual expertise) (9–11) provide no explicit account for how neural representations of all object categories differ (39). By contrast, our results indicate how ventral extrastriate cortex can produce unique representations for all object categories. Fortuitously, these representations have a consistent topographic arrangement that may provide a key for decoding the information that underlies face and object recognition.

* To whom correspondence should be addressed. E-mail: haxby{at}nih.gov

