The Face of Controversy

Science  28 Sep 2001:
Vol. 293, Issue 5539, pp. 2405-2407
DOI: 10.1126/science.1066018

Neuroscientists have long puzzled over whether the brain represents and processes information in a modular or a distributed fashion. According to modular theories, the brain is organized into subcomponents, or “modules,” each dedicated to processing and representing a particular type of information (1). This well-structured view of brain organization is intuitively appealing. In contrast, distributed theories argue that any information regardless of type is processed by many different parts of the brain, and that any brain region is likely to represent many classes of information. Despite the complexity of distributed representation, computational modeling demonstrates that it can be an efficient, robust, and flexible method of neural coding (2-4). Reports in this issue by Downing et al. (5) (page 2470) and Haxby et al. (6) (page 2425) about the areas of the human brain involved in perception of the face and other human body parts illustrate that the modular versus distributed controversy is still very much alive.

The study by Downing et al. (5) provides new evidence in favor of the modular view. Using functional magnetic resonance imaging (fMRI), the authors offer an impressive demonstration that a circumscribed region of the lateral occipital cortex in the human brain responds preferentially to pictures of the human body. This region, which they call the extrastriate body area (EBA), showed stronger visual responses to pictures of the human body than to pictures of common objects, animals, or cars. Line drawings, silhouettes, and even stick figures of the human body also evoked much stronger responses than scrambled versions of the same visual stimuli. The authors suggest that the EBA is a specialized system for processing the visual appearance of the human body.

This finding follows on from similar work by the same group investigating a region in the medial temporal lobe called the parahippocampal place area (PPA) that responds selectively to spatial layout (7), and a region near the occipital-temporal junction called the fusiform face area (FFA) that responds selectively to faces (8-10). Damage to the FFA of the human brain is associated with severe deficits in face recognition, a syndrome called prosopagnosia (11-13). Subdural electrode recordings in human patients with epilepsy have also revealed face-selective responses in this region, and moreover, electrical stimulation of these regions can disrupt face identification (14). Pioneering work by Gross and colleagues (15) has also revealed face-selective neurons in the temporal cortex of the monkey. Subsequent studies have shown that these “face cells” can be tuned to specific facial attributes such as the identity (16), expression (17), viewpoint (18), or parts of a face (16, 19). This collection of findings presents persuasive evidence for a brain module dedicated to face processing. Furthermore, it raises the possibility that similar modules may exist for other visual categories, including spatial layout and, as suggested by Downing et al.'s findings, the appearance of the human body.

Establishing evidence in favor of distributed theories is a more challenging undertaking. Computational models have convincingly demonstrated the plausibility and power of distributed representation [for example, see (2-4)]. However, empirical evidence has been harder to come by. A distributed representation, by its very definition, involves many neurons and potentially many areas of the brain. This presents a challenge for direct neuronal recordings, which can be conducted with only limited numbers of cells at a time, and at a restricted number of sites. Thus, although single-unit neuronal recordings have provided evidence for distributed representation at relatively local scales, exemplified by direction of movement within motor cortex (20) and visual feature representations in primary visual cortex (21), there is little empirical evidence for representations distributed at larger scales. Neuroimaging methods such as fMRI may be valuable for investigating distributed representations because they can monitor neural activity across the entire brain. The study by Haxby and his colleagues (6) provides the most compelling neuroimaging evidence to date in favor of distributed representations in the brain.

These authors measured fMRI activity across a large region of ventral extrastriate cortex while human subjects viewed eight different categories of stimuli (faces, cats, houses, chairs, scissors, shoes, bottles, and nonsense images). Each meaningful stimulus category evoked a unique pattern of activity distributed across this region that could be easily replicated. Correlational analyses revealed that the activity patterns evoked by a particular category in one fMRI scan could be used to identify the category being viewed during another scan. In subsequent analyses, brain regions that showed maximal activation to a particular category, such as the FFA, were removed from the analysis. Nonetheless, the activity patterns in the remaining brain areas could still be used to identify that category with equal accuracy. Moreover, the response of category-selective areas such as the FFA could also be used to classify stimuli from other submaximal categories at rates exceeding chance. These findings provide provocative evidence in favor of distributed representation for two reasons. First, the pattern of activity distributed over ventral cortex provides reliable information about the visual category being viewed, even when maximally responsive brain areas are not considered; and second, the pattern of activity within areas maximally responsive to one category of stimuli contains useful information about stimuli belonging to other categories.
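The logic of such a correlational analysis can be illustrated with a small simulation. The sketch below is not the procedure used by Haxby et al.; it is a simplified stand-in that assumes synthetic "voxel" patterns (a fixed pattern per category plus independent noise in each of two scans), identifies each category in one scan by finding the most strongly correlated pattern in the other, and then repeats the analysis after discarding the voxels that respond most to one category. All names, sizes, and noise levels (`make_runs`, `classify_by_correlation`, `drop_peak_voxels`, 200 voxels, a noise scale of 0.5) are hypothetical choices made for illustration.

```python
import numpy as np


def make_runs(n_categories=8, n_voxels=200, noise=0.5, seed=0):
    """Simulate two independent scans: a shared category pattern plus noise."""
    rng = np.random.default_rng(seed)
    signal = rng.normal(size=(n_categories, n_voxels))
    run_a = signal + noise * rng.normal(size=signal.shape)
    run_b = signal + noise * rng.normal(size=signal.shape)
    return run_a, run_b


def classify_by_correlation(run_a, run_b):
    """For each category pattern in run_a, pick the best-correlated pattern in run_b."""
    n = run_a.shape[0]
    # np.corrcoef stacks the rows of both arrays, giving a (2n x 2n) matrix;
    # the upper-right block holds the run_a-versus-run_b correlations.
    corr = np.corrcoef(run_a, run_b)[:n, n:]
    return corr.argmax(axis=1)


def drop_peak_voxels(run_a, run_b, category, n_drop):
    """Discard the n_drop voxels (n_drop > 0) that respond most to one category.

    This is a crude, assumed analogue of excluding a maximally responsive
    region such as the FFA from the analysis.
    """
    mean_response = (run_a[category] + run_b[category]) / 2
    keep = np.argsort(mean_response)[:-n_drop]
    return run_a[:, keep], run_b[:, keep]


run_a, run_b = make_runs()
labels = np.arange(run_a.shape[0])

# Identify every category in scan A from the patterns in scan B.
predicted = classify_by_correlation(run_a, run_b)
accuracy = np.mean(predicted == labels)

# Repeat after removing the voxels most responsive to category 0.
reduced_a, reduced_b = drop_peak_voxels(run_a, run_b, category=0, n_drop=20)
predicted_reduced = classify_by_correlation(reduced_a, reduced_b)

print("accuracy with all voxels:", accuracy)
print("category 0 still identified without its peak voxels:",
      predicted_reduced[0] == 0)
```

In this toy setting, category information is spread across all voxels by construction, so removing the peak voxels leaves identification intact. The simulation therefore shows only what a distributed code would predict, not whether the brain uses one.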

The visual representation results of Downing et al. and Haxby et al. each seem to pose a challenge for the other. If visual information is truly distributed, as argued by Haxby and colleagues, then how should we interpret findings of maximal activity consistently associated with specific categories of stimuli? Conversely, if at least certain categories of visual information are processed by dedicated modules, as argued by Downing et al., then how can we explain the widely distributed, but category-specific, patterns of activity reported by Haxby et al.?

From the modular perspective, there are at least three possible explanations for the distributed patterns of activity observed by Haxby et al. These patterns could simply reflect an incidental response of other visual areas to face stimuli, in the absence of interactions with face-selective areas or direct contributions to face perception (that is, there would be no red and yellow lines in the figure, and face perception would rely exclusively on the dark blue pathway). Alternatively, other visual areas may simply echo processing within the face-selective areas (yellow arrows in the figure) without contributing to face perception. A third possibility is that the system could be organized hierarchically, with face-selective areas representing a locus at which sufficient information from lower levels of analysis has accrued to process face information (red arrows in the figure). In this case, distributed representations would contribute to face processing, but the “face module” would retain responsibility for integrating this information and passing it on to other processing areas (that is, there would be no green arrows in the figure). In the strongest form of modularity, modules would be responsible for both the detection and identification of category members, predicting that patterns of activity within the module should be able to distinguish among category members. However, a weaker form might allow that detection and identification are separate entities served by different modules.

The neural organization of face perception.

Possible flow of visual information among stimulus-selective and stimulus-responsive areas within the visual cortex of the human brain. Face perception is used as an example to illustrate differences between theories of modular versus distributed organization. Arrows show potential interactions among areas. The distributed model assumes that information exchange along all arrows is used in face perception. The modular view assumes that only a subset of these connections is relevant.

From the distributed perspective, there are other findings that must be explained. The first is the consistency with which localized peaks of activity seem to be associated with distinct classes of stimuli. One explanation is that such peaks simply reflect the features common to that category. In this view, such areas might serve as category-detection modules, but not as identification modules, because the full distributed representation would be required to distinguish among individual members of a given category. An alternative view, proposed by Gauthier et al. (22), is that such areas are specialized for visual “expertise” and are responsible for performing fine-grained discrimination of members within a category. This view should predict the opposite of the “common features” view: Patterns of activity within areas maximally responsive to a given stimulus category should be able to distinguish among members of that category. This prediction is similar to the one made by modularity in its strongest form—that a module serves both to detect and identify category members. Testing such predictions should be a high priority for the field.

Lesion data pose a second, and perhaps more serious, challenge to the distributed view. Lesions to temporal areas thought to encompass the FFA are associated with prosopagnosia. Conversely, at least one patient with widespread damage to the visual cortex has shown severely impaired object recognition but selectively spared face recognition (23). Such behavioral double-dissociations in response to brain damage provide intuitively appealing evidence of distinct neural mechanisms for processing each type of information. Such inferences rest upon the assumption that double-dissociations in behavior reflect a corresponding organization of function at the neural level, sometimes referred to as the “transparency hypothesis” (24). However, computational modeling suggests that this assumption may not always be valid. Such modeling work has addressed double-dissociations in a variety of domains, including visual semantics [living versus nonliving things (25)], word reading [regular versus irregular word forms (26)], memory [explicit versus implicit processing (27)], and executive control [behavioral inhibition versus working memory (28)]. In each case, computational models have demonstrated that double-dissociations of behavior can result from complex interactions within a single, interactive, nonlinear system (for example, along the red and yellow arrows in the figure) in response to the effects of lesions and/or the demands of particular behavioral tasks [see (29, 30) for similar arguments]. It should be noted, however, that although computational modeling can establish the viability of alternatives to modularity, empirical evidence is required to establish their validity. The Haxby et al. data provide an important and exciting step in this direction. However, it remains to be determined whether the distributed pattern of activity that they observed is in fact necessary for face perception. Modularists might argue that such activity is the result of, or incidental to, processing in the face module. In other words, it is not enough to show that patterns of activity outside a putative module correlate with behavior; it must be shown that they are causal.

We have considered how modular and distributed theories might, in their purest forms, account for the existing findings. Of course, prudence dictates that neither extreme is likely to be correct. Indeed, we can think of pure modularity and undifferentiated distributed representation as the Scylla and Charybdis of cognitive neuroscience, between which the field must carefully navigate. On the one hand, we must avoid running aground on simplified notions of modularity. This would risk a form of “neophrenology,” or naïve localizationism, that fails to respect the true complexity of the brain. On the other hand, we must avoid being consumed by irreducible forms of distributed representation that cannot be analyzed in terms of fundamental principles. It is, after all, the job of science to reduce the complexity of nature to a more comprehensible form. We can imagine a variety of possible intermediate or alternative positions: a heterogeneous mix of special purpose modules and more distributed general mechanisms; representations that appear modular at one scale but distributed at finer scales; or representational structure that does not divide along the lines of common stimulus categories (such as faces versus objects) but rather is organized along more complex or abstract dimensions.

As the Haxby and Downing studies illustrate, neuroimaging has begun to contribute important new data regarding neural organization. Such efforts, combined with other neuroscientific techniques, promise ever more detailed sources of information about the nature of neural processing and representation. However, we suspect that meaningful advances will require equally dramatic progress in elaborating theories. We are likely to find that more detailed theories will naturally fall on intermediate ground between the purest forms of modularity and distributed representation. Dealing with the complexity that increasing detail introduces will no doubt require the assistance of more formal methods of theory building, such as computational modeling and mathematical analysis.

