Research Article

Parallel and Serial Neural Mechanisms for Visual Search in Macaque Area V4

Science  22 Apr 2005:
Vol. 308, Issue 5721, pp. 529-534
DOI: 10.1126/science.1109676

Abstract

To find a target object in a crowded scene, a face in a crowd for example, the visual system might turn the neural representation of each object on and off in a serial fashion, testing each representation against a template of the target item. Alternatively, it might allow the processing of all objects in parallel but bias activity in favor of those neurons that represent critical features of the target, until the target emerges from the background. To test these possibilities, we recorded neurons in area V4 of monkeys freely scanning a complex array to find a target defined by color, shape, or both. Throughout the period of searching, neurons gave enhanced responses and synchronized their activity in the gamma range whenever a preferred stimulus in their receptive field matched a feature of the target, as predicted by parallel models. Neurons also gave enhanced responses to candidate targets that were selected for saccades, or foveation, reflecting a serial component of visual search. Thus, serial and parallel mechanisms of response enhancement and neural synchrony work together to identify objects in a scene.

In a crowded visual scene, we typically focus our attention on behaviorally relevant stimuli. When subjects know the location of a relevant object, the brain mechanisms that guide their spatial attention to the object largely overlap with those for selecting the targets for eye movements (1). The outcome of this selection for attention or eye movements is to enhance the responses of visual cortex neurons to the relevant object, at the expense of distracters (2–6). As a result, object recognition mechanisms in the temporal cortex are typically confronted with only a single relevant stimulus at a time (7). However, in most common visual scenes, people rarely know the specific location of the relevant object in advance—instead, they must search for it, based on its distinguishing features, such as color or shape, which is commonly termed visual search. A long-standing issue has been whether object selection in visual search is also mediated by neural mechanisms for spatial attention, which scan the objects in the scene sequentially until the target is identified (serial search), whether or not eye movements are made. If so, then visual attention could be broadly served by a unitary mechanism, linked to the neural systems that control gaze. Alternatively, search may be mediated by nonspatial attentional mechanisms that are sensitive to features such as color and shape and that bias visual processing in favor of neurons that represent the target features throughout the visual field, all at once (parallel search) (7). Search could also be mediated by hybrid mechanisms such as guided search (8).

Previous studies of visual search (9–11) and attention to stimuli with particular features (12–14) in brain area V4 have found that neuronal responses to attended target stimuli were enhanced over time, but the studies were not designed to test whether the targets were “found” by serial or parallel neural mechanisms. In one of these studies, monkeys did not search for a specific feature but instead searched for a singleton (i.e., popout) stimulus in one of two feature dimensions (11). In another study that used backgrounds of natural scenes, the average neural activity throughout the trial varied according to the searched-for target features, but the authors could not rule out that these effects were due to differences in eye scan paths across the scene for different targets rather than feature-selective effects on neuronal responses (10).

We tested for parallel and serial attentional mechanisms in area V4 in monkeys performing a search task with free gaze. We recorded not only neuronal responses but also the synchrony between neuronal responses and the local field potential (LFP) (15, 16), because V4 neurons synchronize their activity when attention is directed to their receptive fields (RFs) (17), similar to neurons in parietal cortex during a memory-saccade task (18). Such synchrony, especially in the gamma frequency range, could potentially amplify their effect on postsynaptic neurons, similar to increases in firing rate (19). The monkeys freely scanned multielement arrays composed of colored shapes to find a target defined by color or shape (20). During color feature search (Fig. 1A), the cue was a colored square, and the monkey was rewarded for fixating the stimulus in the array that matched the cue color. During shape feature search (fig. S1), the cue was a gray shape, and the monkeys were rewarded for fixating the stimulus in the array that matched the cue shape. When shape was relevant, color was irrelevant, and vice versa. We selected two colors and two shapes as cues for each recording session, on the basis of initial recordings in which we determined a preferred (strong response) and nonpreferred (weak response) color and shape for a given neuron.

Fig. 1.

Illustration of the color search task and the feature enhancement analysis. (A) An example of a color search trial. The cue at the center of the screen is shown for illustration purposes only; it was extinguished before the array onset in the experiment [as in (B)]. The black dots show the eye position of the monkey during a representative correctly performed trial. The colors and shapes of stimuli at each location changed pseudorandomly from trial to trial within a session (20). (B) Feature enhancement analysis during color search. We analyzed neuronal measures for stimuli in neurons' RF when those stimuli were not the goal of the impending saccade. In this example display, fixating (represented by the inverted cone) the purple cross brings the red star into the neuron's RF. This stimulus is not selected for the next saccade, which is made instead to the orange A. In this case, we would examine the response to the red star in the RF from the time it came into the RF (i.e., when the purple cross was fixated) to when it went out of the RF (i.e., the beginning of the saccade to the orange A). The critical cases were those in which the RF stimulus was of the preferred (e.g., red) or nonpreferred (e.g., blue) color for the neuron, and we analyzed responses to these two types of stimuli when they were either the search target (e.g., the cue was red and a red stimulus was in the RF) or they were a distracter (e.g., the cue was blue and a red stimulus was in the RF). The analysis was conducted in the same manner across all fixations that brought a stimulus of interest into the RF when that stimulus was not selected for a saccadic eye movement. The same analysis was conducted during shape search trials (fig. S1), but taking into account shape preference instead of color preference.

Overall, monkeys performed similarly during color and shape search, finding the target on 86% and 91% of the trials, respectively. Both tasks were demanding, taking an average of 6.3 saccades to find the target among the 20 array items (Fig. 1A and fig. S1) (21). In separate behavioral studies in which we varied the number of display items, the monkeys took an average of 160 ms per item to find the target, again indicating that the target did not “pop out.”
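A per-item search cost like the 160 ms quoted above is conventionally estimated as the slope of a linear fit of search time against display set size. A minimal sketch of that calculation, using hypothetical timing data (not the paper's measurements):

```python
import numpy as np

# Hypothetical mean search times (ms) for displays of 5, 10, 15, and 20 items.
set_sizes = np.array([5, 10, 15, 20])
search_times_ms = np.array([1100, 1900, 2700, 3500])

# Slope of the linear fit gives the per-item search cost; a slope far above
# ~10 ms/item indicates the target does not "pop out".
slope_ms_per_item, intercept_ms = np.polyfit(set_sizes, search_times_ms, 1)
print(f"search slope: {slope_ms_per_item:.0f} ms/item")  # search slope: 160 ms/item
```

A flat slope (near 0 ms/item) would instead indicate parallel, preattentive popout of the target.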

Parallel selection during feature search. The key element of parallel search models is that the neural bias in favor of stimuli containing features of the searched-for target occurs throughout the visual field, and throughout the time period of the search, long before a target is identified. Thus, we reasoned that the critical neurons to test for this bias were not the neurons whose RF contained the stimulus that was the target for a saccade at a given moment. Rather, the critical neurons for this test were the ones whose RF contained a potential, or undiscovered, target that was specifically not selected for the next saccade (i.e., not the focus of spatially directed attention) (Fig. 1B). For example, consider two different trials where the animal was cued to search for either a red or a blue target in the color search task. In this case, we would examine the response to a red versus blue stimulus in the RF in the interval when the animal was preparing a saccade to a stimulus somewhere outside the RF (e.g., the orange A in Fig. 1B). The question was whether the response to the red or blue stimulus inside the RF varied according to whether the animal was searching for a red versus blue target, i.e., whether there was any sort of bias in favor of the, as yet undetected, target inside the RF.

For each session and for each neuron, we measured responses under the following four conditions: (i) both the stimulus in the RF and the cue had the neuron's preferred feature (e.g., the animal was searching for red, the neuron preferred red, and there was a red stimulus in the RF); (ii) the RF stimulus had the preferred feature but the cue was the neuron's nonpreferred feature; (iii) the RF stimulus had the nonpreferred feature but the cue was the preferred feature; and (iv) both the RF stimulus and the cue had the nonpreferred feature (22).

We used multichannel drives to collect data from 79 single neurons and 70 LFPs in 27 feature search sessions in two monkeys (23). The data from the two animals have been combined because they were qualitatively similar. We used population, rather than individual neuron, statistics because of the variable number of stimulus conditions generated by the animal's scanning strategy in a given session (20). The average normalized LFP and neuronal responses of the population of V4 neurons for each of the four conditions are shown in Fig. 2 for both color and shape searches (24), and the two tasks gave qualitatively similar results. For statistical comparisons, we averaged spike densities and LFPs over the time interval that started 50 ms after the beginning of the current fixation and ended at the median saccade initiation time (∼215 ms) for the upcoming saccade. Neuronal responses were greater when the preferred stimulus was in the RF, across all conditions (Fig. 2, A and B) (t test; color, P < 0.005; shape, P < 10⁻⁴). More importantly, the response to the preferred stimulus in the RF was enhanced when it also happened to match the cue, i.e., when it contained the target feature that the animal was searching for on that trial but had not yet found (color, P < 10⁻⁵; shape, P = 10⁻⁵). The response to the cued feature in the RF remained elevated through the time of saccade initiation, when spatially directed attention to the saccade target outside the RF was presumably at a maximum. Responses to nonpreferred RF stimuli showed no such enhancement based on the cue features (P > 0.05), but responses to distracters with colors similar to the preferred color were significantly enhanced when the preferred color was cued during color search (not shown, P < 0.001). The distribution of cue-feature effects on responses to a preferred stimulus in the RF is shown for all neurons in Fig. 2, G and H. More than 90% of the neurons gave a larger response to the preferred feature when it was the target feature (chi-square, P < 10⁻⁵), with a median increase in response of 30% for all neurons. This supports the idea of an attentional bias in favor of neurons with a feature preference that matches the searched-for feature on a given trial, in parallel throughout the visual field.

Fig. 2.

Feature-related enhancement of neuronal activity and spike-field synchronization during feature searches. (A) Normalized firing rates averaged over a population of V4 neurons during color search trials during fixations at the end of which the monkey made a saccade away from the RF, averaged across all recordings. Red lines show responses when the stimulus in the RF was of the preferred color for the recorded neurons; blue lines show responses when the stimulus was of the nonpreferred color; solid lines show responses on trials in which the cue was the preferred color; and dotted lines show responses on trials in which the cue was the nonpreferred color. (B) Results for the shape search task, as in (A) but taking into account shape preference instead of color preference. (C and D) Normalized V4 LFP during color and shape search, respectively. (E and F) Spike-field coherence for the color and shape tasks, respectively. Light red and light blue lines show coherence values for the same conditions represented by the red and blue lines, respectively, but with the correspondence between LFP and single-neuron activity removed by shuffling the fixation order of the spike data within a condition. (G and H) The coherence modulation index as a function of the single-neuron firing rate modulation index for color and shape search, respectively, for each spike-LFP pair. The coherence modulation index was calculated as [(Cpp – Cpn)/(Cpp + Cpn)] and the firing rate modulation index as [(FRpp – FRpn)/(FRpp + FRpn)], where Cpp represents coherence when the preferred feature was in the RF and the preferred feature was cued, Cpn represents coherence when the preferred feature was in the RF and the nonpreferred feature was cued, and FRpp and FRpn represent firing rates for the same conditions as Cpp and Cpn, respectively. Positive indices reflect an enhancement of response to the preferred stimulus in the RF when the cue was also the preferred feature. 
Firing rate was averaged from 50 ms after fixation to the median saccade initiation time. Coherence was averaged over the 30- to 60-Hz range. The arrowheads indicate the mean coherence and firing rate indices across all pairs.
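The modulation indices defined in the caption can be computed directly from the per-condition averages. A minimal sketch, with hypothetical values for a single spike-LFP pair (the variable names follow the caption; the numbers are illustrative, not from the paper's data):

```python
def modulation_index(x_pp, x_pn):
    """(Xpp - Xpn) / (Xpp + Xpn): positive when cueing the preferred feature
    enhances the measure (firing rate or gamma-band coherence)."""
    return (x_pp - x_pn) / (x_pp + x_pn)

# Hypothetical per-condition averages for one spike-LFP pair:
fr_pp, fr_pn = 26.0, 20.0    # firing rate (spikes/s): preferred feature in RF,
                             # preferred vs. nonpreferred feature cued
c_pp, c_pn = 0.122, 0.078    # 30-60 Hz spike-field coherence, same conditions

print(round(modulation_index(fr_pp, fr_pn), 3))  # 0.13
print(round(modulation_index(c_pp, c_pn), 3))    # 0.22
```

Because the index is a normalized difference, it is bounded between −1 and 1 and can be compared across neurons with very different absolute firing rates or coherence levels, which is what makes the population scatter plots in Fig. 2, G and H interpretable.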

In contrast to the effects of attention on single-neuron firing rates, neither the magnitude (Fig. 2, C and D) nor the spectral power of the LFP considered by itself were affected by the features of the stimulus in the RF or the searched-for cue features. However, as shown in the population data in Fig. 2, E and F, the coherence between spikes and the LFP in the gamma band (averaged over the 30- to 60-Hz range) was greater when the RF stimulus was of the preferred feature for the neurons compared to a nonpreferred feature (color, P < 10⁻⁵; shape, P = 0.001). Most importantly, the coherence for the preferred feature in the RF was enhanced when the RF contained the target feature that the animal was searching for on that trial but had not yet found (cue effect: color, P < 10⁻⁵; shape, P < 10⁻⁵), similar to the effects we found on firing rates. More than 79% of the spike-LFP pairs showed enhanced coherence under these conditions (chi-square, P < 10⁻⁵) (Fig. 2, G and H), with a median increase in coherence of 22% for all pairs (25). This supports the idea of a parallel bias in favor of neurons that prefer the feature of the searched-for target and happen to hold the undetected target within their RFs. Coherence for distracters with colors similar to the preferred color was also significantly enhanced when the preferred color was cued during color search (not shown, P < 0.001), suggesting that the bias in favor of the target feature was shared to a lesser extent by similar nontarget features. Finally, unlike what we found with firing rates, we found a small enhancement of coherence for a nonpreferred stimulus in the RF when the animal was searching for a feature that was the preferred feature of the neuron (cue effect: color, P = 0.001; shape, P = 0.002). For example, if the animals were searching for red, then red-preferring neurons had somewhat enhanced coherence even for stimuli in their RF that did not share the target feature.
Together, these results suggest that when an animal is searching for a particular feature, the neurons that prefer that feature begin to synchronize their activity, and they go into maximum synchronization when a stimulus with that feature falls within their RF, e.g., when the animal is searching for red, the neurons prefer red, and a red stimulus falls within the RF.

Although the coherence measure we used was normalized for both firing rate and LFP power in a given frequency band, we confirmed that any overall changes in these values did not contribute to changes in coherence by calculating the coherence when the correspondence between spike and LFPs was randomly shuffled across fixations within a given condition (i.e., shuffling across fixations with the same cue and RF stimulus conditions). The shuffling not only reduced overall coherence but also eliminated the enhancement in the gamma band found for the preferred feature and preferred cue (Fig. 2, E and F).
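The logic of this shuffle control can be illustrated with synthetic data. The sketch below is an assumption-laden simplification, not the paper's analysis pipeline: it uses a plain FFT-based coherence estimate averaged across fixation epochs (rather than the paper's spectral estimator), and the "spikes" and "LFP" are simulated with a shared 40-Hz component whose phase varies from fixation to fixation. Because coherence is computed across epochs, shuffling which epoch's spikes pair with which epoch's LFP destroys the trial-by-trial phase relationship and collapses the estimate:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 1000                        # sampling rate (Hz)
n_fix, n_samp = 50, 256          # fixation epochs and samples per epoch
t = np.arange(n_samp) / fs

# Hypothetical data: a 40-Hz component with a random phase per fixation is
# shared between the LFP and the spike probability.
phase = rng.uniform(0, 2 * np.pi, (n_fix, 1))
lfp = np.sin(2 * np.pi * 40 * t + phase) + 0.5 * rng.standard_normal((n_fix, n_samp))
spikes = (rng.random((n_fix, n_samp)) <
          0.05 * (1 + np.sin(2 * np.pi * 40 * t + phase))).astype(float)

def gamma_coherence(spk, field):
    # Coherence estimated ACROSS fixation epochs: cross- and auto-spectra are
    # averaged over epochs before forming the ratio, so the estimate depends on
    # a consistent spike-LFP phase relationship from fixation to fixation.
    sx = np.fft.rfft(spk - spk.mean(axis=1, keepdims=True), axis=1)
    sy = np.fft.rfft(field - field.mean(axis=1, keepdims=True), axis=1)
    num = np.abs((sx * np.conj(sy)).mean(axis=0)) ** 2
    den = (np.abs(sx) ** 2).mean(axis=0) * (np.abs(sy) ** 2).mean(axis=0)
    freqs = np.fft.rfftfreq(n_samp, 1 / fs)
    band = (freqs >= 30) & (freqs <= 60)
    return (num / den)[band].mean()

intact = gamma_coherence(spikes, lfp)
shuffled = gamma_coherence(spikes[rng.permutation(n_fix)], lfp)
print(f"intact={intact:.3f}  shuffled={shuffled:.3f}")  # intact > shuffled
```

Shuffling within a condition, as in the text, additionally guarantees that any residual coherence reflects only condition-wide changes in rate or power, not genuine spike-field locking.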

Parallel selection during conjunction search. Another key element of most parallel models of visual search is that the top-down bias in favor of the cue-target stimulus representation in the cortex is shared by distracter stimuli if they share features with the cue target. For example, if the subject is searching for a horizontal red bar, the cortical representations of both horizontal stimuli and red stimuli will share in the target bias, including the cortical representation of vertical red distracters. If so, we should see evidence for enhancement of responses and/or synchrony when the stimulus in the RF contains a single feature of the target but is not itself a target because it lacks other target features. To test this, we studied 23 neurons (26) in one monkey during a conjunction task in which the target was defined by two features, color and shape, and the distracter elements could share only one of the target features; e.g., the target might be a red X, and a distracter might be either red, an X, or neither but would not contain both target features (fig. S1). We analyzed responses and synchronization associated with the distracters in the RF under the different cue conditions.

The monkey performed well (94% target localization) and did not search the array randomly (fig. S2A) (4.7 fixations per trial versus random, P < 10⁻⁵; fixation duration versus required fixation duration for target detection, P < 10⁻⁵). Behavioral data indicate that the target did not pop out (average of 90 ms per item to find the target in separate studies with a variable number of display items). As in the feature search task described above, and as in previous reports (27), the monkey used both the color and shape of the cue to guide its behavior (28).

Neuronal responses, LFPs, and spike-field synchrony were measured under four conditions: (i) the RF distracter contained the neuron's preferred feature and this feature was one of the cue (and target) attributes; (ii) the RF distracter contained the preferred feature but this feature was not cued; (iii) the RF distracter contained the nonpreferred feature and this feature was cued; and (iv) the RF distracter contained the nonpreferred feature when the preferred feature was cued (22). Neurons responded better to their preferred feature in the RF compared to nonpreferred features (Fig. 3, A and B) (color, P < 0.01; shape, P < 0.001). In the key test, we found that responses were enhanced if the distracter in the RF was of the neuron's preferred color and it was also the same color (but, by design, not the same shape) as the color-shape conjunction target (Fig. 3A) (P = 0.002). In other words, the distracter shared in the bias for the target stimulus if it shared one of its features, consistent with the predictions of parallel search models. The median enhancement was 8%, with more than 86% of the neurons having a larger response when the RF stimulus shared a feature with the searched-for target (chi-square, P < 0.005). There was also an enhancement of the response when the shape of the distracter matched the shape of the color-shape conjunction target, consistent with parallel models, but this enhancement was smaller and developed later than the color-related enhancement (Fig. 3, A and B). When the RF distracter was of the preferred feature, shape-related enhancement was not significant in the same time interval as that used in the feature search task, but it became significant ∼150 ms after fixation onset (P = 0.035). This is consistent with the behavioral evidence described above, that the monkey used the color feature more than the shape feature in guiding its search to the color-shape conjunction target (fig. S2B). The LFP magnitude (Fig. 3, C and D) and power were not modulated by stimulus or cue features in the conjunction task.

Fig. 3.

(A to F) Feature-related enhancement of neuronal activity and synchronization during conjunction search. Conventions are as given in Fig. 2.

There was also significant enhancement of the spike-field coherence in the gamma band when the RF distracter had the neuron's preferred feature and that feature was in common with the target for either a color (Fig. 3E) (P < 10⁻⁵) or shape (Fig. 3F) (P < 0.001) match. The enhancement in the latter case was smaller, again consistent with the monkey's behavioral bias in favor of using color information. The median enhancement of coherence with a color match was 22%, with 97% of spike-LFP pairs showing an increase (chi-square, P < 10⁻⁵), and the median enhancement with a shape match was 17%, with 78% of spike-LFP pairs showing an increase (P < 0.002). Thus, the top-down bias in visual search is not limited to cases in which the RF stimulus is the search target but instead applies to any stimulus, even a distracter, that contains a feature relevant to the search, consistent with parallel models. It is also consistent with the results from the feature search task, in which we found that enhancement occurred for colors that were similar to the target color. Both results potentially explain why search is often more difficult when the distracters share features with the target, as in some forms of conjunction search (8).

Serial selection during search. Finally, although we have emphasized the evidence for parallel mechanisms in search, the task necessarily had a spatial attention (serial) component to it, in that the animals made several saccades to stimuli in the array while searching for the targets. To test for spatial attention effects on responses, we compared responses and spike-field synchronization to a stimulus in the RF when either it was selected for a saccade or the saccade was made to a stimulus outside the RF (Fig. 4).

Fig. 4.

Illustration of the saccade enhancement analysis. We compared neuronal measures when the monkey made a saccade to an RF stimulus versus a saccade away from the RF. In this display, fixating the purple cross, for example, brings the green star into the neuron's RF. We would then compare neuronal responses when the green star in the RF was the target of the saccade, to those when the saccade target was to a stimulus outside the RF, e.g., the orange A. Activity was analyzed from the time the purple cross was fixated to when the next saccade was initiated.

Selecting the RF stimulus for a saccade led to an enhancement of the neuronal response across the population (Fig. 5A) (population median enhancement of 36%, P < 10⁻⁵, with 70% of neurons showing a significant increase), and it also caused a significant modulation of the magnitude of the LFP (Fig. 5B) (population, P < 10⁻⁵) and an increase in its spectral power in the gamma frequency range (P < 0.001). The effects were qualitatively similar if we aligned neuronal responses to the time of the saccade initiation and are consistent with previous reports (10).
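Band-limited LFP power of this kind is typically obtained by integrating a power spectral density over the frequency band of interest. A minimal sketch with synthetic data, using Welch's method from SciPy as a stand-in for whatever spectral estimator the study actually employed:

```python
import numpy as np
from scipy.signal import welch

rng = np.random.default_rng(1)
fs = 1000                                  # sampling rate (Hz)
t = np.arange(0, 1.0, 1 / fs)

# Hypothetical LFP: broadband noise plus a 45-Hz gamma-band component.
lfp = rng.standard_normal(t.size) + 0.8 * np.sin(2 * np.pi * 45 * t)

f, psd = welch(lfp, fs=fs, nperseg=256)
df = f[1] - f[0]

def band_power(lo, hi):
    # Integrate the PSD over [lo, hi] Hz (rectangle rule over frequency bins).
    band = (f >= lo) & (f <= hi)
    return psd[band].sum() * df

gamma_power = band_power(30, 60)
print(gamma_power > band_power(60, 90))  # the 45-Hz component boosts 30-60 Hz
```

Note that a rise in gamma-band LFP power reflects synchronized input to the area, whereas the spike-field coherence used above asks whether individual neurons' spikes are locked to that rhythm; the two measures can dissociate, as in the short-dwell-time saccade analysis here.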

Fig. 5.

Saccadic enhancement during feature searches. (A) Normalized firing rates for the population of neurons when a saccade was made to a stimulus inside the RF (red line) and when a saccade was made to a stimulus outside the RF (blue line) across all saccades. Data from color and shape searches were combined. (B and C) Normalized LFP and spike-field coherence for the same conditions. The results during conjunction search (not shown) were very similar.

In contrast to the parallel biasing effects during feature search on spike-field coherence, we found no significant effects on coherence of making a saccade to the RF stimulus (Fig. 5C) (29). Previously, we found that spatial attention enhances synchrony in V4, in a spatial attention task in which the animal was required to sustain attention to the same stimulus for up to several seconds (17). Given the much shorter stimulation intervals (∼215-ms saccade dwell time) in the present study, it is possible that we were simply unable to detect a small change in coherence with spatial attention because there were too few spikes. The fact that the gamma-band power of the LFP was enhanced with attention during this initial interval suggests that the inputs to and/or activity within V4 were, in fact, becoming synchronized during this brief period, even though it was not yet evident in the spike-field coherence. In addition, recent work in our lab suggests that spike-field coherence builds over the course of sustained attention to a stimulus (30), which may also explain why increases in feature-related coherence are so prominent in visual search, where attention to the features of the searched-for object must be sustained for long intervals, even when individual objects in the scene are attended only briefly.

Discussion and conclusions. The results from the feature-selection and spatial-selection components of the search task together reveal the interplay of both parallel and serial neural mechanisms in visual search. As predicted by parallel-search models, including biased competition (7, 31), the search for a target with a specific feature appears to synchronize and enhance the activity of the population of V4 neurons that prefer that feature, in parallel throughout the visual field representation, long before the animal locates the target. When a stimulus with a feature shared with the searched-for target then falls within the RF of the neurons preferring that feature, neural responses and synchrony are maximally enhanced (Fig. 6). Thus, the structures involved in the attentional control of area V4 must influence neurons based on their feature preferences and not just their RF locations. The V4 neurons firing synchronously, at an increased rate, will be more effective in driving postsynaptic neurons, including in inferotemporal and frontal cortex. We propose that this strong signal is the one that ultimately triggers spatial attention to the candidate target, and, in most cases, an eye movement toward it (Fig. 6). Distracter stimuli that resemble the target or that share some, but not all, features with the searched-for target also appear to share in this bias, explaining why some visual search tasks are difficult, including some where targets are defined by the conjunction of different features (8). This account, supported by studies of attention to multiple stimuli in V4 and inferotemporal RFs (32), is quite different from ideas that spatially focused attention is needed to “bind” together features such as color and shape in conjunction arrays or that search arrays are scanned by an internal attentional “spotlight” until the target is found. Nonetheless, there are important serial components to visual search. 
While the animals are searching, they appear to use a spatial-selection mechanism to examine some stimuli as potential targets, and the stimuli so selected elicit maximal neuronal responses. One can easily imagine that serial and parallel mechanisms are engaged to various degrees depending on the difficulty of the task and the sharing of target features among distracters, as predicted by hybrid models such as guided search (8) and FeatureGate (33).

Fig. 6.

Schematic illustration of selection mechanisms during a conjunction search task. (Left) Example stimulus display. (Middle) Representation of stimulus display in the cortex. Neurons that prefer the features of the searched-for target show enhanced firing and synchronize their activity when a stimulus sharing a target feature falls within their RFs, illustrated by the intensity of the stimulus representation. The greater the similarity or shared features with the target, the stronger the signal. This bias in favor of a potential target occurs in parallel throughout the visual field. (Right) The bias for potential targets triggers spatial selection mechanisms that result in eye movements.

Overall, it appears that processing in V4 is an intermediate stage in visual search between stimulus-feature processing and high-level object recognition. The feature-related enhancement we observed is likely the result of a combination of feature-selective responses in the visual cortex, including V4, and top-down feedback from structures involved in working memory and executive control, such as the prefrontal cortex and possibly the parietal cortex (7, 34). Such feedback must be capable of targeting neurons with the appropriate feature preferences throughout the visual field map. The saccade-related enhancement, on the other hand, likely originates from feedback to V4 neurons with RFs at particular locations, originating from structures with spatial attention and oculomotor functions such as the frontal eye field and the lateral intraparietal area. These areas are thought to represent a salience map in which stimuli are represented according to their behavioral relevance independent of their features (35), ultimately resulting in the selection of a single stimulus for a saccade target or further visual processing (36).

Supporting Online Material

www.sciencemag.org/cgi/content/full/308/5721/529/DC1

Materials and Methods

Figs. S1 and S2

References and Notes

