Dynamic Shifts of Limited Working Memory Resources in Human Vision


Science  08 Aug 2008:
Vol. 321, Issue 5890, pp. 851-854
DOI: 10.1126/science.1158023


Our ability to remember what we have seen is very limited. Most current views characterize this limit as a fixed number of items—only four objects—that can be held in visual working memory. We show that visual memory capacity is not fixed by the number of objects, but rather is a limited resource that is shared out dynamically between all items in the visual scene. This resource can be shifted flexibly between objects, with allocation biased by selective attention and toward targets of upcoming eye movements. The proportion of resources allocated to each item determines the precision with which it is remembered, a relation that we show is governed by a simple power law, allowing quantitative estimates of resource distribution in a scene.

The dominant model of visual memory capacity asserts that only a limited number of items can be simultaneously represented in working memory (1–10). Support for this model has come primarily from change-detection tasks, in which detection was close to 100% correct for small numbers of items, and declined only when the set size increased above a certain limit, generally three or four items (4, 5). An alternative way to explore the limits of visual working memory is to consider the precision with which each visual item is stored as a function of the number of objects in a scene. This approach provides a radically different perspective on visual capacity limits, revealing rapid redistribution of limited memory resources across eye movements and covert shifts of attention.

We tested subjects' ability to remember the location and orientation of multiple visual items after a brief disappearance of the stimulus array, with or without an intervening eye movement (Fig. 1). To minimize the role of configurational memory (11), only one of the items was redisplayed after the delay; subjects reported the direction in which this probe item had been displaced or rotated. Responses varied probabilistically with the magnitude of the change to the probe item (Fig. 2A). Subjects' response functions were successfully fitted with cumulative Gaussian distributions, consistent with a Gaussian distribution of error in the stored representation of the original stimulus (12).
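The link between Gaussian error in the stored representation and a cumulative Gaussian response function can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' fitting procedure: the function name `response_probability` and the bias parameter `pse` (the change size at which both responses are equally likely) are hypothetical, and the values in the loop are made up.

```python
from statistics import NormalDist


def response_probability(delta, sigma, pse=0.0):
    """Probability of reporting a positive change (e.g. 'clockwise').

    Assumes the stored value carries Gaussian error with standard
    deviation `sigma`; `pse` is a hypothetical bias term marking the
    change size at which both responses are equally likely.
    """
    return NormalDist(mu=pse, sigma=sigma).cdf(delta)


# Illustrative values only: a larger sigma gives a flatter response function.
for sigma in (1.0, 2.0):
    curve = [round(response_probability(d, sigma), 3) for d in (-2, -1, 0, 1, 2)]
    print(f"sigma={sigma}: {curve}")
```

In this sketch a noisier memory (larger `sigma`) produces a shallower curve, which is the pattern the fitted response functions are used to measure.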

Fig. 1.

Experimental procedure. (A) Stimuli and sequence of events on a location-judgment trial. The example shown is a fixation trial with a set size of four items. After the sample display is blanked, subjects' memory for location of a randomly chosen item is tested by redisplaying the item displaced horizontally through distance Δ. The subject must report the direction of displacement. (B) An orientation judgment trial (this time shown for the saccade condition, with a set size of two items). At the tone, the subject makes a saccade toward the item of a prespecified color (here green) with the display being blanked during the eye movement. A randomly chosen item is redisplayed, rotated through an angle Δ, and the subject reports the direction of rotation. Red circles indicate gaze position.

Fig. 2.

Performance on the memory task. (A) Proportion of displacements judged outward from fixation (top row) as a function of the actual displacement (Δ) and number of items in the display (N). Similar plots are shown in the lower row for orientations (rotations) judged clockwise as a function of the actual rotation and number of items. Results from fixation trials are shown in black, and those from saccade trials in red. Curves indicate cumulative Gaussian distributions fitted to the response data. The slopes of these functions become flatter with increasing number of items. (B) Precision (determined by the reciprocal of the SD of the fitted Gaussian) falls as a function of the number of items in the sample display. Error bars indicate ±1 SE.

In the absence of eye movements, subjects were able to recall both location and orientation of a single item with considerable accuracy (Fig. 2A, N = 1, black symbols), with discrimination of 0.5° displacements and 5° rotations significantly better than chance at 73% and 80% correct, respectively (t > 5.8, p < 0.001). However, increasing the number of items to be remembered led to a decrease in performance, indicative of the limited capacity of visual working memory (Fig. 2A, black symbols, set size increasing left to right). Precision, measured by the reciprocal of the standard deviation of the response function, was reduced as the number of items in the display increased (Fig. 2B, black symbols). These data do not reveal a sharp drop in performance at a limit of four items.

Next, we asked whether the precision of visual working memory is affected by an eye movement. Detection of changes to visual stimuli that occur during an eye movement presents a challenge to the brain, because the pre- and postsaccadic retinal locations of every visual item are very different. For location discrimination in single-item displays (Fig. 2A, top left, red symbols), an intervening eye movement introduced a small bias (mean 1.4°) into subjects' judgments: a tendency to report a shift in the direction of the saccade even for small displacements in the opposite direction. However, as can be seen from the similar slopes of the two response functions, the precision with which this discrimination was made did not differ significantly from that of the fixation condition (t = 1.2, p = 0.24). This indicates that subjects take into account the size and direction of their eye movement in order to estimate the expected postsaccadic retinal location of the single target (13, 14). This may be achieved by remapping a retinotopic location representation based on an internal copy of the saccadic motor signal (15, 16).

Precision in the saccade condition decreased with increasing number of items in a way similar to that of the fixation condition, for both location and orientation judgments (Fig. 2B, red symbols), with no significant advantage of fixation at any of the tested set sizes (t < 1.3, p > 0.23). This indicates that the process of spatial updating does not introduce any additional capacity limit on visual working memory, and therefore that the full contents of memory undergo remapping.

The item-limit model of visual working memory predicts that discrimination performance will begin to decline only once the limiting number of items is exceeded. In contrast, our results, for both fixation and saccade conditions, show that the precision with which visual items are remembered decreases with increasing numbers even at the smallest set sizes (t > 2.7, p < 0.006), with the largest drop in precision occurring between one- and two-item displays, and no evidence for any discontinuity in the region of four items (Fig. 2B). Our data therefore support an alternative model in which limited visual memory resources must be shared out between items, such that increasing numbers of items are stored with decreasing precision (see fig. S1 for an illustration). To quantify the relation between the resources available to encode an item (R) and the precision with which it is remembered (P), we replotted precision as a function of the proportion of resources available per item (Fig. 3A). The results suggest that this relation can be captured by a simple power law [P ∝ R^k, power constant k = 0.74 ± 0.06 (95% confidence limits); blue line].
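As a rough numerical illustration of the power law, the sketch below computes precision relative to the one-item case for several set sizes. It assumes that the resource is shared equally, so that each of N items receives R = 1/N; that equal split, the function name, and the set sizes in the loop are assumptions made for the example, while the constant 0.74 is the value reported above.

```python
K = 0.74  # power constant reported in the text


def normalized_precision(resource_share, k=K):
    """Precision relative to the single-item case, P / P(N=1) = R**k."""
    if not 0.0 < resource_share <= 1.0:
        raise ValueError("resource_share must lie in (0, 1]")
    return resource_share ** k


# Assumption: N items share the resource equally, so each receives 1/N.
for n_items in (1, 2, 4, 6):
    share = 1.0 / n_items
    print(f"N={n_items}: share={share:.3f}, "
          f"normalized precision={normalized_precision(share):.3f}")
```

Because k is below 1, the function falls most steeply at the smallest set sizes, which matches the observation that the largest drop occurs between one- and two-item displays.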

Fig. 3.

Modeling visual memory performance. (A) The relation between available memory resources and precision is approximated by a power law (solid blue line; dashed line indicates 95% confidence limits) fitted to the normalized precision values obtained in the first experiment, including both fixation and saccade conditions (circles: location task; triangles: orientation task; black: fixation trials; red: saccade trials; empty symbols: flash cue; filled symbols: no flash cue). (B) Normalized precision as a function of number of items in memory (N) in all conditions. Solid line indicates the prediction of the fitted power-law model. Normalization is with respect to performance with one item (N = 1) in each of the experimental conditions. (C) Response probability as a function of the size of the change (Δ) to the stimulus (i.e., displacement or rotation), for different numbers of items (N), predicted on the basis of the power-law model. σ indicates 1 SD of the N = 1 response function. The curves become flatter with increasing number of items, corresponding to changes in the Gaussian distributions of error in the stored stimulus representation (inset). The dotted vertical line corresponds to a small change to the stimulus, as used in the current study, whereas the dashed vertical line indicates a much larger change. In the latter case, performance would be near-maximal for one to four items but fall with further increases in set size. (D) Probability of correct response for stimulus changes of different magnitudes (black lines). σ indicates 1 SD of the N = 1 response function. The iso-lines for each multiple of σ were derived directly from (C). Red symbols show empirical data from the current study. Green symbols show data from (5). Both sets of data are consistent with the power-law model, but different curves arise with differences in the size of stimulus change. See also fig. S2 for data plotted according to different sizes of change within the current experiment.

The similarity of our results for memory of location and orientation suggests that they share a common mechanism. This may be the representation of stimulus attributes by population coding, in which information is encoded in the combined activity of a large number of neurons (17). Currently identified population decoding schemes do not permit a neuron to simultaneously encode information about more than one stimulus. Therefore, when multiple items must be represented, the total pool of neurons must be shared out between the different items. Because each neuron's firing rate is corrupted by noise (18), reducing the number of neurons representing an item will increase variability in the population estimate, and consequently reduce the precision with which the item is represented. Theoretical studies have shown that a maximum likelihood decoding scheme would result in a power-law relation between precision and number of neurons (19), similar to that obtained in the current study (20).
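The argument that fewer neurons yield a noisier population estimate can be illustrated with a deliberately simplified simulation. The sketch below assumes identical neurons with independent Gaussian noise and decodes by plain averaging, which is not the maximum likelihood scheme cited in the text. Under this assumption the SD of the estimate scales as 1/sqrt(n), a power law with exponent 0.5 rather than the fitted 0.74. All names and parameter values are hypothetical.

```python
import random
from statistics import fmean, stdev


def population_estimate_sd(n_neurons, noise_sd=1.0, n_trials=2000, seed=0):
    """SD of an estimate formed by averaging n_neurons noisy responses.

    Simplifying assumptions: every neuron signals the same value (0 here)
    with independent Gaussian noise, and decoding is a plain average.
    """
    rng = random.Random(seed)
    estimates = [
        fmean(rng.gauss(0.0, noise_sd) for _ in range(n_neurons))
        for _ in range(n_trials)
    ]
    return stdev(estimates)


# Fewer neurons per item -> larger SD -> lower precision (1 / SD).
for n in (4, 16, 64):
    sd = population_estimate_sd(n)
    print(f"{n} neurons: SD={sd:.3f}, precision={1.0 / sd:.2f}")
```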

Can the power-law model also explain why previous studies (4, 5) found a decrease in performance only for greater numbers of items? Figure 3B shows how the model predicts precision will change with increasing set size, and Fig. 3C displays the corresponding response functions. The power-law model predicts that accuracy (proportion correct) will vary with the magnitude of the change to be discriminated. In this study, we tested discrimination of small changes to stimuli, where discrimination is difficult even with only one item in the display. In this range, our model predicts that accuracy will decrease with increasing number of items even at the smallest set sizes (e.g., dotted vertical line in Fig. 3C). In contrast, previous tests of visual working memory have generally used “suprathreshold” changes, where performance is close to 100% correct for a single item. In these cases, the power-law model predicts that accuracy will initially change almost imperceptibly with increasing numbers of items, and then more steeply at larger set sizes (e.g., dashed vertical line in Fig. 3C). The full predictions of the model are shown in Fig. 3D (black lines). The power-law model is consistent both with our data (examples shown in red: 0.5° displacements, 5° rotations; see also fig. S2) and with many of the results previously taken to support a three- to four-item limit [examples shown in green (5)].
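The contrast between small and large stimulus changes can be made concrete with a short sketch. It assumes equal sharing of the resource among N items, an unbiased observer, and Gaussian memory error whose SD grows as N^k (the reciprocal of the power-law precision). The change sizes of 0.8 and 5 SD units are illustrative choices, not values taken from the study, and the function name is hypothetical.

```python
from statistics import NormalDist

K = 0.74  # power constant reported in the text


def proportion_correct(delta_in_sigma, n_items, k=K):
    """Predicted accuracy for a change of size delta_in_sigma.

    delta_in_sigma is expressed in units of the one-item SD. Assumes the
    resource is split equally, so the SD for N items is N**k times the
    one-item SD, and that responses are unbiased.
    """
    sigma_n = float(n_items) ** k
    return NormalDist(0.0, sigma_n).cdf(delta_in_sigma)


# Illustrative change sizes: a small change and a large change.
for delta in (0.8, 5.0):
    row = [round(proportion_correct(delta, n), 3) for n in (1, 2, 4, 6, 12)]
    print(f"change = {delta} sigma: {row}")
```

With the small change, accuracy falls from the first added item onward; with the large change, it stays near ceiling for the first few items and only then declines, which resembles the pattern previously read as an item limit.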

Although we have shown that an upcoming eye movement does not reduce the total memory resources available, it does affect how those resources are allocated. Figure 4A shows precision of discrimination judgments in multi-item (N > 1) displays, where the data were separated into trials on which the probed item was the saccade target and those on which it was one of the other items in the display. For both location and orientation judgments, the saccade target was remembered with greater precision than were nontargets, indicating a preferential shift of visual memory resources to the target of the eye movement (black symbols; t > 4.2, p < 0.001). This finding was not a consequence of the way in which the saccade target was specified [endogenously cued by color (21)] because a similar effect was also observed, in a different condition, when we cued the saccade target exogenously by flashing it (Fig. 4A, gray symbols; t > 4.3, p < 0.001). Thus, limited working memory resources get rapidly redistributed so that the target of a forthcoming eye movement receives privileged allocation, thereby improving the precision for this item. Because total resources are limited, the corollary of this enhanced memory for the saccade target should be a decrease in precision for nontargets, which will be most evident when the total number of items is small. A comparison of saccade and fixation performance in two-item displays confirmed this effect, with the increased precision for the saccade target (t = 4.26, p < 0.001) matched by a significant decrease in precision for the nontarget item (t = 3.19, p < 0.01).

Fig. 4.

Effects of eye movements and attention. (A) Precision as a function of number of items: memory for saccade targets (filled symbols) and nontargets (empty symbols), when the target is specified endogenously by color (black) or exogenously by a flash (gray). Better performance is seen for targets, regardless of the mode of cueing. (B) Memory for an item cued by a flash (filled symbols) and for noncued items (empty symbols), with no eye movements. (C) Memory for items as a function of fixation order in a sequence of saccades demonstrates how memory for the most recently fixated item is poor compared to memory for the current saccade target. Error bars indicate ±1 SE.

Is this flexibility in the allocation of memory resources specific to eye movements, or does it also occur with shifts of covert attention (22–25)? In a further condition, subjects kept their eyes fixed, but one of the items in the sample display flashed briefly before the screen was blanked, a manipulation known to involuntarily attract visual attention (26). When the flashed item was subsequently probed, discrimination precision was significantly higher than for nonflashed items [Fig. 4B; t > 3.4, p < 0.001 (27)]. Thus, visual attention acts as a “gatekeeper,” determining which visual information is given priority for storage in working memory (28–31), perhaps by biasing competitive interactions in cortical regions mediating visual memory (32, 33).

In normal scene viewing, we make many eye movements in order to extract the maximum possible information from a scene. We performed an additional experiment to examine how visual memory resources are dynamically allocated across a sequence of saccades. Subjects made a series of eye movements to fixate each item in a five-item display; the display was blanked before the saccade to the fifth item reached its target. The precision on a subsequent discrimination judgment, probing memory for any one of the five targets, varied with order of fixation (Fig. 4C).

Discrimination of both location and orientation of the saccade target (the fifth item) was substantially more precise than for any of the other items in the display (t > 3.9, corrected p < 0.001). However, the target of the previous saccade, which was also the most recently fixated item, was not remembered with significantly greater precision than any of the previously fixated items (t < 2.5, corrected p > 0.12). Nor were there any differences in precision between the previous items (t < 2.0, corrected p > 0.55). We found no significant relation between precision and fixation time (t < 1.0, p > 0.31), indicating that these results do not reflect temporal (e.g., recency) effects. Rather, it appears that the high-resolution memory for a saccade target persists for only one eye movement. Based on the power law obtained in the first experiment, we can estimate the proportion of working memory resources allocated to each item in the sequence. This analysis reveals that, at the time of a saccade, most visual memory resources are allocated to the target of the next fixation [location task: 56%; orientation task: 61% (34)] rather than to the currently fixated item (location task: 15%; orientation task: 16%).
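One plausible way to turn a measured precision into a resource estimate is to invert the power law, R = (P / P1)^(1/k), where P1 is the precision for a single item. The sketch below does this for made-up inputs. The function name, the dictionary of normalized precisions, and the assumption that precision is normalized to the one-item case are all hypothetical and are not taken from the study's methods.

```python
K = 0.74  # power constant reported in the text


def resource_share(normalized_precision, k=K):
    """Invert P / P1 = R**k to estimate the resource proportion R."""
    if normalized_precision <= 0.0:
        raise ValueError("normalized_precision must be positive")
    return normalized_precision ** (1.0 / k)


# Hypothetical normalized precisions, for illustration only.
hypothetical = {"saccade target": 0.6, "currently fixated item": 0.3}
for label, precision in hypothetical.items():
    print(f"{label}: estimated share = {resource_share(precision):.1%}")
```

Because 1/k is greater than 1, a given ratio of precisions between two items implies a larger ratio of resource shares.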

The current results are inconsistent with the view that visual working memory capacity is limited to a fixed number of objects. Several previous studies have attempted to go beyond the simple fixed item-limit account of visual memory (9, 10, 35). One study (9) has proposed a variable item-limit, based on a fixed “information load,” whereby the more visually complex the items to be remembered, the fewer can be stored. Although related to our limited-resource model, this hypothesis cannot account for the relation between precision and number of items observed in the current study, because the visual complexity of the sample stimuli was held constant. It has been argued (10) that the changes in detection performance observed in this previous study are the result of increasing similarity between sample and probe items, rather than increasing complexity of the sample. Because the precision of visual memory is limited, reducing the size of the change to the stimulus results in poorer performance, in agreement with our model.

Since submission of this article, another study has been published that also examines the precision of visual memory (36). The authors put forward a two-component model, combining a variable-precision memory for fewer items with an absolute upper limit on number of items (above which decreases in performance are accounted for solely by random guesses). Based on this interpretation, their data indicate that the average subject can hold only about two items in working memory [see figure 2 and supplementary figure 3 in (36)]. However, this study did not control eye movements, which we have shown can strongly bias precision in favor of fixation targets. A re-analysis of our own fixation task data in accordance with their mixture-model approach reveals that precision falls with increasing number of items throughout the tested range, including between four and six items (χ² = 5.6, p = 0.018; fig. S3 and supporting online text). We conclude, therefore, that the capacity of visual memory can be explained solely in terms of a limited resource that must be shared out between all items in the scene, with no evidence for an upper limit on the number of items that can be stored, contrary to the hypothesis of a two-component model (36).

The allocation of this limited resource is highly flexible; making an eye movement to an item, or directing covert attention to it, causes a greater proportion of memory resources to be allocated to it, so it is retained with far greater precision than other objects in the scene. All information stored in visual working memory is dynamically updated during an eye movement to take into account the change in gaze position. However, because the resource is limited, the high-resolution representation of a fixated item is substantially degraded as memory resources are reallocated to the target of the next eye movement.

Supporting Online Material

Materials and Methods

Figs. S1 to S3


References and Notes
