Report

Sparse Coding and Decorrelation in Primary Visual Cortex During Natural Vision

Science  18 Feb 2000:
Vol. 287, Issue 5456, pp. 1273-1276
DOI: 10.1126/science.287.5456.1273

Abstract

Theoretical studies suggest that primary visual cortex (area V1) uses a sparse code to efficiently represent natural scenes. This issue was investigated by recording from V1 neurons in awake behaving macaques during both free viewing of natural scenes and conditions simulating natural vision. Stimulation of the nonclassical receptive field increases the selectivity and sparseness of individual V1 neurons, increases the sparseness of the population response distribution, and strongly decorrelates the responses of neuron pairs. These effects are due to both excitatory and suppressive modulation of the classical receptive field by the nonclassical receptive field and do not depend critically on the spatiotemporal structure of the stimuli. During natural vision, the classical and nonclassical receptive fields function together to form a sparse representation of the visual world. This sparse code may be computationally efficient for both early vision and higher visual processing.

Although area V1 has been studied for over 40 years, little is known about how V1 encodes complex natural scenes. Theoretical studies suggest that natural scenes can be efficiently represented by a sparse code based on filters that resemble neurons found in area V1 (1, 2). Sparse codes lie along a continuum ranging from dense codes, where neurons respond to most stimuli, to local codes, where neurons give extremely selective responses (3). Both of these extremes are inefficient in several important respects. Dense codes are highly redundant and each neural response carries little information, whereas local codes require an implausibly large number of neurons and are computationally intractable. In contrast, neurons that are tuned to match the sparsely distributed, informative components of the natural world can produce sparse codes. Sparse codes transmit information with minimal redundancy and relatively few spikes. Consequently, they are both informationally and metabolically more efficient than dense codes (4). There have been a few studies of sparse coding in inferior temporal visual areas (5). We have addressed this issue in area V1.

Recent theoretical studies suggest that nonlinear interactions between neurons may increase coding sparseness in area V1 (2, 6). These interactions are predominantly reflected in modulation of classical receptive field (CRF) responses by the surrounding nonclassical receptive field (nCRF) (7). Previous experiments have demonstrated that nCRF stimulation strongly modulates responses during free viewing of natural scenes (8). This report demonstrates that V1 employs a sparse code to represent natural scenes and shows that the nCRF plays a crucial role in this process.

We have addressed this issue by using controlled stimuli that simulate natural vision. The stimuli were sequences of images simulating the spatial and temporal patterns occurring in and around the CRF when an animal freely views a static natural scene (see Fig. 1A). Eye scan paths were generated with a statistical model of eye movements made during free viewing (9). Image patches were extracted from a natural scene along the simulated scan path and converted to gray scale (10). Each natural vision movie was composed of a series of simulated fixations separated by brief simulated saccadic transitions.
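The movie construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: fixation positions and durations are taken as inputs (the statistical eye-movement model of (9) is not reproduced), each simulated saccade is rendered as a straight-line interpolation lasting a fixed, hypothetical number of frames, and the function name `scan_path_to_frames` is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def scan_path_to_frames(fixations: List[Point], fix_frames: List[int],
                        saccade_frames: int = 2) -> List[Point]:
    """Expand a fixation sequence into one patch center per movie frame.

    Hypothetical sketch: fixation positions and durations (in frames) are
    given; each saccade between fixations is drawn as a linear interpolation
    lasting `saccade_frames` frames (an assumed value).
    """
    if len(fixations) != len(fix_frames):
        raise ValueError("need one duration per fixation")
    centers: List[Point] = []
    for k, ((x, y), dur) in enumerate(zip(fixations, fix_frames)):
        centers.extend([(x, y)] * dur)  # patch center is static during a fixation
        if k + 1 < len(fixations):
            nx, ny = fixations[k + 1]
            for s in range(1, saccade_frames + 1):
                t = s / (saccade_frames + 1)  # fraction of the way to the next fixation
                centers.append((x + t * (nx - x), y + t * (ny - y)))
    return centers
```

Each returned center would then be used to cut one gray-scale patch from the source scene for the corresponding movie frame (13.8 ms per frame in these experiments).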

Figure 1

Natural vision movie and representative responses. (A) Example of a natural scene used as the source image for natural vision movies. White line represents simulated visual scan path. Image patches centered on the scan path were extracted to form the movie. Small white circle gives the CRF size; larger circle is four times the CRF diameter. (B) Raster plot of action potentials during 20 presentations of a movie confined to the CRF. The number of action potentials during each 13.8-ms movie frame is indicated by intensity. Solid line is the peri-stimulus time histogram (PSTH). The sparseness of these data is 16%, which implies a dense distribution of responses across the stimulus set. (C) Raster plot of action potentials during 20 presentations of a movie with a stimulus size four times the CRF diameter. Dark line again gives the PSTH. Stimulation of the nCRF increases sparseness to 53%.

In the experiments described here, we manipulated the size of the extracted image patches. Patch size varied from one to four times the diameter of the CRF. To reduce potential boundary artifacts, the outer 10% of each image patch was blended smoothly into the neutral gray background. Data reported here are from 61 well-isolated neurons recorded in area V1 of two awake behaving primates (11).
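One way to implement the edge blending is sketched below. The text specifies only that the outer 10% of each patch was blended smoothly into neutral gray; the circular aperture, the reading of 10% as a fraction of the radius, the raised-cosine ramp, and the name `blend_patch` are all assumptions made for illustration.

```python
import math
from typing import List


def blend_patch(patch: List[List[float]], gray: float = 0.5,
                edge_fraction: float = 0.1) -> List[List[float]]:
    """Blend the rim of a square gray-scale patch into a uniform gray background.

    Assumed geometry: the visible aperture is the disk inscribed in the square,
    and the outer `edge_fraction` of its radius is ramped with a raised cosine
    from weight 1 (image) to weight 0 (background gray).
    """
    size = len(patch)
    c = (size - 1) / 2.0  # patch center in pixel coordinates
    radius = size / 2.0
    inner = radius * (1.0 - edge_fraction)
    out = []
    for i, row in enumerate(patch):
        new_row = []
        for j, v in enumerate(row):
            d = math.hypot(i - c, j - c)
            if d <= inner:
                w = 1.0  # interior: keep the image unchanged
            elif d >= radius:
                w = 0.0  # outside the aperture: pure background
            else:        # smooth ramp across the outer rim
                w = 0.5 * (1.0 + math.cos(math.pi * (d - inner) / (radius - inner)))
            new_row.append(w * v + (1.0 - w) * gray)
        out.append(new_row)
    return out
```

A smooth ramp avoids introducing a sharp artificial edge at the patch boundary, which is the kind of boundary artifact the blending was meant to reduce.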

The sparseness of V1 responses increases dramatically with larger natural image patches that encompass both the CRF and the nCRF. This effect is illustrated in Fig. 1, which compares responses obtained with stimuli confined to the CRF (Fig. 1B) with those obtained with stimuli four times the diameter of the CRF (Fig. 1C). To quantify sparseness we used a nonparametric statistic (12):

$$S = \frac{1 - \left[\left(\sum_{i=1}^{n} r_i / n\right)^2 \Big/ \sum_{i=1}^{n} \left(r_i^2 / n\right)\right]}{1 - 1/n}$$

where $r_i$ is the response to the $i$th frame of a movie (averaged across trials) and $n$ is the number of movie frames. Values of S near 0% indicate a dense code, and values near 100% indicate a sparse code.
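A direct transcription of the sparseness statistic S, assuming responses are supplied as trial-averaged spike counts per frame. The function name `sparseness` is hypothetical, and the value is returned as a fraction rather than a percentage.

```python
from typing import Sequence


def sparseness(r: Sequence[float]) -> float:
    """Sparseness statistic S; r[i] is the trial-averaged response to frame i."""
    n = len(r)
    if n < 2:
        raise ValueError("need at least two frames")
    mean = sum(r) / n                    # (sum r_i / n)
    mean_sq = sum(x * x for x in r) / n  # sum (r_i^2 / n)
    if mean_sq == 0:
        raise ValueError("S is undefined when every response is zero")
    return (1.0 - mean * mean / mean_sq) / (1.0 - 1.0 / n)
```

The two limiting cases match the definition: equal responses on every frame give S = 0, and a response confined to a single frame gives S = 1 (100%). The denominator 1 - 1/n is what rescales the single-frame case to exactly 1 regardless of movie length.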

Distributions of S across the sample of neurons are shown in Fig. 2 for each stimulus size. As stimulus size increases, sparseness increases systematically (P < 0.01) (13). The sparseness statistic saturates when stimuli are three to four times the size of the CRF, consistent with the spatial extent of V1 nCRF modulation reported in other studies (7). The high sparseness values produced by large stimuli suggest that area V1 uses a sparse code during natural vision, when stimuli span the entire visual field.

Figure 2

Stimulation of the nCRF increases sparseness in single neurons. Effects of stimulus size on distribution of the sparseness statistic across the sample of cells. Expressed as a percentage, S is 0% when a neuron responds equally to all frames of a movie and 100% when a neuron responds to only a single frame. An increase in S indicates an increase in the sparseness of neural coding across the stimulus ensemble. Mean sparseness values are 41%, 52%, 61%, and 62% for stimuli one, two, three, and four times the CRF diameter, respectively. To quantify sparseness changes in single neurons we computed the ratio of the observed shift in S to the maximum possible shift as a function of nCRF stimulation: $S_{\text{shift}} = (S_{\text{nCRF}} - S_{\text{CRF}})/(1 - S_{\text{CRF}})$. Average $S_{\text{shift}}$ values are 18%, 32%, and 36% for stimuli two, three, and four times the CRF diameter, respectively. Neurons with statistically significant (P < 0.01) shifts are black and are stacked on top of those with insignificant shifts.
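The normalized shift used in Fig. 2 can be written as a small helper (the name `sparseness_shift` is hypothetical). Dividing by 1 - S_CRF expresses each change as a fraction of the headroom remaining above the CRF-only value, so cells that start out dense and cells that start out sparse can be compared on one scale.

```python
def sparseness_shift(s_crf: float, s_ncrf: float) -> float:
    """Observed change in S as a fraction of the maximum possible change.

    Both arguments are S values expressed as fractions, with s_crf in [0, 1).
    """
    if not 0.0 <= s_crf < 1.0:
        raise ValueError("s_crf must lie in [0, 1)")
    return (s_ncrf - s_crf) / (1.0 - s_crf)
```

For the example cell in Fig. 1 (S = 16% for the CRF-sized movie and 53% for the movie four times larger), this gives (0.53 - 0.16)/(1 - 0.16), or about 0.44.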

The simulated saccades in our natural vision movies often produce large transient responses followed by rapid adaptation during the course of the fixation. To assess the contribution of this fine temporal structure to sparseness, we recomputed the sparseness statistic after averaging all responses within each fixation. Absolute sparseness values are significantly lower in the fixation-based analysis (P < 0.05), but sparseness still increases with increasing nCRF stimulation (14). Thus, transient responses and adaptation contribute to sparseness but do not account for all of the observed nCRF effects.
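The fixation-based reanalysis amounts to replacing per-frame responses with per-fixation means before computing S. The sketch below assumes each frame carries a fixation label and that saccadic frames have already been assigned to a neighboring fixation (the text does not say how they were handled); `fixation_averages` is a hypothetical helper, and `sparseness` repeats the formula given in the text.

```python
from itertools import groupby
from typing import Hashable, List, Sequence


def sparseness(r: Sequence[float]) -> float:
    """Sparseness statistic S from the text (returned as a fraction)."""
    n = len(r)
    mean = sum(r) / n
    mean_sq = sum(x * x for x in r) / n
    return (1.0 - mean * mean / mean_sq) / (1.0 - 1.0 / n)


def fixation_averages(r: Sequence[float],
                      fixation_id: Sequence[Hashable]) -> List[float]:
    """Mean response within each run of consecutive frames sharing a fixation label."""
    out = []
    start = 0
    for _, run in groupby(fixation_id):
        k = sum(1 for _ in run)  # number of frames in this fixation
        out.append(sum(r[start:start + k]) / k)
        start += k
    return out
```

In a toy case where every fixation evokes the same onset transient followed by a flat adapted response, S is well above zero across frames but exactly zero across fixations. This is why the fixation-based analysis isolates the part of the sparsening that does not depend on fine temporal structure.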

We reanalyzed a subset of cells to determine whether these sparsening effects were due to nCRF suppression, excitation, or both (n = 36 cells; stimuli four times the CRF diameter). Twenty-nine percent of all the frames in this sample are significantly modulated (P < 0.05), and the ratio of suppression to excitation is about 4.5 to 1. Excitation is often concentrated in the onset transients that occur after simulated saccades, whereas suppression reduces responses across an entire fixation. Thus, natural nCRF stimulation appears to increase sparseness by both enhancing and suppressing specific epochs of the response.
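The frame-by-frame classification can be sketched as below. The text does not say which test was applied to each frame, so this sketch substitutes a two-sided permutation test on the difference in mean spike count across trials; the names `permutation_p` and `classify_frames`, the number of permutations, and the fixed random seed are all assumptions.

```python
import random
from typing import List, Optional, Sequence


def permutation_p(a: Sequence[float], b: Sequence[float], n_perm: int = 2000,
                  rng: Optional[random.Random] = None) -> float:
    """Two-sided permutation p-value for a difference in mean spike count."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign trials to the two conditions
        pa, pb = pooled[:len(a)], pooled[len(a):]
        # small tolerance so ties are not lost to floating-point summation order
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def classify_frames(crf_trials: Sequence[Sequence[float]],
                    ncrf_trials: Sequence[Sequence[float]],
                    alpha: float = 0.05) -> List[str]:
    """Label each movie frame as 'excited', 'suppressed', or 'unmodulated'.

    crf_trials[f] and ncrf_trials[f] hold per-trial spike counts on frame f
    for the CRF-sized movie and the larger (CRF + nCRF) movie.
    """
    labels = []
    for a, b in zip(crf_trials, ncrf_trials):
        if permutation_p(a, b) >= alpha:
            labels.append("unmodulated")
        elif sum(b) / len(b) > sum(a) / len(a):
            labels.append("excited")     # nCRF stimulation raised the response
        else:
            labels.append("suppressed")  # nCRF stimulation lowered the response
    return labels
```

Given labels for every frame, `labels.count("suppressed") / labels.count("excited")` corresponds to the suppression-to-excitation ratio reported in the text.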

It is unlikely that these results are an artifact of incorrect CRF definition (15). We defined the CRF as the circular region circumscribing all locations where stimuli evoked action potentials, a definition that errs toward overestimating rather than underestimating CRF size. Overestimation of CRF size would cause inadvertent nCRF stimulation by movies confined to the nominal CRF, thereby increasing estimates of CRF sparseness and decreasing the apparent sparsening effects of nCRF stimulation.

We also performed a control experiment to ensure that our sparseness estimates did not depend on the position of the patch boundary, which necessarily varied with patch size. The control stimulus consisted of a natural vision movie four times the CRF diameter on which a sharp, white ring was superimposed along the exterior boundary of the defined CRF. The ring provided a strong artificial edge to enhance the magnitude of any potential edge effects. Ring trials were randomly interleaved with non-ring trials. The addition of the CRF-diameter ring increases sparseness by an average of 8% (n = 12 neurons) relative to that observed without the ring. Thus, sparseness estimates for CRF-diameter stimuli may be inflated slightly because of the presence of the sharp border, which suggests that our estimates of the sparsening effects of nCRF stimulation probably underestimate the true size of this effect.

The data presented above were acquired with controlled stimuli that simulate natural vision. During natural free viewing, V1 activity reflects both visual stimulation and modulation by extraretinal factors such as eye movements and attention (16). We examined how these extraretinal factors affect sparseness by comparing responses obtained during free viewing of natural scenes (17) to responses obtained with natural vision movies that re-created the visual stimulation occurring in the CRF and the surround during the same free-viewing episodes (18). Both free-viewing and natural vision movie data were acquired in 11 V1 neurons (17 separate free-viewing episodes). Sparseness values obtained during free viewing and with natural vision movies are highly correlated (r = 0.91). However, the slope of the regression line is 1.2, which suggests that free viewing produces a slightly more sparse response than do natural vision movies simulating free viewing. Given that the movies may not fully stimulate the nCRF of some cells, this small difference is expected, but we cannot rule out the possibility of weak extraretinal effects.
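The free-viewing comparison reduces to a correlation and a regression slope over paired S values, one pair per free-viewing episode. The sketch assumes ordinary least squares with free-viewing S on the vertical axis, which matches the reading that a slope above 1 means free viewing is sparser; the regression method actually used is not stated in the text, and `compare_sparseness` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple


def compare_sparseness(movie_s: Sequence[float],
                       free_s: Sequence[float]) -> Tuple[float, float]:
    """Return (Pearson r, OLS slope) for free-viewing S regressed on movie S."""
    n = len(movie_s)
    mx = sum(movie_s) / n
    my = sum(free_s) / n
    sxx = sum((x - mx) ** 2 for x in movie_s)
    syy = sum((y - my) ** 2 for y in free_s)
    sxy = sum((x - mx) * (y - my) for x, y in zip(movie_s, free_s))
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    slope = sxy / sxx               # least-squares slope of free_s on movie_s
    return r, slope
```

A high r with a slope near 1 indicates that the movies reproduce most of the sparseness seen during free viewing; a slope modestly above 1 leaves room for incomplete nCRF stimulation or weak extraretinal effects.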

As a final control, we examined sparseness values obtained with dynamic grating sequences (n = 22 neurons) (19) to see if sparsening is specific to natural stimuli. To compare response sparseness for random grating sequences and natural vision movies, we computed S for both stimulus types and for stimuli one and two times the size of the CRF. The sparseness values obtained with gratings and natural vision movies are not significantly different from each other, which suggests that sparseness might be induced by oriented energy present in both natural stimuli and grating sequences.

The sparse coding hypothesis also predicts that responses will be sparse when examined across the population of neurons in V1. To investigate this, we evaluated the kurtosis of the response distribution (RD) obtained with each stimulus size. The RD is the histogram of responses (i.e., action potentials per movie frame) pooled over all cells and all stimuli; it is an estimate of the population response of V1 to an ensemble of natural images. Kurtosis is the fourth moment of this distribution about its mean value, normalized by the square of the variance. As the RD becomes more sparse the proportion of moderate responses decreases and the proportion of both small and large responses increases; this is reflected by an increase in RD kurtosis. For this reason, theorists have used kurtosis as an index of sparseness (2, 20, 21).
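A minimal sketch of the population measure: pool per-frame spike counts from all cells, then compute the fourth central moment divided by the squared variance. This version returns ordinary (non-excess) kurtosis, for which a Gaussian gives 3; the text does not state whether an excess convention (subtracting 3) was used. The names `pooled_rd` and `kurtosis` are hypothetical.

```python
from typing import Iterable, List, Sequence


def pooled_rd(responses_per_cell: Iterable[Sequence[float]]) -> List[float]:
    """Pool per-frame spike counts from every cell into one response distribution."""
    return [x for cell in responses_per_cell for x in cell]


def kurtosis(x: Sequence[float]) -> float:
    """Fourth central moment divided by the squared variance (Gaussian = 3)."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n  # variance
    m4 = sum((v - m) ** 4 for v in x) / n  # fourth central moment
    return m4 / (m2 * m2)
```

A distribution with many near-zero responses and occasional large ones (for example nine zeros and one 10) has kurtosis above 8, whereas evenly spread responses such as 1 through 5 give only 1.7.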

RD kurtosis is 4.1 when stimuli are confined to the CRF, consistent with theoretical studies suggesting that the CRF of V1 neurons produces a moderately sparse code (1). However, when stimuli are two, three, or four times the CRF diameter, kurtosis values increase significantly to 5.2, 8.7, and 10.2, respectively (P ≤ 0.001). This result further confirms that nonlinear nCRF interactions increase sparseness and demonstrates that this effect occurs across the population of cells in V1.

We also tested whether nCRF stimulation increased the independence of responses across the population of V1 neurons. We accomplished this by examining the similarity of responses between randomly selected pairs of neurons presented with nearly identical stimuli during different recording sessions. The distribution of these pairwise similarities estimates the distribution of response correlations across the entire population of cells in V1. If neurons carry independent information, then randomly selected pairs will be weakly correlated, whereas if they carry redundant information their responses will be strongly correlated.

We selected neuron pairs stimulated with natural vision movies created from the same eye scan path and natural scene (patch sizes varied slightly because of differences in CRF size). For this analysis the trial-averaged responses to the successive movie frames were treated as a vector in a high-dimensional space, with one dimension per frame. We quantified response similarity by computing the angle between the response vectors of each neuron pair (22). Using this metric, cells with similar tuning properties have small separation angles and those with different tuning properties have large separation angles.
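The separation angle between two response vectors follows from their normalized dot product. This sketch computes the raw angle only; the correction that yields the "upper limits" plotted in Fig. 3 (22) is not described in the text and is not reproduced, and `separation_angle_deg` is a hypothetical name.

```python
import math
from typing import Sequence


def separation_angle_deg(a: Sequence[float], b: Sequence[float]) -> float:
    """Angle in degrees between two per-frame response vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must cover the same movie frames")
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        raise ValueError("angle is undefined for a zero response vector")
    cos = max(-1.0, min(1.0, dot / (na * nb)))  # clamp rounding error before acos
    return math.degrees(math.acos(cos))
```

Because firing rates are nonnegative, the angle between two response vectors lies between 0 degrees (identical response profiles up to a scale factor) and 90 degrees (responses on non-overlapping frames).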

Figure 3 shows the distribution of separation angles between neuron pairs recorded with natural vision movies confined to the CRF (Fig. 3, Upper) and four times larger than the CRF (Fig. 3, Lower). Stimulation of the nCRF significantly increases the separation angle between cells (P ≤ 0.001) (23). This is direct evidence that nCRF stimulation decorrelates responses between pairs of V1 neurons and it suggests that one consequence of increasing sparseness is increased independence of the responses across cells.

Figure 3

Stimulation of the nCRF decorrelates responses across the population of neurons in area V1. (Upper) Distribution of upper limits of the separation angles between pairs of neurons tested with similar natural vision movies confined to the CRF. Larger separation angles indicate less similar responses between randomly selected V1 neurons recorded in separate sessions [see text and (22) for details]. The mean separation angle is 51°, indicating substantial response similarity. (Lower) Distribution of upper limits of the separation angles between neuron pairs obtained with natural vision movies four times the CRF diameter, plotted as in (Upper). The mean of this distribution is 67°, which is significantly larger than the mean of the distribution obtained from CRF stimulation alone (P ≤ 0.001). This increase in separation angle reflects decorrelation across the population of V1 responses.

In a final experiment we investigated the nCRF mechanisms that might be responsible for sparsening and decorrelation. We accomplished this by mapping the spatial domains of the nCRF via reverse correlation. The stimulus was a dynamic, compound grating sequence consisting of a CRF conditioning grating and an nCRF probe extending to two times the CRF size [see Fig. 4A and (24)]. The strength, sign, and spatial distribution of nCRF domains vary widely across cells (n = 19 neurons) (see Fig. 4, B to D). Many cells have irregular nCRF domains (Fig. 4, B and C), although some have a fairly uniform structure (Fig. 4D). These patterns are similar to those reported recently for area 17 of the anesthetized cat (25). The diversity of the nCRF structure may be responsible for decorrelating the responses of V1 neurons during natural vision.
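The reverse-correlation step can be sketched as averaging the response separately for each probe position and subtracting the response to the conditioning grating alone. Under these assumptions negative values mark suppressive domains and positive values mark facilitatory ones. The position labels, the response latency `lag`, and the name `ncrf_domain_map` are hypothetical; the full analysis in (24) also varied probe orientation, spatial frequency, and phase, which this sketch ignores.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def ncrf_domain_map(probe_positions: Sequence[Hashable],
                    responses: Sequence[float],
                    baseline: float,
                    lag: int = 0) -> Dict[Hashable, float]:
    """Mean response conditioned on nCRF probe position, relative to baseline.

    probe_positions[t]: label of the nCRF location probed on frame t.
    responses[t]: spike count on frame t.
    baseline: mean response to the conditioning grating alone.
    lag: assumed response latency in frames.
    """
    sums: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for pos, r in zip(probe_positions, list(responses)[lag:]):
        sums[pos] += r  # pair each probe with the response `lag` frames later
        counts[pos] += 1
    return {pos: sums[pos] / counts[pos] - baseline for pos in sums}
```

With labels such as "side" and "end" defined relative to the preferred orientation, a map in which "side" is more negative than "end" corresponds to the anisotropic side suppression described for Fig. 4B.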

Figure 4

Identification of nCRF subdomains by reverse correlation. (A) Example of a single frame from the compound grating sequence. An optimal conditioning grating filled the CRF while a probe grating appeared in the annulus between the CRF and twice the CRF diameter. Probe position, orientation, spatial frequency, and phase varied randomly across frames (24). Spatial modulatory nCRF domains were identified by reverse correlation with respect to the position of the probe within the nCRF. (B) Anisotropic inhibitory nCRF modulation. Solid curves indicate mean response as a function of probe position. Dashed lines give mean responses to the conditioning grating alone. Plots are rotated so the preferred orientation of the CRF lies along the vertical axis. This cell is suppressed by nCRF stimulation; side suppression is larger than end suppression. (C) A cell that is enhanced by nCRF stimulation; side enhancement is larger than end enhancement. (D) A cell that is weakly and uniformly suppressed by nCRF stimulation. For all three neurons shown here, nCRF stimulation with natural vision movies significantly increases sparseness (P < 0.01).

Our experiments provide direct experimental evidence that V1 uses a sparse code matched to the underlying sparse structure of natural scenes. During natural vision, CRF and nCRF mechanisms function together as a single computational unit. Although CRF responses during natural vision are already moderately sparse, nCRF stimulation elicits nonlinear interactions (2, 6) that dramatically increase sparseness and decorrelate responses between neurons. Consequently, the responses of individual neurons appear to carry more nearly independent information. Between the retina and lateral geniculate nucleus, the visual system encodes information to optimize information transmission given the limited bandwidth of the optic nerve (26). V1 then recodes this information into a sparse representation. One interesting possibility is that these cells represent the independent components of natural scenes (20, 27). This would facilitate the development of associations between visual stimuli in higher visual areas and increase the efficiency of pattern recognition (1).

Sparse coding provides a unifying framework for understanding the diverse functions claimed for the nCRF, such as contrast gain control; the potential representation of extended contours, junctions, or corners; and figure-ground segmentation (28). Our studies demonstrate how experiments with natural images can complement those with conventional stimuli. When used carefully, natural stimuli allow us to test our current understanding of sensory systems and to interpret known effects in terms of their natural function.

* To whom correspondence should be addressed. E-mail: gallant{at}socrates.berkeley.edu

REFERENCES AND NOTES
