Report

Reward Timing in the Primary Visual Cortex

Science  17 Mar 2006:
Vol. 311, Issue 5767, pp. 1606-1609
DOI: 10.1126/science.1123513

Abstract

We discovered that when adult rats experience an association between visual stimuli and subsequent rewards, the responses of a substantial fraction of neurons in the primary visual cortex evolve from those that relate solely to the physical attributes of the stimuli to those that accurately predict the timing of reward. In addition to revealing a remarkable type of response plasticity in adult V1, these data demonstrate that reward-timing activity—a “higher” brain function—can occur very early in sensory-processing paths. These findings challenge the traditional interpretation of activity in the primary visual cortex.

Primary visual cortex (V1) is the most peripheral station in the ascending visual pathway where information from the two eyes is combined, and specific features of visual stimuli, such as orientation and direction of movement, are represented by neural activity (1, 2). It has long been held that, although the quality of sensory experience is used to fine-tune visual response properties during a critical period of early postnatal life, plasticity of visual responses in adults is sharply limited so as to ensure that sensory processing is reliable and reproducible. Only after the initial processing in V1 are subsequent brain regions thought to be engaged to elaborate on the significance of visual input, holding it in working memory (3–8), attributing behavioral and predictive value (9–12), and ultimately engendering appropriate behaviors.

The view of adult primary visual cortex as an immutable feature detector has undergone revision in recent years. It is now understood that deprivation and selective visual experience continue to alter cortical responsiveness in adulthood (13, 14) and that V1 activity can be rapidly modulated in various behavioral contexts (15–18). However, all these changes in activity can still be readily interpreted in the context of visual processing. Our experiments challenge current understanding of what activity in V1 represents.

Adult Long-Evans rats were fitted with head-mounted goggles that delivered full-field retinal illumination for 0.4 s to either the right eye or the left eye (fig. S1a). Action potentials evoked in response to these stimuli were monitored with chronically implanted arrays of microelectrodes, subsequently confirmed by histology to have resided in the deep layers of primary visual cortex (fig. S2). Either left- or right-eye illumination was delivered when the rat neared a water tube. Left-eye stimulation portended delivery of a drop of water after x licks on the water tube (fig. S1b), whereas right-eye stimulation portended delivery of water after twice that number of licks, 2x (where x equaled 6 licks for three rats and 10 licks for two additional rats). Half of all the trials were unrewarded, so as to address whether changes in neural response were a result of reward delivery itself or, alternatively, reflected the formation of neural associations between stimuli and reward expectancy.

Responses of V1 neurons in animals inexperienced with the task related to the physical attributes of the visual stimuli, such as the onset, offset, duration, and the eye of origin [n = 5 animals, 65 neurons (fig. S3)]. However, over the course of three to seven sessions performing the task, a significant proportion of neurons began to express activity in response to one of the two visual cues that was clearly correlated with the reward time associated with that visual cue (Fig. 1, A to C). This poststimulus response relating to expected reward time appeared to occur only to stimulation of one of the two eyes, even in neurons with binocular short-latency visual responses (confirmed quantitatively below).

Fig. 1.

Three forms of reward timing in V1. Three neurons with their peristimulus time histograms and raster plots for each of the four stimulus conditions are presented. Filled squares on raster plots indicate when reward was given on rewarded trials, whereas open squares indicate when reward would have been given on unrewarded trials. Shaded transparent box indicates time of stimulus. Note that reward-timing activity emerges in response to stimulation of only one eye: right eye (A and B); left eye (C).

In experienced animals (after reward-timing activity was first detected), 43% of the recorded neurons (130 out of 300) showed reward timing. Of these, 50% (65 out of 130) showed a sustained increase in response until the reward was expected, 22% (29 out of 130) showed a sustained decrease in response until the reward was expected, and 28% (36 out of 130) showed responses that peaked at reward time (Fig. 1, A to C). The emergence of apparent timing activity was not related to the delivery of the reward per se, because rewarded and unrewarded trials evoked responses that were indistinguishable from each other. Instead, poststimulus activity appeared to be related to reward-time prediction, as it occurred reliably during the unrewarded trials.

We wished to assess at the population level our qualitative observation that neurons with significant poststimulus modulation report reward expectancy. Because poststimulus activity appeared to be triggered in any given neuron by stimulation of only one eye, the initial step in our analysis was to determine quantitatively for each neuron which eye was dominant and which eye was nondominant for poststimulus modulation (19). Applying our algorithm, we found that 60% (78 out of 130) of neurons with reward timing were left eye–dominated and 40% (52 out of 130) were right eye–dominated. By assessing poststimulus eye dominance, we could then test the working hypothesis that neurons dominated by the left or right eye express different reward-time expectancies.

To address this question, we pooled neuronal responses across all animals and recording sessions by normalizing activity to its maximal extent from baseline and by normalizing the time to that which elapsed between events within each session (stimulus offset, mean short reward time, mean long reward time, and trial end). This normalization procedure allowed us to average the activity modulation in the task across all 130 neurons to yield population responses evoked by neurons' dominant (Fig. 2A) and nondominant eye (Fig. 2B). Neural subpopulations dominated by the left and right eyes differed significantly in their poststimulus modulation to dominant-eye stimulation (Fig. 2A), consistent with the interpretation that the different populations relate the different reward times. In the same neurons, analysis of evoked activity to the nondominant eye showed no such difference in time course between the left and right eye subpopulations, consistent with our impression that reward-timing activity was driven only by the dominant eye (Fig. 2B).
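The two normalizations described above can be illustrated with a minimal Python sketch. It is not the authors' procedure, which is detailed in the supporting material (19); it assumes that activity is scaled by its largest deviation from baseline and that time is mapped piecewise-linearly onto the four task events. The function names and the event-index convention (stimulus offset = 0, short reward = 1, long reward = 2, trial end = 3) are hypothetical.

```python
from bisect import bisect_right


def normalize_activity(rates, baseline):
    """Scale baseline-subtracted rates so the largest deviation has magnitude 1.

    Assumption: "maximal extent from baseline" means the largest absolute
    deviation from the baseline firing rate.
    """
    deviations = [r - baseline for r in rates]
    peak = max(abs(d) for d in deviations)
    if peak == 0:
        return [0.0 for _ in deviations]
    return [d / peak for d in deviations]


def normalize_time(t, events):
    """Map time t onto event units: events[i] maps to i, linear in between.

    `events` must be strictly increasing, for example
    [stimulus offset, mean short reward time, mean long reward time, trial end].
    Times outside the event range are clamped to its ends.
    """
    t = min(max(t, events[0]), events[-1])
    # Index of the interval [events[i], events[i + 1]] that contains t.
    i = min(bisect_right(events, t) - 1, len(events) - 2)
    return i + (t - events[i]) / (events[i + 1] - events[i])


# Illustrative values only (not data from the study).
example_events = [0.0, 2.0, 4.0, 8.0]
example_rates = normalize_activity([5.0, 10.0, 15.0, 5.0], baseline=5.0)
example_time = normalize_time(6.0, example_events)
```

Under this convention, responses from sessions with different absolute reward times share a common time axis, so that they can be averaged bin by bin.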

Fig. 2.

Mean responses of neural subpopulations dominated by the left versus right eye. Time in reference to the events of the task is shown on the x axes. Dashed blue and pink vertical lines indicate mean short and long reward times, respectively. Normalized population responses are shown on the y axes. (A) Dominant eye responses for subpopulations dominated by the left (blue) and right eye (pink) for each of the three response classes. Black bar along x axes indicates time in which the responses of the subpopulations dominated by the left and right eye significantly differ (P < 0.05). (B) Mean responses evoked by neurons' nondominant eye do not significantly differ at any poststimulus moment for subpopulations dominated by the left or right eye. (C) Subtracting each neuron's nondominant response (B) from its dominant response (A) yields the differenced responses for each eye-dominated subpopulation, the mean of which is shown in (C) for each of the three response classes. Intervals significantly different from zero (99% confidence interval) are shown as bars below the x axis. (D) Mean response of all reward-timing neurons from experienced animals. Black bar along x axis indicates time in which differenced responses dominated by the left and right eye significantly differ (P < 0.05). (E) Left eye–dominated and right eye–dominated differenced responses from naïve animals do not significantly differ at any poststimulus moment.

If both eyes evoke responses that report the properties of the stimulus, but only one eye evokes poststimulus reward-timing activity, then the activity unique to timing can be revealed by taking the interocular difference of responses to the dominant and nondominant eye for each neuron. This analysis reveals an even stronger relation between neural activity and reward times (Fig. 2C). For left eye–dominated and right eye–dominated neurons classified as “sustained increase” or “sustained decrease,” the moment in which the “differenced” interocular mean responses are no longer distinguishable from zero (<99% confidence level) corresponds well to their respective reward times. Similarly, for “peak” neurons dominated by the left or the right eye, the moment in which the differenced interocular mean responses are maximally different from zero corresponds well to their respective cue-related reward times.
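The interocular differencing can be sketched as follows. This is a minimal illustration and not the authors' analysis code: it assumes that each neuron's responses are already binned on a common normalized time axis and that the 99% confidence interval of the population mean is approximated as mean ± 2.576 × SEM (a normal approximation). The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

Z_99 = 2.576  # two-sided 99% quantile of the normal distribution (assumption)


def differenced(dominant, nondominant):
    """Bin-by-bin difference: dominant-eye minus nondominant-eye response."""
    return [d - n for d, n in zip(dominant, nondominant)]


def significant_bins(diffs_per_neuron):
    """Flag time bins whose population-mean difference is distinguishable from zero.

    `diffs_per_neuron` holds one differenced response per neuron (lists of
    equal length, at least two neurons). A bin is flagged when the
    approximate 99% confidence interval of the mean excludes zero.
    """
    n = len(diffs_per_neuron)
    flags = []
    for bin_values in zip(*diffs_per_neuron):
        m = mean(bin_values)
        half_width = Z_99 * stdev(bin_values) / sqrt(n)
        flags.append(abs(m) > half_width)
    return flags


# Illustrative values only (not data from the study): three neurons, two bins.
example_diffs = [[1.0, 0.1], [1.1, -0.1], [0.9, 0.0]]
example_flags = significant_bins(example_diffs)
```

In this sketch, the first bin at which the flag turns from true to false would correspond to the moment at which a "sustained" differenced response is no longer distinguishable from zero.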

The population data can also be analyzed without dividing cells into response categories, and because response categories do not preexist when animals are relatively inexperienced in the task, this method provides a means of fairly comparing naïve and experienced responses (19). This analysis revealed in experienced animals a statistically significant difference in the time course of poststimulus modulation between left eye–dominated and right eye–dominated neurons that closely matched the difference in short and long reward times, respectively (Fig. 2D). Using the same analysis, neurons recorded from animals in the naïve state (before exhibiting reward timing) revealed no such difference (Fig. 2E). Therefore, after animals gained experience in the task, two functional groups of neurons emerged: one group that signals expectancy to the short reward time evoked by stimulation of the left eye, and the other group that signals expectancy to the long reward time, evoked by stimulation of the right eye.

How accurately do individual neural responses relate the visual cues to their appropriately associated reward time? To quantify this, we credited a moment of poststimulus time as the neuron's report of reward expectancy (fig. S4), which we designated the “neural reward moment” (NRM) (19). The NRMs for left eye– and right eye–dominated neurons across the entire population were then compared with the actual short and long reward times of the recording sessions, respectively (Fig. 3). Across recordings in experienced animals, the time (mean ± SEM) to the short reward was 1191 ± 35 ms; the mean left eye–dominated NRM was 1278 ± 42 ms. The time to the long reward was 1814 ± 84 ms; the mean right eye–dominated NRM was 1883 ± 116 ms. Therefore, on average, individual neurons predict reward time quite accurately.
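The size of the discrepancy between the mean NRMs and the mean reward times follows from simple arithmetic on the values reported above, sketched here in Python. The dictionary layout and the function name are illustrative assumptions, and only the reported means are used (the SEMs are not propagated).

```python
# Mean values in milliseconds, as reported in the text for experienced animals.
reported = {
    "short (left eye)": {"reward_ms": 1191, "nrm_ms": 1278},
    "long (right eye)": {"reward_ms": 1814, "nrm_ms": 1883},
}


def timing_error(reward_ms, nrm_ms):
    """Return (absolute error in ms, error as a percentage of the reward time)."""
    error = nrm_ms - reward_ms
    return error, round(100 * error / reward_ms, 1)


errors = {cue: timing_error(**values) for cue, values in reported.items()}
```

On these means, the NRM lags the reward time by 87 ms (about 7%) for the short cue and by 69 ms (about 4%) for the long cue.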

Fig. 3.

Cue-evoked neural timing of short and long reward expectancy. Cumulative histograms of neural reward moments for neural subpopulations dominated by the left (leftmost curve) or right (rightmost curve) eye are shown and differ significantly (Kolmogorov-Smirnov test, P < 0.001). The proportion of time between the short and long reward times is shown on the x axis.

The experience of pairing visual stimuli with delayed reward clearly alters the responses of V1 neurons to these visual cues while animals are performing the task. We next asked whether reward-timing activity would continue to be evoked by the same visual cues when the animals were not performing the task. After “within-task” recording sessions, access to the nose-poke/lick tube was obstructed, and the left and right eyes were stimulated pseudorandomly on a fixed 6-s interval until 180 presentations were reached for each stimulus, constituting “outside-task” sessions. By recording from the same neurons on a given day, we found that, of neurons expressing reward-timing activity within the task (47 out of 93; 51%), 66% (31 out of 47) continued to express apparent reward-timing activity to the visual stimuli when presented outside of the task (Fig. 4A). For these neurons, the accuracy with which they continued to “predict” the short and long reward times could be compared with their performance inside the task (Fig. 4B). Although neural timing of reward outside the task was degraded, left eye– and right eye–dominated neurons continued to have mean NRMs that were significantly different from each other (P < 0.05), relating to the appropriate reward times. This result indicates that pairing visual cues to delayed rewards within the task creates a lasting alteration in the manner in which the visual cortex responds to those cues when observed in other contexts. We hasten to add, however, that, although our data show that V1 responses evolve to accurately predict reward timing, further study is required to assess whether and how such information is used by the animal to guide behavior.

Fig. 4.

Comparison of cue-evoked, neural timing of short and long reward expectancy within, versus outside of, the task. (A) Example left eye–dominated neuron with sustained short-cue reward timing recorded within and outside of the task on the same day. Conventions are the same as in Fig. 3, but with vertical lines on outside-task histograms indicating within-task reward times, for comparison. Shading indicates stimulus time. (B) NRMs (means ± SEM) within, versus outside of, the task for neural subpopulations dominated by the left or right eye. Barred asterisks indicate significance between mean short- and long-cued NRMs (t test; P < 0.05).

Such timing activity has been reported previously in higher cortical areas (20–22) and in associated subcortical structures (23–25), but never before in primary sensory cortex. The current findings imply that V1 neurons, at least in rats, do not function as simple feature detectors (26). Because reward-timing activity can persist long after the visual stimulus has disappeared, it no longer faithfully reports retinal illumination, but rather what retinal illumination portends. As reward timing is shown to be eye-specific, activating different subpopulations of neurons, general brain arousal/attention cannot explain this activity. Further, because these altered responses persist outside the task, emergent reward-timing activity can be independent of both context and behavior.

The mechanism for this remarkable plasticity in V1 remains to be determined. Subthreshold responses to stimulation of visual cortex, likely reflecting weak recurrent connections, can persist for seconds (27). Our findings could be explained if a modulatory input that signifies delivery of reward (possibly dopamine) causes a persistent potentiation or unmasking of recently active connections.

Supporting Online Material

www.sciencemag.org/cgi/content/full/311/5767/1606/DC1

Materials and Methods

SOM Text

Figs. S1 to S4

References and Notes
