Report

Fictive Reward Signals in the Anterior Cingulate Cortex

Science  15 May 2009:
Vol. 324, Issue 5929, pp. 948-950
DOI: 10.1126/science.1168488

Abstract

The neural mechanisms supporting the ability to recognize and respond to fictive outcomes, outcomes of actions that one has not taken, remain obscure. We hypothesized that neurons in the anterior cingulate cortex (ACC), which monitors the consequences of actions and mediates subsequent changes in behavior, would respond to fictive reward information. We recorded responses of single neurons during performance of a choice task that provided information about the reward values of options that were not chosen. We found that ACC neurons signal fictive reward information and use a coding scheme similar to that used to signal experienced outcomes. Thus, individual ACC neurons process both experienced and fictive rewards.

People routinely recognize and respond to fictive outcomes, which are rewards or punishments that have been observed but not directly experienced (1–3). Fictive thinking affects human economic decisions (4) and is disrupted in disorders such as anxiety and impulsivity (5). Moreover, monkeys respond to information about rewards that they have not directly experienced (6) or that were received by other monkeys (7). To understand the neural mechanisms that mediate these processes, we investigated how fictive reward information is encoded in the anterior cingulate cortex (ACC), part of a neural circuit that mediates outcome-contingent changes in behavior (8–10) and processes fictive information in humans (11). The ACC is interconnected with the orbitofrontal cortex, which mediates fictive thinking in humans (5, 12).

In our task, monkeys chose from among eight white targets arranged in a circle (13). Seven low-value (LV) targets provided small rewards (100 μL), whereas the eighth target [high-value (HV)] provided a variable reward with a larger expected value (EV). Its value on each trial was selected randomly from six possibilities (0, 200, 267, 300, 333, and 367 μL). Once the monkey selected a target, the values associated with all eight targets, represented by their colors, were revealed (Fig. 1, A and B). After a half-second delay, the monkey received the reward associated with the chosen target. On the next trial, the HV target either remained in the same position (60% probability) or moved one position clockwise (40% probability).
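The inter-trial movement of the HV target is a simple two-state transition rule, which can be sketched as a small simulation (the function and parameter names below are our own illustration, not code from the study):

```python
import random

def simulate_hv_positions(n_trials, stay_prob=0.6, n_targets=8, seed=0):
    """Simulate the HV target's position across trials: between trials it
    stays put with probability stay_prob (60% in the task) or moves one
    position clockwise otherwise."""
    rng = random.Random(seed)
    pos = 0
    positions = [pos]
    for _ in range(n_trials - 1):
        if rng.random() >= stay_prob:
            pos = (pos + 1) % n_targets  # one square clockwise
        positions.append(pos)
    return positions

positions = simulate_hv_positions(100_000)
stays = sum(a == b for a, b in zip(positions, positions[1:]))
print(stays / (len(positions) - 1))  # empirical stay rate, close to 0.6
```

Because the target never moves counterclockwise, only the previous HV location (60%) and its clockwise neighbor (40%) can hold the large reward on the next trial.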

Fig. 1

Task and recording location. (A) Schematic of standard task. Fixation point and eight white squares appear; 500 ms after fixation, a monkey chooses one target, and all targets change color, revealing their value. A half-second later, a reward is given. (B) Between trials, the HV target either remains at the same position (60% chance) or moves to an adjacent position (40% chance). (C) Magnetic resonance image of monkey E. Recordings were made in the ACC sulcus.

We analyzed only those trials in which monkeys maintained fixation (90.6% of trials). Because the HV target had a greater EV than the LV targets (243 μL versus 100 μL), we expected that monkeys would prefer the HV target. Indeed, in a control task that explicitly cued HV location, monkeys chose it on 93.4% of trials. In the standard task, monkeys chose the HV target (45.6% of trials) more often than chance (P < 0.005, binomial test, Fig. 2A). Monkeys earned 165.0 μL per trial, 88.5% of the amount earned by an omniscient observer with access to information about the value of all targets on all preceding trials (13). Monkeys chose targets adjacent to potential HV targets more often (37.7% of trials) than more distal targets (16.7% of trials, P < 0.005, binomial test, Fig. 2A), suggesting that they understood the probabilistic relation between the HV target on the current trial and its likely location on the next.
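The payoffs of simple strategies under the reported task parameters can be checked with a few lines of arithmetic (HV expected value 243 μL, LV reward 100 μL, and the 60/40 stay/move rule; this is our own back-of-envelope sketch, not an analysis from the paper):

```python
# Reported task parameters: HV expected value and LV reward, in microliters.
HV_EV, LV = 243.0, 100.0

def expected_payoff(p_hit):
    """Expected reward per trial when the chosen target is the HV target
    with probability p_hit (the other seven targets all pay the LV amount)."""
    return p_hit * HV_EV + (1.0 - p_hit) * LV

same_spot = expected_payoff(0.6)       # always choose last trial's HV location
clockwise = expected_payoff(0.4)       # always choose the clockwise neighbor
random_pick = expected_payoff(1 / 8)   # choose among the 8 targets at random
print(same_spot, clockwise, random_pick)  # roughly 185.8, 157.2, 117.875
```

Always returning to the previous HV location yields ≈186 μL per trial, consistent with the benchmark implied by the reported figures (165.0 μL is 88.5% of ≈186 μL), and the monkeys' earnings sit between that strategy and cruder ones.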

Fig. 2

Fictive outcomes influence behavior. (A) Histogram of distance between monkeys’ choices and optimal target, measured in squares clockwise (c.w.). The dashed line indicates chance performance. (B) Likelihood of choosing optimally increases as a function of both fictive and experienced reward outcome on the previous trial. Black, trials after choice of LV; gray, trials after choice of HV. (C) Likelihood of switching to new target increases with size of fictive outcome on previous trial. (D and E) Likelihood and latency of immediately shifting gaze to HV location are not affected by fictive reward. (F) Likelihood of choosing optimally is not influenced by a colored square presented during the delay between trials (red line).

Large fictive rewards promote gambling in humans (14, 15); thus, we hypothesized that monkeys would likewise preferentially choose HV options after large fictive rewards. We observed this pattern in our experiment (Fig. 2B, black line, correlation coefficient r = 0.300, P < 0.001). This effect may reflect an increased willingness to switch to a new target, as the likelihood of switching increased with larger fictive outcomes (Fig. 2C, r = 0.199, P < 0.001). One alternative explanation for these effects is that HV targets may have positive associations that influence behavior. This explanation is unlikely for several reasons. First, obtained rewards never depended on unselected targets on that trial, so any associations between these fictive stimuli and reward values would be eliminated over the thousands of training trials that preceded recording. Second, immediately after making choices, monkeys were no more likely to make a second saccade (Fig. 2D, r = –0.02, P > 0.2) and no faster to shift gaze (Fig. 2E, r = 0.008, P > 0.2) to HV fictive targets than to LV fictive targets, indicating that attention and motivation were roughly similar after all fictive outcomes. Third, we performed a control task in which the HV target remained white and a colored square appeared in the center of the monitor during the delay after the trial. This square’s color did not indicate what reward could have been received (and, thus, it provided no fictive information) but had the same associations as the fictive targets. Monkeys’ choices on subsequent trials did not depend on the color of this stimulus (Fig. 2F, r = 0.005, P > 0.6).

An example ACC neuron showed clear phasic responses around the time of gaze shifts to targets; the amplitude of these responses was correlated with the size of both the experienced reward (Fig. 3A, r = 0.056, P < 0.001, the six rewards are grouped into four categories to simplify presentation) and the size of fictive outcomes on trials when the monkey chose the LV target (Fig. 3B, r = 0.037, P < 0.001). The amplitude of phasic responses of most neurons reflected experienced reward size [n = 46 out of 68 (46/68) neurons, 67.7%] and was usually greater for larger rewards (n = 39/46, 84.8%). Responses of 50% of neurons reflected fictive reward size (n = 34/68); these responses were usually greater for larger fictive rewards (n = 30/34, 88.2%, P < 0.05). A substantial proportion of neurons (35.5%, n = 24/68) showed tuning for both experienced and fictive outcomes; most were tuned in the same direction for experienced and fictive rewards (91.7%, n = 22/24). The majority of neurons showed matching tuning for experienced and fictive outcomes (97.0%, n = 66/68). For the population, the average response strength was greater for experienced rewards than for fictive reward outcomes (P < 0.01, bootstrap t test). Because these phasic responses are tightly coupled to gaze shifts to visual targets, they may reflect visual stimulation, reafferent oculomotor signals, or attention to the cue; in any case, their amplitude carries information about the value of fictive outcomes.
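The population summaries in Fig. 3, C and D, normalize each neuron to its own maximal firing rate before averaging. A minimal sketch of that normalization on made-up rates (the numbers are illustrative, not recorded data):

```python
# Hypothetical mean firing rates (spikes/s) in the post-outcome epoch:
# rows = neurons, columns = increasing reward size.
rates = [
    [12.0, 15.0, 18.0, 24.0],  # tuned positively to reward size
    [30.0, 27.0, 24.0, 21.0],  # tuned negatively
    [5.0, 5.5, 6.0, 8.0],
]

# Normalize each neuron to its own maximum so that neurons with very
# different absolute rates contribute equally to the population average.
normalized = [[r / max(row) for r in row] for row in rates]
population_mean = [sum(col) / len(col) for col in zip(*normalized)]
print(population_mean)
```

After normalization every neuron peaks at 1.0, and the population average reflects shared tuning rather than the few neurons with the highest absolute rates.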

Fig. 3

ACC neurons signal both experienced and fictive rewards. (A) (Left) Peristimulus time histogram showing responses of example neuron after choice of HV target. Response grows with reward size. Vertical dashed lines indicate, successively, the time that outcomes are revealed and reward is given. The shaded gray region indicates the epoch used for the bar graph showing average (±1 SE) response of neuron for each experienced reward size. sp/s, spikes per second. (B) Responses of the same neuron for fictive rewards. Experienced reward was identical (100 μL) in all cases. (C) Population response (n = 68 neurons) for experienced rewards, normalized to the maximal firing rate for each neuron. (D) Population response for fictive rewards.

To test the hypothesis that responses to fictive rewards may contribute to behavioral adjustment, we calculated the trial-by-trial correlation between firing rate and likelihood of choosing the optimal target after LV trials for all neurons (Fig. 4A). To control for the different neuronal responses to different fictive outcomes, we analyzed data separately for each fictive reward. We found a positive correlation for four of the six fictive outcomes (P < 0.001) and no correlation for the remaining two (P > 0.05). These results raise the possibility that the firing rate signals subsequent changes in behavior and not fictive outcomes (16). However, a second analysis revealed that firing rates remained correlated with the fictive outcome when the analysis was restricted to trials preceding an optimal choice (P < 0.001). Because this analysis holds the subsequent behavioral adjustment constant, it confirms that ACC neurons do not merely predict behavioral switching. Finally, reaction times did not correlate with likelihood of choosing the optimal target across all recording sessions (P > 0.5, correlation test). This analysis controls for the possibility that the correlation between firing rate and adjustment merely reflects uncontrolled variations in arousal.
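The control logic of the first analysis, correlating firing rate with subsequent optimal choice separately within each fictive-reward level so that a neuron's tuning to fictive value cannot produce the correlation on its own, can be sketched on synthetic data (the generative model and every number below are illustrative assumptions, not the recorded data):

```python
import random
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Synthetic trials: (fictive reward, firing rate, chose-optimally-next flag).
rng = random.Random(1)
trials = []
for _ in range(3000):
    fictive = rng.choice([0, 200, 267, 300, 333, 367])
    rate = 10 + 0.02 * fictive + rng.gauss(0, 3)        # rate tracks fictive value
    p_optimal = min(0.9, max(0.1, 0.2 + 0.03 * rate))   # behavior tracks rate
    chose_optimal = 1 if rng.random() < p_optimal else 0
    trials.append((fictive, rate, chose_optimal))

# Control: correlate firing rate with subsequent optimal choice within each
# fictive-reward level, so the fictive value itself is held constant.
rs = []
for level in sorted({t[0] for t in trials}):
    sub = [t for t in trials if t[0] == level]
    rs.append(pearson_r([t[1] for t in sub], [t[2] for t in sub]))
mean_r = mean(rs)
print(mean_r)
```

A positive within-level correlation, as in the synthetic example, is what links firing rate to adjustment over and above the neuron's response to the fictive outcome itself.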

Fig. 4

Neuronal responses signal both fictive rewards and subsequent adjustments in behavior. (A) Firing rates after LV trials predict optimal choice on the next trial for four of the six fictive outcomes. (B) Neuronal responses to experienced rewards are identical on the trial that follows low (0 μL, red line) and high (≥300 μL, blue line) fictive outcomes and, thus, do not signal reward prediction errors. Error bars indicate 1 SE.

One alternative explanation for these data is that, by influencing behavior and thus future rewards (Fig. 2B), fictive outcomes serve as the first predictive cue of the reward on the next trial. We find this alternative explanation unlikely for several reasons. First, a choice intervenes between the time of the fictive cue and the reward at the end of the next trial, which is itself probabilistic. The value of the subsequent reward is therefore not strictly predicted by the fictive cue. Second, the reward signal would have to skip the next salient/rewarding event (the reward on the present trial) and signal the subsequent one (the reward on the next trial); such a signal would be highly unusual and has not, to our knowledge, been observed in the ACC or any other brain area. Third, if fictive outcomes are perceived as reward-predicting cues, they should elicit faster reaction times and greater accuracy on the next trial (17). We did not observe these effects (P > 0.5 for both reaction time and accuracy, Student’s t test). Fourth, if the reward on the next trial is larger than the value cued by the fictive outcome on this trial, we should see positive deflections in the neuronal response. Similarly, if the reward on the next trial is smaller than the value cued on this trial, we should see negative deflections in the neuronal response. However, we did not observe any dependence of HV neuronal response on previous fictive value (P > 0.3, Fig. 4B). Collectively, these data indicate that the behavioral and physiological correlates of fictive rewards are not an artifactual consequence of simple extended reward associations.

In summary, the most parsimonious explanation for monkeys’ behavior in this task is that they recognize and respond to fictive outcomes, and responses of ACC neurons are sufficient to guide such fictive learning. Neural markers of fictive outcomes have so far been limited to non-invasive measures. Hemodynamic activity in the ventral caudate, which is connected with the ACC, reflects fictive learning signals (15), and ACC activity tracks the correlation between craving for cigarettes and fictive learning (11). The error-related negativity, an event-related potential component with a possible source in the ACC, tracks fictive outcomes (18). Here we show that the same neural circuit carries information about fictive outcomes in monkeys. Moreover, information about both experienced and fictive outcomes is encoded by the same neurons and is represented with the use of a similar coding scheme. The correlation between firing rate and behavior suggests that these neurons do not simply tag the incentive salience of a stimulus (19, 20), but also reflect neuronal processes that translate outcomes into behavior. Thus, the ACC may integrate information about obtained rewards [probably signaled by the dopamine system (21, 22)] with information about observed rewards [presumably computed in the prefrontal cortex (23)] to derive a model of the local reward environment in the near future. These findings are consistent with the idea that the ACC represents both real and fictive reward outcomes to dynamically guide changes in behavior (9, 2427). Such a mechanism may be crucial in complex social environments, where the behavior of others provides a rich supply of fictive information (15, 28).

Supporting Online Material

www.sciencemag.org/cgi/content/full/324/5929/948/DC1

Materials and Methods

SOM Text

Figs. S1 to S9

References

References and Notes

  1. Materials and methods are available as supporting material on Science Online.
  2. This work was supported by a postdoctoral fellowship from the National Institute on Drug Abuse to B.Y.H. (023338), an R01 from the National Eye Institute to M.L.P. (013496), and the Duke Institute for Brain Studies (M.L.P.). We thank K. Watson for help in training the animals and S. Heilbronner for useful discussions on the tasks. The authors declare no conflicts of interest.