Report

Neuronal Activity Related to Reward Value and Motivation in Primate Frontal Cortex


Science  09 Apr 2004:
Vol. 304, Issue 5668, pp. 307-310
DOI: 10.1126/science.1093223

This article has a correction.

Abstract

In several areas of the macaque brain, neurons fire during delayed-response tasks at a rate determined by the value of the reward expected at the end of the trial. The activity of these neurons might be related to the value of the expected reward or to the degree of motivation induced by expectation of the reward. We describe results indicating that the nature of reward-dependent activity varies across areas. Neuronal activity in orbitofrontal cortex represents the value of the expected reward, whereas neuronal activity in premotor cortex reflects the degree of motivation.

In numerous areas of the brain extending from the limbic system to the motor system, neuronal activity varies according to the size of the reward for which a monkey is working (1–15). Reward-dependent activity commonly has been viewed as representing the value of the goal for which the monkey is working; however, it might alternatively be related to the monkey's degree of motivation. Anticipation of a more valued reward leads to stronger motivation, as evidenced by measures of arousal, attention, and intensity of motor output (16–18).

On the assumption that motivated behavior depends on influences arising in the limbic system and acting on the motor system (19), we hypothesized that neuronal signals representing reward value predominate in the limbic system, whereas signals reflecting the degree of motivation predominate in the motor system. To test this hypothesis, we recorded from two areas in which neurons exhibit robust reward-related activity: the orbitofrontal division of limbic cortex (OF) (Fig. 1) and the postarcuate premotor cortex (PM) (Fig. 1). OF plays an important role in motivated behavior (20–22). Its neurons respond to cues predicting the availability of foodstuffs at a rate determined by their appetitive or aversive value (13, 14). PM is a region of high-order motor cortex (23–25). Its neurons fire during the delay period of an ocular delayed-response task at a rate determined by the direction of the impending saccade and by the size of the expected reward (9).

Fig. 1.

(A) Frontoparallel magnetic resonance (MR) image of the brain of monkey F. Orange circle marks the center of the recording zone in OF. All recording sites were within 2 mm of this location. This region has been shown to contain neurons sensitive to the value of a predicted reward (14). (B) Locations of OF and PM in a lateral view of the macaque cerebral hemisphere. All recording sites were in the left hemisphere. (C) Surface-parallel MR image of the brain of monkey F. Green circle marks the center of the recording zone in PM. All recording sites were within 4 mm of this location and were coincident with sites at which intracortical microstimulation elicited movements of the face and arm (9). This zone straddled the border between divisions of premotor cortex termed PMd and PMv (23) or F2 and F4 (24). AS, arcuate sulcus; PS, principal sulcus; CS, central sulcus.

To achieve a dissociation between activities dependent on reward value and on motivation, we recorded from single neurons while two monkeys performed a task in which the degree of motivation was controlled independently by the magnitude of the reward promised in the event of success and the magnitude of the penalty threatened in the event of failure (26). On each trial, two cues flashed at locations to the left and right of fixation (Fig. 2, A and B). One cue indicated the size of the reward that the monkey would receive upon executing a saccade to its location. The other cue indicated the size of the penalty that the monkey would incur upon executing a saccade to its location. After a delay, the monkey was allowed to make an eye movement to either location. Trials differed with respect to the location of the reward cue, the size of the promised reward (0.1 or 0.3 ml of juice), and the size of the threatened penalty [1 or 8 s “time-out” (aversive stimulus)].

Fig. 2.

(A) Sequence of events in the reward-penalty task. Hatched circle indicates direction of gaze. (B) Trials fell into three categories defined by reward-penalty combination: large reward (large reward and small penalty), neutral (small reward and small penalty), and large penalty (small reward and large penalty). Incentive cues were distinguished by color (table S1). (C to E) Performance measures sensitive to reward and penalty size. Penalty choice rate: trials on which the monkeys chose penalty expressed as a fraction of all trials on which they chose reward or penalty. Fixation break rate: trials terminated by a fixation break expressed as a percentage of all trials. Reaction time: average interval between fixation spot offset and saccade initiation on all trials in which the monkey made a saccade in the rewarded direction. Asterisks (all planned comparisons): statistically significant differences at P < 0.001.
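The three performance measures defined in the caption can be computed from per-trial records. The sketch below assumes a hypothetical `Trial` record holding a condition label, an outcome (saccade to the reward cue, saccade to the penalty cue, or fixation break) and a precomputed reaction time (fixation spot offset to saccade initiation). These names and outcome categories are illustrative assumptions, not the authors' data format, and real sessions may contain other trial outcomes.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Trial:
    # Hypothetical per-trial record; field names are illustrative assumptions.
    condition: str  # "large_reward", "neutral", or "large_penalty"
    outcome: str  # "reward", "penalty", or "fix_break"
    reaction_time_ms: Optional[float] = None  # None when fixation was broken


def penalty_choice_rate(trials: list[Trial]) -> float:
    """Penalty choices as a fraction of all trials ending in a reward or penalty choice."""
    chosen = [t for t in trials if t.outcome in ("reward", "penalty")]
    return sum(t.outcome == "penalty" for t in chosen) / len(chosen)


def fixation_break_rate(trials: list[Trial]) -> float:
    """Fixation breaks as a percentage of all trials."""
    return 100.0 * sum(t.outcome == "fix_break" for t in trials) / len(trials)


def mean_reaction_time(trials: list[Trial]) -> float:
    """Mean latency (ms) over trials with a saccade in the rewarded direction."""
    return mean(t.reaction_time_ms for t in trials if t.outcome == "reward")


def by_condition(trials: list[Trial], condition: str) -> list[Trial]:
    """Select the trials belonging to one incentive condition."""
    return [t for t in trials if t.condition == condition]


# Hypothetical trials for illustration only; not data from the study.
trials = [
    Trial("neutral", "reward", 200.0),
    Trial("neutral", "penalty", 250.0),
    Trial("neutral", "fix_break"),
    Trial("neutral", "reward", 220.0),
    Trial("large_reward", "reward", 180.0),
]
neutral = by_condition(trials, "neutral")
print(penalty_choice_rate(neutral), fixation_break_rate(neutral), mean_reaction_time(neutral))
```

Computing each measure separately per condition, as `by_condition` does here, is what allows the large-reward and large-penalty conditions to be compared against the neutral baseline.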

The monkeys assessed the rewards and penalties at appropriate values. They selected the location associated with reward over the one associated with penalty on 98% of trials (Fig. 2C). With penalty held constant, they chose a large reward more often than a small one, indicating that the large reward had greater appetitive value (Fig. 2C). With reward held constant, they avoided a large penalty more often than a small one, indicating that the large penalty had greater aversive value (Fig. 2C). Traditional measures of motivation are based on the intensity (latency, magnitude, frequency, or probability) of behavior and its duration and persistence (16). Under both the large-reward and large-penalty conditions, the monkeys broke fixation less often than in the neutral condition (Fig. 2D), thus exhibiting enhanced persistence, and made faster behavioral responses than in the neutral condition (Fig. 2E), thus exhibiting reduced latency.

In this task, neurons sensitive to the degree of motivation should respond with a similar change in firing rate to increasing the size of either the promised reward or the threatened penalty. In contrast, neurons sensitive to value, although responsive to increasing reward size, should either (i) not respond to increasing the size of the threatened penalty (if their sole function is to monitor the value of the goal for which the monkey is working), or (ii) respond with a change in firing rate opposite to that induced by increasing reward size (if their function is to register the signed valence of the composite expectation encompassing both reward and penalty).
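The predictions above amount to a rule based on the signs of a neuron's responses to larger reward and to larger penalty. As a minimal sketch, assuming the responses are summarized as the reward and penalty indices defined later in the text, and assuming a hypothetical dead band `tol` below which an index counts as zero, the rule could be written as:

```python
def _sign(x: float, tol: float) -> int:
    """Return -1, 0, or +1, treating |x| <= tol as no change."""
    if abs(x) <= tol:
        return 0
    return 1 if x > 0 else -1


def classify(reward_index: float, penalty_index: float, tol: float = 0.05) -> str:
    """Map a neuron's (reward, penalty) indices onto the hypotheses in the text.

    tol is a hypothetical threshold, not a value used in the study.
    """
    r = _sign(reward_index, tol)
    p = _sign(penalty_index, tol)
    if r == 0:
        return "unclassified"  # no dependence on reward size
    if p == r:
        return "motivation"  # same change for larger reward and larger penalty
    if p == 0:
        return "value: reward only"  # case (i): ignores the penalty
    return "value: signed valence"  # case (ii): opposite change for penalty


print(classify(0.2, 0.15), classify(0.2, -0.1), classify(0.2, 0.0))
```

In practice the study assessed these patterns statistically (analysis of variance per neuron, sign tests across the population) rather than with a fixed threshold; the sketch only makes the logic of the three predictions explicit.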

We recorded from 176 OF neurons (103 in monkey F and 73 in monkey P). In some neurons, the firing rate obviously depended on the size of the predicted reward and penalty. The neuron shown in Fig. 3A responded to the cue display with stronger firing when a larger reward was promised (large-reward versus neutral condition) and weaker firing when a larger penalty was threatened (large-penalty versus neutral condition). Thus, the strength of its response reflected the value conveyed by the combination of reward and penalty cues, not the motivational impact of the display.

Fig. 3.

Neuronal activity in OF reflects the value conveyed by the incentive cues. (A) Data from a single neuron firing during the cue period at a rate that was especially high for large reward and especially low for large penalty. (B) Mean firing rate as a function of time under the three incentive conditions for all 176 OF neurons. Data are combined across monkeys and response directions. In data broken down by monkey and response direction, the general pattern remained the same (figs. S1 to S3). (C) Distribution of reward indices for all neurons. The number of observations (n), mean of the distribution (μ), and level of significance at which it differed from zero (P) are shown. Pale bars represent neurons in which the dependence of firing rate on reward size achieved statistical significance (analysis of variance, P < 0.05). (D) Distribution of penalty indices. Conventions as in (C).

The same effects were present at the level of population activity in OF (Fig. 3B). We computed, for each neuron, indices reflecting the dependence of its firing rate on reward and penalty size during the 500-ms period when the cues were visible. The reward index (R – N)/(R + N), where R and N are the firing rates on large-reward and neutral trials, respectively, was positive in the case of any neuron firing more strongly when reward size increased. The distribution of reward indices (Fig. 3C) was shifted significantly above zero (sign test, n = 176, P < 0.0001). Among neurons exhibiting significant selectivity for reward size (pale bars in Fig. 3C) (table S2), those with a positive reward index were in the majority (34 to 9) by a highly significant margin (χ² test, P < 0.0005). Thus, overall, OF neurons fired more strongly when the promised reward was larger. The penalty index (P – N)/(P + N), where P and N are the firing rates on large-penalty and neutral trials, respectively, was negative in the case of any neuron firing less strongly when penalty size increased. The distribution of penalty indices (Fig. 3D) was shifted significantly below zero (sign test, n = 176, P < 0.05). Among neurons exhibiting significant selectivity for penalty size (pale bars in Fig. 3D) (table S2), those with a negative penalty index were in the majority (17 to 6) by a significant margin (χ² test, P < 0.05). Thus, overall, OF neurons fired less strongly when the threatened penalty was larger.
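The reward and penalty indices share one form, (test − neutral)/(test + neutral), and the population shift is assessed with a sign test. The sketch below implements both, assuming an exact two-sided binomial sign test with zero indices discarded; the text does not state how ties were handled, and the firing rates in the example are hypothetical, not data from the study.

```python
from math import comb


def contrast_index(test_rate: float, neutral_rate: float) -> float:
    """(T - N) / (T + N): positive when the neuron fires more in the test condition."""
    total = test_rate + neutral_rate
    if total == 0:
        raise ValueError("index is undefined when both firing rates are zero")
    return (test_rate - neutral_rate) / total


def sign_test(values: list[float]) -> float:
    """Two-sided exact sign test of a zero median; zero values are discarded."""
    pos = sum(v > 0 for v in values)
    neg = sum(v < 0 for v in values)
    n = pos + neg
    k = min(pos, neg)
    # Probability of k or fewer values in the minority direction under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


# Hypothetical (large-reward, neutral) firing rates in spikes/s for five neurons.
rates = [(12.0, 8.0), (20.0, 15.0), (5.0, 6.0), (9.0, 4.0), (30.0, 22.0)]
reward_indices = [contrast_index(r, n) for r, n in rates]
print(sign_test(reward_indices))  # prints 0.375
```

The penalty index uses the same function with large-penalty rates as the test condition. Because the index is normalized by the summed rate, neurons with very different baseline firing rates contribute on a comparable scale.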

We also recorded from 135 PM neurons (65 in monkey F and 70 in monkey P). In some neurons, the firing rate clearly depended on the size of the promised reward and threatened penalty. The neuron shown in Fig. 4A fired continuously during the delay period between onset of the cues and execution of the saccade, maintaining a higher rate when either a large reward or a large penalty was at stake than under the neutral condition in which both reward and penalty were small. Thus, the rate at which it fired reflected the motivational impact of the incentive-cue display, not the value conveyed by the display.

Fig. 4.

Neuronal activity in PM reflects the motivational impact of the incentive cues. (A) Single neuron. (B) Population. (C and D) Reward and penalty indices based on delay-period firing rates. Other conventions as in Fig. 3.

The neuronal population in PM (Fig. 4B) exhibited a pattern of dependence on reward and penalty size similar to that displayed by the single neuron. The distribution of reward indices (Fig. 4C) was shifted above zero (sign test, n = 135, P < 0.0001). Among neurons exhibiting significant dependence on reward size (pale bars in Fig. 4C) (table S2), those firing more strongly under the large-reward condition outnumbered those firing more strongly under the neutral condition (50 to 6) by a highly significant margin (χ² test, P < 0.0001). Thus, overall, PM neurons, like OF neurons, fired more strongly when the promised reward was larger. The distribution of penalty indices (Fig. 4D) was also shifted away from zero in a positive direction, indicating that the majority of neurons fired more strongly when the penalty was larger (sign test, n = 135, P < 0.001). Among neurons exhibiting significant dependence on penalty size (pale bars in Fig. 4D) (table S2), those more active under the large-penalty condition outnumbered those more active under the neutral condition (16 to 2) by a significant margin (χ² test, P < 0.001). Thus, overall, PM neurons, unlike OF neurons, fired more strongly when the threatened penalty was larger.
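The majority counts reported for OF and PM can be checked arithmetically. The sketch assumes that each χ² test was a goodness-of-fit test of the two counts against an even split, with 1 degree of freedom and no continuity correction; the text does not specify the exact form of the test. For 1 degree of freedom the upper-tail probability is erfc(√(χ²/2)), so only the standard library is needed.

```python
from math import erfc, sqrt


def chi2_equal_split(a: int, b: int) -> tuple[float, float]:
    """Chi-square goodness of fit of counts (a, b) against a 50:50 split (df = 1).

    Assumes no continuity correction. Returns (chi-square statistic, P value).
    """
    expected = (a + b) / 2
    chi2 = ((a - expected) ** 2 + (b - expected) ** 2) / expected
    # For df = 1, chi-square is Z squared, so P = 2 * (1 - Phi(sqrt(chi2))) = erfc(sqrt(chi2 / 2)).
    p = erfc(sqrt(chi2 / 2))
    return chi2, p


# Counts and P bounds as reported in the text for OF and PM.
REPORTED = {
    "OF reward (34 vs 9)": ((34, 9), 0.0005),
    "OF penalty (17 vs 6)": ((17, 6), 0.05),
    "PM reward (50 vs 6)": ((50, 6), 0.0001),
    "PM penalty (16 vs 2)": ((16, 2), 0.001),
}

for label, ((a, b), bound) in REPORTED.items():
    chi2, p = chi2_equal_split(a, b)
    print(f"{label}: chi2 = {chi2:.2f}, P = {p:.2g}, below reported bound: {p < bound}")
```

Under this assumption every reported bound holds, although the PM penalty comparison (χ² ≈ 10.9, P ≈ 0.00097) sits just below P < 0.001. Applying a continuity correction would push it above that bound, so the reported value is consistent with the uncorrected test.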

It might be argued, in the framework of economic decision theory, that neuronal activity in PM, although it did not represent the value of the reward, nonetheless represented the utility of the action being planned (8, 27). The utility of making a saccade to the rewarded target was high under the large-reward condition because the monkey stood to receive a large gain, and was high under the large-penalty condition because the monkey stood to avoid a large loss. Two considerations militate against this view. First, monkeys performing the reward-penalty task were able to decide on a response by simply locating the reward cue; they were not required to base their decision on the utility of the saccade as determined by reward size and penalty size together. Second, reward- and penalty-dependent activity in PM long outlasted the decision process. Selection of a saccade occurred within a few hundred ms of incentive-cue onset, as evidenced by the emergence of direction-selective neuronal activity in PM (fig. S3, C and D). Reward- and penalty-dependent activity, in contrast, persisted unabated over several seconds until the end of the trial. Thus, incentive-dependent neuronal activity in PM does not represent utility in an economically meaningful sense (as a factor in a decision process). Its nature is adequately captured by the classic view that motor preparation and related processes including arousal and attention are subject to motivational modulation.
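The confound addressed in this paragraph can be made concrete with a toy model. Assuming, purely for illustration, that reward (ml) and penalty (s) are placed on a common scale by a hypothetical exchange rate `LAM`, the utility advantage of the rewarded saccade (gain obtained plus loss avoided) rises above neutral in both the large-reward and large-penalty conditions, like PM activity. The signed value of the cue display (reward minus penalty) instead orders the conditions large reward > neutral > large penalty, like OF activity. This additive model is an assumption of this sketch, not the formulation of refs. 8 and 27.

```python
# Incentive conditions in the task: (reward in ml of juice, time-out penalty in s).
CONDITIONS = {
    "large_reward": (0.3, 1.0),
    "neutral": (0.1, 1.0),
    "large_penalty": (0.1, 8.0),
}

LAM = 0.02  # hypothetical exchange rate: ml of juice treated as equivalent to 1 s of time-out


def saccade_utility(reward_ml: float, penalty_s: float, lam: float = LAM) -> float:
    """Utility advantage of the rewarded saccade: gain obtained plus loss avoided."""
    return reward_ml + lam * penalty_s


def display_value(reward_ml: float, penalty_s: float, lam: float = LAM) -> float:
    """Signed value of the cue display: reward minus penalty on the common scale."""
    return reward_ml - lam * penalty_s


utility = {name: saccade_utility(r, p) for name, (r, p) in CONDITIONS.items()}
value = {name: display_value(r, p) for name, (r, p) in CONDITIONS.items()}
for name in CONDITIONS:
    print(f"{name:14s} utility = {utility[name]:.2f}  value = {value[name]:+.2f}")
```

Both orderings hold for any positive `LAM`, because the large-reward condition adds 0.2 ml to both quantities while the large-penalty condition adds 7 s of time-out to utility and subtracts it from value. Firing rates alone therefore cannot distinguish utility from motivation in this design, which is why the two additional arguments above are needed.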

We conclude that predicting the reward to be delivered at the end of a trial sets in motion two distinct processes. One, manifest in OF, involves representing the value of the reward. The other, manifest in PM, involves maintaining a degree of motivation commensurate with the value of the reward. It is not known by what stages the representation of reward value in the limbic system is transformed into motivational modulation in the motor system, because the approach of independently manipulating reward size and penalty size has not yet been applied to intervening areas where neurons exhibit reward-related activity (28, 29).

Supporting Online Material

www.sciencemag.org/cgi/content/full/304/5668/307/DC1

Materials and Methods

SOM Text

Figs. S1 to S3

Tables S1 and S2

References and Notes
