Report

Neuronal Correlates of Goal-Based Motor Selection in the Prefrontal Cortex


Science  11 Jul 2003:
Vol. 301, Issue 5630, pp. 229-232
DOI: 10.1126/science.1084204

Abstract

Choosing an action that leads to a desired goal requires an understanding of the linkages between actions and their outcomes. We investigated neural mechanisms of such goal-based action selection. We trained monkeys on a task in which the relation between visual cues, action types, and reward conditions changed regularly, such that the monkeys selected their actions based on anticipated reward conditions. A significant number of neurons in the medial prefrontal cortex were activated, after cue presentation and before motor execution, only by particular action-reward combinations. This prefrontal activity is likely to underlie goal-based action selection.

During purposeful behavior, a goal may first come to mind, requiring the retrieval of the action that will realize this goal as its outcome. Alternatively, sensory cues may activate multiple action plans, one of which has to be selected by its linkage to the current goal (1, 2). The neural mechanisms of such goal-based action selection have not been studied systematically, although studies with monkeys (3–11) and humans (12–15) have demonstrated that activity in the prefrontal cortex (PFC) conveys information about rewards or goals. We trained monkeys on a task in which they were able to select an action based on goal, and we recorded the activity of cells from the medial and lateral parts of PFC. These two parts of PFC are thought to play essential roles in cognitive control of behavior (16–21).

The task was a visually cued, asymmetrically rewarded GO/NO-GO task with reversals (Fig. 1A) (21). One of two visual cues was presented, and after a delay the monkey either performed a GO response (pulling and returning the joystick) or a NO-GO response (holding the joystick) (noted as NG), depending on the cue. After another delay, a liquid reward was provided only after correct GO responses or correct NG responses (noted as reward +). Fixation was required throughout the trial. Within a single block of trials, the relations between visual, motor, and reward variables were fixed, and only two visual-motor-reward (VMR) combinations were provided. Between blocks, the combinations were changed by reversing the visual-motor or visual-reward relation. All eight possible VMR combinations (Fig. 1B) were covered in four blocks of trials, which were run while we recorded the activity of a single cell in PFC, so that the effect of each variable on neuron activity could be distinguished. After 4 to 6 months of training on reversals (22), the monkeys were able to reattain a high performance level rather quickly, usually with fewer than eight errors, after a reversal of motor requirement or reward condition.

Fig. 1.

Behavioral task and recording sites. (A) Behavioral task. (B) The eight visual-motor-reward combinations in four blocks. The order of blocks was pseudo-random. (C) The recording sites are shown by hatched areas (22).
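The block design described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' task code: the names `cue_A` and `cue_B` are hypothetical, and the sketch assumes that within a block the two cues map to opposite motor responses and opposite reward conditions, which is consistent with two VMR combinations per block and eight combinations over four blocks.

```python
from itertools import product

# Hypothetical labels; the paper does not name the two visual cues.
CUES = ("cue_A", "cue_B")
MOTOR = ("GO", "NG")
REWARD = ("+", "-")


def other(value, pair):
    """Return the element of a two-item tuple that is not `value`."""
    return pair[1] if value == pair[0] else pair[0]


def make_block(motor_for_a, reward_for_a):
    """Two VMR combinations of one block, given what cue_A is linked to.

    Assumption: cue_B takes the opposite motor response and the opposite
    reward condition within the same block.
    """
    return [
        ("cue_A", motor_for_a, reward_for_a),
        ("cue_B", other(motor_for_a, MOTOR), other(reward_for_a, REWARD)),
    ]


# Four blocks: every pairing of motor response and reward condition for cue_A.
blocks = [make_block(m, r) for m, r in product(MOTOR, REWARD)]

covered = {vmr for block in blocks for vmr in block}
all_vmr = set(product(CUES, MOTOR, REWARD))

for i, block in enumerate(blocks, start=1):
    print(f"block {i}: {block}")
print("all eight VMR combinations covered:", covered == all_vmr)
```

Moving from one block to another in this sketch corresponds to reversing the visual-motor relation, the visual-reward relation, or both; the actual block order in the experiment was pseudo-random.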

The monkeys selected motor responses based on the anticipation of reward conditions, as opposed to exclusively using stimulus-motor associations. The linkages between visual stimuli and reward conditions, between motor responses and reward conditions, and between stimuli and motor responses, were fixed within a block, so it is conceivable that the monkeys memorized these linkages over the course of a block of trials. Therefore, the monkeys may have selected their motor responses using stimulus-motor associations alone. However, they may also have selected correct motor responses using memory of stimulus-reward and motor-reward linkages (23). Behavioral data showed that the anticipation of reward condition played an essential role in selection of motor response. The monkeys were more likely to break eye-fixation requirements in reward – trials than in reward + trials (Fig. 2A). Presumably, the monkeys correctly anticipated the reward condition in individual trials and made more efforts to fixate in reward + trials (7, 9). Moreover, in a probe test in which we suddenly changed the reward schedule from asymmetrical to symmetrical (both correct GO and NG were rewarded) without changes in visual-motor linkages, the performance of the monkeys dropped to 60 to 65% and stayed there until we returned the task to the asymmetrical condition in most cases (Fig. 2B, red lines). In this condition, the memory of visual-motor linkages was still relevant in selecting the correct motor response, whereas visual-reward and motor-reward linkages were no longer useful, because reward + was associated with both cues and both motor responses. It is unlikely that the monkeys generally reduced their efforts to perform the task in the probe blocks, because the frequency of fixation breaks in the probe blocks was comparable to that of rewarded trials in the ordinary blocks (fig. S1).

Fig. 2.

Behavioral evidence for motor selection based on the anticipation of reward conditions. (A) Proportion of trials with fixation breaks in all trials in reward – (open bars) and reward + (solid bars) conditions. The differences are statistically significant in each monkey (P < 0.001, by Mann-Whitney U-test). (B) The proportion of correct responses as a function of trial number before and after the introduction of symmetrical rewards (probe test, red lines) or the reward reversal (black lines). The proportion was calculated for three consecutive trials and averaged over eight (Monkey 1) or nine (Monkey 2) probe tests. For asymmetrical reward conditions, the proportion was averaged over 10 reversals (in both monkeys) performed on the same days with the same visual cues as the probe tests. The vertical line segments indicate the SEM. The data from Monkey 3 are not shown, because only a few probe tests were provided to the monkey.
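The performance curves in panel B could be computed along the following lines. This is a minimal sketch under stated assumptions: it uses non-overlapping bins of three trials (the caption does not say whether the bins overlapped), the function names are hypothetical, and the trial outcomes are made-up placeholders rather than data from the study.

```python
from math import sqrt
from statistics import mean, stdev


def binned_proportion_correct(outcomes, bin_size=3):
    """Proportion correct in consecutive, non-overlapping bins of trials.

    `outcomes` is a list of 1 (correct) and 0 (error). Trailing trials that
    do not fill a complete bin are dropped.
    """
    usable = len(outcomes) - len(outcomes) % bin_size
    return [
        sum(outcomes[i:i + bin_size]) / bin_size
        for i in range(0, usable, bin_size)
    ]


def mean_and_sem(values):
    """Mean and standard error of the mean of a list of numbers."""
    n = len(values)
    sem = stdev(values) / sqrt(n) if n > 1 else 0.0
    return mean(values), sem


def average_across_tests(tests, bin_size=3):
    """Average the binned curves of several probe tests, bin by bin."""
    per_test = [binned_proportion_correct(t, bin_size) for t in tests]
    n_bins = min(len(p) for p in per_test)
    return [mean_and_sem([p[b] for p in per_test]) for b in range(n_bins)]


# Placeholder outcomes for two hypothetical probe tests (not real data).
example_tests = [
    [1, 1, 1, 1, 0, 1],
    [1, 1, 0, 0, 0, 1],
]
curve = average_across_tests(example_tests)
for bin_index, (avg, sem) in enumerate(curve):
    print(f"bin {bin_index}: mean={avg:.3f}, SEM={sem:.3f}")
```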

Many PFC cells showed responses representing anticipated reward conditions immediately after the onset of visual cue presentation. Responses of the cell in Fig. 3A occurred only in trials in which a reward + was expected, regardless of the visual cue and regardless of the motor response required. To quantify this effect at the population level, we performed a three-way analysis of variance (ANOVA) (with visual cues, motor responses, and reward conditions as factors; with unequal replication) on the activity of each neuron in two different time periods: the first being 100 to 400 ms after the onset of visual cue presentation (“early cue window”) and the second being the 500 ms before fixation target dimming, which triggered the motor response (“latest delay window”). About one-quarter of cells in both the medial PFC (35 of 141 cells) and lateral PFC (11 of 44 cells) showed significant (P < 0.05) main effects of the reward factor, but no visual and motor main effects, in their responses during the early cue window (red segments in Fig. 3E, upper). Cells responding to reward + and those responding to reward – were equally common (24). Cells with activity sensitive only to the anticipated reward condition during the latest delay window were also common in both PFC regions (red areas in Fig. 3E, lower) (25).

Fig. 3.

Activity of PFC cells. (A to D) The time courses of activities of four PFC cells, averaged over 16 to 27 trials for each VMR condition. The activities in four blocks are superimposed for each visual cue shown at the upper left of individual graphs. The activity was aligned to the cue presentation (underline) and fixation point dimming (triangle). The time axis is broken at the middle of the delay period, because the duration of the delay period varied across trials. (A) to (C) Medial PFC cells. (D) Lateral PFC cell. (E) Proportion of cells classified according to main effects of three-way ANOVA. The numbers indicate the percentages of individual cell types. (F) The averaged, normalized magnitudes of activities of motor-reward dependent cells in four combinations of preferred and nonpreferred motor and reward conditions. The horizontal line segments at the right indicate the spontaneous activity level estimated by the average firing rate. Cells were selected by the ANOVA result independently for the cue and delay periods and activities in corresponding periods were calculated. (G) The time courses of averaged discharge differences of the motor-reward–dependent cells (MR cells), reward-dependent cells (R cells), and motor-dependent cells (M cells).

In order to select a motor response based on the anticipated reward condition, there needs to be a representation of the linkage between intended motor responses and anticipated reward conditions. We found cells in medial PFC whose responses represented such motor-reward combinations during both cue presentation and delay periods (Fig. 3, B and C). The cell in Fig. 3B showed transient responses after cue presentation, which occurred only in the trials in which a NG response was required and reward + was expected, regardless of the visual cue. The cell in Fig. 3C also discharged specifically to a combination of motor response and reward condition (NG and reward –, in this case), but the time course of the response was different. The discharge started after the cue onset and continued until the time that the motor response was triggered. Sixteen of the 141 medial PFC cells showed significant (P < 0.05) main effects of both motor and reward factors, but no significant visual main effects, in the early cue window and 26 medial PFC cells showed similar motor-reward dependency in the latest delay window (purple segments in Fig. 3E, left). On the basis of an analysis of the spontaneous activity of the cells, we determined that this specificity of activity was unlikely due to block-to-block changes in general excitability (26). The proportion of such motor-reward cells was significantly larger than both the proportion of cells showing significant visual and motor, but no reward, main effects (blue-green segments in Fig. 3E, left), as well as the proportion of cells showing significant visual and reward, but no motor, main effects (orange segments in Fig. 3E, left). This was true during both the early cue and latest delay windows (both P < 0.01, by Fisher test).
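The cell categories used here follow from which main effects of the three-way ANOVA reach significance. A minimal sketch of that labeling step is shown below; it assumes the three main-effect P values have already been computed for a cell in a given time window, and the function name and label strings are illustrative rather than taken from the paper.

```python
def classify_cell(p_visual, p_motor, p_reward, alpha=0.05):
    """Label a cell by its significant ANOVA main effects.

    Returns a string made of V (visual), M (motor), and R (reward) for each
    main effect with P < alpha, or "none" if no main effect is significant.
    For example, "MR" marks a motor-reward cell: significant motor and
    reward main effects, no significant visual main effect.
    """
    significant = {
        "V": p_visual < alpha,
        "M": p_motor < alpha,
        "R": p_reward < alpha,
    }
    label = "".join(name for name, is_sig in significant.items() if is_sig)
    return label or "none"


# Made-up P values for illustration only.
print(classify_cell(0.40, 0.01, 0.02))  # motor-reward cell
print(classify_cell(0.40, 0.60, 0.01))  # reward-only cell
print(classify_cell(0.01, 0.02, 0.70))  # visual-motor cell
```

Counting the labels over all recorded cells, separately for the early cue window and the latest delay window, would give proportions of the kind plotted in Fig. 3E.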

In these motor-reward cells, only the best motor-reward combination yielded a large response, and all the other responses were small in comparison (Fig. 3, B and C). This is further demonstrated in the averaged normalized responses, averaged over all the motor-reward cells, to the four motor-reward combinations (Fig. 3F). About equal numbers of medial PFC cells preferred each of the four motor-reward combinations (27).

Cells that preferred particular combinations of motor response and reward condition were also found in the lateral PFC, but in these cells the motor-reward–dependent discharges were not prominent in the early cue-presentation period. The discharge rate of the cell in Fig. 3D began increasing before visual cue presentation and dropped immediately after the cue onset for all VMR combinations. Such gradually growing discharges before the cue onset were found in both medial and lateral PFC (although not in the particular cells shown in Fig. 3, A to C). Later on in the cue presentation period, the discharge showed complex features, but then became gradually more specific to NG and reward – toward the time of motor response. Seven of the 44 lateral PFC cells showed both significant motor and reward main effects but no significant visual main effects in the latest delay window (28), whereas there were no cells displaying such motor-reward dependency during the early cue window (Fig. 3E, right). There was a significant difference between the medial and lateral regions in terms of the proportion of cells with the motor-reward dependency during the early cue window (P = 0.01, by Fisher test), whereas there were no such differences between the two areas during the latest delay window (P = 0.70, by chi-square test) (29).
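The regional comparison for the early cue window can be checked with a Fisher exact test on the counts reported in the text: 16 of 141 medial PFC cells versus 0 of 44 lateral PFC cells with motor-reward dependency. The sketch below is a plain standard-library implementation, not the authors' analysis code. It assumes the common two-sided convention of summing the probabilities of all tables that are no more likely than the observed one; the paper does not state which convention was used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x first-column counts in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is at most as probable as the observed one
    # (small tolerance guards against floating-point ties).
    return sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Rows: medial PFC, lateral PFC. Columns: motor-reward cells, other cells.
p_early_cue = fisher_exact_two_sided(16, 141 - 16, 0, 44 - 0)
print(f"early cue window, medial vs lateral: P = {p_early_cue:.3f}")
```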

The discharges in the medial PFC specific to particular motor-reward combinations in the cue-presentation period may be essential for the goal-based motor selection, because they occur at the earliest possible time in the trial when the monkeys could choose the appropriate motor response for that particular trial. These motor-reward–dependent discharges increased dramatically after the onset of reward-dependent discharges, with only a short lag (20 to 30 ms) (Fig. 3G). There was also a population of medial cells whose discharge was modulated by the intended motor response (blue segment in Fig. 3E, upper left). However, their discharge differences (30) were much smaller than those of the motor-reward cells and reward cells (Fig. 3G). The time-averaged magnitude of discharge differences of the motor-reward cells and reward cells during the early cue window was significantly greater than that of the motor cells (P < 0.01 and P < 0.05, respectively, by Mann-Whitney U-test). Information about anticipated reward conditions and about their relation to intended motor responses is predominantly represented in the medial PFC during the cue-presentation period (figs. S4 to S7).

The medial PFC cells with motor-reward–dependent cue-period discharges also tended to show transient increases in firing rate at the beginning of individual blocks while the monkeys were learning the new VMR combination but before they had attained a high performance level. More than half of the cells discharged at a significantly higher rate in the early-cue window during learning for at least some VMR conditions (including cells, shown in red in Fig. 4, that discharged at a significantly higher rate under some VMR conditions and those, shown in purple in Fig. 4, that discharged at a significantly higher rate under some VMR conditions but significantly lower rate under other VMR conditions). This proportion was significantly larger than the proportion among the remaining medial PFC cells (P < 0.05, chi-square test) and that in the lateral PFC (P < 0.001, by Fisher test).

Fig. 4.

Cue-period activity during the learning phase of blocks. Proportion of cells that showed higher (red), lower (blue), and higher and lower discharges under different VMR conditions (purple), and those that did not show significant modulations in the learning phase of blocks (black). The populations are separated into medial PFC cells with motor-reward dependent cue-period activity (left), all other medial PFC cells (middle), and all lateral PFC cells (right).

We found two types of reward-related activity immediately after the onset of visual cue presentation: representation of anticipated reward conditions in both the medial and lateral PFC, and representations of particular combinations of motor responses and reward conditions in the medial PFC only (31). We propose that there is a serial, causal relation between these two types of activity and that they constitute neural correlates of motor selection based on the anticipation of reward conditions. When a visual cue was presented to the monkey, the activity representing anticipation of a particular reward condition (reward + or –) occurred, which in turn induced the activity representing a particular motor-reward combination. This selectivity of excitatory connections from reward-dependent cells to motor-reward–dependent cells may have quickly developed at the beginning of individual blocks. The medial PFC may be especially important for motor selection based on the anticipation of reward conditions, because the motor-reward combinatorial representation appeared first in each trial in medial PFC cells. Lesions of the medial PFC impair reward-based selection of motor response (32), although it is not clear whether in this previous study the selection was based on the experienced or anticipated reward. We also found that medial PFC cells showed higher levels of activity in the learning phase at the beginning of individual blocks, which is consistent with a previous study (33). During learning, the neural circuitry has to adapt to the new VMR contingency, so representations of all different motor-reward combinations may be activated simultaneously as potential action-outcome relations for that block. The activity representing the appropriate combination (that yields the correct solution, according to the task contingency in that block) has to be stronger in order to suppress activity representing other combinations and win the competition.

Cells in the cingulate motor area are highly active before the monkey makes a change in movement type instructed by a decrease of reward amount (34). These neurons are selective for the direction of movement change. Thus, these discharges can be considered to represent combinations of motor and reward information, with reward information being interpreted either as the experienced reduction or as the anticipated full amount. The medial recording region of our study was located rostral to the cingulate motor area with no or minimal overlap. The action-outcome relation may be commonly represented in these medial frontal regions. It has been proposed that the human medial PFC contributes to cognitive control of behavior by monitoring the conflict of responses (20, 21, 35). In the tasks used in these experiments, a sensory stimulus activates multiple action plans, one of which has to be selected in order to achieve the goal. Information about action-outcome relations can be used to solve the conflicts between motor responses. We suggest that the medial PFC contains memory for the action-outcome relation and either evokes an action plan that will achieve the intended goal or selects one out of multiple attempted actions based on the current task contingency (36).

Supporting Online Material

www.sciencemag.org/cgi/content/full/301/5630/229/DC1

Methods

SOM Text

Figs. S1 to S7
