Research Article

Matching Behavior and the Representation of Value in the Parietal Cortex


Science  18 Jun 2004:
Vol. 304, Issue 5678, pp. 1782-1787
DOI: 10.1126/science.1094765

Abstract

Psychologists and economists have long appreciated the contribution of reward history and expectation to decision-making. Yet we know little about how specific histories of choice and reward lead to an internal representation of the “value” of possible actions. We approached this problem through an integrated application of behavioral, computational, and physiological techniques. Monkeys were placed in a dynamic foraging environment in which they had to track the changing values of alternative choices through time. In this context, the monkeys' foraging behavior provided a window into their subjective valuation. We found that a simple model based on reward history can duplicate this behavior and that neurons in the parietal cortex represent the relative value of competing actions predicted by this model.

Natural environments are characterized by uncertainty in both the sources and timing of rewards (1). Humans and other animals are sensitive to these variables and adapt the statistics of their foraging behavior to those of the environment (1–4). Specifically, animals distribute their time among foraging sites in proportion to their relative value (5), i.e., the relative abundance of resources at each site. This phenomenon, called matching behavior, was studied experimentally by Herrnstein, who expanded it into a general principle of choice that he termed the matching law (6–9). Stated mathematically, the matching law asserts that the fraction of choices made to any option will exactly match the fraction of total income (i.e., total rewards) earned from that option, or

$$\frac{C_k}{\sum_j C_j} = \frac{I_k}{\sum_j I_j},$$

where $I_k$ and $C_k$ are the total income earned and total choices on option k, respectively, and the summations are over all available options.
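The matching relation is easy to check numerically. The following Python fragment is an illustration of the formula above, not code from the study; the function name is ours:

```python
def matching_fractions(incomes, choices):
    """Return (fractional choice, fractional income) per option.

    Under the matching law, C_k / sum_j C_j = I_k / sum_j I_j,
    so the two returned lists coincide for a perfectly matching player.
    """
    total_i = float(sum(incomes))
    total_c = float(sum(choices))
    frac_income = [i / total_i for i in incomes]
    frac_choice = [c / total_c for c in choices]
    return frac_choice, frac_income

# A hypothetical player that earned incomes 300 vs. 100 and allocated
# choices 600 vs. 200 matches exactly: both fractions are [0.75, 0.25].
fc, fi = matching_fractions(incomes=[300, 100], choices=[600, 200])
assert fc == fi == [0.75, 0.25]
```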

To match behavior to income, animals must integrate rewards earned from particular behaviors, and the brain, in turn, must maintain an appropriate representation of the value [i.e., reward frequency (5)] of competing alternatives. Matching provides a behavioral readout of this internal representation. By studying matching in the context of visually based eye movement behavior, we aim to leverage our knowledge of the anatomy and physiology of the primate visual and oculomotor systems to investigate how value is represented at the neural level. For this purpose, we trained rhesus monkeys (Macaca mulatta) to perform a dynamic version of a classical matching task in which saccadic eye movements to a pair of competing visual targets are rewarded at different rates (Fig. 1A) (10).

Fig. 1.

Matching behavior in monkeys. (A) The sequence of events of an oculomotor matching task: (i) Fixate. To begin a run of trials, the animal must fixate the central cross. (ii) Delay. Saccade targets appear (randomized spatially by color) in opposite hemifields while the animal maintains fixation. (iii) Go. Dimming of the fixation cross cues a saccadic response and hold. (iv) Return. Brightening of the fixation cross cues return, target colors are then rerandomized, and the delay period of the next trial begins. Reward is delivered at the time of the response, if at all. Overall maximum reward rate is set at 0.15 rewards per second. Relative reward rates changed in blocks (∼100 to 200 trials) without warning; ratios of reward rates were chosen unpredictably from the set {8:1, 6:1, 3:1, 1:1}. (B) Dynamic matching behavior. Representative behavior of Monkey G during a single session. Continuous blue curve shows cumulative choices of the red and green targets. Black lines show average ratio of incomes (red:green) within each block (here, 1:1, 1:3, 3:1, 1:1, 1:6, and 6:1). Matching predicts that the blue and black curves are parallel. (C) Slope space. Same data as in (B), plotted to allow visualization of ongoing covariation in local ratios of income and choice. The x axis shows session time (in choices). The y axis shows running estimates of the ratios of income (black) or choice (blue). Ratios were computed after smoothing the series of rewards or choices with a causal half-Gaussian kernel (SD of six choices) and are expressed as slopes (arctangent of ratio). Thick horizontal black and blue lines indicate average income and choice ratios within each block. Red asterisks highlight example regions where the choice ratio obviously tracks local noise in the experienced ratio of incomes.
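The running estimates in Fig. 1C rest on smoothing with a causal half-Gaussian kernel. A minimal Python sketch of such a filter follows; this is our own reconstruction, and the truncation of the kernel at 4 standard deviations is an assumption, not a detail given in the caption:

```python
import math

def causal_half_gaussian_smooth(series, sd=6.0):
    """Smooth a series with a causal half-Gaussian kernel.

    Only current and past samples contribute (k >= 0), so the estimate
    at trial t never peeks at future choices or rewards. The kernel is
    truncated at 4 standard deviations (an arbitrary cutoff of ours).
    """
    width = int(4 * sd)
    kernel = [math.exp(-0.5 * (k / sd) ** 2) for k in range(width + 1)]
    norm = sum(kernel)
    out = []
    for t in range(len(series)):
        acc = sum(kernel[k] * series[t - k]
                  for k in range(min(t + 1, width + 1)))
        out.append(acc / norm)
    return out

# A constant series smooths to its own value once the kernel is filled.
smoothed = causal_half_gaussian_smooth([1.0] * 40)
assert abs(smoothed[-1] - 1.0) < 1e-12
```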

A dynamic foraging task. On each trial in this task, the monkey is free to choose between two targets; the color of each target cues the probability that its selection with an eye movement will be rewarded with a drop of juice. Analogous to natural environments, rewards in this task are assigned to the two colors at rates that are independent and stochastic (Poisson probability distribution). Once assigned, a reward remains available until the associated color is chosen (11). This persistence of assigned rewards means that the likelihood of being rewarded increases with the time since a color was last chosen, and ensures that matching approximates the optimal probabilistic strategy in this task (12–14).
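A discrete-trial sketch of this reward schedule makes the persistence property concrete. This is our simplification (Poisson arrival approximated by a fixed per-trial baiting probability; all names are ours):

```python
import random

def run_baiting(p_red, p_green, policy, n_trials, seed=0):
    """Simulate the baited two-color schedule for a given choice policy.

    Each trial, a reward is independently assigned ('baited') to each
    color with a fixed probability; once baited, the reward persists
    until that color is next chosen. `policy(rng)` returns 'red' or
    'green'. Returns the number of rewards collected.
    """
    rng = random.Random(seed)
    baited = {"red": False, "green": False}
    rewards = 0
    for _ in range(n_trials):
        for color, p in (("red", p_red), ("green", p_green)):
            if rng.random() < p:
                baited[color] = True
        choice = policy(rng)
        if baited[choice]:
            rewards += 1
            baited[choice] = False
    return rewards

# Because baits persist, the chance that an unchosen color is baited
# grows as 1 - (1 - p)^n with the number n of trials since it was
# last chosen -- the property that makes matching near-optimal here.
```

As a sanity check, if red is baited on every trial (p_red = 1) and the policy always chooses red, every trial is rewarded.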

Figure 1B depicts representative behavioral data from a single session in which a monkey experienced a series of six different ratios of reward rates. Two features of these data are notable. First, the blue line generally parallels the black, indicating that the monkey indeed matched the ratio of its choices to the ratio of incomes from the two colors, as predicted by the matching law. Second, the monkey appears to adjust its behavior very rapidly to unsignaled changes in the rates of reward.

The income ratios indicated by the black lines in Fig. 1B represent mean reward rates and obscure the stochastic manner in which rewards become available in the task. We can visualize this variability by plotting instantaneous estimates of choice and income ratios (Fig. 1C). These estimates suggest that the relationship between choices and experienced rewards is highly local in time. This is evident both at the transitions between income ratios, when behavior lawfully and rapidly adjusts to unsignaled changes in the rates of reward, and within blocks, when choices track local variability in the experienced income ratio (red asterisks). If behavior were based on a representation of reward history that extended into the distant past, it would be incapable of such rapid adjustment (15).

Traditionally applied to foraging in stationary environments (for which reward rates do not change), the matching law relates cumulative choice to total experienced income and is intrinsically a global description of behavior averaged over long periods of time. The data in Fig. 1, B and C, confirm an earlier study of rats by Gallistel and colleagues (16), in which they observed that animals accustomed to dynamic environments can match under such conditions. Our data further suggest that this behavior is driven by a process that is intrinsically local in time. These results prompt us to ask whether the classical matching law can be reformulated as a more local description and whether this description can explain the behavior that we observed.

A local formulation of matching. Income earned during a behavioral session is simply the integrated reward stream that an animal has experienced (Fig. 2A). In the traditional matching law, each new reward contributes equally to the income attributed to a particular option without discount or decay. The fractional income for a particular option (the income from that option divided by the total income from all available options) then dictates the proportion of choices allocated to it. But what if our integrator were not perfect, but somewhat leaky (Fig. 2B)? This leak would confer a finite effective memory on estimates of income, making them local rather than global. In this model, the local fractional income translates directly into the instantaneous probability of choice for a given option (17). Importantly, this proposed local matching rule obeys the correspondence principle: When limited to large data sets and stationary environments (where matching has been most extensively documented), the predictions of our local matching rule approximate those of the classical global matching law. We show below that the leaky integration model is surprisingly successful at describing behavior in our dynamic task.
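A minimal Python sketch of this leaky-integration rule follows (function names are ours; reward histories are binary per-trial vectors, as in Fig. 2B):

```python
import math

def choice_probability(red_rewards, green_rewards, tau):
    """Local matching rule: instantaneous probability of choosing red.

    Each history is a list of 0/1 reward outcomes with the most recent
    trial last. Local income is the history filtered by a causal
    exponential with time constant tau (in trials); the probability of
    choosing red is red's local fractional income.
    """
    def local_income(rewards):
        n = len(rewards)
        return sum(r * math.exp(-(n - 1 - t) / tau)
                   for t, r in enumerate(rewards))
    i_red = local_income(red_rewards)
    i_green = local_income(green_rewards)
    total = i_red + i_green
    return 0.5 if total == 0 else i_red / total

# Identical histories give indifference; a one-sided history gives
# certainty; a more recent red reward tilts choice toward red.
assert choice_probability([1, 0, 1], [1, 0, 1], tau=9) == 0.5
assert choice_probability([1, 1, 1], [0, 0, 0], tau=9) == 1.0
assert choice_probability([0, 0, 1], [1, 0, 0], tau=9) > 0.5
```

With a very large tau the exponential weights approach equality and the rule reduces to the classical global matching law, which is the correspondence principle noted above.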

Fig. 2.

A model of dynamic matching behavior. (A) Equation (top) shows a restatement of the classical global matching law, relating fractional income to fractional choice (stated here in terms of the red target). Schematic (bottom) shows that in global matching, cumulative income, I, is computed by perfect integration of the stream of rewards up to the current time, t. (B) Equation (top) shows a local implementation of the matching law, relating local fractional income to instantaneous probability of choice, pc. Schematic (bottom) shows that local income, Î, is computed with the use of a leaky integrator with time constant τ. In practice, the monkey's history of choices and rewards on each color was represented as a vector of 1's and 0's, indicating rewarded and unrewarded choices, respectively. The individual reward histories were then convolved with the corresponding exponential filter to compute the local income for each color. In (C) to (F), monkey behavior is illustrated in blue and the behavior of the model in black. (C) Percentage of available rewards collected as a function of τ (in choices). Model performance (thick black curve; gray bands indicate standard error) is based on simulations of 250,000 trials on block sequences identical to those presented to the animals. Bounds for chance and idealized strategies are shown for reference (horizontal black lines). For the behavior of each monkey, blue circles show performance and best-fit values of τ with standard errors (vertical and horizontal lines, respectively). (D) Behavior of Monkey G and of the best-performing model (τ = 9) for the same single experiment shown in Fig. 1. Circles indicate block transitions. (E) Probability of choosing the red target plotted as a function of the local fractional income from red. The unity line corresponds to idealized behavior of the model. 
For the monkeys, the best-fit τ was used to compute fractional income, and probability of choice with standard error (small bars within the circles) was plotted for each of 10 equally populated fractional income bins. (F) Relative frequencies of stays of different duration for monkeys (combined) and model (τ = 9). A stay corresponds to a series of successive choices to one color.

Postulating a process of leaky integration marks a conceptual shift from the parameterless matching law, appropriate for stationary reward conditions, to a one-parameter model of matching behavior appropriate for dynamic conditions. The single parameter in this simple model is the time constant τ of the leaky integrator. How do changing values of τ affect the model's behavior? Intuitively, higher values of τ mean slower leaks and would give rise to more stable and accurate estimates of income. The cost of such reliable estimates is that they respond sluggishly to changes in the environment. Conversely, lower values of τ produce estimates of income that respond quickly to change, but are substantially noisier during periods of stability. Given this trade-off between accuracy and adaptability, what value of τ yields the highest income given the statistics of our task?

To answer this question, we simulated the behavior of the model on our task and examined how its performance varied as a function of the integration time constant τ. Each simulation consisted of a quarter of a million choices made by a model with a particular τ across the identical sequence of reward-rate ratios and block lengths encountered by our monkeys. The thick black curve in Fig. 2C plots the outcome of these simulations in terms of foraging efficiency (the percentage of the maximum reward rate achieved). We also plot realistic bounds for performance imposed by the structure of the task. The upper bound demarcates the average performance of an ideal probabilistic forager. This hypothetical ideal strategy “knows” the reward rate of each option, thereby dispensing with the estimation process, and uses this information to make choices that maximize its expected rate of reward. In contrast, the lower bound shows the average performance of a completely random foraging strategy and represents chance performance in our task. Despite its simplicity, the best-performing leaky integrator model does well relative to these bounds, collecting 93% of the rewards attained by the ideal clairvoyant strategy.
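Such a simulation can be sketched as follows. This is our reconstruction with illustrative block lengths and bait probabilities, plus a small exploration floor to keep the toy model away from degenerate all-one-color behavior; none of these specifics are from the paper:

```python
import math
import random

def simulate(tau, n_trials=5000, seed=1):
    """Leaky-integrator matcher foraging in a baited two-color world.

    Blocks of 150 trials alternate which color is 'rich' (bait
    probability 0.2 vs. 0.05). Returns the fraction of trials rewarded.
    """
    rng = random.Random(seed)
    decay = math.exp(-1.0 / tau)
    income = {"red": 1e-6, "green": 1e-6}      # leaky income estimates
    baited = {"red": False, "green": False}
    rewards = 0
    for t in range(n_trials):
        rich = "red" if (t // 150) % 2 == 0 else "green"
        for c in ("red", "green"):
            if rng.random() < (0.2 if c == rich else 0.05):
                baited[c] = True               # bait persists until chosen
        p_red = income["red"] / (income["red"] + income["green"])
        p_red = min(max(p_red, 0.01), 0.99)    # exploration floor (ours)
        choice = "red" if rng.random() < p_red else "green"
        got = 1 if baited[choice] else 0
        baited[choice] = False
        rewards += got
        for c in ("red", "green"):             # leaky update of incomes
            income[c] = decay * income[c] + (got if c == choice else 0)
    return rewards / n_trials
```

Sweeping tau in such a simulation traces out an efficiency curve of the kind shown in Fig. 2C: very small tau gives noisy income estimates, very large tau adapts too slowly to block transitions, and intermediate values do best.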

How does our model compare to the choices of real biological players? Our monkeys' behavior, indicated by the blue circles on the same panel, corresponds well with the predictions of the model. We estimated τ for each monkey by minimizing the mean squared error between the probability of choice predicted by the model and the animal's actual binary choices across all experiments. For each monkey, this best-fit τ lies within a standard error of the best-performing model. Foraging efficiency was estimated as the percentage of the maximum reward rate achieved by each monkey across all experiments. These performance levels fall just below that of the model with a similar time constant, an unsurprising outcome given that the monkeys, unlike the model, are susceptible to variables such as distraction and satiation.
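The fitting procedure can be sketched as a grid search; this is our reconstruction, since the text specifies only that tau minimized the mean squared error between predicted choice probabilities and binary choices:

```python
import math

def mse_for_tau(choices, rewards, tau):
    """MSE between model-predicted P(red) and actual binary choices.

    `choices` is a list of 'red'/'green'; `rewards` gives the 0/1
    outcome of each trial. Incomes are updated by a leaky integrator
    with time constant tau, exactly as in the local matching rule.
    """
    decay = math.exp(-1.0 / tau)
    income = {"red": 0.0, "green": 0.0}
    err = 0.0
    for choice, r in zip(choices, rewards):
        total = income["red"] + income["green"]
        p_red = 0.5 if total == 0 else income["red"] / total
        y = 1.0 if choice == "red" else 0.0
        err += (y - p_red) ** 2
        for c in ("red", "green"):
            income[c] = decay * income[c] + (r if c == choice else 0.0)
    return err / len(choices)

def fit_tau(choices, rewards, taus=range(1, 51)):
    """Best-fit tau by exhaustive grid search over integer candidates."""
    return min(taus, key=lambda tau: mse_for_tau(choices, rewards, tau))

# A player that always chooses red and is always rewarded is mispredicted
# only on the first trial (p = 0.5), so the MSE is 0.25 / n.
assert abs(mse_for_tau(["red"] * 4, [1] * 4, tau=9) - 0.0625) < 1e-12
```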

The next three panels of Fig. 2 further explore the similarity between the behavior of the best model and that of our monkeys. Figure 2D shows the cumulative responses of the best model (τ = 9 choices) across the same series of blocks shown in Fig. 1C for Monkey G. Qualitatively, the model exhibits dynamic matching behavior that is very similar to that of the animal. The next two panels (Fig. 2, E and F) reinforce this impression with more quantitative comparisons. First, the model predicts that the probability of choosing red will vary linearly with the local fractional income from red (the unity line in Fig. 2E). Figure 2E shows this to be approximately true for the behavioral data. Second, because the model is strictly probabilistic, it predicts that the number of successive trials on which a player (monkey or model) will choose a given color before switching will be distributed as the average of a family of exponentials. Figure 2F plots these distributions of stay durations; not only is the monkeys' distribution exponential, but it is almost an exact fit to that of the model with the best-performing τ (18).
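The stay-duration statistic of Fig. 2F is simple to compute; a sketch (names ours):

```python
def stay_durations(choices):
    """Lengths of runs of successive choices to the same color.

    A purely probabilistic chooser produces geometrically (i.e.,
    discretely exponentially) distributed run lengths, which is the
    signature compared between monkeys and model in Fig. 2F.
    """
    if not choices:
        return []
    runs, count = [], 1
    for prev, cur in zip(choices, choices[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return runs

# 'RRGGGR' contains a stay of 2 on red, 3 on green, then 1 on red.
assert stay_durations(list("RRGGGR")) == [2, 3, 1]
```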

These similarities in qualitative behavior, foraging performance, fitted τ, and simple statistics demonstrate that our local matching rule is an adequate descriptive model of real choice behavior in this dynamic foraging task (19). Moreover, they suggest that our animals have tuned the time over which they integrate reward information to be optimal for the particular statistics of the task they encountered. The importance of this modeling effort goes beyond its utility in understanding behavior. The model provides us a window into the animal's internal valuation of available options and gives us a metric—local fractional income—that allows us to estimate how the monkey values each of the two colors on every trial, even before it renders a decision. Equipped with this quantitative trial-by-trial measure, we are poised to explore how value is represented in the brain.

The representation of fractional income in the parietal cortex. The lateral intraparietal (LIP) area of the posterior parietal cortex contains activity appropriate for guiding saccadic eye movements, signals that have been variously interpreted as working memory for visual targets, attention to salient spatial locations, or motor planning (20–23). In the context of more sophisticated eye movement tasks, investigators have documented the modulation of LIP activity by the strength of sensory evidence that supports a perceptual judgment (24–26) and by both the prior probability that a particular movement is instructed and the volume of juice associated with that movement (27). Such encoding of information from diverse sources is a proposed property of brain areas responsible for computing putative decision variables that link sensory information to motor responses (28). If this suggestion is correct, and LIP is indeed an important locus for oculomotor decisions, then in a setting where eye movement decisions are informed by reward history and expectation, we anticipate the appropriate decision variable to be represented in LIP. Accordingly, the following physiological experiments test the prediction that in the matching task, neurons in LIP encode the local fractional income (Fig. 2B) of competing target colors.

We selected for study LIP neurons that showed sustained, spatially selective activity in the context of a classical delayed saccade task (Fig. 3A). These neurons respond only when targets are presented within a circumscribed region of the visual field termed the cell's response field (RF). Approximately one-third of the cells that we encountered in LIP met this criterion, including 33 neurons from the left hemisphere of Monkey G and 29 from the right hemisphere of Monkey F.

Fig. 3.

Activity of an LIP neuron during the delayed saccade task. (A) Delayed saccade task used for cell selection. Only a single purple target is presented on each trial at one of a variety of spatial locations. The sequence of events is otherwise identical to Fig. 1, and rewards were delivered at the same overall rate used in the matching task. Dotted blue oval represents LIP response field (RF). (B) Response histograms of an example cell for trials into (blue) and out of (green) this cell's RF demonstrate classical LIP spatial selectivity. Activity is aligned on both the appearance of the visual target (left) and the time of the saccade (right). The break in the time axis reflects this dual alignment for trials of variable length. spks/sec, spikes per second. (C) Activity during the delayed saccade task shows no dependence on recent reward history. For trials into (blue) and out of (green) the cell's RF, average delay-period activity is plotted against the local income resulting from the single purple target. Local income was estimated by filtering reward history with a local exponential with the same best-fit τ that was used to compute fractional income in Fig. 2.

Figure 4A illustrates how we studied these 62 LIP neurons in the matching context. Critically, in this setting, trials that shared an identical visual stimulus configuration and ended in the same motor response still varied widely in the local fractional income of the chosen target. Thus, on some trials the monkey chose the target inside the cell's RF and this target had a high fractional income, whereas on other trials the fractional income was much lower. Our experimental question was whether, within each category of motor response, activity in LIP is influenced by the local fractional income of the chosen target.

Fig. 4.

Activity in LIP during the matching task. (A) Task geometry used in the matching task. Dotted blue oval represents RF of the LIP cell under study. Color-location association was randomized between trials. (B) Representative matching data from the same example LIP neuron shown in Fig. 3. For each trial, mean delay-period activity is plotted as a function of the local fractional income of the chosen target. Blue and green indicate choices into and out of the RF respectively. Lines are least squares regressions fit to the corresponding points (blue: slope = 11.4, r = 0.4, P < 0.001; green: slope = –19.9, r = 0.58, P < 0.001). (C) Distribution of slopes for regression of firing rate on local fractional income across our population of 62 LIP neurons. Separate distributions show effect for choices into (upper, blue) and out of (lower, green) each cell's RF; 95% confidence intervals for means of these distributions are 2.1 to 4.5 and –6.6 to –4.0, respectively. Filled bars highlight regressions that are significant at the 0.05 level. Asterisks indicate the example cell.

Figure 4B shows representative data from the same cell featured in Fig. 3, now recorded during performance of the matching task. For each trial, the cell's mean delay-period response is plotted against the local fractional income of the chosen target. Activity is shown separately for trials that end in saccades into (blue) and out of (green) the cell's RF. We observed a positive correlation between firing rate and fractional income for choices into the RF and a negative correlation for choices out of the RF. The solid lines are regressions fit to these two sets of data by the method of least squares and are characterized by positive and negative slopes for choices into and out of the RF, respectively. When the fractional income of the chosen color is low, the clouds of blue and green points overlap, indicating that the activity of this particular cell is no longer a reliable indicator of the direction of the monkey's saccade at the end of the trial. This result is particularly notable given that this cell was chosen for its spatial selectivity in the delayed saccade task (Fig. 3, A and B).

To see how the effect of fractional income varies across our population of LIP cells, we performed this regression analysis for each neuron in our sample. Figure 4C shows the resulting regression slopes. The upper histogram (blue) is the distribution of slopes for choices into each cell's RF. Consistent with the example in Fig. 4B, this distribution is centered to the right of zero, indicating positive regressions of activity on fractional income. The lower histogram (green) is the analogous distribution of slopes for choices out of each cell's RF. Again, in keeping with the example, this distribution is centered to the left of zero, indicating negative regressions of activity on fractional income. Importantly, in the delayed saccade task, none of these neurons showed any influence of recent reward history as described by the local income from the lone response target (Fig. 3C).
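The per-cell statistic summarized in Fig. 4C is an ordinary least-squares slope of firing rate on local fractional income; a self-contained sketch (pure Python, names ours):

```python
def ols_slope(x, y):
    """Least-squares slope of y on x (e.g., delay-period firing rate
    on the local fractional income of the chosen target)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

# A hypothetical cell whose rate rises 4 spikes/s for every 0.4 step in
# fractional income has slope 10 (positive, as for into-RF choices).
slope = ols_slope([0.1, 0.5, 0.9], [10.0, 14.0, 18.0])
assert abs(slope - 10.0) < 1e-9
```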

The preceding analysis assumes that all eye movements to a particular target are effectively equivalent. To control for the possibility that subtle variations in the precise metrics of saccades to the same target location might cause changes in LIP firing rates, we expanded our regression model of each cell's response to include a range of saccade metrics as co-regressors (29). If our results actually reflect a subtle effect of saccade metrics, explicit inclusion of these factors should nullify the apparent influence of fractional income. Instead, the 43 cells that showed a significant effect of fractional income continued to do so after the inclusion of these co-regressors (95% confidence interval for the fractional income coefficient still excluded zero), and the magnitude of this effect was largely unchanged (average decrease in effect size = 14%).
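This control analysis amounts to a multiple regression in which saccade metrics enter as co-regressors alongside fractional income. A minimal two-regressor sketch via the normal equations follows; it is our illustration, and the study's actual set of co-regressors and software are not specified here:

```python
def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with
    partial pivoting (sufficient for the normal equations below)."""
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x

def regress2(y, x1, x2):
    """OLS fit of y = b0 + b1*x1 + b2*x2: e.g., firing rate on
    fractional income (x1) with one saccade metric (x2) partialed out.
    If b1 survives the inclusion of x2, the income effect is not an
    artifact of saccade metrics."""
    X = [[1.0, a, b] for a, b in zip(x1, x2)]
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]
    return solve3(xtx, xty)

# Synthetic check: data generated as y = 2 + 3*x1 + 0.5*x2 is recovered.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0]
y = [2 + 3 * a + 0.5 * b for a, b in zip(x1, x2)]
b0, b1, b2 = regress2(y, x1, x2)
assert abs(b0 - 2.0) < 1e-9 and abs(b1 - 3.0) < 1e-9 and abs(b2 - 0.5) < 1e-9
```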

To examine the time course of the effect of fractional income across the population, we peak-normalized the firing rates of the 43 cells that showed a significant regression effect and computed the average time course of the cells' response as a function of fractional income. Figure 5 plots these average normalized rates for the population. Two important points emerge from this analysis. First, the effect of fractional income is not apparent at the beginning of the trial, but emerges over time. Second, activity remains graded with respect to fractional income up to the time of the saccade itself, irrespective of whether this saccade is directed into (blue) or out of (green) the cell's RF. This suggests that this population of LIP neurons encodes information about the value of locations in space, whether or not they are the endpoint of the impending saccade.

Fig. 5.

Time course of the effect of local fractional income. Response histograms show average peak-normalized firing rates for 43 cells with activity that regressed significantly on fractional income. Before normalization and averaging, raw spike trains were smoothed with a Gaussian kernel (SD = 20 ms). Activity is aligned on both the appearance of the visual target (left), and the time of the saccade (right), and is shown separately for choices into (blue) and out of (green) the cell's RF. Trials are further subdivided into four groups according to the local fractional income of the chosen target: solid thick lines, 0.75 to 1.0; solid medium lines, 0.5 to 0.75; solid thin lines, 0.25 to 0.5; dotted thin lines, 0 to 0.25.

Discussion and conclusions. Matching belongs to a class of behaviors purported to engage cognitive mechanisms that animals use when competing for resources in stochastic environments. Because matching results in an equilibrium state in which returns from competing behaviors are equalized, it represents a stable and effective foraging strategy from both an evolutionary and a game-theoretic perspective.

Somewhat surprisingly, we find that matching behavior in a dynamic context is well described by a simple local reformulation of the classical matching law. This local matching rule uses leaky integrators of rewards to estimate the local income earned from competing behaviors and sets the instantaneous probability of choosing an alternative equal to its local fractional income. This simple model has only one tuned parameter: the decay constant (τ) of the integrators. Intriguingly, we found that the specific values of τ used by our animals were optimally tuned for the statistics of the environment they encountered in this task. By manipulating overall reward rate and the dynamics with which rates change, future experiments may address whether animals can flexibly adjust the time scale of their integration to maintain this optimality.

Previous studies have documented reward- and value-related signals in numerous cortical and subcortical areas [reviewed in (30)], but primarily in the context of imperative tasks where behavioral responses are directly instructed or conditioned. We suggest that elucidating the functional roles of these signals will require studying them in settings where value itself is the primary determinant of behavior. Our current work marks an initial step in this direction, as does ongoing work in other laboratories (31, 32). Interpreting neural activity in such “free choice” contexts necessitates a further methodological shift from correlations with directly accessible sensory or behavioral events to quantitative modeling of the ostensibly “hidden” variables that link experience to action. During performance of the matching task, we found that the activity of single LIP neurons parametrically encoded trial-to-trial fluctuations in the pertinent decision variable: local fractional income. This result supports the suggestion that area LIP plays a role in implementing oculomotor decisions and extends the findings of previous studies of LIP activity in the context of visual motion discrimination tasks (24–26) to the realm of value-based choice.

Is local fractional income actually computed in LIP? Although activity in area LIP is correlated with this value metric, it is unlikely the primary locus where fractional income is computed and maintained. A population of neurons whose activity directly encoded value should do so in terms of the relevant value cue (in this case, color, not space) and maintain that representation across an appropriate time scale (in this case, several trials) (Fig. 2B). In contrast, income-related signals in LIP are spatially mapped and are “reset” at the start of each trial, developing anew over the first several hundred milliseconds (Fig. 5). An important direction for future research will be to identify where value is first explicitly encoded in the brain and how this representation is conferred with a temporal profile appropriate for optimal behavior.

Rather than computing value, we suggest that area LIP plays a critical role in remapping abstract valuation to concrete action. This remapping is demanded by the logic of our task: On every trial the monkey must transform a color-based representation of value into a spatial eye-movement plan. By representing value in spatial terms, LIP may contribute to this transformation and directly influence the probability that a particular region of space will serve as the endpoint of the next saccade. This interpretation is consistent with the unifying proposal that area LIP functions as a saliency map of visual space (33) of the type invoked in visual psychophysics or computational vision (34, 35), capable of flexibly combining and representing a variety of information for the purpose of guiding eye movements or shifts in visual attention.

Supporting Online Material

www.sciencemag.org/cgi/content/full/304/5678/1782/DC1

Materials and Methods

Fig. S1

References and Notes