Report

Orbitofrontal Cortex Supports Behavior and Learning Using Inferred But Not Cached Values

Science  16 Nov 2012:
Vol. 338, Issue 6109, pp. 953-956
DOI: 10.1126/science.1227489

Experience Versus Models

There is an ongoing debate over what the orbitofrontal cortex contributes to behavior, learning, and decision-making. Jones et al. (p. 953) found that the orbitofrontal cortex was important for value-based computations when value must be inferred from an associative model of the task but not when value estimates based on previous experience are sufficient. This result calls into question the assumption that this region simply signals economic value. However, it would be consistent with a concept of the orbitofrontal cortex as being important for constructing model-based representations of the world that are orthogonal to value.

Abstract

Computational and learning theory models propose that behavioral control reflects value that is both cached (computed and stored during previous experience) and inferred (estimated on the fly on the basis of knowledge of the causal structure of the environment). The latter is thought to depend on the orbitofrontal cortex. Yet some accounts propose that the orbitofrontal cortex contributes to behavior by signaling “economic” value, regardless of the associative basis of the information. We found that the orbitofrontal cortex is critical for both value-based behavior and learning when value must be inferred but not when a cached value is sufficient. The orbitofrontal cortex is thus fundamental for accessing model-based representations of the environment to compute value rather than for signaling value per se.

Computational and learning theory accounts have converged on the idea that reward-related behavioral control reflects two types of information (1–3). The first is derived from habits, policies, or cached values. These terms reflect underlying associative structures that incorporate a precomputed value stored during previous experience with the relevant cues. Behaviors based on this sort of information are fast and efficient but do not take into account changes in the value of the expected reward. This type of information contrasts with the second category, referred to as goal-directed or model-based, in which the value is inferred from knowledge of the associative structure of the environment, including how to obtain the expected reward, its unique form and features, and current value. The associative model is stored, but a precomputed value is not. Rather, the value is computed or inferred on the fly when it is needed. Whereas behavior based on inferred value is slower, it can be more adaptive and flexible.
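The distinction between cached and inferred value can be illustrated with a minimal sketch. Everything in it (the cue name, the outcome name, and the numeric values) is an assumption chosen for illustration and is not taken from the study; it is a toy contrast, not a model of the circuitry.

```python
# Illustrative sketch only: names and numbers are assumptions, not study data.

# Cached value: a scalar stored with each cue during past experience.
cached_value = {"tone": 1.0}

# Associative model: which outcome each cue leads to (cue -> outcome).
transitions = {"tone": "sucrose"}

# Current worth of each outcome, consulted only when a value is needed.
outcome_value = {"sucrose": 1.0}


def cached(cue):
    """Return the precomputed value stored with the cue."""
    return cached_value.get(cue, 0.0)


def inferred(cue):
    """Compute value on the fly from the model and the outcome's current worth."""
    outcome = transitions.get(cue)
    return outcome_value.get(outcome, 0.0)


# Change the outcome's worth without any new experience of the cue.
outcome_value["sucrose"] = 0.0

print(cached("tone"))    # unchanged: still the old stored value
print(inferred("tone"))  # follows the change in outcome worth
```

The cached lookup is a single dictionary read, which is why it is fast but insensitive to the change; the inferred lookup takes an extra step through the model, which is why it is slower but flexible.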

Although evidence suggests that different brain circuits mediate their respective influences (1–3), much of cognitive neuroscience, and particularly neuroeconomics, does not attend to these distinctions. For example, proposals for a common neural currency to allow the comparison of incommensurable stimuli (e.g., apples and oranges) typically do not clearly specify the associative structure underlying the value computation. And because economic value is typically measured through revealed preferences, with no explicit control for the source of the underlying value, it would, by default, include both cached and inferred value, at least as defined by computational and learning theory accounts (1–3).

The calculation of economic value is often assigned to the orbitofrontal cortex (OFC), a prefrontal area heavily implicated in value-guided behavior (4–6). Yet behavioral studies across species implicate this region broadly, not in value-guided decisions per se, but rather in behaviors that require a new value to be estimated after little or no direct experience (7–14). Further, whether the OFC is involved in a behavior often depends on whether learning is required (10, 15, 16), even when that learning does not involve changes in value (17). These data seem to require the OFC to perform one function (anticipating outcomes) in some settings, whereas it performs another (calculating economic value) in others. However, an alternative hypothesis is that the OFC performs the same function in all settings and specifically contributes to value-guided behavior and learning when value must be inferred or derived from model-based representations. We tested this hypothesis in rats using sensory preconditioning and blocking.

In sensory preconditioning, a subject is taught a pairing between two cues (e.g., white noise and tone) and later learns that one of these cues predicts a biologically meaningful outcome (e.g., food) (18). Thereafter, the subject will exhibit a strong conditioned response to both the reward-paired cue and the preconditioned cue. The response to the preconditioned cue differs from the response to the reward-paired cue, in that it cannot be based on a cached value; rather, it must reflect the subject’s ability to infer value by virtue of a knowledge of the associative structure of the task (see supplementary discussion for further details). If the OFC is required only for behavior that requires inferred value, then inactivating it at the time of this test should prevent behavior driven by this preconditioned cue, while leaving unimpaired behavior driven by the reward-paired cue.
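The reasoning above, that responding to the preconditioned cue cannot rest on a cached value, can be sketched with a toy simulation. The learning rate, trial counts, and the simple delta-rule update are assumptions made for illustration; they are not the authors' procedure or analysis. Cue labels follow the A→B and C→D design described in the text.

```python
# Toy sketch of sensory preconditioning; all parameters are assumed.
ALPHA = 0.3  # learning rate (assumed)

cached = {cue: 0.0 for cue in "ABCD"}  # cue -> stored (cached) value
model = {}                             # cue -> event that followed it

# Phase 1, preconditioning: cue-cue pairings with no reward.
for _ in range(12):
    for first, second in (("A", "B"), ("C", "D")):
        model[first] = second
        # No reward is delivered, so the cached value stays at zero.
        cached[first] += ALPHA * (0.0 - cached[first])

# Phase 2, conditioning: B is rewarded, D is not.
for _ in range(36):
    for cue, reward in (("B", 1.0), ("D", 0.0)):
        model[cue] = "food" if reward else "nothing"
        cached[cue] += ALPHA * (reward - cached[cue])

reward_of = {"food": 1.0, "nothing": 0.0}


def inferred_value(cue, depth=3):
    """Follow learned associations from the cue until an outcome is reached."""
    node = cue
    for _ in range(depth):
        node = model.get(node)
        if node is None:
            break
        if node in reward_of:
            return reward_of[node]
    return 0.0


# Probe: A never met reward, so only inference through A -> B -> food gives it value.
print(cached["A"], inferred_value("A"))
print(cached["C"], inferred_value("C"))
```

In this sketch the cached value of A is zero because A was never followed by reward, whereas chaining through the stored associations assigns it the value of food. Removing access to `inferred_value` while leaving `cached` intact mimics the predicted dissociation between cues A and B.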

Cannulae were implanted bilaterally in the OFC of rats [19 controls and 16 inactivated (OFCi)] at coordinates used previously (12, 19) (Fig. 1, A and B). After recovery from surgery, these rats were deprived of food and then trained in a sensory preconditioning task (Fig. 1) (see materials and methods).

Fig. 1

The OFC is necessary when behavior is based on inferred value. Figures show the percentage of time spent in the food cup during presentation of the cues during each of the three phases of training: preconditioning (A and B), conditioning (C and D), and the probe test (E and F). OFC was inactivated only during the probe test. Cannulae positions are shown below; vehicle (black circles), OFCi (gray circles). *P < 0.05, **P < 0.01 or better. Error bars show SEM.

In preconditioning, rats were taught to associate two pairs of unrelated auditory cues (A→B and C→D; clicker, white noise, tone, siren; counterbalanced). Food cup responding was measured during presentation of each cue versus baseline as an index of conditioning; the rats responded at baseline levels to all cues (Fig. 1, A and B). A two-factor analysis of variance (ANOVA) (cue × treatment) comparing the percentage of time spent in the food cup during each cue found no effects (F values < 1.27; P values > 0.29).

In conditioning, rats were taught that one of the preconditioned cues (B) predicted reward. As a control, the other preconditioned cue (D) was presented without reward. Rats learned to discriminate between the rewarded (B) and nonrewarded (D) cue and to increase responding across sessions during the former more than the latter (Fig. 1, C and D). A three-factor ANOVA (cue × treatment × session) revealed significant main effects of cue (F(1,33) = 170.5, P < 0.0001) and session (F(5,165) = 54.75, P < 0.0001) and a significant cue × session interaction (F(5,165) = 64.6, P < 0.0001) but no significant main effect of treatment nor any interactions with treatment (F values < 1.49, P values > 0.19).

In the probe test, we assessed responding to the preconditioned cues (A and C) after infusions of either saline or a γ-aminobutyric acid agonist cocktail containing baclofen and muscimol. Rats received three presentations of B and D, reinforced as in prior training, followed by six unrewarded presentations of A and C, in a counterbalanced design. Both control and OFCi rats exhibited robust responding to the reward-paired cue (B) (Fig. 1, E and F) and not to the cue that was presented without reward (D). An analysis restricted to the first presentation of each cue, before any reward delivery, revealed a significant main effect of cue (F(1,33) = 53.21, P < 0.0001) and no significant effect or interaction with treatment (F values < 1.9, P values > 0.17). However, only controls showed elevated responding to the preconditioned cue (A) that had been paired with the reward-paired cue (B) (Fig. 1E). Controls responded significantly more to this cue than to the preconditioned cue (C), which signals the nonrewarded cue (D), whereas OFCi rats responded to both preconditioned cues similarly and at a level comparable to the responding shown to the cue signaling nonreward (D) (Fig. 1F). A two-factor ANOVA (cue × treatment) indicated a significant main effect of cue (F(1,33) = 14.7, P < 0.001) and a significant interaction between cue and treatment (F(1,33) = 7.33, P < 0.01). Both groups responded significantly more to B than D, and controls responded significantly more to A than to either C or D (Bonferroni post hoc correction; P values < 0.05), whereas OFCi rats responded similarly to these three cues (P values > 0.05).

These results show that the OFC is required when behavior must be based on inferred, but not cached, value. However, they do not address how the OFC is involved in learning. To address this question, we used blocking (20). In blocking, a subject is taught that a cue predicts reward (e.g., tone predicts food); later, that same cue is presented together with a new cue (e.g., light-tone), still followed by reward. If this is done, the subject will subsequently show little conditioned responding to the new cue (e.g., the light in our example). The ability of the original cue to predict the reward is said to block learning. The OFC is not required for this type of blocking (17, 21), which indicates that the OFC is not necessary for modulating learning based on cached value, but this does not address whether the OFC is necessary for blocking on the basis of inferred value. If the OFC performs the same function during learning, then inactivating the OFC during blocking with the preconditioned cue should result in unblocking.
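The logic of blocking and unblocking can be sketched with a simple error-correction rule in the style of Rescorla and Wagner. This choice of rule, the learning rate, the trial count, and the simplification of holding the pretrained cue's contribution fixed are all assumptions for illustration, not the authors' model.

```python
# Toy sketch of blocking with an error-correction rule; parameters are assumed.
ALPHA = 0.2   # learning rate (assumed)
REWARD = 1.0  # maximum associative strength supported by the reward (assumed)


def compound_training(predictor_value, trials=20):
    """Train a pretrained cue together with a novel cue, both followed by reward.

    predictor_value is the value the pretrained cue contributes on each trial,
    whether cached or inferred. It is held fixed here for simplicity.
    Returns the associative strength acquired by the novel cue.
    """
    novel = 0.0
    for _ in range(trials):
        # Prediction error: reward minus the summed prediction of both cues.
        error = REWARD - (predictor_value + novel)
        novel += ALPHA * error
    return novel


blocked = compound_training(predictor_value=1.0)    # reward already fully predicted
unblocked = compound_training(predictor_value=0.0)  # predictor's value unavailable

print(blocked, unblocked)
```

When the pretrained cue already predicts the reward, the prediction error is zero and the novel cue acquires nothing. When the pretrained cue's value is unavailable, as hypothesized for an inferred value during OFC inactivation, the error is large and the novel cue acquires strength.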

To test this, we trained a subset of rats from the experiment described above in an inferred value blocking task (Fig. 2) (see supplementary materials). Rats underwent 2 days of training in which the preconditioned auditory cues (A and C) were presented with novel light cues (X and Y; house light, flashing cue light; counterbalanced). Both pairs of cues were reinforced with the same reward previously paired with B (AX→sucrose; CY→sucrose). Before each session, rats received infusions of saline or the baclofen-muscimol cocktail. Both groups showed a significant increase in responding, and there was no overall difference between the two groups (Fig. 2A). A three-factor ANOVA (treatment × cue × session) demonstrated a significant effect of session (F(1,19) = 16.53, P < 0.001), but there was no significant main effect of treatment nor any interactions with treatment (F values < 1.44, P values > 0.24).

Fig. 2

The OFC is necessary when learning is based on inferred value. Figures show the percentage of time spent in the food cup during presentation of the cues during blocking (A and B) and the subsequent probe test (C and D). OFC was inactivated during blocking. *P < 0.05, **P < 0.01 or better. Error bars show SEM.

One day after blocking, these rats were presented with several AX and CY reminder trials, followed by unreinforced trials in which cues X and Y were presented. Controls exhibited significantly higher conditioned responding to the control cue (Y) than to the blocked cue (X). Indeed, responding to the blocked cue was no greater than baseline (Fig. 2B) (t(1,11) = 0.67, P = 0.52). By contrast, OFCi rats showed increased responding to both cues, consistent with an inability to use inferred value to block learning (Fig. 2B). A two-factor ANOVA (treatment × cue) revealed a significant interaction between cue and treatment (F(1,19) = 7.70, P = 0.012), and controls responded significantly more to Y than to X (Bonferroni post hoc correction; P < 0.05), whereas OFCi rats responded similarly to these two cues (P > 0.05).

These findings demonstrate that the OFC is involved in value-based behavior when the value must be inferred from an associative model of the task but not when the same behavior can be based on a value cached or stored in cues during past experience. This is consistent with previous results implicating the OFC in changes in conditioned responding after reinforcer devaluation (7, 8, 10, 13, 22, 23). Our results confirm that the OFC is required for knowledge of the associative structure at the time of the decision and that those earlier deficits did not arise from some idiosyncratic involvement in taste perception, reward learning per se, or devaluation. This is because, in our task, we did not alter the reward in any way. By inactivating only at the time of the probe test, we show clearly that the OFC is required for using the previously acquired associative structure. Thus, the structure may be stored elsewhere, but it cannot be applied to guide behavior effectively without the OFC. By including an explicit control for cached value, we show that this deficit is specific for inferred value at the time of decision-making. These data are also consistent with several functional magnetic resonance imaging studies showing that neural activity in OFC may be particularly well-tuned to reflect model-based information at the time of decision-making (13, 24). In this regard, it is notable that this same function was also required for modulating learning, because the preconditioned cue also failed to function as a blocker after OFC inactivation. This suggests that the OFC plays a general role in signaling inferred value, which might be used by other brain regions for a variety of purposes, rather than having a special role in the service of decision-making.

It is remarkable that the inferred value signal evoked by this cue resulted in blocking. Blocking normally uses a cue that has been paired directly with reward. Theoretical accounts focus on the fact that the value of the expected reward is already fully predicted, and therefore, there is no prediction error to drive learning (25–27). However, these accounts do not specify the source of this value, and generally, it is assumed to be a sort of cached or general value. In fact, temporal-difference reinforcement learning is specific on this point (3). Our experiment shows that inferred value can also modulate learning by serving as a blocking cue, which allows learning to be modulated, not only by experienced information, but also by inferred knowledge.

These results also contradict the argument that the OFC is specifically tasked with calculating value in a common currency, devoid of information about identity and source, because the OFC was necessary for value-based behavior only when calculation of that value required a model-based representation of the task. Indeed, the OFC is necessary for behavior and learning when only the identity of the reward is at issue (17, 28), which suggests that the OFC functions as part of a circuit that maintains and uses the states and transition functions that make up model-based control systems, rather than as an area that simply calculates general or common value. Indeed, none of these results require that value be calculated within OFC at all. Although radical, such speculation is in line with evidence that other prefrontal regions do as well as, or better than, the OFC in representing general outcome value (29). Moreover, whereas activity in some OFC neurons correlates with economic value, representations are usually much more specific to elements of task structure (29, 30). The OFC may only be necessary for economic decision-making insofar as the value required reflects inferences or judgments analogous to what we have tested here. Data implicating the OFC in the expression of transitive inference (11) or willingness to pay (14) may reflect such a function, because, in each setting, the revealed preferences are expressed after little or no experience with the imagined outcomes. Limited experience, a defining feature of economic decision-making (5), would minimize the contribution of cached values and so bias subjects to rely on model-based information for the values underlying their choices.

Supplementary Materials

www.sciencemag.org/cgi/content/full/338/6109/953/DC1

Materials and Methods

Supplementary Text

Figs. S1 and S2

References (31–36)

References and Notes

  1. Acknowledgments: This work was supported by National Institute on Drug Abuse (NIDA), NIH, F32-031517 to J.L.J., NIDA R01-DA015718 to G.S., funding from Natural Sciences and Engineering Research Council of Canada to A.J.G., and by the Intramural Research Program at NIDA. The opinions expressed in this article are the authors' own and do not reflect the view of the NIH or U.S. Department of Health and Human Services. The authors declare that they have no conflicts of interest related to the data presented in this manuscript.