Reward-Predictive Cues Enhance Excitatory Synaptic Strength onto Midbrain Dopamine Neurons


Science  19 Sep 2008:
Vol. 321, Issue 5896, pp. 1690-1692
DOI: 10.1126/science.1160873


Using sensory information for the prediction of future events is essential for survival. Midbrain dopamine neurons are activated by environmental cues that predict rewards, but the cellular mechanisms that underlie this phenomenon remain elusive. We used in vivo voltammetry and in vitro patch-clamp electrophysiology to show that both dopamine release to reward predictive cues and enhanced synaptic strength onto dopamine neurons develop over the course of cue-reward learning. Increased synaptic strength was not observed after stable behavioral responding. Thus, enhanced synaptic strength onto dopamine neurons may act to facilitate the transformation of neutral environmental stimuli to salient reward-predictive cues.

Dopamine (DA) neurons, originating in the ventral tegmental area (VTA) and substantia nigra and projecting to forebrain areas, are essential for the expression of goal-directed behaviors for both natural rewards and drugs of abuse (1–3). DA neurons are initially phasically activated by primary rewards such as food but shift their activation to reward-predictive stimuli after extended conditioning (4). Although DA signaling appears to be plastic, and can be modified by manipulating the contingency between conditioned stimuli and rewards (5), the cellular mechanisms that underlie this cue-reward learning remain unclear.

Long-term potentiation (LTP) and long-term depression (LTD) are hypothesized cellular mechanisms for learning and memory storage (6). Glutamatergic synapses onto DA neurons can express LTP (7, 8), LTD (9–11), and short-term plasticity (7). Furthermore, passive (12–14) or voluntary (15) exposure to cocaine can lead to long-lasting changes in synaptic function in DA neurons. Although excitatory synapses are highly plastic, it is unknown whether associative learning leads to synaptic alterations onto DA neurons.

Both the firing of VTA neurons and the release of DA are time-locked to receipt of unpredicted rewards as well as to conditioned stimuli that predict reward delivery (16, 17). However, the time course over which DA release to reward-predictive stimuli develops is poorly characterized. Thus, we used fast-scan cyclic voltammetry (FSCV) (figs. S1 and S2 and table S1) (18) to monitor rapid DA fluctuations in the nucleus accumbens (NAc) of rats during the acquisition of a cue-reward association in a Pavlovian conditioning task. Rats (n = 8) underwent single or multiple conditioning sessions (n = 13 total sessions) (19) in which the onset of a cue-light stimulus (CS) preceded the delivery of a sucrose pellet. Cue-reward learning was assayed by the development of conditioned approach behavior, in which rats make goal-directed nosepokes into the sucrose pellet receptacle during presentation of the CS (20).

Before the development of conditioned-approach behavior, NAc DA transients were time-locked to reward delivery and/or retrieval (Fig. 1A). During subsequent trials, in which cue-reward associations were formed, DA transients were typically observed in response to both reward and CS onset (Fig. 1, B and D). After acquisition of the cue-reward association, DA transients were predominantly time-locked to CS onset (Fig. 1, C and D). The onset of phasic DA release to the CS developed gradually, as seen in Fig. 1D and fig. S3, when voltammetric recordings were made in a representative rat over four consecutive conditioning sessions. Because cue-evoked DA release developed throughout learning, we examined whether DA release correlated with conditioned-approach behavior. Figure 1E and table S1 show that the ratio of the CS-related DA release to the reward-related DA release was significantly (r² = 0.68; P = 0.0005) correlated with the number of CS nosepokes in a conditioning session (also see fig. S4). Furthermore, when rats displayed conditioned-approach behavior (>20 CS-directed nosepokes), and therefore had learned the cue-reward association to some degree, CS-related DA was significantly higher compared with those sessions in which rats showed less conditioned approach [t(11) = 2.94; P = 0.013] (Fig. 1F).
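The correlation described above can be illustrated with a minimal sketch. It assumes one peak cue-evoked and one peak reward-evoked DA value per session and computes the squared Pearson correlation between the [DA]cue/[DA]reward ratio and the CS nosepoke count. The function names and all numbers are hypothetical placeholders, not data from the study, and the study's own analysis pipeline may differ.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def da_ratio(cue_peak, reward_peak):
    """Ratio of cue-evoked to reward-evoked DA for one session."""
    return cue_peak / reward_peak

# Hypothetical per-session values (arbitrary units), NOT data from the study.
cue_peaks = [5.0, 10.0, 20.0, 30.0, 45.0]
reward_peaks = [50.0, 40.0, 40.0, 30.0, 30.0]
cs_nosepokes = [4, 9, 18, 27, 40]

ratios = [da_ratio(c, r) for c, r in zip(cue_peaks, reward_peaks)]
r = pearson_r(cs_nosepokes, ratios)
r_squared = r ** 2
print(f"r^2 between CS nosepokes and [DA]cue/[DA]reward: {r_squared:.2f}")
```

A ratio is used rather than the raw cue response so that sessions with different overall signal amplitudes can be compared on one scale.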

Fig. 1.

Phasic DA release in the NAc to reward-predictive stimuli develops during learning. (A to C) Single-trial example traces of DA release during different stages of conditioning. Red horizontal bar indicates CS duration, which was followed by sucrose delivery (at 10 s). Black vertical ticks indicate nosepokes into the sucrose receptacle. Insets show background-subtracted cyclic voltammograms taken from the DA peak to the pellet delivery in (A) and from the DA peak to CS onset in (B) and (C). (D) The development of DA release in response to the CS in a single rat across four conditioning sessions each consisting of 32 trials. CS onset occurred at t = 0. (E) Nosepokes made during the CS period correlated with amount of [DA]cue/[DA]reward across 13 behavioral sessions in n = 8 rats tested. (F) Bar graph of data in (E) showing that rats with the greatest conditioned-approach behavior to the CS showed significantly higher [DA]cue/[DA]reward.

Because conditioned DA release to reward-predictive stimuli developed as learning progressed, we hypothesized that alterations in synaptic strength onto DA neurons play a role in cue-reward learning. Using a behavioral paradigm similar to that described above, adult rats were trained in one, three, or five daily sessions in which a 10-s tone/houselight conditioned stimulus predicted reward delivery (CS+ group). A separate group of rats received the same exposure to the tone/houselight stimulus and sucrose, but these stimuli were not explicitly paired together (CS− group) (Fig. 2A). Figure 2B shows the acquisition of the cue-reward association over the course of five conditioning sessions for CS+ rats and no acquisition for CS− rats. A two-way repeated measures analysis of variance (ANOVA) showed a significant increase in conditioned-approach behavior in the CS+ rats versus the CS− rats over conditioning [conditioning × group interaction, F(4,208) = 5.12; P = 0.0006]. Post hoc tests revealed that early in conditioning (session 1), there was no significant difference in conditioned-approach behavior between CS+ and CS− rats. However, by session 3, CS+ rats developed significant conditioned approach to the CS, whereas unpaired CS− rats did not. By session 5, no further increase in conditioned approach was seen in the CS+ rats, demonstrating that at this time, no new learning of the cue-reward association was occurring.

Fig. 2.

Excitatory synaptic strength is transiently increased after the acquisition of a cue-reward association. (A) Schematic of the CS+ and CS− behavioral paradigms. (B) Conditioned-approach behavior (CS nosepokes, 10 s prior) increased over five sessions in the CS+ group but not in the CS− group. (C) Example traces of AMPAR- and NMDAR-mediated currents taken from CS+ and CS− rats ∼1 hour after conditioning sessions 1, 3, or 5. (D) Average data showing that the AMPAR/NMDAR ratio was transiently elevated only in CS+ rats immediately after conditioning session 3. (E) Analysis of CS+ data from sessions 3 through 5 showing that rats that showed a >30% increase in cue-directed nosepokes over the previous conditioning session displayed a significant increase in the AMPAR/NMDAR ratio versus rats that did not show an increase in performance.

To explore whether changes in excitatory synaptic strength occurred at synapses onto DA neurons over the course of cue-reward learning, in vitro whole-cell patch clamp electrophysiological experiments were performed ∼1 hour after CS+ or CS− rats completed either the first, third, or fifth conditioning session. DA neurons in midbrain slices were voltage-clamped at +40 mV, and excitatory postsynaptic currents (EPSCs) were recorded before and after bath application of 50 μM of the NMDAR antagonist D-2-amino-5-phosphonopentanoate (AP5) to resolve both AMPAR- and NMDAR-mediated currents (fig. S5). The AMPAR/NMDAR ratio was then computed as an index of excitatory synaptic strength onto DA neurons (13, 21). The AMPAR/NMDAR ratio was significantly increased in CS+ rats over conditioning [F(2,52) = 4.08, P = 0.023]. Example traces and averages in Fig. 2, C and D, show that the AMPAR/NMDAR ratios were comparable in CS+ and CS− rats after the first session of cue-reward pairing [CS+: 0.53 ± 0.057, n = 12; CS−: 0.59 ± 0.11, n = 8; t(18) = 0.58; P = 0.57]. However, after the third conditioning session, the AMPAR/NMDAR ratio was significantly higher in CS+ rats relative to CS− controls [CS+: 0.90 ± 0.12, n = 10; CS−: 0.46 ± 0.06, n = 7; t(15) = 2.82; P = 0.013] (Fig. 2, C and D). Once the cue-reward association was well established (after conditioning session 5), AMPAR/NMDAR ratios in CS+ and CS− rats were again comparable [CS+: 0.52 ± 0.09, n = 11; CS−: 0.46 ± 0.07, n = 10; t(19) = 0.46; P = 0.65] (Fig. 2, C and D). Further analysis of the CS+ trained rats showed that AMPAR/NMDAR ratios were significantly higher in rats that showed a large improvement (>30% increase) in CS+ nosepokes from the previous session [t(14) = 4.57; P = 0.0004] (Fig. 2E).
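The two computations in this paragraph can be sketched as follows. The first function assumes the AMPAR component is the EPSC remaining after AP5 and the NMDAR component is obtained by subtracting that trace from the EPSC recorded before AP5, with peak amplitudes used for the ratio; the use of peaks is an assumption, since the text does not state the measurement window. The second function recomputes a two-sample t statistic from the reported summary values, assuming the ± values are standard errors of the mean and that a pooled-variance Student's t test was used. Function names and the toy traces are hypothetical.

```python
import math

def ampar_nmdar_ratio(epsc_before_ap5, epsc_after_ap5):
    """Ratio of peak AMPAR current to peak NMDAR current at +40 mV.

    Both traces are outward (positive) currents sampled on the same time base.
    AMPAR component: trace after AP5. NMDAR component: before minus after.
    """
    if len(epsc_before_ap5) != len(epsc_after_ap5):
        raise ValueError("traces must have the same length")
    nmdar = [b - a for b, a in zip(epsc_before_ap5, epsc_after_ap5)]
    return max(epsc_after_ap5) / max(nmdar)

def pooled_t_from_summary(mean1, sem1, n1, mean2, sem2, n2):
    """Student's t and degrees of freedom from means, SEMs, and group sizes."""
    var1 = (sem1 ** 2) * n1  # SEM = SD / sqrt(n), so variance = SEM^2 * n
    var2 = (sem2 ** 2) * n2
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se_diff = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se_diff, df

# Toy traces in pA, NOT recorded data.
before = [0.0, 60.0, 100.0, 80.0, 40.0]
after = [0.0, 45.0, 30.0, 10.0, 0.0]
toy_ratio = ampar_nmdar_ratio(before, after)

# Session 3 summary values as reported in the text (CS+ versus CS-).
t_stat, dof = pooled_t_from_summary(0.90, 0.12, 10, 0.46, 0.06, 7)
print(f"toy AMPAR/NMDAR ratio: {toy_ratio:.2f}; session 3: t({dof}) = {t_stat:.2f}")
```

Because the inputs are rounded summary values, the recomputed statistic should be expected to land near, not exactly on, the reported t(15) = 2.82.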

Postsynaptic increases in AMPAR or decreases in NMDAR number or function can lead to an elevated AMPAR/NMDAR ratio. To determine which receptor subtype(s) was altered in DA neurons after learning, AMPA or NMDA was bath-applied onto CS+ or CS− midbrain slices immediately after conditioning session 3. AMPA-mediated, but not NMDA-mediated, current was elevated in cells from CS+ versus CS− rats (fig. S6). Consistent with this, AMPAR-mediated mEPSCs were increased in amplitude in cells from CS+ rats relative to controls, with no change in mEPSC frequency or paired-pulse ratio (fig. S7). Taken together, these results suggest that increased excitatory synaptic strength associated with cue-reward learning is mediated by an increase in postsynaptic AMPAR function.

We next tested whether the induction of LTP at excitatory synapses onto DA neurons was altered in CS+ rats after acquisition of the cue-reward association. LTP was induced using a spike-timing-dependent plasticity protocol (8) (fig. S8). Experiments in naïve rats verified that this protocol was capable of inducing LTP in DA neurons from adult rats (fig. S8). An example cell in Fig. 3A and average data in Fig. 3B show that, in cells from CS− rats after conditioning session 3, EPSP-AP pairing significantly increased the evoked EPSP amplitude to 131.2 ± 3.2% of baseline [averaged over t = 40 to 45 min of the experiment, F(7,41) = 8.43, P ≤ 0.0001]. In contrast, no change in EPSP amplitude was observed in cells recorded from CS+ rats after session 3 [106.6 ± 1.4% of baseline; F(6,40) = 0.71, P = 0.90] (Fig. 3, C and D).
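The "percent of baseline" measure reported above can be sketched as the mean EPSP amplitude in a late window divided by the mean amplitude in a pre-pairing baseline window. The 40 to 45 min test window comes from the text; the 0 to 9 min baseline window, the function name, and the amplitudes below are illustrative assumptions, not values from the study.

```python
def percent_of_baseline(times_min, epsp_mv, baseline_window, test_window):
    """Mean EPSP amplitude in test_window as a percentage of baseline_window.

    Windows are (start, end) tuples in minutes, inclusive at both ends.
    """
    def window_mean(window):
        lo, hi = window
        vals = [v for t, v in zip(times_min, epsp_mv) if lo <= t <= hi]
        if not vals:
            raise ValueError(f"no samples in window {window}")
        return sum(vals) / len(vals)

    return 100.0 * window_mean(test_window) / window_mean(baseline_window)

# Synthetic experiment, NOT recorded data: one EPSP sample per minute,
# 4.0 mV before pairing (assumed to occur at t = 10 min) and 5.2 mV after.
times = list(range(0, 46))
amplitudes = [4.0 if t < 10 else 5.2 for t in times]

potentiation = percent_of_baseline(times, amplitudes, (0, 9), (40, 45))
print(f"EPSP amplitude at 40-45 min: {potentiation:.1f}% of baseline")
```

Under this measure, a value near 100% indicates no potentiation, which is how the CS+ result (106.6% of baseline) is read as an absence of LTP.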

Fig. 3.

NMDAR antagonism blocks LTP and cue-reward learning. (A and B) Example and average data showing that LTP was induced in cells taken from CS− rats after session 3. (C and D) Example and average data showing that LTP could not be induced in cells taken from CS+ rats. (E and F) An example experiment and average data showing that NMDAR antagonism blocked the induction of LTP in cells taken from naïve rats. (G) Behavioral data showing NMDAR antagonism in the VTA blocked the acquisition of the cue-reward association. (H) Behavioral data showing that NMDAR antagonism had no effect on conditioned-approach behavior after learning had occurred.

To determine whether NMDAR-mediated signaling was required for the expression of LTP at excitatory synapses onto DA neurons, the NMDAR antagonist, D-AP5, was bath-applied to slices taken from naïve rats while EPSPs were measured before and after LTP induction. An example cell in Fig. 3E and average data in Fig. 3F illustrate that bath application of 50 μM AP5 significantly blocked LTP induction at excitatory synapses on DA neurons [F(7,43) = 0.56, P = 0.98].

An increase in the AMPAR/NMDAR ratio (Fig. 2, C to E), as well as an occlusion of LTP (Fig. 3, C and D), was observed immediately after acquisition, suggesting that an LTP-like synaptic change in DA neurons may facilitate cue-reward learning. Because VTA LTP induction required NMDA receptors (Fig. 3, E and F), we examined whether NMDARs in the VTA were required for cue-reward learning. Thus, a separate group of rats was implanted with cannulae aimed at the VTA. After recovery from surgery, rats received microinjections of an artificial cerebrospinal fluid (aCSF) vehicle or 0.5 nmol/0.5 μl AP5 10 min before CS+ conditioning sessions. Rats that received aCSF microinjections developed cue-reward associations over the course of five conditioning sessions (Fig. 3G) in a similar fashion to rats that did not undergo surgery (Fig. 2B). VTA NMDAR antagonism significantly impaired the acquisition of conditioned-approach behavior relative to aCSF-injected control rats [F(1,105) = 7.54, P = 0.007] (Fig. 3G). Finally, rats previously injected with aCSF were microinjected with AP5 immediately before an additional conditioning session (conditioning session 6) to determine whether NMDAR antagonism was modulating the expression of conditioned approach behavior rather than blocking learning. However, conditioned approach behavior after AP5 injection on session 6 was not altered relative to aCSF microinjections [t(11) = 1.44, P = 0.18] (Fig. 3H).

The release of DA in the NAc to reward-predictive stimuli developed throughout learning, as did changes in synaptic strength. Reward learning transiently enhanced excitatory synaptic strength in midbrain DA neurons as a result of increased currents through postsynaptic AMPARs, which are known to modulate the firing of DA neurons (22, 23). Furthermore, neurons from CS+ rats that acquired the cue-reward association did not show LTP, in contrast to neurons from naïve and CS− rats, in which LTP could be induced. This suggests that, during the acquisition phase of cue-reward learning, excitatory synapses onto DA neurons may become maximally potentiated as a result of exposure to repeated cue-reward pairings. Both the induction of LTP and the development of conditioned approach behavior were blocked by VTA NMDAR antagonism, suggesting that NMDAR signaling in the VTA is crucial for the formation of cue-reward associations. VTA NMDAR antagonism blocks the acquisition of drug-induced conditioned place preference (24), and VTA extracellular glutamate levels are dramatically increased after exposure to drug-associated cues (25), suggesting an important role of VTA glutamatergic neurotransmission in modulating goal-directed behavior by conditioned stimuli.

Synaptic strength onto DA neurons was elevated only immediately after the acquisition of cue-reward learning. At this time, rats typically exhibited the largest change in conditioned-approach behavior relative to previous sessions. Because increased synaptic strength was not observed after stable behavioral responding, the transient increase appears to facilitate learning without being required for the long-term maintenance of cue-reward associations. The persistent storage of cue-reward information may rely on the formation of new synapses or on plasticity in brain regions outside the VTA. These data are in stark contrast to increases in synaptic strength induced by drugs of abuse that can last for weeks after drug exposure (15) and may lead to maladaptive learning in which drug-associated cues are over-valued relative to cues that predict natural reinforcers. Therefore, the transient enhancement in synaptic strength after normal reward learning may transform neutral stimuli into reward-predictive stimuli, whereas the rescaling of synaptic strength after learning would allow for the formation of future cue-reward associations.

Supporting Online Material

Materials and Methods

Figs. S1 to S8

Table S1


References and Notes
